Poster

Stop Regressing: The Unreasonable Effectiveness of Classification in Deep Reinforcement Learning

Jesse Farebrother · Jordi Orbay · Quan Vuong · Adrien Ali Taiga · Yevgen Chebotar · Ted Xiao · Alexander Irpan · Aleksandra Faust · Pablo Samuel Castro · Sergey Levine · Aviral Kumar · Rishabh Agarwal


Abstract:

Deep reinforcement learning (RL) heavily relies on value functions parameterized by neural networks. These value networks are typically trained using a mean squared error regression loss to match target values computed using a previous snapshot of this network. However, scaling these regression-based methods to large networks, such as high-capacity Transformers, has proven challenging. In contrast, supervised deep learning has seen tremendous success by leveraging cross-entropy classification losses, known for their reliable training even for massive networks. Motivated by this discrepancy, we investigate whether value-based RL can also be improved simply by using a cross-entropy classification loss in place of regression. We explore several approaches for framing value-based RL as a classification problem and demonstrate that cross-entropy losses significantly improve the performance and scalability of both offline and online RL, across single-task and multi-task settings, on Atari 2600 games, robotic manipulation, and language agent problems. Our analysis suggests that these gains arise from classification mitigating several issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, the simple change of using a cross-entropy loss yields substantial scalability improvements in deep RL.
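As a rough illustration of the idea described above (not the authors' exact implementation), the sketch below shows one common way to recast value regression as classification: each scalar TD target is projected onto a fixed set of bins with a "two-hot" encoding that splits probability mass between the two nearest bin centers, and the value network's logits are trained with a cross-entropy loss against that categorical target. Function names such as `two_hot_targets`, the bin range, and the batch values are hypothetical placeholders.

```python
import numpy as np

def two_hot_targets(values, bin_centers):
    """Project scalar targets onto a categorical distribution over fixed bins.

    Each scalar is represented by splitting probability mass between the two
    nearest bin centers (a "two-hot" encoding), so the expectation of the
    resulting distribution recovers the original scalar value.
    """
    values = np.clip(values, bin_centers[0], bin_centers[-1])
    upper = np.searchsorted(bin_centers, values, side="left")
    upper = np.clip(upper, 1, len(bin_centers) - 1)
    lower = upper - 1
    # Linear interpolation weight toward the upper neighbouring bin.
    width = bin_centers[upper] - bin_centers[lower]
    w_upper = (values - bin_centers[lower]) / width
    targets = np.zeros((len(values), len(bin_centers)))
    targets[np.arange(len(values)), lower] = 1.0 - w_upper
    targets[np.arange(len(values)), upper] = w_upper
    return targets

def cross_entropy_value_loss(logits, td_targets, bin_centers):
    """Cross-entropy between predicted value logits and two-hot TD targets."""
    probs_target = two_hot_targets(td_targets, bin_centers)
    # Numerically stable log-softmax over the bin dimension.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return -np.mean(np.sum(probs_target * log_probs, axis=-1))

# Example: 51 bins spanning an assumed return range, C51-style.
bins = np.linspace(-10.0, 10.0, 51)
logits = np.random.randn(4, 51)               # value-network outputs for a batch of 4
td_targets = np.array([1.5, -2.0, 0.3, 7.8])  # bootstrapped scalar TD targets
loss = cross_entropy_value_loss(logits, td_targets, bins)
```

The paper also studies smoother projections (e.g., spreading mass with a Gaussian over neighbouring bins); the two-hot variant is shown here only because it is the simplest to state.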
