Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
Edoardo Cetin · Philip Ball · Stephen Roberts · Oya Celiktutan

Tue Jul 19 03:30 PM -- 05:30 PM (PDT) @ Hall E #832

Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable. As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. We show that this new visual deadly triad causes unstable training and premature convergence to degenerate solutions, a phenomenon we name catastrophic self-overfitting. Based on our analysis, we propose A-LIX, a method providing adaptive regularization to the encoder's gradients that explicitly prevents the occurrence of catastrophic self-overfitting using a dual objective. By applying A-LIX, we significantly outperform the prior state-of-the-art on the DeepMind Control and Atari benchmarks without any data augmentation or auxiliary losses.

Author Information

Edoardo Cetin (King's College London)

Ph.D. student at King’s College London. My goal is to advance the sample-efficiency, stability, and practicality of deep reinforcement learning to enable autonomous agents to solve meaningful real-world problems. I strive to make contributions that can have a long-term impact, exploring the most fundamental aspects of modern deep learning.

Philip Ball (University of Oxford)
Stephen Roberts (University of Oxford)
Oya Celiktutan (King's College London)