Workshop: Workshop on Reinforcement Learning Theory

On Overconservatism in Offline Reinforcement Learning

Karush Suri · Florian Shkurti


Prior successes in offline learning are highlighted by its adaptability to novel scenarios. One of the key reasons behind this adaptability is conservatism, the act of underestimating an agent’s expected value estimates. Recent work, on the other hand, has noted that overconservatism often cripples the learning of meaningful behaviors. To that end, we ask: when does overconservatism hurt offline learning? The proposed answer examines conservatism in light of conjugate space and empirical instabilities. In the former case, agents implicitly aim to learn complex, high-entropy distributions. In the latter, overconservatism arises as a consequence of provably inaccurate approximations. Based on this theoretical evidence, we address overconservatism through the lens of dynamic control: a feedback controller tunes the learned value estimates by virtue of direct dynamics in the compact latent space. We validate our theoretical insights in an empirical study of aerial control tasks.
