Poster in Workshop: ICML 2021 Workshop on Unsupervised Reinforcement Learning

When Does Overconservatism Hurt Offline Learning?

Karush Suri · Florian Shkurti


Abstract:

Prior successes in offline learning are highlighted by its adaptability to novel scenarios. A key reason behind this adaptability is conservatism, the practice of underestimating an agent's expected value estimates. Recent work, on the other hand, has noted that overconservatism often cripples the learning of meaningful behaviors. To that end, this paper asks: when does overconservatism hurt offline learning? The proposed answer examines conservatism in light of the conjugate space and empirical instabilities. In the case of the former, agents implicitly aim to learn complex, high-entropy distributions. As for the latter, overconservatism arises as a consequence of provably inaccurate approximations. Building on this theoretical evidence, we address overconservatism through the lens of dynamic control: a feedback controller tunes the learned value estimates by virtue of direct dynamics in a compact latent space. In an empirical study of aerial control tasks on the CF2X quadcopter, we validate our theoretical insights and demonstrate effective transfer of offline policies to novel scenarios.
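The abstract's central idea pairs conservative value estimation with a feedback controller that regulates how much the estimates are pushed down. Below is a minimal, illustrative Python sketch of that general idea only; the function names, the proportional gain `k_p`, the `target_gap` setpoint, and all numbers are assumptions for illustration and are not taken from the paper's method or latent-space dynamics model.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): a conservative
# value target whose penalty weight alpha is tuned by a simple
# proportional feedback controller, so estimates are underestimated
# only as far as a chosen level of conservatism requires.

def conservative_target(q_data, q_policy, alpha):
    """Underestimate values by penalizing optimism on policy-proposed actions."""
    # q_data: Q-values of actions observed in the offline dataset
    # q_policy: Q-values of actions proposed by the learned policy
    return q_data - alpha * np.maximum(q_policy - q_data, 0.0)

def update_alpha(alpha, gap, target_gap, k_p=0.1):
    """Proportional feedback: raise alpha when the policy/data value gap
    exceeds the setpoint, lower it when estimates become overconservative."""
    return max(0.0, alpha + k_p * (gap - target_gap))

# Toy usage with hypothetical numbers
rng = np.random.default_rng(0)
alpha, target_gap = 1.0, 0.5
for step in range(5):
    q_data = rng.normal(1.0, 0.2, size=32)
    q_policy = q_data + rng.normal(0.8, 0.3, size=32)  # optimistic policy values
    gap = float(np.mean(q_policy - q_data))
    targets = conservative_target(q_data, q_policy, alpha)
    alpha = update_alpha(alpha, gap, target_gap)
    print(f"step {step}: gap={gap:.2f}, alpha={alpha:.2f}, mean target={targets.mean():.2f}")
```

The sketch only conveys the feedback-control intuition: instead of a fixed penalty, the degree of conservatism is treated as a controlled quantity that adapts to an observed error signal.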
