DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare

Tue Jun 11 03:05 PM -- 03:10 PM (PDT) @ Room 104

Many reinforcement learning tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a Markov Decision Process (MDP) parameterized by neural networks that is able to recover these representations. We mathematically develop several desirable notions of similarity between the original MDP and the DeepMDP based on two main objectives: (1) modeling the dynamics of an MDP, and (2) learning a useful abstract representation of the states of an MDP. While the motivation for each of these notions is distinct, we find that they are intimately related. Specifically, we derive tractable training objectives of the DeepMDP components which simultaneously and provably encourage all notions of similarity. We validate our theoretical findings by showing that we are able to learn DeepMDPs and recover the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari domain leads to large performance improvements.

Author Information

Carles Gelada (Google Brain)
Saurabh Kumar (Google Brain)
Jacob Buckman (Johns Hopkins University)
Ofir Nachum (Google Brain)
Marc Bellemare (Google Brain)
