DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare

Tue Jun 11 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #108

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
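
As a rough illustration of the two latent space losses named in the abstract, the sketch below computes a reward prediction error and a next-latent-state prediction error for a single transition. It is a minimal sketch under stated assumptions, not the authors' implementation: the embedding, latent transition, and reward models are plain linear maps, the latent transition is deterministic, actions are omitted for brevity, and a Euclidean distance stands in for the distance between the predicted and the embedded next latent state. All names (`matvec`, `deepmdp_losses`, `embed_w`, `trans_w`, `reward_w`) are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def deepmdp_losses(embed_w, trans_w, reward_w, obs, next_obs, reward):
    """Return (reward_loss, transition_loss) for one observed transition.

    Assumptions (not taken from the abstract): linear models, a
    deterministic latent transition, no action input, Euclidean distance.
    """
    z = matvec(embed_w, obs)            # embed the current observation
    z_next = matvec(embed_w, next_obs)  # embed the next observation
    z_pred = matvec(trans_w, z)         # latent model's predicted next state
    r_pred = sum(w * x for w, x in zip(reward_w, z))  # predicted reward

    reward_loss = abs(r_pred - reward)
    transition_loss = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(z_pred, z_next))
    )
    return reward_loss, transition_loss


# Toy usage: 4-dimensional observations embedded into a 2-dimensional latent.
embed_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
trans_w = [[1.0, 0.0], [0.0, 1.0]]
reward_w = [1.0, 1.0]
losses = deepmdp_losses(
    embed_w, trans_w, reward_w,
    obs=[1.0, 2.0, 9.0, 9.0], next_obs=[4.0, 6.0, 0.0, 0.0], reward=1.0,
)
```

In a training setup along these lines, both losses would be averaged over sampled transitions and minimized jointly with respect to the embedding and the latent model; when used as an auxiliary task, they would be added to the RL objective.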

Author Information

Carles Gelada (Google Brain)
Saurabh Kumar (Google Brain)
Jacob Buckman (Johns Hopkins University)
Ofir Nachum (Google Brain)
Marc Bellemare (Google Brain)