Exploration-Driven Representation Learning in Reinforcement Learning
Akram Erraqabi · Mingde Zhao · Marlos C. Machado · Yoshua Bengio · Sainbayar Sukhbaatar · Ludovic Denoyer · Alessandro Lazaric

Learning reward-agnostic representations is an emerging paradigm in reinforcement learning. These representations can be leveraged for several purposes, ranging from reward shaping to skill discovery. Nevertheless, in order to learn such representations, existing methods often rely on assuming uniform access to the state space. Without such a privilege, the agent’s coverage of the environment can be limited, which hurts the quality of the learned representations. In this work, we introduce a method that explicitly couples representation learning with exploration when the agent is not provided with a uniform prior over the state space. Our method learns representations that constantly drive exploration, while the data generated by the agent’s exploratory behavior drives the learning of better representations. We empirically validate our approach in goal-achieving tasks, demonstrating that the learned representation captures the dynamics of the environment and leads to more accurate value estimation and faster credit assignment, both when used for control and for reward shaping. Finally, the exploratory policy that emerges from our approach proves to be successful at continuous navigation tasks with sparse rewards.

Author Information

Akram Erraqabi (University of Montreal)
Mingde Zhao (Mila & McGill University)

I am a first-year PhD student at Mila and McGill University, with a focus on reinforcement learning and meta-learning.

Marlos C. Machado (DeepMind)
Yoshua Bengio (Mila - Quebec AI Institute)
Sainbayar Sukhbaatar (Facebook)
Ludovic Denoyer (Criteo)
Alessandro Lazaric (Facebook AI Research)