
Recursive History Representations for Unsupervised Reinforcement Learning in Multiple-Environments
Mirco Mutti · Pietro Maldini · Riccardo De Santi · Marcello Restelli
Event URL: https://openreview.net/forum?id=UWXydcMsV2

In recent years, the area of Unsupervised Reinforcement Learning (URL) has gained particular relevance. In this setting, an agent is pre-trained through reward-free interactions with an environment, often via a maximum state entropy objective that drives the agent toward uniform coverage of the state space. This pre-training phase has been shown to yield significant performance improvements on downstream tasks later given to the agent to solve. The multiple-environments version of this setting introduces the problem of controlling performance trade-offs across the environment class, which leads to the following question: can we build Pareto-optimal policies for multiple-environments URL? In this work, we answer this question by proposing a novel non-Markovian policy architecture to be trained with the maximum state entropy objective. This architecture shows significant empirical advantages over state-of-the-art Markovian agents.
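To make the two ingredients of the abstract concrete, the sketch below illustrates (i) a nonparametric state-entropy estimate and (ii) a policy conditioned on a recursive summary of the interaction history. Both are illustrative assumptions, not the authors' exact method: the k-nearest-neighbor entropy estimator (Kozachenko-Leonenko style) is one common choice for maximum state entropy objectives, and `RecurrentPolicy` is a minimal stand-in for a non-Markovian architecture.

```python
import numpy as np

def knn_state_entropy(states, k=3):
    """Nonparametric entropy estimate over visited states (sketch).

    Entropy is high when each state's k-th nearest neighbor is far away,
    i.e. when coverage of the state space is spread out. Constant terms
    of the full Kozachenko-Leonenko estimator are omitted, so values are
    only meaningful for comparison.
    """
    n, d = states.shape
    # Pairwise Euclidean distances between all visited states.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # exclude self-distance
    knn_dists = np.sort(dists, axis=1)[:, k - 1]  # distance to k-th neighbor
    # Up to additive constants, entropy grows with the log k-NN distances.
    return d * np.mean(np.log(knn_dists + 1e-12))

class RecurrentPolicy:
    """Non-Markovian policy: actions depend on a recursive representation
    of the whole history, not just the current state. Architecture details
    here (a plain tanh RNN cell) are illustrative assumptions.
    """
    def __init__(self, state_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
        self.W_s = rng.normal(0, 0.1, (hidden_dim, state_dim))
        self.W_a = rng.normal(0, 0.1, (n_actions, hidden_dim))
        self.h = np.zeros(hidden_dim)             # recursive history summary

    def step(self, state):
        # h_t = tanh(W_h h_{t-1} + W_s s_t): the history is folded into h_t.
        self.h = np.tanh(self.W_h @ self.h + self.W_s @ state)
        logits = self.W_a @ self.h
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()                # action distribution

# Uniformly spread states score higher than clustered ones, as the
# maximum state entropy objective intends.
rng = np.random.default_rng(0)
spread = rng.uniform(-1, 1, size=(200, 2))
clustered = rng.normal(0, 0.01, size=(200, 2))
```

Under this sketch, a pre-training loop would roll out `RecurrentPolicy` and update its weights to increase `knn_state_entropy` of the visited states; in the multiple-environments setting, the history summary lets a single policy adapt its behavior to whichever environment of the class it finds itself in.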

Author Information

Mirco Mutti (Politecnico di Milano, Università di Bologna)
Pietro Maldini (Politecnico di Milano)
Riccardo De Santi (ETH Zurich)
Marcello Restelli (Politecnico di Milano)
