
 
Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments
Pietro Maldini · Mirco Mutti · Riccardo De Santi · Marcello Restelli
Event URL: https://openreview.net/forum?id=mKQMuWGvxe7

In recent years, the area of Unsupervised Reinforcement Learning (URL) has gained particular relevance as a way to foster generalization of reinforcement learning agents. In this setting, the agent's policy is first pre-trained in an unknown environment via reward-free interactions, often through a pure exploration objective that drives the agent towards uniform coverage of the state space. This pre-training has been shown to improve efficiency on downstream supervised tasks later given to the agent to solve. When pre-training in multiple environments, one should also account for potential trade-offs in exploration performance across the set of environments, which leads to the following question: can we pre-train a policy that is simultaneously optimal in all the environments? In this work, we address this question by proposing a novel non-Markovian policy architecture to be pre-trained with the common maximum state entropy objective. This architecture shows significant empirical advantages over state-of-the-art Markovian agents for URL.
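As a rough illustration of the objective mentioned in the abstract (a minimal sketch of the standard maximum state entropy formulation; the exact objective and the aggregation across environments used by the authors may differ), the pre-trained policy maximizes the entropy of the state distribution it induces, and in the multi-environment setting one natural aggregation is the worst case over environments:

% Sketch only: d^pi_M denotes the state distribution induced by policy pi
% in environment M, and \mathcal{M} is the class of environments seen
% during pre-training. The min aggregation is one illustrative choice.
\[
  \max_{\pi} \; H\!\left(d^{\pi}_{M}\right)
  \quad \text{with} \quad
  H\!\left(d^{\pi}_{M}\right) = -\sum_{s \in \mathcal{S}} d^{\pi}_{M}(s) \log d^{\pi}_{M}(s),
\]
\[
  \max_{\pi} \; \min_{M \in \mathcal{M}} H\!\left(d^{\pi}_{M}\right)
  \qquad \text{(one possible multi-environment aggregation).}
\]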

Author Information

Pietro Maldini (Politecnico di Milano)
Mirco Mutti (Politecnico di Milano, Università di Bologna)
Riccardo De Santi (ETH Zurich)
Marcello Restelli (Politecnico di Milano)
