

Poster in Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments

Pietro Maldini · Mirco Mutti · Riccardo De Santi · Marcello Restelli


Abstract:

In recent years, Unsupervised Reinforcement Learning (URL) has gained particular relevance as a way to foster the generalization of reinforcement learning agents. In this setting, the agent's policy is first pre-trained in an unknown environment via reward-free interactions, often through a pure exploration objective that drives the agent towards uniform coverage of the state space. This pre-training has been shown to improve efficiency on the downstream supervised tasks later given to the agent. When pre-training without supervision in multiple environments, one should also account for potential trade-offs in exploration performance across the set of environments, which raises the following question: Can we pre-train a single policy that is simultaneously optimal in all the environments? In this work, we address this question by proposing a novel non-Markovian policy architecture to be pre-trained with the common maximum state entropy objective. This architecture shows significant empirical advantages over state-of-the-art Markovian agents for URL.
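The abstract does not spell out the pre-training objective, so the following is a minimal sketch, assuming the standard notation in which d^pi_M denotes the state distribution induced by a policy pi in environment M over a horizon T; the worst-case aggregation over environments is one illustrative choice that exposes the trade-off mentioned above, not necessarily the exact formulation used in the paper.

% Single environment M: maximize the entropy of the state distribution
% induced by the policy \pi over a horizon T (notation assumed, not from the abstract).
\max_{\pi} \; H\!\left(d^{\pi}_{M}\right)
  = \max_{\pi} \; -\sum_{s \in \mathcal{S}} d^{\pi}_{M}(s) \log d^{\pi}_{M}(s),
\qquad
d^{\pi}_{M}(s) = \frac{1}{T} \sum_{t=1}^{T} \Pr\!\left(s_t = s \mid \pi, M\right).

% Multiple environments \mathcal{M} = \{M_1, \dots, M_K\}: one illustrative
% aggregation (worst case) under which a single policy must explore well everywhere.
\max_{\pi} \; \min_{M \in \mathcal{M}} \; H\!\left(d^{\pi}_{M}\right).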
