Discovering Diverse Nearly Optimal Policies with Successor Features
Tom Zahavy · Brendan O'Donoghue · Andre Barreto · Sebastian Flennerhag · Vlad Mnih · Satinder Singh

Finding different solutions to the same problem is a key aspect of intelligence, associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features while ensuring that they are near optimal. We formalize the problem as a Constrained Markov Decision Process (CMDP) whose goal is to find policies that maximize diversity, characterized by an intrinsic diversity reward, while remaining near-optimal with respect to the extrinsic reward of the MDP. We also analyze recently proposed robustness and discrimination rewards and find that they are sensitive to the initialization of the procedure and may converge to sub-optimal solutions. To alleviate this, we propose new explicit diversity rewards that aim to minimize the correlation between the Successor Features of the policies in the set. We compare these diversity mechanisms on the DeepMind Control Suite and find that the explicit diversity we propose is important for discovering distinct behaviors, such as different locomotion patterns.
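
The constrained formulation above is described only in prose; the following minimal LaTeX sketch makes it concrete. The notation here (\alpha for the near-optimality threshold, v_e and v_d for the extrinsic and diversity values, \phi and \psi for features and Successor Features) is introduced for illustration and may not match the paper's exact formulation.

    % Successor Features of a policy \pi (standard definition):
    % the expected discounted sum of state features \phi under \pi.
    \psi^{\pi} = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t) \right]

    % A sketch of the CMDP from the abstract: maximize the value of an
    % intrinsic diversity reward while the extrinsic value stays within
    % a factor \alpha \in [0, 1] of the optimal extrinsic value.
    \max_{\pi}\; v_d^{\pi} \qquad \text{s.t.} \qquad v_e^{\pi} \,\ge\, \alpha\, v_e^{*}

Under this reading, the correlation-minimizing diversity rewards mentioned in the abstract would define the intrinsic reward so that maximizing v_d^{\pi} pushes \psi^{\pi} away from the Successor Features of the policies already in the set.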

Author Information

Tom Zahavy (DeepMind)
Brendan O'Donoghue (DeepMind)
Andre Barreto (DeepMind)
Sebastian Flennerhag (University of Manchester)
Vlad Mnih (Google DeepMind)
Satinder Singh (DeepMind)