We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. Finally, we intuitively adapt the inference step to investigate the effect of increased temporal abstraction on training with pre-trained options and from scratch.
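The abstract says that HO2 infers likely option choices for a given trajectory via dynamic programming. As a rough illustration of that kind of inference, the following is a minimal sketch of an HMM-style forward pass over options. It is not the authors' implementation: the function name `forward_option_posterior`, its argument names, and the specific termination/switch model (an option either continues, or terminates and is re-sampled from a high-level distribution) are assumptions made for this example. NumPy is used for readability; training end-to-end as described would require running the same recursion inside an automatic-differentiation framework.

```python
import numpy as np


def forward_option_posterior(action_lik, term_prob, high_level, init_probs):
    """Hypothetical forward pass over option choices along one trajectory.

    action_lik: (T, K) likelihood of the observed action at step t under option k.
    term_prob:  (T, K) probability that option k, active at t-1, terminates at t.
    high_level: (T, K) distribution over newly selected options at step t.
    init_probs: (K,)   distribution over the first option.

    Returns the filtered option posteriors (T, K) and the log-likelihood
    of the observed actions under the assumed option model.
    """
    num_steps, num_options = action_lik.shape
    posteriors = np.zeros((num_steps, num_options))
    log_lik = 0.0
    prior = np.asarray(init_probs, dtype=float)

    for t in range(num_steps):
        if t > 0:
            prev = posteriors[t - 1]
            # Mass that stays on the same option (no termination).
            stay = prev * (1.0 - term_prob[t])
            # Mass that terminates and is redistributed by the high-level policy.
            switch = np.sum(prev * term_prob[t]) * high_level[t]
            prior = stay + switch
        # Weight the predicted option distribution by how well each option
        # explains the action that was actually taken, then normalize.
        joint = prior * action_lik[t]
        norm = joint.sum()
        posteriors[t] = joint / norm
        log_lik += np.log(norm)

    return posteriors, log_lik


if __name__ == "__main__":
    # Toy trajectory: 3 steps, 2 options; option 0 explains the actions better.
    lik = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
    term = np.full((3, 2), 0.1)
    high = np.full((3, 2), 0.5)
    init = np.array([0.5, 0.5])
    post, ll = forward_option_posterior(lik, term, high, init)
    print(post)
    print(ll)
```

Because every step of the recursion is a differentiable function of the policy outputs, the same computation written in an autodiff library would let gradients of a loss reach the high-level, low-level, and termination components at once, which is the property the abstract refers to as backpropagating through the inference procedure.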
Author Information
Markus Wulfmeier (DeepMind)
Dushyant Rao (DeepMind)
Roland Hafner (DeepMind)
Thomas Lampe (DeepMind)
Abbas Abdolmaleki (DeepMind)
Tim Hertweck (DeepMind)
Michael Neunert (Google DeepMind)
Dhruva Tirumala Bukkapatnam (DeepMind)
Noah Siegel (DeepMind)
Nicolas Heess (DeepMind)
Martin Riedmiller (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Data-efficient Hindsight Off-policy Option Learning »
Wed. Jul 21st 04:00 -- 06:00 PM
More from the Same Authors
-
2021 : Is Bang-Bang Control All You Need? »
Tim Seyde · Igor Gilitschenski · Wilko Schwarting · Bartolomeo Stellato · Martin Riedmiller · Markus Wulfmeier · Daniela Rus -
2022 Poster: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Poster: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games »
Siqi Liu · Marc Lanctot · Luke Marris · Nicolas Heess -
2022 Spotlight: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games »
Siqi Liu · Marc Lanctot · Luke Marris · Nicolas Heess -
2021 : RL + Robotics Panel »
George Konidaris · Jan Peters · Martin Riedmiller · Angela Schoellig · Rose Yu · Rupam Mahmood -
2021 Poster: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2021 Spotlight: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2020 : QA for invited talk 6 Heess »
Nicolas Heess -
2020 : Invited talk 6 Heess »
Nicolas Heess -
2020 Poster: CoMic: Complementary Task Learning & Mimicry for Reusable Skills »
Leonard Hasenclever · Fabio Pardo · Raia Hadsell · Nicolas Heess · Josh Merel -
2020 Poster: Stabilizing Transformers for Reinforcement Learning »
Emilio Parisotto · Francis Song · Jack Rae · Razvan Pascanu · Caglar Gulcehre · Siddhant Jayakumar · Max Jaderberg · Raphael Lopez Kaufman · Aidan Clark · Seb Noury · Matthew Botvinick · Nicolas Heess · Raia Hadsell -
2020 Poster: A distributional view on multi-objective policy optimization »
Abbas Abdolmaleki · Sandy Huang · Leonard Hasenclever · Michael Neunert · Francis Song · Martina Zambelli · Murilo Martins · Nicolas Heess · Raia Hadsell · Martin Riedmiller -
2019 : Nicolas Heess: TBD »
Nicolas Heess -
2019 Poster: Composing Entropic Policies using Divergence Correction »
Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess -
2019 Oral: Composing Entropic Policies using Divergence Correction »
Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess -
2018 Poster: Mix & Match - Agent Curricula for Reinforcement Learning »
Wojciech Czarnecki · Siddhant Jayakumar · Max Jaderberg · Leonard Hasenclever · Yee Teh · Nicolas Heess · Simon Osindero · Razvan Pascanu -
2018 Oral: Mix & Match - Agent Curricula for Reinforcement Learning »
Wojciech Czarnecki · Siddhant Jayakumar · Max Jaderberg · Leonard Hasenclever · Yee Teh · Nicolas Heess · Simon Osindero · Razvan Pascanu -
2018 Poster: Learning by Playing - Solving Sparse Reward Tasks from Scratch »
Martin Riedmiller · Roland Hafner · Thomas Lampe · Michael Neunert · Jonas Degrave · Tom Van de Wiele · Vlad Mnih · Nicolas Heess · Jost Springenberg -
2018 Poster: Graph Networks as Learnable Physics Engines for Inference and Control »
Alvaro Sanchez-Gonzalez · Nicolas Heess · Jost Springenberg · Josh Merel · Martin Riedmiller · Raia Hadsell · Peter Battaglia -
2018 Poster: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: Learning by Playing - Solving Sparse Reward Tasks from Scratch »
Martin Riedmiller · Roland Hafner · Thomas Lampe · Michael Neunert · Jonas Degrave · Tom Van de Wiele · Vlad Mnih · Nicolas Heess · Jost Springenberg -
2018 Oral: Graph Networks as Learnable Physics Engines for Inference and Control »
Alvaro Sanchez-Gonzalez · Nicolas Heess · Jost Springenberg · Josh Merel · Martin Riedmiller · Raia Hadsell · Peter Battaglia -
2018 Oral: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2017 Poster: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Talk: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu