To rapidly learn a new task, it is often essential for agents to explore efficiently, especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. Many existing methods, however, rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state and the agent's task belief). We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
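The idea of a reward bonus over hyper-states can be illustrated with a minimal sketch. This is not the HyperX implementation: it swaps in a simple count-based novelty bonus over discretised (state, belief) pairs as a stand-in, and the names `HyperStateCountBonus`, `bin_width`, `scale`, and `shaped_reward` are hypothetical, introduced only for illustration.

```python
import math
from collections import defaultdict


class HyperStateCountBonus:
    """Count-based novelty bonus over discretised hyper-states.

    Assumption: a hyper-state is the concatenation of the environment
    state and a vector summarising the agent's task belief. The
    count-based bonus is an illustrative stand-in, not the paper's method.
    """

    def __init__(self, bin_width=0.5, scale=1.0):
        self.bin_width = bin_width  # hypothetical discretisation width
        self.scale = scale          # hypothetical bonus weight
        self.counts = defaultdict(int)

    def _key(self, state, belief):
        # Concatenate state and belief, then bucket each dimension.
        hyper_state = tuple(state) + tuple(belief)
        return tuple(math.floor(x / self.bin_width) for x in hyper_state)

    def bonus(self, state, belief):
        # Record the visit and return a bonus that decays with visit count.
        key = self._key(state, belief)
        self.counts[key] += 1
        return self.scale / math.sqrt(self.counts[key])


def shaped_reward(env_reward, bonus):
    # Reward used during meta-training: sparse task reward plus the bonus.
    return env_reward + bonus
```

Because the bonus is keyed on the belief as well as the state, revisiting the same environment state with a different task belief still counts as novel, which is what distinguishes exploring hyper-states from exploring states alone.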
Author Information
Luisa Zintgraf (University of Oxford)
Leo Feng (Mila)
Cong Lu (University of Oxford)
Maximilian Igl (University of Oxford)
Kristian Hartikainen (UC Berkeley)
Katja Hofmann (Microsoft)
Shimon Whiteson (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Fri. Jul 23rd 12:40 -- 12:45 AM
More from the Same Authors
-
2021 : Revisiting Design Choices in Offline Model Based Reinforcement Learning »
Cong Lu · Philip Ball · Jack Parker-Holder · Michael A Osborne · Stephen Roberts -
2022 : Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations »
Cong Lu · Philip Ball · Tim G. J Rudner · Jack Parker-Holder · Michael A Osborne · Yee-Whye Teh -
2022 Poster: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Poster: Interactively Learning Preference Constraints in Linear Bandits »
David Lindner · Sebastian Tschiatschek · Katja Hofmann · Andreas Krause -
2022 Spotlight: Interactively Learning Preference Constraints in Linear Bandits »
David Lindner · Sebastian Tschiatschek · Katja Hofmann · Andreas Krause -
2022 Spotlight: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Poster: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2021 : Towards Human-like and Collaborative AI in Video Games »
Katja Hofmann -
2021 Poster: Average-Reward Off-Policy Policy Evaluation with Function Approximation »
Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson -
2021 Spotlight: Average-Reward Off-Policy Policy Evaluation with Function Approximation »
Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson -
2021 Spotlight: Breaking the Deadly Triad with a Target Network »
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson -
2021 Poster: Breaking the Deadly Triad with a Target Network »
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson -
2021 Poster: TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL »
Clément Romac · Rémy Portelas · Katja Hofmann · Pierre-Yves Oudeyer -
2021 Spotlight: TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL »
Clément Romac · Rémy Portelas · Katja Hofmann · Pierre-Yves Oudeyer -
2021 Poster: Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces »
Xingchen Wan · Vu Nguyen · Huong Ha · Binxin Ru · Cong Lu · Michael A Osborne -
2021 Poster: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts -
2021 Poster: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning »
Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha -
2021 Spotlight: Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces »
Xingchen Wan · Vu Nguyen · Huong Ha · Binxin Ru · Cong Lu · Michael A Osborne -
2021 Spotlight: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts -
2021 Oral: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning »
Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha -
2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Poster: Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation »
Sam Devlin · Raluca Georgescu · Ida Momennejad · Jaroslaw Rzepecki · Evelyn Zuniga · Gavin Costello · Guy Leroy · Ali Shaw · Katja Hofmann -
2021 Poster: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning »
Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson -
2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Spotlight: Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation »
Sam Devlin · Raluca Georgescu · Ida Momennejad · Jaroslaw Rzepecki · Evelyn Zuniga · Gavin Costello · Guy Leroy · Ali Shaw · Katja Hofmann -
2021 Spotlight: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning »
Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson -
2020 : Panel discussion »
Kavya Srinet · Katja Hofmann · Yoav Artzi · Alex Kearney · Julia Hockenmaier -
2020 : Open-ended environments for advancing RL Q&A »
Max Jaderberg · Katja Hofmann -
2020 : The NetHack Learning Environment Q&A »
Tim Rocktäschel · Katja Hofmann -
2020 Workshop: Workshop on Learning in Artificial Open Worlds »
Arthur Szlam · Katja Hofmann · Ruslan Salakhutdinov · Noboru Kuno · William Guss · Kavya Srinet · Brandon Houghton -
2020 : Opening remarks »
Katja Hofmann -
2020 : Q&A with Katja Hofmann »
Katja Hofmann · Luisa Zintgraf · Rika Antonova · Sarath Chandar · Shagun Sodhani -
2020 : Challenges & Opportunities in Lifelong Reinforcement Learning by Katja Hofmann »
Katja Hofmann · Rika Antonova · Luisa Zintgraf -
2020 Poster: Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation »
Shangtong Zhang · Bo Liu · Hengshuai Yao · Shimon Whiteson -
2020 Poster: Deep Coordination Graphs »
Wendelin Boehmer · Vitaly Kurin · Shimon Whiteson -
2020 Poster: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values »
Shangtong Zhang · Bo Liu · Shimon Whiteson -
2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Poster: Fast Context Adaptation via Meta-Learning »
Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson -
2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Oral: Fast Context Adaptation via Meta-Learning »
Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson -
2019 Poster: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2019 Oral: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2018 Poster: Fourier Policy Gradients »
Matthew Fellows · Kamil Ciosek · Shimon Whiteson -
2018 Oral: Fourier Policy Gradients »
Matthew Fellows · Kamil Ciosek · Shimon Whiteson -
2018 Poster: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Poster: Deep Variational Reinforcement Learning for POMDPs »
Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson -
2018 Oral: Deep Variational Reinforcement Learning for POMDPs »
Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson -
2018 Oral: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2018 Poster: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2018 Poster: Tighter Variational Bounds are Not Necessarily Better »
Tom Rainforth · Adam Kosiorek · Tuan Anh Le · Chris Maddison · Maximilian Igl · Frank Wood · Yee-Whye Teh -
2018 Poster: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: Tighter Variational Bounds are Not Necessarily Better »
Tom Rainforth · Adam Kosiorek · Tuan Anh Le · Chris Maddison · Maximilian Igl · Frank Wood · Yee-Whye Teh -
2018 Oral: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2018 Oral: TACO: Learning Task Decomposition via Temporal Alignment for Control »
Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner -
2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2017 : Panel Discussion »
Balaraman Ravindran · Chelsea Finn · Alessandro Lazaric · Katja Hofmann · Marc Bellemare -
2017 Poster: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Talk: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson