While meta reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low-variance estimates of policy gradients remains a significant challenge. In particular, the need to estimate a large Hessian, poor sample efficiency, and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function named Taming MAML (TMAML) that adds control variates into gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.
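To illustrate the general idea that a control variate can reduce variance without introducing bias, here is a minimal sketch. It is not the TMAML objective and does not involve meta-learning or Hessians; it only shows a score-function gradient estimator for a one-dimensional Gaussian policy, with and without a constant baseline. The reward function, the names `reward` and `score_function_grads`, and all parameter values are assumptions made for this example.

```python
import numpy as np

def reward(a):
    # Hypothetical toy reward, chosen so the true gradient is known in closed form.
    return 100.0 - (a - 3.0) ** 2

def score_function_grads(theta, sigma, n, rng, baseline=0.0):
    """Per-sample score-function gradient estimates of d/dtheta E[reward(a)],
    where a ~ Normal(theta, sigma^2). Subtracting a baseline that does not
    depend on the sampled action keeps the estimator unbiased, because the
    score has zero mean."""
    a = rng.normal(theta, sigma, size=n)
    score = (a - theta) / sigma**2  # d/dtheta log N(a; theta, sigma^2)
    return (reward(a) - baseline) * score

rng = np.random.default_rng(0)
theta, sigma, n = 0.0, 1.0, 200_000

plain = score_function_grads(theta, sigma, n, rng)
# Baseline: reward at the policy mean, a constant given theta (an assumption of this sketch).
with_cv = score_function_grads(theta, sigma, n, rng, baseline=reward(theta))

# For this reward, E[reward(a)] = 100 - (theta - 3)^2 - sigma^2.
true_grad = -2.0 * (theta - 3.0)

print("true gradient:        ", true_grad)
print("plain mean / variance:", plain.mean(), plain.var())
print("with control variate: ", with_cv.mean(), with_cv.var())
```

Both estimators average to roughly the same gradient, while the per-sample variance of the baseline-corrected version is much smaller in this toy setting. The abstract describes applying control variates to the meta-gradient through a surrogate objective and automatic differentiation, which this sketch does not attempt to reproduce.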
Author Information
Hao Liu (Salesforce Research, UC Berkeley)
Richard Socher (Salesforce)
Caiming Xiong (Salesforce)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: Taming MAML: Efficient unbiased meta-reinforcement learning »
Wed. Jun 12th 07:05 -- 07:10 PM Room Hall B
More from the Same Authors
-
2021 : Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning »
Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai -
2021 : Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2022 Poster: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation »
Junnan Li · Dongxu Li · Caiming Xiong · Steven Hoi -
2022 Spotlight: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation »
Junnan Li · Dongxu Li · Caiming Xiong · Steven Hoi -
2021 Poster: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras -
2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong -
2021 Spotlight: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras -
2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong -
2021 Poster: Don’t Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2021 Spotlight: Don’t Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2020 Poster: Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills »
Victor Campos · Alexander Trott · Caiming Xiong · Richard Socher · Xavier Giro-i-Nieto · Jordi Torres -
2019 Poster: Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting »
Xilai Li · Yingbo Zhou · Tianfu Wu · Richard Socher · Caiming Xiong -
2019 Poster: On the Generalization Gap in Reparameterizable Reinforcement Learning »
Huan Wang · Stephan Zheng · Caiming Xiong · Richard Socher -
2019 Oral: Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting »
Xilai Li · Yingbo Zhou · Tianfu Wu · Richard Socher · Caiming Xiong -
2019 Oral: On the Generalization Gap in Reparameterizable Reinforcement Learning »
Huan Wang · Stephan Zheng · Caiming Xiong · Richard Socher