Taming MAML: Efficient unbiased meta-reinforcement learning
Hao Liu · Richard Socher · Caiming Xiong

Wed Jun 12 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #38

While meta-reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low-variance estimates of policy gradients remains a significant challenge. In particular, the need to estimate a large Hessian, poor sample efficiency, and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function, Taming MAML (TMAML), that adds control variates to gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods, including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.
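
The general idea of a control variate that lowers variance without adding bias can be illustrated with a toy score-function gradient estimator. The sketch below is not the TMAML objective: it assumes a one-dimensional Gaussian "policy" with mean theta, a hypothetical reward x^2 + 10, and a constant baseline chosen by hand. The names score_function_gradients and baseline are made up for this illustration.

```python
import numpy as np

def score_function_gradients(theta, n_samples, baseline=0.0, seed=0):
    """Per-sample score-function estimates of d/dtheta E[reward(x)], x ~ N(theta, 1).

    Toy setup (an assumption for illustration, not the paper's method):
    the reward is x**2 + 10 and the control variate is a constant baseline.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    reward = x**2 + 10.0
    score = x - theta  # d/dtheta log N(x; theta, 1)
    # The baseline does not depend on x and E[score] = 0, so subtracting it
    # leaves the expectation of the estimator unchanged.
    return (reward - baseline) * score

theta = 1.5
true_gradient = 2.0 * theta  # d/dtheta (theta**2 + 1 + 10)

plain = score_function_gradients(theta, 200_000)
# Baseline set to the expected reward, theta**2 + 11, for this toy problem.
tamed = score_function_gradients(theta, 200_000, baseline=theta**2 + 11.0)

print("true gradient:        ", true_gradient)
print("plain mean / variance:", plain.mean(), plain.var())
print("tamed mean / variance:", tamed.mean(), tamed.var())
```

Both estimators average to roughly the same gradient, while the per-sample variance of the baseline version is much smaller in this toy case. Per the abstract, TMAML builds its control variates into a surrogate objective and can meta-learn them, which this sketch does not attempt.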

Author Information

Hao Liu (Salesforce Research, UC Berkeley)
Richard Socher (Salesforce)
Caiming Xiong (Salesforce)
