Timezone: »
Spotlight
Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning
Yunhao Tang
Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number poorly-understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate such a discrepancy. In particular, (1) We show that unbiased gradient estimates have variance $\Theta(N)$ which linearly depends on the sample size $N$ of the inner loop updates; (2) We propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$; (3) We show that most empirical prior work in fact implements variants of the LSF gradient estimates. This implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) We establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding its convergence to stationary points, showing better dependency on $N$ than prior work when $N$ is large.
Author Information
Yunhao Tang (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning »
Wed. Jul 20th through Thu the 21st Room Hall E #829
More from the Same Authors
-
2021 : Marginalized Operators for Off-Policy Reinforcement Learning »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning »
Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko -
2023 Poster: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition »
Yash Chandak · Shantanu Thakoor · Zhaohan Guo · Yunhao Tang · Remi Munos · Will Dabney · Diana Borsa -
2023 Poster: Towards a better understanding of representation dynamics under TD-learning »
Yunhao Tang · Remi Munos -
2023 Poster: Fast Rates for Maximum Entropy Exploration »
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Pierre Perrault · Yunhao Tang · Michal Valko · Pierre Menard -
2023 Oral: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation »
Mark Rowland · Yunhao Tang · Clare Lyle · Remi Munos · Marc Bellemare · Will Dabney -
2023 Poster: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm »
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Anna Harutyunyan · Remi Munos · Bernardo Avila Pires · Michal Valko -
2023 Poster: The Edge of Orthogonality: A Simple View of What Makes BYOL Tick »
Pierre Richemond · Allison Tam · Yunhao Tang · Florian Strub · Bilal Piot · Feilx Hill -
2023 Poster: VA-learning as a more efficient alternative to Q-learning »
Yunhao Tang · Remi Munos · Mark Rowland · Michal Valko -
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo -
2022 Poster: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Menard -
2022 Oral: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Menard -
2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2020 Poster: Monte-Carlo Tree Search as Regularized Policy Optimization »
Jean-Bastien Grill · Florent Altché · Yunhao Tang · Thomas Hubert · Michal Valko · Ioannis Antonoglou · Remi Munos -
2020 Poster: Learning to Score Behaviors for Guided Policy Optimization »
Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Krzysztof Choromanski · Anna Choromanska · Michael Jordan -
2020 Poster: Reinforcement Learning for Integer Programming: Learning to Cut »
Yunhao Tang · Shipra Agrawal · Yuri Faenza -
2020 Poster: Taylor Expansion Policy Optimization »
Yunhao Tang · Michal Valko · Remi Munos -
2019 : poster session I »
Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov