Poster
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an $\varepsilon$-optimal policy with probability $1-\delta$ under the settings of an infinite-horizon linear MDP, generative model, and G-optimal design. We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs. Furthermore, we propose a practical VWLS algorithm for value-based deep RL, Deep Variance Weighting (DVW). Our experiments demonstrate that DVW improves the performance of popular value-based deep RL algorithms on a set of MinAtar benchmarks.
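The variance-weighted regression idea in the abstract can be illustrated with a small weighted least-squares sketch. This is not the authors' VWLS-MDVI or DVW implementation: the function name, the `var_floor` and `ridge` parameters, and the choice of inverse-variance weights clipped from below are all assumptions made for illustration. It only shows how samples whose next-state value estimate has high variance can be down-weighted in a linear regression.

```python
import numpy as np


def variance_weighted_least_squares(features, targets, variances,
                                    var_floor=1e-3, ridge=1e-6):
    """Solve a weighted ridge regression with inverse-variance weights.

    Hypothetical helper, not taken from the paper:
      features  : (n, d) array of feature vectors phi(s, a)
      targets   : (n,) array of regression targets
      variances : (n,) array of estimated next-state value variances
      var_floor : lower clip on the variance so weights stay bounded
      ridge     : small regularizer that keeps the Gram matrix invertible
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # High-variance samples receive small weights.
    weights = 1.0 / np.maximum(np.asarray(variances, dtype=float), var_floor)
    weighted_features = features * weights[:, None]
    # Normal equations: (Phi^T W Phi + ridge * I) theta = Phi^T W y
    gram = features.T @ weighted_features + ridge * np.eye(features.shape[1])
    moment = weighted_features.T @ targets
    return np.linalg.solve(gram, moment)


# Usage on synthetic data (all values below are made up for the demo).
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
noise_var = rng.uniform(0.1, 2.0, size=200)
y = phi @ theta_true + rng.normal(size=200) * np.sqrt(noise_var)
theta_hat = variance_weighted_least_squares(phi, y, noise_var)
print(theta_hat)
```

Clipping the variance from below is a design choice of this sketch: without it, a near-zero variance estimate would give a single sample an unbounded weight.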
Author Information
Toshinori Kitamura (The University of Tokyo)
Tadashi Kozuno (Omron Sinic X)
Yunhao Tang (Google DeepMind)
Nino Vieillard (Google Brain)
Michal Valko (Google DeepMind / Inria / MVA)
Wenhao Yang (Peking University)
Jincheng Mei (Google DeepMind)
Pierre Menard (ENS Lyon)
Mohammad Gheshlaghi Azar (Google DeepMind)
Remi Munos (DeepMind)
Olivier Pietquin (Google DeepMind)
Matthieu Geist (Google)
Csaba Szepesvari (DeepMind/University of Alberta)
Wataru Kumagai (The University of Tokyo)
Yutaka Matsuo (The University of Tokyo)
More from the Same Authors
-
2021 : Finding the Near Optimal Policy via Reductive Regularization in MDPs »
Wenhao Yang · Xiang Li · Guangzeng Xie · Zhihua Zhang -
2021 : Marginalized Operators for Off-Policy Reinforcement Learning »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 : A functional mirror ascent view of policy gradient methods with function approximation »
Sharan Vaswani · Olivier Bachem · Simone Totaro · Matthieu Geist · Marlos C. Machado · Pablo Samuel Castro · Nicolas Le Roux -
2021 : Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning »
Omar Darwiche Domingues · Corentin Tallec · Remi Munos · Michal Valko -
2021 : Offline Reinforcement Learning as Anti-Exploration »
Shideh Rezaeifar · Robert Dadashi · Nino Vieillard · Léonard Hussenot · Olivier Bachem · Olivier Pietquin · Matthieu Geist -
2023 Poster: Stochastic Gradient Succeeds for Bandits »
Jincheng Mei · Zixin Zhong · Bo Dai · Alekh Agarwal · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: A Connection between One-Step RL and Critic Regularization in Reinforcement Learning »
Benjamin Eysenbach · Matthieu Geist · Sergey Levine · Ruslan Salakhutdinov -
2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning »
Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko -
2023 Poster: Half-Hop: A graph upsampling approach for slowing down message passing »
Mehdi Azabou · Venkataramana Ganesh · Shantanu Thakoor · Chi-Heng Lin · Lakshmi Sathidevi · Ran Liu · Michal Valko · Petar Veličković · Eva Dyer -
2023 Poster: Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments »
Daniel Jarrett · Corentin Tallec · Florent Altché · Thomas Mesnard · Remi Munos · Michal Valko -
2023 Poster: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition »
Yash Chandak · Shantanu Thakoor · Zhaohan Guo · Yunhao Tang · Remi Munos · Will Dabney · Diana Borsa -
2023 Poster: Towards a better understanding of representation dynamics under TD-learning »
Yunhao Tang · Remi Munos -
2023 Poster: Revisiting Simple Regret: Fast Rates for Returning a Good Arm »
Yao Zhao · Connor J Stephens · Csaba Szepesvari · Kwang-Sung Jun -
2023 Oral: Adapting to game trees in zero-sum imperfect information games »
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko -
2023 Poster: Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games »
Batuhan Yardim · Semih Cayci · Matthieu Geist · Niao He -
2023 Poster: The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation »
Philip Amortila · Nan Jiang · Csaba Szepesvari -
2023 Poster: End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization »
Shohei Taniguchi · Masahiro Suzuki · Yusuke Iwasawa · Yutaka Matsuo -
2023 Poster: Adapting to game trees in zero-sum imperfect information games »
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko -
2023 Poster: Fast Rates for Maximum Entropy Exploration »
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Pierre Perrault · Yunhao Tang · Michal Valko · Pierre Menard -
2023 Poster: Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes »
Chuhan Xie · Wenhao Yang · Zhihua Zhang -
2023 Oral: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation »
Mark Rowland · Yunhao Tang · Clare Lyle · Remi Munos · Marc Bellemare · Will Dabney -
2023 Poster: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm »
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Anna Harutyunyan · Remi Munos · Bernardo Avila Pires · Michal Valko -
2023 Poster: The Edge of Orthogonality: A Simple View of What Makes BYOL Tick »
Pierre Richemond · Allison Tam · Yunhao Tang · Florian Strub · Bilal Piot · Felix Hill -
2023 Poster: VA-learning as a more efficient alternative to Q-learning »
Yunhao Tang · Remi Munos · Mark Rowland · Michal Valko -
2022 Poster: Large Batch Experience Replay »
Thibault Lahire · Matthieu Geist · Emmanuel Rachelson -
2022 Poster: Continuous Control with Action Quantization from Demonstrations »
Robert Dadashi · Léonard Hussenot · Damien Vincent · Sertan Girgin · Anton Raichuk · Matthieu Geist · Olivier Pietquin -
2022 Poster: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Menard -
2022 Poster: Generalised Policy Improvement with Geometric Policy Composition »
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto -
2022 Oral: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Menard -
2022 Oral: Large Batch Experience Replay »
Thibault Lahire · Matthieu Geist · Emmanuel Rachelson -
2022 Oral: Generalised Policy Improvement with Geometric Policy Composition »
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto -
2022 Spotlight: Continuous Control with Action Quantization from Demonstrations »
Robert Dadashi · Léonard Hussenot · Damien Vincent · Sertan Girgin · Anton Raichuk · Matthieu Geist · Olivier Pietquin -
2022 Poster: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning »
Yunhao Tang -
2022 Spotlight: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning »
Yunhao Tang -
2022 Poster: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist -
2022 Spotlight: Scalable Deep Reinforcement Learning Algorithms for Mean Field Games »
Mathieu Lauriere · Sarah Perrin · Sertan Girgin · Paul Muller · Ayush Jain · Theophile Cabannes · Georgios Piliouras · Julien Perolat · Romuald Elie · Olivier Pietquin · Matthieu Geist -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Problem Dependent View on Structured Thresholding Bandit Problems »
James Cheshire · Pierre Menard · Alexandra Carpentier -
2021 Spotlight: Problem Dependent View on Structured Thresholding Bandit Problems »
James Cheshire · Pierre Menard · Alexandra Carpentier -
2021 Poster: Fast active learning for pure exploration in reinforcement learning »
Pierre Menard · Omar Darwiche Domingues · Anders Jonsson · Emilie Kaufmann · Edouard Leurent · Michal Valko -
2021 Poster: UCB Momentum Q-learning: Correcting the bias without forgetting »
Pierre Menard · Omar Darwiche Domingues · Xuedong Shang · Michal Valko -
2021 Poster: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: Fast active learning for pure exploration in reinforcement learning »
Pierre Menard · Omar Darwiche Domingues · Anders Jonsson · Emilie Kaufmann · Edouard Leurent · Michal Valko -
2021 Oral: UCB Momentum Q-learning: Correcting the bias without forgetting »
Pierre Menard · Omar Darwiche Domingues · Xuedong Shang · Michal Valko -
2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Poster: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Poster: Hyperparameter Selection for Imitation Learning »
Léonard Hussenot · Marcin Andrychowicz · Damien Vincent · Robert Dadashi · Anton Raichuk · Sabela Ramos · Nikola Momchev · Sertan Girgin · Raphael Marinier · Lukasz Stafiniak · Emmanuel Orsini · Olivier Bachem · Matthieu Geist · Olivier Pietquin -
2021 Spotlight: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Oral: Hyperparameter Selection for Imitation Learning »
Léonard Hussenot · Marcin Andrychowicz · Damien Vincent · Robert Dadashi · Anton Raichuk · Sabela Ramos · Nikola Momchev · Sertan Girgin · Raphael Marinier · Lukasz Stafiniak · Emmanuel Orsini · Olivier Bachem · Matthieu Geist · Olivier Pietquin -
2021 Poster: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Offline Reinforcement Learning with Pseudometric Learning »
Robert Dadashi · Shideh Rezaeifar · Nino Vieillard · Léonard Hussenot · Olivier Pietquin · Matthieu Geist -
2021 Spotlight: Offline Reinforcement Learning with Pseudometric Learning »
Robert Dadashi · Shideh Rezaeifar · Nino Vieillard · Léonard Hussenot · Olivier Pietquin · Matthieu Geist -
2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Monte-Carlo Tree Search as Regularized Policy Optimization »
Jean-Bastien Grill · Florent Altché · Yunhao Tang · Thomas Hubert · Michal Valko · Ioannis Antonoglou · Remi Munos -
2020 Poster: On the Global Convergence Rates of Softmax Policy Gradient Methods »
Jincheng Mei · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Fast computation of Nash Equilibria in Imperfect Information Games »
Remi Munos · Julien Perolat · Jean-Baptiste Lespiau · Mark Rowland · Bart De Vylder · Marc Lanctot · Finbarr Timbers · Daniel Hennes · Shayegan Omidshafiei · Audrunas Gruslys · Mohammad Gheshlaghi Azar · Edward Lockhart · Karl Tuyls -
2020 Poster: Learning to Score Behaviors for Guided Policy Optimization »
Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Krzysztof Choromanski · Anna Choromanska · Michael Jordan -
2020 Poster: Reinforcement Learning for Integer Programming: Learning to Cut »
Yunhao Tang · Shipra Agrawal · Yuri Faenza -
2020 Poster: Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning »
Zhaohan Guo · Bernardo Avila Pires · Bilal Piot · Jean-Bastien Grill · Florent Altché · Remi Munos · Mohammad Gheshlaghi Azar -
2020 Poster: Taylor Expansion Policy Optimization »
Yunhao Tang · Michal Valko · Remi Munos -
2019 : poster session I »
Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov -
2019 Poster: Statistics and Samples in Distributional Reinforcement Learning »
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney -
2019 Oral: Statistics and Samples in Distributional Reinforcement Learning »
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney -
2019 Poster: A Theory of Regularized Markov Decision Processes »
Matthieu Geist · Bruno Scherrer · Olivier Pietquin -
2019 Poster: Learning from a Learner »
Alexis Jacq · Matthieu Geist · Ana Paiva · Olivier Pietquin -
2019 Oral: A Theory of Regularized Markov Decision Processes »
Matthieu Geist · Bruno Scherrer · Olivier Pietquin -
2019 Oral: Learning from a Learner »
Alexis Jacq · Matthieu Geist · Ana Paiva · Olivier Pietquin -
2018 Poster: The Uncertainty Bellman Equation and Exploration »
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih -
2018 Poster: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Autoregressive Quantile Networks for Generative Modeling »
Georg Ostrovski · Will Dabney · Remi Munos -
2018 Oral: The Uncertainty Bellman Equation and Exploration »
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih -
2018 Oral: Autoregressive Quantile Networks for Generative Modeling »
Georg Ostrovski · Will Dabney · Remi Munos -
2018 Oral: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Poster: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2018 Poster: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Oral: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2017 Poster: Count-Based Exploration with Neural Density Models »
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos -
2017 Talk: Count-Based Exploration with Neural Density Models »
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos -
2017 Poster: A Distributional Perspective on Reinforcement Learning »
Marc Bellemare · Will Dabney · Remi Munos -
2017 Poster: Automated Curriculum Learning for Neural Networks »
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu -
2017 Poster: Minimax Regret Bounds for Reinforcement Learning »
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos -
2017 Talk: A Distributional Perspective on Reinforcement Learning »
Marc Bellemare · Will Dabney · Remi Munos -
2017 Talk: Automated Curriculum Learning for Neural Networks »
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu -
2017 Talk: Minimax Regret Bounds for Reinforcement Learning »
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos