Spotlight
Breaking the Deadly Triad with a Target Network
Shangtong Zhang · Hengshuai Yao · Shimon Whiteson
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In this paper, we investigate the target network as a tool for breaking the deadly triad, providing theoretical support for the conventional wisdom that a target network stabilizes training. We first propose and analyze a novel target network update rule which augments the commonly used Polyak-averaging style update with two projections. We then apply the target network and ridge regularization in several divergent algorithms and show their convergence to regularized TD fixed points. Those algorithms are off-policy with linear function approximation and bootstrapping, spanning both policy evaluation and control, as well as both discounted and average-reward settings. In particular, we provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
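The abstract names the ingredients (a Polyak-averaging target update augmented with two projections, plus ridge regularization on the main weights) but does not spell out their exact form. The following is a minimal sketch of how they might fit together for linear policy evaluation, not the authors' algorithm. It assumes both projections map onto Euclidean balls centred at the origin. The function names, the radii `r_inner` and `r_outer`, and the step sizes `alpha`, `beta`, and ridge weight `eta` are all hypothetical and chosen for illustration only.

```python
import numpy as np


def project_to_ball(x, radius):
    """Project x onto the Euclidean ball of the given radius (assumed projection)."""
    norm = np.linalg.norm(x)
    if norm <= radius:
        return x
    return x * (radius / norm)


def target_update(theta, w, beta, r_inner, r_outer):
    """Polyak-style target update with two projections (assumed form).

    The main weights w are projected first, mixed into the target weights
    theta with rate beta, and the result is projected again.
    """
    mixed = theta + beta * (project_to_ball(w, r_inner) - theta)
    return project_to_ball(mixed, r_outer)


def ridge_td_step(w, theta, phi, reward, phi_next, gamma, alpha, eta):
    """One linear TD step that bootstraps from the target weights theta.

    The term -eta * w is the gradient of a ridge penalty (eta / 2) * ||w||^2.
    """
    td_error = reward + gamma * (phi_next @ theta) - (phi @ w)
    return w + alpha * (td_error * phi - eta * w)


if __name__ == "__main__":
    w = np.array([3.0, 4.0])
    theta = np.zeros(2)
    theta = target_update(theta, w, beta=0.5, r_inner=1.0, r_outer=10.0)
    print(theta)
```

In this sketch the bootstrapped value uses `theta` rather than `w`, so the regression target moves only as fast as the rate `beta` allows, and the projections keep both the mixed-in weights and the target weights bounded. Whether these particular choices match the conditions under which the paper proves convergence is not something the abstract settles.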
Author Information
Shangtong Zhang (University of Oxford)
Hengshuai Yao (Huawei Technologies)
Shimon Whiteson (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Breaking the Deadly Triad with a Target Network
  Thu. Jul 22nd 04:00 -- 06:00 AM
More from the Same Authors
- 2023 Poster: On the Convergence of SARSA with Linear Function Approximation
  Shangtong Zhang · Remi Tachet des Combes · Romain Laroche
- 2022 Poster: Communicating via Markov Decision Processes
  Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster
- 2022 Spotlight: Communicating via Markov Decision Processes
  Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster
- 2022 Poster: Generalized Beliefs for Cooperative AI
  Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster
- 2022 Spotlight: Generalized Beliefs for Cooperative AI
  Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster
- 2021 Poster: Average-Reward Off-Policy Policy Evaluation with Function Approximation
  Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson
- 2021 Poster: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
  Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson
- 2021 Spotlight: Average-Reward Off-Policy Policy Evaluation with Function Approximation
  Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson
- 2021 Spotlight: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
  Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson
- 2021 Poster: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning
  Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha
- 2021 Oral: Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning
  Shariq Iqbal · Christian Schroeder · Bei Peng · Wendelin Boehmer · Shimon Whiteson · Fei Sha
- 2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
  Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar
- 2021 Poster: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
  Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson
- 2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
  Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar
- 2021 Spotlight: UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
  Tarun Gupta · Anuj Mahajan · Bei Peng · Wendelin Boehmer · Shimon Whiteson
- 2020 Poster: Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
  Shangtong Zhang · Bo Liu · Hengshuai Yao · Shimon Whiteson
- 2020 Poster: Deep Coordination Graphs
  Wendelin Boehmer · Vitaly Kurin · Shimon Whiteson
- 2020 Poster: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
  Shangtong Zhang · Bo Liu · Shimon Whiteson
- 2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
  Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling
- 2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
  Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling
- 2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
  Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson
- 2019 Poster: Fast Context Adaptation via Meta-Learning
  Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson
- 2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
  Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson
- 2019 Oral: Fast Context Adaptation via Meta-Learning
  Luisa Zintgraf · Kyriacos Shiarlis · Vitaly Kurin · Katja Hofmann · Shimon Whiteson
- 2019 Poster: Fingerprint Policy Optimisation for Robust Reinforcement Learning
  Supratik Paul · Michael A Osborne · Shimon Whiteson
- 2019 Oral: Fingerprint Policy Optimisation for Robust Reinforcement Learning
  Supratik Paul · Michael A Osborne · Shimon Whiteson
- 2018 Poster: Fourier Policy Gradients
  Mattie Fellows · Kamil Ciosek · Shimon Whiteson
- 2018 Oral: Fourier Policy Gradients
  Mattie Fellows · Kamil Ciosek · Shimon Whiteson
- 2018 Poster: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
  Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson
- 2018 Poster: Deep Variational Reinforcement Learning for POMDPs
  Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson
- 2018 Oral: Deep Variational Reinforcement Learning for POMDPs
  Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson
- 2018 Oral: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
  Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson
- 2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator
  Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson
- 2018 Poster: TACO: Learning Task Decomposition via Temporal Alignment for Control
  Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner
- 2018 Oral: TACO: Learning Task Decomposition via Temporal Alignment for Control
  Kyriacos Shiarlis · Markus Wulfmeier · Sasha Salter · Shimon Whiteson · Ingmar Posner
- 2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator
  Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson
- 2017 Poster: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
  Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson
- 2017 Talk: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
  Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson