Policies for partially observed Markov decision processes (POMDPs) can be efficiently learned by imitating expert policies generated using asymmetric information. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and as a result may encourage actions that are sub-optimal or unsafe under partial information. To address this issue, we derive an update which, when applied iteratively to an expert, maximizes the expected reward of the trainee's policy. Using this update, we construct a computationally efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and trainee policies. We then show that A2D allows the trainee to safely imitate the modified expert, and outperforms policies learned either by imitating a fixed expert or through direct reinforcement learning.
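The asymmetric imitation setup the abstract builds on can be illustrated with a minimal DAgger-style sketch. This is not the A2D algorithm itself (the paper's expert update is not reproduced here); it is a toy example, with a hypothetical five-state chain environment, showing the basic pattern: an expert that acts on the full state labels the observations a partially sighted trainee visits, and the trainee is refit on the aggregated data each iteration.

```python
# Toy asymmetric-imitation sketch (illustrative only, not the A2D
# update from the paper). The expert sees the full state 0..4; the
# trainee only sees obs = min(state, 2), so states 2, 3, 4 alias.

GOAL = 4

def expert_action(state):
    return 1 if state < GOAL else 0   # move right toward the goal

def observe(state):
    return min(state, 2)              # partial observation

def fit(dataset):
    # Tabular imitation: majority expert action per observation.
    votes = {}
    for obs, act in dataset:
        votes.setdefault(obs, []).append(act)
    return {o: max(set(a), key=a.count) for o, a in votes.items()}

def dagger(iterations=3, horizon=10):
    dataset, policy = [], {}
    for it in range(iterations):
        state = 0
        for _ in range(horizon):
            if state == GOAL:
                break
            obs = observe(state)
            # The (asymmetric) expert labels every visited observation.
            dataset.append((obs, expert_action(state)))
            # Roll out the expert on the first pass, then the trainee.
            if it == 0 or obs not in policy:
                act = expert_action(state)
            else:
                act = policy[obs]
            state = min(state + act, GOAL)
        policy = fit(dataset)
    return policy

policy = dagger()
```

In this toy the expert's labels happen to remain optimal under aliasing, so plain DAgger succeeds; the paper's point is that in general the fixed expert's labels can be sub-optimal or unsafe for the trainee's observation space, which is what the A2D expert update is designed to correct.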
Author Information
Andrew Warrington (University of Oxford)
Jonathan Lavington (University of British Columbia)
Adam Scibior (University of British Columbia)
Mark Schmidt (University of British Columbia)
Frank Wood (University of British Columbia)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: Robust Asymmetric Learning in POMDPs
  Wed. Jul 21st, 12:00 -- 12:20 AM
More from the Same Authors
- 2023: Visual Chain-of-Thought Diffusion Models
  William Harvey · Frank Wood
- 2023: Scaling Graphically Structured Diffusion Models
  Christian Weilbach · William Harvey · Hamed Shirzad · Frank Wood
- 2023 Oral: Uncertain Evidence in Probabilistic Models and Stochastic Simulators
  Andreas Munk · Alexander Mead · Frank Wood
- 2023 Poster: Graphically Structured Diffusion Models
  Christian Weilbach · William Harvey · Frank Wood
- 2023 Poster: Target-based Surrogates for Stochastic Optimization
  Jonathan Lavington · Sharan Vaswani · Reza Babanezhad · Mark Schmidt · Nicolas Le Roux
- 2023 Poster: Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning
  Wu Lin · Valentin Duruisseaux · Melvin Leok · Frank Nielsen · Khan Emtiyaz · Mark Schmidt
- 2023 Oral: Graphically Structured Diffusion Models
  Christian Weilbach · William Harvey · Frank Wood
- 2023 Poster: Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
  Julie Nutini · Issam Laradji · Mark Schmidt
- 2023 Poster: Uncertain Evidence in Probabilistic Models and Stochastic Simulators
  Andreas Munk · Alexander Mead · Frank Wood
- 2021 Poster: Tractable structured natural-gradient descent using local parameterizations
  Wu Lin · Frank Nielsen · Khan Emtiyaz · Mark Schmidt
- 2021 Spotlight: Tractable structured natural-gradient descent using local parameterizations
  Wu Lin · Frank Nielsen · Khan Emtiyaz · Mark Schmidt
- 2020 Poster: Handling the Positive-Definite Constraint in the Bayesian Learning Rule
  Wu Lin · Mark Schmidt · Mohammad Emtiyaz Khan
- 2020 Poster: All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference
  Rob Brekelmans · Vaden Masrani · Frank Wood · Greg Ver Steeg · Aram Galstyan
- 2019 Poster: Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
  Wu Lin · Mohammad Emtiyaz Khan · Mark Schmidt
- 2019 Poster: Amortized Monte Carlo Integration
  Adam Golinski · Frank Wood · Tom Rainforth
- 2019 Oral: Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
  Wu Lin · Mohammad Emtiyaz Khan · Mark Schmidt
- 2019 Oral: Amortized Monte Carlo Integration
  Adam Golinski · Frank Wood · Tom Rainforth
- 2018 Poster: Deep Variational Reinforcement Learning for POMDPs
  Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson
- 2018 Oral: Deep Variational Reinforcement Learning for POMDPs
  Maximilian Igl · Luisa Zintgraf · Tuan Anh Le · Frank Wood · Shimon Whiteson
- 2017 Poster: Model-Independent Online Learning for Influence Maximization
  Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S. Lakshmanan · Mark Schmidt
- 2017 Talk: Model-Independent Online Learning for Influence Maximization
  Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S. Lakshmanan · Mark Schmidt