Addressing Function Approximation Error in Actor-Critic Methods

Abstract:
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
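As a reading aid only, and not the authors' released code, the sketch below illustrates the two mechanisms the abstract names: a learning target built from the minimum over a pair of critics (clipped Double Q-learning) and delayed policy updates. All names here (clipped_double_q_target, pi, q1, q2, policy_delay) are hypothetical stand-ins.

    import numpy as np

    # Hedged sketch, not the published implementation: bootstrap from the
    # minimum of two critic estimates to limit overestimation.
    def clipped_double_q_target(reward, next_state, done, q1, q2, pi, gamma=0.99):
        """y = r + gamma * min(Q1(s', pi(s')), Q2(s', pi(s'))), zeroed at terminal states."""
        a_next = pi(next_state)                      # action from the (target) policy
        min_q = np.minimum(q1(next_state, a_next),   # taking the minimum of the pair
                           q2(next_state, a_next))   # of critics limits overestimation
        return reward + gamma * (1.0 - done) * min_q

    # Toy usage with stand-in critics and policy (purely illustrative):
    pi = lambda s: -0.1 * s
    q1 = lambda s, a: s + a
    q2 = lambda s, a: s + a + 0.05                   # biased upward; the min discards it
    y = clipped_double_q_target(1.0, np.array([0.5]), 0.0, q1, q2, pi)

    # Delayed policy updates (second mechanism): update the actor and the
    # target networks only once every `policy_delay` critic updates,
    # e.g. policy_delay = 2, to reduce per-update error.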
Author Information
Scott Fujimoto (McGill University)
Herke van Hoof (McGill University)
David Meger (McGill University)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Addressing Function Approximation Error in Actor-Critic Methods
  Thu. Jul 12th 04:15 -- 07:00 PM, Room Hall B #86
More from the Same Authors
- 2022 Poster: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
  Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu
- 2022 Spotlight: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
  Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu
- 2021 Poster: A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
  Scott Fujimoto · David Meger · Doina Precup
- 2021 Spotlight: A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
  Scott Fujimoto · David Meger · Doina Precup
- 2019 Poster: Off-Policy Deep Reinforcement Learning without Exploration
  Scott Fujimoto · David Meger · Doina Precup
- 2019 Poster: GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects
  Edward Smith · Scott Fujimoto · Adriana Romero Soriano · David Meger
- 2019 Oral: GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects
  Edward Smith · Scott Fujimoto · Adriana Romero Soriano · David Meger
- 2019 Oral: Off-Policy Deep Reinforcement Learning without Exploration
  Scott Fujimoto · David Meger · Doina Precup
- 2018 Poster: An Inference-Based Policy Gradient Method for Learning Options
  Matthew Smith · Herke van Hoof · Joelle Pineau
- 2018 Oral: An Inference-Based Policy Gradient Method for Learning Options
  Matthew Smith · Herke van Hoof · Joelle Pineau