Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data. We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks.
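The batch-constrained idea can be illustrated with a minimal tabular sketch. This is not the deep continuous-control algorithm the abstract refers to; it only shows one way to restrict the bootstrap maximum to actions that actually appear in the fixed batch, so no value is extrapolated for unseen state-action pairs. The function names, the `(state, action, reward, next_state, done)` tuple layout, and the hyperparameters are assumptions made for illustration.

```python
from collections import defaultdict


def batch_constrained_q_learning(batch, gamma=0.99, lr=0.5, sweeps=200):
    """Tabular Q-learning whose bootstrap max ranges only over actions seen in the batch.

    batch: list of (state, action, reward, next_state, done) tuples
           (a hypothetical layout chosen for this sketch).
    """
    # Actions observed in each state; these define the restricted action set.
    seen = defaultdict(set)
    for s, a, _, _, _ in batch:
        seen[s].add(a)

    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2, done in batch:
            # Bootstrap only from (s2, a2) pairs present in the data. If no action
            # was ever recorded in s2, its value is treated as 0 (an assumption).
            if done or not seen[s2]:
                target = r
            else:
                target = r + gamma * max(q[(s2, a2)] for a2 in seen[s2])
            q[(s, a)] += lr * (target - q[(s, a)])
    return q, seen


def greedy_action(q, seen, state):
    """Pick the best action among those the batch contains for this state."""
    if not seen[state]:
        return None  # no data for this state, so the sketch declines to act
    return max(seen[state], key=lambda a: q[(state, a)])


# Tiny fixed batch: moving "right" from state 0 reaches state 1, then a rewarded terminal step.
batch = [
    (0, "right", 0.0, 1, False),
    (1, "right", 1.0, 2, True),
    (0, "left", 0.0, 0, False),
]
q, seen = batch_constrained_q_learning(batch, gamma=0.9)
best = greedy_action(q, seen, 0)
```

In this toy batch the greedy action in state 0 is chosen only among "left" and "right", the two actions recorded there; an action never taken in the data is never evaluated.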
Author Information
Scott Fujimoto (McGill University)
David Meger (McGill University)
Doina Precup (McGill University / DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Off-Policy Deep Reinforcement Learning without Exploration »
Wed. Jun 12th 01:30 -- 04:00 AM Room Pacific Ballroom #38
More from the Same Authors
-
2021 : Randomized Least Squares Policy Optimization »
Haque Ishfaq · Zhuoran Yang · Andrei Lupu · Viet Nguyen · Lewis Liu · Riashat Islam · Zhaoran Wang · Doina Precup -
2021 : Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case »
Gandharv Patil · Prashanth L.A. · Doina Precup -
2023 : On learning history-based policies for controlling Markov decision processes »
Gandharv Patil · Aditya Mahajan · Doina Precup -
2023 : An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets »
Nikhil Murali Vemgal · Elaine Lau · Doina Precup -
2023 : Accelerating exploration and representation learning with offline pre-training »
Bogdan Mazoure · Jake Bruce · Doina Precup · Rob Fergus · Ankit Anand -
2023 Poster: Multi-Environment Pretraining Enables Transfer to Action Limited Datasets »
David Venuto · Mengjiao Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum -
2022 Workshop: Decision Awareness in Reinforcement Learning »
Evgenii Nikishin · Pierluca D'Oro · Doina Precup · Andre Barreto · Amir-massoud Farahmand · Pierre-Luc Bacon -
2022 Poster: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error »
Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu -
2022 Spotlight: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error »
Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu -
2022 Poster: Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification »
Leo Schwinn · Leon Bungert · An Nguyen · René Raab · Falk Pulsmeyer · Doina Precup · Bjoern Eskofier · Dario Zanca -
2022 Spotlight: Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification »
Leo Schwinn · Leon Bungert · An Nguyen · René Raab · Falk Pulsmeyer · Doina Precup · Bjoern Eskofier · Dario Zanca -
2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Poster: Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards »
Susan Amin · Maziar Gomrokchi · Hossein Aboutalebi · Harsh Satija · Doina Precup -
2021 Poster: A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation »
Scott Fujimoto · David Meger · Doina Precup -
2021 Spotlight: A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation »
Scott Fujimoto · David Meger · Doina Precup -
2021 Spotlight: Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards »
Susan Amin · Maziar Gomrokchi · Hossein Aboutalebi · Harsh Satija · Doina Precup -
2021 Poster: Preferential Temporal Difference Learning »
Nishanth Anand · Doina Precup -
2021 Spotlight: Preferential Temporal Difference Learning »
Nishanth Anand · Doina Precup -
2020 : Panel Discussion »
Eric Eaton · Martha White · Doina Precup · Irina Rish · Harm van Seijen -
2020 Workshop: 4th Lifelong Learning Workshop »
Shagun Sodhani · Sarath Chandar · Balaraman Ravindran · Doina Precup -
2020 Poster: Interference and Generalization in Temporal Difference Learning »
Emmanuel Bengio · Joelle Pineau · Doina Precup -
2020 Poster: Invariant Causal Prediction for Block MDPs »
Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup -
2020 : Mentoring Panel: Doina Precup, Deborah Raji, Anima Anandkumar, Angjoo Kanazawa and Sinead Williamson (moderator). »
Doina Precup · Inioluwa Raji · Angjoo Kanazawa · Sinead A Williamson · Animashree Anandkumar -
2020 : Invited Talk: Doina Precup on Building Knowledge for AI Agents with Reinforcement Learning »
Doina Precup -
2019 Workshop: Workshop on Multi-Task and Lifelong Reinforcement Learning »
Sarath Chandar · Shagun Sodhani · Khimya Khetarpal · Tom Zahavy · Daniel J. Mankowitz · Shie Mannor · Balaraman Ravindran · Doina Precup · Chelsea Finn · Abhishek Gupta · Amy Zhang · Kyunghyun Cho · Andrei A Rusu · Rob Fergus -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Poster: GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects »
Edward Smith · Scott Fujimoto · Adriana Romero Soriano · David Meger -
2019 Oral: GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects »
Edward Smith · Scott Fujimoto · Adriana Romero Soriano · David Meger -
2018 Poster: Addressing Function Approximation Error in Actor-Critic Methods »
Scott Fujimoto · Herke van Hoof · David Meger -
2018 Poster: Convergent Tree Backup and Retrace with Function Approximation »
Ahmed Touati · Pierre-Luc Bacon · Doina Precup · Pascal Vincent -
2018 Oral: Addressing Function Approximation Error in Actor-Critic Methods »
Scott Fujimoto · Herke van Hoof · David Meger -
2018 Oral: Convergent Tree Backup and Retrace with Function Approximation »
Ahmed Touati · Pierre-Luc Bacon · Doina Precup · Pascal Vincent -
2017 Workshop: Reinforcement Learning Workshop »
Doina Precup · Balaraman Ravindran · Pierre-Luc Bacon