Timezone: »
Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety or resources). In standard RL, the agent is incentivized to explore any behavior as long as it maximizes rewards, but in the real world, undesired behavior can damage either the system or the agent in a way that breaks the learning process itself. In this work, we model the problem of learning with constraints as a Constrained Markov Decision Process and provide a new on-policy formulation for solving it. A key contribution of our approach is to translate cumulative cost constraints into state-based constraints. Through this, we define a safe policy improvement method which maximizes returns while ensuring that the constraints are satisfied at every step. We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training. We also highlight the computational advantages of this approach. The effectiveness of our approach is demonstrated on safe navigation tasks and in safety-constrained versions of MuJoCo environments, with deep neural networks.
Author Information
Harsh Satija (McGill University)
Philip Amortila (McGill University)
Joelle Pineau (McGill University / Facebook)
More from the Same Authors
-
2023 : Fostering Women's Leadership in the Realm of Emerging Trends and Technologies »
Joelle Pineau · Rihab Gorsane · Pascale FUNG -
2023 : Joelle Pineau - A culture of open and reproducible research, in the era of large AI generative models »
Joelle Pineau -
2023 Panel: The Societal Impacts of AI »
Sanmi Koyejo · Samy Bengio · Ashia Wilson · Kirikowhai Mikaere · Joelle Pineau -
2022 Workshop: Responsible Decision Making in Dynamic Environments »
Virginie Do · Thorsten Joachims · Alessandro Lazaric · Joelle Pineau · Matteo Pirotta · Harsh Satija · Nicolas Usunier -
2021 Workshop: ICML 2021 Workshop on Unsupervised Reinforcement Learning »
Feryal Behbahani · Joelle Pineau · Lerrel Pinto · Roberta Raileanu · Aravind Srinivas · Denis Yarats · Amy Zhang -
2021 Poster: Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards »
Susan Amin · Maziar Gomrokchi · Hossein Aboutalebi · Harsh Satija · Doina Precup -
2021 Poster: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation »
Jongmin Lee · Wonseok Jeon · Byung-Jun Lee · Joelle Pineau · Kee-Eung Kim -
2021 Spotlight: Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards »
Susan Amin · Maziar Gomrokchi · Hossein Aboutalebi · Harsh Satija · Doina Precup -
2021 Spotlight: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation »
Jongmin Lee · Wonseok Jeon · Byung-Jun Lee · Joelle Pineau · Kee-Eung Kim -
2020 Workshop: MLRetrospectives: A Venue for Self-Reflection in ML Research »
Jessica Forde · Jesse Dodge · Mayoore Jaiswal · Rosanne Liu · Ryan Lowe · Rosanne Liu · Joelle Pineau · Yoshua Bengio -
2020 Poster: Online Learned Continual Compression with Adaptive Quantization Modules »
Lucas Caccia · Eugene Belilovsky · Massimo Caccia · Joelle Pineau -
2020 Poster: Interference and Generalization in Temporal Difference Learning »
Emmanuel Bengio · Joelle Pineau · Doina Precup -
2020 Poster: Invariant Causal Prediction for Block MDPs »
Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup -
2019 Workshop: Generative Modeling and Model-Based Reasoning for Robotics and AI »
Aravind Rajeswaran · Emanuel Todorov · Igor Mordatch · William Agnew · Amy Zhang · Joelle Pineau · Michael Chang · Dumitru Erhan · Sergey Levine · Kimberly Stachenfeld · Marvin Zhang -
2019 Poster: Separable value functions across time-scales »
Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill -
2019 Oral: Separable value functions across time-scales »
Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill -
2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2018 Poster: An Inference-Based Policy Gradient Method for Learning Options »
Matthew Smith · Herke van Hoof · Joelle Pineau -
2018 Oral: An Inference-Based Policy Gradient Method for Learning Options »
Matthew Smith · Herke van Hoof · Joelle Pineau -
2017 Workshop: Reproducibility in Machine Learning Research »
Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio -
2017 : Lifelong Learning - Panel Discussion »
Sergey Levine · Joelle Pineau · Balaraman Ravindran · Andrei A Rusu -
2017 : Joelle Pineau: A few modest insights from my lifelong learning »
Joelle Pineau