Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot be applied directly to physical systems, especially when there are hard constraints to satisfy (e.g., on safety or resources). In standard RL, the agent is incentivized to explore any behavior as long as it maximizes rewards; in the real world, however, undesired behavior can damage either the system or the agent in a way that breaks the learning process itself. In this work, we model the problem of learning with constraints as a Constrained Markov Decision Process (CMDP) and provide a new on-policy formulation for solving it. A key contribution of our approach is to translate cumulative cost constraints into state-based constraints. Through this, we define a safe policy improvement method that maximizes returns while ensuring the constraints are satisfied at every step. We provide theoretical guarantees under which the agent converges while remaining safe throughout training. We also highlight the computational advantages of this approach. Its effectiveness is demonstrated on safe navigation tasks and on safety-constrained versions of MuJoCo environments, using deep neural networks.
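For reference, the CMDP objective that the abstract builds on is standard in the constrained-RL literature. A minimal statement of it, with notation assumed here rather than taken from the paper (r is the reward, c the constraint cost, \gamma the discount factor, and d the cumulative cost budget):

\[
% Standard discounted CMDP objective; notation is assumed, not taken from the paper.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(s_t, a_t) \right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, c(s_t, a_t) \right] \le d.
\]

The contribution described above is to convert this single trajectory-level budget d into constraints that can be checked at every state, so that each policy-improvement step can be verified for safety rather than only the cumulative behavior over a trajectory.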
Author Information
Harsh Satija (McGill University)
Philip Amortila (McGill University)
Joelle Pineau (McGill University / Facebook)
More from the Same Authors
- 2020 Workshop: MLRetrospectives: A Venue for Self-Reflection in ML Research »
  Jessica Forde · Jesse Dodge · Mayoore Jaiswal · Ryan Lowe · Rosanne Liu · Joelle Pineau · Yoshua Bengio
- 2020 Poster: Online Learned Continual Compression with Adaptive Quantization Modules »
  Lucas Caccia · Eugene Belilovsky · Massimo Caccia · Joelle Pineau
- 2020 Poster: Interference and Generalization in Temporal Difference Learning »
  Emmanuel Bengio · Joelle Pineau · Doina Precup
- 2020 Poster: Invariant Causal Prediction for Block MDPs »
  Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup
- 2019 Workshop: Generative Modeling and Model-Based Reasoning for Robotics and AI »
  Aravind Rajeswaran · Emanuel Todorov · Igor Mordatch · William Agnew · Amy Zhang · Joelle Pineau · Michael Chang · Dumitru Erhan · Sergey Levine · Kimberly Stachenfeld · Marvin Zhang
- 2019 Poster: Separable value functions across time-scales »
  Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
- 2019 Oral: Separable value functions across time-scales »
  Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
- 2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
  Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal
- 2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
  Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal
- 2018 Poster: An Inference-Based Policy Gradient Method for Learning Options »
  Matthew Smith · Herke van Hoof · Joelle Pineau
- 2018 Oral: An Inference-Based Policy Gradient Method for Learning Options »
  Matthew Smith · Herke van Hoof · Joelle Pineau
- 2017 Workshop: Reproducibility in Machine Learning Research »
  Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio