Deep neural networks, coupled with fast simulation and improved computational speeds, have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and the real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in the real world, data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired by H-infinity control methods, we note that both modeling errors and differences between training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), in which we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced -- that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper, Walker2d and Ant) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary.
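The abstract describes training as a two-player zero-sum game: a protagonist maximizes the task reward, an adversary injecting bounded disturbance forces receives the negated reward, and the two are optimized in alternation. The sketch below illustrates that alternating minimax loop. The toy 1-D point-mass environment, the linear-Gaussian policies, and the plain REINFORCE updates are illustrative assumptions (the paper itself uses MuJoCo tasks and TRPO); only the zero-sum, alternating training structure mirrors RARL.

```python
# Minimal sketch of the RARL alternating minimax loop (toy setup, not the
# paper's: 1-D point mass instead of MuJoCo, REINFORCE instead of TRPO).
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # fixed exploration noise for both linear-Gaussian policies

def rollout(theta_p, theta_a, T=50):
    """Run one episode: both policies act each step; the adversary's
    bounded output enters the dynamics as an extra disturbance force."""
    x, v = 1.0, 0.0
    states, acts_p, acts_a, ret = [], [], [], 0.0
    for _ in range(T):
        s = np.array([x, v])
        u_p = theta_p @ s + SIGMA * rng.standard_normal()  # protagonist action
        u_a = theta_a @ s + SIGMA * rng.standard_normal()  # adversary action
        v += 0.1 * (u_p + 0.5 * np.clip(u_a, -1.0, 1.0))   # bounded disturbance
        x += 0.1 * v
        ret -= x * x                                       # reward: stay at origin
        states.append(s); acts_p.append(u_p); acts_a.append(u_a)
    return np.array(states), np.array(acts_p), np.array(acts_a), ret

def reinforce_step(theta, states, actions, ret, lr=0.01):
    """One REINFORCE ascent step for a linear-Gaussian policy;
    the step norm is clipped for numerical stability."""
    grad = ret * ((actions - states @ theta)[:, None] * states).sum(0) / SIGMA**2
    return theta + lr * grad / max(1.0, np.linalg.norm(grad))

theta_p, theta_a = np.zeros(2), np.zeros(2)  # protagonist / adversary parameters
for it in range(200):
    for _ in range(5):  # phase 1: adversary frozen, protagonist maximizes reward
        S, Ap, Aa, R = rollout(theta_p, theta_a)
        theta_p = reinforce_step(theta_p, S, Ap, R)
    for _ in range(5):  # phase 2: protagonist frozen, adversary maximizes -reward
        S, Ap, Aa, R = rollout(theta_p, theta_a)
        theta_a = reinforce_step(theta_a, S, Aa, -R)
print("protagonist params:", theta_p, "adversary params:", theta_a)
```

Because the adversary's force is clipped, its power is limited relative to the protagonist's, matching the paper's use of a bounded adversary so that a robust (rather than merely conservative) protagonist policy can still succeed.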
Author Information
Lerrel Pinto (Carnegie Mellon University)
James Davidson (Google Brain)
Rahul Sukthankar (Google Research)
Abhinav Gupta (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Poster: Robust Adversarial Reinforcement Learning
  Mon. Aug 7th 08:30 AM -- 12:00 PM, Room Gallery #4
More from the Same Authors
- 2020: Neural Dynamic Policies for End-to-End Sensorimotor Learning
  Abhinav Gupta
- 2021 Poster: PixelTransformer: Sample Conditioned Signal Generation
  Shubham Tulsiani · Abhinav Gupta
- 2021 Spotlight: PixelTransformer: Sample Conditioned Signal Generation
  Shubham Tulsiani · Abhinav Gupta
- 2020 Poster: Learning Robot Skills with Temporal Variational Inference
  Tanmay Shankar · Abhinav Gupta
- 2019 Poster: Learning Latent Dynamics for Planning from Pixels
  Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson
- 2019 Oral: Learning Latent Dynamics for Planning from Pixels
  Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson
- 2019 Poster: Self-Supervised Exploration via Disagreement
  Deepak Pathak · Dhiraj Gandhi · Abhinav Gupta
- 2019 Oral: Self-Supervised Exploration via Disagreement
  Deepak Pathak · Dhiraj Gandhi · Abhinav Gupta