Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method’s superior performance on a variety of autonomous driving tasks in simulation.
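As a rough illustration of the idea summarized in the abstract (not the authors' actual implementation), the sketch below separates a policy model from a world model and scores each candidate action sequence under several sampled futures, keeping the worst case. The `policy_model` and `world_model` interfaces are hypothetical stand-ins introduced only for this example.

```python
import numpy as np

# Hypothetical stand-ins for the two separately trained models described in
# the abstract: a policy proposal model and a stochastic world model.
# Their interfaces (sample_actions, step) are assumptions for illustration.

def propose_action_sequences(policy_model, state, num_candidates, horizon):
    """Sample candidate action sequences from the policy model."""
    return [policy_model.sample_actions(state, horizon) for _ in range(num_candidates)]

def rollout_return(world_model, state, actions):
    """Roll out one possible future under the world model and sum rewards."""
    total_reward = 0.0
    for action in actions:
        state, reward = world_model.step(state, action)  # stochastic transition
        total_reward += reward
    return total_reward

def robust_plan(policy_model, world_model, state,
                num_candidates=16, num_futures=8, horizon=10):
    """Pick the candidate whose worst sampled future yields the highest return.

    Evaluating each candidate against multiple sampled futures, rather than a
    single optimistic rollout, is the pessimistic test-time search the
    abstract alludes to.
    """
    candidates = propose_action_sequences(policy_model, state, num_candidates, horizon)
    best_actions, best_score = None, -np.inf
    for actions in candidates:
        returns = [rollout_return(world_model, state, actions) for _ in range(num_futures)]
        score = min(returns)  # worst case over sampled futures
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions
```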
Author Information
Adam Villaflor (Carnegie Mellon University)
Zhe Huang (Carnegie Mellon University)
Swapnil Pande (Carnegie Mellon University)
John Dolan (Carnegie Mellon University)
Jeff Schneider (CMU/Uber)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning (Thu. Jul 21st through Fri the 22nd, Room Hall E #903)
More from the Same Authors
- 2022: Paper 5: Safe Reinforcement Learning with Probabilistic Control Barrier Functions for Ramp Merging (Soumith Udatha · John Dolan)
- 2022: Q/A: Jeff Schneider (Jeff Schneider)
- 2022: Invited Speaker: Jeff Schneider (Jeff Schneider)
- 2022: Jeff Schneider (Jeff Schneider)
- 2017 Poster: Equivariance Through Parameter-Sharing (Siamak Ravanbakhsh · Jeff Schneider · Barnabás Póczos)
- 2017 Talk: Equivariance Through Parameter-Sharing (Siamak Ravanbakhsh · Jeff Schneider · Barnabás Póczos)