We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphology-mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective in all three problem settings and significantly outperforms the prior state of the art.
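To make the notion of state-occupancy matching concrete, the following is a minimal sketch of the quantity being matched in a tabular MDP. It is not the SMODICE algorithm. It only computes the discounted state occupancy of a policy in closed form and measures its mismatch to an expert's occupancy. The toy 3-state chain, the function names `state_occupancy` and `kl`, and the choice of KL divergence as the mismatch measure are all assumptions made for illustration.

```python
import numpy as np

def state_occupancy(P, pi, mu0, gamma):
    """Discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s).

    P:   transitions, shape (S, A, S), with P[s, a, t] = Pr(t | s, a)
    pi:  policy, shape (S, A), with pi[s, a] = Pr(a | s)
    mu0: initial state distribution, shape (S,)
    """
    # State-to-state transition matrix under pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    S = P_pi.shape[0]
    # Solve the linear system (I - gamma * P_pi^T) d = (1 - gamma) * mu0.
    return np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1.0 - gamma) * mu0)

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, clipped to avoid log(0)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical toy MDP: a 3-state chain, action 0 moves left, action 1 moves right.
S, A, gamma = 3, 2, 0.9
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, S - 1)] = 1.0
mu0 = np.array([1.0, 0.0, 0.0])

expert = np.tile([0.0, 1.0], (S, 1))   # always move right
learner = np.full((S, A), 0.5)         # uniformly random actions

d_expert = state_occupancy(P, expert, mu0, gamma)
d_learner = state_occupancy(P, learner, mu0, gamma)
mismatch = kl(d_learner, d_expert)
```

Note that only states enter the comparison: `d_expert` could equally be estimated from expert observations without actions, which is what makes a state-occupancy objective applicable when expert actions are unavailable.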
Author Information
Yecheng Jason Ma (University of Pennsylvania)
Andrew Shen (University of Melbourne)
Dinesh Jayaraman (University of Pennsylvania)
Osbert Bastani (University of Pennsylvania)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching »
Thu. Jul 21st 06:00 -- 06:05 PM Room 301 - 303
More from the Same Authors
-
2021 : Robust Generalization of Quadratic Neural Networks via Function Identification »
Kan Xu · Hamsa Bastani · Osbert Bastani -
2021 : Mind the Gap: Safely Bridging Offline and Online Reinforcement Learning »
Wanqiao Xu · Kan Xu · Hamsa Bastani · Osbert Bastani -
2021 : Improving Human Decision-Making with Machine Learning »
Hamsa Bastani · Osbert Bastani · Wichinpong Sinchaisri -
2023 : TRAC: Trustworthy Retrieval Augmented Chatbot »
Shuo Li · Sangdon Park · Insup Lee · Osbert Bastani -
2023 Poster: PAC Prediction Sets for Large Language Models of Code »
Adam Khakhar · Stephen Mell · Osbert Bastani -
2023 Poster: LIV: Language-Image Representations and Rewards for Robotic Control »
Yecheng Jason Ma · Vikash Kumar · Amy Zhang · Osbert Bastani · Dinesh Jayaraman -
2023 Poster: Robust Subtask Learning for Compositional Generalization »
Kishor Jothimurugan · Steve Hsu · Osbert Bastani · Rajeev Alur -
2022 : Spotlight Presentations »
Adrian Weller · Osbert Bastani · Jake Snell · Tal Schuster · Stephen Bates · Zhendong Wang · Margaux Zaffran · Danielle Rasooly · Varun Babbar -
2022 Poster: Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming »
Chuan Wen · Jianing Qian · Jierui Lin · Jiaye Teng · Dinesh Jayaraman · Yang Gao -
2022 Spotlight: Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming »
Chuan Wen · Jianing Qian · Jierui Lin · Jiaye Teng · Dinesh Jayaraman · Yang Gao -
2022 Poster: Understanding Robust Generalization in Learning Regular Languages »
Soham Dan · Osbert Bastani · Dan Roth -
2022 Spotlight: Understanding Robust Generalization in Learning Regular Languages »
Soham Dan · Osbert Bastani · Dan Roth -
2022 Poster: Sequential Covariate Shift Detection Using Classifier Two-Sample Tests »
Sooyong Jang · Sangdon Park · Insup Lee · Osbert Bastani -
2022 Spotlight: Sequential Covariate Shift Detection Using Classifier Two-Sample Tests »
Sooyong Jang · Sangdon Park · Insup Lee · Osbert Bastani -
2021 Poster: Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings »
Kan Xu · Xuanyi Zhao · Hamsa Bastani · Osbert Bastani -
2021 Poster: State Relevance for Off-Policy Evaluation »
Simon Shen · Yecheng Jason Ma · Omer Gottesman · Finale Doshi-Velez -
2021 Spotlight: Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings »
Kan Xu · Xuanyi Zhao · Hamsa Bastani · Osbert Bastani -
2021 Spotlight: State Relevance for Off-Policy Evaluation »
Simon Shen · Yecheng Jason Ma · Omer Gottesman · Finale Doshi-Velez -
2021 Poster: Keyframe-Focused Visual Imitation Learning »
Chuan Wen · Jierui Lin · Jianing Qian · Yang Gao · Dinesh Jayaraman -
2021 Spotlight: Keyframe-Focused Visual Imitation Learning »
Chuan Wen · Jierui Lin · Jianing Qian · Yang Gao · Dinesh Jayaraman -
2020 Poster: Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings »
Jesse Zhang · Brian Cheung · Chelsea Finn · Sergey Levine · Dinesh Jayaraman -
2020 Poster: Robust and Stable Black Box Explanations »
Hima Lakkaraju · Nino Arsov · Osbert Bastani -
2020 Poster: Generating Programmatic Referring Expressions via Program Synthesis »
Jiani Huang · Calvin Smith · Osbert Bastani · Rishabh Singh · Aws Albarghouthi · Mayur Naik -
2019 Poster: Learning Neurosymbolic Generative Models via Program Synthesis »
Halley R Young · Osbert Bastani · Mayur Naik -
2019 Oral: Learning Neurosymbolic Generative Models via Program Synthesis »
Halley R Young · Osbert Bastani · Mayur Naik