We propose a novel approach to two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and an expert policy under the true dynamics. This objective shows that optimizing the expected policy advantage in the learned model under an exploration distribution is sufficient for policy computation, yielding a significant gain in computational efficiency over traditional planning methods. In addition, the unified objective uses a value moment matching term for model fitting, which aligns model fitting with how the model is used during policy computation. We present two no-regret algorithms that optimize the proposed objective, and demonstrate their statistical and computational gains over existing MBRL methods on simulated benchmarks.
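As a rough guide to the structure the abstract describes, the sketch below recalls the classical performance difference lemma and a schematic "advantage in model plus value moment matching" bound in its spirit. The notation (\widehat{M} for the learned model, \nu for the exploration distribution, C for a distribution-shift constant) is our own illustrative shorthand for a discounted MDP, not the paper's exact statement.

% A minimal sketch, assuming a discounted MDP with true dynamics P,
% discount factor \gamma, expert policy \pi^e, and learned model \widehat{M}.
% The first identity is the classical performance difference lemma
% (Kakade & Langford, 2002); the second display is only a schematic of
% the decomposition the abstract describes, with illustrative constants.

% Performance difference lemma under the true dynamics:
\[
  J(\pi^e) - J(\pi)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi^e}}
    \Big[ \mathbb{E}_{a \sim \pi^e(\cdot \mid s)}\, A^{\pi}(s,a) \Big],
\]
% where d^{\pi^e} is the discounted state visitation distribution of \pi^e
% and A^{\pi} is the advantage function of \pi under the true dynamics P.

% Schematic decomposition: the true performance gap is controlled by
% (i) the expected advantage measured in the learned model \widehat{M}
% under an exploration distribution \nu (the policy computation term),
% plus (ii) a value moment matching error of \widehat{M} against P
% (the model fitting term):
\[
  J(\pi^e) - J(\pi)
  \;\lesssim\;
  C \,\mathbb{E}_{(s,a) \sim \nu}
    \Big[ A^{\pi}_{\widehat{M}}(s,a) \Big]
  \;+\;
  C \,\mathbb{E}_{(s,a) \sim \nu}
    \Big[ \big| \mathbb{E}_{s' \sim \widehat{M}(\cdot \mid s,a)} V^{\pi}(s')
              - \mathbb{E}_{s' \sim P(\cdot \mid s,a)} V^{\pi}(s') \big| \Big].
\]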
Author Information
Anirudh Vemula (Aurora Innovation)
Yuda Song (Carnegie Mellon University)
Aarti Singh (Carnegie Mellon University)
J. Bagnell (Aurora Innovation)
Sanjiban Choudhury (Cornell University)
More from the Same Authors
- 2020 : Contributed Talk: Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment
  Ivan Stelmakh · Nihar Shah · Aarti Singh
- 2021 : Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap
  Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu
- 2022 : Threshold Bandit Problem with Link Assumption between Pulls and Duels
  Keshav Narayan · Aarti Singh
- 2023 : Complementing a Policy with a Different Observation Space
  Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu
- 2023 : Learning Shared Safety Constraints from Multi-task Demonstrations
  Konwoo Kim · Gokul Swamy · Zuxin Liu · Ding Zhao · Sanjiban Choudhury · Steven Wu
- 2023 Poster: Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
  Dhruv Malik · Conor Igoe · Yuanzhi Li · Aarti Singh
- 2023 Poster: Inverse Reinforcement Learning without Reinforcement Learning
  Gokul Swamy · David Wu · Sanjiban Choudhury · J. Bagnell · Steven Wu
- 2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach
  Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun
- 2022 Poster: Causal Imitation Learning under Temporally Correlated Noise
  Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu
- 2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach
  Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun
- 2022 Oral: Causal Imitation Learning under Temporally Correlated Noise
  Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu
- 2021 Poster: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap
  Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu
- 2021 Spotlight: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap
  Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu
- 2021 Poster: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
  Yuda Song · Wen Sun
- 2021 Spotlight: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
  Yuda Song · Wen Sun
- 2018 Poster: Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
  Yichong Xu · Hariank Muthakana · Sivaraman Balakrishnan · Aarti Singh · Artur Dubrawski
- 2018 Oral: Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
  Yichong Xu · Hariank Muthakana · Sivaraman Balakrishnan · Aarti Singh · Artur Dubrawski
- 2017 Poster: Near-Optimal Design of Experiments via Regret Minimization
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Talk: Near-Optimal Design of Experiments via Regret Minimization
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Poster: Uncorrelation and Evenness: a New Diversity-Promoting Regularizer
  Pengtao Xie · Aarti Singh · Eric Xing
- 2017 Talk: Uncorrelation and Evenness: a New Diversity-Promoting Regularizer
  Pengtao Xie · Aarti Singh · Eric Xing