Model-based reinforcement learning approaches leverage a forward dynamics model to support planning and decision making, but they may fail catastrophically if the model is inaccurate. Although several existing methods are dedicated to combating model error, the potential of a single forward model remains limited. In this paper, we propose to additionally construct a backward dynamics model to reduce the reliance on the accuracy of forward model predictions. We develop a novel method, called Bidirectional Model-based Policy Optimization (BMPO), that uses both the forward model and the backward model to generate short branched rollouts for policy optimization. Furthermore, we theoretically derive a tighter bound on the return discrepancy, which shows the advantage of BMPO over its counterpart that uses only the forward model. Extensive experiments demonstrate that BMPO outperforms state-of-the-art model-based methods in terms of sample efficiency and asymptotic performance.
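To make the idea of bidirectional short rollouts concrete, the following is a minimal sketch and not the authors' implementation. It assumes a scalar state, and every name in it (`forward_model`, `backward_model`, `policy`, `backward_policy`, `bidirectional_rollout`) is hypothetical. The toy functions stand in for the learned models and policies; the only point illustrated is that a rollout branched from one start state can extend a few steps backward and a few steps forward, yielding one connected trajectory.

```python
def forward_model(state, action):
    # Hypothetical stand-in for a learned forward dynamics model:
    # predicts the next state from (state, action).
    return state + action


def backward_model(next_state, action):
    # Hypothetical stand-in for a learned backward dynamics model:
    # predicts the previous state from (next_state, action).
    return next_state - action


def policy(state):
    # Placeholder forward policy (assumption for illustration only).
    return -0.1 * state


def backward_policy(next_state):
    # Placeholder for choosing the action that led to next_state
    # (assumption for illustration only).
    return -0.1 * next_state


def bidirectional_rollout(start_state, k_backward, k_forward):
    """Generate a short rollout branching both ways from start_state.

    Returns a time-ordered list of (state, action, next_state) tuples.
    """
    backward_part = []
    s_next = start_state
    for _ in range(k_backward):
        a = backward_policy(s_next)
        s_prev = backward_model(s_next, a)
        backward_part.append((s_prev, a, s_next))
        s_next = s_prev
    # Backward steps were collected newest-first; put them in time order.
    backward_part.reverse()

    forward_part = []
    s = start_state
    for _ in range(k_forward):
        a = policy(s)
        s_new = forward_model(s, a)
        forward_part.append((s, a, s_new))
        s = s_new

    return backward_part + forward_part


if __name__ == "__main__":
    for transition in bidirectional_rollout(1.0, k_backward=2, k_forward=3):
        print(transition)
```

In this sketch each direction is only rolled out for a few steps, so neither model has to be trusted over a long horizon. How the generated transitions are used for policy optimization is not shown here.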
Author Information
Hang Lai (Shanghai Jiao Tong University)
Jian Shen (Shanghai Jiao Tong University)
Weinan Zhang (Shanghai Jiao Tong University)
Yong Yu (Shanghai Jiao Tong University)
More from the Same Authors
- 2023 Poster: GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
  Hanjing Wang · Man-Kit Sit · Congjie He · Ying Wen · Weinan Zhang · Jun Wang · Yaodong Yang · Luo Mai
- 2022 Poster: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
  Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
- 2022 Spotlight: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
  Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
- 2020 Poster: Multi-Agent Determinantal Q-Learning
  Yaodong Yang · Ying Wen · Jun Wang · Liheng Chen · Kun Shao · David Mguni · Weinan Zhang
- 2019 Poster: Lipschitz Generative Adversarial Nets
  Zhiming Zhou · Jiadong Liang · Yuxuan Song · Lantao Yu · Hongwei Wang · Weinan Zhang · Yong Yu · Zhihua Zhang
- 2019 Oral: Lipschitz Generative Adversarial Nets
  Zhiming Zhou · Jiadong Liang · Yuxuan Song · Lantao Yu · Hongwei Wang · Weinan Zhang · Yong Yu · Zhihua Zhang
- 2018 Poster: Path-Level Network Transformation for Efficient Architecture Search
  Han Cai · Jiacheng Yang · Weinan Zhang · Song Han · Yong Yu
- 2018 Poster: Mean Field Multi-Agent Reinforcement Learning
  Yaodong Yang · Rui Luo · Minne Li · Ming Zhou · Weinan Zhang · Jun Wang
- 2018 Oral: Mean Field Multi-Agent Reinforcement Learning
  Yaodong Yang · Rui Luo · Minne Li · Ming Zhou · Weinan Zhang · Jun Wang
- 2018 Oral: Path-Level Network Transformation for Efficient Architecture Search
  Han Cai · Jiacheng Yang · Weinan Zhang · Song Han · Yong Yu