Model-based reinforcement learning (RL) achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a ``global'' dynamics model that fits the state-action visitation distribution of all historical policies. However, in this paper, we find that learning a global dynamics model does not necessarily benefit prediction for the current policy, since the policy in use is constantly evolving. The evolving policy causes shifts in the state-action visitation distribution during training. We theoretically analyze how the distribution of historical policies affects model learning and model rollouts. We then propose a novel model-based RL method, named \textit{Policy-adaptation Model-based Actor-Critic (PMAC)}, which learns a policy-adapted dynamics model via a policy-adaptation mechanism. This mechanism dynamically adjusts the mixture distribution over historical policies so that the learned model continually adapts to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PMAC achieves state-of-the-art asymptotic performance and nearly twice the sample efficiency of prior model-based methods.
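The abstract describes the policy-adaptation mechanism only at a high level. The following Python sketch illustrates one plausible reading of the idea: tag each transition with the policy iteration that collected it, and sample training batches for the dynamics model with recency-biased weights. The class name `PolicyAdaptedBuffer` and the geometric `decay` schedule are assumptions for illustration, not the paper's actual algorithm.

```python
# Minimal sketch (not the authors' released code) of a policy-adapted
# replay buffer: transitions from recent policies are sampled more often,
# so the dynamics model's training distribution tracks the evolving policy.
import numpy as np

class PolicyAdaptedBuffer:
    """Replay buffer that tags each transition with the index of the
    policy that collected it and samples with recency-biased weights."""

    def __init__(self, decay=0.8):
        self.decay = decay        # geometric decay for older policies (assumed form)
        self.transitions = []     # list of (state, action, next_state) tuples
        self.policy_ids = []      # which policy iteration produced each transition
        self.current_policy = 0

    def add(self, state, action, next_state):
        self.transitions.append((state, action, next_state))
        self.policy_ids.append(self.current_policy)

    def new_policy(self):
        """Call after each policy update; shifts the mixture toward newer data."""
        self.current_policy += 1

    def sample(self, batch_size, rng=np.random):
        ages = self.current_policy - np.asarray(self.policy_ids)
        weights = self.decay ** ages           # newer policies get higher weight
        probs = weights / weights.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]
```

Under this reading, the dynamics model is trained on batches from `buffer.sample(batch_size)`, so data from policies close to the current one dominates model learning rather than the uniform mixture over all historical policies.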
Author Information
Xiyao Wang (University of Maryland, College Park)
Wichayaporn Wongkamjan (Department of Computer Science, University of Maryland, College Park)
Furong Huang (University of Maryland)
More from the Same Authors
- 2022: Everyone Matters: Customizing the Dynamics of Decision Boundary for Adversarial Robustness
  Yuancheng Xu · Yanchao Sun · Furong Huang
- 2022: Certifiably Robust Multi-Agent Reinforcement Learning against Adversarial Communication
  Yanchao Sun · Ruijie Zheng · Parisa Hassanzadeh · Yongyuan Liang · Soheil Feizi · Sumitra Ganesh · Furong Huang
- 2022: Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning
  Yongyuan Liang · Yanchao Sun · Ruijie Zheng · Furong Huang
- 2022 Poster: Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework
  Jiahao Su · Wonmin Byeon · Furong Huang
- 2022 Spotlight: Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework
  Jiahao Su · Wonmin Byeon · Furong Huang