
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
Xiyao Wang · Wichayaporn Wongkamjan · Furong Huang
Event URL: https://openreview.net/forum?id=2rlTW4fXFdf

Model-based reinforcement learning (RL) achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Prior work learns a "global" dynamics model to fit the state-action visitation distribution of all historical policies. In this paper, however, we find that learning a global dynamics model does not necessarily benefit model prediction for the current policy, since the policy in use is constantly evolving. As the policy evolves during training, the state-action visitation distribution shifts. We theoretically analyze how the distribution of historical policies affects model learning and model rollouts. We then propose a novel model-based RL method, named Policy-adaptation Model-based Actor-Critic (PMAC), which learns a policy-adapted dynamics model based on a policy-adaptation mechanism. This mechanism dynamically adjusts the historical policy mixture distribution so that the learned model can continually adapt to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PMAC achieves state-of-the-art asymptotic performance and nearly twice the sample efficiency of prior model-based methods.
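To make the policy-adaptation idea concrete, here is a minimal sketch of one way to bias dynamics-model training data toward the current policy: transitions in the replay buffer are tagged with the policy epoch that generated them, and sampling weights decay geometrically with age. The function name `policy_adapted_sample` and the `decay` schedule are illustrative assumptions, not PMAC's exact mechanism.

```python
import numpy as np

def policy_adapted_sample(buffer_epochs, batch_size, current_epoch,
                          decay=0.8, rng=None):
    """Sample replay-buffer indices, biased toward recent policies.

    buffer_epochs: array mapping each stored transition to the policy
                   epoch that collected it.
    decay: geometric down-weighting factor for older policies
           (an illustrative choice, not the paper's scheme).
    """
    rng = np.random.default_rng() if rng is None else rng
    ages = current_epoch - np.asarray(buffer_epochs)   # 0 = newest policy
    weights = decay ** ages                            # older data weighs less
    probs = weights / weights.sum()
    return rng.choice(len(buffer_epochs), size=batch_size, p=probs)

# Example: 500 transitions collected by policies from epochs 0..4;
# a dynamics-model batch drawn this way over-represents epoch 4.
epochs = np.repeat(np.arange(5), 100)
batch_idx = policy_adapted_sample(epochs, batch_size=64, current_epoch=4)
```

A fixed uniform sample over the buffer would instead fit the mixture of all historical policies, which is the "global" model the paper argues against.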

Author Information

Xiyao Wang (University of Maryland, College Park)
Wichayaporn Wongkamjan (Department of Computer Science, University of Maryland, College Park)
Furong Huang (University of Maryland)
