
On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du

Thu Jul 27 04:30 PM -- 06:00 PM (PDT) @ Exhibit Hall 1 #430

Generalization in Reinforcement Learning (RL) aims to train an agent on training environments so that it generalizes to a target environment. In this work, we first point out that RL generalization is fundamentally different from generalization in supervised learning, and that fine-tuning on the target environment is necessary for good test performance. Therefore, we seek to answer the following question: how much can we expect pre-training over training environments to help efficient and effective fine-tuning? On one hand, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, we show that pre-training can indeed be helpful in the non-asymptotic regime, by designing a policy collection-elimination (PCE) algorithm and proving a distribution-dependent regret bound that is independent of the size of the state-action space. We hope our theoretical results can provide insight toward understanding pre-training and generalization in RL.

Author Information

Haotian Ye (Peking University, Stanford University)

I am an incoming CS Ph.D. student at Stanford University. Previously, I majored in Data Science (Math + Computer Science) at Yuanpei College, Peking University, where I was fortunate to be advised by Professor Liwei Wang in the School of Electronics Engineering and Computer Science. I am interested in making machine learning tools more powerful and interpretable in practice through theory, algorithm design, and better implementation, especially for scientific problems. If you are interested in working together, please feel free to contact me!

Xiaoyu Chen (Peking University)
Liwei Wang (Peking University)
Simon Du (University of Washington)
