Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations. We empirically observe that an agent trained on many variations (a generalist) tends to learn faster at the beginning, yet its performance plateaus at a suboptimal level for a long time. In contrast, an agent trained on only a few variations (a specialist) can often achieve high returns under a limited computational budget. To get the best of both worlds, we propose a novel generalist-specialist training framework. Specifically, we first train a generalist on all environment variations; when it fails to improve, we launch a large population of specialists with weights cloned from the generalist, each trained to master a selected small subset of variations. We finally resume the training of the generalist with auxiliary rewards induced by demonstrations from all specialists. In particular, we investigate when to start specialist training and compare strategies for training generalists with assistance from specialists. We show that this framework pushes the envelope of policy learning on several challenging and popular benchmarks including Procgen, Meta-World and ManiSkill.
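Below is a minimal Python sketch of the three-phase training loop described in the abstract (generalist until plateau, cloned specialists on variation subsets, generalist fine-tuning with specialist demonstrations). All names here (Policy, train, collect_demos, gsl, plateau_patience) are hypothetical placeholders for illustration, not the authors' actual implementation or API.

```python
# Hypothetical sketch of generalist-specialist learning (GSL); the real
# system would plug in an actual RL algorithm (e.g., PPO) and environments.
import copy
import random

class Policy:
    """Placeholder policy standing in for any deep RL policy network."""
    def __init__(self):
        self.weights = [random.random() for _ in range(4)]

    def clone(self):
        # Specialists start from the generalist's weights.
        return copy.deepcopy(self)

def train(policy, variations, demos=None, steps=1000):
    """Stub for a policy-optimization call; returns mean return achieved.
    When `demos` is given, training would add auxiliary rewards induced
    by the demonstration trajectories."""
    return random.random()  # placeholder for actual RL training

def collect_demos(policy, variations):
    """Stub: roll out a trained policy to collect demonstrations."""
    return [("trajectory", v) for v in variations]

def gsl(all_variations, num_specialists=4, plateau_patience=3):
    # Phase 1: train a generalist on all variations until returns plateau.
    generalist = Policy()
    best, stale = float("-inf"), 0
    while stale < plateau_patience:
        ret = train(generalist, all_variations)
        best, stale = (ret, 0) if ret > best else (best, stale + 1)

    # Phase 2: clone specialists from the generalist; each masters a
    # small subset of the variations.
    subsets = [all_variations[i::num_specialists] for i in range(num_specialists)]
    specialists = [generalist.clone() for _ in subsets]
    for spec, subset in zip(specialists, subsets):
        train(spec, subset)

    # Phase 3: resume generalist training with auxiliary rewards induced
    # by demonstrations from all specialists.
    demos = [d for spec, subset in zip(specialists, subsets)
             for d in collect_demos(spec, subset)]
    train(generalist, all_variations, demos=demos)
    return generalist

if __name__ == "__main__":
    gsl(all_variations=list(range(16)))
```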
Author Information
Zhiwei Jia (University of California, San Diego)
Xuanlin Li (University of California, San Diego)
Zhan Ling (University of California, San Diego)
Shuang Liu (University of California, San Diego)
Yiran Wu (University of California, San Diego)
Hao Su (University of California, San Diego)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Improving Policy Optimization with Generalist-Specialist Learning
  Wed. Jul 20th 03:55 -- 04:00 PM, Room 307
More from the Same Authors
- 2021: Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
  Nicklas Hansen · Hao Su · Xiaolong Wang
- 2023: Situated Interaction with Real-Time State Conditioning of Language Models
  Sunny Panchal · Guillaume Berger · Antoine Mercier · Cornelius Böhm · Florian Dietrichkeit · Xuanlin Li · Reza Pourreza · Pulkit Madan · Apratim Bhattacharyya · Mingu Lee · Mark Todorovich · Ingo Bax · Roland Memisevic
- 2023 Poster: Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization
  Stone Tao · Xiaochen Li · Tongzhou Mu · Zhiao Huang · Yuzhe Qin · Hao Su
- 2023 Poster: Reparameterized Policy Learning for Multimodal Trajectory Optimization
  Zhiao Huang · Litian Liang · Zhan Ling · Xuanlin Li · Chuang Gan · Hao Su
- 2023 Oral: Reparameterized Policy Learning for Multimodal Trajectory Optimization
  Zhiao Huang · Litian Liang · Zhan Ling · Xuanlin Li · Chuang Gan · Hao Su
- 2023 Poster: On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
  Nicklas Hansen · Zhecheng Yuan · Yanjie Ze · Tongzhou Mu · Aravind Rajeswaran · Hao Su · Huazhe Xu · Xiaolong Wang
- 2022 Poster: Temporal Difference Learning for Model Predictive Control
  Nicklas Hansen · Hao Su · Xiaolong Wang
- 2022 Spotlight: Temporal Difference Learning for Model Predictive Control
  Nicklas Hansen · Hao Su · Xiaolong Wang
- 2020 Poster: Information-Theoretic Local Minima Characterization and Regularization
  Zhiwei Jia · Hao Su