Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.
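To make the empowerment-driven objective concrete, the following is a minimal sketch (not the paper's implementation) of an information-gain intrinsic reward for task identification. It assumes a toy discrete task set and a hand-specified observation likelihood per task; the agent's task belief is updated by Bayes' rule, and the intrinsic reward is the KL divergence from prior to posterior belief, i.e., how much the new experience reduces uncertainty about the task. The function name and the toy likelihoods are illustrative, not from the paper.

```python
import numpy as np

def information_gain_reward(belief, likelihoods):
    """One Bayesian belief update over a discrete set of candidate tasks.

    belief:      prior P(task), shape (K,), sums to 1.
    likelihoods: P(observed transition | task), shape (K,).
    Returns (posterior belief, intrinsic reward = KL(posterior || prior)),
    so transitions that sharpen the task belief earn higher reward.
    """
    posterior = belief * likelihoods
    posterior = posterior / posterior.sum()
    log_post = np.log(np.clip(posterior, 1e-12, 1.0))
    log_prior = np.log(np.clip(belief, 1e-12, 1.0))
    r_int = float(np.sum(posterior * (log_post - log_prior)))
    return posterior, r_int

# Toy example: three candidate tasks, uniform prior.
prior = np.ones(3) / 3
informative = np.array([0.90, 0.05, 0.05])    # observation strongly favors task 0
uninformative = np.array([1 / 3, 1 / 3, 1 / 3])  # observation says nothing about the task

post, r_info = information_gain_reward(prior, informative)
_, r_none = information_gain_reward(prior, uninformative)
# Informative experience yields positive reward; uninformative yields zero.
```

An exploration policy trained on this signal is pushed toward task-revealing experiences (e.g., sparse-reward regions that discriminate between tasks), which is the role the separate exploration policy plays in the framework described above; the paper derives its reward from a learned task-inference model rather than a tabular Bayes update.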
Author Information
Jin Zhang (Tsinghua University)
Jianhao Wang (Tsinghua University)
Hao Hu (Tsinghua University)
Tong Chen (Tsinghua University)
Yingfeng Chen (NetEase Fuxi AI Lab)
Changjie Fan (NetEase Fuxi AI Lab)
Chongjie Zhang (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
  Fri, Jul 23rd, 04:00–06:00 AM, Virtual Room
More from the Same Authors
- 2023: Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
  Jinyi Liu · Yi Ma · Jianye Hao · Yujing Hu · Yan Zheng · Tangjie Lv · Changjie Fan
- 2023 Poster: Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
  Jianhao Wang · Jin Zhang · Haozhe Jiang · Junyu Zhang · Liwei Wang · Chongjie Zhang
- 2022 Poster: On the Role of Discount Factor in Offline Reinforcement Learning
  Hao Hu · Yiqin Yang · Qianchuan Zhao · Chongjie Zhang
- 2022 Spotlight: On the Role of Discount Factor in Offline Reinforcement Learning
  Hao Hu · Yiqin Yang · Qianchuan Zhao · Chongjie Zhang
- 2022 Poster: Self-Organized Polynomial-Time Coordination Graphs
  Qianlan Yang · Weijun Dong · Zhizhou Ren · Jianhao Wang · Tonghan Wang · Chongjie Zhang
- 2022 Poster: Individual Reward Assisted Multi-Agent Reinforcement Learning
  Li Wang · Yupeng Zhang · Yujing Hu · Weixun Wang · Chongjie Zhang · Yang Gao · Jianye Hao · Tangjie Lv · Changjie Fan
- 2022 Poster: Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
  Jiahui Li · Kun Kuang · Baoxiang Wang · Furui Liu · Long Chen · Changjie Fan · Fei Wu · Jun Xiao
- 2022 Spotlight: Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
  Jiahui Li · Kun Kuang · Baoxiang Wang · Furui Liu · Long Chen · Changjie Fan · Fei Wu · Jun Xiao
- 2022 Spotlight: Individual Reward Assisted Multi-Agent Reinforcement Learning
  Li Wang · Yupeng Zhang · Yujing Hu · Weixun Wang · Chongjie Zhang · Yang Gao · Jianye Hao · Tangjie Lv · Changjie Fan
- 2022 Spotlight: Self-Organized Polynomial-Time Coordination Graphs
  Qianlan Yang · Weijun Dong · Zhizhou Ren · Jianhao Wang · Tonghan Wang · Chongjie Zhang
- 2021 Poster: Generalizable Episodic Memory for Deep Reinforcement Learning
  Hao Hu · Jianing Ye · Guangxiang Zhu · Zhizhou Ren · Chongjie Zhang
- 2021 Spotlight: Generalizable Episodic Memory for Deep Reinforcement Learning
  Hao Hu · Jianing Ye · Guangxiang Zhu · Zhizhou Ren · Chongjie Zhang
- 2020 Poster: ROMA: Multi-Agent Reinforcement Learning with Emergent Roles
  Tonghan Wang · Heng Dong · Victor Lesser · Chongjie Zhang
- 2020 Poster: Q-value Path Decomposition for Deep Multiagent Reinforcement Learning
  Yaodong Yang · Jianye Hao · Guangyong Chen · Hongyao Tang · Yingfeng Chen · Yujing Hu · Changjie Fan · Zhongyu Wei