Exploration remains a significant challenge for reinforcement learning methods, especially in environments where reward signals are sparse. Recent methods for learning from demonstrations have shown promise in overcoming exploration difficulties, but they typically require a considerable number of high-quality demonstrations that are difficult to collect. We propose to leverage available demonstrations effectively to guide exploration by enforcing occupancy measure matching between the learned policy and the demonstrations, and we develop a novel Policy Optimization from Demonstration (POfD) method. We show that POfD induces implicit dynamic reward shaping and brings provable benefits for policy improvement. Furthermore, it can be combined with policy gradient methods to produce state-of-the-art results, as demonstrated experimentally on a range of popular benchmark sparse-reward tasks, even when the demonstrations are few and imperfect.
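The implicit dynamic reward shaping mentioned in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's exact implementation: occupancy measure matching is commonly realized with a GAIL-style discriminator that distinguishes policy-generated state-action pairs from demonstration pairs, and its score is folded into the environment reward as a demonstration-matching bonus. The logistic discriminator, the feature vector `phi`, and the coefficient `lam` here are all hypothetical placeholders.

```python
import numpy as np

def discriminator(features, w):
    """Logistic discriminator D_w(s, a) in (0, 1): the estimated
    probability that a state-action pair came from the policy
    rather than from the demonstrations."""
    return 1.0 / (1.0 + np.exp(-features @ w))

def shaped_reward(env_reward, features, w, lam=0.1):
    """Sparse environment reward augmented with a matching bonus:
        r'(s, a) = r(s, a) - lam * log D_w(s, a).
    Pairs the discriminator mistakes for demonstrations (small D)
    receive a larger bonus, steering exploration toward the
    demonstrated occupancy measure even when r(s, a) is zero."""
    d = discriminator(features, w)
    return env_reward - lam * np.log(d + 1e-8)

# Toy usage: a sparse reward of 0 still yields a positive shaped
# reward, since log D_w < 0 for any D_w < 1.
w = np.array([0.5, -0.3])        # hypothetical discriminator weights
phi = np.array([1.0, 2.0])       # hypothetical (s, a) features
print(shaped_reward(0.0, phi, w))
```

The dynamic aspect comes from retraining the discriminator as the policy improves, so the bonus anneals as the policy's occupancy measure approaches that of the demonstrations.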
Author Information
Bingyi Kang (National University of Singapore)
Zequn Jie (Tencent AI Lab)
Jiashi Feng (National University of Singapore)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Policy Optimization with Demonstrations »
  Fri. Jul 13th 04:15 -- 07:00 PM, Room Hall B #16
More from the Same Authors
- 2023 Poster: Bag of Tricks for Training Data Extraction from Language Models »
  Weichen Yu · Tianyu Pang · Qian Liu · Chao Du · Bingyi Kang · Yan Huang · Min Lin · Shuicheng YAN
- 2021 Poster: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection »
  Hanshu YAN · Jingfeng Zhang · Gang Niu · Jiashi Feng · Vincent Tan · Masashi Sugiyama
- 2021 Spotlight: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection »
  Hanshu YAN · Jingfeng Zhang · Gang Niu · Jiashi Feng · Vincent Tan · Masashi Sugiyama
- 2021 Poster: Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing »
  Kaixin Wang · Kuangqi Zhou · Qixin Zhang · Jie Shao · Bryan Hooi · Jiashi Feng
- 2021 Spotlight: Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing »
  Kaixin Wang · Kuangqi Zhou · Qixin Zhang · Jie Shao · Bryan Hooi · Jiashi Feng
- 2020 Poster: Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation »
  Jian Liang · Dapeng Hu · Jiashi Feng
- 2018 Poster: WSNet: Compact and Efficient Networks Through Weight Sampling »
  Xiaojie Jin · Yingzhen Yang · Ning Xu · Jianchao Yang · Nebojsa Jojic · Jiashi Feng · Shuicheng Yan
- 2018 Oral: WSNet: Compact and Efficient Networks Through Weight Sampling »
  Xiaojie Jin · Yingzhen Yang · Ning Xu · Jianchao Yang · Nebojsa Jojic · Jiashi Feng · Shuicheng Yan
- 2018 Poster: Understanding Generalization and Optimization Performance of Deep CNNs »
  Pan Zhou · Jiashi Feng
- 2018 Oral: Understanding Generalization and Optimization Performance of Deep CNNs »
  Pan Zhou · Jiashi Feng