Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang

Optimism in the face of uncertainty is a principled approach to provably efficient exploration for reinforcement learning in tabular and linear settings. However, it is challenging to turn this principle into practical exploration algorithms for Deep Reinforcement Learning (DRL). To address this problem, we propose an Optimistic Exploration algorithm with Backward Bootstrapped Bonus (OEB3) for DRL. We construct a UCB-bonus that indicates the uncertainty of Q-functions. The UCB-bonus is then used to estimate an optimistic Q-value, which encourages the agent to explore scarcely visited states and actions in order to reduce uncertainty. When estimating the Q-function, we adopt an episodic backward update strategy that consistently propagates future uncertainty to the estimated Q-function. Experiments show that OEB3 outperforms several state-of-the-art exploration approaches in 49 Atari games.
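
The abstract does not give formulas, so the following is only an illustrative sketch of the three ideas it names, not the OEB3 implementation. It assumes (hypothetically) that the UCB-bonus is the standard deviation across several bootstrapped Q-estimates, that the optimistic Q-value is their mean plus a scaled bonus, and that the backward update is a plain discounted return computed from the end of an episode to its start. The names `ucb_bonus`, `optimistic_q`, `backward_targets`, and the parameters `alpha` and `gamma` are all introduced here for illustration.

```python
from statistics import mean, pstdev


def ucb_bonus(q_heads):
    """Assumed bonus: disagreement (population std) across bootstrapped
    Q-estimates for a single (state, action) pair."""
    return pstdev(q_heads)


def optimistic_q(q_heads, alpha=1.0):
    """Assumed optimistic value: mean estimate plus alpha times the bonus."""
    return mean(q_heads) + alpha * ucb_bonus(q_heads)


def backward_targets(rewards, bonuses, gamma=0.99, alpha=1.0):
    """Walk one finished episode from the last step to the first, so that
    bonuses earned at later steps flow into the targets of earlier steps.
    This is a simplified return-style target, not the paper's update rule."""
    targets = [0.0] * len(rewards)
    future = 0.0
    for t in reversed(range(len(rewards))):
        future = rewards[t] + alpha * bonuses[t] + gamma * future
        targets[t] = future
    return targets


if __name__ == "__main__":
    heads = [0.0, 2.0]  # two hypothetical bootstrapped estimates
    print(optimistic_q(heads, alpha=1.0))
    print(backward_targets([0.0, 1.0], [0.5, 0.0], gamma=0.5))
```

In this sketch, estimates that agree yield a zero bonus, so the optimism only raises values where the bootstrapped estimates disagree, which is the intuition behind steering the agent toward scarcely visited states and actions.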

Author Information

Chenjia Bai (Harbin Institute of Technology)
Lingxiao Wang (Northwestern University)
Lei Han (Tencent AI Lab)
Jianye Hao (Tianjin University)
Animesh Garg (University of Toronto, Vector Institute, Nvidia)
Peng Liu (Harbin Institute of Technology)
Zhaoran Wang (Northwestern University)

More from the Same Authors