Skip to yearly menu bar Skip to main content


Distributional Reinforcement Learning for Efficient Exploration

Borislav Mavrin · Hengshuai Yao · Linglong Kong · Kaiwen Wu · Yaoliang Yu

Pacific Ballroom #102

Keywords: [ Theory and Algorithms ] [ Deep Reinforcement Learning ]


In distributional reinforcement learning (RL), the estimated distribution of value functions model both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method achieves 483 % average gain across 49 games in cumulative rewards over QR-DQN. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves nearoptimal safety rewards twice faster than QRDQN.

Live content is unavailable. Log in and register to view live content