
Workshop: Workshop on Reinforcement Learning Theory

Mixture of Step Returns in Bootstrapped DQN

PoHan Chiang · Hsuan-Kung Yang · Zhang-Wei Hong · Chun-Yi Lee


The concept of utilizing multi-step returns for updating value functions has long been adopted in the reinforcement learning domain. Conventional methods such as TD(λ) further extend this concept by using a single target value equivalent to an exponential average of different step returns. Nevertheless, different backup lengths provide diverse advantages in terms of bias and variance of value estimates, convergence speeds, and learning behaviors of the agent. Integrating step returns into a single target sacrifices the advantages offered by the individual step-return targets. To address this issue, we propose Mixture Bootstrapped DQN (MB-DQN), which employs different backup lengths for different bootstrapped heads. MB-DQN enables a diversity of target values that is unavailable in approaches relying on only a single target value. In this paper, we first highlight the motivational insights through a simple maze environment. Then, to validate the effectiveness of MB-DQN, we perform experiments on various Atari 2600 benchmark environments and demonstrate the performance improvement of MB-DQN over the baseline methods. Finally, we provide ablation analyses to verify MB-DQN in a set of analytical cases.
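The core idea, assigning a distinct backup length to each bootstrapped head rather than blending all step returns into one target, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `n_step_target` helper, the particular backup lengths, and the toy reward/value arrays are all assumptions made for demonstration.

```python
def n_step_target(rewards, values, t, n, gamma=0.99):
    """Compute an n-step return target G_t^(n): the discounted sum of
    rewards for up to n steps from time t, bootstrapped with the value
    estimate at t+n (truncated at the end of the trajectory)."""
    horizon = min(n, len(rewards) - t)
    g = sum(gamma**k * rewards[t + k] for k in range(horizon))
    if t + horizon < len(values):  # bootstrap if a value estimate exists
        g += gamma**horizon * values[t + horizon]
    return g

# Hypothetical per-head backup lengths: each bootstrapped head is
# trained toward its own n-step target, so the heads receive diverse
# targets instead of a single exponentially averaged one as in TD(lambda).
backup_lengths = [1, 3, 5]
rewards = [1.0, 0.0, 0.0, 1.0, 0.0]            # toy trajectory rewards
values = [0.5, 0.4, 0.6, 0.3, 0.2, 0.1]        # toy V(s_0)..V(s_5) estimates
targets = {n: n_step_target(rewards, values, t=0, n=n)
           for n in backup_lengths}
```

Each entry of `targets` would serve as the regression target for the corresponding head, trading off bias (shorter backups lean on the value estimate) against variance (longer backups accumulate more sampled rewards).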
