Dead-ends and Secure Exploration in Reinforcement Learning

Mehdi Fatemi · Shikhar Sharma · Harm van Seijen · Samira Ebrahimi Kahou

Pacific Ballroom #112

Keywords: [ Theory and Algorithms ]


Many interesting applications of reinforcement learning (RL) involve MDPs that include numerous "dead-end" states. Upon reaching a dead-end state, the agent continues to interact with the environment in a dead-end trajectory before reaching an undesired terminal state, regardless of which actions are chosen. The situation is even worse when the existence of many dead-end states is coupled with positive rewards that are distant from any initial state (we term this the bridge effect). Hence, conventional exploration techniques often incur prohibitively many training steps before convergence. To deal with the bridge effect, we propose a condition for exploration, called security. We next establish formal results that translate the security condition into the learning problem of an auxiliary value function. This new value function is used to cap "any" given exploration policy and is guaranteed to make it secure. As a special case, we use this theory to introduce the secure random-walk. We next extend our results to the deep RL setting by identifying and addressing two main challenges that arise. Finally, we empirically compare the secure random-walk with standard benchmarks in two sets of experiments, including the Atari game of Montezuma's Revenge.
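
The idea of capping an exploration policy with an auxiliary value function can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not stated in the abstract: the auxiliary value `q_e` is taken to lie in [-1, 0] for each action, with -1 meaning the action leads to a dead-end with certainty; the cap on an action's probability is taken to be `1 + q_e`; and the function name `secure_random_walk` is hypothetical.

```python
import random


def secure_random_walk(q_e):
    """Action probabilities proportional to the assumed cap 1 + q_e(s, a).

    q_e: list of hypothetical auxiliary values for a single state,
    one per action, each expected to lie in [-1, 0].
    """
    # Assumed cap per action, clipped to [0, 1] in case of estimation noise.
    caps = [min(1.0, max(0.0, 1.0 + q)) for q in q_e]
    total = sum(caps)
    if total == 0.0:
        # Every action looks like a certain dead-end: fall back to uniform.
        return [1.0 / len(q_e)] * len(q_e)
    # Normalising keeps each probability below its cap only when the caps
    # sum to at least one; this sketch does not enforce that condition.
    return [c / total for c in caps]


# Three actions: certain dead-end, safe, and a 50% dead-end risk.
probs = secure_random_walk([-1.0, 0.0, -0.5])
action = random.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch the action judged to be a certain dead-end receives zero probability, while the remaining actions are sampled in proportion to how safe they appear, which is the sense in which a plain random walk is "capped".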
