Many interesting applications of reinforcement learning (RL) involve MDPs that include many "dead-end" states. Upon reaching a dead-end state, the agent continues to interact with the environment in a dead-end trajectory before reaching a terminal state, but cannot collect any positive reward, regardless of which actions it chooses. The situation is even worse when the existence of many dead-end states is coupled with positive rewards that are distant from any initial state (we call this the Bridge Effect). Consequently, conventional exploration techniques often incur prohibitively many training steps before convergence. To deal with the bridge effect, we propose a condition for exploration, called security. We then establish formal results that translate the security condition into the learning problem of an auxiliary value function. This new value function is used to cap any given exploration policy and is guaranteed to make it secure. As a special case, we use this theory to introduce secure random-walk. We then extend our results to the deep RL setting by identifying and addressing two main challenges that arise. Finally, we empirically compare secure random-walk with standard benchmarks in two sets of experiments, including the Atari game of Montezuma's Revenge.
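To make the capping idea concrete, here is a minimal sketch, assuming an auxiliary value function `q_d` with estimates `Q_D(s, a)` in `[-1, 0]`, where `-1` means the action surely leads to a dead-end. The function name, the uniform base policy, and the redistribution scheme are illustrative choices, not the paper's exact algorithm; the sketch only illustrates enforcing a per-action cap of the form `pi(a|s) <= 1 + Q_D(s, a)` on a random-walk policy.

```python
import numpy as np

def secure_random_walk(q_d):
    """Cap a uniform random-walk policy with a dead-end value function.

    q_d: estimated auxiliary values Q_D(s, a) in [-1, 0] for the actions
    available in the current state. Enforces pi(a|s) <= 1 + Q_D(s, a) by
    clipping the uniform policy and redistributing the removed mass among
    actions that still have headroom (feasible whenever sum(1 + Q_D) >= 1).
    """
    caps = np.clip(1.0 + np.asarray(q_d, dtype=float), 0.0, 1.0)
    n = len(caps)
    pi = np.minimum(np.full(n, 1.0 / n), caps)  # clip the uniform policy
    deficit = 1.0 - pi.sum()                    # probability mass removed
    headroom = caps - pi
    if deficit > 0 and headroom.sum() > 0:
        # Redistribute proportionally to remaining headroom; each action
        # receives at most its headroom, so the caps are never violated.
        pi = pi + deficit * headroom / headroom.sum()
    return pi
```

For example, with four actions where `Q_D = [0, -1, -0.5, 0]`, the action with value `-1` receives probability zero, the action with value `-0.5` is capped at `0.5`, and the removed mass shifts to the safe actions.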
Author Information
Mehdi Fatemi (Microsoft Research)
Shikhar Sharma (Microsoft Research)
Harm van Seijen (Microsoft Research)
Samira Ebrahimi Kahou (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Dead-ends and Secure Exploration in Reinforcement Learning
  Thu. Jun 13th 01:30 -- 04:00 AM, Room Pacific Ballroom #112
More from the Same Authors
- 2022 Poster: Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
  Yi Wan · Ali Rahimi-Kalahroudi · Janarthanan Rajendran · Ida Momennejad · Sarath Chandar · Harm van Seijen
- 2022 Spotlight: Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
  Yi Wan · Ali Rahimi-Kalahroudi · Janarthanan Rajendran · Ida Momennejad · Sarath Chandar · Harm van Seijen
- 2021 Poster: Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks
  Sungryull Sohn · Sungtae Lee · Jongwook Choi · Harm van Seijen · Mehdi Fatemi · Honglak Lee
- 2021 Spotlight: Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks
  Sungryull Sohn · Sungtae Lee · Jongwook Choi · Harm van Seijen · Mehdi Fatemi · Honglak Lee
- 2020: Panel Discussion
  Eric Eaton · Martha White · Doina Precup · Irina Rish · Harm van Seijen
- 2017: Achieving Above-Human Performance on Ms. Pac-Man by Reward Decomposition
  Harm van Seijen