
Bridging RL Theory and Practice with the Effective Horizon
Cassidy Laidlaw · Stuart Russell · Anca Dragan

Fri Jul 28 05:30 PM -- 05:45 PM (PDT)
Event URL: https://openreview.net/forum?id=oPJm6zAlVK

Deep reinforcement learning (RL) works impressively in some environments and fails catastrophically in others. Ideally, RL theory should be able to provide an understanding of why this is, i.e., bounds that are predictive of practical performance. Unfortunately, current theory does not quite have this ability. We compare standard deep RL algorithms to prior sample complexity bounds by introducing a new dataset, BRIDGE. It consists of 155 MDPs from common deep RL benchmarks, along with their corresponding tabular representations, which enables us to exactly compute instance-dependent bounds. We find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does. When actions with the highest Q-values under the random policy also have the highest Q-values under the optimal policy—i.e., when it is optimal to act greedily with respect to the random policy's Q-function—deep RL tends to succeed; when they don't, deep RL tends to fail. We generalize this property into a new complexity measure of an MDP that we call the effective horizon, which roughly corresponds to how many steps of lookahead search would be needed in that MDP in order to identify the next optimal action, when leaf nodes are evaluated with random rollouts. Using BRIDGE, we show that effective-horizon-based bounds are more closely reflective of the empirical performance of PPO and DQN than prior sample complexity bounds across four metrics. We also show that, unlike existing bounds, the effective horizon can predict the effects of using reward shaping or a pre-trained exploration policy.
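The success condition above—acting greedily with respect to the random policy's Q-function recovers the optimal action—can be sketched in a few lines of Python. The toy chain MDP, state/action layout, and rollout counts below are hypothetical illustrations, not from the paper or the BRIDGE dataset; they only show how one might estimate Q^rand via random rollouts and check the greedy action:

```python
import random

# Hypothetical toy chain MDP for illustration: states 0..3, actions
# 0 ("left") and 1 ("right"); reward 1 on reaching state 3, which is
# terminal. Episodes are capped at HORIZON steps.
N_STATES, TERMINAL, HORIZON = 4, 3, 10

def step(s, a):
    """Deterministic transition: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == TERMINAL else 0.0
    return s2, r

def random_rollout(s, rng, horizon=HORIZON):
    """Return accumulated reward from state s under the uniform random policy."""
    total, t = 0.0, 0
    while s != TERMINAL and t < horizon:
        s, r = step(s, rng.choice([0, 1]))
        total += r
        t += 1
    return total

def q_random(s, a, rng, n_rollouts=2000):
    """Monte Carlo estimate of Q^rand(s, a): take a, then follow the random policy."""
    total = 0.0
    for _ in range(n_rollouts):
        s2, r = step(s, a)
        total += r + (0.0 if s2 == TERMINAL else random_rollout(s2, rng))
    return total / n_rollouts

rng = random.Random(0)
# Greedy w.r.t. Q^rand from state 0: moving toward the reward scores higher
# than moving away, matching the optimal policy — the regime in which,
# per the abstract, deep RL tends to succeed.
greedy = max([0, 1], key=lambda a: q_random(0, a, rng))
print(greedy)  # -> 1 ("right" is greedy under Q^rand here)
```

In the paper's terms, an MDP like this one would have a small effective horizon: one step of lookahead with random-rollout leaf evaluation already identifies the optimal action; MDPs where Q^rand ranks a suboptimal action highest would require deeper lookahead.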

Author Information

Cassidy Laidlaw (University of California Berkeley)

I’m a third-year PhD student studying computer science at the University of California, Berkeley. I’m interested in human-AI cooperation, reinforcement learning theory, and robustness and uncertainty in machine learning. I received my BS in computer science and mathematics from the University of Maryland in 2018. My PhD is currently funded by a National Defense Science and Engineering Graduate (NDSEG) Fellowship and I am also the recipient of an Open Phil AI Fellowship.

Stuart Russell (UC Berkeley)
Anca Dragan (University of California, Berkeley)
