Timezone: »
Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. Often these applications require us to 1) identify a near optimal policy or to 2) estimate the value of a target policy. For both tasks we derive exponential information-theoretic lower bounds in discounted infinite horizon MDPs with a linear function representation for the action value function even if 1) realizability holds, 2) the batch algorithm observes the exact reward and transition functions, and 3) the batch algorithm is given the best a priori data distribution for the problem class. Our work introduces a new `oracle + batch algorithm' framework to prove lower bounds that hold for every distribution. The work shows an exponential separation between batch and online reinforcement learning.
Author Information
Andrea Zanette (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL »
Thu. Jul 22nd 04:00 -- 06:00 AM Room
More from the Same Authors
-
2021 : Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin Wainwright · Emma Brunskill -
2021 : Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin Wainwright · Emma Brunskill -
2020 Poster: Learning Near Optimal Policies with Low Inherent Bellman Error »
Andrea Zanette · Alessandro Lazaric · Mykel Kochenderfer · Emma Brunskill -
2019 Poster: Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds »
Andrea Zanette · Emma Brunskill -
2019 Oral: Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds »
Andrea Zanette · Emma Brunskill -
2018 Poster: Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs »
Andrea Zanette · Emma Brunskill -
2018 Oral: Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs »
Andrea Zanette · Emma Brunskill