Timezone: »

Towards Optimal Exploration in Linear Markov Decision Processes: Covering and Dimension-Optimal PAC
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson

Thu Jul 21 08:05 AM -- 08:10 AM (PDT) @ None
Efficient exploration strategies are a core and indispensable primitive in virtually all provably efficient reinforcement learning algorithms. In this work, we provide such strategies for linear MDPs. We develop an algorithm which efficiently traverses a linear MDP, collecting samples in any given ``feature direction'', and enjoys a sample complexity scaling optimally in the (linear MDP equivalent of the) maximal state visitation probability. This algorithmic tool yields a computationally efficient reward-free reinforcement learning algorithm with sample complexity scaling as $\mathcal{O}(d^2/\epsilon^2)$. We show a matching lower bound of $\Omega(d^2/\epsilon^2)$ on PAC RL in linear MDPs, implying that, in contrast to the tabular setting, reward-free RL is no harder than PAC RL in linear MDPs. To our knowledge, our approach is the first computationally efficient algorithm to achieve optimal $d$ dependence in linear MDPs, even in the single-reward PAC setting. We also apply our sample collection strategy to efficiently obtain ``well-conditioned'' covariates---covariates with lower bounded minimum eigenvalue---which, to our knowledge, is the first approach able to provably collect ``full-rank'' data in linear MDPs.

Author Information

Andrew Wagenmaker (University of Washington)
Yifang Chen (University of Washington)
Max Simchowitz (MIT)
Simon Du (University of Washington)
Kevin Jamieson (University of Washington)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors