Spotlight
Towards Optimal Exploration in Linear Markov Decision Processes: Covering and Dimension-Optimal PAC
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson

Thu Jul 21 08:05 AM -- 08:10 AM (PDT)
Efficient exploration strategies are a core and indispensable primitive in virtually all provably efficient reinforcement learning algorithms. In this work, we provide such strategies for linear MDPs. We develop an algorithm which efficiently traverses a linear MDP, collecting samples in any given ``feature direction'', and enjoys a sample complexity scaling optimally in the (linear MDP equivalent of the) maximal state visitation probability. This algorithmic tool yields a computationally efficient reward-free reinforcement learning algorithm with sample complexity scaling as $\mathcal{O}(d^2/\epsilon^2)$. We show a matching lower bound of $\Omega(d^2/\epsilon^2)$ on PAC RL in linear MDPs, implying that, in contrast to the tabular setting, reward-free RL is no harder than PAC RL in linear MDPs. To our knowledge, our approach is the first computationally efficient algorithm to achieve optimal $d$ dependence in linear MDPs, even in the single-reward PAC setting. We also apply our sample collection strategy to efficiently obtain ``well-conditioned'' covariates---covariates with lower bounded minimum eigenvalue---which, to our knowledge, is the first approach able to provably collect ``full-rank'' data in linear MDPs.
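To make the ``well-conditioned covariates'' notion concrete: the condition is that the empirical feature covariance $\frac{1}{n}\sum_i \phi_i \phi_i^\top$ has its minimum eigenvalue bounded away from zero. The sketch below (illustrative only, not the paper's exploration algorithm; the function name and toy data are invented for this example) checks that property on collected feature vectors:

```python
import numpy as np

def min_eigenvalue_of_covariates(features):
    """Minimum eigenvalue of the empirical covariance
    (1/n) * sum_i phi_i phi_i^T of collected feature vectors."""
    Phi = np.asarray(features)           # shape (n, d)
    cov = Phi.T @ Phi / Phi.shape[0]     # empirical (d, d) covariance
    return float(np.linalg.eigvalsh(cov).min())

# Toy check: isotropic random features are "full-rank" (well-conditioned),
# while features confined to a single direction yield a rank-1 covariance.
rng = np.random.default_rng(0)
well_spread = rng.standard_normal((1000, 4))
one_direction = np.outer(rng.standard_normal(1000), np.ones(4))

print(min_eigenvalue_of_covariates(well_spread))    # bounded away from 0
print(min_eigenvalue_of_covariates(one_direction))  # (numerically) 0
```

The difficulty the paper addresses is that in an MDP the agent cannot sample features freely; it must find policies whose induced state visitations drive every direction of this covariance upward.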