Skip to yearly menu bar Skip to main content

Workshop: Workshop on Reinforcement Learning Theory

On the Sample Complexity of Average-reward MDPs

Yujia Jin

Abstract: In this work we study the sample complexity for solving average-reward Markov decision processes (AMDPs), under a generative model access and mixing time bound on all stationary policies. Given an AMDP with mixing time bound $t_{mix}$ and $A_{tot}$ total state-action pairs, we present two methods for finding approximately-optimal stationary policy, altogether obtaining an upper bound of $\widetilde{O}(A_{tot}t_{mix}/\epsilon^2\cdot\min(t_{mix}, 1/\epsilon))$ in sample complexity. We also provide a sample complexity lower bound of $\Omega(A_{tot}t_{mix}/\epsilon^2)$ oblivious samples. This work makes progress toward designing new algorithms with better sample complexity for solving AMDPs and points to the final open problem of closing the gap with the lower bound.

Chat is not available.