Poster in the Workshop on Reinforcement Learning Theory
On the Sample Complexity of Average-reward MDPs
Yujia Jin
Abstract:
In this work we study the sample complexity of solving average-reward Markov decision processes (AMDPs), given generative model access and a bound on the mixing time of all stationary policies. For an AMDP with mixing time bound $t_{mix}$ and $A_{tot}$ total state-action pairs, we present two methods for finding an approximately optimal stationary policy, which together yield a sample complexity upper bound of $\widetilde{O}(A_{tot}t_{mix}/\epsilon^2\cdot\min(t_{mix}, 1/\epsilon))$. We also prove a lower bound of $\Omega(A_{tot}t_{mix}/\epsilon^2)$ on the number of oblivious samples required. This work makes progress toward designing algorithms with better sample complexity for solving AMDPs and points to the remaining open problem of closing the gap with the lower bound.
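As a small arithmetic aside (not part of the original abstract), distributing the leading factor over the $\min$ makes the two regimes behind the combined upper bound explicit:
\[
\widetilde{O}\!\left(\frac{A_{tot}\,t_{mix}}{\epsilon^{2}}\cdot\min\!\left(t_{mix},\,\frac{1}{\epsilon}\right)\right)
=\widetilde{O}\!\left(\min\!\left(\frac{A_{tot}\,t_{mix}^{2}}{\epsilon^{2}},\ \frac{A_{tot}\,t_{mix}}{\epsilon^{3}}\right)\right),
\]
so the first rate applies when $\epsilon \le 1/t_{mix}$ and the second otherwise; in either regime the gap to the $\Omega(A_{tot}t_{mix}/\epsilon^{2})$ lower bound is the $\min(t_{mix}, 1/\epsilon)$ factor.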