ICML On the Sample Complexity of Average-reward MDPs

Poster in Workshop: Workshop on Reinforcement Learning Theory

Yujia Jin

Abstract: In this work we study the sample complexity of solving average-reward Markov decision processes (AMDPs), assuming access to a generative model and a bound on the mixing time of every stationary policy. Given an AMDP with mixing time bound $t_{mix}$ and $A_{tot}$ total state-action pairs, we present two methods for finding an approximately optimal stationary policy, which together yield a sample complexity upper bound of $\widetilde{O}(A_{tot}t_{mix}/\epsilon^2\cdot\min(t_{mix}, 1/\epsilon))$. We also provide a lower bound of $\Omega(A_{tot}t_{mix}/\epsilon^2)$ oblivious samples. This work makes progress toward designing new algorithms with better sample complexity for solving AMDPs and points to the remaining open problem of closing the gap with the lower bound.
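To see how the two terms of the minimum interact, the following Python sketch evaluates the stated bounds with constants and logarithmic factors dropped, so the numbers are orders of magnitude rather than actual sample counts. The function names are hypothetical and only illustrate the arithmetic. When $t_{mix} \le 1/\epsilon$ the upper bound reads $A_{tot}t_{mix}^2/\epsilon^2$, otherwise $A_{tot}t_{mix}/\epsilon^3$, and the ratio of upper to lower bound is exactly $\min(t_{mix}, 1/\epsilon)$ up to log factors.

```python
def upper_bound(a_tot: int, t_mix: float, eps: float) -> float:
    """Order of A_tot * t_mix / eps^2 * min(t_mix, 1/eps); constants and logs dropped."""
    return a_tot * t_mix / eps**2 * min(t_mix, 1.0 / eps)


def lower_bound(a_tot: int, t_mix: float, eps: float) -> float:
    """Order of A_tot * t_mix / eps^2; constants dropped."""
    return a_tot * t_mix / eps**2


def gap_factor(t_mix: float, eps: float) -> float:
    """Ratio upper/lower, i.e. min(t_mix, 1/eps) (ignoring log factors)."""
    return min(t_mix, 1.0 / eps)


def regime(t_mix: float, eps: float) -> str:
    """Which branch of the minimum is active in the upper bound."""
    return "A_tot*t_mix^2/eps^2" if t_mix <= 1.0 / eps else "A_tot*t_mix/eps^3"


if __name__ == "__main__":
    # Illustrative values only: eps = 0.25, so 1/eps = 4.
    for t_mix in (2, 8):
        ub = upper_bound(10, t_mix, 0.25)
        lb = lower_bound(10, t_mix, 0.25)
        print(t_mix, regime(t_mix, 0.25), ub, lb, ub / lb)
    # prints:
    # 2 A_tot*t_mix^2/eps^2 640.0 320.0 2.0
    # 8 A_tot*t_mix/eps^3 5120.0 1280.0 4.0
```

For slowly mixing chains ($t_{mix} > 1/\epsilon$) the gap to the lower bound is capped at $1/\epsilon$, while for fast-mixing chains it is $t_{mix}$.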
