Timezone: »

 
Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning
Sheng Zhang · Zhe Zhang · Siva Maguluri
The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than their discounted-reward counterparts. To the best of our knowledge, we provide the first known finite sample guarantees using both constant and diminishing step sizes of (i) average-reward TD($\lambda$) with linear function approximation for policy evaluation and (ii) average-reward $Q$-learning in the tabular setting to find an optimal policy. A major challenge is that since the value functions are agnostic to an additive constant, the corresponding Bellman operators are no longer contraction mappings under any norm. We obtain the results for TD($\lambda$) by working in an appropriately defined subspace that ensures uniqueness of the solution. For $Q$-learning, we exploit the span seminorm contractive property of the Bellman operator and construct a novel Lyapunov function obtained by infimal convolution of the generalized Moreau envelope and the indicator function of a set.

Author Information

Sheng Zhang (Georgia Institute of Technology)
Zhe Zhang (Georgia Institute of Technology)
Siva Maguluri (Georgia Tech)

More from the Same Authors