
Workshop on Reinforcement Learning Theory

Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case

Gandharv Patil · Prashanth L.A. · Doina Precup


In this paper, we study the finite-time behaviour of temporal difference (TD) learning algorithms when combined with tail averaging, and present instance-dependent bounds on the parameter error of the tail-averaged TD iterate. Our error bounds hold both in expectation and with high probability, exhibit a sharper rate of decay for the initial error (bias), and are comparable to existing bounds in the literature.
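To make the setting concrete, the following is a minimal sketch of TD(0) with linear function approximation and tail averaging, i.e. averaging only the iterates after a burn-in point rather than the full trajectory of iterates. The Markov reward process, features, step size, and burn-in choice below are illustrative assumptions, not the paper's specific construction.

```python
import numpy as np

# Hypothetical small Markov reward process (illustrative, not from the paper):
# n_states states, random transition matrix P, reward vector r, discount gamma.
rng = np.random.default_rng(0)
n_states, gamma = 5, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)  # rows sum to 1
r = rng.standard_normal(n_states)
phi = np.eye(n_states)  # tabular (one-hot) features, for simplicity

def tail_averaged_td(num_iters=20000, k=None, alpha=0.05):
    """TD(0) with linear features and constant step size alpha.

    Returns the tail average of the iterates theta_k+1, ..., theta_T,
    where k is the burn-in point (here k = num_iters // 2 by default).
    """
    if k is None:
        k = num_iters // 2
    theta = np.zeros(n_states)
    tail_sum = np.zeros(n_states)
    s = 0
    for t in range(num_iters):
        s_next = rng.choice(n_states, p=P[s])
        # TD(0) update: theta <- theta + alpha * delta_t * phi(s_t),
        # with temporal-difference error delta_t.
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta = theta + alpha * delta * phi[s]
        if t >= k:  # accumulate only post-burn-in iterates
            tail_sum += theta
        s = s_next
    return tail_sum / (num_iters - k)

theta_bar = tail_averaged_td()
# With one-hot features, the TD fixed point is the true value function
# V = (I - gamma * P)^{-1} r, so we can check the averaged iterate against it.
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print(np.max(np.abs(theta_bar - v_true)))
```

Discarding the first half of the iterates removes most of the initial (bias) error before averaging begins, while averaging the remaining iterates damps the noise that a constant step size would otherwise leave in the final iterate.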
