Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case
Gandharv Patil · Prashanth L.A. · Doina Precup

In this paper, we study the finite-time behaviour of temporal difference (TD) learning algorithms when combined with tail-averaging, and present instance-dependent bounds on the parameter error of the tail-averaged TD iterate. Our error bounds hold in expectation as well as with high probability, exhibit a sharper rate of decay for the initial error (bias), and are comparable with existing bounds in the literature.
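
To make the object of study concrete, the following is a minimal sketch of TD(0) with linear function approximation in which only the last iterates are averaged (tail-averaging). It is an illustration under stated assumptions, not the authors' exact algorithm or experimental setup: the function name `tail_averaged_td0`, its parameters, the constant step size, and the single-trajectory sampling from a transition matrix `P` are all hypothetical choices made for this example.

```python
import numpy as np


def tail_averaged_td0(P, r, Phi, gamma, step_size, num_steps, tail_start, seed=0):
    """Run TD(0) with linear features and return the tail-averaged parameter.

    All names are illustrative assumptions, not taken from the paper.
    P: (S, S) transition matrix of a Markov reward process.
    r: (S,) expected reward received on leaving each state.
    Phi: (S, d) feature matrix, one row per state.
    tail_start: index of the first iterate included in the average.
    """
    if not 0 <= tail_start < num_steps:
        raise ValueError("tail_start must satisfy 0 <= tail_start < num_steps")

    rng = np.random.default_rng(seed)
    num_states, dim = Phi.shape
    theta = np.zeros(dim)      # current TD iterate
    tail_sum = np.zeros(dim)   # running sum of the iterates in the tail
    state = 0

    for t in range(num_steps):
        next_state = rng.choice(num_states, p=P[state])
        # TD error: r(s) + gamma * phi(s')^T theta - phi(s)^T theta
        td_error = r[state] + gamma * Phi[next_state] @ theta - Phi[state] @ theta
        # Semi-gradient TD(0) update with a constant step size (an assumption).
        theta = theta + step_size * td_error * Phi[state]
        # Discard the early iterates and accumulate only the tail.
        if t >= tail_start:
            tail_sum += theta
        state = next_state

    return tail_sum / (num_steps - tail_start)
```

Dropping the first `tail_start` iterates keeps the early, initialisation-dominated iterates out of the average, which is the intuition behind the faster decay of the initial error (bias) that the abstract mentions; averaging the remaining iterates smooths out the sampling noise.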

Author Information

Gandharv Patil (McGill University)

PhD student at McGill University working on Reinforcement Learning and Stochastic Optimisation.

Prashanth L.A. (IIT Madras)
Doina Precup (McGill University / DeepMind)

More from the Same Authors