Oral
Temporal Difference Learning as Gradient Splitting
Rui Liu · Alex Olshevsky
Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We provide an interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a fuller explanation of why temporal difference learning works, this interpretation also yields improved convergence times. We consider the setting with a $1/\sqrt{T}$ step size, where previous comparable finite-time convergence bounds for temporal difference learning had a multiplicative factor of $1/(1-\gamma)$ in front of the bound, with $\gamma$ being the discount factor. We show that a minor variation on TD learning that estimates the mean of the value function separately has a convergence time in which $1/(1-\gamma)$ multiplies only an asymptotically negligible term.
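The method the abstract studies, TD(0) with linear function approximation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the 3-state chain `P`, the rewards `R`, the features `PHI`, and the discount `GAMMA` are hypothetical; samples are drawn i.i.d. from the stationary distribution; and the step size is the constant $1/\sqrt{T}$ with the iterates averaged, a common choice in finite-time analyses. The sketch also computes the mean TD direction and the TD fixed point $\theta^*$ that the averaged iterate should approach.

```python
import math
import random

# Hypothetical 3-state Markov chain induced by a fixed policy (illustration only).
# P is symmetric and doubly stochastic, so its stationary distribution is uniform.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
R = [1.0, 0.0, -1.0]           # deterministic reward r(s) received in state s
PHI = [[1.0, 0.0],             # feature vector phi(s) for each state s
       [0.0, 1.0],
       [1.0, 1.0]]
MU = [1 / 3, 1 / 3, 1 / 3]     # stationary distribution of P
GAMMA = 0.9                    # discount factor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def value(theta, s):
    """Linear value estimate V_theta(s) = phi(s)^T theta."""
    return dot(PHI[s], theta)


def mean_td_direction(theta):
    """Expected TD(0) update: sum_s mu(s) phi(s) (r(s) + gamma E[V(s')] - V(s))."""
    g = [0.0, 0.0]
    for s in range(3):
        next_v = sum(P[s][t] * value(theta, t) for t in range(3))
        delta = R[s] + GAMMA * next_v - value(theta, s)
        for k in range(2):
            g[k] += MU[s] * PHI[s][k] * delta
    return g


def td_fixed_point():
    """Solve mean_td_direction(theta) = 0, i.e. the 2x2 linear system A theta = b."""
    # mean_td_direction(theta) = b - A theta, so b = g(0) and A e_k = g(0) - g(e_k).
    b = mean_td_direction([0.0, 0.0])
    cols = [[b[i] - mean_td_direction(e)[i] for i in range(2)]
            for e in ([1.0, 0.0], [0.0, 1.0])]
    a11, a21 = cols[0]
    a12, a22 = cols[1]
    det = a11 * a22 - a12 * a21
    # Cramer's rule for the 2x2 system.
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def td0(T, seed=0):
    """TD(0) on i.i.d. samples s ~ MU, s' ~ P[s], with constant step size 1/sqrt(T).

    Returns the final iterate and the running average of all T iterates.
    """
    rng = random.Random(seed)
    alpha = 1.0 / math.sqrt(T)
    theta = [0.0, 0.0]
    avg = [0.0, 0.0]
    for t in range(T):
        s = rng.choices(range(3), weights=MU)[0]
        s_next = rng.choices(range(3), weights=P[s])[0]
        # Sampled TD error: r(s) + gamma V(s') - V(s).
        delta = R[s] + GAMMA * value(theta, s_next) - value(theta, s)
        theta = [theta[k] + alpha * delta * PHI[s][k] for k in range(2)]
        avg = [avg[k] + (theta[k] - avg[k]) / (t + 1) for k in range(2)]
    return theta, avg
```

For example, `td0(50_000)[1]` should land close to `td_fixed_point()`. The mean direction vanishes at $\theta^*$, and its inner product with $\theta^* - \theta$ is at least $(1-\gamma)\|V_\theta - V_{\theta^*}\|_D^2$ when $D$ is the stationary distribution, a descent-like property that gradient-descent-style proofs rely on.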
Author Information
Rui Liu (Boston University)
Alex Olshevsky (Boston University)
Related Events (a corresponding poster, oral, or spotlight)
2021 Poster: Temporal Difference Learning as Gradient Splitting
Thu. Jul 22nd 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
2020 Poster: Minimax Rate for Learning From Pairwise Comparisons in the BTL Model
Julien Hendrickx · Alex Olshevsky · Venkatesh Saligrama
2019 Poster: Graph Resistance and Learning from Pairwise Comparisons
Julien Hendrickx · Alex Olshevsky · Venkatesh Saligrama
2019 Oral: Graph Resistance and Learning from Pairwise Comparisons
Julien Hendrickx · Alex Olshevsky · Venkatesh Saligrama
2018 Poster: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama
2018 Oral: Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
Yao Ma · Alex Olshevsky · Csaba Szepesvari · Venkatesh Saligrama