Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation
Chee Wee Phua - National ICT Australia, Australia
Robert Fitch - National ICT Australia, Australia
Reinforcement learning algorithms can become unstable when combined with linear function approximation. Algorithms that minimize the mean-square Bellman error are guaranteed to converge, but often do so slowly or at high computational cost. In this paper, we propose to improve the convergence speed of piecewise linear function approximation by tracking the dynamics of the value function with a Kalman filter under a random-walk model. We cast this as a general framework in which we implement the TD, Q-Learning, and MAXQ algorithms for different domains, and report empirical results demonstrating improved learning speed over previous methods.
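To make the core idea concrete, the following Python sketch (not the authors' implementation) shows tabular TD(0) on a small random-walk chain in which each state value is tracked by an independent scalar Kalman filter with a random-walk process model, so that the Kalman gain plays the role of an adaptive step size. The chain MDP and the noise variances Q_NOISE and R_NOISE are illustrative assumptions, not values from the paper.

import numpy as np

N_STATES = 5          # non-terminal states of a simple chain MDP
GAMMA = 1.0
Q_NOISE = 1e-4        # process (random-walk drift) variance, assumed
R_NOISE = 1.0         # observation variance of the TD target, assumed

V = np.zeros(N_STATES)    # value estimates (Kalman state means)
P = np.ones(N_STATES)     # estimate variances (Kalman covariances)

rng = np.random.default_rng(0)

def episode():
    s = N_STATES // 2
    while True:
        s_next = s + rng.choice([-1, 1])
        # reward 1 only when stepping off the right end of the chain
        r = 1.0 if s_next == N_STATES else 0.0
        terminal = s_next in (-1, N_STATES)
        # the bootstrapped TD target is treated as a noisy observation of V[s]
        target = r if terminal else r + GAMMA * V[s_next]

        # Kalman filter update with a random-walk model:
        # predict: variance grows by the process noise
        p_pred = P[s] + Q_NOISE
        # correct: gain-weighted move toward the observed TD target
        k = p_pred / (p_pred + R_NOISE)
        V[s] += k * (target - V[s])
        P[s] = (1.0 - k) * p_pred

        if terminal:
            break
        s = s_next

for _ in range(200):
    episode()
print(np.round(V, 2))   # approaches the true values [1/6, 2/6, 3/6, 4/6, 5/6]

The Kalman gain starts near 1 (large updates while the estimate is uncertain) and decays as the variance shrinks, while the process noise keeps it from vanishing, letting the estimate continue to track a drifting value function; the paper applies the same idea to the parameters of piecewise linear approximators rather than to a lookup table.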