Interpolation-based Q-learning |
---|

Csaba Szepesvari - Computer and Automation Research Institute of the Hungarian Academy of Sciences; William D. Smart - Department of Computer Science and Engineering, Washington University in St. Louis |

We consider a variant of Q-learning in continuous state spaces under the total expected discounted cost criterion combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed point equation of the Bellman type, where the fixed point operator depends on the stationary distribution of the exploration policy and the function approximation method. The basic algorithm is extended in several ways. In particular, a variant of the algorithm is obtained that is shown to converge in probability to the optimal Q function. Preliminary computer simulations are presented that confirm the validity of the approach. |
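
To make the setting concrete, the following is a minimal sketch of Q-learning combined with a local, interpolating function approximator. It is an illustration only, not the algorithm analyzed in the paper: the one-dimensional state space, the toy dynamics in `step`, the piecewise-linear (hat function) interpolator, the per-node step sizes, and all names (`hat_weights`, `q_values`, `interpolation_q_learning`) are assumptions introduced here. Costs are minimized, in line with the discounted cost criterion, and exploration is uniformly random.

```python
import numpy as np

def hat_weights(x, nodes):
    """Linear-interpolation weights of x on a sorted 1-D grid.

    The weights are non-negative, sum to one, and equal one at a node,
    so the approximator reproduces its parameters at the grid points.
    """
    x = float(np.clip(x, nodes[0], nodes[-1]))
    j = int(np.clip(np.searchsorted(nodes, x, side="right") - 1, 0, len(nodes) - 2))
    t = (x - nodes[j]) / (nodes[j + 1] - nodes[j])
    w = np.zeros(len(nodes))
    w[j], w[j + 1] = 1.0 - t, t
    return w

def q_values(theta, x, nodes):
    """Interpolated Q(x, a) for every action a; theta has shape (n_nodes, n_actions)."""
    return hat_weights(x, nodes) @ theta

def step(x, a, rng):
    """Hypothetical toy dynamics on [0, 1]: noisy move left (a=0) or right (a=1).

    A unit cost is paid unless the next state lies in the region [0.9, 1].
    """
    move = -0.1 if a == 0 else 0.1
    y = float(np.clip(x + move + rng.normal(0.0, 0.02), 0.0, 1.0))
    cost = 0.0 if y >= 0.9 else 1.0
    return y, cost

def interpolation_q_learning(n_nodes=11, n_steps=20000, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    nodes = np.linspace(0.0, 1.0, n_nodes)
    theta = np.zeros((n_nodes, 2))   # one parameter per (node, action)
    counts = np.zeros((n_nodes, 2))  # accumulated interpolation weight per parameter
    x = rng.uniform()
    for _ in range(n_steps):
        a = int(rng.integers(2))     # uniformly random exploration policy
        y, cost = step(x, a, rng)
        # One-step target under the discounted cost criterion (min over actions).
        target = cost + gamma * q_values(theta, y, nodes).min()
        w = hat_weights(x, nodes)
        counts[:, a] += w
        alpha = 1.0 / (1.0 + counts[:, a])  # assumed decaying per-node step sizes
        # Local update: only nodes with non-zero weight at x move toward the target.
        theta[:, a] += alpha * w * (target - theta[:, a])
        x = y
    return nodes, theta
```

Calling `nodes, theta = interpolation_q_learning()` and then `q_values(theta, 0.9, nodes)` returns the estimated costs of the two actions near the zero-cost region. Because each update is a convex combination of the old parameter and the target, the parameters in this sketch stay within `[0, 1 / (1 - gamma)]`.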