Boosted Fitted Q-Iteration
Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli

Sun Aug 06 06:24 PM -- 06:42 PM (PDT) @ C4.5

This paper studies B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals across all iterations. The advantage of such an approach with respect to other AVI methods is twofold: (1) while keeping the same function space at each iteration, B-FQI can represent more complex functions by considering an additive model; (2) since the Bellman residual decreases as the optimal value function is approached, the regression problems become easier as iterations proceed. We study B-FQI both theoretically, providing a finite-sample error upper bound for it, and empirically, comparing its performance to that of FQI in different domains and using different regression techniques.
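
The additive scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `boosted_fqi`, `fit_least_squares`, and `features` are hypothetical, actions are assumed to be discrete, and ordinary least squares is used as a stand-in regressor. At each iteration the sketch fits a new model to the empirical Bellman residual of the current estimate and adds it to the sum.

```python
import numpy as np

def fit_least_squares(X, y):
    """Hypothetical stand-in regressor: ordinary least squares."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Z: Z @ w

def boosted_fqi(features, transitions, n_actions, gamma=0.9, n_iterations=50):
    """Sketch of a boosted fitted Q-iteration loop (assumed interface).

    features(s, a) returns a 1-D feature vector.
    transitions is a list of (s, a, r, s_next, done) tuples.
    """
    models = []  # the Q estimate is the sum of all fitted residual models

    def q_values(X):
        # Evaluate the additive model; with no models yet, Q is zero.
        out = np.zeros(len(X))
        for m in models:
            out += m(X)
        return out

    X = np.array([features(s, a) for s, a, _, _, _ in transitions])
    rewards = np.array([r for _, _, r, _, _ in transitions], dtype=float)
    not_done = np.array([0.0 if d else 1.0 for *_, d in transitions])
    # Features of each next state paired with every action.
    X_next = np.array([[features(s2, b) for _, _, _, s2, _ in transitions]
                       for b in range(n_actions)])

    for _ in range(n_iterations):
        # Greedy value of the next state under the current estimate.
        q_next = np.max([q_values(Xb) for Xb in X_next], axis=0)
        # Empirical Bellman residual of the current estimate.
        residual = rewards + gamma * not_done * q_next - q_values(X)
        # Fit the residual in the same function space and add it to the sum.
        models.append(fit_least_squares(X, residual))

    return lambda s, a: float(q_values(np.array([features(s, a)]))[0])
```

In this sketch every fitted model is kept, so evaluating the estimate costs more as iterations proceed; that is the price of the additive representation.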

Author Information

Samuele Tosatto (Politecnico di Milano)

Samuele Tosatto joined the Institute for Intelligent Autonomous Systems (IAS) at TU Darmstadt in May 2017 as a Ph.D. student. Samuele received both his bachelor's degree and his master's degree in software engineering from the Polytechnic University of Milan. During his studies he focused on machine learning, and in particular on reinforcement learning. His thesis, entitled "Boosted Fitted Q-Iteration", was written under the supervision of Prof. Marcello Restelli, Matteo Pirotta, PhD, and Ing. Carlo D'Eramo.

Matteo Pirotta (SequeL - Inria Lille - Nord Europe)
Carlo D'Eramo (Politecnico di Milano)
Marcello Restelli (Politecnico di Milano)

More from the Same Authors