Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning
Angelo Damiani · Giorgio Manganini · Alberto Maria Metelli · Marcello Restelli

Thu Jul 21 08:05 AM -- 08:10 AM (PDT) @ Hall G

We propose a novel formulation of the Inverse Reinforcement Learning (IRL) problem that jointly accounts for the compatibility of the identified reward with the expert's behavior and its effectiveness for the subsequent forward learning phase. Although quite natural, especially when the final goal is apprenticeship learning (learning policies from an expert), this aspect has been completely overlooked by IRL approaches so far. We propose a new model-free IRL method that autonomously finds a trade-off between the error induced on the learned policy by a potentially sub-optimal reward and the estimation error caused by using finite samples in the forward learning phase, which can be controlled by also explicitly optimizing the discount factor of the related learning problem. The approach is based on a min-max formulation for the robust selection of the reward parameters and the discount factor, so that the distance between the expert's policy and the learned policy is minimized in the subsequent forward learning task when a finite and possibly small number of samples is available. Unlike most other IRL techniques, our approach does not require solving any planning or forward Reinforcement Learning problem. After presenting the formulation, we provide a numerical scheme for the optimization and show its effectiveness on an illustrative numerical case.

Author Information

Angelo Damiani (Gran Sasso Science Institute)
Giorgio Manganini (Gran Sasso Science Institute)

Giorgio Manganini is an Assistant Professor of Computer Science at Gran Sasso Science Institute, and was previously a Senior Research Scientist and Principal Investigator with United Technologies Research Centre Ireland, in the Control and Decision Support Group. He received his MSc and PhD with Merit in Information Technology - Systems and Control from Politecnico di Milano, Italy, in 2012 and 2015. His main research interests revolve around Machine Learning, Optimization and Control, including Reinforcement Learning (off-policy, inverse, and multi-agent), advanced control algorithms, and stochastic and randomized optimization. Applications encompass smart energy systems, buildings and grids, and lately aerospace. He is an author and reviewer for international journals and conference proceedings.

Alberto Maria Metelli (Politecnico di Milano)
Marcello Restelli (Politecnico di Milano)
