We consider the transfer of experience samples in reinforcement learning. Most previous work in this context has focused on value-based settings, where transferring instances conveniently reduces to transferring (s,a,s',r) tuples. In this paper, we consider the more complex case of reusing samples in policy search methods, in which the agent must transfer entire trajectories between environments with different transition models. By leveraging ideas from multiple importance sampling, we propose robust gradient estimators that achieve this goal, along with several techniques to reduce their variance. When the transition models are known, we theoretically establish the robustness of our estimators to negative transfer. When the models are unknown, we propose a method to efficiently estimate them, both when the target task belongs to a finite set of possible tasks and when it belongs to some reproducing kernel Hilbert space. We provide empirical results to show the effectiveness of our estimators.
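As a rough illustration of the multiple importance sampling idea the abstract builds on, the sketch below estimates an expectation under a target distribution from samples drawn under several source distributions, using the standard balance heuristic. It is a toy under stated assumptions, not the paper's gradient estimator: one-dimensional Gaussians stand in for trajectory distributions, and the names `gaussian_pdf` and `mis_balance_estimate` are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mis_balance_estimate(f, target, proposals, counts, rng):
    """Estimate E_target[f(X)] from samples drawn from several proposals.

    target and each proposal are (mean, std) pairs, a toy stand-in for
    trajectory distributions (assumption); counts[j] samples are drawn
    from proposals[j].
    """
    total = sum(counts)
    estimate = 0.0
    for (mean_j, std_j), n_j in zip(proposals, counts):
        for _ in range(n_j):
            x = rng.gauss(mean_j, std_j)
            # Balance heuristic: weight each sample against the mixture
            # of all proposals, not only the one that generated it.
            mixture = sum(
                (n_k / total) * gaussian_pdf(x, m_k, s_k)
                for (m_k, s_k), n_k in zip(proposals, counts)
            )
            estimate += gaussian_pdf(x, *target) * f(x) / mixture
    return estimate / total


rng = random.Random(0)
target = (0.0, 1.0)
# Two "source" distributions plus the target itself.
proposals = [(-1.0, 1.5), (1.0, 1.5), target]
counts = [2000, 2000, 2000]
second_moment = mis_balance_estimate(lambda x: x * x, target, proposals, counts, rng)
print(second_moment)  # should be close to the true value 1.0
```

When the target is itself one of the proposals, each balance-heuristic weight is at most the total sample count divided by the number of target samples (3 in this toy setup). Bounded weights of this kind are one intuition for why such estimators can stay stable when the source distributions are poor matches for the target.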
Author Information
Andrea Tirinzoni (Politecnico di Milano)
Mattia Salvini (Politecnico di Milano)
Marcello Restelli (Politecnico di Milano)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Transfer of Samples in Policy Search via Multiple Importance Sampling
  Thu Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom
More from the Same Authors
- 2020 Poster: Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
  Alberto Maria Metelli · Flavio Mazzolini · Lorenzo Bisi · Luca Sabbioni · Marcello Restelli
- 2020 Poster: Sequential Transfer in Reinforcement Learning with a Generative Model
  Andrea Tirinzoni · Riccardo Poiani · Marcello Restelli
- 2019 Poster: Reinforcement Learning in Configurable Continuous Environments
  Alberto Maria Metelli · Emanuele Ghelfi · Marcello Restelli
- 2019 Oral: Reinforcement Learning in Configurable Continuous Environments
  Alberto Maria Metelli · Emanuele Ghelfi · Marcello Restelli
- 2019 Poster: Optimistic Policy Optimization via Multiple Importance Sampling
  Matteo Papini · Alberto Maria Metelli · Lorenzo Lupo · Marcello Restelli
- 2019 Oral: Optimistic Policy Optimization via Multiple Importance Sampling
  Matteo Papini · Alberto Maria Metelli · Lorenzo Lupo · Marcello Restelli
- 2018 Poster: Importance Weighted Transfer of Samples in Reinforcement Learning
  Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli
- 2018 Poster: Stochastic Variance-Reduced Policy Gradient
  Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli
- 2018 Poster: Configurable Markov Decision Processes
  Alberto Maria Metelli · Mirco Mutti · Marcello Restelli
- 2018 Oral: Importance Weighted Transfer of Samples in Reinforcement Learning
  Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli
- 2018 Oral: Configurable Markov Decision Processes
  Alberto Maria Metelli · Mirco Mutti · Marcello Restelli
- 2018 Oral: Stochastic Variance-Reduced Policy Gradient
  Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli
- 2017 Poster: Boosted Fitted Q-Iteration
  Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli
- 2017 Talk: Boosted Fitted Q-Iteration
  Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli