
Meta Learning MDPs with linear transition models
Robert Müller · Aldo Pacchiano · Jack Parker-Holder

We study meta-learning in Markov Decision Processes (MDPs) with linear transition models in the undiscounted episodic setting. Under a task-sharedness metric based on model proximity, we propose an algorithm that leverages learning on a set of sampled training tasks to adapt quickly to test tasks sampled from the same task distribution. The algorithm is a biased version of the UC-MatrixRL algorithm~\cite{yang2019reinforcement}. The analysis leverages and extends results from the learning-to-learn settings for linear regression and linear bandits to the more general case of MDPs with linear transition models. We study the effect of the bias on single-task regret and on expected regret over the task distribution. We prove that, compared to learning the tasks in isolation, our algorithm significantly improves the transfer regret for task distributions of low variance and high bias. We outline and analyse two approaches to learning the bias.
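As a rough illustration of what a "biased" estimator means in the linear regression setting the abstract refers to, the sketch below solves ridge regression regularised towards a bias vector rather than towards zero. It is not the paper's algorithm: the function name `biased_ridge`, the parameter `lam`, and the toy data are assumptions made for this example, and the MDP version would estimate a transition core matrix instead of a weight vector.

```python
import numpy as np

def biased_ridge(X, y, bias, lam):
    """Closed-form minimiser of ||X w - y||^2 + lam * ||w - bias||^2.

    Hypothetical helper for illustration; `bias` plays the role of a
    meta-learned prior on the parameters, `lam` the regularisation weight.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)   # regularised Gram matrix
    b = X.T @ y + lam * bias        # data term pulled towards the bias
    return np.linalg.solve(A, b)

# Toy task (assumed data): few samples, so the regulariser matters.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(5, 3))
y = X @ w_true + 0.1 * rng.normal(size=5)

w_unbiased = biased_ridge(X, y, np.zeros(3), lam=10.0)     # standard ridge
w_biased = biased_ridge(X, y, w_true + 0.05, lam=10.0)     # bias near the task
print(np.linalg.norm(w_unbiased - w_true), np.linalg.norm(w_biased - w_true))
```

With no data the estimate equals the bias, and as data accumulates the estimate moves towards the least-squares solution. This is the intuition behind why a bias close to the task parameters can help when tasks are tightly clustered.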

Author Information

Robert Müller (TUM)
Aldo Pacchiano (UC Berkeley)
Jack Parker-Holder (University of Oxford)
