Provably efficient exploration-free transfer RL for near-deterministic latent dynamics
Yao Liu · Dipendra Misra · Miro Dudik · Robert Schapire

Sample complexity and robustness are critical for applying reinforcement learning (RL) algorithms in real-world applications. We study the opportunities for saving samples by transferring experience when the source domain is implemented by a simulator. Many real-world domains are well approximated by "rich-observation models," where the agent receives a high-dimensional "rich" observation that is nevertheless emitted from a compact latent state space. For such problems, designing simulators that accurately model the emission process of the observations is challenging. In this paper, we address these issues by considering learning from abstract simulators that model only the latent state space and a deterministic approximation of the latent transition dynamics. We present a transfer RL algorithm, POTAS, that leverages an abstract simulator to learn a policy robust to perturbation in the target domain, with a sample complexity that is independent of the size of the state space (exploration-free). We also present lower bounds showing that, without the near-deterministic assumption, one cannot both learn a robust policy from abstract simulators and avoid dependence on the size of the state space.

Author Information

Yao Liu (Stanford University)
Dipendra Misra (Microsoft)
Miro Dudik (Microsoft Research)

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently, he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.

Robert Schapire (Microsoft Research)

More from the Same Authors