
Workshop: Workshop on Reinforcement Learning Theory

Provably efficient exploration-free transfer RL for near-deterministic latent dynamics

Yao Liu · Dipendra Misra · Miroslav Dudik · Robert Schapire


Sample complexity and robustness are critical for applying reinforcement learning (RL) algorithms in real-world applications. We study the sample-saving opportunities that arise from transferring experience when the source domain is implemented by a simulator. Many real-world domains are well approximated by "rich-observation models", where the agent receives a high-dimensional "rich" observation that is nonetheless emitted from a compact latent state space. For such problems, designing simulators that accurately model the emission process of the observations is challenging. In this paper, we address these issues by considering learning from abstract simulators that model only the latent state space and a deterministic approximation of the latent transition dynamics. We present a transfer RL algorithm, POTAS, that leverages an abstract simulator to learn a policy robust to perturbation in the target domain, with a sample complexity that is independent of the size of the state space (exploration-free). We also present lower bounds showing that, without the near-deterministic assumption, one cannot both learn a robust policy from abstract simulators and avoid dependence on the size of the state space.
