
Provably efficient RL with Rich Observations via Latent State Decoding
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford

Tue Jun 11 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #208
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps (where previously decoded latent states provide labels for later regression problems), and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.
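To make the "regression then clustering" idea concrete, here is a toy sketch of a single decoding step. It is an illustration under stated assumptions, not the paper's algorithm: the two-state dynamics, the disjoint emission sets, the tabular regressor, the greedy clustering rule, and the threshold `tau` are all hypothetical choices. Previously decoded states and actions serve as regression labels. Because each observation here depends only on the next latent state, all observations from the same latent state share the same conditional label distribution, so clustering the regression outputs can recover the latent states whenever distinct latent states induce distinct label distributions.

```python
import random
from collections import defaultdict

# Hypothetical toy block MDP: at level h there are 2 already-decoded states and
# 2 actions; at level h+1 there are 2 latent states, each of which emits
# observations from its own disjoint set (the decoder never sees the latent state).
EMISSIONS = {0: ["a0", "a1", "a2"], 1: ["b0", "b1", "b2"]}
NUM_PREV_STATES, NUM_ACTIONS = 2, 2


def sample_transitions(n, rng):
    """Collect (observation, label) pairs, where label = (decoded prev state, action)."""
    data = []
    for _ in range(n):
        s_prev = rng.randrange(NUM_PREV_STATES)  # assume each decoded state is reachable
        action = rng.randrange(NUM_ACTIONS)      # uniform exploration action
        target = s_prev ^ action                 # hypothetical dynamics: usually s_prev XOR action
        s_next = target if rng.random() < 0.9 else 1 - target
        x = rng.choice(EMISSIONS[s_next])        # rich observation; latent s_next stays hidden
        data.append((x, (s_prev, action)))
    return data


def regress(data):
    """Tabular least-squares regression of one-hot labels on observations.

    With one-hot observation features, the least-squares fit is the per-observation
    mean of the one-hot label vectors, i.e. an estimate of P(s_prev, action | x).
    """
    k = NUM_PREV_STATES * NUM_ACTIONS
    counts = defaultdict(lambda: [0] * k)
    for x, (s_prev, action) in data:
        counts[x][s_prev * NUM_ACTIONS + action] += 1
    return {x: [c / sum(v) for c in v] for x, v in counts.items()}


def cluster(predictions, tau=0.5):
    """Greedy threshold clustering: an observation joins the first cluster whose
    representative prediction is within L1 distance tau, else it starts a new cluster."""
    reps, decoder = [], {}
    for x in sorted(predictions):
        vec = predictions[x]
        for cid, rep in enumerate(reps):
            if sum(abs(a - b) for a, b in zip(vec, rep)) < tau:
                decoder[x] = cid
                break
        else:
            decoder[x] = len(reps)
            reps.append(vec)
    return decoder


rng = random.Random(0)
decoder = cluster(regress(sample_transitions(4000, rng)))
print(decoder)  # maps each observation to a decoded latent-state id
```

In this toy, the decoded ids for the next level would in turn become the labels for the following regression problem, which is the inductive structure the abstract describes. The fixed threshold is a simplification; how the clustering step is actually performed and analyzed is specified in the paper, not here.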

Author Information

Simon Du (Carnegie Mellon University)
Akshay Krishnamurthy (Microsoft Research)
Nan Jiang (University of Illinois at Urbana-Champaign)
Alekh Agarwal (Microsoft Research)
Miroslav Dudik (Microsoft Research)

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.

John Langford (Microsoft Research)
