Provably efficient RL with Rich Observations via Latent State Decoding
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford

Tue Jun 11 05:10 PM -- 05:15 PM (PDT) @ Room 103
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps (where previously decoded latent states provide labels for later regression problems) and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over Q-learning with naïve exploration, even when Q-learning has cheating access to latent states.
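The regression-then-clustering step described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code: it assumes a single level with two latent states, swaps in linear least squares for the regression and a plain k-means for the clustering, and uses synthetic Gaussian observations. The function name `decode_level` and all parameters are hypothetical.

```python
import numpy as np

def decode_level(obs, prev_labels, n_prev, n_states, n_iter=20):
    """Toy decoder (hypothetical): regress previous labels on observations,
    then cluster the predicted label distributions."""
    X = np.hstack([obs, np.ones((len(obs), 1))])   # observations plus a bias feature
    Y = np.eye(n_prev)[prev_labels]                # one-hot targets from decoded labels
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)      # linear least-squares regression
    preds = X @ W                                  # estimated backward probabilities

    # k-means on the predictions, with farthest-point initialisation.
    centers = [preds[0]]
    for _ in range(n_states - 1):
        d = np.min([np.linalg.norm(preds - c, axis=1) for c in centers], axis=0)
        centers.append(preds[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(preds[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        for k in range(n_states):
            if np.any(assign == k):
                centers[k] = preds[assign == k].mean(axis=0)
    return assign

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
prev = rng.integers(0, 2, size=n)                  # previous (state, action) label
flip = rng.random(n) < 0.1                         # 10% of transitions go to the other state
state = np.where(flip, 1 - prev, prev)             # true latent state, hidden from the decoder
means = np.array([[2.0, 0.0, 0.0, 0.0], [-2.0, 0.0, 0.0, 0.0]])
obs = means[state] + 0.3 * rng.standard_normal((n, 4))

decoded = decode_level(obs, prev, n_prev=2, n_states=2)
agree = np.mean(decoded == state)
accuracy = max(agree, 1 - agree)                   # cluster ids are arbitrary up to relabelling
print(f"decoding accuracy: {accuracy:.3f}")
```

The decoder never sees `state`; it only uses observations and the previous level's labels. Observations emitted by the same latent state yield similar predicted label distributions, which is what lets the clustering step recover the states when those distributions differ across states.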

Author Information

Simon Du (Carnegie Mellon University)
Akshay Krishnamurthy (Microsoft Research)
Nan Jiang (University of Illinois at Urbana-Champaign)
Alekh Agarwal (Microsoft Research)
Miroslav Dudik (Microsoft Research)
John Langford (Microsoft Research)
