Poster
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Jonathan Lee · Alekh Agarwal · Christoph Dann · Tong Zhang

Thu Jul 27 04:30 PM -- 06:00 PM (PDT) @ Exhibit Hall 1 #429

POMDPs capture a broad class of decision-making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed at some point during the learning process. Motivated by diverse applications ranging from robotics to data center scheduling, we formulate a Hindsight Observable Markov Decision Process (HOMDP) as a POMDP where the latent states are revealed to the learner in hindsight and only during training. We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. We give a lower bound showing that the tabular algorithm is optimal in its dependence on latent state and observation cardinalities.

Author Information

Jonathan Lee (Stanford University)
Alekh Agarwal (Microsoft Research)
Christoph Dann (Google)
Tong Zhang (HKUST)
Tong Zhang

Tong Zhang is a professor of Computer Science and Mathematics at the Hong Kong University of Science and Technology. His research interests are machine learning, big data, and their applications. He obtained a BA in Mathematics and Computer Science from Cornell University and a PhD in Computer Science from Stanford University. Before joining HKUST, Tong Zhang was a professor at Rutgers University, and previously worked at IBM and Yahoo as a research scientist, at Baidu as the director of the Big Data Lab, and at Tencent as the founding director of the AI Lab. Tong Zhang was an ASA fellow and IMS fellow, and has served as chair or area chair at major machine learning conferences such as NIPS, ICML, and COLT, and as an associate editor for top machine learning journals such as PAMI, JMLR, and Machine Learning Journal.

More from the Same Authors