Timezone: »
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed during some point of the learning process. Motivated by diverse applications ranging from robotics to data center scheduling, we formulate a Hindsight Observable Markov Decision Process (HOMDP) as a POMDP where the latent states are revealed to the learner in hindsight and only during training. We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. We give a lower bound showing that the tabular algorithm is optimal in its dependence on latent state and observation cardinalities.
Author Information
Jonathan Lee (Stanford University)
Alekh Agarwal (Microsoft Research)
Christoph Dann (Google)
Tong Zhang (HKUST)

Tong Zhang is a professor of Computer Science and Mathematics at the Hong Kong University of Science and Technology. His research interests are machine learning, big data and their applications. He obtained a BA in Mathematics and Computer Science from Cornell University, and a PhD in Computer Science from Stanford University. Before joining HKUST, Tong Zhang was a professor at Rutgers University, and worked previously at IBM, Yahoo as research scientists, Baidu as the director of Big Data Lab, and Tencent as the founding director of AI Lab. Tong Zhang was an ASA fellow and IMS fellow, and has served as the chair or area-chair in major machine learning conferences such as NIPS, ICML, and COLT, and has served as associate editors in top machine learning journals such as PAMI, JMLR, and Machine Learning Journal.
More from the Same Authors
-
2021 : Provable RL with Exogenous Distractors via Multistep Inverse Dynamics »
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2021 : Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity »
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill -
2021 : Efficient Exploration by HyperDQN in Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Hao Liang · Tong Zhang -
2023 : Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan Lee · Emma Brunskill -
2023 : In-Context Decision-Making from Supervised Pretraining »
Jonathan Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 : Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan Lee · Emma Brunskill -
2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization »
Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi -
2023 Poster: Stochastic Gradient Succeeds for Bandits »
Jincheng Mei · Zixin Zhong · Bo Dai · Alekh Agarwal · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? »
Rui Yang · Yong LIN · Xiaoteng Ma · Hao Hu · Chongjie Zhang · Tong Zhang -
2023 Poster: Generalized Polyak Step Size for First Order Optimization with Momentum »
Xiaoyu Wang · Mikael Johansson · Tong Zhang -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Best of Both Worlds Policy Optimization »
Christoph Dann · Chen-Yu Wei · Julian Zimmert -
2023 Poster: On the Convergence of Federated Averaging with Cyclic Client Participation »
Yae Jee Cho · PRANAY SHARMA · Gauri Joshi · Zheng Xu · Satyen Kale · Tong Zhang -
2023 Oral: Best of Both Worlds Policy Optimization »
Christoph Dann · Chen-Yu Wei · Julian Zimmert -
2023 Poster: Weakly Supervised Disentangled Generative Causal Representation Learning »
Xinwei Shen · Furui Liu · Hanze Dong · Qing Lian · Zhitang Chen · Tong Zhang -
2023 Poster: Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes »
Chenlu Ye · Wei Xiong · Quanquan Gu · Tong Zhang -
2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Poster: A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games »
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang -
2022 Poster: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Adversarially Trained Actor Critic for Offline Reinforcement Learning »
Ching-An Cheng · Tengyang Xie · Nan Jiang · Alekh Agarwal -
2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang -
2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Oral: Adversarially Trained Actor Critic for Offline Reinforcement Learning »
Ching-An Cheng · Tengyang Xie · Nan Jiang · Alekh Agarwal -
2022 Spotlight: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang -
2022 Spotlight: A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games »
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang -
2022 Poster: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2022 Spotlight: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2022 Poster: A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization »
Renzhe Xu · Xingxuan Zhang · Zheyan Shen · Tong Zhang · Peng Cui -
2022 Poster: Sparse Invariant Risk Minimization »
Xiao Zhou · Yong LIN · Weizhong Zhang · Tong Zhang -
2022 Poster: Model Agnostic Sample Reweighting for Out-of-Distribution Learning »
Xiao Zhou · Yong LIN · Renjie Pi · Weizhong Zhang · Renzhe Xu · Peng Cui · Tong Zhang -
2022 Poster: Probabilistic Bilevel Coreset Selection »
Xiao Zhou · Renjie Pi · Weizhong Zhang · Yong LIN · Zonghao Chen · Tong Zhang -
2022 Spotlight: A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization »
Renzhe Xu · Xingxuan Zhang · Zheyan Shen · Tong Zhang · Peng Cui -
2022 Spotlight: Probabilistic Bilevel Coreset Selection »
Xiao Zhou · Renjie Pi · Weizhong Zhang · Yong LIN · Zonghao Chen · Tong Zhang -
2022 Spotlight: Model Agnostic Sample Reweighting for Out-of-Distribution Learning »
Xiao Zhou · Yong LIN · Renjie Pi · Weizhong Zhang · Renzhe Xu · Peng Cui · Tong Zhang -
2022 Spotlight: Sparse Invariant Risk Minimization »
Xiao Zhou · Yong LIN · Weizhong Zhang · Tong Zhang -
2021 : RL + Recommender Systems Panel »
Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li -
2021 Poster: Provably Correct Optimization and Exploration with Non-linear Policies »
Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang -
2021 Poster: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Provably Correct Optimization and Exploration with Non-linear Policies »
Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang -
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2020 Poster: Accelerated Message Passing for Entropy-Regularized MAP Inference »
Jonathan Lee · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2020 Poster: Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization »
Rie Johnson · Tong Zhang -
2019 Poster: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban -
2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
Alekh Agarwal · Miroslav Dudik · Steven Wu -
2019 Poster: $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression »
Hanlin Tang · Chen Yu · Xiangru Lian · Tong Zhang · Ji Liu -
2019 Oral: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban -
2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
Alekh Agarwal · Miroslav Dudik · Steven Wu -
2019 Oral: $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression »
Hanlin Tang · Chen Yu · Xiangru Lian · Tong Zhang · Ji Liu -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Poster: Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI »
Lei Han · Peng Sun · Yali Du · Jiechao Xiong · Qing Wang · Xinghai Sun · Han Liu · Tong Zhang -
2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI »
Lei Han · Peng Sun · Yali Du · Jiechao Xiong · Qing Wang · Xinghai Sun · Han Liu · Tong Zhang -
2019 Tutorial: Causal Inference and Stable Learning »
Tong Zhang · Peng Cui -
2018 Poster: An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method »
Li Shen · Peng Sun · Yitong Wang · Wei Liu · Tong Zhang -
2018 Poster: Candidates vs. Noises Estimation for Large Multi-Class Classification Problem »
Lei Han · Yiheng Huang · Tong Zhang -
2018 Poster: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar -
2018 Poster: Hierarchical Imitation and Reinforcement Learning »
Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III -
2018 Poster: A Reductions Approach to Fair Classification »
Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach -
2018 Oral: An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method »
Li Shen · Peng Sun · Yitong Wang · Wei Liu · Tong Zhang -
2018 Oral: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar -
2018 Oral: Candidates vs. Noises Estimation for Large Multi-Class Classification Problem »
Lei Han · Yiheng Huang · Tong Zhang -
2018 Oral: Hierarchical Imitation and Reinforcement Learning »
Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III -
2018 Oral: A Reductions Approach to Fair Classification »
Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach -
2018 Poster: Graphical Nonconvex Optimization via an Adaptive Convex Relaxation »
Qiang Sun · Kean Ming Tan · Han Liu · Tong Zhang -
2018 Poster: Composite Functional Gradient Learning of Generative Adversarial Models »
Rie Johnson · Tong Zhang -
2018 Poster: Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization »
Jiaxiang Wu · Weidong Huang · Junzhou Huang · Tong Zhang -
2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Graphical Nonconvex Optimization via an Adaptive Convex Relaxation »
Qiang Sun · Kean Ming Tan · Han Liu · Tong Zhang -
2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Composite Functional Gradient Learning of Generative Adversarial Models »
Rie Johnson · Tong Zhang -
2018 Oral: Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization »
Jiaxiang Wu · Weidong Huang · Junzhou Huang · Tong Zhang -
2018 Poster: Safe Element Screening for Submodular Function Minimization »
Weizhong Zhang · Bin Hong · Lin Ma · Wei Liu · Tong Zhang -
2018 Poster: End-to-end Active Object Tracking via Reinforcement Learning »
Wenhan Luo · Peng Sun · Fangwei Zhong · Wei Liu · Tong Zhang · Yizhou Wang -
2018 Oral: End-to-end Active Object Tracking via Reinforcement Learning »
Wenhan Luo · Peng Sun · Fangwei Zhong · Wei Liu · Tong Zhang · Yizhou Wang -
2018 Oral: Safe Element Screening for Submodular Function Minimization »
Weizhong Zhang · Bin Hong · Lin Ma · Wei Liu · Tong Zhang -
2017 : Corralling a Band of Bandit Algorithms »
Alekh Agarwal -
2017 Poster: Projection-free Distributed Online Learning in Networks »
Wenpeng Zhang · Peilin Zhao · Wenwu Zhu · Steven Hoi · Tong Zhang -
2017 Talk: Projection-free Distributed Online Learning in Networks »
Wenpeng Zhang · Peilin Zhao · Wenwu Zhu · Steven Hoi · Tong Zhang -
2017 Poster: Efficient Distributed Learning with Sparsity »
Jialei Wang · Mladen Kolar · Nati Srebro · Tong Zhang -
2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Talk: Efficient Distributed Learning with Sparsity »
Jialei Wang · Mladen Kolar · Nati Srebro · Tong Zhang -
2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Poster: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Talk: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Tutorial: Real World Interactive Learning »
Alekh Agarwal · John Langford