Timezone: »
Oral
Provably efficient RL with Rich Observations via Latent State Decoding
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps---where previously decoded latent states provide labels for later regression problems---and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with na\"ive exploration, even when $Q$-learning has cheating access to latent states.
Author Information
Simon Du (Carnegie Mellon University)
Akshay Krishnamurthy (Microsoft Research)
Nan Jiang (University of Illinois at Urbana-Champaign)
Alekh Agarwal (Microsoft Research)
Miroslav Dudik (Microsoft Research)

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.
John Langford (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding »
Wed. Jun 12th 01:30 -- 04:00 AM Room Pacific Ballroom #208
More from the Same Authors
-
2021 : Provable RL with Exogenous Distractors via Multistep Inverse Dynamics »
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2021 : Provably efficient exploration-free transfer RL for near-deterministic latent dynamics »
Yao Liu · Dipendra Misra · Miroslav Dudik · Robert Schapire -
2021 : A Spectral Approach to Off-Policy Evaluation for POMDPs »
Yash Nair · Nan Jiang -
2021 : Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning »
Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 : Interaction-Grounded Learning with Action-inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2022 : Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions »
Audrey Huang · Nan Jiang -
2023 : Exposing Attention Glitches with Flip-Flop Language Modeling »
Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang -
2023 : Exposing Attention Glitches with Flip-Flop Language Modeling »
Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang -
2023 Workshop: Interactive Learning with Implicit Human Feedback »
Andi Peng · Akanksha Saran · Andreea Bobu · Tengyang Xie · Pierre-Yves Oudeyer · Anca Dragan · John Langford -
2023 Poster: Streaming Active Learning with Deep Neural Networks »
Akanksha Saran · Safoora Yousefi · Akshay Krishnamurthy · John Langford · Jordan Ash -
2023 Poster: Offline Learning in Markov Games with General Function Approximation »
Yuheng Zhang · Yu Bai · Nan Jiang -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2023 Poster: The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation »
Philip Amortila · Nan Jiang · Csaba Szepesvari -
2023 Poster: Reinforcement Learning in Low-rank MDPs with Density Features »
Audrey Huang · Jinglin Chen · Nan Jiang -
2023 Tutorial: Discovering Agent-Centric Latent States in Theory and in Practice »
John Langford · Alex Lamb -
2023 Expo Talk Panel: Vowpal Wabbit: year in review and looking ahead in an LLM world »
John Langford · Byron Xu · Cheng Tan · Jack Gerrits · Lili Wu · Mark Rucker · Olga Vrousgou -
2022 Poster: Universal and data-adaptive algorithms for model selection in linear contextual bandits »
Vidya Muthukumar · Akshay Krishnamurthy -
2022 Poster: Adversarially Trained Actor Critic for Offline Reinforcement Learning »
Ching-An Cheng · Tengyang Xie · Nan Jiang · Alekh Agarwal -
2022 Spotlight: Universal and data-adaptive algorithms for model selection in linear contextual bandits »
Vidya Muthukumar · Akshay Krishnamurthy -
2022 Oral: Adversarially Trained Actor Critic for Offline Reinforcement Learning »
Ching-An Cheng · Tengyang Xie · Nan Jiang · Alekh Agarwal -
2022 Poster: A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes »
Chengchun Shi · Masatoshi Uehara · Jiawei Huang · Nan Jiang -
2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Oral: A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes »
Chengchun Shi · Masatoshi Uehara · Jiawei Huang · Nan Jiang -
2022 Poster: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2022 Spotlight: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2022 : Introduction »
John Langford -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2021 : RL + Recommender Systems Panel »
Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li -
2021 : RL Foundation Panel »
Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warrren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li -
2021 Poster: Interaction-Grounded Learning »
Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad -
2021 Spotlight: Interaction-Grounded Learning »
Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad -
2021 Poster: Batch Value-function Approximation with Only Realizability »
Tengyang Xie · Nan Jiang -
2021 Poster: ChaCha for Online AutoML »
Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi -
2021 Spotlight: ChaCha for Online AutoML »
Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi -
2021 Spotlight: Batch Value-function Approximation with Only Realizability »
Tengyang Xie · Nan Jiang -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2021 Poster: Interactive Learning from Activity Description »
Khanh Nguyen · Dipendra Misra · Robert Schapire · Miroslav Dudik · Patrick Shafto -
2021 Spotlight: Interactive Learning from Activity Description »
Khanh Nguyen · Dipendra Misra · Robert Schapire · Miroslav Dudik · Patrick Shafto -
2021 Expo Workshop: Real World RL: Azure Personalizer & Vowpal Wabbit »
Sheetal Lahabar · Etienne Kintzler · Mark Rucker · Bogdan Mazoure · Qingyun Wu · Pavithra Srinath · Jack Gerrits · Olga Vrousgou · John Langford · Eduardo Salinas -
2020 : Discussion Panel »
Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford -
2020 : Representation learning and exploration in reinforcement learning - Akshay Krishnamurthy »
Akshay Krishnamurthy -
2020 : Speaker Panel »
Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy -
2020 Workshop: Workshop on eXtreme Classification: Theory and Applications »
Anna Choromanska · John Langford · Maryam Majzoubi · Yashoteja Prabhu -
2020 Poster: Minimax Weight and Q-Function Learning for Off-Policy Evaluation »
Masatoshi Uehara · Jiawei Huang · Nan Jiang -
2020 Poster: Doubly robust off-policy evaluation with shrinkage »
Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miroslav Dudik -
2020 Poster: Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning »
Dipendra Kumar Misra · Mikael Henaff · Akshay Krishnamurthy · John Langford -
2020 Poster: Reward-Free Exploration for Reinforcement Learning »
Chi Jin · Akshay Krishnamurthy · Max Simchowitz · Tiancheng Yu -
2020 Poster: Adaptive Estimator Selection for Off-Policy Evaluation »
Yi Su · Pavithra Srinath · Akshay Krishnamurthy -
2020 Poster: From Importance Sampling to Doubly Robust Policy Gradient »
Jiawei Huang · Nan Jiang -
2020 Poster: Private Reinforcement Learning with PAC and Regret Guarantees »
Giuseppe Vietri · Borja de Balle Pigem · Akshay Krishnamurthy · Steven Wu -
2019 : Miro Dudík (Microsoft Research) - Doubly Robust Off-policy Evaluation with Shrinkage »
Miroslav Dudik -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 : invited talk by John Langford (Microsoft Research): How do we make Real World Reinforcement Learning revolution? »
John Langford -
2019 Poster: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban -
2019 Poster: Width Provably Matters in Optimization for Deep Linear Neural Networks »
Simon Du · Wei Hu -
2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
Alekh Agarwal · Miroslav Dudik · Steven Wu -
2019 Oral: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban -
2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
Alekh Agarwal · Miroslav Dudik · Steven Wu -
2019 Oral: Width Provably Matters in Optimization for Deep Linear Neural Networks »
Simon Du · Wei Hu -
2019 Poster: Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments »
Kirthevasan Kandasamy · Willie Neiswanger · Reed Zhang · Akshay Krishnamurthy · Jeff Schneider · Barnabás Póczos -
2019 Oral: Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments »
Kirthevasan Kandasamy · Willie Neiswanger · Reed Zhang · Akshay Krishnamurthy · Jeff Schneider · Barnabás Póczos -
2019 Poster: Gradient Descent Finds Global Minima of Deep Neural Networks »
Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai -
2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2019 Poster: Information-Theoretic Considerations in Batch Reinforcement Learning »
Jinglin Chen · Nan Jiang -
2019 Poster: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2019 Oral: Information-Theoretic Considerations in Batch Reinforcement Learning »
Jinglin Chen · Nan Jiang -
2019 Oral: Gradient Descent Finds Global Minima of Deep Neural Networks »
Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai -
2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2019 Oral: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2018 Poster: On the Power of Over-parametrization in Neural Networks with Quadratic Activation »
Simon Du · Jason Lee -
2018 Poster: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Poster: Semiparametric Contextual Bandits »
Akshay Krishnamurthy · Steven Wu · Vasilis Syrgkanis -
2018 Poster: Hierarchical Imitation and Reinforcement Learning »
Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III -
2018 Poster: A Reductions Approach to Fair Classification »
Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach -
2018 Oral: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Oral: On the Power of Over-parametrization in Neural Networks with Quadratic Activation »
Simon Du · Jason Lee -
2018 Oral: Semiparametric Contextual Bandits »
Akshay Krishnamurthy · Steven Wu · Vasilis Syrgkanis -
2018 Oral: Hierarchical Imitation and Reinforcement Learning »
Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III -
2018 Oral: A Reductions Approach to Fair Classification »
Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach -
2018 Poster: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima »
Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos -
2018 Poster: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms »
Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms »
Yi Wu · Siddharth Srivastava · Nicholas Hay · Simon Du · Stuart Russell -
2018 Oral: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima »
Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos -
2018 Poster: Learning Deep ResNet Blocks Sequentially using Boosting Theory »
Furong Huang · Jordan Ash · John Langford · Robert Schapire -
2018 Oral: Learning Deep ResNet Blocks Sequentially using Boosting Theory »
Furong Huang · Jordan Ash · John Langford · Robert Schapire -
2017 : Corralling a Band of Bandit Algorithms »
Alekh Agarwal -
2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: Logarithmic Time One-Against-Some »
Hal Daumé · Nikos Karampatziakis · John Langford · Paul Mineiro -
2017 Poster: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Talk: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Talk: Logarithmic Time One-Against-Some »
Hal Daumé · Nikos Karampatziakis · John Langford · Paul Mineiro -
2017 Tutorial: Real World Interactive Learning »
Alekh Agarwal · John Langford