In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated. Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold; otherwise, there are lower bounds exhibiting exponential error amplification (in the problem horizon) unless the data collection distribution has only a mild distribution shift relative to the target policy. This work studies these issues from an empirical perspective to gauge how stable offline RL methods are. In particular, our methodology explores these ideas when using features from pre-trained neural networks, in the hope that these representations are powerful enough to permit sample-efficient offline RL. Through extensive experiments on a range of tasks, we see that substantial error amplification does occur even when using such pre-trained representations (trained on the same task itself); we find that offline RL is stable only under extremely mild distribution shift. The implications of these results, from both a theoretical and an empirical perspective, are that successful offline RL (where we seek to go beyond the low distribution shift regime) requires substantially stronger conditions than those that suffice for successful supervised learning.
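As a rough illustration of the kind of procedure the abstract refers to (not the authors' actual code), the sketch below runs least-squares fitted Q-evaluation on frozen features. The feature map, data sizes, and the synthetic "shift" between the offline data and the target policy are all illustrative assumptions; the loop shows how each Bellman backup regresses onto the previous estimate, which is where error can compound over the horizon when the two distributions differ.

```python
import numpy as np

# Minimal sketch (not the paper's code) of offline policy evaluation with
# least-squares fitted Q-evaluation on frozen, pre-trained features phi(s, a).
# Dimensions and the synthetic data below are illustrative assumptions.
rng = np.random.default_rng(0)
d, n, H = 16, 500, 20        # feature dimension, offline samples, horizon

# Offline dataset: features of (s, a), rewards, and features of (s', pi(s'))
# under the target policy pi. In practice the features would come from a
# pre-trained network; random features here are purely for illustration.
phi_sa = rng.normal(size=(n, d))
rewards = rng.normal(size=n)
# Hypothetical "distribution shift": next-state features differ from the
# features the regression is actually fit on.
phi_next = phi_sa + 0.5 * rng.normal(size=(n, d))

reg = 1e-3 * np.eye(d)                        # ridge regularization
cov_inv = np.linalg.inv(phi_sa.T @ phi_sa + reg)

# Backward-in-time backups: each step regresses r + phi(s', pi(s'))^T w
# onto phi(s, a). Any estimation error is pushed through this map at every
# step, which is how error can amplify over the horizon under shift.
w = np.zeros(d)
for h in range(H):
    targets = rewards + phi_next @ w
    w = cov_inv @ (phi_sa.T @ targets)
    print(f"backup {h:2d}: ||w|| = {np.linalg.norm(w):.2f}")
```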
Author Information
Ruosong Wang (Carnegie Mellon University)
Yifan Wu (Carnegie Mellon University)
Ruslan Salakhutdinov (Carnegie Mellon University)
Sham Kakade (University of Washington)
Sham Kakade is a Gordon McKay Professor of Computer Science and Statistics at Harvard University and a co-director of the recently announced Kempner Institute. He works on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods, Conservative Policy Iteration, for reinforcement learning; developing the mathematical foundations for the widely used linear bandit models and the Gaussian process bandit models; the tensor and spectral methodologies for provable estimation of latent variable models; and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award (2020), the IBM Pat Goldberg Best Paper Award (2007), and the INFORMS Revenue Management and Pricing Prize (2014). He was program chair for COLT 2011. Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He then completed his Ph.D. in computational neuroscience at the Gatsby Unit at University College London, under the supervision of Peter Dayan. He was a postdoc in the Dept. of Computer Science, University of Pennsylvania, where he broadened his studies to include computational game theory and economics under the guidance of Michael Kearns. Sham has been a Principal Research Scientist at Microsoft Research, New England, an associate professor in the Department of Statistics at the Wharton School, UPenn, and an assistant professor at the Toyota Technological Institute at Chicago.
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Instabilities of Offline RL with Pre-Trained Neural Representation »
  Wed. Jul 21st 02:00 -- 02:05 PM
More from the Same Authors
- 2021 : Online Sub-Sampling for Reinforcement Learning with General Function Approximation »
  Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang
- 2021 : A Short Note on the Relationship of Information Gain and Eluder Dimension »
  Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei
- 2021 : Sparsity in the Partially Controllable LQR »
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2023 : Plan, Eliminate, and Track --- Language Models are Good Teachers for Embodied Agents. »
  Yue Wu · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Yuanzhi Li · Tom Mitchell · Shrimai Prabhumoye
- 2023 : SPRING: Studying Papers and Reasoning to play Games »
  Yue Wu · Shrimai Prabhumoye · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Tom Mitchell · Yuanzhi Li
- 2023 Poster: Graph Generative Model for Benchmarking Graph Neural Networks »
  Minji Yoon · Yue Wu · John Palowitch · Bryan Perozzi · Ruslan Salakhutdinov
- 2023 Poster: Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes »
  Runlong Zhou · Ruosong Wang · Simon Du
- 2022 Poster: Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs »
  Tianwei Ni · Benjamin Eysenbach · Ruslan Salakhutdinov
- 2022 Spotlight: Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs »
  Tianwei Ni · Benjamin Eysenbach · Ruslan Salakhutdinov
- 2021 Poster: Towards Understanding and Mitigating Social Biases in Language Models »
  Paul Liang · Chiyu Wu · LP Morency · Ruslan Salakhutdinov
- 2021 Poster: Reasoning Over Virtual Knowledge Bases With Open Predicate Relations »
  Haitian Sun · Patrick Verga · Bhuwan Dhingra · Ruslan Salakhutdinov · William Cohen
- 2021 Spotlight: Reasoning Over Virtual Knowledge Bases With Open Predicate Relations »
  Haitian Sun · Patrick Verga · Bhuwan Dhingra · Ruslan Salakhutdinov · William Cohen
- 2021 Spotlight: Towards Understanding and Mitigating Social Biases in Language Models »
  Paul Liang · Chiyu Wu · LP Morency · Ruslan Salakhutdinov
- 2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
  Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang
- 2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
  Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang
- 2021 Poster: Information Obfuscation of Graph Neural Networks »
  Peiyuan Liao · Han Zhao · Keyulu Xu · Tommi Jaakkola · Geoff Gordon · Stefanie Jegelka · Ruslan Salakhutdinov
- 2021 Poster: Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning »
  Yue Wu · Shuangfei Zhai · Nitish Srivastava · Joshua M Susskind · Jian Zhang · Ruslan Salakhutdinov · Hanlin Goh
- 2021 Poster: On Proximal Policy Optimization's Heavy-tailed Gradients »
  Saurabh Garg · Joshua Zhanson · Emilio Parisotto · Adarsh Prasad · Zico Kolter · Zachary Lipton · Sivaraman Balakrishnan · Ruslan Salakhutdinov · Pradeep Ravikumar
- 2021 Spotlight: On Proximal Policy Optimization's Heavy-tailed Gradients »
  Saurabh Garg · Joshua Zhanson · Emilio Parisotto · Adarsh Prasad · Zico Kolter · Zachary Lipton · Sivaraman Balakrishnan · Ruslan Salakhutdinov · Pradeep Ravikumar
- 2021 Spotlight: Information Obfuscation of Graph Neural Networks »
  Peiyuan Liao · Han Zhao · Keyulu Xu · Tommi Jaakkola · Geoff Gordon · Stefanie Jegelka · Ruslan Salakhutdinov
- 2021 Spotlight: Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning »
  Yue Wu · Shuangfei Zhai · Nitish Srivastava · Joshua M Susskind · Jian Zhang · Ruslan Salakhutdinov · Hanlin Goh
- 2021 Poster: On the Optimality of Batch Policy Optimization Algorithms »
  Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms »
  Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020 : QA for invited talk 8 Kakade »
  Sham Kakade
- 2020 : Invited talk 8 Kakade »
  Sham Kakade
- 2020 Workshop: Workshop on Learning in Artificial Open Worlds »
  Arthur Szlam · Katja Hofmann · Ruslan Salakhutdinov · Noboru Kuno · William Guss · Kavya Srinet · Brandon Houghton
- 2020 Workshop: Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond »
  Jian Tang · Le Song · Jure Leskovec · Renjie Liao · Yujia Li · Sanja Fidler · Richard Zemel · Ruslan Salakhutdinov
- 2020 : Speaker Panel »
  Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
- 2020 : Exploration, Policy Gradient Methods, and the Deadly Triad - Sham Kakade »
  Sham Kakade
- 2020 Poster: Soft Threshold Weight Reparameterization for Learnable Sparsity »
  Aditya Kusupati · Vivek Ramanujan · Raghav Somani · Mitchell Wortsman · Prateek Jain · Sham Kakade · Ali Farhadi
- 2020 Poster: Calibration, Entropy Rates, and Memory in Language Models »
  Mark Braverman · Xinyi Chen · Sham Kakade · Karthik Narasimhan · Cyril Zhang · Yi Zhang
- 2020 Poster: The Implicit and Explicit Regularization Effects of Dropout »
  Colin Wei · Sham Kakade · Tengyu Ma
- 2020 Poster: Provable Representation Learning for Imitation Learning via Bi-level Optimization »
  Sanjeev Arora · Simon Du · Sham Kakade · Yuping Luo · Nikunj Umesh Saunshi
- 2020 Poster: Nearly Linear Row Sampling Algorithm for Quantile Regression »
  Yi Li · Ruosong Wang · Lin Yang · Hanrui Zhang
- 2020 Poster: Meta-learning for Mixed Linear Regression »
  Weihao Kong · Raghav Somani · Zhao Song · Sham Kakade · Sewoong Oh
- 2020 Test Of Time: Gaussian Process Optimization in the Bandit Settings: No Regret and Experimental Design »
  Niranjan Srinivas · Andreas Krause · Sham Kakade · Matthias Seeger
- 2019 : Keynote by Sham Kakade: Prediction, Learning, and Memory »
  Sham Kakade
- 2019 Poster: Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment »
  Yifan Wu · Ezra Winston · Divyansh Kaushik · Zachary Lipton
- 2019 Poster: Dimensionality Reduction for Tukey Regression »
  Kenneth Clarkson · Ruosong Wang · David Woodruff
- 2019 Poster: Online Control with Adversarial Disturbances »
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Oral: Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment »
  Yifan Wu · Ezra Winston · Divyansh Kaushik · Zachary Lipton
- 2019 Oral: Dimensionality Reduction for Tukey Regression »
  Kenneth Clarkson · Ruosong Wang · David Woodruff
- 2019 Oral: Online Control with Adversarial Disturbances »
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Poster: Provably Efficient Maximum Entropy Exploration »
  Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest
- 2019 Oral: Provably Efficient Maximum Entropy Exploration »
  Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest
- 2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang
- 2019 Poster: Online Meta-Learning »
  Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine
- 2019 Poster: Maximum Likelihood Estimation for Learning Populations of Parameters »
  Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade
- 2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang
- 2019 Oral: Maximum Likelihood Estimation for Learning Populations of Parameters »
  Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade
- 2019 Oral: Online Meta-Learning »
  Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine
- 2019 Talk: Opening Remarks »
  Kamalika Chaudhuri · Ruslan Salakhutdinov
- 2018 Poster: Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator »
  Maryam Fazel · Rong Ge · Sham Kakade · Mehran Mesbahi
- 2018 Poster: Transformation Autoregressive Networks »
  Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider
- 2018 Oral: Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator »
  Maryam Fazel · Rong Ge · Sham Kakade · Mehran Mesbahi
- 2018 Oral: Transformation Autoregressive Networks »
  Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider
- 2018 Poster: Structured Control Nets for Deep Reinforcement Learning »
  Mario Srouji · Jian Zhang · Ruslan Salakhutdinov
- 2018 Poster: Gated Path Planning Networks »
  Lisa Lee · Emilio Parisotto · Devendra Singh Chaplot · Eric Xing · Ruslan Salakhutdinov
- 2018 Oral: Structured Control Nets for Deep Reinforcement Learning »
  Mario Srouji · Jian Zhang · Ruslan Salakhutdinov
- 2018 Oral: Gated Path Planning Networks »
  Lisa Lee · Emilio Parisotto · Devendra Singh Chaplot · Eric Xing · Ruslan Salakhutdinov
- 2017 Workshop: Principled Approaches to Deep Learning »
  Andrzej Pronobis · Robert Gens · Sham Kakade · Pedro Domingos
- 2017 Poster: Toward Controlled Generation of Text »
  Zhiting Hu · Zichao Yang · Xiaodan Liang · Ruslan Salakhutdinov · Eric Xing
- 2017 Poster: Improved Variational Autoencoders for Text Modeling using Dilated Convolutions »
  Zichao Yang · Zhiting Hu · Ruslan Salakhutdinov · Taylor Berg-Kirkpatrick
- 2017 Talk: Improved Variational Autoencoders for Text Modeling using Dilated Convolutions »
  Zichao Yang · Zhiting Hu · Ruslan Salakhutdinov · Taylor Berg-Kirkpatrick
- 2017 Talk: Toward Controlled Generation of Text »
  Zhiting Hu · Zichao Yang · Xiaodan Liang · Ruslan Salakhutdinov · Eric Xing
- 2017 Poster: How to Escape Saddle Points Efficiently »
  Chi Jin · Rong Ge · Praneeth Netrapalli · Sham Kakade · Michael Jordan
- 2017 Talk: How to Escape Saddle Points Efficiently »
  Chi Jin · Rong Ge · Praneeth Netrapalli · Sham Kakade · Michael Jordan