We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm: the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, whereas both model-free and model-based policy optimization rely on a policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on benchmark tasks.
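The idea of bootstrapping a weak Q-function through a multi-step model-based planner can be illustrated with a small sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes a deterministic model, a small discrete action set, and brute-force enumeration of action sequences. The function name `boots_action` and the toy `dynamics`, `reward`, and `q_fn` below are hypothetical and exist only for illustration.

```python
from itertools import product


def boots_action(state, actions, dynamics, reward, q_fn, horizon, gamma=0.99):
    """Return (first_action, value) of the best action sequence.

    Each sequence is rolled out for `horizon` steps through the model,
    summing discounted rewards, and then scored with q_fn at the final
    state-action pair. With horizon=0 this is the greedy policy of q_fn.
    """
    best_action, best_value = None, float("-inf")
    # Brute-force search over sequences: `horizon` model steps plus one
    # final action that is evaluated by the Q-function only.
    for seq in product(actions, repeat=horizon + 1):
        s, value, discount = state, 0.0, 1.0
        for a in seq[:-1]:
            value += discount * reward(s, a)
            s = dynamics(s, a)
            discount *= gamma
        value += discount * q_fn(s, seq[-1])
        if value > best_value:
            best_action, best_value = seq[0], value
    return best_action, best_value


# Hypothetical toy problem: integer states, actions move left or right,
# and a reward of 1 is paid when the next state is 2.
def dynamics(s, a):
    return s + a


def reward(s, a):
    return 1.0 if s + a == 2 else 0.0


def q_fn(s, a):
    # A deliberately uninformative ("weak") Q-function.
    return 0.0


greedy = boots_action(0, [-1, 1], dynamics, reward, q_fn, horizon=0, gamma=0.5)
planned = boots_action(0, [-1, 1], dynamics, reward, q_fn, horizon=2, gamma=0.5)
```

In this toy setting the greedy policy of the weak Q-function cannot distinguish the two actions, while two steps of lookahead through the model find the rewarding direction. Enumeration is exponential in the horizon, so a practical planner would need a different search procedure.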
Author Information
Kefan Dong (Tsinghua University)
Yuping Luo (Princeton University)
Tianhe (Kevin) Yu (Stanford University)
Chelsea Finn (Stanford)
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement methods for learning reward functions underlying behavior, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research has been recognized through the ACM doctoral dissertation award, the Microsoft Research Faculty Fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students and a mentoring program for underrepresented undergraduates across four universities, and by leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
Tengyu Ma (Stanford)
More from the Same Authors
-
2020 : MOPO: Model-based Offline Policy Optimization »
Tianhe (Kevin) Yu -
2021 : Label Noise SGD Provably Prefers Flat Global Minimizers »
Alex Damian · Tengyu Ma · Jason Lee -
2021 : Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature »
Kefan Dong · Jiaqi Yang · Tengyu Ma -
2021 : Model-based Offline Reinforcement Learning with Local Misspecification »
Kefan Dong · Ramtin Keramati · Emma Brunskill -
2021 : Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Aaron Courville · Tengyu Ma · George Tucker · Sergey Levine -
2021 : Multi-Task Offline Reinforcement Learning with Conservative Data Sharing »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2021 : Visual Adversarial Imitation Learning using Variational Models »
Rafael Rafailov · Tianhe (Kevin) Yu · Aravind Rajeswaran · Chelsea Finn -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 : Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations »
Yuping Luo · Tengyu Ma -
2021 : The Reflective Explorer: Online Meta-Exploration from Offline Data in Visual Tasks with Sparse Rewards »
Rafael Rafailov · Varun Kumar · Tianhe (Kevin) Yu · Avi Singh · Mariano Phielipp · Chelsea Finn -
2022 : Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models »
Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn -
2022 : Giving Complex Feedback in Online Student Learning with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : Policy Architectures for Compositional Generalization in Control »
Allan Zhou · Vikash Kumar · Chelsea Finn · Aravind Rajeswaran -
2022 : Diversify and Disambiguate: Learning from Underspecified Data »
Yoonho Lee · Huaxiu Yao · Chelsea Finn -
2022 : Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Huaxiu Yao · Caroline Choi · Yoonho Lee · Pang Wei Koh · Chelsea Finn -
2022 : Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning »
Annie Xie · Fahim Tajwar · Archit Sharma · Chelsea Finn -
2022 : You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2023 Poster: PaLM-E: An Embodied Multimodal Language Model »
Danny Driess · Pete Florence · Klaus Greff · Marc Toussaint · Igor Mordatch · Andy Zeng · Vincent Vanhoucke · Mehdi S. M. Sajjadi · Corey Lynch · Ayzaan Wahid · brian ichter · Fei Xia · Pierre Sermanet · Yevgen Chebotar · Jonathan Tompson · Wenlong Huang · Sergey Levine · Tianhe (Kevin) Yu · Karol Hausman · Quan Vuong · Aakanksha Chowdhery · Daniel Duckworth -
2023 Poster: Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning »
Evan Liu · Sahaana Suri · Tong Mu · Allan Zhou · Chelsea Finn -
2023 Poster: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature »
Eric Mitchell · Yoonho Lee · Alexander Khazatsky · Christopher Manning · Chelsea Finn -
2023 Oral: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature »
Eric Mitchell · Yoonho Lee · Alexander Khazatsky · Christopher Manning · Chelsea Finn -
2023 Tutorial: Recent Advances in the Generalization Theory of Neural Networks * »
Tengyu Ma · Alex Damian -
2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying Wei · Saining Xie · Eric Xing · Chelsea Finn -
2022 : Panel discussion »
Steffen Schneider · Aleksander Madry · Alexei Efros · Chelsea Finn · Soheil Feizi -
2022 : Q/A: Chelsea Finn »
Chelsea Finn -
2022 : Invited Speaker: Chelsea Finn »
Chelsea Finn -
2022 : Invited Talk 3: Chelsea Finn »
Chelsea Finn -
2022 Poster: Robust Policy Learning over Multiple Uncertainty Sets »
Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang -
2022 Poster: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Poster: Memory-Based Model Editing at Scale »
Eric Mitchell · Charles Lin · Antoine Bosselut · Christopher Manning · Chelsea Finn -
2022 Poster: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Poster: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Robust Policy Learning over Multiple Uncertainty Sets »
Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang -
2022 Spotlight: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Spotlight: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Spotlight: Memory-Based Model Editing at Scale »
Eric Mitchell · Charles Lin · Antoine Bosselut · Christopher Manning · Chelsea Finn -
2022 Poster: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Spotlight: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Poster: A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning »
Archit Sharma · Rehaan Ahmad · Chelsea Finn -
2022 Poster: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Oral: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Spotlight: A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning »
Archit Sharma · Rehaan Ahmad · Chelsea Finn -
2021 : Live Panel Discussion »
Thomas Dietterich · Chelsea Finn · Kamalika Chaudhuri · Yarin Gal · Uri Shalit -
2021 Poster: Offline Meta-Reinforcement Learning with Advantage Weighting »
Eric Mitchell · Rafael Rafailov · Xue Bin Peng · Sergey Levine · Chelsea Finn -
2021 Poster: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Spotlight: Offline Meta-Reinforcement Learning with Advantage Weighting »
Eric Mitchell · Rafael Rafailov · Xue Bin Peng · Sergey Levine · Chelsea Finn -
2021 Oral: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Poster: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Spotlight: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Poster: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Oral: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Poster: Deep Reinforcement Learning amidst Continual Structured Non-Stationarity »
Annie Xie · James Harrison · Chelsea Finn -
2021 Spotlight: Deep Reinforcement Learning amidst Continual Structured Non-Stationarity »
Annie Xie · James Harrison · Chelsea Finn -
2020 : Invited Talk 11: Prof. Chelsea Finn from Stanford University »
Chelsea Finn -
2020 Poster: Goal-Aware Prediction: Learning to Model What Matters »
Suraj Nair · Silvio Savarese · Chelsea Finn -
2020 Poster: The Implicit and Explicit Regularization Effects of Dropout »
Colin Wei · Sham Kakade · Tengyu Ma -
2020 Poster: Provable Representation Learning for Imitation Learning via Bi-level Optimization »
Sanjeev Arora · Simon Du · Sham Kakade · Yuping Luo · Nikunj Umesh Saunshi -
2020 Poster: Individual Calibration with Randomized Forecasting »
Shengjia Zhao · Tengyu Ma · Stefano Ermon -
2020 Poster: Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings »
Jesse Zhang · Brian Cheung · Chelsea Finn · Sergey Levine · Dinesh Jayaraman -
2020 Poster: Understanding Self-Training for Gradual Domain Adaptation »
Ananya Kumar · Tengyu Ma · Percy Liang