Timezone: »
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy, which may simply be the behavior policy. We show that offline RL algorithms that learn such calibrated value functions lead to effective online fine-tuning, enabling us to take the benefits of offline initializations in online fine-tuning. In practice, Cal-QL can be implemented on top of the conservative Q learning (CQL) for offline RL within a one-line code change. Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper.
Author Information
Mitsuhiko Nakamoto (University of California, Berkeley)
Yuexiang Zhai (UC Berkeley)
Anikait Singh (Stanford University)
Max Sobol Mark (Computer Science Department, Stanford University)
Yi Ma (UC Berkeley)
Chelsea Finn (Stanford)
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement methods for learning reward functions underlying behavior, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research has been recognized through the ACM doctoral dissertation award, the Microsoft Research Faculty Fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, a mentoring program for underrepresented undergraduates across four universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
Aviral Kumar (Indian Institute of Technology Bombay)
Final year undergraduate student at IIT Bombay, India. Interning at Google Brain Toronto. Will join UC Berkeley as a Ph.D. student starting Fall 2018.
Sergey Levine (University of Washington)
More from the Same Authors
-
2020 : DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction »
Aviral Kumar -
2021 : Multi-Task Offline Reinforcement Learning with Conservative Data Sharing »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2021 : Visual Adversarial Imitation Learning using Variational Models »
Rafael Rafailov · Tianhe (Kevin) Yu · Aravind Rajeswaran · Chelsea Finn -
2021 : Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 : Explore and Control with Adversarial Surprise »
Arnaud Fickinger · Natasha Jaques · Samyak Parajuli · Michael Chang · Nicholas Rhinehart · Glen Berseth · Stuart Russell · Sergey Levine -
2021 : The Reflective Explorer: Online Meta-Exploration from Offline Data in Visual Tasks with Sparse Rewards »
Rafael Rafailov · Varun Kumar · Tianhe (Kevin) Yu · Avi Singh · mariano phielipp · Chelsea Finn -
2021 : Multi-Task Offline Reinforcement Learning with Conservative Data Sharing »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2022 : Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models »
Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn -
2022 : Giving Complex Feedback in Online Student Learning with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : Policy Architectures for Compositional Generalization in Control »
Allan Zhou · Vikash Kumar · Chelsea Finn · Aravind Rajeswaran -
2022 : Diversify and Disambiguate: Learning from Underspecified Data »
Yoonho Lee · Huaxiu Yao · Chelsea Finn -
2022 : Robust Calibration with Multi-domain Temperature Scaling »
Yaodong Yu · Stephen Bates · Yi Ma · Michael Jordan -
2022 : Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Huaxiu Yao · Caroline Choi · Yoonho Lee · Pang Wei Koh · Chelsea Finn -
2022 : Effective Offline RL Needs Going Beyond Pessimism: Representations and Distributional Shift »
Xinyang Geng · Kevin Li · Abhishek Gupta · Aviral Kumar · Sergey Levine -
2022 : DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning »
Quan Vuong · Aviral Kumar · Sergey Levine · Yevgen Chebotar -
2022 : Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning »
Annie Xie · Fahim Tajwar · Archit Sharma · Chelsea Finn -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2022 : You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 : Diversify and Disambiguate: Learning from Underspecified Data »
Yoonho Lee · Huaxiu Yao · Chelsea Finn -
2022 : Multimodal Masked Autoencoders Learn Transferable Representations »
Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel -
2022 : Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models »
Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn -
2023 : Deep Neural Networks Extrapolate Cautiously (Most of the Time) »
Katie Kang · Amrith Setlur · Claire Tomlin · Sergey Levine -
2023 : In-Context Decision-Making from Supervised Pretraining »
Jonathan Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 : Offline Goal-Conditioned RL with Latent States as Actions »
Seohong Park · Dibya Ghosh · Benjamin Eysenbach · Sergey Levine -
2023 : Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware »
Tony Zhao · Vikash Kumar · Sergey Levine · Chelsea Finn -
2023 : SCAFF-PD: Communication Efficient Fair and Robust Federated Learning »
Yaodong Yu · Sai Praneeth Karimireddy · Yi Ma · Michael Jordan -
2023 : Training Diffusion Models with Reinforcement Learning »
Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine -
2023 : Training Diffusion Models with Reinforcement Learning »
Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine -
2023 : Video-Guided Skill Discovery »
Manan Tomar · Dibya Ghosh · Vivek Myers · Anca Dragan · Matthew Taylor · Philip Bachman · Sergey Levine -
2023 : Direct Preference Optimization: Your Language Model is Secretly a Reward Model »
Rafael Rafailov · Archit Sharma · Eric Mitchell · Stefano Ermon · Christopher Manning · Chelsea Finn -
2023 : Training Diffusion Models with Reinforcement Learning »
Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine -
2023 : Keynote I: Detecting and Adapting to Distribution Shift »
Chelsea Finn -
2023 Poster: Jump-Start Reinforcement Learning »
Ikechukwu Uchendu · Ted Xiao · Yao Lu · Banghua Zhu · Mengyuan Yan · Joséphine Simon · Matthew Bennice · Chuyuan Fu · Cong Ma · Jiantao Jiao · Sergey Levine · Karol Hausman -
2023 Poster: A Connection between One-Step RL and Critic Regularization in Reinforcement Learning »
Benjamin Eysenbach · Matthieu Geist · Sergey Levine · Ruslan Salakhutdinov -
2023 Poster: Adversarial Policies Beat Superhuman Go AIs »
Tony Wang · Adam Gleave · Tom Tseng · Kellin Pelrine · Nora Belrose · Joseph Miller · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Sergey Levine · Stuart Russell -
2023 Poster: Predictable MDP Abstraction for Unsupervised Model-Based RL »
Seohong Park · Sergey Levine -
2023 Oral: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature »
Eric Mitchell · Yoonho Lee · Alexander Khazatsky · Christopher Manning · Chelsea Finn -
2023 Oral: Adversarial Policies Beat Superhuman Go AIs »
Tony Wang · Adam Gleave · Tom Tseng · Kellin Pelrine · Nora Belrose · Joseph Miller · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Sergey Levine · Stuart Russell -
2023 Poster: Reinforcement Learning from Passive Data via Latent Intentions »
Dibya Ghosh · Chethan Bhateja · Sergey Levine -
2023 Poster: Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning »
Evan Liu · Sahaana Suri · Tong Mu · Allan Zhou · Chelsea Finn -
2023 Poster: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature »
Eric Mitchell · Yoonho Lee · Alexander Khazatsky · Christopher Manning · Chelsea Finn -
2023 Poster: Understanding the Complexity Gains of Single-Task RL with a Curriculum »
Qiyang Li · Yuexiang Zhai · Yi Ma · Sergey Levine -
2023 Oral: Reinforcement Learning from Passive Data via Latent Intentions »
Dibya Ghosh · Chethan Bhateja · Sergey Levine -
2023 Poster: PaLM-E: An Embodied Multimodal Language Model »
Danny Driess · Fei Xia · Mehdi S. M. Sajjadi · Corey Lynch · Aakanksha Chowdhery · Brian Ichter · Ayzaan Wahid · Jonathan Tompson · Quan Vuong · Tianhe (Kevin) Yu · Wenlong Huang · Yevgen Chebotar · Pierre Sermanet · Daniel Duckworth · Sergey Levine · Vincent Vanhoucke · Karol Hausman · Marc Toussaint · Klaus Greff · Andy Zeng · Igor Mordatch · Pete Florence -
2023 Poster: Efficient Online Reinforcement Learning with Offline Data »
Philip Ball · Laura Smith · Ilya Kostrikov · Sergey Levine -
2022 : Giving Complex Feedback in Online Student Learning with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : Multimodal Masked Autoencoders Learn Transferable Representations »
Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel -
2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn -
2022 : Panel discussion »
Steffen Schneider · Aleksander Madry · Alexei Efros · Chelsea Finn · Soheil Feizi -
2022 : Q/A: Chelsea Finn »
Chelsea Finn -
2022 : Invited Speaker: Chelsea Finn »
Chelsea Finn -
2022 : Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Huaxiu Yao · Caroline Choi · Yoonho Lee · Pang Wei Koh · Chelsea Finn -
2022 : Invited Talk 3: Chelsea Finn »
Chelsea Finn -
2022 Poster: Robust Policy Learning over Multiple Uncertainty Sets »
Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang -
2022 Poster: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Poster: Memory-Based Model Editing at Scale »
Eric Mitchell · Charles Lin · Antoine Bosselut · Christopher Manning · Chelsea Finn -
2022 Spotlight: Robust Policy Learning over Multiple Uncertainty Sets »
Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang -
2022 Spotlight: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Spotlight: Memory-Based Model Editing at Scale »
Eric Mitchell · Charles Lin · Antoine Bosselut · Christopher Manning · Chelsea Finn -
2022 Poster: Predicting Out-of-Distribution Error with the Projection Norm »
Yaodong Yu · Zitong Yang · Alexander Wei · Yi Ma · Jacob Steinhardt -
2022 Poster: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Spotlight: Predicting Out-of-Distribution Error with the Projection Norm »
Yaodong Yu · Zitong Yang · Alexander Wei · Yi Ma · Jacob Steinhardt -
2022 Spotlight: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Poster: A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning »
Archit Sharma · Rehaan Ahmad · Chelsea Finn -
2022 Poster: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Oral: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Spotlight: A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning »
Archit Sharma · Rehaan Ahmad · Chelsea Finn -
2021 : Live Panel Discussion »
Thomas Dietterich · Chelsea Finn · Kamalika Chaudhuri · Yarin Gal · Uri Shalit -
2021 Poster: Offline Meta-Reinforcement Learning with Advantage Weighting »
Eric Mitchell · Rafael Rafailov · Xue Bin Peng · Sergey Levine · Chelsea Finn -
2021 Poster: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Spotlight: Offline Meta-Reinforcement Learning with Advantage Weighting »
Eric Mitchell · Rafael Rafailov · Xue Bin Peng · Sergey Levine · Chelsea Finn -
2021 Oral: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Poster: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Spotlight: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Poster: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Oral: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Poster: Deep Reinforcement Learning amidst Continual Structured Non-Stationarity »
Annie Xie · James Harrison · Chelsea Finn -
2021 Spotlight: Deep Reinforcement Learning amidst Continual Structured Non-Stationarity »
Annie Xie · James Harrison · Chelsea Finn -
2020 : Invited Talk 11: Prof. Chelsea Finn from Stanford University »
Chelsea Finn -
2020 Poster: Goal-Aware Prediction: Learning to Model What Matters »
Suraj Nair · Silvio Savarese · Chelsea Finn -
2020 Poster: On the Expressivity of Neural Networks for Deep Reinforcement Learning »
Kefan Dong · Yuping Luo · Tianhe (Kevin) Yu · Chelsea Finn · Tengyu Ma -
2020 Poster: Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings »
Jesse Zhang · Brian Cheung · Chelsea Finn · Sergey Levine · Dinesh Jayaraman -
2020 Poster: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks »
Zitong Yang · Yaodong Yu · Chong You · Jacob Steinhardt · Yi Ma -
2020 Poster: Deep Isometric Learning for Visual Recognition »
Haozhi Qi · Chong You · Xiaolong Wang · Yi Ma · Jitendra Malik