Reinforcement learning algorithms are typically designed to learn, from scratch, a performant policy that can complete a task repeatedly and autonomously. However, many real-world situations operate under a different set of assumptions: the goal may not be a policy that can perform the task repeatedly, but simply to perform a new task successfully once, ideally as quickly as possible, while leveraging some prior knowledge or experience. For example, imagine a robot exploring another planet, where it cannot get help or supervision from humans. If it needs to navigate to a crater that it has never seen before in search of water, it does not need to acquire a policy for reaching craters reliably; it only needs to reach this particular crater once. It must do so without the benefit of episodic resets and on new, unknown terrain, but it can leverage prior experience acquired on Earth. We formalize this problem setting, which we call single-life reinforcement learning (SLRL): given some prior data, an agent must complete a task once, in a single trial without interventions, while contending with some form of novelty. In this setting, we find that algorithms designed for standard episodic reinforcement learning can struggle, as they have trouble recovering from novel states, especially when informative rewards are not provided. Motivated by this observation, we propose an algorithm, $Q$-weighted adversarial learning (QWALE), that addresses this dearth of supervision with a distribution-matching strategy that uses the agent's prior experience as guidance in novel situations. Our experiments on several single-life continuous control problems indicate that methods based on our distribution-matching formulation are 20-60% more successful because they can more quickly recover from novel, out-of-distribution states.
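The distribution-matching idea behind QWALE can be illustrated with a minimal sketch: train a discriminator to tell states from the prior data (weighted by their $Q$-values, so higher-value prior states count more) apart from states the agent visits online, and use the discriminator's output as a shaped reward that pulls the agent back toward the prior state distribution when it lands in novel states. Everything below — the logistic discriminator, the synthetic 2D "states", and the hypothetical $Q$-values — is an illustrative assumption, not the paper's implementation (which uses neural networks trained inside an RL loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "prior" states cluster near where the task was
# solved before; "online" states are the agent's out-of-distribution visits.
prior_states = rng.normal(loc=1.0, scale=0.3, size=(500, 2))
online_states = rng.normal(loc=-1.0, scale=0.3, size=(500, 2))

# Hypothetical Q-values for the prior states: higher for states closer to a
# goal, so the discriminator emphasizes high-value parts of the prior data.
goal = np.array([1.5, 1.5])
q_values = -np.linalg.norm(prior_states - goal, axis=1)

# Q-derived weights for the positive (prior-data) class: softmax over Q.
w = np.exp(q_values - q_values.max())
w /= w.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train a logistic discriminator D(s) ~ P(s came from the prior data),
# weighting each prior state's gradient contribution by its Q-weight.
theta = np.zeros(3)  # [w1, w2, bias]
X_pos = np.hstack([prior_states, np.ones((len(prior_states), 1))])
X_neg = np.hstack([online_states, np.ones((len(online_states), 1))])
lr = 0.5
for _ in range(200):
    grad = (X_pos * ((sigmoid(X_pos @ theta) - 1.0) * w)[:, None]).sum(axis=0)
    grad += (X_neg * sigmoid(X_neg @ theta)[:, None]).mean(axis=0)
    theta -= lr * grad

def shaped_reward(s):
    """Distribution-matching reward: log D(s) - log(1 - D(s))."""
    d = np.clip(sigmoid(np.append(s, 1.0) @ theta), 1e-6, 1.0 - 1e-6)
    return np.log(d) - np.log(1.0 - d)

# A state near the prior data earns a higher shaped reward than a novel,
# far-away one, giving the agent a gradient of supervision back toward
# familiar, high-value states even when the task reward is uninformative.
print(shaped_reward(np.array([1.0, 1.0])) > shaped_reward(np.array([-1.0, -1.0])))  # True
```

In the actual single-life setting this reward would be recomputed as the online buffer grows, so the discriminator keeps contrasting the agent's current state distribution against the $Q$-weighted prior one.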
Author Information
Annie Chen (Stanford University)
Archit Sharma (Stanford University)
Sergey Levine (UC Berkeley)
Chelsea Finn (Stanford University)
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement learning methods for learning reward functions underlying behavior, and meta-learning algorithms that enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science from MIT and her PhD in Computer Science from UC Berkeley. Her research has been recognized by the ACM Doctoral Dissertation Award, the Microsoft Research Faculty Fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 Under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, creating a mentoring program for underrepresented undergraduates across four universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
More from the Same Authors
-
2021 : Multi-Task Offline Reinforcement Learning with Conservative Data Sharing »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2021 : Visual Adversarial Imitation Learning using Variational Models »
Rafael Rafailov · Tianhe (Kevin) Yu · Aravind Rajeswaran · Chelsea Finn -
2021 : Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 : Explore and Control with Adversarial Surprise »
Arnaud Fickinger · Natasha Jaques · Samyak Parajuli · Michael Chang · Nicholas Rhinehart · Glen Berseth · Stuart Russell · Sergey Levine -
2021 : The Reflective Explorer: Online Meta-Exploration from Offline Data in Visual Tasks with Sparse Rewards »
Rafael Rafailov · Varun Kumar · Tianhe (Kevin) Yu · Avi Singh · Mariano Phielipp · Chelsea Finn -
2022 : Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models »
Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn -
2022 : You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 : Giving Complex Feedback in Online Student Learning with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : Policy Architectures for Compositional Generalization in Control »
Allan Zhou · Vikash Kumar · Chelsea Finn · Aravind Rajeswaran -
2022 : Diversify and Disambiguate: Learning from Underspecified Data »
Yoonho Lee · Huaxiu Yao · Chelsea Finn -
2022 : Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Huaxiu Yao · Caroline Choi · Yoonho Lee · Pang Wei Koh · Chelsea Finn -
2022 : Effective Offline RL Needs Going Beyond Pessimism: Representations and Distributional Shift »
Xinyang Geng · Kevin Li · Abhishek Gupta · Aviral Kumar · Sergey Levine -
2022 : DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning »
Quan Vuong · Aviral Kumar · Sergey Levine · Yevgen Chebotar -
2022 : Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning »
Annie Xie · Fahim Tajwar · Archit Sharma · Chelsea Finn -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2022 : Multimodal Masked Autoencoders Learn Transferable Representations »
Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel -
2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn -
2022 : Panel discussion »
Steffen Schneider · Aleksander Madry · Alexei Efros · Chelsea Finn · Soheil Feizi -
2022 : Q/A: Chelsea Finn »
Chelsea Finn -
2022 : Invited Speaker: Chelsea Finn »
Chelsea Finn -
2022 : Invited Talk 3: Chelsea Finn »
Chelsea Finn -
2022 Poster: Robust Policy Learning over Multiple Uncertainty Sets »
Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang -
2022 Poster: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Poster: Memory-Based Model Editing at Scale »
Eric Mitchell · Charles Lin · Antoine Bosselut · Christopher Manning · Chelsea Finn -
2022 Spotlight: Robust Policy Learning over Multiple Uncertainty Sets »
Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang -
2022 Spotlight: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Spotlight: Memory-Based Model Editing at Scale »
Eric Mitchell · Charles Lin · Antoine Bosselut · Christopher Manning · Chelsea Finn -
2022 Poster: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Spotlight: Improving Out-of-Distribution Robustness via Selective Augmentation »
Huaxiu Yao · Yu Wang · Sai Li · Linjun Zhang · Weixin Liang · James Zou · Chelsea Finn -
2022 Poster: A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning »
Archit Sharma · Rehaan Ahmad · Chelsea Finn -
2022 Poster: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Oral: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Spotlight: A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning »
Archit Sharma · Rehaan Ahmad · Chelsea Finn -
2021 : Live Panel Discussion »
Thomas Dietterich · Chelsea Finn · Kamalika Chaudhuri · Yarin Gal · Uri Shalit -
2021 Poster: Offline Meta-Reinforcement Learning with Advantage Weighting »
Eric Mitchell · Rafael Rafailov · Xue Bin Peng · Sergey Levine · Chelsea Finn -
2021 Poster: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Spotlight: Offline Meta-Reinforcement Learning with Advantage Weighting »
Eric Mitchell · Rafael Rafailov · Xue Bin Peng · Sergey Levine · Chelsea Finn -
2021 Oral: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Poster: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Spotlight: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Poster: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Poster: Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning »
Jongwook Choi · Archit Sharma · Honglak Lee · Sergey Levine · Shixiang Gu -
2021 Spotlight: Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning »
Jongwook Choi · Archit Sharma · Honglak Lee · Sergey Levine · Shixiang Gu -
2021 Oral: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Poster: Deep Reinforcement Learning amidst Continual Structured Non-Stationarity »
Annie Xie · James Harrison · Chelsea Finn -
2021 Spotlight: Deep Reinforcement Learning amidst Continual Structured Non-Stationarity »
Annie Xie · James Harrison · Chelsea Finn -
2020 : Invited Talk 11: Prof. Chelsea Finn from Stanford University »
Chelsea Finn -
2020 Poster: Goal-Aware Prediction: Learning to Model What Matters »
Suraj Nair · Silvio Savarese · Chelsea Finn -
2020 Poster: On the Expressivity of Neural Networks for Deep Reinforcement Learning »
Kefan Dong · Yuping Luo · Tianhe (Kevin) Yu · Chelsea Finn · Tengyu Ma -
2020 Poster: Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings »
Jesse Zhang · Brian Cheung · Chelsea Finn · Sergey Levine · Dinesh Jayaraman