Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from the one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
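To make the two penalties concrete, here is a minimal sketch in Python of how they could enter an actor-critic objective. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the functional forms, so the squared action distance, the squared hinge on Q-values, and the names `policy_penalty`, `value_penalty`, `penalized_losses`, `lam`, and `eta` are all hypothetical.

```python
import numpy as np

def policy_penalty(policy_actions, data_actions):
    # Assumed form: mean squared distance between the actions the candidate
    # policy proposes and the actions logged in the offline dataset.
    return float(np.mean(np.sum((policy_actions - data_actions) ** 2, axis=1)))

def value_penalty(q_policy, q_data):
    # Assumed form: squared hinge that is nonzero only when the Q-estimate for
    # a policy action exceeds the Q-estimate for the logged action.
    return float(np.mean(np.maximum(q_policy - q_data, 0.0) ** 2))

def penalized_losses(td_error, q_policy, q_data,
                     policy_actions, data_actions, lam=1.0, eta=1.0):
    # Critic: squared TD error plus the value penalty (weight eta, hypothetical).
    critic_loss = float(np.mean(td_error ** 2)) + eta * value_penalty(q_policy, q_data)
    # Actor: maximize Q while staying close to the data (weight lam, hypothetical).
    actor_loss = -float(np.mean(q_policy)) + lam * policy_penalty(policy_actions, data_actions)
    return critic_loss, actor_loss

# Toy batch of two transitions with two-dimensional actions.
critic_loss, actor_loss = penalized_losses(
    td_error=np.array([1.0, 1.0]),
    q_policy=np.array([2.0, 0.0]),
    q_data=np.array([1.0, 1.0]),
    policy_actions=np.array([[1.0, 0.0], [0.0, 1.0]]),
    data_actions=np.zeros((2, 2)),
)
```

In this sketch, setting `lam` and `eta` to zero recovers an unpenalized actor-critic update, while larger values keep the policy closer to the logged actions and hold down value estimates for actions the data does not support.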
Author Information
Rasool Fakoor (Amazon)
Jonas Mueller (Amazon Web Services)
Kavosh Asadi (Brown University)
Pratik Chaudhari (University of Pennsylvania)
Alex Smola (Amazon)
More from the Same Authors
-
2021 : Multimodal AutoML on Structured Tables with Text Fields »
Xingjian Shi · Jonas Mueller · Nick Erickson · Mu Li · Alex Smola -
2021 : Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback »
Ishaan Shah · David Halpern · Michael L. Littman · Kavosh Asadi -
2022 : Adaptive Interest for Emphatic Reinforcement Learning »
Martin Klissarov · Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Taesup Kim · Alex Smola -
2022 : Back to the Basics: Revisiting Out-of-Distribution Detection Baselines »
Johnson Kuan · Jonas Mueller -
2023 : The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold »
Jialin Mao · Han Kheng Teoh · Itay Griniasty · Rahul Ramesh · Rubing Yang · Mark Transtrum · James Sethna · Pratik Chaudhari -
2023 : Budgeting Counterfactual for Offline RL »
Yao Liu · Pratik Chaudhari · Rasool Fakoor -
2023 : How to Cope with Gradual Data Drift? »
Rasool Fakoor · Jonas Mueller · Zachary Lipton · Pratik Chaudhari · Alex Smola -
2023 : Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors »
Jesse Cummings · Jonas Mueller · Elías Snorrason -
2023 : Estimating label quality and errors in semantic segmentation data via any model »
Vedang Lad · Jonas Mueller -
2023 : Detecting Errors in Numerical Data via any Regression Model »
Hang Zhou · Jonas Mueller · Mayank Kumar · Jane-Ling Wang · Jing Lei -
2023 : ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data »
Ulyana Tkachenko · Aditya Thyagarajan · Jonas Mueller -
2023 Workshop: New Frontiers in Learning, Control, and Dynamical Systems »
Valentin De Bortoli · Charlotte Bunne · Guan-Horng Liu · Tianrong Chen · Maxim Raginsky · Pratik Chaudhari · Melanie Zeilinger · Animashree Anandkumar -
2023 Poster: The Value of Out-of-Distribution Data »
Ashwin De Silva · Rahul Ramesh · Carey Priebe · Pratik Chaudhari · Joshua Vogelstein -
2023 Poster: RLSbench: Domain Adaptation Under Relaxed Label Shift »
Saurabh Garg · Nick Erickson · James Sharpnack · Alex Smola · Sivaraman Balakrishnan · Zachary Lipton -
2023 Poster: A Picture of the Space of Typical Learnable Tasks »
Rahul Ramesh · Jialin Mao · Itay Griniasty · Rubing Yang · Han Kheng Teoh · Mark Transtrum · James Sethna · Pratik Chaudhari -
2023 Poster: Flexible Model Aggregation for Quantile Regression »
Rasool Fakoor · Taesup Kim · Jonas Mueller · Alexander Smola · Ryan Tibshirani -
2022 : Discussion Panel »
Percy Liang · Léon Bottou · Jayashree Kalpathy-Cramer · Alex Smola -
2022 : Model-Agnostic Label Quality Scoring to Detect Real-World Label Errors »
Jonas Mueller -
2022 Poster: Does the Data Induce Capacity Control in Deep Learning? »
Rubing Yang · Jialin Mao · Pratik Chaudhari -
2022 Spotlight: Does the Data Induce Capacity Control in Deep Learning? »
Rubing Yang · Jialin Mao · Pratik Chaudhari -
2022 Poster: Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition »
Haotao Wang · Aston Zhang · Yi Zhu · Shuai Zheng · Mu Li · Alex Smola · Zhangyang “Atlas” Wang -
2022 Poster: Deep Reference Priors: What is the best way to pretrain a model? »
Yansong Gao · Rahul Ramesh · Pratik Chaudhari -
2022 Oral: Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition »
Haotao Wang · Aston Zhang · Yi Zhu · Shuai Zheng · Mu Li · Alex Smola · Zhangyang “Atlas” Wang -
2022 Spotlight: Deep Reference Priors: What is the best way to pretrain a model? »
Yansong Gao · Rahul Ramesh · Pratik Chaudhari -
2021 : Q&A Contributed Talk »
Jonas Mueller -
2021 : Contributed Talk: Multimodal AutoML on Structured Tables with Text Fields »
Jonas Mueller -
2021 Poster: Deep Learning for Functional Data Analysis with Adaptive Basis Layers »
Junwen Yao · Jonas Mueller · Jane-Ling Wang -
2021 Poster: An Information-Geometric Distance on the Space of Tasks »
Yansong Gao · Pratik Chaudhari -
2021 Spotlight: Deep Learning for Functional Data Analysis with Adaptive Basis Layers »
Junwen Yao · Jonas Mueller · Jane-Ling Wang -
2021 Spotlight: An Information-Geometric Distance on the Space of Tasks »
Yansong Gao · Pratik Chaudhari -
2020 : Panel Discussion »
Neil Lawrence · Mihaela van der Schaar · Alex Smola · Valerio Perrone · Jack Parker-Holder · Zhengying Liu -
2020 : "AutoGluon and Distillation" by Alex Smola »
Alex Smola -
2020 : 1.2 AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data »
Jonas Mueller -
2020 Poster: Educating Text Autoencoders: Latent Representation Guidance via Denoising »
Tianxiao Shen · Jonas Mueller · Regina Barzilay · Tommi Jaakkola -
2020 Poster: A Free-Energy Principle for Representation Learning »
Yansong Gao · Pratik Chaudhari -
2019 : posters »
Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier -
2019 Poster: Deep Factors for Forecasting »
Yuyang Wang · Alex Smola · Danielle Robinson · Jan Gasthaus · Dean Foster · Tim Januschowski -
2019 Oral: Deep Factors for Forecasting »
Yuyang Wang · Alex Smola · Danielle Robinson · Jan Gasthaus · Dean Foster · Tim Januschowski -
2019 Tutorial: A Tutorial on Attention in Deep Learning »
Alex Smola · Aston Zhang -
2018 Poster: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2018 Oral: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2017 Poster: Canopy --- Fast Sampling with Cover Trees »
Manzil Zaheer · Satwik Kottur · Amr Ahmed · Jose Moura · Alex Smola -
2017 Talk: Canopy --- Fast Sampling with Cover Trees »
Manzil Zaheer · Satwik Kottur · Amr Ahmed · Jose Moura · Alex Smola -
2017 Poster: Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data »
Manzil Zaheer · Amr Ahmed · Alex Smola -
2017 Talk: Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data »
Manzil Zaheer · Amr Ahmed · Alex Smola -
2017 Tutorial: Distributed Deep Learning with MxNet Gluon »
Alex Smola · Aran Khanna