For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, FMoW-WILDS satellite imagery classification, and wildlife classification in iWildCam-WILDS. The correlation holds across model architectures, hyperparameters, training set size, and training duration, and is more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
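As a rough illustration of the kind of analysis the abstract describes, the sketch below fits a linear trend between paired in-distribution (ID) and out-of-distribution (OOD) accuracies across a collection of models. It is not the authors' code: the accuracy values are hypothetical, and the choice of a probit transform as the linearizing axis scale is an assumption about the methodology.

```python
# Minimal sketch of an "accuracy on the line" style fit (illustrative only).
import numpy as np
from scipy import stats

# Hypothetical (ID accuracy, OOD accuracy) pairs, one per trained model,
# e.g., CIFAR-10 test accuracy vs. accuracy on a shifted test set.
id_acc = np.array([0.82, 0.88, 0.91, 0.94, 0.96, 0.975])
ood_acc = np.array([0.63, 0.72, 0.77, 0.83, 0.87, 0.90])

# Fit a linear trend on probit-transformed accuracies (inverse standard
# normal CDF), an axis scaling often used to linearize accuracy-vs-accuracy plots.
probit = stats.norm.ppf
slope, intercept, r, _, _ = stats.linregress(probit(id_acc), probit(ood_acc))
print(f"slope={slope:.2f}  intercept={intercept:.2f}  R^2={r**2:.3f}")
```

An R^2 near 1 on this scale corresponds to the strong correlation the paper reports; the weaker cases mentioned in the abstract (e.g., some CIFAR-10-C shifts and Camelyon17-WILDS) would show up as a visibly lower R^2 or a poor linear fit.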
Author Information
John Miller (University of California, Berkeley)
Rohan Taori (Stanford University)
Aditi Raghunathan (Stanford)
Shiori Sagawa (Stanford University)
Pang Wei Koh (Stanford University)
Vaishaal Shankar (UC Berkeley)
Percy Liang (Stanford University)
Yair Carmon (Tel Aviv University)
Ludwig Schmidt (Toyota Research Institute)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization »
  Thu. Jul 22nd 04:00 -- 06:00 AM Room
More from the Same Authors
- 2022 : Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
  Huaxiu Yao · Caroline Choi · Yoonho Lee · Pang Wei Koh · Chelsea Finn
- 2022 : LinkBERT: Language Model Pretraining with Document Link Knowledge »
  Michihiro Yasunaga · Jure Leskovec · Percy Liang
- 2022 : How well do contrastively trained models transfer? »
  M. Moein Shariatnia · Rahim Entezari · Mitchell Wortsman · Olga Saukh · Ludwig Schmidt
- 2022 : On the Connection between Pre-training Data Diversity and Robustness »
  Vivek Ramanujan · Thao Nguyen · Ludwig Schmidt · Ali Farhadi
- 2023 : Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift »
  Saurabh Garg · Amrith Setlur · Zachary Lipton · Sivaraman Balakrishnan · Virginia Smith · Aditi Raghunathan
- 2023 : Why is SAM Robust to Label Noise? »
  Christina Baek · Zico Kolter · Aditi Raghunathan
- 2023 : Sharpness-Aware Minimization Enhances Feature Diversity »
  Jacob Mitchell Springer · Vaishnavh Nagarajan · Aditi Raghunathan
- 2023 : DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining »
  Sang Michael Xie · Hieu Pham · Xuanyi Dong · Nan Du · Hanxiao Liu · Yifeng Lu · Percy Liang · Quoc Le · Tengyu Ma · Adams Wei Yu
- 2023 : TMARS: Improving Visual Representations by Circumventing Text Feature Learning »
  Pratyush Maini · Sachin Goyal · Zachary Lipton · Zico Kolter · Aditi Raghunathan
- 2023 : Improving multimodal datasets with image captioning »
  Thao Nguyen · · Gabriel Ilharco · Sewoong Oh · Ludwig Schmidt
- 2023 : Retrieval-Augmented Multimodal Language Modeling »
  Michihiro Yasunaga · Armen Aghajanyan · Weijia Shi · Rich James · Jure Leskovec · Percy Liang · Mike Lewis · Luke Zettlemoyer · Wen-tau Yih
- 2023 : Lexinvariant Language Models »
  Qian Huang · Eric Zelikman · Sarah Chen · Yuhuai Wu · Greg Valiant · Percy Liang
- 2023 : PRODIGY: Enabling In-context Learning Over Graphs »
  Qian Huang · Hongyu Ren · Peng Chen · Gregor Kržmanc · Daniel Zeng · Percy Liang · Jure Leskovec
- 2023 : Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training »
  Hong Liu · Zhiyuan Li · David Hall · Percy Liang · Tengyu Ma
- 2023 Workshop: ES-FoMo: Efficient Systems for Foundation Models »
  Julien Launay · Daniel Y Fu · Tri Dao · Daniel Hesslow · Beidi Chen · Azalia Mirhoseini · Percy Liang
- 2023 : Aditi Raghunathan »
  Aditi Raghunathan
- 2023 Poster: Data Feedback Loops: Model-driven Amplification of Dataset Biases »
  Rohan Taori · Tatsunori Hashimoto
- 2023 Poster: DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule »
  Maor Ivgi · Oliver Hinder · Yair Carmon
- 2023 Poster: Contextual Reliability: When Different Features Matter in Different Contexts »
  Gaurav Ghosal · Amrith Setlur · Daniel S Brown · Anca Dragan · Aditi Raghunathan
- 2023 Poster: Robustness in Multimodal Learning under Train-Test Modality Mismatch »
  Brandon McKinzie · Vaishaal Shankar · Joseph Cheng · Yinfei Yang · Jonathon Shlens · Alexander Toshev
- 2023 Poster: Whose Opinions Do Language Models Reflect? »
  Shibani Santurkar · Esin Durmus · Faisal Ladhak · Cinoo Lee · Percy Liang · Tatsunori Hashimoto
- 2023 Poster: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
  Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang
- 2023 Oral: Data Feedback Loops: Model-driven Amplification of Dataset Biases »
  Rohan Taori · Tatsunori Hashimoto
- 2023 Oral: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
  Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang
- 2023 Oral: Whose Opinions Do Language Models Reflect? »
  Shibani Santurkar · Esin Durmus · Faisal Ladhak · Cinoo Lee · Percy Liang · Tatsunori Hashimoto
- 2023 Oral: Evaluating Self-Supervised Learning via Risk Decomposition »
  Yann Dubois · Tatsunori Hashimoto · Percy Liang
- 2023 Poster: Evaluating Self-Supervised Learning via Risk Decomposition »
  Yann Dubois · Tatsunori Hashimoto · Percy Liang
- 2023 Poster: Automatically Auditing Large Language Models via Discrete Optimization »
  Erik Jones · Anca Dragan · Aditi Raghunathan · Jacob Steinhardt
- 2023 Poster: CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks »
  Jue Wang · Yucheng Lu · Binhang Yuan · Beidi Chen · Percy Liang · Chris De Sa · Christopher Re · Ce Zhang
- 2023 Poster: Out-of-Domain Robustness via Targeted Augmentations »
  Irena Gao · Shiori Sagawa · Pang Wei Koh · Tatsunori Hashimoto · Percy Liang
- 2023 Poster: Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond »
  Itai Kreisler · Mor Shpigel Nacson · Daniel Soudry · Yair Carmon
- 2023 Poster: One-sided Matrix Completion from Two Observations Per Row »
  Steven Cao · Percy Liang · Greg Valiant
- 2023 Poster: Retrieval-Augmented Multimodal Language Modeling »
  Michihiro Yasunaga · Armen Aghajanyan · Weijia Shi · Richard James · Jure Leskovec · Percy Liang · Mike Lewis · Luke Zettlemoyer · Scott Yih
- 2022 : Discussion Panel »
  Percy Liang · Léon Bottou · Jayashree Kalpathy-Cramer · Alex Smola
- 2022 : Extending the WILDS Benchmark for Unsupervised Adaptation »
  Shiori Sagawa
- 2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
  Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn
- 2022 Poster: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time »
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Becca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Poster: Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation »
  Kendrick Shen · Robbie Jones · Ananya Kumar · Sang Michael Xie · Jeff Z. HaoChen · Tengyu Ma · Percy Liang
- 2022 Spotlight: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time »
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Becca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Oral: Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation »
  Kendrick Shen · Robbie Jones · Ananya Kumar · Sang Michael Xie · Jeff Z. HaoChen · Tengyu Ma · Percy Liang
- 2022 Poster: Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) »
  Alex Fang · Gabriel Ilharco · Mitchell Wortsman · Yuhao Wan · Vaishaal Shankar · Achal Dave · Ludwig Schmidt
- 2022 Poster: RECAPP: Crafting a More Efficient Catalyst for Convex Optimization »
  Yair Carmon · Arun Jambulapati · Yujia Jin · Aaron Sidford
- 2022 Spotlight: Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) »
  Alex Fang · Gabriel Ilharco · Mitchell Wortsman · Yuhao Wan · Vaishaal Shankar · Achal Dave · Ludwig Schmidt
- 2022 Spotlight: RECAPP: Crafting a More Efficient Catalyst for Convex Optimization »
  Yair Carmon · Arun Jambulapati · Yujia Jin · Aaron Sidford
- 2021 : Improving Robustness to Distribution Shifts: Methods and Benchmarks »
  Shiori Sagawa
- 2021 Poster: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
  Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang
- 2021 Poster: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization »
  Sang Michael Xie · Tengyu Ma · Percy Liang
- 2021 Oral: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
  Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang
- 2021 Oral: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization »
  Sang Michael Xie · Tengyu Ma · Percy Liang
- 2021 Poster: Outside the Echo Chamber: Optimizing the Performative Risk »
  John Miller · Juan Perdomo · Tijana Zrnic
- 2021 Poster: Break-It-Fix-It: Unsupervised Learning for Program Repair »
  Michihiro Yasunaga · Percy Liang
- 2021 Oral: Break-It-Fix-It: Unsupervised Learning for Program Repair »
  Michihiro Yasunaga · Percy Liang
- 2021 Spotlight: Outside the Echo Chamber: Optimizing the Performative Risk »
  John Miller · Juan Perdomo · Tijana Zrnic
- 2021 Poster: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
  Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn
- 2021 Spotlight: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
  Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn
- 2021 Poster: Catformer: Designing Stable Transformers via Sensitivity Analysis »
  Jared Quincy Davis · Albert Gu · Krzysztof Choromanski · Tri Dao · Christopher Re · Chelsea Finn · Percy Liang
- 2021 Poster: Just Train Twice: Improving Group Robustness without Training Group Information »
  Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn
- 2021 Spotlight: Catformer: Designing Stable Transformers via Sensitivity Analysis »
  Jared Quincy Davis · Albert Gu · Krzysztof Choromanski · Tri Dao · Christopher Re · Chelsea Finn · Percy Liang
- 2021 Oral: Just Train Twice: Improving Group Robustness without Training Group Information »
  Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn
- 2020 : Keynote #3 Percy Liang »
  Percy Liang
- 2020 Poster: Neural Kernels Without Tangents »
  Vaishaal Shankar · Alex Fang · Wenshuo Guo · Sara Fridovich-Keil · Jonathan Ragan-Kelley · Ludwig Schmidt · Benjamin Recht
- 2020 Poster: Evaluating Machine Accuracy on ImageNet »
  Vaishaal Shankar · Rebecca Roelofs · Horia Mania · Alex Fang · Benjamin Recht · Ludwig Schmidt
- 2020 Poster: Concept Bottleneck Models »
  Pang Wei Koh · Thao Nguyen · Yew Siang Tang · Stephen Mussmann · Emma Pierson · Been Kim · Percy Liang
- 2020 Poster: Strategic Classification is Causal Modeling in Disguise »
  John Miller · Smitha Milli · Moritz Hardt
- 2020 Poster: Graph-based, Self-Supervised Program Repair from Diagnostic Feedback »
  Michihiro Yasunaga · Percy Liang
- 2020 Poster: Test-Time Training with Self-Supervision for Generalization under Distribution Shifts »
  Yu Sun · Xiaolong Wang · Zhuang Liu · John Miller · Alexei Efros · Moritz Hardt
- 2020 Poster: Understanding Self-Training for Gradual Domain Adaptation »
  Ananya Kumar · Tengyu Ma · Percy Liang
- 2020 Poster: Understanding and Mitigating the Tradeoff between Robustness and Accuracy »
  Aditi Raghunathan · Sang Michael Xie · Fanny Yang · John Duchi · Percy Liang
- 2020 Poster: An Investigation of Why Overparameterization Exacerbates Spurious Correlations »
  Shiori Sagawa · Aditi Raghunathan · Pang Wei Koh · Percy Liang
- 2020 Poster: Robustness to Spurious Correlations via Human Annotations »
  Megha Srivastava · Tatsunori Hashimoto · Percy Liang
- 2020 Poster: Feature Noise Induces Loss Discrepancy Across Groups »
  Fereshte Khani · Percy Liang
- 2020 Poster: DROCC: Deep Robust One-Class Classification »
  Sachin Goyal · Aditi Raghunathan · Moksh Jain · Harsha Vardhan Simhadri · Prateek Jain
- 2020 Poster: The Effect of Natural Distribution Shift on Question Answering Models »
  John Miller · Karl Krauth · Benjamin Recht · Ludwig Schmidt
- 2019 Workshop: Workshop on the Security and Privacy of Machine Learning »
  Nicolas Papernot · Florian Tramer · Bo Li · Dan Boneh · David Evans · Somesh Jha · Percy Liang · Patrick McDaniel · Jacob Steinhardt · Dawn Song
- 2019 Poster: Do ImageNet Classifiers Generalize to ImageNet? »
  Benjamin Recht · Rebecca Roelofs · Ludwig Schmidt · Vaishaal Shankar
- 2019 Oral: Do ImageNet Classifiers Generalize to ImageNet? »
  Benjamin Recht · Rebecca Roelofs · Ludwig Schmidt · Vaishaal Shankar
- 2018 Poster: On the Relationship between Data Efficiency and Error for Uncertainty Sampling »
  Stephen Mussmann · Percy Liang
- 2018 Poster: Fairness Without Demographics in Repeated Loss Minimization »
  Tatsunori Hashimoto · Megha Srivastava · Hongseok Namkoong · Percy Liang
- 2018 Oral: Fairness Without Demographics in Repeated Loss Minimization »
  Tatsunori Hashimoto · Megha Srivastava · Hongseok Namkoong · Percy Liang
- 2018 Oral: On the Relationship between Data Efficiency and Error for Uncertainty Sampling »
  Stephen Mussmann · Percy Liang
- 2017 Poster: World of Bits: An Open-Domain Platform for Web-Based Agents »
  Tim Shi · Andrej Karpathy · Jim Fan · Jonathan Hernandez · Percy Liang
- 2017 Poster: Estimating the unseen from multiple populations »
  Aditi Raghunathan · Greg Valiant · James Zou
- 2017 Talk: World of Bits: An Open-Domain Platform for Web-Based Agents »
  Tim Shi · Andrej Karpathy · Jim Fan · Jonathan Hernandez · Percy Liang
- 2017 Poster: Developing Bug-Free Machine Learning Systems With Formal Mathematics »
  Daniel Selsam · Percy Liang · David L Dill
- 2017 Talk: Developing Bug-Free Machine Learning Systems With Formal Mathematics »
  Daniel Selsam · Percy Liang · David L Dill
- 2017 Talk: Estimating the unseen from multiple populations »
  Aditi Raghunathan · Greg Valiant · James Zou
- 2017 Poster: Convexified Convolutional Neural Networks »
  Yuchen Zhang · Percy Liang · Martin Wainwright
- 2017 Poster: Understanding Black-box Predictions via Influence Functions »
  Pang Wei Koh · Percy Liang
- 2017 Talk: Convexified Convolutional Neural Networks »
  Yuchen Zhang · Percy Liang · Martin Wainwright
- 2017 Talk: Understanding Black-box Predictions via Influence Functions »
  Pang Wei Koh · Percy Liang