Timezone: »
Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an \textit{exchange rate} between augmented and additional real data, we find that augmentations can provide nearly the same performance gains as additional data samples for in-domain generalization and even greater performance gains for out-of-distribution test sets. We also find that neural networks with hard-coded invariances underperform those with invariances learned via data augmentations. Our experiments suggest that these benefits to generalization arise from the additional stochasticity conferred by randomized augmentations, leading to flatter minima.
Author Information
Jonas Geiping (University of Maryland, College Park)
Gowthami Somepalli (University of Maryland, College Park)
Ravid Shwartz-Ziv (New York University)
Andrew Wilson (New York University)
Tom Goldstein (University of Maryland)
Micah Goldblum (New York University)
More from the Same Authors
-
2021 : Tabular Data: Deep Learning is Not All You Need »
Ravid Shwartz-Ziv · Amitai Armon -
2022 : Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning »
Yuxin Wen · Jonas Geiping · Liam Fowl · Hossein Souri · Rama Chellappa · Micah Goldblum · Tom Goldstein -
2022 : Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch »
Hossein Souri · Liam Fowl · Rama Chellappa · Micah Goldblum · Tom Goldstein -
2022 : Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations »
Polina Kirichenko · Polina Kirichenko · Pavel Izmailov · Andrew Wilson -
2022 : Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior »
Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson -
2022 : What Do We Maximize In Self-Supervised Learning? »
Ravid Shwartz-Ziv · Ravid Shwartz-Ziv · Randall Balestriero · Yann LeCun · Yann LeCun -
2023 : Understanding the Detrimental Class-level Effects of Data Augmentation »
Polina Kirichenko · Mark Ibrahim · Randall Balestriero · Diane Bouchacourt · Ramakrishna Vedantam · Hamed Firooz · Andrew Wilson -
2023 : Cramming: Training a Language Model on a single GPU in one day »
Jonas Geiping · Tom Goldstein -
2023 : Understanding Data Replication in Diffusion Models »
Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2023 : Protein Design with Guided Discrete Diffusion »
Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson -
2023 Oral: A Watermark for Large Language Models »
John Kirchenbauer · Jonas Geiping · Yuxin Wen · Jonathan Katz · Ian Miers · Tom Goldstein -
2023 Poster: Simple and Fast Group Robustness by Automatic Feature Reweighting »
Shikai Qiu · Andres Potapczynski · Pavel Izmailov · Andrew Wilson -
2023 Poster: User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems »
Marc Finzi · Anudhyan Boral · Andrew Wilson · Fei Sha · Leonardo Zepeda-Nunez -
2023 Poster: GOAT: A Global Transformer on Large-scale Graphs »
Kezhi Kong · Jiuhai Chen · John Kirchenbauer · Renkun Ni · C. Bayan Bruss · Tom Goldstein -
2023 Poster: A Watermark for Large Language Models »
John Kirchenbauer · Jonas Geiping · Yuxin Wen · Jonathan Katz · Ian Miers · Tom Goldstein -
2023 Poster: Function-Space Regularization in Neural Networks: A Probabilistic Perspective »
Tim G. J. Rudner · Sanyam Kapoor · Shikai Qiu · Andrew Wilson -
2023 Poster: Cramming: Training a Language Model on a single GPU in one day. »
Jonas Geiping · Tom Goldstein -
2022 : Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior »
Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson -
2022 Poster: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations »
Amin Ghiasi · Hamid Kazemi · Steven Reich · Chen Zhu · Micah Goldblum · Tom Goldstein -
2022 Poster: Bayesian Model Selection, the Marginal Likelihood, and Generalization »
Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson -
2022 Oral: Bayesian Model Selection, the Marginal Likelihood, and Generalization »
Sanae Lotfi · Pavel Izmailov · Gregory Benton · Micah Goldblum · Andrew Wilson -
2022 Spotlight: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations »
Amin Ghiasi · Hamid Kazemi · Steven Reich · Chen Zhu · Micah Goldblum · Tom Goldstein -
2022 Poster: Certified Neural Network Watermarks with Randomized Smoothing »
Arpit Bansal · Ping-yeh Chiang · Michael Curry · Rajiv Jain · Curtis Wigington · Varun Manjunatha · John P Dickerson · Tom Goldstein -
2022 Spotlight: Certified Neural Network Watermarks with Randomized Smoothing »
Arpit Bansal · Ping-yeh Chiang · Michael Curry · Rajiv Jain · Curtis Wigington · Varun Manjunatha · John P Dickerson · Tom Goldstein -
2022 Poster: Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification »
Yuxin Wen · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein -
2022 Spotlight: Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification »
Yuxin Wen · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein -
2021 : Paper Presentation 1: Analyzing the Security of Machine Learning for Algorithmic Trading »
Avi Schwarzschild · Micah Goldblum · Tom Goldstein -
2021 : Tabular Data: Deep Learning is Not All You Need »
Ravid Shwartz-Ziv -
2021 Workshop: ICML Workshop on Representation Learning for Finance and E-Commerce Applications »
Senthil Kumar · Sameena Shah · Joan Bruna · Tom Goldstein · Erik Mueller · Oleg Rokhlenko · Hongxia Yang · Jianpeng Xu · Oluwatobi O Olabiyi · Charese Smiley · C. Bayan Bruss · Saurabh H Nagrecha · Svitlana Vyetrenko -
2021 Poster: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks »
Avi Schwarzschild · Micah Goldblum · Arjun Gupta · John P Dickerson · Tom Goldstein -
2021 Poster: Data Augmentation for Meta-Learning »
Renkun Ni · Micah Goldblum · Amr Sharaf · Kezhi Kong · Tom Goldstein -
2021 Spotlight: Data Augmentation for Meta-Learning »
Renkun Ni · Micah Goldblum · Amr Sharaf · Kezhi Kong · Tom Goldstein -
2021 Spotlight: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks »
Avi Schwarzschild · Micah Goldblum · Arjun Gupta · John P Dickerson · Tom Goldstein -
2020 Poster: Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness »
Aounon Kumar · Alexander Levine · Tom Goldstein · Soheil Feizi -
2020 Poster: Certified Data Removal from Machine Learning Models »
Chuan Guo · Tom Goldstein · Awni Hannun · Laurens van der Maaten -
2020 Poster: Adversarial Attacks on Copyright Detection Systems »
Parsa Saadatpanah · Ali Shafahi · Tom Goldstein -
2020 Poster: Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks »
Micah Goldblum · Steven Reich · Liam Fowl · Renkun Ni · Valeriia Cherepanova · Tom Goldstein -
2020 Poster: The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent »
Karthik Abinav Sankararaman · Soham De · Zheng Xu · W. Ronny Huang · Tom Goldstein -
2019 Poster: Transferable Clean-Label Poisoning Attacks on Deep Neural Nets »
Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein -
2019 Oral: Transferable Clean-Label Poisoning Attacks on Deep Neural Nets »
Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein -
2018 Poster: Linear Spectral Estimators and an Application to Phase Retrieval »
Ramina Ghods · Andrew Lan · Tom Goldstein · Christoph Studer -
2018 Oral: Linear Spectral Estimators and an Application to Phase Retrieval »
Ramina Ghods · Andrew Lan · Tom Goldstein · Christoph Studer -
2017 Poster: Adaptive Consensus ADMM for Distributed Optimization »
Zheng Xu · Gavin Taylor · Hao Li · Mario Figueiredo · Xiaoming Yuan · Tom Goldstein -
2017 Talk: Adaptive Consensus ADMM for Distributed Optimization »
Zheng Xu · Gavin Taylor · Hao Li · Mario Figueiredo · Xiaoming Yuan · Tom Goldstein -
2017 Poster: Convex Phase Retrieval without Lifting via PhaseMax »
Tom Goldstein · Christoph Studer -
2017 Talk: Convex Phase Retrieval without Lifting via PhaseMax »
Tom Goldstein · Christoph Studer