Timezone: »
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU.Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.
Author Information
Jonas Geiping (University of Maryland, College Park)
Tom Goldstein (University of Maryland)
More from the Same Authors
-
2022 : Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning »
Yuxin Wen · Jonas Geiping · Liam Fowl · Hossein Souri · Rama Chellappa · Micah Goldblum · Tom Goldstein -
2022 : Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch »
Hossein Souri · Liam Fowl · Rama Chellappa · Micah Goldblum · Tom Goldstein -
2022 : How much Data is Augmentation Worth? »
Jonas Geiping · Gowthami Somepalli · Ravid Shwartz-Ziv · Andrew Wilson · Tom Goldstein · Micah Goldblum -
2023 : Understanding Data Replication in Diffusion Models »
Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein -
2023 Oral: A Watermark for Large Language Models »
John Kirchenbauer · Jonas Geiping · Yuxin Wen · Jonathan Katz · Ian Miers · Tom Goldstein -
2023 Poster: GOAT: A Global Transformer on Large-scale Graphs »
Kezhi Kong · Jiuhai Chen · John Kirchenbauer · Renkun Ni · C. Bayan Bruss · Tom Goldstein -
2023 Poster: A Watermark for Large Language Models »
John Kirchenbauer · Jonas Geiping · Yuxin Wen · Jonathan Katz · Ian Miers · Tom Goldstein -
2023 Poster: Cramming: Training a Language Model on a single GPU in one day. »
Jonas Geiping · Tom Goldstein -
2022 Poster: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations »
Amin Ghiasi · Hamid Kazemi · Steven Reich · Chen Zhu · Micah Goldblum · Tom Goldstein -
2022 Spotlight: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations »
Amin Ghiasi · Hamid Kazemi · Steven Reich · Chen Zhu · Micah Goldblum · Tom Goldstein -
2022 Poster: Certified Neural Network Watermarks with Randomized Smoothing »
Arpit Bansal · Ping-yeh Chiang · Michael Curry · Rajiv Jain · Curtis Wigington · Varun Manjunatha · John P Dickerson · Tom Goldstein -
2022 Spotlight: Certified Neural Network Watermarks with Randomized Smoothing »
Arpit Bansal · Ping-yeh Chiang · Michael Curry · Rajiv Jain · Curtis Wigington · Varun Manjunatha · John P Dickerson · Tom Goldstein -
2022 Poster: Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification »
Yuxin Wen · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein -
2022 Spotlight: Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification »
Yuxin Wen · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein -
2021 : Paper Presentation 1: Analyzing the Security of Machine Learning for Algorithmic Trading »
Avi Schwarzschild · Micah Goldblum · Tom Goldstein -
2021 Workshop: ICML Workshop on Representation Learning for Finance and E-Commerce Applications »
Senthil Kumar · Sameena Shah · Joan Bruna · Tom Goldstein · Erik Mueller · Oleg Rokhlenko · Hongxia Yang · Jianpeng Xu · Oluwatobi O Olabiyi · Charese Smiley · C. Bayan Bruss · Saurabh H Nagrecha · Svitlana Vyetrenko -
2021 Poster: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks »
Avi Schwarzschild · Micah Goldblum · Arjun Gupta · John P Dickerson · Tom Goldstein -
2021 Poster: Data Augmentation for Meta-Learning »
Renkun Ni · Micah Goldblum · Amr Sharaf · Kezhi Kong · Tom Goldstein -
2021 Spotlight: Data Augmentation for Meta-Learning »
Renkun Ni · Micah Goldblum · Amr Sharaf · Kezhi Kong · Tom Goldstein -
2021 Spotlight: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks »
Avi Schwarzschild · Micah Goldblum · Arjun Gupta · John P Dickerson · Tom Goldstein -
2020 Poster: Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness »
Aounon Kumar · Alexander Levine · Tom Goldstein · Soheil Feizi -
2020 Poster: Certified Data Removal from Machine Learning Models »
Chuan Guo · Tom Goldstein · Awni Hannun · Laurens van der Maaten -
2020 Poster: Adversarial Attacks on Copyright Detection Systems »
Parsa Saadatpanah · Ali Shafahi · Tom Goldstein -
2020 Poster: Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks »
Micah Goldblum · Steven Reich · Liam Fowl · Renkun Ni · Valeriia Cherepanova · Tom Goldstein -
2020 Poster: The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent »
Karthik Abinav Sankararaman · Soham De · Zheng Xu · W. Ronny Huang · Tom Goldstein -
2019 Poster: Transferable Clean-Label Poisoning Attacks on Deep Neural Nets »
Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein -
2019 Oral: Transferable Clean-Label Poisoning Attacks on Deep Neural Nets »
Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein -
2018 Poster: Linear Spectral Estimators and an Application to Phase Retrieval »
Ramina Ghods · Andrew Lan · Tom Goldstein · Christoph Studer -
2018 Oral: Linear Spectral Estimators and an Application to Phase Retrieval »
Ramina Ghods · Andrew Lan · Tom Goldstein · Christoph Studer -
2017 Poster: Adaptive Consensus ADMM for Distributed Optimization »
Zheng Xu · Gavin Taylor · Hao Li · Mario Figueiredo · Xiaoming Yuan · Tom Goldstein -
2017 Talk: Adaptive Consensus ADMM for Distributed Optimization »
Zheng Xu · Gavin Taylor · Hao Li · Mario Figueiredo · Xiaoming Yuan · Tom Goldstein -
2017 Poster: Convex Phase Retrieval without Lifting via PhaseMax »
Tom Goldstein · Christoph Studer -
2017 Talk: Convex Phase Retrieval without Lifting via PhaseMax »
Tom Goldstein · Christoph Studer