Timezone: »
The increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate. This creates fundamental challenges for training neural networks within limited memory environments. In this work, we propose ActNN, a memory-efficient training framework that stores randomly quantized activations for back propagation. We prove the convergence of ActNN for general network architectures, and we characterize the impact of quantization on the convergence via an exact expression for the gradient variance. Using our theory, we propose novel mixed-precision quantization strategies that exploit the activation's heterogeneity across feature dimensions, samples, and layers. These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers. We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
Author Information
Jianfei Chen (Tsinghua University)
Lianmin Zheng (UC Berkeley)
Zhewei Yao (University of California, Berkeley)
Dequan Wang (UC Berkeley)
Ion Stoica (UC Berkeley)
Michael Mahoney (UC Berkeley)
Joseph E Gonzalez (UC Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Oral: ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training »
Wed. Jul 21st 02:00 -- 02:20 AM Room
More from the Same Authors
-
2021 : Learning Space Partitions for Path Planning »
Kevin Yang · Tianjun Zhang · Chris Cummins · Brandon Cui · Benoit Steiner · Linnan Wang · Joseph E Gonzalez · Dan Klein · Yuandong Tian -
2023 : Improve Model Inference Cost with Image Gridding »
Shreyas Krishnaswamy · Lisa Dunlap · Lingjiao Chen · Matei Zaharia · James Zou · Joseph Gonzalez -
2023 : H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models »
Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Re · Clark Barrett · Zhangyang “Atlas” Wang · Beidi Chen -
2023 : Fast Feature Selection with Fairness Constraints »
Francesco Quinzan · Rajiv Khanna · Moshik Hershcovitch · Sarel Cohen · Daniel Waddington · Tobias Friedrich · Michael Mahoney -
2023 Poster: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang -
2023 Oral: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang -
2023 Poster: Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes »
Liam Hodgkinson · Chris van der Heide · Fred Roosta · Michael Mahoney -
2023 Poster: Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching »
Ilgee Hong · Sen Na · Michael Mahoney · Mladen Kolar -
2023 Poster: Learning Physical Models that Can Respect Conservation Laws »
Derek Hansen · Danielle Robinson · Shima Alizadeh · Gaurav Gupta · Michael Mahoney -
2023 Poster: A Three-regime Model of Network Pruning »
Yefan Zhou · Yaoqing Yang · Arin Chang · Michael Mahoney -
2023 Poster: The Wisdom of Hindsight Makes Language Models Better Instruction Followers »
Tianjun Zhang · Fangchen Liu · Justin Wong · Pieter Abbeel · Joseph E Gonzalez -
2023 Poster: Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases »
Xiaoxia Wu · Cheng Li · Reza Yazdani Aminabadi · Zhewei Yao · Yuxiong He -
2022 : Back to the Source: Test-Time Diffusion-Driven Adaptation »
Jin Gao · Jialing Zhang · Xihui Liu · Trevor Darrell · Evan Shelhamer · Dequan Wang -
2022 Poster: AutoIP: A United Framework to Integrate Physics into Gaussian Processes »
Da Long · Zheng Wang · Aditi Krishnapriyan · Robert Kirby · Shandian Zhe · Michael Mahoney -
2022 Poster: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Poster: GACT: Activation Compressed Training for Generic Network Architectures »
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung -
2022 Poster: POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging »
Shishir G. Patil · Paras Jain · Prabal Dutta · Ion Stoica · Joseph E Gonzalez -
2022 Spotlight: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Spotlight: POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging »
Shishir G. Patil · Paras Jain · Prabal Dutta · Ion Stoica · Joseph E Gonzalez -
2022 Spotlight: AutoIP: A United Framework to Integrate Physics into Gaussian Processes »
Da Long · Zheng Wang · Aditi Krishnapriyan · Robert Kirby · Shandian Zhe · Michael Mahoney -
2022 Spotlight: GACT: Activation Compressed Training for Generic Network Architectures »
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung -
2022 Poster: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers »
Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney -
2022 Poster: Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows »
Feynman Liang · Michael Mahoney · Liam Hodgkinson -
2022 Poster: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2022 Spotlight: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2022 Spotlight: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers »
Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney -
2022 Spotlight: Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows »
Feynman Liang · Michael Mahoney · Liam Hodgkinson -
2022 : Tools for Big Model, Key Takeaways, and Q&A »
Lianmin Zheng -
2022 : Intra-Operator Parallelism »
Lianmin Zheng -
2022 : Trends Driving Big Models »
Ion Stoica -
2022 Tutorial: Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models »
Hao Zhang · Lianmin Zheng · Zhuohan Li · Ion Stoica -
2021 Workshop: Beyond first-order methods in machine learning systems »
Albert S Berahas · Anastasios Kyrillidis · Fred Roosta · Amir Gholaminejad · Michael Mahoney · Rachael Tappenden · Raghu Bollapragada · Rixon Crane · J. Lyle Kim -
2021 Poster: I-BERT: Integer-only BERT Quantization »
Sehoon Kim · Amir Gholaminejad · Zhewei Yao · Michael Mahoney · EECS Kurt Keutzer -
2021 Poster: HAWQ-V3: Dyadic Neural Network Quantization »
Zhewei Yao · Zhen Dong · Zhangcheng Zheng · Amir Gholaminejad · Jiali Yu · Eric Tan · Leyuan Wang · Qijing Huang · Yida Wang · Michael Mahoney · EECS Kurt Keutzer -
2021 Spotlight: HAWQ-V3: Dyadic Neural Network Quantization »
Zhewei Yao · Zhen Dong · Zhangcheng Zheng · Amir Gholaminejad · Jiali Yu · Eric Tan · Leyuan Wang · Qijing Huang · Yida Wang · Michael Mahoney · EECS Kurt Keutzer -
2021 Oral: I-BERT: Integer-only BERT Quantization »
Sehoon Kim · Amir Gholaminejad · Zhewei Yao · Michael Mahoney · EECS Kurt Keutzer -
2021 Poster: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism »
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez -
2021 Oral: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism »
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez -
2021 Poster: Multiplicative Noise and Heavy Tails in Stochastic Optimization »
Liam Hodgkinson · Michael Mahoney -
2021 Spotlight: Multiplicative Noise and Heavy Tails in Stochastic Optimization »
Liam Hodgkinson · Michael Mahoney -
2020 : Determinantal Point Processes in Randomized Numerical Linear Algebra »
Michael Mahoney -
2020 Workshop: Beyond first order methods in machine learning systems »
Albert S Berahas · Amir Gholaminejad · Anastasios Kyrillidis · Michael Mahoney · Fred Roosta -
2020 Poster: Forecasting Sequential Data Using Consistent Koopman Autoencoders »
Omri Azencot · N. Benjamin Erichson · Vanessa Lin · Michael Mahoney -
2020 Poster: PowerNorm: Rethinking Batch Normalization in Transformers »
Sheng Shen · Zhewei Yao · Amir Gholaminejad · Michael Mahoney · Kurt Keutzer -
2020 Poster: Frustratingly Simple Few-Shot Object Detection »
Xin Wang · Thomas Huang · Joseph E Gonzalez · Trevor Darrell · Fisher Yu -
2020 Poster: VFlow: More Expressive Generative Flows with Variational Data Augmentation »
Jianfei Chen · Cheng Lu · Biqi Chenli · Jun Zhu · Tian Tian -
2020 Poster: Error Estimation for Sketched SVD via the Bootstrap »
Miles Lopes · N. Benjamin Erichson · Michael Mahoney -
2020 Poster: Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers »
Zhuohan Li · Eric Wallace · Sheng Shen · Kevin Lin · Kurt Keutzer · Dan Klein · Joseph Gonzalez -
2020 Poster: Variable Skipping for Autoregressive Range Density Estimation »
Eric Liang · Zongheng Yang · Ion Stoica · Pieter Abbeel · Yan Duan · Peter Chen -
2020 Poster: FetchSGD: Communication-Efficient Federated Learning with Sketching »
Daniel Rothchild · Ashwinee Panda · Enayat Ullah · Nikita Ivkin · Ion Stoica · Vladimir Braverman · Joseph E Gonzalez · Raman Arora -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 : Invited Talk 6: RLlib: A Platform for Finance Research »
Ion Stoica -
2019 : Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks »
Michael Mahoney -
2019 Workshop: AI in Finance: Applications and Infrastructure for Multi-Agent Learning »
Prashant Reddy · Tucker Balch · Michael Wellman · Senthil Kumar · Ion Stoica · Edith Elkind -
2019 Poster: Traditional and Heavy Tailed Self Regularization in Neural Network Models »
Michael Mahoney · Charles H Martin -
2019 Oral: Traditional and Heavy Tailed Self Regularization in Neural Network Models »
Michael Mahoney · Charles H Martin -
2018 Poster: Out-of-sample extension of graph adjacency spectral embedding »
Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe -
2018 Poster: RLlib: Abstractions for Distributed Reinforcement Learning »
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica -
2018 Oral: Out-of-sample extension of graph adjacency spectral embedding »
Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe -
2018 Oral: RLlib: Abstractions for Distributed Reinforcement Learning »
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica -
2018 Poster: Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap »
Miles Lopes · Shusen Wang · Michael Mahoney -
2018 Oral: Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap »
Miles Lopes · Shusen Wang · Michael Mahoney -
2017 Poster: Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging »
Shusen Wang · Alex Gittens · Michael Mahoney -
2017 Poster: Capacity Releasing Diffusion for Speed and Locality. »
Di Wang · Kimon Fountoulakis · Monika Henzinger · Michael Mahoney · Satish Rao -
2017 Talk: Capacity Releasing Diffusion for Speed and Locality. »
Di Wang · Kimon Fountoulakis · Monika Henzinger · Michael Mahoney · Satish Rao -
2017 Talk: Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging »
Shusen Wang · Alex Gittens · Michael Mahoney