GACT: Activation Compressed Training for Generic Network Architectures
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reducing the training memory footprint. This paper presents GACT, an ACT framework that supports a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT's approximate gradient, we prove the convergence of GACT without prior knowledge of operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.
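
To make the idea of activation compression concrete, the following is a minimal sketch of the kind of lossy step an ACT method might apply to an activation tensor saved for the backward pass. It is not GACT's algorithm or API: the function names `quantize` and `dequantize`, the fixed 4-bit width, and the per-tensor uniform quantizer with stochastic rounding are all assumptions chosen for illustration, whereas the abstract describes choosing the compression ratio per tensor at run time.

```python
import random


def quantize(values, bits, rng):
    """Hypothetical per-tensor uniform quantizer with stochastic rounding."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                  # largest integer code
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale                  # position on the code grid
        floor = int(x)                        # x is non-negative, so int() floors
        # Round up with probability equal to the fractional part,
        # which keeps the dequantized value unbiased in expectation.
        code = floor + (1 if rng.random() < x - floor else 0)
        codes.append(min(code, levels))
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate floating-point values."""
    return [lo + c * scale for c in codes]


rng = random.Random(0)
# Stand-in for an activation tensor saved during the forward pass.
activations = [rng.gauss(0.0, 1.0) for _ in range(1024)]

codes, lo, scale = quantize(activations, bits=4, rng=rng)   # assumed 4 bits
restored = dequantize(codes, lo, scale)

# Each restored value is within one quantization step of the original.
max_err = max(abs(a - r) for a, r in zip(activations, restored))

# Nominal saving versus 32-bit floats, ignoring the stored lo and scale.
ratio = 32 / 4
```

In an actual training loop, the compressed codes would be stored in place of the full-precision activation and dequantized only when the backward pass needs them; the bit width trades memory against gradient error.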

Author Information

Xiaoxuan Liu (UC Berkeley)

I am Lily (Xiaoxuan) Liu, a second-year CS Ph.D. student at UC Berkeley. I am fortunate to be advised by Professor Alvin Cheung. I am broadly interested in database and machine learning systems research.

Lianmin Zheng (UC Berkeley)
Dequan Wang (UC Berkeley)
Yukuo Cen (Tsinghua University)
Weize Chen (Tsinghua University)
Xu Han (Tsinghua University)
Jianfei Chen (Tsinghua University)
Zhiyuan Liu (Tsinghua University)
Jie Tang (Tsinghua University)
Joseph Gonzalez (UC Berkeley)
Michael Mahoney (UC Berkeley)
Alvin Cheung (University of California, Berkeley)
