Poster

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li · Xinghao Chen · Han Shu · Yehui Tang · Yunhe Wang


Abstract:

Large language models (LLMs) have recently attracted significant attention in artificial intelligence, enabling numerous compelling applications. However, training these models places heavy demands on computation and storage, so compressing checkpoints has become an urgent problem. In this paper, we propose the Extreme Checkpoint Compression (ExCP) framework, which significantly reduces the storage required for training checkpoints while achieving nearly lossless performance. Since the training of large language models is gradual and continuous, their weights change only slightly between updates, at a rate controlled by the learning rate. We therefore first compute the residuals of adjacent checkpoints, which retain the essential information yet are very sparse, enabling a higher compression ratio. To further exploit redundant parameters in checkpoints, we incorporate another key signal from model optimization, namely the momentum, and develop a weight-momentum joint shrinking method. In particular, we use information from both the model and the optimizer to discard as many parameters as possible while preserving the critical information needed for optimal performance. Furthermore, we apply non-uniform quantization to compress checkpoint storage even further. We extensively evaluate the proposed ExCP framework on models ranging from 410M to 7B parameters and demonstrate significant storage reduction while maintaining strong performance. For instance, we achieve approximately 70x compression for the Pythia-410M model, with final accuracy on various downstream tasks matching the original model.
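The pipeline described above (checkpoint residuals, weight-momentum joint shrinking, non-uniform quantization) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the joint importance score (product of residual and momentum magnitudes), the `keep_ratio` value, and the k-means-style non-uniform quantizer are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def joint_shrink(weight_residual, momentum, keep_ratio=0.1):
    # Hypothetical joint score: entries where both the weight change and the
    # optimizer momentum are small are assumed unimportant and zeroed out.
    score = np.abs(weight_residual) * np.abs(momentum)
    k = max(1, int(score.size * keep_ratio))          # number of entries to keep
    thresh = np.partition(score.ravel(), -k)[-k]      # k-th largest score
    mask = score >= thresh
    return weight_residual * mask, momentum * mask

def nonuniform_quantize(x, n_bits=4, n_iter=10):
    # Toy non-uniform quantization: fit a 2**n_bits-level codebook to the
    # nonzero values with 1-D k-means (Lloyd's algorithm), then snap each
    # value to its nearest codebook level.
    nz = x[x != 0]
    if nz.size == 0:
        return x.copy()
    levels = np.quantile(nz, np.linspace(0, 1, 2 ** n_bits))  # init at quantiles
    for _ in range(n_iter):
        idx = np.argmin(np.abs(nz[:, None] - levels[None, :]), axis=1)
        for j in range(levels.size):
            sel = nz[idx == j]
            if sel.size:
                levels[j] = sel.mean()
    q = x.copy()
    idx = np.argmin(np.abs(nz[:, None] - levels[None, :]), axis=1)
    q[x != 0] = levels[idx]
    return q

# Example: compress the step from checkpoint t to checkpoint t+1.
rng = np.random.default_rng(0)
w_prev = rng.normal(size=(64, 64))
w_curr = w_prev + 0.01 * rng.normal(size=(64, 64))   # small training update
mom = rng.normal(scale=0.01, size=(64, 64))          # stand-in momentum state

residual = w_curr - w_prev                           # sparse essential information
res_s, mom_s = joint_shrink(residual, mom, keep_ratio=0.1)
res_q = nonuniform_quantize(res_s, n_bits=4)         # store only res_q (+ codebook)
w_reconstructed = w_prev + res_q                     # recover checkpoint t+1
```

Only the sparse, quantized residual (plus the small codebook) needs to be stored per checkpoint; the full weights are recovered by accumulating residuals from an earlier base checkpoint.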
