
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Yixiao Li · Yifan Yu · Qingru Zhang · Chen Liang · Pengcheng He · Weizhu Chen · Tuo Zhao

Wed Jul 26 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #228

Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low-rank approximations and pruning, while avoiding their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons. We evaluate our method on natural language understanding, question answering, and natural language generation tasks. We show that it significantly outperforms existing compression methods. Our code is publicly available at https://github.com/yxli2123/LoSparse
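The core idea of the abstract can be illustrated with a one-shot low-rank-plus-sparse decomposition: take the truncated SVD of a weight matrix as the low-rank part, then keep only the largest-magnitude entries of the residual as the sparse part. The sketch below is an illustrative simplification, not the authors' implementation (the paper's repository performs the compression iteratively during training); the function name `losparse_approx` and the `rank`/`sparsity` parameters are hypothetical.

```python
import numpy as np

def losparse_approx(W, rank, sparsity):
    """Approximate W as L + S, where L is a rank-`rank` matrix from a
    truncated SVD and S keeps roughly a `sparsity` fraction of the
    largest-magnitude entries of the residual W - L.

    Illustrative sketch only; the actual LoSparse method learns the
    factors and prunes the sparse part gradually during fine-tuning.
    """
    # Low-rank part: truncated SVD of the dense weight matrix.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Sparse part: keep the top-magnitude entries of the residual,
    # which captures the "incoherent" neurons the SVD misses.
    R = W - L
    k = int(sparsity * R.size)
    if k == 0:
        return L, np.zeros_like(R)
    thresh = np.partition(np.abs(R).ravel(), -k)[-k]
    S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S
```

A quick sanity check on the intuition: the combined approximation `L + S` can never be worse than the low-rank part alone, since `S` only zeroes out residual entries rather than adding error.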

Author Information

Yixiao Li (Georgia Institute of Technology)
Yifan Yu (Georgia Institute of Technology)
Qingru Zhang (Georgia Institute of Technology)

Qingru Zhang is a Ph.D. student at Georgia Tech. His research mainly focuses on developing principled learning algorithms with an emphasis on language models and graph representation learning.

Chen Liang (Georgia Institute of Technology)
Pengcheng He (Microsoft)
Weizhu Chen (Microsoft)
Tuo Zhao (Georgia Tech)
