A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun · Zhuang Liu · Anna Bair · Zico Kolter
Event URL: https://openreview.net/forum?id=tz9JV2PRSv

As their size increases, Large Language Models (LLMs) are natural candidates for network pruning. Existing methods require either retraining or solving a weight reconstruction problem, which may be computationally expensive for billion-scale LLMs. In this paper, we introduce a novel, simple yet effective pruning method, termed Wanda (Pruning by Weights and activations), to induce sparsity in pretrained LLMs. Motivated by the recent observation of emergent large magnitude features in LLMs, our approach prunes weights with the smallest magnitudes multiplied by the corresponding input activations, on a per-output basis. Notably, Wanda requires no retraining or weight updates, and the pruned LLM can be used as is. We conduct a thorough evaluation of our method on LLaMA, one of the best performing LLMs available. Wanda significantly outperforms the established baseline of magnitude pruning and competes favorably against recent methods involving intensive weight updates.
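The pruning criterion described in the abstract is simple to sketch. Below is a minimal illustration, assuming a PyTorch linear layer and a small batch of calibration activations; the function name, tensor shapes, and sparsity handling are assumptions for exposition, not the authors' released implementation.

```python
import torch

def wanda_prune(weight: torch.Tensor, activations: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Sketch of a Wanda-style pruning criterion.

    weight:      (out_features, in_features) linear-layer weight matrix
    activations: (num_tokens, in_features) calibration inputs to this layer
    sparsity:    fraction of weights to remove in each output row
    """
    # Per-input-feature L2 norm of the calibration activations across tokens.
    act_norm = activations.norm(p=2, dim=0)            # shape: (in_features,)

    # Score each weight by |W_ij| * ||X_j||_2 (weight magnitude times input activation norm).
    score = weight.abs() * act_norm.unsqueeze(0)        # shape: (out_features, in_features)

    # Per-output comparison: within each row, remove the lowest-scoring weights.
    num_prune = int(weight.shape[1] * sparsity)
    _, sorted_idx = torch.sort(score, dim=1)            # ascending scores per row
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, sorted_idx[:, :num_prune], False)  # zero out the smallest scores

    # No retraining or weight update: the masked weights are used as is.
    return weight * mask
```

The key design point conveyed by the abstract is that the score combines weight magnitude with input activation scale, and that ranking is done per output row rather than globally, which requires only a forward pass over a small calibration set rather than any gradient-based update.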

Author Information

Mingjie Sun (Computer Science Department, Carnegie Mellon University)
Zhuang Liu (Meta)
Anna Bair (Carnegie Mellon University)
Zico Kolter (Carnegie Mellon University / Bosch Center for AI)
