Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
Ajay Jaiswal · Shiwei Liu · Tianlong Chen · Ding · Zhangyang “Atlas” Wang

Wed Jul 26 07:08 PM -- 07:16 PM (PDT) @ Meeting Room 313
Large pre-trained transformers have received explosive attention in the past few years because they adapt to numerous downstream applications via fine-tuning, but their exponentially increasing parameter counts are becoming a primary hurdle to even fine-tuning them without industry-standard hardware. Recently, the Lottery Ticket Hypothesis (LTH) and its variants have been exploited to prune these large pre-trained models, generating subnetworks that can achieve performance similar to their dense counterparts. However, the practicality of LTH is severely limited by the repetitive full training and pruning routine of iterative magnitude pruning (IMP), whose cost worsens with increasing model size. Motivated by recent observations of model soups, which suggest that the fine-tuned weights of multiple models can be merged to reach a better minimum, we propose **Instant Soup Pruning (ISP)** to generate lottery-ticket-quality subnetworks at a fraction of the original IMP cost, by replacing the expensive intermediate pruning stages of IMP with a computationally efficient weak mask generation and aggregation routine. More specifically, during the mask generation stage, ISP runs a small handful of iterations with varying training protocols and data subsets to generate many weak and noisy subnetworks, and superposes them to average out the noise, creating a high-quality denoised subnetwork. Our extensive experiments and ablations on two popular large-scale pre-trained models, $\texttt{CLIP}$ (unexplored in pruning to date) and $\texttt{BERT}$, across multiple benchmark vision datasets $\texttt{\{MNIST, SVHN, Cars, GTSRB, CIFAR-10, CIFAR-100\}}$ and language datasets $\texttt{\{MNLI, QNLI, QQP, SST, ...\}}$ validate the effectiveness of ISP compared to several state-of-the-art pruning methods. Additionally, we show that ISP can be easily modified with minimal overhead to produce benefits comparable to model soups, without the prerequisite of generating multiple candidate fine-tuned models.
Code is available at: https://github.com/VITA-Group/instant_soup.
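
The mask generation and aggregation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that); it only assumes that each weak mask comes from magnitude pruning a slightly different copy of the weights, and that aggregation is a simple vote across masks. The function names `magnitude_mask`, `weak_masks`, and `aggregate_masks`, the Gaussian perturbation standing in for short training runs with varying protocols and data subsets, and all parameter values are hypothetical.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction."""
    keep = max(1, round(len(weights) * (1.0 - sparsity)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


def weak_masks(weights, sparsity, num_masks, noise_std, seed=0):
    """Generate several noisy masks.

    Assumption: Gaussian noise on the weights is a stand-in for a few cheap
    training iterations under different protocols and data subsets.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(num_masks):
        noisy = [w + rng.gauss(0.0, noise_std) for w in weights]
        masks.append(magnitude_mask(noisy, sparsity))
    return masks


def aggregate_masks(masks, sparsity):
    """Superpose masks by counting votes per weight, then keep the top-voted ones."""
    n = len(masks[0])
    votes = [sum(m[i] for m in masks) for i in range(n)]
    keep = max(1, round(n * (1.0 - sparsity)))
    order = sorted(range(n), key=lambda i: votes[i], reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(n)]


# Toy weight vector; real models would apply this per layer or globally.
weights = [0.9, -0.05, 0.02, -1.2, 0.4, 0.01, -0.7, 0.03]
masks = weak_masks(weights, sparsity=0.5, num_masks=25, noise_std=0.05)
final_mask = aggregate_masks(masks, sparsity=0.5)
pruned = [w * m for w, m in zip(weights, final_mask)]
print(final_mask)
print(pruned)
```

Individual masks may disagree on borderline weights, but the vote across many of them is more stable than any single one, which is the intuition behind averaging out the noise.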

Author Information

Ajay Jaiswal (University of Texas at Austin)
Shiwei Liu (UT Austin)

Shiwei Liu is a Postdoctoral Fellow at the University of Texas at Austin. He obtained his Ph.D. from the Eindhoven University of Technology in 2022. His research interests cover sparsity in neural networks and efficient ML. He has over 30 publications in top-tier machine learning venues, such as IJCAI, ICLR, ICML, NeurIPS, IJCV, UAI, and LoG. Shiwei won the best paper award at the LoG’22 conference and received Cum Laude (distinguished Ph.D. thesis) at the Eindhoven University of Technology. He has served as an area chair for ICIP’22 and ICIP’23 and as a PC member of almost all top-tier ML/CV conferences. Shiwei has co-organized two tutorials at IJCAI and ECML-PKDD, which were widely acclaimed by the audience. He has also given more than 20 invited talks at universities, companies, research labs, and conferences.

Tianlong Chen (PostDoc - MIT/Harvard; Incoming Assistant Professor - UNC Chapel Hill)
Zhangyang “Atlas” Wang (University of Texas at Austin)
