Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain an enormous number of parameters, which restricts their deployment in real-world applications. To reduce the model size, researchers prune these models based on the weights' importance scores. However, such scores are usually estimated on mini-batches during training, which incurs large variability/uncertainty due to mini-batch sampling and complicated training dynamics. As a result, commonly used pruning methods may remove crucial weights because of this uncertainty, which makes training unstable and hurts generalization. To resolve this issue, we propose PLATON, which captures the uncertainty of importance scores via the upper confidence bound (UCB) of importance estimation. In particular, for weights with low importance scores but high uncertainty, PLATON tends to retain them and explore their capacity. We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering, and image classification to validate the effectiveness of PLATON. Results demonstrate that PLATON yields notable improvements under different sparsity levels. Our code is publicly available at https://github.com/QingruZhang/PLATON.
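The abstract only sketches the idea, so the snippet below is a minimal, illustrative sketch of a UCB-style pruning score: a mini-batch sensitivity estimate is smoothed with an exponential moving average, a running uncertainty term tracks how much the estimate fluctuates, and the two are combined so that weights with low estimated importance but high uncertainty are kept rather than pruned. The function names (`ucb_importance`, `prune_by_score`), the smoothing factors, and the exact way importance and uncertainty are combined are assumptions made here for illustration, not taken from the paper or the repository; refer to the linked code for the actual algorithm and hyperparameters.

```python
import torch

def ucb_importance(weight, grad, avg_imp, avg_unc, beta1=0.85, beta2=0.85):
    # Mini-batch sensitivity estimate: |w * dL/dw|
    imp = (weight * grad).abs()
    # Exponential moving average smooths out mini-batch sampling noise
    avg_imp = beta1 * avg_imp + (1 - beta1) * imp
    # Local uncertainty: how far the current estimate sits from its running average
    unc = (imp - avg_imp).abs()
    avg_unc = beta2 * avg_unc + (1 - beta2) * unc
    # UCB-style score (illustrative combination): weights whose importance or
    # uncertainty is high receive a high score, so uncertain weights are
    # explored rather than pruned immediately
    score = avg_imp * avg_unc
    return score, avg_imp, avg_unc

def prune_by_score(weight, score, sparsity=0.5):
    # Zero out the fraction of entries with the lowest scores
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = torch.kthvalue(score.flatten(), k).values
    mask = (score > threshold).to(weight.dtype)
    return weight * mask

# Toy usage: one scoring and pruning step on a random layer
w = torch.randn(64, 64, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
avg_imp = torch.zeros_like(w)
avg_unc = torch.zeros_like(w)
score, avg_imp, avg_unc = ucb_importance(w.detach(), w.grad, avg_imp, avg_unc)
pruned_w = prune_by_score(w.detach(), score, sparsity=0.5)
```

In practice the running estimates would be kept per parameter tensor across training steps and the sparsity schedule would increase gradually; the toy usage above only shows a single step under those assumptions.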
Author Information
Qingru Zhang (Georgia Institute of Technology)
Qingru Zhang is a Ph.D. student at Georgia Tech. His research mainly focuses on developing principled learning algorithms with an emphasis on language models and graph representation learning.
Simiao Zuo (Georgia Institute of Technology)
Chen Liang (Georgia Institute of Technology)
Alexander Bukharin (Georgia Institute of Technology)
Pengcheng He (Microsoft)
Weizhu Chen (Microsoft)
Tuo Zhao (Georgia Tech)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance »
  Wed. Jul 20th, 09:40 -- 09:45 PM, Room: Hall F
More from the Same Authors
- 2023 Poster: Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data »
  Minshuo Chen · Kaixuan Huang · Tuo Zhao · Mengdi Wang
- 2023 Poster: Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise »
  Zhenghao Lin · Yeyun Gong · Yelong Shen · Tong Wu · Zhihao Fan · Chen Lin · Nan Duan · Weizhu Chen
- 2023 Poster: Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models »
  Zhihong Shao · Yeyun Gong · Yelong Shen · Minlie Huang · Nan Duan · Weizhu Chen
- 2023 Poster: SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process »
  Zichong Li · Yanbo Xu · Simiao Zuo · Haoming Jiang · Chao Zhang · Tuo Zhao · Hongyuan Zha
- 2023 Poster: LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation »
  Yixiao Li · Yifan Yu · Qingru Zhang · Chen Liang · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories »
  Zixuan Zhang · Minshuo Chen · Mengdi Wang · Wenjing Liao · Tuo Zhao
- 2023 Poster: Machine Learning Force Fields with Data Cost Aware Training »
  Alexander Bukharin · Tianyi Liu · Shengjie Wang · Simiao Zuo · Weihao Gao · Wen Yan · Tuo Zhao
- 2023 Poster: Less is More: Task-aware Layer-wise Distillation for Language Model Compression »
  Chen Liang · Simiao Zuo · Qingru Zhang · Pengcheng He · Weizhu Chen · Tuo Zhao
- 2023 Poster: HyperTuning: Toward Adapting Large Language Models without Back-propagation »
  Jason Phang · Yi Mao · Pengcheng He · Weizhu Chen
- 2023 Poster: POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models »
  Korawat Tanwisuth · Shujian Zhang · Huangjie Zheng · Pengcheng He · Mingyuan Zhou
- 2022 Poster: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
  Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao
- 2022 Spotlight: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
  Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao
- 2021 Poster: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining »
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Spotlight: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining »
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Poster: Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks »
  Hao Liu · Minshuo Chen · Tuo Zhao · Wenjing Liao
- 2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Spotlight: Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks »
  Hao Liu · Minshuo Chen · Tuo Zhao · Wenjing Liao
- 2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Poster: Poolingformer: Long Document Modeling with Pooling Attention »
  Hang ZHANG · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen
- 2021 Spotlight: Poolingformer: Long Document Modeling with Pooling Attention »
  Hang ZHANG · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen
- 2020 Poster: Transformer Hawkes Process »
  Simiao Zuo · Haoming Jiang · Zichong Li · Tuo Zhao · Hongyuan Zha
- 2020 Poster: Deep Reinforcement Learning with Smooth Policy »
  Qianli Shen · Yan Li · Haoming Jiang · Zhaoran Wang · Tuo Zhao