Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain an enormous number of parameters, which restricts their deployment in real-world applications. To reduce the model size, researchers prune these models based on the weights' importance scores. However, such scores are usually estimated on mini-batches during training, which incurs large variability and uncertainty due to mini-batch sampling and complicated training dynamics. As a result, commonly used pruning methods may remove crucial weights because of this uncertainty, which makes training unstable and hurts generalization. To resolve this issue, we propose PLATON, which captures the uncertainty of importance scores via the upper confidence bound (UCB) of importance estimation. In particular, for weights with low importance scores but high uncertainty, PLATON tends to retain them and explore their capacity. We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering, and image classification to validate the effectiveness of PLATON. Results demonstrate that PLATON yields notable improvements across different sparsity levels. Our code is publicly available at https://github.com/QingruZhang/PLATON.
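To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea in the abstract: estimate each weight's importance from mini-batch gradients, track both a smoothed estimate and its variability, and combine them into a UCB-style score so that uncertain weights are explored rather than pruned. The class name, the sensitivity measure |w * grad|, and the smoothing constants below are illustrative assumptions, not the paper's exact formulation; see the linked repository for the authors' implementation.

```python
import torch

class UCBImportanceSketch:
    """Hypothetical sketch of UCB-style importance scoring for pruning."""

    def __init__(self, beta1: float = 0.85, beta2: float = 0.95):
        self.beta1, self.beta2 = beta1, beta2  # EMA decay rates (illustrative)
        self.avg_imp = {}  # smoothed importance per parameter name
        self.avg_unc = {}  # smoothed uncertainty per parameter name

    @torch.no_grad()
    def update(self, name: str, weight: torch.Tensor):
        # Mini-batch sensitivity: |w * dL/dw| from the current gradient.
        imp = (weight * weight.grad).abs()
        if name not in self.avg_imp:
            self.avg_imp[name] = imp.clone()
            self.avg_unc[name] = torch.zeros_like(imp)
        # Uncertainty proxy: deviation of this batch's estimate from the EMA.
        unc = (imp - self.avg_imp[name]).abs()
        self.avg_imp[name] = self.beta1 * self.avg_imp[name] + (1 - self.beta1) * imp
        self.avg_unc[name] = self.beta2 * self.avg_unc[name] + (1 - self.beta2) * unc

    def score(self, name: str) -> torch.Tensor:
        # UCB-style score: a weight with a low smoothed importance but high
        # uncertainty is retained and explored instead of being pruned early.
        return self.avg_imp[name] * self.avg_unc[name]
```

In use, one would call `update` for every prunable weight matrix after each backward pass, then zero out the weights with the smallest `score` values until the target sparsity is reached.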
Author Information
Qingru Zhang (Georgia Institute of Technology)
Qingru Zhang is a Ph.D. student at Georgia Tech. His research mainly focuses on developing principled learning algorithms with an emphasis on language models and graph representation learning.
Simiao Zuo (Georgia Institute of Technology)
Chen Liang (Georgia Institute of Technology)
Alexander Bukharin (Georgia Institute of Technology)
Pengcheng He (Microsoft)
Weizhu Chen (Microsoft)
Tuo Zhao (Georgia Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
  Wed, Jul 20 through Thu, Jul 21, Room Hall E #230
More from the Same Authors
- 2022 Poster: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint
  Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao
- 2022 Spotlight: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint
  Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao
- 2021 Poster: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Spotlight: BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
  Weizhen Qi · Yeyun Gong · Jian Jiao · Yu Yan · Weizhu Chen · Dayiheng Liu · Kewen Tang · Houqiang Li · Jiusheng Chen · Ruofei Zhang · Ming Zhou · Nan Duan
- 2021 Poster: Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks
  Hao Liu · Minshuo Chen · Tuo Zhao · Wenjing Liao
- 2021 Poster: How Important is the Train-Validation Split in Meta-Learning?
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Spotlight: Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks
  Hao Liu · Minshuo Chen · Tuo Zhao · Wenjing Liao
- 2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning?
  Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong
- 2021 Poster: Poolingformer: Long Document Modeling with Pooling Attention
  Hang Zhang · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen
- 2021 Spotlight: Poolingformer: Long Document Modeling with Pooling Attention
  Hang Zhang · Yeyun Gong · Yelong Shen · Weisheng Li · Jiancheng Lv · Nan Duan · Weizhu Chen
- 2020 Poster: Transformer Hawkes Process
  Simiao Zuo · Haoming Jiang · Zichong Li · Tuo Zhao · Hongyuan Zha
- 2020 Poster: Deep Reinforcement Learning with Smooth Policy
  Qianli Shen · Yan Li · Haoming Jiang · Zhaoran Wang · Tuo Zhao