Towards Structured Sparsity in Transformers for Efficient Inference
Harry Dong · Beidi Chen · Yuejie Chi
Event URL: https://openreview.net/forum?id=c4m0BkO4OL
Transformer models have been critical in accelerating progress in numerous fields, yet scaling these models comes at a high computational cost. In this paper, we explore sparsity properties in transformers and manipulate existing sparsity to be more structured for efficient training and inference. In particular, we create sparse structures that have inter-layer similarity and are block sparse, which have the potential to bypass a significant amount of model loading and computation. We present preliminary results and ideas using a small transformer, which we hope to extend to more complex models.
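To illustrate why block sparsity can bypass loading and computation, here is a minimal NumPy sketch of a matrix product that only touches the weight blocks marked active. It is not the authors' implementation; the function name `block_sparse_matmul`, the boolean `block_mask`, and the `block_size` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def block_sparse_matmul(x, w, block_mask, block_size):
    """Compute x @ w while skipping weight blocks whose mask entry is False.

    x:          (n, d_in) activations
    w:          (d_in, d_out) weights
    block_mask: (d_in // block_size, d_out // block_size) boolean array
                (hypothetical layout: True means the block is used)
    """
    n, _ = x.shape
    d_out = w.shape[1]
    out = np.zeros((n, d_out), dtype=x.dtype)
    # Visit only the active blocks; inactive blocks are never read or multiplied.
    for i, j in zip(*np.nonzero(block_mask)):
        rows = slice(i * block_size, (i + 1) * block_size)
        cols = slice(j * block_size, (j + 1) * block_size)
        out[:, cols] += x[:, rows] @ w[rows, cols]
    return out
```

In this sketch the work scales with the number of active blocks rather than with the full weight matrix. If consecutive layers had similar masks, as the inter-layer similarity in the abstract suggests, the same block indices could plausibly be reused from one layer to the next, though how the paper exploits this is not specified here.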
Author Information
Harry Dong (Carnegie Mellon University)
Beidi Chen (CMU / FAIR)
Yuejie Chi (CMU)
More from the Same Authors
- 2023 : Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
  Yuandong Tian · Yiping Wang · Beidi Chen · Simon Du
- 2023 : Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
  Wenhao Ding · Laixi Shi · Yuejie Chi · Ding Zhao
- 2023 : Counterfactual Generation with Identifiability Guarantees
  Hanqi Yan · Lingjing Kong · Lin Gui · Yuejie Chi · Eric Xing · Yulan He · Kun Zhang
- 2023 : Identification of Nonlinear Latent Hierarchical Causal Models
  Lingjing Kong · Biwei Huang · Feng Xie · Eric Xing · Yuejie Chi · Kun Zhang
- 2023 : H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
  Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Re · Clark Barrett · Zhangyang “Atlas” Wang · Beidi Chen
- 2023 : Incremental Low-Rank Learning
  Jiawei Zhao · Yifei Zhang · Beidi Chen · Florian Schaefer · Anima Anandkumar
- 2023 Workshop: ES-FoMo: Efficient Systems for Foundation Models
  Julien Launay · Daniel Y Fu · Tri Dao · Daniel Hesslow · Beidi Chen · Azalia Mirhoseini · Percy Liang
- 2023 Oral: Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Zichang Liu · Jue Wang · Tri Dao · Tianyi Zhou · Binhang Yuan · Zhao Song · Anshumali Shrivastava · Ce Zhang · Yuandong Tian · Christopher Re · Beidi Chen
- 2023 Poster: The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
  Jiin Woo · Gauri Joshi · Yuejie Chi
- 2023 Poster: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
  Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang
- 2023 Oral: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
  Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang
- 2023 Poster: CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
  Jue Wang · Yucheng Lu · Binhang Yuan · Beidi Chen · Percy Liang · Chris De Sa · Christopher Re · Ce Zhang
- 2023 Poster: Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Zichang Liu · Jue Wang · Tri Dao · Tianyi Zhou · Binhang Yuan · Zhao Song · Anshumali Shrivastava · Ce Zhang · Yuandong Tian · Christopher Re · Beidi Chen
- 2023 Poster: The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
  Xingyu Xu · Yandi Shen · Yuejie Chi · Cong Ma
- 2022 Poster: Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
  Laixi Shi · Gen Li · Yuting Wei · Yuxin Chen · Yuejie Chi
- 2022 Spotlight: Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
  Laixi Shi · Gen Li · Yuting Wei · Yuxin Chen · Yuejie Chi
- 2022 Poster: Monarch: Expressive Structured Matrices for Efficient and Accurate Training
  Tri Dao · Beidi Chen · Nimit Sohoni · Arjun Desai · Michael Poli · Jessica Grogan · Alexander Liu · Aniruddh Rao · Atri Rudra · Christopher Re
- 2022 Oral: Monarch: Expressive Structured Matrices for Efficient and Accurate Training
  Tri Dao · Beidi Chen · Nimit Sohoni · Arjun Desai · Michael Poli · Jessica Grogan · Alexander Liu · Aniruddh Rao · Atri Rudra · Christopher Re
- 2021 Poster: Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
  Gen Li · Changxiao Cai · Yuxin Chen · Yuantao Gu · Yuting Wei · Yuejie Chi
- 2021 Spotlight: Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
  Gen Li · Changxiao Cai · Yuxin Chen · Yuantao Gu · Yuting Wei · Yuejie Chi
- 2021 Poster: A Tale of Two Efficient and Informative Negative Sampling Distributions
  Shabnam Daghaghi · Tharun Medini · Nicholas Meisburger · Beidi Chen · Mengnan Zhao · Anshumali Shrivastava
- 2021 Oral: A Tale of Two Efficient and Informative Negative Sampling Distributions
  Shabnam Daghaghi · Tharun Medini · Nicholas Meisburger · Beidi Chen · Mengnan Zhao · Anshumali Shrivastava
- 2020 Poster: Angular Visual Hardness
  Beidi Chen · Weiyang Liu · Zhiding Yu · Jan Kautz · Anshumali Shrivastava · Animesh Garg · Anima Anandkumar
- 2018 Poster: Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion
  Cong Ma · Kaizheng Wang · Yuejie Chi · Yuxin Chen
- 2018 Oral: Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion
  Cong Ma · Kaizheng Wang · Yuejie Chi · Yuxin Chen