Timezone: »
Recent years have witnessed intensive research interests on training deep neural networks (DNNs) more efficiently by quantization-based compression methods, which facilitate DNNs training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to decrease the communication cost. However, existing methods mostly use a uniform mechanism that quantizes the values evenly. Such a scheme may cause a large quantization variance and slow down the convergence in practice.
In this work, we introduce TinyScript, which applies a non-uniform quantization algorithm to both activations and gradients. TinyScript models the original values by a family of Weibull distributions and searches for ''quantization knobs'' that minimize quantization variance. We also discuss the convergence of the non-uniform quantization algorithm on DNNs with varying depths, shedding light on the number of bits required for convergence. Experiments show that TinyScript always obtains lower quantization variance, and achieves comparable model qualities against full precision training using 1-2 bits less than the uniform-based counterpart.
Author Information
Fangcheng Fu (Peking University)
Yuzheng Hu (Peking University)
Yihan He (Peking University)
Undergraduate at Peking University 2020 class working on Machine Learning and Data Privacy
Jiawei Jiang (ETH Zurich)
Yingxia Shao (BUPT)
Ce Zhang (ETH Zurich)
Bin Cui (Peking University)
More from the Same Authors
-
2023 : Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models »
Mayee Chen · Nicholas Roberts · Kush Bhatia · Jue Wang · Ce Zhang · Frederic Sala · Christopher Ré -
2023 : GPT-Zip: Deep Compression of Finetuned Large Language Models »
Berivan Isik · Hermann Kumbong · Wanyi Ning · Xiaozhe Yao · Sanmi Koyejo · Ce Zhang -
2023 : Announcement and open discussion on DMLR (Selected members of DMLR Advisory Board) »
Ce Zhang -
2023 Workshop: DMLR Workshop: Data-centric Machine Learning Research »
Ce Zhang · Praveen Paritosh · Newsha Ardalani · Nezihe Merve Gürel · William Gaviria Rojas · Yang Liu · Rotem Dror · Manil Maskey · Lilith Bat-Leah · Tzu-Sheng Kuo · Luis Oala · Max Bartolo · Ludwig Schmidt · Alicia Parrish · Daniel Kondermann · Najoung Kim -
2023 Oral: Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time »
Zichang Liu · Jue Wang · Tri Dao · Tianyi Zhou · Binhang Yuan · Zhao Song · Anshumali Shrivastava · Ce Zhang · Yuandong Tian · Christopher Re · Beidi Chen -
2023 Poster: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang -
2023 Oral: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang -
2023 Poster: CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks »
Jue Wang · Yucheng Lu · Binhang Yuan · Beidi Chen · Percy Liang · Chris De Sa · Christopher Re · Ce Zhang -
2023 Poster: Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time »
Zichang Liu · Jue Wang · Tri Dao · Tianyi Zhou · Binhang Yuan · Zhao Song · Anshumali Shrivastava · Ce Zhang · Yuandong Tian · Christopher Re · Beidi Chen -
2023 Poster: FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization »
Zhen WANG · Weirui Kuang · Ce Zhang · Bolin Ding · Yaliang Li -
2022 : OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning »
Youhe Jiang · Xupeng Miao · Xiaonan Nie · Bin Cui -
2022 Poster: NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning »
Wentao Zhang · Zeang Sheng · Mingyu Yang · Yang Li · Yu Shen · Zhi Yang · Bin Cui -
2022 Spotlight: NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning »
Wentao Zhang · Zeang Sheng · Mingyu Yang · Yang Li · Yu Shen · Zhi Yang · Bin Cui -
2022 Poster: Certifying Out-of-Domain Generalization for Blackbox Functions »
Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang -
2022 Poster: Deep and Flexible Graph Neural Architecture Search »
Wentao Zhang · Zheyu Lin · Yu Shen · Yang Li · Zhi Yang · Bin Cui -
2022 Spotlight: Deep and Flexible Graph Neural Architecture Search »
Wentao Zhang · Zheyu Lin · Yu Shen · Yang Li · Zhi Yang · Bin Cui -
2022 Spotlight: Certifying Out-of-Domain Generalization for Blackbox Functions »
Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang -
2021 Poster: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks »
Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li -
2021 Spotlight: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks »
Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li -
2021 Poster: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed »
Hanlin Tang · Shaoduo Gan · Ammar Ahmad Awan · Samyam Rajbhandari · Conglong Li · Xiangru Lian · Ji Liu · Ce Zhang · Yuxiong He -
2021 Spotlight: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed »
Hanlin Tang · Shaoduo Gan · Ammar Ahmad Awan · Samyam Rajbhandari · Conglong Li · Xiangru Lian · Ji Liu · Ce Zhang · Yuxiong He -
2021 Poster: Evolving Attention with Residual Convolutions »
Yujing Wang · Yaming Yang · Jiangang Bai · Mingliang Zhang · Jing Bai · JING YU · Ce Zhang · Gao Huang · Yunhai Tong -
2021 Spotlight: Evolving Attention with Residual Convolutions »
Yujing Wang · Yaming Yang · Jiangang Bai · Mingliang Zhang · Jing Bai · JING YU · Ce Zhang · Gao Huang · Yunhai Tong -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Poster: Distributed Learning over Unreliable Networks »
Chen Yu · Hanlin Tang · Cedric Renggli · Simon Kassing · Ankit Singla · Dan Alistarh · Ce Zhang · Ji Liu -
2019 Oral: Distributed Learning over Unreliable Networks »
Chen Yu · Hanlin Tang · Cedric Renggli · Simon Kassing · Ankit Singla · Dan Alistarh · Ce Zhang · Ji Liu -
2019 Poster: DL2: Training and Querying Neural Networks with Logic »
Marc Fischer · Mislav Balunovic · Dana Drachsler-Cohen · Timon Gehr · Ce Zhang · Martin Vechev -
2019 Oral: DL2: Training and Querying Neural Networks with Logic »
Marc Fischer · Mislav Balunovic · Dana Drachsler-Cohen · Timon Gehr · Ce Zhang · Martin Vechev -
2018 Poster: Asynchronous Decentralized Parallel Stochastic Gradient Descent »
Xiangru Lian · Wei Zhang · Ce Zhang · Ji Liu -
2018 Poster: $D^2$: Decentralized Training over Decentralized Data »
Hanlin Tang · Xiangru Lian · Ming Yan · Ce Zhang · Ji Liu -
2018 Oral: $D^2$: Decentralized Training over Decentralized Data »
Hanlin Tang · Xiangru Lian · Ming Yan · Ce Zhang · Ji Liu -
2018 Oral: Asynchronous Decentralized Parallel Stochastic Gradient Descent »
Xiangru Lian · Wei Zhang · Ce Zhang · Ji Liu -
2017 Poster: ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning »
Hantian Zhang · Jerry Li · Kaan Kara · Dan Alistarh · Ji Liu · Ce Zhang -
2017 Talk: ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning »
Hantian Zhang · Jerry Li · Kaan Kara · Dan Alistarh · Ji Liu · Ce Zhang