Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently via quantization-based compression, which facilitates DNN training in two ways: (1) activations are quantized to shrink memory consumption, and (2) gradients are quantized to reduce communication cost. However, existing methods mostly use a uniform mechanism that quantizes values evenly. Such a scheme can incur large quantization variance and slow down convergence in practice.
In this work, we introduce TinyScript, which applies a non-uniform quantization algorithm to both activations and gradients. TinyScript models the original values with a family of Weibull distributions and searches for "quantization knobs" that minimize the quantization variance. We also analyze the convergence of the non-uniform quantization algorithm on DNNs of varying depths, shedding light on the number of bits required for convergence. Experiments show that TinyScript consistently obtains lower quantization variance and achieves model quality comparable to full-precision training while using 1-2 fewer bits than its uniform-based counterpart.
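The core idea — placing quantization levels non-uniformly so that error concentrates where the data does not — can be illustrated with a small sketch. Note this is not the paper's actual algorithm: TinyScript fits Weibull distributions and derives variance-minimizing "knobs" analytically, whereas the stand-in below simply runs empirical Lloyd iterations to place non-uniform levels on Weibull-like magnitudes and compares the resulting quantization error against evenly spaced levels.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, levels):
    """Map each value to its nearest quantization level."""
    levels = np.sort(levels)
    idx = np.clip(np.searchsorted(levels, x), 1, len(levels) - 1)
    lower, upper = levels[idx - 1], levels[idx]
    return np.where(x - lower < upper - x, lower, upper)

def lloyd_levels(x, n_levels, iters=50):
    """Place levels at the means of their nearest-neighbor cells
    (Lloyd iterations), starting from a uniform grid. Each iteration
    weakly decreases the quantization MSE."""
    levels = np.linspace(x.min(), x.max(), n_levels)
    for _ in range(iters):
        idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
        for k in range(n_levels):
            members = x[idx == k]
            if members.size:
                levels[k] = members.mean()
    return np.sort(levels)

# Heavy-tailed magnitudes, roughly Weibull-like (shape < 1).
x = rng.weibull(0.8, size=50_000)
n_levels = 2 ** 3  # 3-bit quantization

uniform = np.linspace(x.min(), x.max(), n_levels)
nonuniform = lloyd_levels(x, n_levels)

for name, levels in [("uniform", uniform), ("non-uniform", nonuniform)]:
    err = np.mean((quantize(x, levels) - x) ** 2)
    print(f"{name:12s} MSE: {err:.5f}")
```

On such skewed data, the non-uniform levels crowd into the dense low-magnitude region and yield a lower quantization error than the evenly spaced grid — the same qualitative effect the paper exploits with its Weibull-based level placement.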
Author Information
Fangcheng Fu (Peking University)
Yuzheng Hu (Peking University)
Yihan He (Peking University)
Undergraduate (Class of 2020) at Peking University, working on machine learning and data privacy
Jiawei Jiang (ETH Zurich)
Yingxia Shao (BUPT)
Ce Zhang (ETH Zurich)
Bin Cui (Peking University)
More from the Same Authors
- 2022 : OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
  Youhe Jiang · Xupeng Miao · Xiaonan Nie · Bin Cui
- 2022 Poster: NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning
  Wentao Zhang · Zeang Sheng · Mingyu Yang · Yang Li · Yu Shen · Zhi Yang · Bin Cui
- 2022 Spotlight: NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning
  Wentao Zhang · Zeang Sheng · Mingyu Yang · Yang Li · Yu Shen · Zhi Yang · Bin Cui
- 2022 Poster: Certifying Out-of-Domain Generalization for Blackbox Functions
  Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang
- 2022 Poster: Deep and Flexible Graph Neural Architecture Search
  Wentao Zhang · Zheyu Lin · Yu Shen · Yang Li · Zhi Yang · Bin Cui
- 2022 Spotlight: Deep and Flexible Graph Neural Architecture Search
  Wentao Zhang · Zheyu Lin · Yu Shen · Yang Li · Zhi Yang · Bin Cui
- 2022 Spotlight: Certifying Out-of-Domain Generalization for Blackbox Functions
  Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang
- 2021 Poster: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
  Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li
- 2021 Spotlight: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
  Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li
- 2021 Poster: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
  Hanlin Tang · Shaoduo Gan · Ammar Ahmad Awan · Samyam Rajbhandari · Conglong Li · Xiangru Lian · Ji Liu · Ce Zhang · Yuxiong He
- 2021 Spotlight: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
  Hanlin Tang · Shaoduo Gan · Ammar Ahmad Awan · Samyam Rajbhandari · Conglong Li · Xiangru Lian · Ji Liu · Ce Zhang · Yuxiong He
- 2021 Poster: Evolving Attention with Residual Convolutions
  Yujing Wang · Yaming Yang · Jiangang Bai · Mingliang Zhang · Jing Bai · JING YU · Ce Zhang · Gao Huang · Yunhai Tong
- 2021 Spotlight: Evolving Attention with Residual Convolutions
  Yujing Wang · Yaming Yang · Jiangang Bai · Mingliang Zhang · Jing Bai · JING YU · Ce Zhang · Gao Huang · Yunhai Tong
- 2019 : Networking Lunch (provided) + Poster Session
  Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki
- 2019 Poster: Distributed Learning over Unreliable Networks
  Chen Yu · Hanlin Tang · Cedric Renggli · Simon Kassing · Ankit Singla · Dan Alistarh · Ce Zhang · Ji Liu
- 2019 Oral: Distributed Learning over Unreliable Networks
  Chen Yu · Hanlin Tang · Cedric Renggli · Simon Kassing · Ankit Singla · Dan Alistarh · Ce Zhang · Ji Liu
- 2019 Poster: DL2: Training and Querying Neural Networks with Logic
  Marc Fischer · Mislav Balunovic · Dana Drachsler-Cohen · Timon Gehr · Ce Zhang · Martin Vechev
- 2019 Oral: DL2: Training and Querying Neural Networks with Logic
  Marc Fischer · Mislav Balunovic · Dana Drachsler-Cohen · Timon Gehr · Ce Zhang · Martin Vechev
- 2018 Poster: Asynchronous Decentralized Parallel Stochastic Gradient Descent
  Xiangru Lian · Wei Zhang · Ce Zhang · Ji Liu
- 2018 Poster: $D^2$: Decentralized Training over Decentralized Data
  Hanlin Tang · Xiangru Lian · Ming Yan · Ce Zhang · Ji Liu
- 2018 Oral: $D^2$: Decentralized Training over Decentralized Data
  Hanlin Tang · Xiangru Lian · Ming Yan · Ce Zhang · Ji Liu
- 2018 Oral: Asynchronous Decentralized Parallel Stochastic Gradient Descent
  Xiangru Lian · Wei Zhang · Ce Zhang · Ji Liu
- 2017 Poster: ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning
  Hantian Zhang · Jerry Li · Kaan Kara · Dan Alistarh · Ji Liu · Ce Zhang
- 2017 Talk: ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning
  Hantian Zhang · Jerry Li · Kaan Kara · Dan Alistarh · Ji Liu · Ce Zhang