Poster
Oscillation-free Quantization for Low-bit Vision Transformers
Shih-Yang Liu · Zechun Liu · Kwang-Ting Cheng
Weight oscillation is a by-product of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely-used $\textit{de facto}$ setting in quantization, aggravates weight oscillation. In this work, we investigate the connection between the learnable scaling factor and quantized weight oscillation using ViT, and we additionally find that the interdependence between the quantized weights in the $\textit{query}$ and $\textit{key}$ of a self-attention layer also makes ViT vulnerable to oscillation. We correspondingly propose three techniques: statistical weight quantization ($\rm StatsQ$) to improve quantization robustness compared to the prevalent learnable-scale-based method; confidence-guided annealing ($\rm CGA$) that freezes weights with $\textit{high confidence}$ and calms the oscillating weights; and $\textit{query}$-$\textit{key}$ reparameterization ($\rm QKR$) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. Extensive experiments demonstrate that our algorithms successfully abate weight oscillation and consistently achieve substantial accuracy improvements on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S surpass the previous state-of-the-art by 9.8% and 7.7%, respectively. The code is included in the supplementary material and will be released.
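The abstract contrasts a statistics-derived weight scale with the usual learnable scaling factor. Below is a minimal, hypothetical sketch of that idea, not the authors' implementation: the quantization scale is computed from weight statistics (here the mean absolute value, an assumed choice for illustration) rather than learned, and a straight-through estimator passes gradients to the latent full-precision weights.

```python
# Sketch only: statistics-based symmetric weight quantization (StatsQ-style idea).
# The scale comes from weight statistics instead of a learnable parameter,
# so it cannot drift during QAT and amplify weight oscillation.
import torch

def statsq_quantize(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    """Symmetric uniform quantization with a statistics-derived scale."""
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 1 for 2-bit levels {-2, -1, 0, 1}
    scale = w.abs().mean() / qmax                     # scale from weight statistics (assumed form)
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()                     # straight-through estimator for gradients

if __name__ == "__main__":
    w = torch.randn(4, 8, requires_grad=True)
    out = statsq_quantize(w, n_bits=2)
    out.sum().backward()                              # gradients reach the latent weights
    print(out.detach().unique(), w.grad is not None)
```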
Author Information
Shih-Yang Liu (HKUST)
Zechun Liu (Meta)
Kwang-Ting Cheng (Hong Kong University of Science and Technology)
More from the Same Authors
- 2022 Poster: SDQ: Stochastic Differentiable Quantization with Mixed Precision »
  Xijie Huang · Zhiqiang Shen · Shichao Li · Zechun Liu · Hu Xianghong · Jeffry Wicaksana · Eric Xing · Kwang-Ting Cheng
- 2022 Spotlight: SDQ: Stochastic Differentiable Quantization with Mixed Precision »
  Xijie Huang · Zhiqiang Shen · Shichao Li · Zechun Liu · Hu Xianghong · Jeffry Wicaksana · Eric Xing · Kwang-Ting Cheng
- 2021 Poster: How Do Adam and Training Strategies Help BNNs Optimization »
  Zechun Liu · Zhiqiang Shen · Shichao Li · Koen Helwegen · Dong Huang · Kwang-Ting Cheng
- 2021 Spotlight: How Do Adam and Training Strategies Help BNNs Optimization »
  Zechun Liu · Zhiqiang Shen · Shichao Li · Koen Helwegen · Dong Huang · Kwang-Ting Cheng