Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization

Eldad Meller · Alexander Finkelstein · Uri Almog · Mark Grobman

Pacific Ballroom #142

Keywords: [ Statistical Learning Theory ] [ Computer Vision ]


Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks: for a given layer, individual output channels can be scaled by any factor, provided that the corresponding weights of the next layer are inversely scaled. Therefore, a given network has many factorizations which change the weights of the network without changing its function. We present a conceptually simple and easy-to-implement method that uses this property, and show that proper factorizations significantly decrease the degradation caused by quantization. We show improvement on a wide variety of networks and achieve state-of-the-art degradation results for MobileNets. While our focus is on quantization, this type of factorization is applicable to other domains such as network pruning, neural network regularization, and network interpretability.
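
The scaling degree of freedom described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' method for choosing factorizations; it only checks that rescaling leaves the network function unchanged. The toy two-layer network, its sizes, and all names (`W1`, `b1`, `W2`, `s`, `forward`) are hypothetical, and the sketch assumes a ReLU activation with strictly positive scale factors, so that ReLU(s·z) = s·ReLU(z).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: two fully connected layers with a ReLU between them.
W1 = rng.normal(size=(8, 4))   # layer 1: 4 inputs -> 8 output channels
b1 = rng.normal(size=8)
W2 = rng.normal(size=(3, 8))   # layer 2: 8 inputs -> 3 outputs


def forward(x, W1, b1, W2):
    """Compute W2 @ relu(W1 @ x + b1)."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h


# One scale factor per output channel of layer 1.
# Assumption: factors are positive so they commute with the ReLU.
s = rng.uniform(0.1, 10.0, size=8)

W1_scaled = W1 * s[:, None]    # scale each output channel (row) of layer 1
b1_scaled = b1 * s             # the bias of that channel is scaled the same way
W2_scaled = W2 / s[None, :]    # inversely scale the matching input columns of layer 2

x = rng.normal(size=4)
y_original = forward(x, W1, b1, W2)
y_factorized = forward(x, W1_scaled, b1_scaled, W2_scaled)

# The weights differ, but the outputs agree up to floating-point rounding.
max_diff = float(np.max(np.abs(y_original - y_factorized)))
print("max output difference:", max_diff)
```

Because the per-channel weight ranges change while the function does not, a quantizer applied to the rescaled weights sees a different distribution, which is the property the paper exploits.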
