Paper ID: 1278
Title: Fixed Point Quantization of Deep Convolutional Networks

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a principled way of quantizing an already trained deep convolutional network, with the aim of decreasing the model size while keeping similar accuracy. This is done by optimizing the signal-to-quantization-noise ratio (SQNR).

Clarity - Justification:
The paper is easy to read overall, and the concepts are clearly explained.

Significance - Justification:
This paper contains somewhat significant advances in terms of compressing networks, which may be useful in many scenarios. The main complaint I have about this paper is that the authors do not mention other means of reducing the complexity of a deep net without a significant impact on its accuracy. In particular, the authors do not mention approaches based on "dark knowledge", such as [1], or approaches based on the hashing trick, such as [2]. It would be interesting if the authors commented on these, discussing whether network compression through quantization is orthogonal to the other kinds of compression mentioned above.

[1] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the Knowledge in a Neural Network." arXiv preprint arXiv:1503.02531, 2015.
[2] Chen, Wenlin, et al. "Compressing Neural Networks with the Hashing Trick." Proceedings of the 32nd International Conference on Machine Learning, 2015.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Apart from the main concern mentioned above, I have a few minor comments:
- In problems (11) and (13), the optimizing variables (\gamma and \lambda) should be specified under the min operator.
- In line 420, the equality does not necessarily hold in the mentioned case.
Some typos:
- In line 145 there is a typo ("bit-wdiths").
- I cannot parse the sentence in lines 398-401.
- I cannot parse the sentence in lines 444-446.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces a fixed-point quantization scheme for CNN activations and weights at test time that preserves accuracy while reducing the model size. In particular, the authors introduce an optimization scheme that can dynamically compute the optimal quantization on a per-layer basis, and they contrast their approach with approaches that use a single, global setting. The quantization is obtained by optimizing over the quantization parameters so as to maximize the signal-to-quantization-noise ratio (SQNR).
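To make the SQNR objective concrete, here is a minimal sketch (my own illustration, not code from the paper; the uniform symmetric quantizer, the clipping range, and the tensor shape are all assumptions) of how per-layer SQNR at a given bit-width could be measured:

```python
import numpy as np

def quantize_uniform(w, bits, max_abs=None):
    """Round w onto a symmetric uniform grid covering [-max_abs, max_abs]."""
    if max_abs is None:
        max_abs = np.max(np.abs(w))            # assumed clipping range
    step = 2.0 * max_abs / (2 ** bits - 1)     # quantization step size
    return np.clip(np.round(w / step) * step, -max_abs, max_abs)

def sqnr_db(w, bits):
    """Signal-to-quantization-noise ratio, in dB, of the quantized tensor."""
    w_q = quantize_uniform(w, bits)
    signal = np.mean(w ** 2)
    noise = np.mean((w - w_q) ** 2) + 1e-12    # guard against zero noise
    return 10.0 * np.log10(signal / noise)

# Example: SQNR of a synthetic Gaussian weight tensor at a few bit-widths.
w = np.random.randn(64, 3, 3, 3)
for b in (4, 6, 8):
    print(b, "bits ->", round(sqnr_db(w, b), 1), "dB")
```

For a roughly Gaussian tensor, each additional bit buys on the order of 6 dB of SQNR, which is the kind of per-layer accuracy/size trade-off the proposed optimization appears to balance.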
Clarity - Justification:
The paper is clear and well-written. I appreciated the explicit examples, e.g. around line 200.

Significance - Justification:
This paper presents a relatively incremental advance.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Compressing CNNs at test time for more efficient processing and smaller model sizes is an important problem that helps pave the way toward CNNs running on embedded devices. This paper presents an optimization scheme that quantizes the weights and activations of the CNN. It is well-written and reasonably clear, and the experiments demonstrate better error rates at fixed model sizes compared to baselines. It was not clear to me why the paper does not compare to Sajid et al. 2015, who seem to have a very similar motivation but a different approach. I think the authors should explicitly mention this work in the paper and discuss the benefits of the proposed approach.

I would also appreciate a summary of the differences between this scheme and the equal bit-width scheme. Clearly the performance is better, but what about other factors, for example ease of implementation, speed of the optimization, etc.? I think these tradeoffs should be discussed more explicitly by the authors.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a formal optimization process for converting a floating-point convolutional neural network into a fixed-point version in order to save memory and (potentially) compute time. The approach assumes that the weights and activations follow a Gaussian-like distribution with a longer tail than usual. Using this assumption, the authors pose the quantization process as an optimization that maximizes the SQNR of the weights and activations (both separately and jointly). They conduct experiments on the CIFAR-10 dataset using a smaller ConvNet and on the ImageNet dataset using a larger network. Their experiments show good empirical results, performing better than an equal bit-width quantization across the network.

Clarity - Justification:
The paper is well-written and was generally very easy to follow. Minor issues:
- Line 488: comma instead of full stop.
- Line 395: "can written as" -> "can be written as".

Significance - Justification:
In terms of significance, the paper does not provide a large improvement in empirical performance, since one can simply brute-force the optimization problem (the number of layers is usually not large). However, the paper introduces a formal and clean process for optimizing the bit-widths, and it is an improvement over brute-force search.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The main weakness of the paper is the hand-waving in lines 360-367, where the authors assume that SQNR is a good proxy for classification accuracy.
1. This is not at all obvious, and may well not be true, since the function represented by a multi-layer neural network is highly non-linear and includes inter-layer interactions.
2. It is also not obvious that all weights in a layer are equally important (the SQNR proxy takes the average error).
However, the authors have conducted good experiments on more than MNIST / CIFAR, which adds value to the paper.

=====
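Review #3 notes that the optimization problem could be brute-forced because the number of layers is small. A minimal sketch of such an exhaustive search (again my own illustration, not the paper's procedure; the additive-dB scoring, candidate bit-widths, and size-budget formulation are assumptions):

```python
from itertools import product

def brute_force_bit_allocation(layer_params, layer_sqnr_db, budget_bits,
                               choices=(4, 6, 8, 16)):
    """layer_params[i]    : number of weights in layer i
       layer_sqnr_db(i, b): SQNR (in dB) of layer i quantized to b bits
       budget_bits        : total allowed weight storage in bits"""
    best_score, best_alloc = float("-inf"), None
    for alloc in product(choices, repeat=len(layer_params)):
        cost = sum(n * b for n, b in zip(layer_params, alloc))
        if cost > budget_bits:
            continue                      # allocation exceeds the size budget
        score = sum(layer_sqnr_db(i, b) for i, b in enumerate(alloc))
        if score > best_score:
            best_score, best_alloc = score, alloc
    return best_alloc, best_score

# Toy usage: three layers, a crude ~6 dB-per-bit SQNR model, 8-bit average budget.
params = [1000, 5000, 20000]
alloc, score = brute_force_bit_allocation(params, lambda i, b: 6.0 * b,
                                           budget_bits=8 * sum(params))
print(alloc, score)
```

With L layers and k candidate bit-widths this search enumerates k**L allocations, which is feasible for typical depths but is exactly the cost that the paper's formal, SQNR-based optimization is said to improve upon.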