Reviewer 1:

We thank the reviewer for succinctly summarizing our contribution in this paper. We have corrected the errors you identified. We also thank the reviewer for pointing out the problems with our assumption that SQNR is a good proxy for classification error. We have revised our manuscript to account for the above observations as follows: “In deep learning, there is no well-formulated relationship between SQNR and classification accuracy. However, it is reasonable to assume that, in general, a higher quantization noise level leads to worse classification performance. Given that SQNR can be approximated theoretically and analyzed layer by layer, we focus on developing a theoretical framework to optimize for the SQNR. We then conduct empirical investigations into how the proposed optimization for SQNR affects the classification accuracy of the DCN. Our findings are reported in Section 5.”

***************

Reviewer 2:

We thank the reviewer for pointing us to alternative approaches to handling the complexity of deep networks without impacting accuracy. We have revised our manuscript accordingly and have added these references: “Other approaches to handling the complexity of deep networks include: (a) leveraging high-complexity networks to boost the performance of low-complexity networks, as proposed in Hinton et al. (2014), and (b) compressing neural networks using hashing (Chen et al., 2015). These methods are complementary to our proposed approach, and the resulting reduced-complexity networks can be easily converted to fixed point using our proposed method. In fact, the AlexNet-like network we perform our experiments on in Section 5.2 was trained with the dark knowledge approach, using the inception network (Ioffe & Szegedy, 2015) trained on ImageNet as the master network. The authors of Chen et al. (2015) also acknowledge that the fixed-point representation of networks can be readily incorporated with HashedNets.”

We also thank the reviewer for pointing out the typos. We have revised the manuscript accordingly.

***************

Reviewer 3:

We thank the reviewer for the comments regarding the comparison with Sajid et al. (2015). We are aware of the paper by Sajid et al. and have in fact cited it in the last paragraph of Section 2. As noted in that paragraph, our approach is similar in spirit to that of Sajid et al., i.e., converting a pre-trained DCN model into a fixed-point equivalent. However, while Sajid et al. perform an exhaustive search to identify the right quantization for each layer, our approach is to formulate an SQNR optimization problem and derive an analytical solution for a suitable quantization of each layer of the network. We have revised the manuscript to discuss the benefit of the proposed approach as follows: “The benefit of our approach, as opposed to the brute-force method, is that it is grounded in a theoretical framework and offers an analytical solution for the bit-width choice per layer that optimizes the SQNR for the network. This offers an easier path to generalizing to networks with a significantly larger number of layers, such as the one recently proposed by He et al. (2015).”
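To make the role of SQNR concrete for this discussion, below is a minimal sketch (illustrative only, not code from the paper; the uniform quantizer, the Gaussian test tensor, and all names are assumptions made for this example) of how the empirical SQNR of a quantized layer can be measured at different bit-widths:

```python
# Illustrative sketch: measure the empirical SQNR of a uniformly quantized tensor.
import numpy as np

def uniform_quantize(x, bit_width):
    """Quantize x onto a symmetric uniform grid with 2**bit_width levels."""
    scale = np.max(np.abs(x))
    levels = 2 ** (bit_width - 1)            # signed representation
    step = scale / levels
    return np.clip(np.round(x / step), -levels, levels - 1) * step

def sqnr_db(x, x_q):
    """Signal-to-quantization-noise ratio in dB."""
    noise = x - x_q
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
weights = rng.standard_normal(10000)          # stand-in for one layer's weights
for b in (4, 6, 8, 10):
    print(b, round(sqnr_db(weights, uniform_quantize(weights, b)), 1))
# For a uniform quantizer, SQNR in dB grows roughly linearly with bit-width
# (on the order of 6 dB per added bit), which is the kind of relationship
# that makes an analytical, per-layer treatment tractable.
```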
We also thank the reviewer for the suggestion regarding discussing the tradeoffs between the proposed scheme and the equal bit-width scheme. We have highlighted these tradeoffs and added a note on our view of them in our revised manuscript as follows: “The experiments on our CIFAR-10 network and the AlexNet-like network demonstrate that the proposed cross-layer bit-width optimization offers a clear advantage over the equal bit-width scheme. As summarized in Table 3 and Table 5, a simple offline computation of the inter-layer bit-width relationship is all that is required to perform the optimization. However, in the absence of a customized design, the implementation of the optimized bit-widths can be limited by the software or hardware platform on which the DCN operates. In this case, the optimized bit-width needs to be rounded up to the next supported bit-width, which may in turn impact the network's classification accuracy.”
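To illustrate the rounding step described in the quoted passage, here is a minimal sketch (the supported bit-width set and the per-layer values are hypothetical examples, not numbers from our experiments):

```python
# Illustrative sketch: round each optimized (possibly fractional) per-layer
# bit-width up to the next bit-width supported by the target platform.
SUPPORTED_BIT_WIDTHS = (4, 8, 16)             # assumed platform constraint

def round_up_to_supported(bit_width, supported=SUPPORTED_BIT_WIDTHS):
    """Return the smallest supported bit-width >= the requested one."""
    for candidate in sorted(supported):
        if candidate >= bit_width:
            return candidate
    return max(supported)                     # saturate at the largest width

optimized = [5.3, 7.8, 9.1, 12.0]             # hypothetical per-layer optima
deployed = [round_up_to_supported(b) for b in optimized]
print(deployed)                               # -> [8, 8, 16, 16]
```

Rounding up is conservative: it preserves (or improves) the per-layer SQNR predicted by the optimization, at the cost of some of the memory and compute savings.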