We thank the reviewers for their thoughtful and useful remarks. Due to the limited space, we cannot address all comments and suggestions.

Assigned_Reviewer_3 asked about the empirical comparison of our work to the work of Xie et al. and to other baselines. The paper of Xie et al. is a high-level discussion of the possibility of, and the challenges in, using polynomial activation functions to enable inference on encrypted data. They propose approximating standard activation functions (Sigmoid, ReLU) with polynomials of as low a degree as possible, but they present no implementation details and no experimental results. Their proposed method has several limitations compared to ours: (a) approximating Sigmoids/ReLUs requires polynomials of higher degree, which makes evaluation slower and deeper networks infeasible; (b) training networks with their proposed approximation is harder, since higher-degree polynomials have more local optima and shoot to infinity faster (see the illustrative sketch below); (c) their description does not discuss encoding, parameter selection, or batching, which are key aspects, so there are insufficient details to implement their proposal. We attempted to train networks using their scheme and could not come close to the accuracies presented in this work: the best accuracy we obtained was below 98%, compared to the 99% we achieved with our proposed method.

A recent paper (January 2016) presented a secure implementation of a linear model (McLaren et al., "Privacy-preserving genomic testing in the clinic: a model using HIV treatment"). Their solution can apply only linear models, with an overall latency of ~12 minutes. We have demonstrated similar latency when applying a neural network, and our implementation can process 8192 instances simultaneously. Therefore, despite using a more sophisticated model, we achieve similar latency with roughly four orders of magnitude better throughput (8192 instances per evaluation versus one). We will add a reference to this newly published paper to support our claim of high throughput.
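To make the degree trade-off in points (a) and (b) concrete, the following minimal sketch fits least-squares polynomials of several degrees to ReLU and reports the error inside the fitting interval and the divergence just outside it. It is purely illustrative and is not taken from our implementation or from Xie et al.; the interval [-5, 5], the degrees, and the evaluation point x = 8 are arbitrary choices.

```python
# Illustrative sketch only (all constants are arbitrary choices, not
# taken from our implementation or from Xie et al.): least-squares
# polynomial approximations of ReLU of increasing degree.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

xs = np.linspace(-5.0, 5.0, 1001)   # fitting interval

for degree in (2, 4, 8):
    coeffs = np.polyfit(xs, relu(xs), degree)      # least-squares fit
    poly = np.poly1d(coeffs)
    err_in = np.max(np.abs(poly(xs) - relu(xs)))   # error on the fitting interval
    err_out = abs(poly(8.0) - relu(8.0))           # divergence just outside it
    # Evaluating a degree-d polynomial homomorphically requires a
    # multiplicative depth of roughly log2(d), so higher degrees mean
    # slower evaluation and more noise growth.
    print(f"degree {degree}: max error {err_in:.3f} on [-5, 5], "
          f"error {err_out:.2f} at x = 8")
```

Higher degrees reduce the error inside the fitting interval but cost more multiplicative depth and diverge more sharply outside it, which is what makes both training and deep-network evaluation with such approximations difficult.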
Assigned_Reviewer_4 suggested that the contribution of this work is limited since any engineer could do it with no difficulty. As far as we know, there is no prior work showing that it is possible to apply neural networks securely to data in practical time. While in retrospect it may be possible to find all the building blocks in the existing literature, identifying the right components and meshing them together requires innovation and a multi-disciplinary approach. We see it as a success of our work that we were able to present a blueprint that allows every engineer to build a secure neural network.

Another demonstration of the non-triviality of the solution comes from contrasting our work with the work of Orlandi, Piva and Barni ("Oblivious neural network computing via homomorphic encryption"). They addressed the same problem of applying neural networks securely, but could not find a way to apply the non-linear activation functions to the encrypted data. They therefore suggested that any time an activation function is to be applied, the data be sent back to the client, who decrypts it, applies the activation function, encrypts the result, and sends it back to the server. Their solution is thus more cumbersome, requires additional rounds of communication, leaks more information about the model to the client, and is much slower due to the communication delays. Moreover, it does not allow the server to apply the model to data stored in the cloud while the client is offline.

It is interesting to note that many cryptographers see homomorphic encryption as too slow for practical use; for example, Naehrig, Lauter, and Vaikuntanathan wrote a paper in 2011 with the title "Can homomorphic encryption be practical?". To achieve our results, we had to combine insights from cryptography, engineering, and machine learning. On the machine-learning side, we had to carefully choose our network so that it works well with homomorphic encryption yet still yields high accuracy. On the cryptography side, we had to carefully consider how to encode both the input data and the network parameters, and how to fine-tune the encryption parameters for optimal performance (e.g., by carefully studying the noise-growth properties of the encryption scheme). By working to improve both performance and accuracy from these different directions, we were able to obtain the results that we report.

We thank the reviewers again for their valuable comments and suggestions. We hope that the explanations we provided will help them when considering this work for publication.
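As a brief aside for concreteness, the sketch below illustrates the kind of encoding consideration mentioned above: real-valued inputs and network parameters must be mapped to integers before encryption, and the chosen scale compounds under multiplication. It is not the encoding used in our implementation; the number of fractional bits and the example values are hypothetical.

```python
# Illustrative fixed-point encoding sketch (hypothetical constants,
# not the encoding used in our actual implementation).

PRECISION_BITS = 8  # hypothetical number of fractional bits

def encode(value: float, precision_bits: int = PRECISION_BITS) -> int:
    """Map a real number to a fixed-point integer."""
    return round(value * (1 << precision_bits))

def decode(value: int, scale_factors: int,
           precision_bits: int = PRECISION_BITS) -> float:
    """Undo the encoding; each product of two encoded values multiplies the scale."""
    return value / float((1 << precision_bits) ** scale_factors)

# A single weighted input: the product of two encoded values carries
# the product of their scales, so the decoder must divide twice.
w, x = 0.75, -1.25
product = encode(w) * encode(x)
print(decode(product, scale_factors=2))   # ~ -0.9375 == w * x
```

The practical difficulty alluded to above is that every homomorphic multiplication compounds this scale, so the precision, the plaintext modulus, and the depth of the network have to be balanced against one another.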