We would like to thank all the reviewers for their productive feedback.

Assigned Reviewer (AR) 1:

1. "...unifying different type of AEs...approximations given in theorem 3 for casting dae in general form." Proposition 1 and Theorem 1 yield the underlying conditions under which AEs learn sparse representations. Since the DAE satisfies these conditions (as shown via the Taylor expansion in Theorem 3), we claim only that the DAE learns sparse representations. Thus the unification of different AEs is done only in the context of sparsity and not otherwise.

2. "...how these results could generalize to deep AEs..." It is non-trivial to analyse the objective of deep AEs, and our analysis serves as a precursor to that goal. At present there is no study that explains why even single-layer AEs learn sparse representations.

3. "I find result of Th3 unclear...valid for very small corruption...Please clarify." The Taylor expansion is what makes the analysis of the DAE possible; the small-corruption assumption is a consequence of it, and the same assumption is used in Alain and Bengio (2014) and Chen et al. (2014). We doubt it can be avoided if the DAE is to be analysed for sparsity. A brief sketch of the expansion is given at the end of this response.

4. "...I do not understand eq 7...second order derivative...should be in expectation." This is a typo: the entire term is in expectation. Thank you for pointing it out; we have corrected it.

AR5:

1. "...reconstruction residual vector...is i.i.d. random variable with Gaussian...is clearly an approximation." Minimizing the sum-of-squared-errors reconstruction loss already assumes that each residual element follows a zero-mean Gaussian. The i.i.d. assumption does seem strict, but our predictions hold true empirically, which justifies the assumption.

2. "...Some of CIFAR-10 results...moved to appendix..." We moved those results back to the main paper but forgot to remove the footnote.

3. "...subsection 3.2 the authors state that increasing regularization...increasing sparsity...later...decreasing sparsity...have I misunderstood something?" Thank you for pointing this out; the latter is a typo. The sentence should read "increasing sparsity with increasing coefficient", as shown in the figure and as claimed in Section 3.2.

AR6:

1. "I disagree with...Sparse Distributed Representation (SDR) constitutes fundamental reason behind success of deep learning." SDR is at the very foundation of deep learning; please refer to Bengio, Courville, and Vincent, "Representation learning: A review and new perspectives" (TPAMI 2013); Bengio, "Learning deep architectures for AI" (Foundations and Trends in Machine Learning, 2(1):1-127, 2009); and Hinton, "Distributed representations" (1984). Otherwise there would be no reason to prefer deep learning over shallow models such as an SVM applied directly to the data. Consequently, our analysis of AEs for sparsity is an important step towards a fundamental understanding of how neural networks learn SDR.

2. "Authors state SDR captures the generation process of real world data...claim needs clarification..." We have provided the relevant references in the first paragraph of the introduction: Patterson et al. (2007), Olshausen & Field (1997), and Hubel & Wiesel (1959).

AR 1, 5, 6: To the best of our knowledge, our work is the first to formally analyse the conditions under which AEs learn sparse representations. In doing so, we have shown that various AEs/activation functions share some very fundamental characteristics (leading to sparsity), even though they were proposed with different underlying intuitions.
We believe these properties will help any future research on AEs, including the analysis of deep AEs, in understanding the representations learned, since learning good representations is the very goal of deep learning.
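Sketch for AR1, point 3 (a minimal illustration under assumptions of our choosing, not necessarily the exact setup of Theorem 3): assume additive isotropic Gaussian corruption $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, a squared reconstruction loss, and let $r(\cdot)$ denote the reconstruction function with Jacobian $J_r$. A first-order Taylor expansion of $r$ around the clean input $x$ gives, for small $\sigma$,

$$
\mathbb{E}_{\epsilon}\!\left[\| r(x+\epsilon) - x \|^2\right]
\;\approx\; \mathbb{E}_{\epsilon}\!\left[\| r(x) + J_r(x)\,\epsilon - x \|^2\right]
\;=\; \| r(x) - x \|^2 + \sigma^2 \| J_r(x) \|_F^2 ,
$$

since $\mathbb{E}[\epsilon] = 0$ removes the cross term and $\mathbb{E}[\epsilon \epsilon^{\top}] = \sigma^2 I$ yields the Frobenius-norm penalty on the Jacobian. Up to higher-order terms in $\sigma$, the DAE objective is therefore the plain AE objective plus a contraction-type penalty, which is precisely where the small-corruption assumption enters; this mirrors the treatment in Alain and Bengio (2014).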