We would like to thank the reviewers very much for their feedback and helpful suggestions.

We will expand our discussion of how our runtime bounds compare to other methods, such as exact projection or projection via randomized PCA. We would like to point out that, aside from performing a full SVD, which is infeasible on many matrices, any algorithm approximating PCR will have a dependence on the eigenvalue gap. Iterative techniques such as the power method or Lanczos algorithm will depend on both the gap *and* the number of principal components desired. However, they *will* have a better gap dependence than our algorithm, so, as Reviewer 2 noted, there is a tradeoff; we will discuss this in more detail. Roughly speaking, any technique with no gap dependence (e.g. the typical bounds given for randomized PCA) will give an approximation akin to the weaker one discussed in Section 3.2, in which a relaxed form of PCR is performed: large principal components are kept, small principal components are removed, and intermediate principal components are 'dampened'.

Reviewer 2 also correctly noted that we have some dimensional mismatches, due to switching between problems on an n x d matrix (regression, projection, and ridge regression) and applications of the d x d ridge regression operator. We will correct these errors, along with the other errors and omissions pointed out.

Some responses to specific comments:

- “Line 124, this smoothed projection operator seems funny, since wouldn’t we want something like A*inv(A^TA+lambdaI)*A^T instead of inv(A^TA+lambdaI)*A^T*A?” The operator you give is also a smoothed projection operator, projecting vectors in R^n smoothly onto A's column principal components. We use the d-dimensional operator because our PCR algorithm first computes A^Tb (a d-dimensional vector) before performing regression using this vector. It is a good point that, at this stage of the paper, it is confusing why we use the d-dimensional operator rather than the n-dimensional one, and we will clarify/add a footnote. (A minimal illustrative sketch of the two operators is included after these responses.)

- “Summary above Lemma 2.1: do the iterative Hessian Sketch (Pilanci and Wainwright) and LSRN (Mahoney et al.) give the same bounds?” The bounds we cited are typical results achievable with stochastic gradient type methods; however, depending on the input matrix, faster runtimes can be obtained with traditional iterative methods and sketching methods. Both the Iterative Hessian Sketch and LSRN have cubic dependence on the dimension of the data points (d in our paper) but avoid a condition number dependence, so our cited runtime will be faster whenever kappa_lambda < d. Regardless, the runtimes given by the Hessian Sketch and LSRN are similar to (somewhat higher, in the most general case, than) the runtimes achievable by using the Nelson-Nguyen and Cohen et al. papers that we cite to construct randomized preconditioners. Any of these algorithms can be used with our PCA projection and PCR algorithms. In the final version, we will clarify specifically which papers our regression runtime bounds come from and which solvers those papers use.

- “For all 2, would it be possible to combine the ideas more fundamentally and only make one iterative pass rather than 2?” We did not specifically attempt to develop an 'integrated' algorithm like this; however, we think it is a good question, and we feel that some sort of integrated iteration is likely possible. We considered projection separately and then showed how to obtain PCR largely to simplify the analysis.
- “Section 5.1, is this polynomial new or not? If not, please cite.” As far as we can tell, this specific polynomial has not been considered before. However, similar polynomials based on integrating functions that are large near 0 (see Lemma 5.2 for this interpretation of our polynomial) have been considered, and we will add the correct citations. (An illustrative example of this style of construction is sketched below.)
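For intuition only, here is a classical example of this style of construction; it illustrates the "integrate a function that is large near 0" idea and is not necessarily the specific polynomial used in Section 5.1. Normalizing the integral of (1 - t^2)^q, whose mass concentrates in a window of width roughly 1/sqrt(q) around 0, gives

$$ p_q(x) \;=\; \frac{\int_0^x (1-t^2)^q \, dt}{\int_0^1 (1-t^2)^q \, dt}, $$

an odd polynomial of degree 2q + 1 with p_q(\pm 1) = \pm 1 that transitions from roughly -1 to roughly +1 over that same window, i.e. a polynomial step-function approximation obtained by integrating a function that is large near 0.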
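Finally, as referenced in our response to the smoothed projection operator comment, here is a minimal NumPy sketch contrasting the d-dimensional and n-dimensional operators. It is illustrative only: the dimensions, variable names, and explicit dense solves are ours purely for this example. It shows that both operators apply the same eigenvalue filter, and that the d-dimensional one is the natural operator to apply once A^Tb has been formed.

```python
# Illustrative sketch only (arbitrary sizes, explicit dense solves):
# compares the d x d and n x n smoothed projection operators.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 100, 20, 0.5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

ridge = A.T @ A + lam * np.eye(d)

P_d = np.linalg.solve(ridge, A.T @ A)   # (A^T A + lam I)^{-1} A^T A, acts on R^d
P_n = A @ np.linalg.solve(ridge, A.T)   # A (A^T A + lam I)^{-1} A^T, acts on R^n

# Both operators damp A's singular directions by the factor s_i^2 / (s_i^2 + lam):
# large principal components are (nearly) kept, small ones (nearly) removed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
filt = s**2 / (s**2 + lam)
assert np.allclose(Vt @ P_d @ Vt.T, np.diag(filt))
assert np.allclose(U.T @ P_n @ U, np.diag(filt))

# Consistency between the two: A^T P_n = P_d A^T, so smoothing b in R^n and then
# forming A^T b matches applying the d-dimensional operator to A^T b directly,
# which is the vector our PCR pipeline forms first.
assert np.allclose(A.T @ (P_n @ b), P_d @ (A.T @ b))
```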