Paper ID: 1051
Title: Principal Component Projection Without Principal Component Analysis

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a method to project onto the top eigenspace of the Gram matrix implicitly, using ridge regression and polynomial approximation. This can be applied to PCR, though not trivially: as the authors point out, the straightforward combination can be ill-conditioned, and hence they provide a less naive combination of the projection with PCR. The work is theoretical; it considers relevant theorems and provides understandable proofs.

Clarity - Justification:
A lot of thought was put into the paper, and it is presented well. The proofs are not all easy, but they are presented in a nice manner.

Significance - Justification:
This seems to be a big new idea. It may not lead to a faster method for many applications, but it is a new type of approach and it may be beneficial in many situations.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I am enthusiastic about this paper; I think it is a clever idea with good implications, and the paper is well executed for the most part, with a lot of thought in it.

My main complaint is a combination of two things: the method may not be especially fast in some situations (since it depends on an eigenvalue gap), and the results are overstated in the abstract and the paper. The abstract, and subsequently the paper, repeatedly emphasizes that the running time does not depend on the number of top principal components; but it does depend on the spectral gap (or its proxy, gamma), which depends implicitly on which principal components one wants, and it will not necessarily lead to a faster algorithm (it is easy to think of situations where it is faster and situations where it is not).

The other major complaint is that, while most of the paper is quite fastidious, there is some confusion about the right-hand side vectors. In Definition 1.1, A_lambda is the same size as A, so it is n x d (it wasn't clear, but when projecting, it is meant to stay in the same ambient dimension "d", right?). Thus the right-hand side "b" should not be of size "d"; otherwise the squared loss ||A_lambda x - b|| does not make sense. I think you want "b" to be of size "n". Similarly, in the ridge regression section (Lemma 2.1), there is an incongruity between eq. (1) (a function of "b" in R^n) and Lemma 2.1 (y is in R^d). You really want y := A^T*b, but this is never made explicit. You might want to call the operation in Lemma 2.1 the ridge regression operator, and distinguish it from the full ridge regression problem (1).

Minor comments:
- Please define P_{A_lambda} in your notation.
- Line 124: this smoothed projection operator seems odd, since wouldn't we want something like A*inv(A^T A + lambda I)*A^T instead of inv(A^T A + lambda I)*A^T*A?
- Line 175: since gamma is not yet defined here, maybe give a reference to the equation later in the paper.
- Summary above Lemma 2.1: do the Iterative Hessian Sketch (Pilanci and Wainwright) and LSRN (Mahoney et al.) give the same bounds?
- In Lemma 2.1, for the error/time bounds, can you please cite the specific papers?
- Thm 3.2: maybe mention the relation of delta and delta' in the statement of the theorem.
- Sections 3.2 and 4 were well thought out.
- For Algorithm 2, would it be possible to combine the ideas more fundamentally and make only one iterative pass rather than two?
- Section 5.1: missing "is" before "uniformly" on line 600.
- Section 5.1: is this polynomial new or not? If not, please cite it.
- Section 6: please mention the dimensions of the matrices used in the tests.
- Missing space before the Rahimi and Recht citation, line 859.
- References, line 888: capitalize "QR".
- Appendix: line 1163 is awkward; here and at line 1174, add "the" before "triangle inequality".

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a methodology for regressing onto a principal subspace without performing PCA (i.e., without performing an eigenanalysis to find the dominant eigenspace of the covariance). This procedure has many advantages (e.g., numerical stability). A full error analysis is performed. Toy experiments, as well as some with real data (MNIST), are provided. In general, I find the approach novel and elegant. My only reservation is the limited number of experiments provided, especially with real data.

Clarity - Justification:
The paper is easy to read. The derivations are correct (though I did not go through all of them).

Significance - Justification:
The paper proposes an important algorithm. Theoretical results and error analysis justify its use. Unfortunately, it contains rather limited experiments.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I believe that the methodology is important and novel. I would recommend adding additional experiments. In general, I am happy to recommend acceptance.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper presents a method to project a vector onto the top principal components of a matrix without explicitly computing these components. The method is based on the observation that ridge regression can be viewed as a "smooth" projection operator, and the main contribution of this paper is to sharpen this approximation. This is done by approximating the symmetric step function with a low-degree polynomial. The proposed method is applied to Principal Component Regression.

Clarity - Justification:
This paper is well organized in general, and some examples are used to give intuition. However, some of the theoretical results (e.g., those in Section 3.1) need a bit more explanation.

Significance - Justification:
This paper uses the ridge regression problem to approximate the projection without first computing principal components. The theoretical results are interesting and important. Also, the application to Principal Component Regression is of interest.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I have not verified the accuracy of all theorems in this paper. However, the arguments are strong and I found the results interesting. It would be nice to experimentally compare the runtime and accuracy with exact projection, or with approximate projection techniques such as randomized PCA. However, I understand that the focus of this paper is on the theoretical analysis and there are page limitations.

=====
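Addendum: a minimal numerical sketch of the ridge-regression-as-smoothed-projection point discussed in Reviews #1 and #3. It is illustrative only, not the paper's algorithm; the toy spectrum, the threshold lambda, and all variable names are assumptions chosen for the example. It follows Review #1's reading of the dimensions (A is n x d, b is in R^n, and the ridge operator acts on y := A^T*b) and shows that each eigendirection of A^T A with eigenvalue s is scaled by s / (s + lambda), i.e., approximately kept when s >> lambda and approximately zeroed when s << lambda.

import numpy as np

rng = np.random.default_rng(0)

# Toy matrix with a clear spectral gap: three singular values of 10, the rest 0.1.
n, d = 200, 10
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
s = np.array([10.0] * 3 + [0.1] * 7)
A = U @ np.diag(s) @ V.T
lam = 1.0                      # threshold sits inside the gap: 10^2 >> lam >> 0.1^2

b = rng.standard_normal(n)     # right-hand side lives in R^n, per Review #1's comment
y = A.T @ b                    # y := A^T b is the vector in R^d the ridge operator acts on

# Smoothed projection of y: (A^T A + lam*I)^{-1} (A^T A) y, i.e., a single ridge solve.
M = A.T @ A
smooth = np.linalg.solve(M + lam * np.eye(d), M @ y)

# Exact projection onto eigenvectors of A^T A with eigenvalue above lam, for comparison.
evals, evecs = np.linalg.eigh(M)
top = evecs[:, evals > lam]
exact = top @ (top.T @ y)

print(np.linalg.norm(smooth - exact) / np.linalg.norm(exact))   # small when the gap is large

As Review #3 notes, the paper's contribution is to sharpen this soft threshold s / (s + lambda) toward a step function by applying a low-degree polynomial to the ridge operator, rather than relying on the single ridge solve shown above.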