Paper ID: 1184
Title: Faster Eigenvector Computation via Shift-and-Invert Preconditioning

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This is a numerical analysis paper on a fast method for computing eigenvectors. The task is of great importance for many machine learning problems, so it is relevant to the machine learning community.

Clarity - Justification:
The paper is well written and is generally easy to follow at a high level. The meat of the paper appears to be in what the authors refer to as their full paper, which is a 40-page supplement to the submission.

Significance - Justification:
The authors make a good case that they are advancing the state of the art in asymptotic complexity. It would be nice to see some experimental results to validate that these theoretical improvements yield practical improvements on problems of interest to the machine learning community.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This appears to be nice work on a problem of importance to the machine learning community. I do have some concerns:
1) It is unlikely that any ICML reviewer will have the time to check the proofs carefully.
2) One might ask whether this is a numerical analysis paper or a machine learning paper.
3) Given that eigenvector computation is a tool, rather than a core problem in the machine learning community, some demonstration of practical significance would be welcome.
Despite these concerns, I would lean favorably towards acceptance if at least one reviewer were willing and able to check the results somewhat carefully.

=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a novel eigensolver based on approximate inverse iterations. The major contribution is solving the linear system approximately using SVRG to get a better dependency on the number of nonzeros of the input and the eigengap. The authors show that, using approximate linear system solves, the algorithm still converges quickly to the true solution.

Clarity - Justification:
The paper is well organized. The concepts are explained clearly and it is easy to read.

Significance - Justification:
Although inverse iteration is not new, the paper shows how SVRG can be effectively used in the linear system solve to get approximate solutions, and how such solutions still serve as good iterates in the power iteration. It is also interesting that, by leveraging SVRG, they get a better dependency that splits the number of nonzeros and the eigengap.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Inverse iteration is a well-known technique in numerical linear algebra for computing eigenvectors efficiently. The bottleneck in directly applying inverse iteration is the linear system solve, which has a bad dependence on the eigengap when using standard iterative methods. The major contribution and novelty of this paper is to show that one can use SVRG to get an approximate linear system solve, and that, using such approximations during inverse iteration, the algorithm can still converge to the true solution. SVRG breaks up the multiplicative dependency on nnz(A) and the eigengap into an additive dependency, thus reducing the overall complexity. SVRG also leads to a natural extension in the streaming setting. The idea is interesting and the authors have provided a careful analysis of the approximation.
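For concreteness, here is a minimal sketch of the shift-and-invert scheme under discussion, with plain gradient descent standing in for the paper's SVRG-based solver; the function names, step sizes, and iteration counts below are illustrative assumptions of mine, not taken from the paper.

    # Minimal sketch: shift-and-invert power iteration with inexact solves.
    # Gradient descent stands in for the paper's SVRG solver; all parameters
    # here are illustrative.
    import numpy as np

    def approx_solve(B, x, num_iters=100):
        # Approximately solve B w = x by gradient descent on the quadratic
        # 0.5 * w^T B w - x^T w (B is positive definite by construction).
        w = np.zeros_like(x, dtype=float)
        step = 1.0 / np.linalg.norm(B, 2)   # 1 / largest eigenvalue of B
        for _ in range(num_iters):
            w -= step * (B @ w - x)
        return w

    def shift_invert_power(A, lam, num_outer=30):
        # Power iteration on B^{-1}, with B = lam * I - A A^T and lam an
        # upper estimate of the top eigenvalue of A A^T within (1 + O(gap)).
        d = A.shape[0]
        B = lam * np.eye(d) - A @ A.T
        rng = np.random.default_rng(0)
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        for _ in range(num_outer):
            v = approx_solve(B, v)          # inexact application of B^{-1}
            v /= np.linalg.norm(v)
        return v                            # approx. top eigenvector of A A^T

The point is that only the accuracy of the inner solve, not its particular implementation, is what the outer power iteration requires.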
My major concern is that Theorem 5 shows the potential function decreases by a constant factor plus a constant error. Through induction, one can only conclude that the potential function will be bounded by a constant error. How, then, can one achieve the overall error O(eps) in the end? It seems one must solve the linear system with a decreasing additive error in Theorem 5 to obtain an eps error in the end.

=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper gives a faster algorithm for computing the top singular vector of a matrix (up to error \epsilon) -- a fundamental problem in many applications. The running time they obtain is nnz(A) + d \cdot f(gap), where the latter term depends on the gap between the top two singular values. Algorithms such as the power method and Lanczos have a running time of nnz(A) \times 1/f'(gap), which could be much slower for many interesting ranges of the gap parameter. The paper also gives a way to "accelerate" in the case where the d \cdot f(gap) term dominates the running time, by appealing to known results. The paper also gives online algorithms, in the model in which we obtain vectors one at a time and the goal is to compute the top eigenvector of the covariance matrix. Overall the paper has significant technical contributions, and it gives a new bound for computing the top singular vector, which is a fundamental problem for matrix data.

Clarity - Justification:
The authors do a very good job of presenting an outline within the space limitations, but the paper does have references to theorems in the full version, which should be avoided in the final version. Some portions about the initialization could be explained better (see details below).

Significance - Justification:
The paper considers the shift-and-invert preconditioning framework to compute the top singular vector of a matrix. This involves (a) computing an approximate top singular value \lambda (the approximation should be within a (1 + O(gap)) factor), and (b) performing power iteration on B^{-1}, where B = \lambda I - AA^T. The paper uses ideas from recent developments on "gap independent" singular value estimation to handle part (a) (though there is substantial novelty in making sure there is no (1/gap) overhead), and for (b), the proof goes by showing that, if initialized well, having a fairly weak approximation for computing B^{-1} x (for a given x) suffices to ensure progress. Overall, the paper puts together a lot of recent techniques in a novel way to obtain better algorithms for eigenvector computation.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
While the writing is generally very good, there are some portions that could use some more exposition:
- How is the solve() procedure in Theorem 8 different from the one in Theorem 5? Lines 759-765 seem to suggest that it is simply an iterated version of the earlier one (described in lines 726-730). It would be nice to use different names, and to point out how exactly they differ.
- Section 6 could use a few more details. It appears that some of the shift-and-invert analysis and the solver from Section 4 are used to compute \lambda. If there is an interesting bootstrapping going on, it would be nice to mention it.

=====