Paper ID: 224 Title: Fast methods for estimating the Numerical rank of large matrices

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a new method for estimating the numerical rank of a matrix in a computationally efficient manner. The paper ties together several ideas in an elegant way: it uses the density of states (DOS) as a surrogate for the spectrum, and the Lanczos spectroscopic method to approximate the DOS function. The DOS function is then used to find the best eigengap. The paper then shows how to use stochastic estimators of polynomial or Chebyshev filters to estimate the trace of the spectral projector, which yields the rank. Experiments show that the new method accurately and efficiently identifies the eigengap between "signal" and "noise", and estimates the numerical rank with respect to that gap.

Clarity - Justification:
Overall the paper is well written and easy to follow. However, some details that might have helped follow the paper are omitted (see comments below). In addition, I found the order in which the authors present the different components of their method to be confusing. Perhaps a better mapping between the "key concepts" section and the following sections would have helped. I was also missing a "background" section or subsection where the authors explain the current state of the art for this problem. I am familiar with rank-revealing QR and various random-projection-based methods, and I was expecting a deeper discussion of those (and perhaps other methods I am not familiar with) in relation to the proposed method.

Significance - Justification:
As far as I am aware, the paper presents a significantly faster method for estimating numerical rank compared with existing methods. This is an important practical problem: given the extraordinary speed gains shown in the paper, this could end up being a widely used subroutine for anyone using spectral methods on large matrices and large graphs. I would also note that this method would probably be of use beyond machine learning, since rank estimation is crucial in many applications.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I found the following two issues hard to follow:
1. The paper alternates between discussing scalar functions and matrix functions, e.g. the comment before equation 4 about interpreting P as a step function, or using the functions T_k( ) as matrix functions. While this is well defined for polynomials, in general I found it confusing: for example, in what specific way are the authors suggesting applying T_k to matrices?
2. The paper alternates between PSD matrices, matrices with eigenvalues in [-1, 1], and rectangular matrices. I would appreciate it if the authors made clearer how they transform one problem into the other. For a matrix with eigenvalues in [-1, 1], how does one create a PSD matrix? For a rectangular matrix A, does an explicit product A^T A have to be formed, or can this be avoided? (See the sketch below.)

Comments:
- line 204: why can P be interpreted as a step function of A?
- equation 11: what are the \beta_j's?
- figure 2: I think adding a subplot with the sorted eigenvalues or cumulative singular values could be helpful.
- lines 480-489: I found this hard to follow. Where does F(p) come from? Why is this the right way to approach \phi(t), and why is an interpolation needed?
- line 587: which delta was used? How sensitive is this choice?
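To illustrate point 2 above, here is a minimal sketch of what I have in mind (my own illustration, not the authors' code; all names and parameters are mine): a Hutchinson-type trace estimate of a Chebyshev polynomial filter applied to a matrix B that is only accessed through matrix-vector products, so that for rectangular A one can take B = A^T A without ever forming it.

    import numpy as np

    def estimate_filtered_trace(matvec, n, coeffs, num_probes=30, seed=0):
        # Hutchinson estimate of tr(p(B)), where p(t) = sum_k coeffs[k] * T_k(t)
        # and B is accessed only through matvec(v) = B @ v.
        # Assumes the eigenvalues of B lie in [-1, 1] and len(coeffs) >= 2.
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(num_probes):
            v = rng.choice([-1.0, 1.0], size=n)        # Rademacher probe vector
            t_prev, t_curr = v, matvec(v)              # T_0(B)v, T_1(B)v
            acc = coeffs[0] * t_prev + coeffs[1] * t_curr
            for c in coeffs[2:]:
                # three-term recurrence: T_{k+1}(B)v = 2 B T_k(B)v - T_{k-1}(B)v
                t_prev, t_curr = t_curr, 2.0 * matvec(t_curr) - t_prev
                acc += c * t_curr
            total += v @ acc                           # v^T p(B) v
        return total / num_probes

    # For rectangular A, B = A^T A need never be formed explicitly:
    # matvec = lambda v: A.T @ (A @ v)   (after scaling so spec(B) lies in [-1, 1])

If the coefficients correspond to a Chebyshev approximation of a step function whose transition sits at the selected eigengap, the returned value is an estimate of the numerical rank; clarifying whether this is indeed the intended construction would help the reader.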
Minor comments:
- line 159: typo "the nd"
- line 452: typo "Lin Lin"
- line 550: typo "is a now"
- line 805: typo "trails"

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
In this paper, the authors propose two efficient algorithms for estimating the numerical rank of large-scale matrices using two polynomial filters: the McWeeny filter and the Chebyshev filter.

Clarity - Justification:
See detailed comments below.

Significance - Justification:
See detailed comments below.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Pros:
- This paper is well written and easy to follow.
- The proposed solutions include detailed routines for selecting both the threshold parameter and the corresponding degree used in each polynomial filter.
- Empirical results demonstrate the efficiency and accuracy of the proposed algorithms.

Suggestions:
- A comparison to another numerical-rank estimation approach would be more convincing, for example the serial version of the randomized algorithm using Chebyshev polynomials proposed in [Zhang et al., 2015].
- Place more emphasis on the significance to the machine learning community. Otherwise, this paper reads more like a paper in pure numerical linear algebra.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes rank estimation for large (sparse) matrices using stochastic trace estimation on the column-space projector. The projector is approximated by a polynomial of the matrix, so the method only uses matrix-vector multiplications. The method is supported by numerical experiments establishing the quality of the approximation and the performance.

Clarity - Justification:
The paper is well written. The theory is presented in an accessible manner with a fortunate choice of notation. The paper is well organized and the contribution is clear.

Significance - Justification:
Rank estimation is an important sub-problem in many applications. The manuscript gives a good set of examples and hence motivations. The analysis of and work with large datasets has become very important, and the paper has a potentially large audience as a result.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The only weakness is the selection of the experiments demonstrating the accuracy and performance. A single (sparse) example is mentioned, of size 1.25x10^5. This reviewer encourages the authors to run on at least two more very large matrices. A synthetic experiment would be fine: one could simply generate a sparse rank-d matrix as XX^T with a sparse n x d factor X (see the sketch below). This would also allow one to verify the quality of the approximation for very large matrices. The run times for the experiments presented in Table 1 should also be listed, with the SVD baseline included.

- l. 241: typo, "supplementatary"
- l. 246: typo, "approximatedby"
- l. 289-295: line spacing affects readability
- l. 396-399: this is a bit repetitive; the basic idea has been laid out in an earlier section (2.2)
- l. 452: check citation, Lin v. Lin Lin

=====
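A minimal sketch of the kind of synthetic test matrix suggested in Review #3 (the function name, sizes, and density are illustrative only):

    import scipy.sparse as sp

    def synthetic_low_rank(n, d, density=1e-4, seed=0):
        # Sparse n x n PSD matrix of rank at most d, built as X X^T
        # from a sparse n x d factor X; d serves as the ground-truth rank.
        X = sp.random(n, d, density=density, format="csr", random_state=seed)
        return X @ X.T

    A = synthetic_low_rank(1_000_000, 100)   # one possible "very large" instance

Since the factor X is random, the rank is d with overwhelming probability, which makes it easy to check the estimated numerical rank against ground truth at sizes well beyond 1.25x10^5.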