Paper ID: 462
Title: Compressive Spectral Clustering
Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper builds on a couple of recent ideas to improve the speed of spectral clustering algorithms. Spectral clustering has bottlenecks in both computing the eigendecomposition and running k-means on large datasets. The authors show that running k-means on only a sample of the data (in fact a very small sample!) is enough to recover a good clustering.
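The two bottlenecks mentioned in the summary can be seen in a minimal baseline implementation of spectral clustering in the style of Ng et al.; the sketch below is for illustration only and is not the paper's method (the dense eigendecomposition and the full-data k-means are exactly the steps the paper avoids; the helper names are mine):

```python
import numpy as np

def spectral_clustering(W, k):
    """Baseline spectral clustering (Ng et al. style) on an affinity
    matrix W; illustrates the two bottlenecks, not the paper's method."""
    deg = W.sum(axis=1)
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - W / np.sqrt(np.outer(deg, deg))
    # Bottleneck 1: dense eigendecomposition, cubic in N.
    _, vecs = np.linalg.eigh(L)              # ascending eigenvalues
    U = vecs[:, :k].copy()                   # k smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    # Bottleneck 2: k-means on all N embedded points.
    return kmeans_labels(U, k)

def kmeans_labels(X, k, iters=50):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    C = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(X[int(np.argmax(d2))])
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(0)
    return labels
```

On, say, two weakly connected cliques this recovers the two groups; the paper's contribution is to avoid both the eigendecomposition and the N-point k-means.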
Clarity - Justification: The paper is reasonably well written.
Significance - Justification: see below.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The bulk of the authors' contribution seems to be improving the performance of k-means on large datasets (very large N, reasonably large k, k ≈ 1,000). However, in doing so they have completely ignored previous work on speeding up k-means for large datasets. In particular, a promising direction is that of _coresets_: small weighted samples of the data with the property that the optimal solution on the coreset is a near-optimal solution on the full dataset. Many coresets are known for k-means; a comparison with one of the known methods would make the paper significantly stronger.
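To make the coreset suggestion concrete: one simple construction mixes uniform sampling with sampling proportional to squared distance from the data mean, in the spirit of lightweight coresets (Bachem et al.). The sketch below is illustrative only; the function name and the 50/50 mixing constant are my choices, not anything from the paper under review:

```python
import numpy as np

def lightweight_coreset(X, m, rng=None):
    """Weighted coreset of size m for k-means: mix uniform sampling with
    distance-to-mean importance sampling. A sketch of the idea only."""
    rng = np.random.default_rng(rng)
    n = len(X)
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    # Sampling distribution: half uniform, half proportional to d2.
    q = 0.5 / n + 0.5 * d2 / d2.sum()
    idx = rng.choice(n, size=m, p=q)
    weights = 1.0 / (m * q[idx])   # makes the weighted cost unbiased
    return X[idx], weights
```

Running weighted k-means on the returned (points, weights) pair then approximates the k-means objective on the full dataset, which is the kind of baseline a comparison could use.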
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The authors propose an approximation of spectral clustering that is faster by a few orders of magnitude at controllable error rates. Random graph filtering and random projection techniques are employed. Overall a nice paper, at times a bit technical and leaning heavily on referenced work, but broadly convincing.
Clarity - Justification: The mathematical derivations could be detailed a bit more to ease readability between the steps. But I understand that this may be better captured in a journal version or an additional technical report to make the paper more accessible.
Significance - Justification: Theory and results are convincing and appear to show a substantial novel improvement in the field.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
- There are a number of references to related work. It would be good to highlight in more detail the authors' own contributions in contrast to prior work.
- I would prefer that unreviewed work (such as technical reports on arXiv) not be used directly to support claims; after all, papers on arXiv are not (yet) peer reviewed.
- Experiments should be extended to further graphs, to show that the approach generalizes well.
- In the related-work part it would be good to also highlight why the existing approaches are not sufficient.
- In 2.1 you say that L is psd; please add a reference.
- '... and discussed in Sec. 4.1. ...' sounds a bit out of context; did you really mean to refer to 4.1?
- 'All our results may be reproduced with': I hope they 'can' be reproduced.
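On the L-is-psd point: besides a reference, the property follows from a one-line identity, which a few lines of code can check numerically. This uses the combinatorial Laplacian L = D − W as an assumed stand-in for whichever variant the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Random symmetric nonnegative edge weights, no self-loops.
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W   # combinatorial Laplacian L = D - W

# Identity: x^T L x = (1/2) * sum_{ij} W_ij (x_i - x_j)^2 >= 0 for any x,
# hence L is positive semidefinite.
x = rng.standard_normal(n)
quad = x @ L @ x
pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))
assert np.isclose(quad, pairwise)
assert np.linalg.eigvalsh(L).min() > -1e-10   # all eigenvalues >= 0
```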
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper proposes a time-efficient spectral clustering algorithm that combines recent progress in random sampling and interpolation. The result is a new randomized algorithm that approximates the original spectral clustering algorithm proposed by Ng et al. One issue with traditional spectral clustering algorithms is the computational cost: though faster algorithms exist, it is still computationally expensive. The reasons are twofold: 1. the top eigenvectors of the graph Laplacian are expensive to compute; 2. when the number of samples grows huge, running k-means on the N points in the spectral space can also be slow. To address the first concern, the authors propose a sampling technique over the graph, with guarantees that follow naturally from the JL lemma. To address the second issue, the paper suggests running k-means on a random subset of the samples to obtain a sketch of the spectral vectors, this time over the sample dimension, and then reconstructing the original sample labels by an optimization procedure. Again, if certain conditions hold, for example the RIP of the sampling matrix M, the reconstructed labels are “faithful estimations” of the original ones. Furthermore, the authors show there always exist ways of sampling such that the number of nodes required for M to satisfy the RIP is bounded by k\log(k).
Clarity - Justification: The paper is well organized, and references are abundant. I understand that, due to the page limit, the authors had to put a lot of proofs in the appendix.
Significance - Justification: Overall I think this is a good paper. Though most of the proofs in the paper are based on previous results from Puy et al., reducing the complexity to k^2\log^2(k) is a great improvement when the sample size grows large.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): (Summary and assessment as above.) I have three questions: 1. Could the authors explain more about the meaning of “faithful estimations” in line 385? As we all know, k-means suffers from local optima, so the resulting c_j can vary with different initial points. What, then, does it mean to call an estimate of c_j “faithful”? 2. What is the time complexity of solving the optimization (5)? Could the authors explain in detail the fast algorithms for solving (5)? 3. The paper only demos a case with a regular graph where the algorithm works (k\log k in the second step). Though the paper claims there exist methods to reduce this to k\log k for every graph, could the authors provide another example on a general graph?
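For readers following the summary above, the two compressive ingredients can be caricatured in a few lines. This is a loose sketch under strong simplifications and not the paper's algorithm: the ideal low-pass filter U_k U_k^T is applied exactly via a dense eigendecomposition (the paper approximates it with fast polynomial graph filtering), and the reconstruction step is replaced by a naive nearest-centroid assignment rather than the paper's graph-regularized optimization (5); all function names are mine:

```python
import numpy as np

def compressive_sc_sketch(W, k, d=10, n_sub=None, seed=0):
    """Caricature of the compressive pipeline: (1) JL-style features
    from filtering d random signals, (2) k-means on a random subset of
    roughly k log k nodes, (3) naive label assignment (a stand-in for
    the paper's reconstruction step)."""
    rng = np.random.default_rng(seed)
    N = len(W)
    n_sub = n_sub or min(N, 2 * k + int(k * np.log(k + 1)))
    deg = W.sum(axis=1)
    L = np.eye(N) - W / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(L)
    Uk = vecs[:, :k]                       # ideal filter's eigenspace
    # (1) Filter d = O(log N) random signals instead of computing
    #     eigenvectors one by one; rows of R are JL node embeddings.
    R = Uk @ (Uk.T @ rng.standard_normal((N, d)))
    R /= np.linalg.norm(R, axis=1, keepdims=True) + 1e-12
    # (2) k-means on a small random subset of nodes only.
    sub = rng.choice(N, size=n_sub, replace=False)
    C = kmeans_centroids(R[sub], k)
    # (3) Assign all N nodes to the nearest centroid (simplified).
    return np.argmin(((R[:, None] - C[None]) ** 2).sum(-1), axis=1)

def kmeans_centroids(X, k, iters=50):
    C = [X[0]]                             # farthest-point initialization
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(X[int(np.argmax(d2))])
    C = np.array(C)
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C
```

Even in this caricature, k-means only ever sees the subsampled rows, which is the point of the second compression step the questions above probe.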
=====