Paper ID: 563
Title: A Subspace Learning Approach for High Dimensional Matrix Decomposition with Efficient Column/Row Sampling

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a novel method for sparse and low-rank matrix recovery. Their method crucially relies on adaptive sampling of rows and columns (as opposed to uniform sampling).

Clarity - Justification:
The paper is well written and easy to follow. The formatting seems wrong, though; e.g., the font/formatting is not consistent with the other submissions I have seen. The authors should check whether they are using an incorrect style file.

Significance - Justification:
In standard low-rank matrix approximation, there is a rich body of literature on adaptive sampling methods that improve theoretical and empirical accuracy. It is thus quite natural and interesting to explore similar ideas in the context of sparse+low-rank modeling. However, the authors are misleading in the way they present some important aspects of their work/contributions (as noted in the detailed comments below). If this were a journal, I'd suggest a revise-and-resubmit, but the inaccuracies in this paper are too substantial for acceptance into ICML as is. That said, I do think the high-level ideas of this work are quite interesting.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
- line 110: ares -> are
- The results in Thm 1 only hold under the random orthogonal model assuming no noise, which simplifies the analysis but is not realistic or general. In contrast, the results of Mackey et al. tackle the more general noisy setting and do not make the random orthogonal model assumption. The authors highlight their improved sampling complexity as a main contribution, but it is quite misleading to compare these two bounds, as they are not an apples-to-apples comparison.
- line 132: Several different methods are proposed in Mackey et al. (2011a) for low-rank matrix approximation, including the generalized Nystrom method but also column sampling and random projection.
- line 187: Why not include the proof in the appendix?
- line 290: It seems that this online variant is at odds with the idea of adaptively selecting rows/columns. Also, the column-projection approach in Mackey et al. (2011a) also seems amenable to online processing of data points, albeit in a minibatch context.
- line 298: This isn't accurate, as low coherence implies small leverage scores. In other words, by working with incoherent matrices, uniform random sampling is in fact nearly optimal relative to sampling from leverage scores (see the short derivation after this list).
- Section 3.2: Several other 'adaptive sampling' methods have been proposed in the context of low-rank matrix approximation and should be cited/discussed to make it clear that many of these ideas are not novel (though the application of these ideas to the sparse+low-rank domain seems to be novel, afaik). See, for instance, [Kumar et al., Sampling Methods for the Nystrom Method, JMLR, 2012] for details.
- Would it be possible to extend the work of Mackey et al. (2011a) to also allow for adaptive sampling?
- Figure 5: These are promising results.
- Section 4.3: Mackey et al. (2011a) present results for their DFC method on a similar dataset. Have the authors tried DFC on this particular experiment and compared the results with the proposed method? I suspect that the results will be comparable, and thus I am skeptical about the usefulness of the results presented in this section.
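To make the line-298 point concrete, here is the standard derivation (this is textbook material on leverage scores, not anything specific to the paper). For an orthonormal basis $U \in \mathbb{R}^{n \times r}$ of the column space, the row leverage scores and the coherence are

\[
  \ell_i = \|U_{i,:}\|_2^2, \qquad
  \mu(U) = \frac{n}{r} \max_{1 \le i \le n} \ell_i, \qquad
  \sum_{i=1}^{n} \ell_i = r,
\]

so every leverage score satisfies

\[
  \ell_i \;\le\; \mu(U)\,\frac{r}{n},
\]

while the average score is exactly $r/n$. Hence, when $\mu(U) = O(1)$, all leverage scores are within a constant factor of uniform, and sampling from the leverage-score distribution is essentially indistinguishable from uniform sampling.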
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper tries to solve the low-rank+sparse decomposition problem for large matrices. The new approach is based on first sampling a subset of rows/columns, then applying a traditional SDP-based algorithm to the subsampled columns, and then doing some post-processing. The running time is better than that of previous algorithms.

Clarity - Justification:
The main part of the paper is clear. The part that discusses the better sampling technique is a bit confusing (as explained below).

Significance - Justification:
A clean way to improve the running time of low-rank+sparse matrix decomposition.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The traditional way to solve this problem is via a semidefinite program, which can be slow in practice. Previously there were also algorithms that use sampling techniques. This paper proposes a clean approach that combines a sampling-based procedure with the SDP-based approach, which improves the running times of the previous approaches. The main idea of the paper is to learn the column subspace by subsampling columns of the original matrix and applying low-rank+sparse decomposition to this much smaller matrix. Once the column space is determined, the algorithm uses an l_1 regularizer to find the low-rank+sparse representation of each column of the matrix (a sketch of this pipeline follows the reviews). The algorithm is fairly clean, and the analysis mostly follows from combining the work on low-rank+sparse decomposition with column sampling techniques for preserving subspaces.

The paper also proposes a better way to reduce the number of sampled rows/columns. However, it seems that the paper still needs to assume that a uniform sample of C_r r rows contains enough information. It is then not entirely clear to this reviewer why the procedure gives any benefit. The sampling procedure in Algorithm 3 still applies the SDP algorithm to a matrix with C_r r columns/rows, so it does not really seem to reduce the amount of computation required. The authors should make sure to emphasize what assumptions the better sampling algorithm requires and what its benefit is.

Overall, I think this is a reasonable paper that gives a clean analysis for a faster low-rank+sparse matrix decomposition algorithm.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a low-rank+sparse decomposition method which utilizes column and row subsampling to mitigate the computational burden. Additionally, the authors present iterative methods for choosing row and column samples which are plausibly superior to uniform sampling strategies.

Clarity - Justification:
I can't say "excellent" because the material is inherently challenging, but the authors should be commended for compressing a lot of content into the page limit while retaining intelligibility.

Significance - Justification:
A robust, scalable low-rank+sparse decomposition primitive could impact multiple areas of machine learning, especially if a software implementation is widely accessible.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
.

=====
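Sketch referenced in Review #2: a minimal rendering of the pipeline as that reviewer reads it. The helper names, the uniform column sample, the plain SVD standing in for the paper's SDP-based decomposition of the subsampled matrix, and the IRLS approximation of the l_1 fit are all illustrative assumptions, not the paper's actual procedure.

import numpy as np

def learn_column_subspace(M, num_cols, rank, rng):
    """Uniformly sample columns and estimate the low-rank column space.
    The paper applies a full sparse+low-rank decomposition to the
    subsampled matrix; a plain SVD stands in for that step here."""
    idx = rng.choice(M.shape[1], size=num_cols, replace=False)
    U, _, _ = np.linalg.svd(M[:, idx], full_matrices=False)
    return U[:, :rank]

def decompose_column(c, U, n_iters=50, eps=1e-6):
    """Recover the low-rank and sparse parts of one column by solving
    min_q ||c - U q||_1, here via iteratively reweighted least squares."""
    w = np.ones_like(c)
    for _ in range(n_iters):
        # Weighted least-squares step: argmin_q ||w * (c - U q)||_2^2.
        q = np.linalg.lstsq(U * w[:, None], w * c, rcond=None)[0]
        res = c - U @ q
        # Reweight so the squared loss approximates sum_i |res_i|.
        w = 1.0 / np.sqrt(np.abs(res) + eps)
    low_rank = U @ q
    return low_rank, c - low_rank  # sparse part is the residual

# Toy usage on a synthetic low-rank + sparse matrix.
rng = np.random.default_rng(0)
n, r = 500, 5
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # low rank
S = (rng.random((n, n)) < 0.02) * rng.standard_normal((n, n))   # sparse
M = L + S
U = learn_column_subspace(M, num_cols=10 * r, rank=r, rng=rng)
L_hat = np.column_stack([decompose_column(M[:, j], U)[0] for j in range(n)])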