Paper ID: 1130
Title: Greedy Column Subset Selection: New Bounds and Distributed Algorithms

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper studies the problem of column subset selection (CSS). The authors propose several variants of the greedy algorithm for CSS. Performance analyses of both the simple greedy and the distributed greedy algorithms are given. An empirical evaluation of the proposed CSS strategies is given on the MNIST and 20 Newsgroups datasets.

Clarity - Justification:
The paper is well written and easy to follow. Important ideas and related work are well discussed.

Significance - Justification:
The paper improves upon previous work on the theoretical analysis of the greedy algorithm for CSS. However, the improvement in the bound does not seem significant. On the other hand, I find it quite interesting that the authors are able to leverage techniques from submodular maximization to scale up the greedy implementation for CSS while preserving similar theoretical guarantees.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I have several questions below and hope that the authors can address them.

-- In line 255, do we need to make sure that previously selected columns in B cannot be chosen again? Given the description, it looks as if the same column could be chosen in all k rounds.

-- In line 301, should it be k instead of k'?

-- As stated in Theorem 1, it seems that one needs to choose at least r columns (where r is potentially much larger than k) using the greedy algorithm in order to bound the performance of the greedy output within some factor of the optimal solution of size k. But this does not really bound the performance of the greedy algorithm when it is restricted to output only k columns. I wonder whether there is any way to modify the analysis to give a more useful bound for this scenario.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper studies the column subset selection problem (technically, a slight generalization) and specifically the natural greedy algorithm for this problem (repeatedly find the column that most improves the objective and add it to the solution set). The two main contributions are:

1. A refined analysis of the greedy algorithm which gives a (1-\eps) competitive ratio with O(k/(\eps \sigma_{\min}(OPT))) columns. This improves previous work by replacing \min_S \sigma_{\min}(S) with \sigma_{\min}(OPT), which is substantially better.

2. A randomized distributed version of this algorithm that partitions the dataset into shards, solves the CSS problem on each shard, and aggregates the solutions. A multi-round version of this algorithm also achieves a (1-\eps) competitive ratio, but it may require as many as O(\kappa k/(\eps \sigma_{\min}(OPT))) columns, where \kappa is the condition number of OPT.

The paper considers some extensions involving random projections and also performs an empirical evaluation, using the method as a preprocessing step for classification problems. The large-scale experiments (the important ones) are brief and could use further description, but the results seem promising.
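For concreteness, here is a minimal numpy sketch of the greedy procedure and the one-round distributed variant as I read them; the function names, the Frobenius-norm objective, and the random partition/aggregation details are my own illustration under those assumptions, not the authors' implementation.

import numpy as np

def greedy_css(A, k):
    # Naive greedy CSS for min ||A - P_S A||_F^2: at each step, add the
    # candidate column whose inclusion best reduces the residual.
    # Recomputes a QR per candidate -- a sketch, not an optimized version.
    n = A.shape[1]
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue  # skip already-selected columns
            S = A[:, selected + [j]]
            Q, _ = np.linalg.qr(S)  # orthonormal basis for span(S)
            err = np.linalg.norm(A - Q @ (Q.T @ A)) ** 2  # Frobenius residual
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected

def distributed_css_one_round(A, k, shards, seed=0):
    # One-round distributed variant: randomly partition the columns into
    # shards, run greedy on each shard, then run greedy once more on the
    # union of the shard solutions (my reading of the scheme, hence
    # illustrative only).
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(A.shape[1]), shards)
    pooled = [int(p[j]) for p in parts for j in greedy_css(A[:, p], k)]
    final = greedy_css(A[:, pooled], k)
    return [pooled[j] for j in final]

In this form the per-step cost is dominated by the repeated projections; I assume the paper's implementation uses incremental (rank-one) updates instead, but the selection rule is the same.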
Clarity - Justification:
The paper, for the most part, is really nicely written. I think the experiments section could be reworked to emphasize the large-scale experiments over the small-scale ones.

Significance - Justification:
Column subset selection is an important problem, and this paper makes good theoretical progress. Both the analysis of the greedy algorithm and the distributed setting are nice.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Generally, I liked the paper, but I do worry about the empirical evaluation.

1. I think the statement of the multi-round distributed CSS result (Thm 3) is a bit underspecified. The number of columns in the solution is then rk', right? Each epoch produces k' columns, and all of them are used in the final r-round solution. So the total number of columns used here is rk' = O(\kappa(OPT)/\eps \times k/\sigma_{\min}(OPT)), which scales with 1/\sigma^2_{\min}(OPT) since \kappa(OPT) = \sigma_{\max}(OPT)/\sigma_{\min}(OPT).

2. Do you know what the best possible result for the distributed setting is?

3. For the experiments, I would prefer to see results on the larger dataset as opposed to the smaller one. Figure 1 is on the MNIST problem, right? (The caption does not say, but it is referenced in the text.) This matters especially because one of the paper's claims is a computational one, yet it is not apparent in the smaller-scale experiment, where the 2-Phase algorithm is faster.

4. On the larger-scale experiment, what is the performance of the 2-Phase baseline? I think this is actually an important point in the evaluation of the paper. If the proposed algorithm and its variants are slower than the 2-Phase variant, then the upside of the new algorithm is minimal. I therefore think that experimental results on the larger-scale problem, comparing against the 2-Phase algorithm, are critical for selling the message of the paper.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper presents a novel analysis of a well-known greedy algorithm for the column subset selection problem. Additionally, the authors describe a distributed implementation of this algorithm. I find the contributions of the paper interesting and significant.

Clarity - Justification:
This is a very well written paper, with the contributions stated clearly up front.

Significance - Justification:
An important novel analysis of the natural greedy algorithm for the column subset selection problem.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I found the results of this paper worthy of publication in ICML. A novel analysis of the greedy algorithm for the column subset selection problem addresses an open problem in the relevant literature. The distributed implementation of the algorithm is also interesting; my only comment to the authors is to compare their results with those of Theorems 62 and 74 in http://arxiv.org/pdf/1504.06729v2.pdf.

=====