Paper ID: 974
Title: How to Fake Multiply by a Gaussian Matrix

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proves properties of random projection via a composition of matrices (a Gaussian matrix times a CountSketch matrix). As an application, under some settings, this composite random projection leads to a faster implementation of a particular solver for separable NMF.

Clarity - Justification:
Some more intuition about the CountSketch matrix and the composition would be useful. I thought the structure of the paper actually obscured its main contribution by placing everything in the context of separable NMF. The main technical contribution is Theorem 2, which seems generally useful. Separable NMF is one nice illustration of the method, but it is a very specific setting (separability is too strong a requirement in most cases).

Significance - Justification:
The question of obtaining faster random projections with good guarantees is interesting, and I liked the technical result of Theorem 2, which seems useful in many contexts. Just to clarify: the CountGauss transform, T, has been used previously, right (e.g., Clarkson and Woodruff '13)? But your analysis (Theorem 2) is new. It therefore seemed strange to say "we propose a new randomized transform".

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall, I think this is a useful result, and separable NMF demonstrates its applicability.
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper analyses the "CountGauss" method, a method for "fake" multiplying a matrix by a Gaussian matrix. In particular, we often need to multiply a tall, skinny matrix by a Gaussian matrix on the left. The authors prove that if we apply the CountGauss transform (basically a CountSketch followed by a Gaussian matrix), the result is close to having multiplied by a fully Gaussian matrix, up to a small total variation distance; an illustrative sketch of the construction is appended after Review #3. The advantage is that, for such matrices that are also sparse, the dimension can be reduced quickly. The authors apply the method to non-negative matrix factorization (NMF) and show empirically that it often obtains quality comparable to multiplying by a fully random Gaussian matrix, but faster.

Clarity - Justification:
The theoretical results are presented well, but the experimental results are not presented particularly cleanly. It is hard to tell how the quality and running time compare to other methods. For example, in Table 1, what do the different numbers represent?

Significance - Justification:
I found the main theorem very satisfying; it puts the paper in the accept regime. Although the algorithm has been proposed before, and used to speed up algorithms where multiplying by a Gaussian matrix is useful, it is a very clean statement that deserves to be "out there".

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I recommend acceptance, based mainly on the theoretical contribution of the paper: that multiplying a matrix by the "CountGauss" is close to multiplying it by a fully random Gaussian matrix, up to a small total variation distance.
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a randomized transform which speeds up the computation of a random projection by an i.i.d. Gaussian matrix. It shows applications to NMF and SVM.
Clarity - Justification:
The paper is generally well presented, with the necessary background reviewed and the contributions emphasized.

Significance - Justification:
The proposed method has theoretical justification as well as demonstration via numerical examples.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1. In the statements of Theorems 2 and 3, I would like to see the constants replaced by a generic value, e.g., 1/10, with the dependence of the result on that constant made explicit.
2. In the numerical comparison of speed, e.g., Figure 6, the cost difference appears very small until the data dimension reaches 2^14. In addition, although the time is much lower for the new method, the total time is only 5 seconds, which is very small. I wonder whether the experiment could be extended to higher dimensions, where we might see a more significant edge for the new method.
=====
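
Appendix (illustrative): a minimal sketch of the CountGauss construction as the reviews describe it, a sparse CountSketch C followed by a dense Gaussian matrix G, written in Python/NumPy. This is a sketch under stated assumptions, not the paper's implementation: the function name countgauss, the bucket count t, and the unit-variance Gaussian entries are choices made here for illustration; the paper's Theorem 2 is what governs how k and t must be set for the total-variation guarantee.

    import numpy as np
    from scipy import sparse

    def countgauss(A, k, t, seed=None):
        """Sketch the n x d matrix A from the left by S = G @ C.

        C is a t x n CountSketch: each row of A is hashed to one of t
        buckets with a random sign, so C @ A costs O(nnz(A)). G is a
        dense k x t matrix with i.i.d. N(0, 1) entries; each entry of
        G @ C is then marginally N(0, 1), like a fully Gaussian sketch.
        """
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        buckets = rng.integers(0, t, size=n)      # hash row i to a bucket
        signs = rng.choice([-1.0, 1.0], size=n)   # random sign per row
        C = sparse.csr_matrix((signs, (buckets, np.arange(n))), shape=(t, n))
        CA = C @ A                                # one pass over nnz(A)
        if sparse.issparse(CA):
            CA = CA.toarray()
        G = rng.standard_normal((k, t))
        return G @ CA                             # k x d result

    # Usage: a tall, sparse matrix reduced from 100000 rows to k = 20.
    A = sparse.random(100_000, 50, density=1e-3, format="csr", random_state=0)
    SA = countgauss(A, k=20, t=400, seed=1)       # t = 400 is arbitrary here

The point of the two-stage construction is speed: the CountSketch pass touches only the nonzeros of A, and the dense Gaussian multiply then acts on a t x d intermediate instead of the full n x d input.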