Paper ID: 1193
Title: Computationally Efficient Nystr\"{o}m Approximation using Fast Transforms

Review #1
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a novel method for reducing the computational complexity of the Nystrom approximation. The authors' contribution builds on the observation that the computational complexity of the traditional Nystrom method is dominated by the computation of the matrix K. Accordingly, the authors restrict themselves to a family of kernels whose evaluation requires computing inner products. To accelerate the computation of kernel values, they impose a specific structure on the Nystrom landmarks. By requiring these landmarks to have Haar or Hadamard structure, they can compute the inner products in linear time.

Clarity - Justification:
The paper is clearly written and well motivated. The experimental results support the claims of efficiency compared to the state of the art.

Significance - Justification:
The paper addresses a problem for which several solutions have already been proposed. The contributions proposed here by the authors are relevant and strong.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
strengths
+ relevant and easy-to-apply ideas
+ good experimental results supporting the idea

comments
* Figure 1: it is difficult to distinguish the blue dots from the red dots.
* typos: lines 229-230: m^3? line 710

=====
Review #2
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper describes a method to accelerate the Nystrom method by forcing a structure on the landmark points. The structure considered is a diagonal matrix times a Haar or Hadamard matrix.
The diagonal matrix is learned by minimizing the error between the real landmarks and the structured ones. A 2x-10x speedup compared to the standard Nystrom method is observed in the experiments.

Clarity - Justification:
This paper is well written and easy to read.

Significance - Justification:
Several works use structured matrices to accelerate kernel methods, but to the best of my knowledge this is the first paper to apply similar techniques to the Nystrom method. I believe this is important for scaling the Nystrom method to large-scale problems.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Learning the diagonal matrix (also called the seed) is crucial to the proposed method. However, few experiments on it are shown in the paper. In particular, I am curious about:
1. How fast does the minimization problem converge? It needs to learn both the seeds and an indicator vector, so it may not be easy.
2. What is the quality of the learned seeds, and how does it affect the final approximation error of the Nystrom method? From Figure 2, the learned seeds seem to outperform the random seeds only slightly.
3. How does the minimization problem scale to large problems, namely with different numbers of examples (m) and dimensions (d)? In the experiments, both m (fixed to 2000, if I read Section 4.4 correctly) and d (16-2000) are small.

=====
Review #3
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper focuses on approaches to speed up (especially) the prediction time of Nystroem-approximated models by using fast transforms and structured landmarks. With structured transformations such as the Haar transform, the Nystroem approximation can be generated more efficiently through effective implementations of the respective transforms.

Clarity - Justification:
Well written - some details could be elaborated further.
Significance - Justification:
I think the idea is novel and interesting, with clear and substantial improvements over the state of the art.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
- the part up to Section 4 is a bit lengthy and repetitive
- for all results you should provide mean/std-dev and a significance analysis. What data set was used in Table 1?
- please make sure that all parameters are sufficiently explained and the settings are detailed
- it would be good to address the pros/cons of the different transforms and to detail the parametrization - maybe with a further parameter study (technical report, appendix)

=====
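As a concrete illustration of the structured-landmark idea the reviews describe: if the landmark matrix has the form Z = D H, with D = diag(s) a learned diagonal seed and H a Hadamard matrix, then Z x costs one fast Walsh-Hadamard transform, O(d log d), instead of the naive O(d^2) dense product. This is only a minimal sketch, not code from the paper under review; the function names (`fwht`, `hadamard`) and the random seed vector `s` are illustrative assumptions.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform of a length-2^k vector (Sylvester ordering)."""
    a = np.asarray(a, dtype=float).copy()
    n, h = len(a), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly step
        h *= 2
    return a

def hadamard(n):
    """Explicit n x n Hadamard matrix via the Sylvester construction, n = 2^k."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

d = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(d)   # a data point
s = rng.standard_normal(d)   # hypothetical learned seed (diagonal of D)

# Inner products with all d structured landmarks (rows of Z = D H):
# Z x = D (H x), i.e. one FWHT followed by a pointwise scaling.
fast = s * fwht(x)
naive = (np.diag(s) @ hadamard(d)) @ x
assert np.allclose(fast, naive)
```

In this toy setting the fast path never materializes the d x d landmark matrix, which is the source of the speedups the reviewers mention; learning the seed s is the optimization problem Review #2 asks about.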