We thank the reviewers for their kind comments and suggestions.

To Reviewer_1:
- The current parameter settings are in the appendix, and we will try to move some of them to the main paper.
- Table 1 reports results on the MNIST dataset with 60,000 samples (line 300). We will add this description to the caption of Table 1.
- Thank you for the suggestion about discussing the pros and cons of different transforms. The Haar transform has better time complexity but worse approximation error than the Hadamard transform. Figure 2 already compares Fast-Nys with the Hadamard and Haar transforms, and we will conduct more experiments in the future.
- We will improve the paper based on your other suggestions.

To Reviewer_3:
Thank you for the suggestions. We will correct the typos and make the figures easier to read.

To Reviewer_4:
1. We use an alternating minimization algorithm to find the seeds. The algorithm usually converges to a reasonably good solution within 10 iterations, so we fix the number of iterations to 10 in all experiments. For example, on the MNIST dataset with k=10, the initial objective value (using random samples) is 1750260; after 10 iterations it drops to 90041, and the converged solution has objective value 89872. (An illustrative sketch of this fixed-iteration scheme is given at the end of this response.)
2. The difference between learned seeds and random seeds is small on some datasets (e.g., Fig. 2b), but on others the learned seeds are much better than random seeds (e.g., Figs. 2a and 2c).
3. The alternating minimization requires O(nd + md) time with the Haar transform and O(nd log d + md) time with the Hadamard transform (see lines 575-578). The detailed description is in Section 4.3, and we will make this part clearer in the final version. The algorithm scales to large problems because its time complexity is linear in m (the number of landmark points) and n (the number of samples), and almost linear in d (the data dimensionality). (A small sketch of the fast Hadamard transform behind the nd log d term also follows below.)
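Sketch for Reviewer_4, point 1. This is only an illustrative stand-in: it uses a generic k-means-style objective in place of our actual seed-learning subproblems, and only shows the scheme we described above (random-sample initialization, a fixed 10 alternating updates, and tracking of the objective value).

```python
import numpy as np

def alternating_minimization(X, m, n_iters=10, seed=0):
    # Illustrative stand-in only: a k-means-style alternating minimization.
    # The seed-learning objective in the paper is different; this sketch only
    # shows the fixed-iteration scheme: initialize seeds from random samples,
    # run 10 alternating updates, and monitor the objective.
    rng = np.random.default_rng(seed)
    seeds = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    for it in range(n_iters):
        # Step 1: assign every sample to its nearest seed.
        d2 = ((X[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        obj = d2[np.arange(len(X)), labels].sum()   # current objective value
        # Step 2: update each seed given the current assignments.
        for j in range(m):
            if np.any(labels == j):
                seeds[j] = X[labels == j].mean(axis=0)
        print(f"iteration {it + 1}: objective = {obj:.1f}")
    return seeds
```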
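Sketch for Reviewer_4, point 3. The O(nd log d) term comes from applying a fast Walsh-Hadamard transform to each of the n d-dimensional samples. Below is a minimal Python sketch of the standard in-place transform (assuming d is a power of two); it is not our implementation, only an illustration of where the d log d per-sample cost comes from.

```python
import numpy as np

def fwht(x):
    # Standard in-place fast Walsh-Hadamard transform of a length-d vector,
    # with d a power of two; runs in O(d log d) time. Applying it to each of
    # the n samples gives the O(nd log d) term quoted above.
    y = np.asarray(x, dtype=float).copy()
    d = len(y)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y
```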