We thank all reviewers for the feedback.

Reviewer 1:

- re: theoretical result.
  It turned out we were using a weak result for the eigendecay in a Sobolev class. Once we used the correct decay, we got the correct rate. Thanks for pointing this out!

- re: choice of features/predictor:
  This is given in Section D of the appendix, along with our reasons for doing so.

- re: randomness/noise in the experiments
  In our experiments, we didn't shuffle the data as you have done. We just split each dataset randomly into one training set and one test set and reported the error on the test set. However, we agree that shuffling and reporting the averages is a better comparison. (We did something similar in our synthetic experiments (Fig 1) by generating independent datasets on each run.) We have rerun the real experiments and report the mean and the std-error (std/sqrt(# experiments)) at the end of the rebuttal. (It is customary to use the std-error as opposed to the std; e.g. in a two-sample test on two Gaussians with large variances, you collect several samples and compare the empirical mean against the empirical-std/sqrt(# samples).) The rankings seem to be about the same, although, as you point out, some datasets are still noisy (e.g. blog). We will rephrase some of our text accordingly.

Reviewer 3:

Bach 2009: Thanks for the reference; we will cite this. Bach's paper is on MKL, whereas we do not really intend to learn the kernel. Our use of the additive kernel is to reduce the complexity of the class, which is very different from the goal of Bach.

Efficiency of computing eqn (5): This is one of the key points in the paper: eqn (5) can be computed in O(Dd^2) time as opposed to O(D^d) time. It is explained in the 2-3 paragraphs after (5).

Reviewer 4:

Which kernels? Any kernel which can be written as a tensor product of the kernels on each dimension can be used. E.g. in the Gaussian case, if we use the same bandwidth for all D-choose-d kernels, or pick bandwidths h_1, h_2, ..., h_D and use the corresponding h_i whenever the i^th coordinate appears, it can be written this way. In our experiments, we use the latter approach. We will add more detail in the paper.

re: presentation of empirical results: Thanks for the suggestions. We mostly agree with you and will incorporate those changes.

Results
=======

The results are the "avg Test Sq.Error over 30 runs" +/- "std-error", "Time".
They are given in sorted order. Some poorly performing methods at the end are not shown due to the 5000 character limit.

Airfoil
Method: GBRT, Err: 3.699211e-01 +/- 0.006368, Time:5.36 s
Method: MARS, Err: 4.900934e-01 +/- 0.006054, Time:86.20 s
Method: SALSA, Err: 4.910758e-01 +/- 0.006675, Time:21.20 s
Method: M5P, Err: 4.960106e-01 +/- 0.016050, Time:17.86 s
Method: LL, Err: 4.974269e-01 +/- 0.007112, Time:11.58 s
Method: LASSO, Err: 4.977163e-01 +/- 0.006629, Time:1.88 s
Method: KRR, Err: 4.994275e-01 +/- 0.006746, Time:1.49 s
Method: LQ, Err: 4.997986e-01 +/- 0.006803, Time:24.63 s
Method: SV-nu, Err: 5.120234e-01 +/- 0.007184, Time:12.81 s
Method: LR, Err: 5.125612e-01 +/- 0.007022, Time:0.04 s
Method: COSSO, Err: 5.447684e-01 +/- 0.030788, Time:31.06 s
Method: regTree, Err: 5.478079e-01 +/- 0.014028, Time:0.24 s
Method: GP, Err: 5.777450e-01 +/- 0.032375, Time:14.13 s
Method: SV-eps, Err: 6.317633e-01 +/- 0.008014, Time:1.55 s
Method: KNN, Err: 8.139842e-01 +/- 0.012503, Time:0.19 s
Method: NW, Err: 8.435121e-01 +/- 0.012045, Time:4.33 s
Method: BF, Err: 9.553430e-01 +/- 0.012355, Time:82.89 s
...
Speech
Method: MARS, Err: 1.886635e-02 +/- 0.000425, Time:7.44 s
Method: SALSA, Err: 3.165091e-02 +/- 0.000935, Time:6.73 s
Method: GP, Err: 3.412899e-02 +/- 0.001083, Time:4.97 s
Method: LQ, Err: 3.482135e-02 +/- 0.004744, Time:6.22 s
Method: GBRT, Err: 4.169216e-02 +/- 0.000936, Time:3.58 s
Method: KRR, Err: 4.378518e-02 +/- 0.001422, Time:0.63 s
Method: M5P, Err: 4.927832e-02 +/- 0.001861, Time:5.15 s
Method: regTree, Err: 6.713757e-02 +/- 0.001804, Time:0.10 s
Method: SV-nu, Err: 8.394053e-02 +/- 0.003686, Time:18.81 s
Method: LASSO, Err: 8.968835e-02 +/- 0.002718, Time:16.57 s
Method: LR, Err: 8.978255e-02 +/- 0.002685, Time:0.02 s
...
Skillcraft
Method: SALSA, Err: 5.385468e-01 +/- 0.005357, Time:75.99 s
Method: GP, Err: 5.390974e-01 +/- 0.005454, Time:114.01 s
Method: KRR, Err: 5.505176e-01 +/- 0.005797, Time:10.10 s
Method: GBRT, Err: 5.677342e-01 +/- 0.006277, Time:10.74 s
Method: SV-nu, Err: 6.545533e-01 +/- 0.006909, Time:59.35 s
Method: SV-eps, Err: 7.001738e-01 +/- 0.007419, Time:5.54 s
Method: LASSO, Err: 7.104703e-01 +/- 0.098104, Time:2.56 s
Method: MARS, Err: 7.306918e-01 +/- 0.188332, Time:16.76 s
Method: RBFI, Err: 7.966417e-01 +/- 0.008776, Time:1.01 s
Method: LR, Err: 8.061446e-01 +/- 0.117273, Time:0.04 s
Method: SI, Err: 8.786962e-01 +/- 0.009332, Time:5.40 s
...
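
For reference, the "mean +/- std-error" entries in the results can be computed roughly as in the minimal sketch below, using the definition std/sqrt(# experiments) given in the response to Reviewer 1. It assumes the sample standard deviation (n-1 denominator), which the text does not specify. The function name `mean_and_std_error` and the per-run values are hypothetical and are not taken from the experiments.

```python
import math
from statistics import mean, stdev


def mean_and_std_error(errors):
    """Return (mean, std-error) of a list of per-run test errors.

    std-error = std / sqrt(number of runs). The sample std (n-1
    denominator) is an assumption; the rebuttal does not say which is used.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two runs to estimate the std-error")
    return mean(errors), stdev(errors) / math.sqrt(n)


# Hypothetical per-run test errors, for illustration only.
runs = [0.50, 0.48, 0.52, 0.49, 0.51]
avg, se = mean_and_std_error(runs)
print(f"Err: {avg:e} +/- {se:.6f}")
```

With 30 runs per dataset, as reported in the results, the std-error is the std divided by sqrt(30), i.e. roughly 5.5 times smaller than the std.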