REFEREE 1

I think that the paper might have made a much stronger point if it also compared the proposed method to recently published GP-based approaches.

Response

A valid comment. A comparison between GP-based methods and the RKHS approach of Gonzalez et al. has recently been published elsewhere (Macdonald et al., Biomedical Engineering Online 2016, http://eprints.gla.ac.uk/113366/), and the methods were found to be on a par. We have therefore focused on the comparison with the RKHS method of Gonzalez et al.

REFEREE 2

All the other components, i.e. the kernel regression and gradient matching, have already been used in parameter estimation of ODEs. […] I am wondering whether this would be of interest for the ICML audience, especially since the paper does not bring something new from a learning perspective.

Response

Papers on parameter inference in ODEs have been included at previous ICML conferences (see references [1] and [2] below), so the topic of our paper does not seem out of scope. You are right that previous papers have also used kernel regression and gradient matching. However, "kernel regression" and "gradient matching" are very broad, general methodological frameworks, like "Gaussian processes (GPs)", which were used in [1,2]. It is the specific form of the model (eqns 13-19) and the training scheme (theorem in Sec. 3) that offer "something new from a learning perspective". The limitations of GPs in this context were clearly pointed out in [2], which has motivated our work on an RKHS-based method as an alternative. We have shown that this new method performs significantly better than the state of the art. We would be grateful for specific pointers to what we need to do additionally to make our work publishable in the future.

References
[1] Wang & Barber, ICML 2014, JMLR W&CP 32, 1485–1493.
[2] Macdonald et al., ICML 2015, JMLR W&CP 37, 1539–1547.

REFEREE 3

It's not clear that the new method is a significant novel extension of Gonzalez et al., possibly because of the presentation.

Response

Due to space restrictions, it is difficult to describe Gonzalez's method mathematically and discuss its intrinsic limitations comprehensively while leaving enough room for the presentation of our own method. In retrospect, it would have been better to give only a high-level, non-mathematical summary and refer the reader to Section 4.1 of the authors' own paper (Pattern Recognition, 2014), where the limitations are clearly described. In essence, Gonzalez's method is based on a linear operator, which is not applicable to nonlinear ODEs. Nonlinear ODEs must be linearised via dummy variables obtained from standard nonlinear regression, leaving the method at the mercy of the quality of this external scheme. Our method is conceptually different and avoids this linearity constraint. The practical consequence is that in our approach, the ODEs properly regularise the nonlinear interpolant from the regression. The resulting improvement is illustrated in Fig. 3.

It is also not clear that there are significant performance improvements. […] I think statistical significance is a strange way to present results.

Response

We have evaluated predictive performance (difference in function space) and explanatory performance (difference in parameter space) on several independent data instantiations. In addition to showing the parameter estimation errors (Figs. 1a, 2a, 5a, 6a), we have shown the error differences, as you suggested (Figs. 1b, 2b, 5b, 6b). Sorry for not making that clearer in the text; a schematic sketch of this paired evaluation is given below.
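To make the evaluation procedure concrete, the following minimal Python sketch illustrates a paired evaluation of this kind. The error arrays are placeholder values, and the choice of a paired Wilcoxon test is shown as one standard option; neither is taken from the paper itself.

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Placeholder parameter-estimation errors for the two methods on N
# independent data instantiations (in the paper these come from
# repeated experiments, not from simulation as here).
N = 20
errors_ours = rng.gamma(shape=2.0, scale=0.5, size=N)
errors_gonzalez = errors_ours + rng.normal(0.3, 0.2, size=N)

# Error differences (cf. Figs. 1b, 2b, 5b, 6b): positive values mean
# the proposed method has the smaller error on that instantiation.
diffs = errors_gonzalez - errors_ours

# A paired test on the differences yields a significance level of the
# kind reported in the tables; a Wilcoxon signed-rank test is one
# standard choice for paired, possibly non-Gaussian differences.
stat, p_value = wilcoxon(diffs)
print(f"median difference = {np.median(diffs):.3f}, p = {p_value:.4f}")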
Due to space restrictions, the explanation has only been given once, in the caption of Fig. 1. From these distributions we have computed the statistical significance levels (Tables 2 and 4), which show that in 12 of the 18 cases the improvement over the method of Gonzalez is significant, and in 6 cases there is no significant difference. This follows established statistical evaluation procedures. If, according to your assessment, this is insufficient, we would be most grateful for the suggestion of an alternative procedure that would be required to make our work publishable in the future.

Additional responses to your queries

We use standard kernels (RBF and MLP), whilst Gonzalez et al. (2014) use the approximated kernel K_G. As such, eq. (14) is not directly comparable with eqs. (10-12). Gonzalez et al. (2014) linearise the nonlinear ODE before estimating the ODE parameters, and in doing so obtain an analytical form of their alpha parameters (the equivalent of our b; see eqs. 10-12). We, however, do not perform linearisation and instead work with the full nonlinear ODEs (eq. 14), resulting in a nonlinear optimisation with no closed-form solution for b. The dimensionality of this optimisation is high, and direct optimisation, whilst possible, is slow. This motivates the iteration (eq. 19), in which in each round b is optimised analytically conditioned on the current values of theta and b~, backed by the theorem in Sec. 3; a schematic sketch of such an alternating scheme is given below.
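Since eqns 13-19 and the theorem are not reproduced in this response, the Python sketch below is only a schematic illustration of the general pattern: b is solved in closed form given theta and the previous interpolant b~, and theta is then updated on the gradient-matching residual. The objective, the RBF kernel, the toy logistic ODE, and all constants are assumptions made for exposition, not the paper's actual formulation.

import numpy as np
from scipy.optimize import minimize_scalar

def rbf(t, s, ell=0.5):
    # RBF kernel matrix between time points t and s
    return np.exp(-(t[:, None] - s[None, :]) ** 2 / (2 * ell ** 2))

def rbf_dt(t, s, ell=0.5):
    # derivative of the RBF kernel w.r.t. its first argument
    return -(t[:, None] - s[None, :]) / ell ** 2 * rbf(t, s, ell)

def fit(t, y, f, theta0, lam=1.0, n_iter=20):
    K, Kd = rbf(t, t), rbf_dt(t, t)
    b = np.linalg.solve(K + 1e-6 * np.eye(len(t)), y)  # initial interpolant weights
    theta = theta0
    for _ in range(n_iter):
        b_prev = b
        # ODE gradient evaluated at the previous interpolant (b~ held fixed),
        # which makes the objective quadratic in b
        g = f(K @ b_prev, theta)
        # closed-form update for b: data fit plus gradient-matching penalty
        A = K.T @ K + lam * Kd.T @ Kd + 1e-6 * np.eye(len(t))
        b = np.linalg.solve(A, K.T @ y + lam * Kd.T @ g)
        # update theta by minimising the gradient-matching residual
        res = lambda th: np.sum((Kd @ b - f(K @ b, th)) ** 2)
        theta = minimize_scalar(res).x
    return b, theta

# toy example: logistic growth x' = theta * x * (1 - x), true theta = 1.5
rng = np.random.default_rng(1)
t = np.linspace(0, 5, 40)
x_true = 1 / (1 + 9 * np.exp(-1.5 * t))
y = x_true + 0.02 * rng.normal(size=t.size)
b, theta = fit(t, y, lambda x, th: th * x * (1 - x), theta0=1.0)
print("estimated theta:", theta)

The point of the sketch is the design choice described above: with b~ fixed, the nonlinear term f is a constant vector, so each b-update is a linear solve rather than a slow high-dimensional nonlinear optimisation.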