Paper ID: 273
Title: Stability of Controllers for Gaussian Process Forward Models

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper presents theory establishing the stability of systems governed by Gaussian process dynamics. This theory is crucial to the acceptance and use of Gaussian processes for learning control problems. That said, the proposed algorithm for establishing stability, built upon numerical quadrature, is of debatable use in practice.

Clarity - Justification:
It could be noted that the paper's contributions serve to provide a proof of convergence for models employing an auto-regressive GP, as in Quinonero-Candela et al. (2003), irrespective of whether the system is being controlled. These contributions are valuable. That said, the fact that the paper is primarily concerned with control is a little unclear. The constraints required on controllers π for the theory to hold are mentioned in Sections 3 and 4 only in the following passage.

> We consider learned controllers, which depend only on the current state (Markov property) and are differentiable with respect to the state.

I would like to have seen more elaboration upon this point throughout the paper: why are these conditions sufficient? Are they necessary?

Significance - Justification:
I suspect many would be interested to see if the theory on convergence could be coupled with additional theory bounding the associated rates. That said, I accept that this is an effort that lies beyond the scope of the paper at hand.

Section 4, on the stochastic stability of Gaussian process dynamics, is of primary significance to users. That said, it should be acknowledged that the bound is conditional on the smoothness of the integrand, as it is this smoothness that enables the theory supporting the quadrature. For the proposed Gaussian integrand, however, I'm sure that this will not invalidate any of the paper's conclusions.

My major concern about the approach is its practical applicability in establishing the stability of real systems. Specifically, I'm concerned by the fact that the proposed algorithm is constructed upon a grid. Intuitively, the grid should fail to scale to high dimension: any bound constructed with such a grid should surely become uselessly loose in high dimension. Explicitly, I'd expect even three dimensions to present concerns for the approach. This is cursorily addressed by the authors in the following passage.

> Of course, this will not hold for systems with many state dimensions and our particular setup of product quadrature rules, as these rules suffer from the curse of dimensionality. However, there are various approaches to overcome this drawback of NQ and our analysis holds for arbitrary quadrature rules.

This last sentence needs far greater support. Which approaches do the authors have in mind? Which quadrature rules would they suggest?

The empirical results presented for the algorithm are interesting, but illustrative only. It would be helpful to supplement the figures with quantitative measures of the performance of numerical quadrature, moment matching and Monte Carlo, gathered over many repeats of the experiments. The empirical description is also a little vague:

* "a grid on the state space is defined": containing how many points?
* "After a sufficiently long time": how long?
* How many samples are used for Monte Carlo?
* "However, we found that the real-world results are substantially better than this worst-case bound": by how much?
* In line 829, I suspect that the statement "resulting distributions after 1.2 seconds" refers to the simulated time elapsed in the mountain-car world, rather than the wall-clock time elapsed during the run of the algorithm itself. Either way, this should be clarified.
* Similarly, the "100s" mentioned in the caption of Figure 6 and in line 841 should be more crisply described.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
It would be helpful to explicitly write out the form of M at some point around line 415.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper provides a way of proving stability of feedback control systems where the dynamics function comes from a GP (either as the mean prediction only, or as mean and covariance). The paper sketches algorithms for constructing stable regions based on positively invariant sets. This is done by discretising the state space with a grid of variable spacing (based on the derivative of the GP mean) and fitting a metric ball inside the stable area. For the GP prediction with uncertainty, there is an additional step of numerical quadrature involved to solve the necessary integrals. The paper shows on simple simulated experiments that the algorithm works and returns meaningful stability regions.

Clarity - Justification:
The paper is very interesting, well written and sound (except for small issues with the notation). The main problem I have is that the paper is relatively dense: even for readers familiar with Gaussian processes, control theory and basic stability proofs via Lyapunov functions and positively invariant sets, it might be hard to digest.

Significance - Justification:
This paper combines different ideas from machine learning and control to prove stability, which is known to be hard for most nonlinear systems. This clearly adds value to the intelligent control community, but it is also not a substantial breakthrough, since the analysis is always in-model and it is unclear whether the results transfer to cases where the true dynamics function is unknown.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I am unsure whether I should try to convince the authors to submit this result to a control conference instead. I think this is a good paper that would be much more interesting for the control community, but on the other hand one strength of the machine learning community is that we are open to contributions from other fields as well, where there is a connection.

I am aware of the space constraints at ICML, but the paper solves two problems, which makes it quite dense. I would like to see a version of this paper with the full algorithm and more details in the proofs at some point, since I couldn't follow all details without spending too much time reproducing all equations myself. It could help to focus on only one of the solved problems, so that more space can be spent on the proofs, algorithm and experiments.

What is missing is a time budget for the algorithms. Can they run online, after updating the GP model in an adaptive control framework? (I guess not.) Can I use them to prove stability for an offline-identified system on my laptop? (I suppose so.) Or do I need four weeks on a computer cluster if I have more than two states? Maybe the authors could add a paragraph about how the runtime of the algorithms scales with the number of states and the length scales of the GP.
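To make the scaling question concrete, here is a minimal sketch (mine, not taken from the paper) that assumes a product Gauss-Hermite rule (the paper only states that product quadrature rules are used) and counts the number of quadrature nodes, i.e. GP evaluations per propagation step, as the number of state dimensions grows:

```python
# Minimal sketch (not from the paper): count the nodes of a product
# Gauss-Hermite quadrature rule over a d-dimensional state space, to show
# how a grid-based rule scales with the number of state dimensions.
import itertools
import numpy as np

def product_gauss_hermite(nodes_per_dim, dim):
    """Return the nodes and weights of a product Gauss-Hermite rule."""
    x, w = np.polynomial.hermite.hermgauss(nodes_per_dim)     # 1-D rule
    nodes = np.array(list(itertools.product(x, repeat=dim)))  # shape (n**dim, dim)
    weights = np.prod(list(itertools.product(w, repeat=dim)), axis=1)
    return nodes, weights

for d in (1, 2, 3, 5):
    nodes, _ = product_gauss_hermite(10, d)
    print(f"{d} state dimension(s): {len(nodes):,} quadrature nodes")
# 1 state dimension(s): 10 quadrature nodes
# 2 state dimension(s): 100 quadrature nodes
# 3 state dimension(s): 1,000 quadrature nodes
# 5 state dimension(s): 100,000 quadrature nodes
```

With ten nodes per dimension, two states already require 100 GP evaluations per propagation step and five states 100,000; relating numbers of this kind to wall-clock time would answer the question above.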
There is one small problem with the understanding of the GP: in lines 215/216 the paper states that the prior on $f$ is $\mathcal{N}(0,K(Z,Z) + \sigma^2_n I)$. This is not correct, since $f$ is a function and the reported distribution is a multivariate normal over the GP output measurements at certain locations. It should simply read $\mathcal{GP}(0,k)$ instead, since the prior on $f$ is a GP prior with zero mean and covariance function $k$.

There are also some minor notation issues that the authors should fix for a camera-ready version:
1) The symbol "u" is used for different things (the state evolution of a system and the control input to a system), which is confusing.
2) In equation (3), it is not obvious what $m_i$ is, or should be. I would propose using $\beta_{m,i}$ instead of $\beta_{m_i}$.
3) The symbol "T" denotes both time and the transpose, and the time is sometimes also written as a superscript. How about using $^\intercal$ instead of $^T$ for the transpose? Alternatively, the time T should be lowercase everywhere.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
A stability analysis is proposed for controlled dynamical systems whose dynamics are modelled as a Gaussian process (GP) and whose controller is given as a state-feedback mapping. Two cases are considered: 1) the use of the mean of the GP and 2) the use of the full distribution to describe the dynamics. For both cases, an algorithm based on gridding is proposed to approximate the stability region, i.e. the set of initial states from which the closed-loop system converges to the target, and it is proven that the algorithms provide a set such that stability is guaranteed for all states in the set, not only for the grid points. For the analysis of the full GP model, finite-time stability in probability is considered and a new approximation method based on numerical quadrature for propagating the uncertainties through the system is presented, for which the approximation errors are also derived. The proposed stability analysis is applied to two example problems and it is shown that the method achieves results comparable to the much more expensive MC sampling.

Clarity - Justification:
The paper is well structured and overall well written. All important results are clearly stated and proofs are provided. Only the proof of Lemma 3 is not very clear; see the detailed comments below.

Significance - Justification:
The paper addresses a relevant problem. The analysis tools rely on standard stability concepts and ideas, but the new contributions are 1) the derivation of a bound on the difference between GP function evaluations at two different states (Lemma 1); and 2) the use of numerical quadrature to approximate the propagation of the distribution, together with the derivation of a corresponding error bound. Although the paper has some limitations (see the detailed comments), it makes an interesting contribution by providing analysis tools for GP-based dynamical systems.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Some important limitations are:
- The proposed stability analysis relies on sampling of the set and will therefore not scale to larger state spaces.
- The analysis is restricted to the squared exponential kernel.
- The derivation of the error bounds for the numerical quadrature in Lemma 3 is not very clear. The notation C(X) is not defined, and the result relies strongly on an analysis in (Masjed-Jamei, 2014) that is not given, making the result obscure. It would be helpful to briefly restate this key result from the literature if possible.
- The example is discouraging in the sense that the derived error bound cannot be used, as it makes the quadrature too complex (Section 5.3). It would be important to discuss this issue: is the derived bound generally expected to be overly conservative and impractical? It is also unclear how the stability region was then obtained for this example. It seems that the illustrated region cannot in fact guarantee stability, in which case the comparison should be removed.
- More details should be provided for the examples. What is the time horizon? How are the probabilities chosen? Etc.

Minor comments:
- It is not clear whether the paper requires any assumptions on the control policy. It would be important to provide a discussion.
- Proof of Lemma 2: the condition number $\kappa$ is not defined.
- Introduction: saying that GPs are standard in control theory is probably a bit too strong. There is increasing interest, but the techniques are far from standard.
- Introduction: "A stability region in the state space ensures, that all trajectories starting in this region converge to the target point". This assumes asymptotic stability. Stability, e.g. in the Lyapunov sense, just means boundedness, as also defined in Section 1.1. It should be specified more clearly that the stability region refers to asymptotic stability.
- Section 1.1, second paragraph: it should be x(t) instead of u(t); u is commonly used for the input of a system.
- As stability is defined in Section 3.1, the definition could be omitted in Section 1.1.
- Section 1.2: "All trajectories which start in this region are guaranteed to converge to the target state." This would only hold for the deterministic system. The problem statement should, however, also refer to the full GP model.
- Section 1.2: it should be "discrete-time" instead of "time-discrete".
- Section 1.2: it should be specified that $u \in \mathbb{R}^F$.
- It would be helpful to define the metric ball (a possible definition is sketched below).
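For concreteness, the kind of definition I have in mind is sketched below. This is my assumption of what is meant (a ball in a weighted Euclidean norm with some symmetric positive definite matrix M), not a definition taken from the paper; D denotes the state dimension.

```latex
% Sketch of an assumed definition (not taken from the paper): a metric ball of
% radius r around the target state x*, in the norm induced by a symmetric
% positive definite matrix M; D denotes the state dimension.
\[
  \mathcal{B}_r(x^\ast)
    = \bigl\{\, x \in \mathbb{R}^{D} : \lVert x - x^\ast \rVert_M \le r \,\bigr\},
  \qquad
  \lVert v \rVert_M = \sqrt{v^\intercal M v}, \quad M \succ 0 .
\]
```

=====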