Paper ID: 1359
Title: Slice Sampling on Hamiltonian Trajectories

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a new MCMC method that combines elements of slice sampling and Hamiltonian Monte Carlo, generalizing elliptical slice sampling. They show a relationship between the Hamiltonian trajectory under a Gaussian likelihood and the path defined in elliptical slice sampling, parameterized by a single variable. They note that Hamiltonian trajectories in the space of uniform variables are straightforward to simulate, and that through the probability integral transform (PIT) these trajectories can be transformed to obtain desired invariance properties (i.e. matching the prior or incorporating the likelihood); a code sketch of this construction follows this review. They compare these trajectories to those of elliptical slice sampling, and run experiments on different models to show their method's efficiency.

Clarity - Justification:
The writeup is clear and concise, the notation is clear, and the language is easy to understand.

Significance - Justification:
This paper does a great job unifying two MCMC methods, the combination of which may provide easy-to-tune algorithms that are efficient in terms of likelihood evaluations. Tuning HMC is a big area of research, and this paper is poised to impact a wide variety of HMC use cases.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I am excited to see this algorithm applied to more complex, high-dimensional probabilistic models.
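As flagged in the summary above, here is a minimal one-dimensional sketch of the construction this review describes. It assumes the exact flow for a Uniform(0,1) coordinate is a constant-momentum trajectory with elastic reflections at the boundaries, mapped through the prior's inverse CDF; the Exponential(1) prior and all names are illustrative, not the authors' code.

import numpy as np

def uniform_flow(u0, p0, t, m=1.0):
    # Exact flow for a Uniform(0,1) coordinate: constant momentum,
    # with elastic reflections at the boundaries 0 and 1
    # (triangle-wave folding of the free trajectory).
    u = np.mod(u0 + t * p0 / m, 2.0)
    return np.where(u > 1.0, 2.0 - u, u)

def prior_trajectory(x0, p0, t, cdf, inv_cdf, m=1.0):
    # Map the uniform-space trajectory through the prior's inverse CDF,
    # giving a path in x-space that preserves the prior measure.
    return inv_cdf(uniform_flow(cdf(x0), p0, t, m))

# Exponential(1) prior: F(x) = 1 - exp(-x), F^{-1}(u) = -log(1 - u).
cdf = lambda x: 1.0 - np.exp(-x)
inv_cdf = lambda u: -np.log1p(-u)
ts = np.linspace(0.0, 3.0, 7)
print(prior_trajectory(0.5, 0.8, ts, cdf, inv_cdf))

By the PIT, a flow that preserves the uniform measure in u-space preserves the prior in x-space, which is the invariance property the summary refers to.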
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a slice sampling algorithm built upon Hamiltonian Monte Carlo. The algorithm replaces the leapfrog moves of standard HMC with slice sampling along the trajectory. The idea is motivated by elliptical slice sampling and relies on the analytical solution of the Hamiltonian system. The algorithm itself is simple and easy to implement, and the experimental results show some improvements over standard HMC.

================ Updates ===================
After a second reading, I realized that I had misunderstood the main part of the article. It does not simply replace the leapfrog moves; rather, it uses the Hamiltonian system only on the prior part and generates the analytical curve for slice sampling using the likelihood.

Clarity - Justification:
The presentation is overall clear.

=============== Updates ====================
I think it would be better to explicitly write out the Hamiltonian system used in Algorithm 1. I was initially confused and thought you were using the same Hamiltonian system as standard HMC for posterior sampling, which it is not.

Significance - Justification:
The novelty of this paper might be limited.
1. As noted in Neal (2012), when the analytical solution to the Hamiltonian dynamics is available, "the middle full step for q, which in ordinary leapfrog just adds εp to q, is replaced by the analytical solution", i.e., all leapfrog moves in one iteration can be replaced by a single jump based on the analytical solution. This strategy can tremendously reduce the computational cost -- that is, the original HMC is also able to cut costs by using analytical solutions, and I don't see any advantage of using slice sampling here.
2. For most likelihood functions, the corresponding Hamiltonian systems are intractable, which motivates the use of leapfrog integration to simulate trajectories. As a sampler for general likelihoods, relying on analytical solutions is, in a sense, a step backwards.
3. The proposed algorithm is only possible when the full analytical solution is available. By contrast, the original HMC algorithm can gain efficiency when only "partial analytical solutions" are available (Neal, 2012).
4. Applying the probability integral transform to multivariate distributions is impractical. In addition, if the model parameters can be decomposed into collections of conditionally independent variables, then applying old-fashioned slice sampling would be adequate to finish all the work gracefully and efficiently. Adding extra Hamiltonian steps is likely to gild the lily.

=================== Updates =======================
These comments are moot now, as I realize that the proposed method is actually very interesting and useful in practice!

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Most of my points have been explained in the previous comments. The following are some minor comments on the experiments.
1. Is the comparison between HSS and ESS really necessary? The current results only downgrade the merits of HSS.
2. The comparison between HMC and HSS is unfair. As mentioned before, HMC can make use of analytical solutions as well.
3. Comparisons on real datasets might be more exciting.
I believe this article holds a good viewpoint on the bottleneck of HMC, i.e., the huge number of intermediate steps and the difficulty of parameter tuning. However, I don't think relying on the analytical solution of the Hamiltonian system is a promising direction.

=================== Updates =====================
I still think the experiments need some improvement, since the current setup is too trivial; in particular, the sample size is very small. It would be more exciting to see examples with larger sample sizes and higher dimensions.
One concern I have about this sampler is the efficiency of using the prior curve for slice sampling. How will it perform when the prior is substantially far from the posterior? Will the random momentum carry the state too far away from the target and result in low sampling efficiency?
My suggestion would be to use an approximation to the full likelihood, instead of the prior, for generating the analytical curve. You can choose a nice family f(q) that induces an analytical Hamiltonian solution to approximate the full likelihood, and use this as a proposal distribution. You can then do slice sampling with the new likelihood l(q)/f(q). The whole framework then won't be restricted by the choice of prior, and you are guaranteed a high acceptance rate regardless of how much the prior deviates from the posterior.
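One way to read the closing suggestion above is the following sketch: f(q) is a tractable approximation whose flow traj(t) is available in closed form and periodic with period t_max, and slice sampling targets the residual likelihood l(q)/f(q) with the bracket shrinkage of elliptical slice sampling. All names here are hypothetical, not from the paper.

import numpy as np

def slice_on_trajectory(x0, traj, log_l, log_f, t_max, rng):
    # One transition: slice sample the trajectory parameter t, targeting
    # the residual log-likelihood log l(q) - log f(q); the bracket
    # shrinks toward t = 0, where traj(0) == x0 (the current state).
    log_res = lambda x: log_l(x) - log_f(x)
    log_y = log_res(x0) + np.log(rng.uniform())  # slice height
    t = rng.uniform(0.0, t_max)
    t_lo, t_hi = t - t_max, t                    # initial bracket
    while True:
        x = traj(t)
        if log_res(x) > log_y:
            return x                             # point on the slice
        if t < 0.0:
            t_lo = t
        else:
            t_hi = t
        t = rng.uniform(t_lo, t_hi)

# With f(q) taken as a Gaussian prior (log_f constant along the curve)
# and traj(t) the ESS ellipse, this reduces to elliptical slice sampling:
rng = np.random.default_rng(0)
x0, nu = 1.3, rng.normal()                   # current state, auxiliary draw
traj = lambda t: x0 * np.cos(t) + nu * np.sin(t)
log_l = lambda x: -0.5 * (x - 2.0) ** 2      # toy Gaussian likelihood
print(slice_on_trajectory(x0, traj, log_l, lambda x: 0.0, 2 * np.pi, rng))

Because traj(0) recovers the current state and log_res(x0) always exceeds the slice height, the shrinkage loop terminates; the reviewer's point is that the acceptance behaviour then depends on how well f approximates l, rather than on how far the prior is from the posterior.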
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a new MCMC method based on a measure-preserving flow with respect to the prior, which draws a 'slice' through a multivariate space, enabling slice sampling methods to be used. The flow is based on the exact Hamiltonian flow for the Uniform(0,1) distribution (though other choices can be made), combined with an analytical inverse CDF map for the prior. This extends the elliptical slice sampler to scenarios where ellipses are not sensible flows from which to slice sample (e.g. distributions which are far from Gaussian), provided the inverse CDF transform (or something else suitable) is available. Several relevant examples in which the technique can be employed demonstrate its efficacy.

I found the paper quite stimulating, and would support its publication. Below are some thoughts I had when reading it; minor typos and suggestions are in the detailed comments section.

- This looks to me like a dimension-free method, in that the statistical efficiency (as measured by things like the spectral gap) of the sampler won't degrade as the dimension of the state space increases (see e.g. [1]). The method therefore extends the class of infinite-dimensional MCMC methods (e.g. [2]) to non-Gaussian priors, which I think would be of interest to many in that community, so perhaps noting the connection would be worthwhile.

- Similarly to those methods, I suspect the performance of the sampler will degrade with more data, as the posterior looks less like the prior and hence the prior flow isn't such a good starting point (many proposals along the slice may be thrown out before a suitable choice is found). While this certainly isn't a fatal flaw, it may be highlighted and explored in further work.

- Once you transform the uniform Hamiltonian flow using the inverse CDF, I'm not sure whether the resulting flow will be Hamiltonian in nature (though it will be measure-preserving for the prior). If you take the exponential distribution as prior, the inverse CDF gives the flow x_t = -log(1 - u_t), where u_t = u_0 + t p_0 / m. Taking m = 1, this gives dynamics dx/dt = (dx/du)(du/dt) = (1 - u_t)^{-1} p_t = p_t e^{x_t}, and dp/dt = 0. For there to be an H such that these dynamics arise from dH/dp and -dH/dx (Hamilton's equations), it must be that d^2H/dxdp = d^2H/dpdx, since partial derivatives commute. This fails here: differentiating dx/dt with respect to x gives p_t e^{x_t}, while differentiating dp/dt with respect to p gives 0 (a symbolic check is sketched after the reviews). This is simply a back-of-the-envelope calculation, but I think it holds. Because of this, you are not really slice sampling along Hamiltonian trajectories, so some of the statements made in the paper could be clarified, though this does not affect the validity of the approach. It also shows that the choice of mapping here is a different problem from the choice of Riemannian metric in HMC, though they aim to achieve the same goal (a good flow through the space).

References
[1] Hairer, Martin, Andrew M. Stuart, and Sebastian J. Vollmer. "Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions." The Annals of Applied Probability 24.6 (2014): 2455-2490.
[2] Cotter, Simon L., et al. "MCMC methods for functions: modifying old algorithms to make them faster." Statistical Science 28.3 (2013): 424-446.

Clarity - Justification:
Very well structured paper. I would have suggested a paragraph or two reviewing the basic slice sampler, to make the paper more self-contained, but this is not essential.

Significance - Justification:
The method links together some existing MCMC methods, and seems to outperform state-of-the-art approaches in some sensible benchmark cases, so I think it merits publication and further study.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
x.y means page x, column y.
2.1, line 139: The potential is not proportional to -log \pi; it is this plus a constant.
3.1, line 234: repeated 'is'.
Algorithm 1 output: should read 'the marginal distribution of f' is also \pi^*' (the ' is missing).
There is a note at the end of Figure 1 that should be removed.

=====
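Review #3's mixed-partials argument above can be verified symbolically. A minimal sketch, assuming unit mass as in the review's calculation:

import sympy as sp

x, p = sp.symbols('x p')
# Transformed flow for the Exponential(1) prior, with m = 1:
dx_dt = p * sp.exp(x)  # dx/dt
dp_dt = sp.Integer(0)  # dp/dt
# If dx/dt = dH/dp and dp/dt = -dH/dx for some H, then equality of the
# mixed partials d^2H/dxdp = d^2H/dpdx requires
# d(dx/dt)/dx == -d(dp/dt)/dp.
print(sp.diff(dx_dt, x))   # p*exp(x)
print(-sp.diff(dp_dt, p))  # 0  -> the two disagree, so no such H exists

The two printed expressions differ wherever p is nonzero, confirming the reviewer's conclusion that the inverse-CDF-transformed flow is measure-preserving for the prior but not Hamiltonian.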