We thank the reviewers for their insightful comments and suggestions, and address their concerns below.

Assigned_Reviewer_2:

- Our paper is not a generalization of stochastic Nose-Hoover dynamics. The Nose-Hoover system from the molecular dynamics literature is not Hamiltonian in a strict sense, nor is it symplectic. The SG-NHT paper ([DFBCSN2014]) does not address these facts, other than saying in passing that their dynamics ``appears to be the same as'' the Nose-Hoover system. Our paper provides a more thorough justification of the proposed Riemannian Nose-Poincare Hamiltonian, grounded in molecular dynamics theory, and of the generalized leapfrog dynamics, which are both time reversible and symplectic.
- The delta function is directly related to our goal. To show that the stochastic dynamics derived from our H sample from the correct distribution, we have to show two things. First, the stochastic noise correction terms do not mangle the underlying Hamiltonian; in other words, the stochastic and deterministic equations produce the same dynamics from the underlying Hamiltonian (shown using the Fokker-Planck equation in Theorem 2). Second, the underlying Hamiltonian system itself is correct, in the sense that the (deterministic) dynamics of the system sample from the canonical ensemble (the correct distribution). This is what Theorem 1 and the delta function in its proof show. Most of the recent ML papers in this area are concerned with the details of the first step, i.e. using the Fokker-Planck equation to add stochastic noise correction terms, and do not discuss issues like symplecticity (required for sampler correctness) in detail. Our work is more thorough in that regard.
- The delta notation is a standard one used in the molecular dynamics literature to denote microcanonical ensembles. For more details one can consult [LR2004], Molecular Dynamics chapter, pp 296-297.
- The identity connecting the \exp form and the delta function can be written as \exp(-a)=\int_{s}\delta[a+\log s]ds. For this and two other identities we have used in our proof of Theorem 1, one can consult [LR2004], Molecular Dynamics chapter, pp 300-301. The trick to get rid of the multiplicative s-term in the Hamiltonian is essentially the same as the one used in the proof of the main Theorem in [BLL1999].
- In the proof of Theorem 1 in Appendix A, we are indeed integrating out s and q from \exp(-H(\theta,p,s,q)), using the transformations mentioned above for \int_{s}. We will modify the writing of that section to reflect that fact more clearly. Also, the proof goes through for unidimensional \theta, since we are working with s and q here, both of which are scalars.

Assigned_Reviewer_1:

- The runtime of our sampler depends on two factors: the choice of the Riemann metric tensor G(\theta) and the solution technique used for the implicit dynamics. For example, for the univariate Gaussian experiments we used inverse Fisher information plus log prior as our metric, and fixed point iterations to solve the resulting complicated system of equations. The resulting sampler was 1.2-1.5x slower than the SG-NHT sampler. For the diagonal tensors we used Newton iterations with Gaussian elimination to solve the dynamics. For the topic modeling experiments our sampler's per-iteration runtimes were 1.5-2x those of SG-NHT. Thus, a properly chosen solution scheme for the system of equations, along with a reasonable number of fixed point / Newton iterations (we used 5 in our experiments), keeps the runtimes reasonable in spite of the added complexity from the Riemann formulation. We omitted these minor points from the main paper because of space constraints.

We will incorporate the improvements suggested by Assigned_Reviewer_1 and Assigned_Reviewer_4 about the text of Theorem 1 and experimental results on the distributions of the generated samples.
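
Note on the delta-function identity cited in the response to Assigned_Reviewer_2: the worked step below sketches why \exp(-a)=\int_{s}\delta[a+\log s]ds holds. It assumes that a is a real constant and that \int_{s} denotes integration over s in (0, \infty). The substitution variable u is introduced here only for illustration, and the fragment assumes the amsmath package; [LR2004] remains the authoritative reference for the identity.

```latex
% Sketch under the stated assumptions: a real, s integrated over (0, \infty).
% Substitute u = \log s, so that s = e^{u}, ds = e^{u}\,du, and u ranges over all reals.
\begin{align*}
\int_{0}^{\infty} \delta[a + \log s]\, ds
  &= \int_{-\infty}^{\infty} \delta[a + u]\, e^{u}\, du \\
  &= e^{u}\Big|_{u = -a} \\
  &= \exp(-a).
\end{align*}
```

The same result follows from the rule \delta[f(s)] = \delta[s - s_0] / |f'(s_0)| with f(s) = a + \log s, whose only root is s_0 = \exp(-a), where f'(s_0) = 1/s_0 = \exp(a).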

References
-----------
[BLL1999] Bond, S. D., Leimkuhler, B. J., and Laird, B. B. The Nos\'{e}-Poincar\'{e} Method for Constant Temperature Molecular Dynamics. J. Comput. Phys., 151:114–134, 1999.
[DFBCSN2014] Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., and Neven, H. Bayesian Sampling using Stochastic Gradient Thermostats. In NIPS, 2014.
[LR2004] Leimkuhler, B. and Reich, S. Simulating Hamiltonian Dynamics. Cambridge University Press, 2004.