We thank all the reviewers for their careful reading and useful comments.

As Reviewer1 and Reviewer2 note, one of our main contributions is the introduction of a weight on persistence diagrams. Building on this new concept, the paper develops a statistical framework for persistence diagrams, and the results achieve significant advances in both theory and applications. Furthermore, Reviewer3 points out that “TDA (topological data analysis) has been a popular idea for some years now, but it has been limited by computational efficiency and applicability. This paper addresses both sides”. Indeed, our kernel method has a substantial advantage in computational efficiency, which makes it possible to apply TDA to practical problems.

Our responses to the individual reviewers follow.

Reviewer1
>> The authors should be sure to change the title “Submission and Formatting Instructions for ICML 2016” to the correct short title.
We will change it appropriately.

Reviewer2
>> Maybe it is worth being mentioned that the kernel (2) is universal if k is characteristic, see Christmann & Steinwart
Thank you for suggesting the relevant paper. We will add it to the references in the final version.

Reviewer3
>> I hope the authors share their code/data online
As mentioned in l.614, we computed all persistence diagrams with CGAL and PHAT, which are standard, publicly available software packages in TDA. Once the paper is published, we will release our code for the proposed kernel and the experiments, together with the synthesized data used (the protein data can be obtained from the “Protein Data Bank” webpage, following Cang et al. (2015)).

>> l.134 “be supposed to contain” is confusing
We will change it appropriately.

>> l.315 maybe start this equation with “d_H(X,Y) :=”
We will make this change.

>> l.377,431 The notation in equations (1) and (2) lost me. In particular, what is the right hand side that mu maps to?
I'm not sure how to understand the dot inside the integral against d\mu(x).
For fixed $x \in \Omega$, $k(\cdot,x)$ denotes the function of the first argument (the dot) from $\Omega$ to $\mathbb{R}$, i.e. $y \mapsto k(y,x)$, which is an element of the RKHS (Moore-Aronszajn theorem). In other words, for each $x \in \Omega$ we define a function $k_x:\Omega \rightarrow \mathbb{R}$ by $k_x(y):=k(y,x)$, and take the integral $\int k_x d\mu(x)$ of the RKHS-valued function $x \mapsto k_x=k(\cdot,x)$. The right-hand side that $\mu$ maps to is therefore itself an element of the RKHS. We will clarify this in the paper.

>> l.424 Did you try multiple w(x)'s? How much of a difference does it make?
Empirically, we expect that other weight functions proportional to persistence (or its exponentiation) would perform similarly. Theoretically, the injectivity in equation (2) and Proposition 3.1 hold for any weight function. However, we do not know whether Theorem 3.2 still holds for other types of weight; this is left for future work.

>> l.434 “loose” -> “lose”
We will fix it.

>> l.677 A small image (there's one in the supplementary materials) of what this data looks like would simplify the discussion here considerably.
Space constraints may make it difficult to include another figure in the main text, but we will add a sentence to Section 4.2 such as "example figures of the synthesized data are given in the supplementary material".

>> The authors experiment on a number of datasets, although I do not have enough experience in the application areas to know if the results they report are strong (e.g. Table 1's entries seem uniformly low!).
The purpose of Table 1 is to show the high classification rate of PWGK + RKHS-Gauss (top right, the proposed method) compared with the other methods, especially PSSK (a competitive previous method), which perform at almost chance level. Furthermore, the experimental results on SiO2 glass and proteins have their own significance in applications.
In particular, whether the glass and liquid states can be distinguished from a snapshot of atomic arrangements remains highly controversial in current physics. The result in Section 4.3, obtained with the kernel Fisher discriminant ratio and kernel PCA, shows that the atomic arrangements of these two states have a clear geometric difference, and hence gives a statistical answer to this controversy. This is a new finding in physics and will accelerate further understanding of glass structures in materials science.

>> As I understand it, the use of Fourier features has varying levels of error depending on the number of terms in the sum on l.576. Is it worth empirically checking convergence/approximation quality?
The top-left panel of Figure 6 shows the computational performance of the random Fourier features used for the PWGK. Empirically, it indicates fast convergence, with good approximation precision once the number of terms exceeds 600 (the horizontal axis is the number of terms in the sum on l.576).
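
A convergence check of this kind can be sketched as follows. This is only a minimal illustration under stated assumptions, not the PWGK implementation and not the experiment behind Figure 6: it uses a plain Gaussian kernel on synthetic points in the unit square, and the names `gaussian_gram`, `rff_features`, and `approximation_error`, the bandwidth `sigma=0.5`, and the term counts are all hypothetical choices made for the example. It measures the largest entrywise gap between the exact Gram matrix and its random Fourier feature approximation as the number of terms grows.

```python
import numpy as np


def gaussian_gram(X, sigma):
    """Exact Gaussian Gram matrix, k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def rff_features(X, n_terms, sigma, rng):
    """Random Fourier features z(x) = sqrt(2 / M) * cos(W^T x + b).

    Frequencies W are drawn from N(0, sigma^-2 I) and phases b from
    U[0, 2 pi], so that z(x)^T z(y) approximates the Gaussian kernel.
    """
    dim = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(dim, n_terms))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_terms)
    return np.sqrt(2.0 / n_terms) * np.cos(X @ W + b)


def approximation_error(X, n_terms, sigma, seed=0):
    """Largest entrywise gap between the exact and approximate Gram matrices."""
    rng = np.random.default_rng(seed)
    Z = rff_features(X, n_terms, sigma, rng)
    return float(np.max(np.abs(Z @ Z.T - gaussian_gram(X, sigma))))


# Synthetic stand-in data (assumption): 200 random points in the unit square.
X = np.random.default_rng(0).uniform(0.0, 1.0, size=(200, 2))

errors = {m: approximation_error(X, m, sigma=0.5) for m in (10, 100, 1000, 10000)}
for n_terms, err in errors.items():
    print(f"terms = {n_terms:5d}   max abs error = {err:.4f}")
```

The error is expected to shrink roughly like the inverse square root of the number of terms, which is the kind of behaviour a plot of approximation quality against the number of terms would display.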