Paper ID: 942
Title: Persistence weighted Gaussian kernel for topological data analysis

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper constructs a (weighted) Gaussian kernel on the space of persistence diagrams, which play a crucial role in topological data analysis. The basic idea is as follows:
- A persistence diagram is identified with a collection of points in the plane (see Figure 2). These points define an empirical measure.
- The empirical measures are mapped into an RKHS by a standard kernel mean embedding plus a weight factor that discounts birth-death events close to the diagonal. This defines a distance on persistence diagrams that has a Hilbert space geometry.
- The latter observation is used to construct linear and Gaussian kernels for persistence diagrams by simply replacing the usual squared Euclidean norm by the Hilbert-space distance constructed above (see the code sketch at the end of this review).
Some theoretical results for this kernel are presented, and a few experiments are reported.

Clarity - Justification:
I had to review an earlier version of the paper for a different conference last year. It got rejected back then, mostly, as far as I remember, for presentational issues. The authors did a great job addressing these concerns.

Significance - Justification:
I think the kernel idea is neat and novel.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Maybe it is worth mentioning that the kernel (2) is universal if k is characteristic; see Christmann & Steinwart, "Universal kernels on non-standard input spaces", NIPS 23.
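To make the two-step construction above concrete, here is a minimal Python sketch, assuming a Gaussian k on R^2 and an arctan-type weight of the kind the paper discusses; the parameter names sigma, tau, C, p and the helper arctan_weight are illustrative placeholders, not the authors' code:

    import numpy as np

    def pwgk_gram(diagrams, w, sigma, tau):
        # Step 1: embed each diagram D = {x_i} as the weighted kernel mean
        # E_k(mu_D) = sum_i w(x_i) k(., x_i) in the RKHS of a Gaussian k.
        # Inner products of embeddings have the closed form
        # <E_k(mu_D), E_k(mu_E)> = sum_i sum_j w(x_i) w(y_j) k(x_i, y_j).
        def inner(D, E):
            sq = ((D[:, None, :] - E[None, :, :]) ** 2).sum(-1)
            return w(D) @ np.exp(-sq / (2 * sigma ** 2)) @ w(E)

        n = len(diagrams)
        G = np.array([[inner(diagrams[i], diagrams[j]) for j in range(n)]
                      for i in range(n)])
        # Step 2: Gaussian kernel on the induced RKHS distance,
        # K(D, E) = exp(-||E_k(mu_D) - E_k(mu_E)||^2 / (2 tau^2)).
        d2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G
        return np.exp(-d2 / (2 * tau ** 2))

    # Weight that discounts points near the diagonal (C, p illustrative):
    # w(x) = arctan(C * (death - birth)^p).
    def arctan_weight(C=1.0, p=1.0):
        return lambda D: np.arctan(C * (D[:, 1] - D[:, 0]) ** p)

Here each diagram is an (n, 2) array of (birth, death) pairs; the resulting Gram matrix can be fed directly to a kernel machine.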
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper presents a kernel on persistence diagrams from computational topology. The combination of kernel methods and topology allows analysis of /sets/ of point clouds, that is, consideration of problems where the individual data "points" are point clouds, e.g. collections of molecular configurations. The kernel can be approximated relatively quickly using Fourier features and is applied to several tasks.

Clarity - Justification:
The paper is written in an extremely "user friendly" way, including a supplemental document with proofs and even an intro to topology. I have mostly watched computational topology from the outside, and this paper piqued my interest --- I hope the authors share their code/data online so that it's easy to play with these methods (my only doubt about reproducibility is the expense/challenge of computing persistence diagrams).

Localized points:
l.134 -- "be supposed to contain" is confusing
l.315 -- maybe start this equation with "d_H(X,Y) :="
l.377, 431 -- The notation in equations (1) and (2) lost me. In particular, what is the right-hand side that mu maps to? I'm not sure how to understand the dot inside the integral against d\mu(x).
l.424 -- Did you try multiple w(x)'s? How much of a difference does it make?
l.434 -- "loose" --> "lose"
l.677 -- A small image (there's one in the supplementary materials) of what this data looks like would simplify the discussion here considerably

Significance - Justification:
TDA has been a popular idea for some years now, but it has been limited by computational efficiency and applicability. This paper addresses both sides --- efficiency is achieved with Fourier features and applicability is extended via construction of a kernel method.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This paper provides an accessible, interesting approach to data analysis using advanced ideas from TDA. The exposition is clear even though persistence diagrams are notoriously tricky to understand, and the kernel they construct seems like a reasonable way to apply TDA to more ML tasks. The authors experiment on a number of datasets, although I do not have enough experience in the application areas to know if the results they report are strong (e.g. Table 1's entries seem uniformly low!). As I understand it, the use of Fourier features has varying levels of error depending on the number of terms in the sum on l.576. Is it worth empirically checking convergence/approximation quality? For readers outside the TDA community, an extra paragraph in the conclusion indicating more directions for theoretical work would be much appreciated.
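On the Fourier-feature question just above, the empirical convergence check is straightforward to run. A minimal Python sketch, assuming random Fourier features for a Gaussian k and toy diagrams; the helpers exact_inner and rff_embed, the bandwidth sigma, and the weight are illustrative, not the authors' code:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.5  # bandwidth of the Gaussian kernel k on R^2

    def exact_inner(D, E, wD, wE):
        # Exact <E_k(mu_D), E_k(mu_E)> = sum_ij wD_i wE_j k(x_i, y_j).
        sq = ((D[:, None, :] - E[None, :, :]) ** 2).sum(-1)
        return wD @ np.exp(-sq / (2 * sigma ** 2)) @ wE

    def rff_embed(D, wD, omegas):
        # Bochner: k(x, y) = E_omega[cos(omega . (x - y))] with
        # omega ~ N(0, sigma^{-2} I); use z(x) = [cos(om.x), sin(om.x)]/sqrt(M).
        proj = D @ omegas.T                                    # (n, M)
        z = np.concatenate([np.cos(proj), np.sin(proj)], 1) / np.sqrt(len(omegas))
        return wD @ z                                          # (2M,) embedding

    # Toy diagrams (birth, death) with death >= birth, and an arctan weight.
    D = rng.random((40, 2)); D[:, 1] += D[:, 0]
    E = rng.random((30, 2)); E[:, 1] += E[:, 0]
    wD, wE = np.arctan(D[:, 1] - D[:, 0]), np.arctan(E[:, 1] - E[:, 0])

    truth = exact_inner(D, E, wD, wE)
    for M in (10, 100, 1000, 10000):
        omegas = rng.normal(scale=1.0 / sigma, size=(M, 2))
        approx = rff_embed(D, wD, omegas) @ rff_embed(E, wE, omegas)
        print(M, abs(approx - truth))  # error shrinks roughly like 1/sqrt(M)

Printing the approximation error against M is exactly the kind of convergence table the reviewer asks for.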
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a persistence plot kernel for topological data analysis. Persistence plots are a way to describe the homology groups of data, which in turn describe topological features of the sample distribution. The kernel consists of a weighted kernel embedding of the points in the persistence plots, where the weighting essentially captures how confident we are that a detected cavity, ring, etc. is not just noise. The authors demonstrate some desirable theoretical properties for this kernel. The authors apply this kernel to synthetic data to perform classification and to real-life datasets to perform estimation and classification. Their experimental results seem good.

Clarity - Justification:
Topological data analysis is a very technical topic, and the authors did a good job of explaining it considering the length constraint of the paper. The mathematical explanations were very lucid.

Significance - Justification:
I am not well versed in TDA, so it's hard for me to determine where this result fits into the TDA literature, but the theoretical results seem valid, and from the experiments it seems like the method works well when applied to data. It seems like it would be desirable to include more competitor algorithms.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
As mentioned earlier, I am not very familiar with this field. The technique seems good if not methodologically groundbreaking. The authors should be sure to change the title "Submission and Formatting Instructions for ICML 2016" to the correct short title.
=====
Review #4
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper under consideration needs to be compared and contrasted with the one by Bauer et al.: http://arxiv.org/abs/1412.6821. The paper under consideration has a few advantages (and no disadvantages) compared to the paper by Bauer et al. First of all, the computational advantage (a smart, quick way of evaluating exponential functions). Second of all, the scaling by arctan. According to my experiments it is important for the quality of the results (it would be good if the Authors elaborated on this). The weak point of this kernel, and of the kernel in the paper by Bauer et al. (both based on Gaussians centered at the points of the diagram), is stability, which depends on the number of points in the diagram. This is, however, something that cannot be fixed; it is a consequence of the approach.
To conclude, I think that, compared to the previous paper by Bauer et al., the paper under consideration is substantially better and should be discussed with people working in machine learning at a conference like ICML.

Clarity - Justification:
The paper is clearly written; I see no issues here.

Significance - Justification:
Kernel methods in persistent homology are a new, hot topic. The presented work has a solid statistical background based on the work by the first author. Therefore the mathematics behind it is very good. Also, the applications discussed (analysis of glass structure and proteins) are important and the results are promising.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I have no further recommendations for the Authors.
=====
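A footnote on Review #4's point that the arctan scaling matters: a toy Python computation (illustrative constants C and p, not the authors' tuned values) shows how the weight nearly vanishes for near-diagonal points, the ones most likely to be noise, while saturating for prominent features:

    import numpy as np

    # w(x) = arctan(C * pers(x)^p), where pers(x) = death - birth.
    C, p = 1.0, 1.0
    pers = np.array([0.01, 0.05, 1.0, 2.5])  # lifetimes of four diagram points
    print(np.arctan(C * pers ** p))
    # -> approx. [0.0100 0.0500 0.7854 1.1903]: near-diagonal points barely
    # contribute, while the weight saturates (toward pi/2) for long-lived
    # features. This is one way to read the remark that the scaling is
    # important for the quality of the results.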