Paper ID: 903
Title: Nonparametric Canonical Correlation Analysis

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
A natural algorithm is proposed for nonparametric CCA. It relies on density-based estimation of a pointwise mutual information operator across multiple views; the SVD of this operator yields the solution to a CCA formulation that makes no parametric assumptions. The method is competitive with kernel CCA and deep CCA on empirical tasks.

Clarity - Justification:
A well-written paper that is fairly clear for the most part.

Significance - Justification:
The formulation is natural and should attract interest. The proposed approach seems cleaner and more scalable than current techniques for nonlinear CCA.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The method appears to be natural and seems to perform well. The main issues with the approach seem to be:

(1) It relies on density estimation, which introduces its own challenges: scalability, new hyperparameters, and the sparsity problems that affect high-quality density estimation in high-dimensional settings. Some empirical analysis of datasets where the proposed approach fails to outperform other baselines would have been illuminating.

(2) In its pure form, the approach is transductive and does not naturally extend out of sample. A Nyström extension is proposed in the paper. Please clarify whether the quality of this out-of-sample extension has been studied carefully.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
Inspired by some nice results in Lancaster (1958), this paper proposes an implementation of nonlinear canonical correlation analysis (CCA) based on the estimated joint density of the data.

Clarity - Justification:
The paper is well organized and clearly written.

Significance - Justification:
I have mixed feelings about this paper.
On one hand, the result in the Lancaster (1958) paper is elegant, and it would be nice to have a practical algorithm to implement it. On the other hand, the contribution of this paper is mainly an implementation of that idea; the direct implementation involves density estimation in high dimensions, which seems to be a more difficult problem than nonlinear CCA itself. It sounds like "shooting butterflies with a shotgun."

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I have mixed feelings about this paper. On one hand, the result in the Lancaster (1958) paper is elegant, and it would be nice to have a practical algorithm to implement it. On the other hand, the contribution of this paper is mainly an implementation of that idea; the direct implementation involves density estimation in high dimensions, which seems to be a more difficult problem than nonlinear CCA itself. It sounds like "shooting butterflies with a shotgun."

On the computational side, one would usually like to avoid explicit density estimation when performing nonlinear CCA; that is one of the reasons why kernel CCA has been used so frequently. Since the proposed method involves density estimation, its behavior depends on how the density is estimated, a dependence that, to me, should really be avoided.

The formulation of partially linear CCA might be useful in certain circumstances, but it is pretty straightforward: it is naturally either a special case of nonlinear CCA or an extension of linear CCA. In the context of kernel CCA, it can be achieved simply by using different kernels.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper presents a new approach (NCCA) and a sub-case of that approach (PLCCA) for nonlinear canonical correlation analysis. Contrary to other approaches to nonlinear CCA, the paper claims, this approach does not make any assumption on the shape of the distribution of the data.
The authors derive a general solution, which is then approximated using kernel density estimates of the true densities of the data.

Clarity - Justification:
To this reviewer, who is knowledgeable without being an expert in the most recent literature in the field, the paper provides a quick overview that is very well written and easy to follow. The description of the novel approach is clear and well motivated. The experimental section also seems very convincing.

Significance - Justification:
A paper with good significance, bringing knowledge and an efficient alternative approach to the nonlinear CCA field.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Typos:
- line 736: Mel-frequencEy cepstral
- References: the first two references seem to be unintentional

Minor comments:
- line 562: too high? (rule of thumb?) A few paragraphs before, the paper stated that KDE would manage high dimensions if the intrinsic dimensionality was low. After that, this PCA lacks justification...
- The formal comparison to kernel CCA is very thorough. As a minor criticism, one could have wished for a comparison to Deep CCA, which gives results very close to NCCA in terms of performance.

=====