Paper ID: 220
Title: Beyond CCA: Moment Matching for Multi-View Models

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper studies three generalizations of the standard CCA setting, involving non-Gaussian noise and discrete (Poisson) models. A method based on generalized moments and covariances is proposed and analyzed.

Clarity - Justification:
The paper is well-written for the most part. For readers who are not familiar with cumulants and the diagonalization techniques, Sections 3 and 4 are challenging to follow. I appreciate the space constraints, but it would be good for the authors to include more intuition behind these sections in the main text, as well as more background in the appendix.

Significance - Justification:
The results are interesting mostly from a theoretical perspective. The experiment on the real data is quite terse, and more work is needed to illustrate how well the method works in practice.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I like the models studied and the idea of using generalized covariances. I believe a recent paper (http://arxiv.org/pdf/1507.03867v1.pdf) solved a generalization of your model 1 (Eqn. 4) using a different cumulant-based approach (Section 3). This is fine, but you should discuss how your techniques and results compare with that paper.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper considers extensions of Canonical Correlation Analysis (CCA) models and moment-based approaches for learning them. In standard CCA, we are given two observation vectors, and the goal is to find projections of the two vectors that maximize their correlation. This paper studies problems related to learning CCA-related probabilistic models.
1.
They first introduce a discrete version of CCA and a mixed version of CCA in which one of the views is discrete and the other is non-Gaussian.
2. The paper extends the cumulant-based methods that were developed for ICA to its models, with similar guarantees.
3. The algorithms are tested on both simulated and real data. In particular, the algorithm for discrete CCA is tested on extracting topics from the Hansard collection.

Clarity - Justification:
The introduction and technical content are clearly written. However, there are many different models with small variations between them. It would be easier for the reader if the authors mainly focused on one model and described the extensions to the other models later.

Significance - Justification:
The models studied in the paper are fairly natural extensions of probabilistic CCA. The paper leverages the similarities to ICA and obtains cumulant-based algorithms that learn the model parameters when at most one of the sources is Gaussian and the others are non-deterministic. This is similar in nature to the guarantees known for ICA and ICA-related models such as discrete ICA. Algorithms for ICA are based on tensor decomposition; similarly, the algorithms in this work are based on simultaneous matrix diagonalization methods. While the problems and methods developed here are interesting and well backed by experiments (especially on the Hansard collection), I feel the paper is more on the incremental side with respect to its methodological/algorithmic contribution. Hence, I am on the fence about this one.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
A couple more comments:
1. Given that many of the algorithmic ideas are inspired by algorithms for ICA and its variants (e.g., Goyal et al., Podosinnikova et al.), it would be good to give a clear account of the technical differences.
2.
It is not clear why the whitening step is needed for the joint matrix diagonalization approaches described in Section 4. Algorithms described in [Leurgans et al.; Bhaskara et al., Section 2] perform joint diagonalization without whitening, even in the non-symmetric case, i.e., M_p = U B_p V^T and M_q = U B_q V^T, as long as U and V are full rank.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces new models of CCA with identifiability guarantees. The proposed models are discrete variants of CCA, and the results are discussed by analyzing a connection with discrete ICA. A method-of-moments technique is also proposed to estimate the factor loading matrices in the model.

Clarity - Justification:
Overall, the paper is well-written. The background concepts and related works are covered in sufficient detail. The new analyses and results are also well described, building on earlier works. It was a smooth experience to read the paper. I have one major suggestion to improve the presentation. The paper is written in a story-telling fashion, starting from the background and explaining the new results step by step, so the authors walk the reader through the paper. I appreciate that, but on the other hand, this also makes it harder to find the main contributions of the paper. It is sometimes useful to jump directly to the main results and highlight them. Other devices, such as stating lemmas that highlight the main steps (for instance, for the moment forms in Section 3.1, Equations (13), (6), (18)) and presenting an explicit algorithm in Section 4, would help here.

Significance - Justification:
I think the significance of the paper is marginally above average. I see the identifiability results as the main contribution of the paper, while the rest of the contributions are very marginal.
For instance, many techniques are borrowed from the discrete ICA analysis in Podosinnikova et al. (2015). The models do not depart far from the fundamentals of CCA; mainly, the Gaussian assumptions are removed. The diagonalization technique is also not actually new; it is very similar to existing method-of-moments techniques.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Given the significance justification I mentioned above, I rate the paper as a marginal accept.

=====
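[Editorial note] Review #2's remark on whitening-free joint diagonalization refers to the classical observation that if two matrices share the non-symmetric factorizations M_p = U B_p V^T and M_q = U B_q V^T, with B_p and B_q diagonal and U, V full rank, then U can be recovered from an ordinary eigendecomposition with no whitening step. A minimal NumPy sketch on synthetic toy matrices (all values below are hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4

# Synthetic ground truth: full-rank, generally non-orthogonal factors.
U = rng.normal(size=(k, k))
V = rng.normal(size=(k, k))
b_p = rng.uniform(1.0, 2.0, size=k)  # distinct diagonals (w.h.p.)
b_q = rng.uniform(1.0, 2.0, size=k)
M_p = U @ np.diag(b_p) @ V.T
M_q = U @ np.diag(b_q) @ V.T

# M_p M_q^{-1} = U diag(b_p / b_q) U^{-1}: the eigenvectors of this
# (non-symmetric) matrix recover the columns of U up to permutation
# and scale -- no whitening required, only full-rank U and V.
_, U_hat = np.linalg.eig(M_p @ np.linalg.inv(M_q))

def unit_cols(A):
    """Normalize each column to unit Euclidean norm."""
    return A / np.linalg.norm(A, axis=0)

# Each estimated column should align, up to sign/scale, with one true
# column: the best absolute cosine similarity per column is close to 1.
match = np.abs(unit_cols(U).T @ unit_cols(U_hat)).max(axis=0)
print(match)  # each entry close to 1
```

V can be recovered analogously from the eigenvectors of (M_q^{-1} M_p)^T = V diag(b_p / b_q) V^{-1}; in a practical implementation one would use pseudo-inverses and check that the eigenvalue ratios are well separated.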