Paper ID: 34
Title: Diversity-Promoting Bayesian Learning of Latent Variable Models

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors present a diversity-encouraging prior for latent variable models (with multivariate, continuous latent structure) based on the angular distance between components. They derive both MCMC and variational inference procedures for this prior, and use it with a mixture-of-experts model on a classification task on two datasets. They then compare this model with different priors on a few datasets.

Clarity - Justification:
Generally this paper is clear and straightforward to follow, and I believe the results could be reproduced (though, of course, I did not reimplement them). Some stylistic suggestions:
- Lines 398-455: important equations specific to this paper (e.g., the variational family) should be placed in an align block and made more prominent. Some of the other details (e.g., about variational inference in general, or specifics of the derivation) can be relegated to an appendix, where they can be fleshed out and described in more detail.
- Lines 260-263, 217-222, and 132-136 all say very similar things about the shortcomings of a DPP; this could be condensed.

Significance - Justification:
Diversity-promoting priors seem like a great idea, and the empirical results in this paper are convincing that formulating them via angular distance is useful. I would say the contribution here is incremental, but useful.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
There are some instances where a statement is made without much support. Line 105 alludes to a story about interpretability - that diversity-promoting priors would improve the interpretability of latent variable components. This feature, to me, seems quite model specific. Even in a simple linear latent variable model, a transformation that leaves this angular prior unchanged, such as a rotation, can destroy interpretability (see the sketch below). The details about accurately modeling "the long tail" are empirically compelling, but it is still unclear why this would be the case. Does this have to do with model misspecification? Given a clearer and more convincing explanation of the long-tail and interpretability claims, particularly as they apply to this angular prior for diversity, my vote would be a weak accept.
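To illustrate the invariance point: a minimal numpy sketch (illustrative only; `pairwise_angles` and the diversity statistics are my own stand-ins, not code from the paper). Any prior or regularizer that depends only on the mutual angles between components assigns the same score before and after an orthogonal transform:

```python
import numpy as np

def pairwise_angles(A):
    """Angles (radians) between all distinct pairs of rows of A."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-normalize components
    cos = np.clip(U @ U.T, -1.0, 1.0)                  # pairwise cosines
    i, j = np.triu_indices(A.shape[0], k=1)            # distinct pairs only
    return np.arccos(cos[i, j])

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                        # 5 toy components in R^3

# A random orthogonal matrix Q; applying it to every component preserves
# all mutual angles, so any angle-based diversity score is unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

theta, theta_rot = pairwise_angles(A), pairwise_angles(A @ Q)
print(np.allclose(theta, theta_rot))                   # True: same mutual angles
print(theta.mean(), theta.var())                       # mean/variance of angles, also unchanged
```

Yet such a transform can completely mix, say, axis-aligned components, destroying any interpretation tied to individual coordinates - which is why the interpretability claim seems model specific to me.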
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a new method to ensure that diverse latent components are found in Bayesian approximate inference, by introducing priors that assign higher probability to components with larger mutual angles. It also advocates a posterior regularization approach that encourages diversity in a different way, by favoring a high mean and low variance in the mutual angles between components.

Clarity - Justification:
The writing and motivation were very clear, though perhaps some of the longer equations could be moved to an appendix and summarized in the main text.

Significance - Justification:
Finding diverse latent components is an interesting problem, and the proposed solutions make sense and are inventive.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I think this is a good paper, with a few interesting methods proposed and compared to relevant work. The experiments were fine, though it would be nice to see results on more than one model.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a diversity-promoting mutual angular prior built on the von Mises-Fisher (vMF) distribution. The authors derive variational inference and sampling algorithms involving this prior, as well as a diversity-inducing posterior regularization that takes into account the variance of the angles. The derived algorithms are implemented on the mixture-of-experts model and tested on two real data sets.

Clarity - Justification:
The paper is clearly written, except for the reparametrization that the authors use to facilitate variational inference.

Significance - Justification:
The contribution is somewhat incremental, since it relies on the general frameworks of variational inference, MCMC sampling, and posterior regularization once the prior or the regularizer is defined.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
p.4, Eq. (3): The authors introduce a reparametrization of the vMF distribution. Although it is computationally reasonable, little is said about its meaning. How is inference influenced by the dependence of the concentration parameter on the preceding components through ||\sum_{j=1}^{i-1} \tilde{a}_j||_2?

Section 4: It would be better to add a discussion of the interpretability of the results, if any, since the authors mention in the introduction that facilitating interpretation is one of their motivations.

Minor:
p.4, l.393: It is not mentioned what VI stands for.
p.4, l.412: Is the expectation of the angle really simply \mu, and not A_p(\kappa)\mu? (See the standard facts recalled below.)
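For reference, the standard vMF facts (textbook material, e.g. Mardia and Jupp; not taken from the paper under review): for $x$ on the unit sphere $S^{p-1}$ with mean direction $\mu$ and concentration $\kappa$,
\[
f_p(x;\mu,\kappa) = C_p(\kappa)\, e^{\kappa \mu^{\top} x},
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},
\]
\[
\mathbb{E}[x] = A_p(\kappa)\,\mu,
\qquad
A_p(\kappa) = \frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)} \in (0,1),
\]
where $I_\nu$ is the modified Bessel function of the first kind. So the mean of a vMF variable is $A_p(\kappa)\mu$, shrunk toward the origin, not $\mu$ itself; the authors should check whether l.412 accounts for this factor.
=====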