Paper ID: 1264
Title: Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a latent-variable probabilistic approach for modeling data in the form of country-country interaction events with timestamps (and is applicable more broadly to any data consisting of typed, time-stamped interactions between actors). The key idea in the paper is the use of the Tucker decomposition as the latent factor representation, extending recent work by Hoff, which uses a Gaussian likelihood, to count data with a Poisson likelihood. The paper develops a Bayesian framework using an MCMC-based inference scheme. The model is shown to generalize a number of other related models for similar data, such as Poisson latent variable network models and infinite relational models. Experimental results on a data set of time-stamped country-country interaction events illustrate both the predictive advantages of the model over competing methods and the interpretability of the model.

Clarity - Justification:
The paper is generally clear and easy to follow. The section below on detailed comments has a few suggestions for improvements.

Significance - Justification:
(my real rating here is probably closer to "average" than "above average", but "average" was not an option that could be selected)
The paper makes a nice and potentially useful contribution to the literature on latent-variable modeling of time-stamped interaction data. This type of data is increasingly common in a variety of social data analysis contexts (not just in international relations), so there are plenty of potential applications for such models.
One could argue that the degree of novelty in the paper is modest, but I think it's a useful contribution nonetheless, taking the recent work of Hoff (2015), extending it to Poisson likelihoods, and showing how the resulting model is related to (and generalizes) a number of prior contributions in machine learning on this topic. Certainly this is the type of paper that helps fill in some gaps in terms of figuring out how seemingly different modeling approaches in this space are related to each other.

The experiments are interesting - but the paper may be selling itself a bit short by only having results for a single data set and focusing solely on its application to international relations. The paper would be stronger if it had results for at least one other data set (perhaps involving human actors rather than countries) - and I think the title/abstract/introduction could all be generalized to be at the level of general "actor-actor" interactions rather than "country-country" interactions, leaving the "country-country" interactions as a nice motivating example, rather than the sole example.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Aspects of the paper that could be clearer:
- it wasn't entirely clear why you discussed the non-parametric versions of the model, e.g., lines 296-312. If I understand the paper correctly, the paper describes a finite-dimensional model (i.e., C, K, and R are fixed and finite) - so you may want to explain to the reader a bit better your motivation for discussing limiting behavior as K -> infinity, etc. This is interesting, but could be a bit of a distraction to the reader, and you want to make clear that you are still going to use a finite model in practice. One option would be to pull all the text on the non-parametric interpretation into a separate, clearly marked subsection rather than interleaving it with the rest of the text.
- the paper also says (line 388) "In practice we set C, K, and R to large values to approximate the non-parametric interpretation of BPTD" - but in practice the values of K=8 and R=3 (in particular, line 688) hardly seem that large - so maybe some clarification of what is meant by "large" in this context would be helpful?

Typos, Clarifications, Suggestions, etc:
- equations 4 and 10: should D be C? (seems like D is not defined...)
- K and R should be defined earlier in the paper (near where they are introduced in Equation 4)
- how many events are in the data set used in Section 6? It would be useful to provide this information
- The fonts in Figure 2 are too small to read - perhaps this figure can be broken up in some way?

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper introduces a Bayesian tensor factorization model based on the Tucker decomposition, using a Poisson likelihood, and applies it to model interactions between countries and their latent community memberships.

Clarity - Justification:
The manuscript is mainly well written and easy to follow.

Significance - Justification:
While many Bayesian factorization models have already been proposed in the literature, the proposed modeling framework is elegant and likely to be useful for modeling other types of data as well, spurring further work. The authors extend their model to the nonparametric Bayesian setting, which is a nice contribution. Finally, the application to the study of international relations is of interest in its own right.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The authors propose a Bayesian tensor factorization model using a Tucker decomposition and a Poisson likelihood, and apply it to the modeling of timestamped, directed actions between countries. They show how to extend it to the nonparametric Bayesian setting, and perform posterior inference using Gibbs sampling.
The model and posterior inference algorithm are sound. Although a number of Bayesian matrix and tensor methods have been explored in the literature, the Tucker decomposition approach given here is a useful piece of the jigsaw puzzle. The application to country-country interaction data from GDELT and ICEWS is a particular strength of the manuscript. The qualitative results exploring the latent space recovered by the model are especially revealing, in addition to the prediction results.

For the prediction experiments, the observed/held-out split of the test set, between the 15 most active countries and the remaining countries, is unusual. A random split would be more standard, and it is not clear why this decision was made.

One limitation of the current formulation of the proposed model is that it models time in the tensor factorization framework, rather than with a realistic dynamics model, e.g., one capturing the intuition that the latent representations should vary smoothly over time. An extension with dynamics on the latent representations, such as a hidden Markov model on the memberships, would be an interesting direction for future work.

Although the manuscript is generally well written, I have one small gripe about the exposition: the model is introduced as "Bayesian Poisson Tucker Decomposition," which suggests that the model is generally applicable, yet the model is introduced specifically for the international relations modeling task. Is it no longer "Bayesian Poisson Tucker decomposition" if you use a different number of latent factors, or apply a similar model to a differently shaped tensor? If the general name of the model is used, it would be better to also formalize the full class of proposed models, and introduce the international relations model as a special case. Otherwise, it would be better to rename the model to something more specific.

The proposed model can output "topic"-specific latent representations.
The following is a relevant reference for this task:
P. Krafft, J. Moore, B. Desmarais, and H. Wallach. Topic-Partitioned Multinetwork Embeddings. In Advances in Neural Information Processing Systems 25, 2012.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper builds a model and algorithm for decomposing the tensor of (country, action, country, time), thus "embedding countries into communities, actions into topics, and time steps into regimes". I.e., it does clustering using tensor decomposition, and demonstrates the method on a real data set of actions/communications between countries obtained via information extraction.

Clarity - Justification:
The writing is mostly clear. My one request would be for a clearer explanation of when and how this method is better (or worse) than alternative methods for addressing the same problem (and, along with that, a quantitative evaluation of the quality of the results).

Significance - Justification:
The math seems strong, and it's nice that it was applied to a real data set of reported communications/actions between countries, although I can't say that I saw any particularly interesting insights into the communities (clusters of countries). It would have been nice to have a paragraph of commentary on interesting insights gained. In particular, given that there is clustering on time as well, were any interesting changes in community interactions observed? Also, given that there is not much evaluation, is there any evidence whether this is better or worse than alternative algorithms? I'm not sure how I am supposed to judge the quality of the results obtained.

Overall, this looks like solid research and I would like to see it accepted at ICML. Unfortunately, it is much too far out of my area of expertise for me to have high confidence in my recommendations.

Detailed comments.
(Explain the basis for your ratings while providing constructive feedback.):
The authors write "Models that use the token representation naturally yield efficient inference algorithms, models that use the tensor representation exhibit good predictive performance, and models that use the network representation learn latent structure that aligns with well-known concepts such as communities" - great summary; this would benefit from some citations for these claims. Does this mean your method should exhibit good predictive performance? It seems you talk more about communities rather than predictive accuracy, so I'm a little confused.

=====