Paper ID: 1019
Title: Cross-Graph Learning of Multi-Relational Associations

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper presents a method for ranking tuples by combining a similarity graph for each type of object in the tuple. The combination method is a learned low-rank tensor function based on the top eigenvectors from each similarity graph. Optimization is done by SGD on pairs of tuples, one that is known to be true and a randomly selected tuple that is assumed to be false (due to sparsity); see the sketch after this review. Results on two domains look promising compared to a wide array of baselines.

Clarity - Justification:
I found the motivation and intuition behind the method difficult to follow, and I'm still not entirely sure that I understand how this method works. Some of this would be clearer if I were more fluent in tensor algebra and tensor methods in general, but even so, the high-level description of what is being learned and what patterns are being exploited is somewhat buried.

Typos and grammar errors:
"cast as to joint learning" -> "cast as joint learning"
"discover about who" -> "discover who"
"predict about who" -> "predict who"
"We call the prediction problem in both examples as cross-graph..." -> "We call the prediction problem in both examples cross-graph..."
"How to make the cross-graph..." -> "How to make cross-graph..."
"the massively available" -> "the massive number of available"
"for each type of objects although" -> "for each type of object, although"
"suffer from a poor scalability" -> "suffer from poor scalability"
"Tensor" is sometimes capitalized for unknown reasons
"(1) ... (ii) ... (iii)" - numerals inconsistent
"larger than two while method" -> ???
"We introduce our notations" -> "We introduce our notation"

Significance - Justification:
This is an interesting way to combine multiple graphs and the results are promising. The problem seems somewhat restricted, though -- it assumes that the goal is to rank heterogeneous tuples based on similarity graphs. This is but one of many kinds of prediction tasks and relational structures. The protein prediction task seems like a natural fit. The DBLP task is more contrived -- if we have similarity graphs for papers, authors, and venues, then why would we need to predict (author, paper, venue) tuples? Perhaps there is something else about the model structure that has interesting insights, such as suggesting which authors are likely to publish in which venues in the future.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This method seems principled, presents a somewhat different way of combining graphs using tensors, and the results seem promising. However, the problem definition is restrictive, and the intuitions about how and why the method works could use more detail. For example, is there an implicit clustering/dimensionality reduction taking place, as in matrix factorization methods? If so, can the resulting space be visualized or otherwise described in any way?

My review is low-confidence because I'm not entirely sure what the right baselines are for judging novelty, significance, and experiments. For now, I give this paper the benefit of the doubt -- the method seems sound (to the extent that I understand it), and it contributes a different perspective on combining multiple graphs (as far as I'm aware).
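[Editor's sketch] To make the training scheme summarized above concrete, here is a minimal toy sketch of tuple scoring through a learned core tensor over per-graph eigenvector features, updated by SGD on (observed, randomly sampled) tuple pairs. All names, shapes, and the logistic pairwise loss are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 object types, top-k eigenvectors per similarity graph.
k = 4
n_objects = [30, 40, 50]                              # objects per type
U = [rng.normal(size=(n, k)) for n in n_objects]      # stand-ins for top eigenvectors
W = rng.normal(size=(k, k, k)) * 0.01                 # learned low-rank core tensor

def score(tup):
    """Multilinear score of a tuple (i, j, l) of object indices."""
    i, j, l = tup
    return np.einsum('abc,a,b,c->', W, U[0][i], U[1][j], U[2][l])

def sgd_step(pos, lr=0.1):
    """Pairwise update: push an observed tuple above a randomly sampled one."""
    global W
    neg = tuple(rng.integers(n) for n in n_objects)   # assumed false due to sparsity
    margin = score(pos) - score(neg)
    g = -1.0 / (1.0 + np.exp(margin))                 # d(logistic pairwise loss)/d(margin)
    i, j, l = pos
    a, b, c = neg
    grad = np.einsum('a,b,c->abc', U[0][i], U[1][j], U[2][l]) \
         - np.einsum('a,b,c->abc', U[0][a], U[1][b], U[2][c])
    W -= lr * g * grad

sgd_step((0, 1, 2))   # one update on an example "true" tuple
```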
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper tackles the problem of learning from associations among multiple different relations. It extends recent work of Liu & Yang to handle more than 2 relations. It also proposes a more efficient solution. Specifically, the paper looks at using a spectral graph product to combine the different relations. The approach is also transductive so that it can cope with limited training data.

Clarity - Justification:
The paper is mostly clear and well written.

Significance - Justification:
The extensions over Liu & Yang seem important. The results show the approach outperforms a number of competitors, though it would be nice to see a comparison to Liu & Yang on the enzyme data set, since that only has 2 graphs in it. While the approach does better than the competitors, in the big picture the MAP scores for all of the methods are quite low: < 0.1 on DBLP and < 0.2 on enzyme. These are quite skewed domains, so I'd suspect MAP is a more accurate view than ROC AUC for seeing performance.

The paper does ignore one really large body of related work, which is all the work on (statistical) relational learning. See Getoor & Taskar (2007) for an introductory book. These formalisms (Markov Logic, ProbLog) can all naturally handle multiple different relations, and there are loads of models and learning algorithms that can address the tasks mentioned in this paper. Note also that work on mining molecular graphs has a long tradition in Inductive Logic Programming, which has been used in several steps of the drug design process; see work by people like David Page and Ashwin Srinivasan:
http://pages.cs.wisc.edu/~dpage/mlj_page.ps
http://www.cs.wisc.edu/~jdavis/misayu.pdf
This kind of work should at least be acknowledged in the paper.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
There are more detailed comments within the significance section as well. The paper should acknowledge and compare/contrast in some way with SRL/ILP approaches.

I'd recommend including random guessing for MAP (see the sketch after this review). This may put the results in a bit more context.

Is there a reason why the authors only focused on one completion in each domain? The paper would be strengthened by including multiple completions in each DB.

Minor points:
line 317: is not involved -> are involved
line 319: to incorporate -> incorporating
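[Editor's sketch] On the random-guessing suggestion above: a quick Monte Carlo estimate of the MAP a uniformly random ranker achieves on a skewed candidate set. The candidate and positive counts below are made up for illustration and are not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

def average_precision(labels):
    """AP of a ranked candidate list given binary relevance labels (1 = true tuple)."""
    hits, ap = 0, 0.0
    for rank, y in enumerate(labels, start=1):
        if y:
            hits += 1
            ap += hits / rank
    return ap / max(hits, 1)

def random_map(n_candidates, n_positives, n_trials=200):
    """Monte Carlo estimate of the expected AP of a random ranking."""
    labels = np.zeros(n_candidates, dtype=int)
    labels[:n_positives] = 1
    return np.mean([average_precision(rng.permutation(labels))
                    for _ in range(n_trials)])

# e.g. 10 true tuples hidden among 10,000 candidates:
print(random_map(10_000, 10))   # a very small number at this level of skew
```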
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper addresses the problem of learning multi-relational tuples of entities from different graphs. The main idea is to exploit eigenvectors derived from each respective adjacency matrix to learn multi-relational tuples among graphs. Specifically, the proposed new approach is supposed to scale well and to offer a convex cost function. The paper is in general well presented and the experiments seem convincing.

Clarity - Justification:
It is of great help that the authors offer quite a few examples (verbal and graphical) in explaining, e.g., the model's objectives and central concepts.

Significance - Justification:
The proposed approach seems to be novel and capable of tackling some difficulties from which current state-of-the-art methods suffer.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
We are not certain about describing the proposed approach as a framework because, as far as we could understand, most parts of the method are already quite concrete and deterministic. Therefore we do not see much room for generalizing the model. The authors only mentioned replacing the Tucker component with CP or TensorTrain; this should be explained in more detail (a sketch contrasting the two forms follows this review). We would suggest reconsidering this formulation or making it more explicit to what extent the novel method can be extended/generalized.

Furthermore, although the proposed approach is supposed to scale better than existing ones, its solution to scalability seems simply to be a sampling process, as in Equation (10). Such methods can also be found in common tensor factorization models, which therefore do not necessarily suffer from sparsity in data either (as is claimed in Section 1).

=====
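[Editor's sketch] To make the Tucker-vs-CP remark above concrete, a minimal illustration of the two scoring forms for a single tuple. Shapes, names, and the feature vectors are toy assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
u, v, w = rng.normal(size=(3, k))       # per-object eigenvector features (toy)

# Tucker-style score: a dense k x k x k core mixes every feature
# combination across the three graphs (k**3 parameters).
core = rng.normal(size=(k, k, k))
tucker_score = np.einsum('abc,a,b,c->', core, u, v, w)

# CP-style score: a rank-R sum of per-mode projections (3 * R * k parameters),
# which corresponds to restricting the Tucker core to a superdiagonal form.
R = 2
A, B, C = rng.normal(size=(3, R, k))
cp_score = sum((A[r] @ u) * (B[r] @ v) * (C[r] @ w) for r in range(R))

print(tucker_score, cp_score)
```

The parameter counts in the comments are the substance of the reviewers' generalization question: swapping Tucker for CP or TensorTrain trades expressiveness of the core for fewer parameters, and the paper would benefit from spelling out when that trade-off is worthwhile.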