Paper ID: 966
Title: Complex Embeddings for Simple Link Prediction

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

This paper proposes to use **complex-valued** embeddings for the link prediction problem. The proposed method has two advantages: 1) it models antisymmetric relations naturally; 2) it remains simple and fast (linear time, via a dot product). The paper argues that the class of low-rank normal matrices can encompass most binary relations found in knowledge bases, and that such matrices can be approximated by a low-rank factorization with complex-valued latent factors. The empirical results demonstrate that the model outperforms previous methods on the WN18 and FB15k datasets (though it is comparable with HolE on WN18).

Clarity - Justification:

This paper is well written and easy to follow.

Significance - Justification:

It is an interesting problem, and the proposed method is simple but effective.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

It might be a good idea to look into what these complex embeddings have learned. Can you visualize the entity embeddings in the complex vector space and compare them with their real-valued counterparts?

Table 4 presents the per-relation accuracies on WN18. It would be nicer to highlight which relations benefit most from this method (especially the antisymmetric ones).

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper is extremely simple (and makes you wonder why nobody came up with this idea before, as the problem is very well studied), yet neat, and it also seems to work very well in practice. I think it will have a large impact on the area.

The basic idea: it is standard to use the dot product of argument embeddings (e.g., real vectors for "Tower Bridge" and "London") to score how likely two arguments are to fit a relation (e.g., "located_in"). This factorization is a bit problematic, as it can handle only symmetric relations (as long as the arguments are embedded in the same space independently of their position in the relation; using position-specific embeddings is not such a great idea, for various reasons). In this work, the authors instead consider complex embeddings and use the Hermitian product. And it works very well. That's basically it.

Clarity - Justification:

There are a few paragraphs which are slightly confusing (see my comments below), but overall the paper is very clear.

Significance - Justification:

It is a simple, novel idea which is likely to have a large impact on the area.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

1) Page 2, right column, line 215: since the authors are still considering real matrices at this point, it probably makes sense to refer to U as orthogonal, not unitary.

2) I find the paragraph discussing real normal matrices over-compressed; it would be interesting to see more discussion there. Moreover, I am not sure I can even parse the sentence "The matrices that are not normal are somehow (??) related to triangular matrices, which can be handled through non-linearities of the logistic function" (or its footnote). This discussion seems to be a central point of the paper, so it would be nice to extend it a bit.

3) It may also make sense to include some analysis in the experimental section showing to what degree the considered knowledge bases (WordNet/Freebase) satisfy the normality assumption, and perhaps to give some examples where they do not (one form such a check could take is sketched below).
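As a concrete reading of point 3, here is a minimal sketch, in NumPy, of how the normality assumption could be tested per relation. The adjacency-matrix representation and the gap measure are assumptions made for illustration, not anything taken from the paper:

```python
import numpy as np

def normality_gap(M):
    """Relative Frobenius gap between M M^T and M^T M.

    The gap is zero exactly when M is normal (symmetric and
    antisymmetric matrices both qualify); larger values mean the
    relation is 'less normal'.
    """
    gap = np.linalg.norm(M @ M.T - M.T @ M)
    return gap / (np.linalg.norm(M) ** 2 + 1e-12)  # guard divide-by-zero

# Toy check: a symmetric relation is exactly normal, while a strict
# hierarchy (strictly triangular adjacency matrix) is not.
sym = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
tri = np.triu(np.ones((3, 3)), k=1)  # strictly upper triangular
print(normality_gap(sym))  # 0.0
print(normality_gap(tri))  # > 0
```

Applied to each relation's adjacency matrix in WN18 or FB15k, such a statistic would show how well the normality assumption holds in practice.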
4) Section 4: the experimental section argues that RESCAL is over-parameterized and overfits. I am wondering whether some better regularization (rather than L2) could be applied to improve generalization. Even simpler, the rank of the matrix M_r could be controlled by modeling it as M_r = U_r V_r, or something softer could be used, like the trace norm.

5) It would be interesting to see a learning curve with varying amounts of training data (at least for the artificial experiments). It is a bit surprising to me that RESCAL cannot capture symmetric and antisymmetric relations from so many examples.

6) Though it may be well known in the relational modeling community that reusing embeddings across argument positions is important (and it also seems very intuitive and plausible), to make the paper more self-contained it may be good to include some experiments supporting this claim (not on the artificial data of Section 2.1, where it is obvious, but on the real data, FB and WN).

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

For the link prediction problem, this paper presents a novel matrix factorization method. Specifically, the proposed method uses the Hermitian dot product, the complex-valued counterpart of the standard dot product. Compared to conventional state-of-the-art methods, the proposed complex embedding is much simpler and more efficient. Using real data sets, the authors show some strong results.

Clarity - Justification:

I think the experimental section is clearly written, but the proposed method should be written in the form of an algorithm; the current explanation looks informal (a minimal sketch of the scoring step appears after this review).

Significance - Justification:

As the authors know, many methods for the relational link prediction problem have been presented so far. I think the proposed method is quite simple and practically valuable.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

I found the proposed method to be much simpler than conventional methods, and effective. I recommend that the paper be accepted.

=====
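To illustrate the algorithmic write-up requested in Review #3, here is a minimal sketch of the scoring step as the reviews describe it: the real part of a Hermitian three-way product, passed through the logistic function. The variable names, shapes, and random initialization are illustrative assumptions, not the authors' code:

```python
import numpy as np

def score(e_s, w_r, e_o):
    """Re(sum_k w_r[k] * e_s[k] * conj(e_o[k])): real part of the
    Hermitian product of relation and entity embeddings; O(K) per triple."""
    return float(np.real(np.sum(w_r * e_s * np.conj(e_o))))

def prob(e_s, w_r, e_o):
    """Probability that the triple (s, r, o) holds, via the logistic link."""
    return 1.0 / (1.0 + np.exp(-score(e_s, w_r, e_o)))

rng = np.random.default_rng(0)
K = 4                                               # embedding rank (illustrative)
e_s = rng.normal(size=K) + 1j * rng.normal(size=K)  # subject entity
e_o = rng.normal(size=K) + 1j * rng.normal(size=K)  # object entity
w_r = rng.normal(size=K) + 1j * rng.normal(size=K)  # relation

# Swapping subject and object conjugates the product, so the two scores
# can differ whenever w_r has a nonzero imaginary part: antisymmetric
# relations are representable without position-specific entity embeddings.
print(prob(e_s, w_r, e_o), prob(e_o, w_r, e_s))
```

Because each entity keeps a single embedding regardless of argument position, the symmetric or antisymmetric behavior of each relation is carried entirely by the relation vector w_r.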