Paper ID: 210
Title: Structure Learning of Partitioned Markov Networks

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper addresses the problem of learning a Markov network structure between two groups of variables. Instead of learning the full Markov network on all variables, the paper focuses on learning only the edges across the two groups. The proposed method is based on a partitioned ratio that captures the relationship between the two groups.

Clarity - Justification:
The paper is generally well written.

Significance - Justification:
The paper introduces the novel concepts of the partitioned ratio and passages, which capture the relationship between two groups of variables, and proposes an estimation method for partitioned Markov networks based on the partitioned ratio.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Strong experimental results showing why learning only the edges across two groups is preferable to learning the full model would have been useful to demonstrate the advantages of the proposed method in terms of accuracy and computation time.

It is not clear how the experiments in Sec 6.1 were set up to compare the proposed method with previous methods. The authors state that previous methods were used to learn the difference between the two Gaussian densities p(x) and p(x1)p(x2), but I find this unclear.

While the results in Sec 6.2 are interesting, a "quantitative" comparison with other approaches on real datasets would be valuable.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors provide a method for estimating the edges between two known partitions of the original set of variables. They provide sample complexity guarantees and several definitions.

Clarity - Justification:
The paper seems to miss some important discussions, conclusions and future work.

Significance - Justification:
It is difficult to assess the significance of the paper. The authors should better emphasize the contributions of the paper in comparison to the current literature, but in a more formal fashion (please see detailed comments).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper seems to miss some important discussions and conclusions, as well as future work.

The authors should formally argue the fundamental difference between their problem/results and estimating a partial matrix/structure for the original problem (e.g., Ravikumar'10) by just instantiating the theorems in (Ravikumar'10). For instance, consider X = (X1, X2), and take the full structure to be Theta = [Theta11, Theta12; Theta12', Theta22]. Instead of estimating Theta, consider estimating only Theta12. Clearly, one no longer has to estimate the sufficient statistics related to Theta11 and Theta22, and the sample complexity will depend on the support size of Theta12 only. One can potentially modify the statements in (Ravikumar'10) to obtain sample complexity results that I believe might be similar to the ones in this submission (see the illustrative sketch at the end of this review).

There are some small notation issues: Line 189 uses X_{\i} while lines 269-270 use \X_i; both seem to mean the same thing.

Proposition 1: although the proof is trivial, please include it in the supplementary material.

Proposition 2: X_v \notin X_{N(u)} seems like it should actually be v \notin N(u).

Regarding experiments: Line 688: "15th smallest eigenvalue", why?
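For concreteness, a minimal sketch of the block-partitioned Gaussian setup the reviews above refer to; the notation (Theta11, Theta12, Theta22 and the densities p(x), p(x1), p(x2)) follows the reviewers' comments, not necessarily the paper itself:

\[
\Theta =
\begin{pmatrix}
\Theta_{11} & \Theta_{12} \\
\Theta_{12}^{\top} & \Theta_{22}
\end{pmatrix},
\qquad
p(x) \propto \exp\!\left( -\tfrac{1}{2}\, x^{\top} \Theta\, x \right),
\qquad x = (x_1, x_2),
\]
\[
r(x) = \frac{p(x)}{p(x_1)\, p(x_2)},
\qquad
r(x) \equiv 1 \;\Longleftrightarrow\; \Theta_{12} = 0 \quad \text{(Gaussian case)}.
\]

In this sketch the cross-group edges are the nonzero entries of Theta12, so recovering them amounts to recovering the support of Theta12 alone; this is the setting in which the requested comparison with instantiating (Ravikumar'10) on the full Theta would be made.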
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The problem discussed, partial structure learning (establishing links between two pre-defined communities without attempting to discover links within the communities), is unusual and interesting. The intro is good, the theory is useful (even though a bit confusing), and the experiments are sound.

Clarity - Justification:
The intro is good and clear, the theory is useful (even though a bit confusing; see below), and the experiments are sound and well explained.

Significance - Justification:
The formulation is new and interesting.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Based on all of the above, I recommend publication. However, if the paper is accepted, I would also like to see the authors streamline the discussion of the theory a bit. (My understanding is that this will not affect the general conclusions of the paper or the results of the experiments.) Specifically, I am not sure that the extended discussion of the Hammersley-Clifford theory (and the respective generalization to the partitioned case) is critical. Indeed, in discussing the HC approach, the authors hint (with reference to Vapnik) that there is not much of a point in learning the conditional probabilities, which are in any case conjectured. It is just as good (and certainly simpler) to start straight from conjecturing the form and learning the (undirected) GM itself.

===== Review #4 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper learns the structure of the edges between two groups of variables in a Markov network, which the authors call a partitioned Markov network. The learning approach is well grounded theoretically, estimating the ratio between the full joint distribution and the product of the joint distributions over the two groups. One weakness of the paper is how widely applicable this formulation is, but some good examples are given in the empirical section. Overall, it is a well-written paper that provides a novel approach to this problem.

Clarity - Justification:
Well written overall. The empirical section could use improvement (see detailed comments).

Significance - Justification:
A novel problem formulation, but it may be of limited applicability.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
How does the algorithm perform when the full \Theta is sparse? It might still outperform other methods because it is performing a more direct learning task. If, on the other hand, the algorithm only performs best when the within-partition networks are dense, this would be good to know.

Precision and recall would be better measures of performance than TPR and TNR, because the number of true edges is much smaller than the number of false edges (see the sketch below). The definitions of TPR and TNR given in the appendix are confusing, partly because the definitions of S and S^c are hard to find in the paper. Aren't the \delta functions in the denominators always equal to 1, by definition of S and S^c? It is also confusing to use \theta^* here while the true model in Section 6 is \Theta.

It would be nice to see the results from competing algorithms on the real data sets in the Empirical section. Because we do not know whether this data meets the problem assumptions, it would be interesting to see how different the results are using different methods (each with its own assumptions), even if it is only a qualitative comparison.

Fonts on Figure 4 are much too small to read.
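A minimal, runnable sketch of the precision/recall versus TPR/TNR point raised above, using hypothetical true and estimated cross-edge sets (illustrative only, not results from the paper):

from itertools import product

group1 = range(3)                      # indices in the first variable group (hypothetical)
group2 = range(3, 6)                   # indices in the second variable group (hypothetical)
candidates = set(product(group1, group2))   # all candidate cross-partition edges

S_true = {(0, 3), (1, 4)}              # hypothetical true cross edges (the set S)
S_hat = {(0, 3), (2, 5)}               # hypothetical estimated cross edges

tp = len(S_hat & S_true)               # correctly recovered edges
fp = len(S_hat - S_true)               # spurious edges
fn = len(S_true - S_hat)               # missed edges
tn = len(candidates - S_hat - S_true)  # correctly excluded non-edges

tpr = tp / (tp + fn)                   # true positive rate (equals recall)
tnr = tn / (tn + fp)                   # true negative rate
precision = tp / (tp + fp) if S_hat else 0.0
recall = tpr

print(f"TPR={tpr:.2f}  TNR={tnr:.2f}  precision={precision:.2f}  recall={recall:.2f}")

When the true edge set is small relative to the candidate set, TNR is dominated by the many true negatives and stays near 1 even for estimates with many false positives, whereas precision penalizes them directly.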
Figure 4(b) needs a brief descriptor.

In the References, Gaussian and Markov need to be capitalized.

Line 236: exit => exist.

Line 332: Calling this a "curious" question does not help the reader understand why you are asking the question.

As a side note, the Introduction could be condensed if more space is needed later in the paper.

=====