Paper ID: 1032
Title: Generalized Direct Change Estimation in Ising Model Structure

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
In this paper, the authors generalize the "direct change estimation" method for learning the changes between Ising Markov networks. The main contributions are:
a) Rather than using the L1-regularized version proposed by Liu et al., 2014a,b, the authors generalize the method to atomic norms and propose a FISTA-type algorithm to solve the optimization problem efficiently (a minimal illustrative sketch of such a scheme appears after Review #3).
b) The authors use a generic chaining technique (Talagrand, 2005, 2014) to obtain convergence results for their method. In the case of L1 regularization, their results are sharper than those of Liu et al., 2014b. The same "smooth density assumption" is used, and the authors' analysis applies whenever that assumption holds.
c) Experimental results with the L1 norm, the group-sparsity norm, and node perturbation show that the "direct approach" outperforms the "indirect methods" in most cases.

Clarity - Justification:
The paper is well written and easy to follow, though I suggest the authors add a notation section for such a theoretical paper: some symbols appear undefined, e.g., Omega_R in (30) and S used after Theorem 2. There are also minor typos in the equations, e.g., missing bars in (23).

Significance - Justification:
This is an interesting incremental work building on Liu et al., 2014a,b. It generalizes Liu et al.'s method to atomic regularization, which admits multiple applications rather than only inducing sparsity on individual elements. The most important contribution is that the authors' proof yields a better sample complexity, O(s \log p^2), rather than the O(s^2 \log p^2) obtained earlier, which moreover held only for L1-type norms.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall, this paper is nicely written and easy to read. It addresses an important problem and gives an improved (and generalized) statistical recovery result over previous works. The proof applies a generic chaining technique (Talagrand, 2005, 2014) and earlier developments on structured estimation with atomic norms (Chen et al., 2015). I followed the sketch of the proof, and it appears correct.

The authors' claims are based on the conference version of Liu et al., 2014. A longer version with more results and analysis (including a sample complexity that grows with min(n_1, n_2)) is available at http://arxiv.org/abs/1407.0581.

1) My biggest concern is that in Liu et al., 2014b, Theorem 1, the target was to recover the *correct support* of the density-ratio parameter, i.e., SUPP(estimated) = SUPP(true). In this paper, however, the authors focus on the loss in the *L2 norm* (||\Delta||_2 in Corollaries 3 and 4). Are these two results really comparable? Moreover, in the longer version, Liu et al. also bound the L2 recovery error (page 37, at the end of the proof of A.3), which is O(\sqrt{s} \lambda). If \lambda scales as \sqrt{\log p^2 / n_1}, that result is close to what the authors state in Corollaries 3 and 4. Moreover, how is the sample complexity O(s \log p^2) computed? (See also the rough calculation after the minor comments below.)

2) The use of the RSC condition is interesting. How does it compare to the "smoothness of normalization" condition (Assumption 3) used in Liu et al., 2014b?

Minor comments:
In (28) and (29), why are the indices i and j summed over 1, ..., n_1 and 1, ..., n_2 (which are sample indices)?
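For concreteness on both point 1 and the comment above: in the KLIEP-type formulation of Liu et al., 2014, the density ratio is modeled as r(x; \Delta) \propto \exp(\Delta^T \psi(x)), and, in our notation (which may differ from the paper's (28) and (29)), the empirical loss takes the form

\ell(\Delta) = -\frac{1}{n_1} \sum_{i=1}^{n_1} \Delta^T \psi(x_i) + \log\Big( \frac{1}{n_2} \sum_{j=1}^{n_2} \exp(\Delta^T \psi(y_j)) \Big),

where x_i and y_j are the samples from the two models, so the sums over i and j are simply empirical expectations over the two samples. On the sample-complexity arithmetic in point 1: assuming an error bound ||\hat{\Delta} - \Delta^*||_2 = O(\sqrt{s} \lambda) with \lambda \asymp \sqrt{\log p^2 / n_1}, one gets ||\hat{\Delta} - \Delta^*||_2 = O(\sqrt{s \log p^2 / n_1}), which is o(1) once n_1 grows faster than s \log p^2; presumably this is how n_1 = O(s \log p^2) is obtained, but the authors should make the derivation explicit.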
In the abstract, "sharper than an existing result n_1 = O(s^2 \log^2 p^2)" should be "... O(s^2 \log p^2)". Isn't \log p^2 really 2 \log p? Should the authors simply write \log p in the convergence results?

Please update the citation of Liu et al., 2014b: it was in fact published at the 2015 AAAI conference. Citations should be formatted consistently.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors propose a method for estimating the difference between the parameters of two high-dimensional Ising models. The assumption is that the difference between the parameter vectors is sparse, block sparse, or possesses some other type of structure that can be discovered using the proper regularization. The loss used for the estimation problem is based on the density ratio between the probability density functions conditioned on the individual parameters, written as a function of the difference in parameter vectors. The authors then provide statistical theory for minimizers of the regularized loss. They develop l_2-norm bounds on the estimation error of the difference in parameters, under appropriate regularization conditions and sample-size requirements.

Clarity - Justification:
The paper is generally well written, and the results are clearly put in the context of previously existing work. Furthermore, the theorems are clearly organized and the proofs are easy to follow.

Significance - Justification:
The conclusions of the paper are not surprising given the previous work on Gaussian graphical models using a similar method of regularized empirical loss minimization via density ratios. However, the mathematical theory is rigorously presented.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is well motivated and carefully presented. Although the methods and mathematical arguments contained in the paper are fairly natural, it seems to constitute a useful contribution to the literature.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper studies the problem of learning the difference between two Ising models. It provides better sample complexity than estimating the two Ising models separately, and also improves on prior work on direct change estimation.

Clarity - Justification:
The text and proofs are clear and easy to follow.

Significance - Justification:
I believe the problem is well motivated. The theoretical results generalize from sparse changes to more general structured changes. Good experimental validation.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I personally enjoyed reading this paper. I have only a minor comment regarding notation: "q" is used in Line 256 without proper definition.

=====
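Sketch referenced in Review #1, point a): a minimal, hypothetical FISTA-type scheme, specialized to the l_1 case with a soft-thresholding proximal step. The function names, step size, and defaults are our assumptions, not the paper's implementation; only a generic smooth loss with gradient grad_f and Lipschitz constant L is assumed.

```python
# Hypothetical sketch of a FISTA-type solver for the l_1-regularized problem
#   min_d  f(d) + lam * ||d||_1,
# one instance of the atomic-norm scheme the paper proposes. grad_f, L (a
# Lipschitz constant of grad_f), and the defaults are our assumptions.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(grad_f, p, lam, L, n_iter=500):
    """FISTA (Beck & Teboulle, 2009) with fixed step size 1/L."""
    d = np.zeros(p)   # current iterate (the parameter difference Delta)
    y = d.copy()      # extrapolated point
    t = 1.0           # momentum parameter
    for _ in range(n_iter):
        d_new = soft_threshold(y - grad_f(y) / L, lam / L)   # prox-gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0     # momentum update
        y = d_new + ((t - 1.0) / t_new) * (d_new - d)        # extrapolation
        d, t = d_new, t_new
    return d

# Toy usage with a smooth quadratic loss f(d) = 0.5 * ||A d - b||^2:
#   d_hat = fista(lambda d: A.T @ (A @ d - b), p=A.shape[1],
#                 lam=0.1, L=np.linalg.norm(A, 2) ** 2)
```

For a group-sparsity norm, the soft-thresholding step would be replaced by the corresponding proximal operator (e.g., blockwise shrinkage of each group's l_2 norm), and similarly for other atomic norms where the proximal operator is available.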