R1: Thanks for the encouraging comments. We will fix the typo.

R3: Thanks for the encouraging comments. As R3 correctly points out, our results generalize the analysis in (Liu et al. 2014b) to any norm. Generalization to any norm requires novel techniques and tools, e.g., generic chaining, which have been applied in other settings in the recent literature. However, our analysis works with two sets of samples and with the density ratio loss function, which is not additively decomposable over the samples. These aspects present additional challenges for the analysis.

R4: Thanks for your detailed and insightful comments. We will fix the typos and update the reference. As R4 correctly points out, our results directly generalize the analysis in (Liu et al. 2014b). The work of Liu et al. (2014b) is the first on the density ratio loss function, and it has had a significant impact in the change detection realm. The major difference between our work and (Liu et al. 2014b) is that their results apply only to the L1 norm, whereas our analysis applies to all norms, e.g., lasso, group lasso, k-support norm, node perturbation penalty, etc. Related tools have been applied in other settings, such as regression with norms, in the recent literature (Banerjee et al. 2014, Tropp et al. 2015), but those settings consider loss functions with only one set of samples. Our analysis works with two sets of samples and with the density ratio loss function, which is not additively decomposable over the samples. These aspects present additional challenges for the analysis.

Thanks for the pointer to the longer version (arXiv:1407.0581). In arXiv:1407.0581, Liu et al. improved the sample complexity to \min(n_1, n_2) = O(s \log p) when the "boundedness" of the density ratio model is assumed.
However, in our analysis, under the "smooth density ratio" assumption (a more relaxed assumption than "boundedness"), we obtain \min(n_1, n_2) = O(s \log p) to accurately recover the changes. In arXiv:1407.0581, under the "smooth density ratio" assumption, Liu et al. obtain n_2 = O(n_1^2), so that if n_1 = s \log p, then n_2 = s^2 (\log p)^2, which is more than what our analysis needs. We will add a discussion in the paper to clarify the differences for the L1 case.

The results of Liu et al. provide conditions for exact support recovery, whereas our analysis provides a recovery bound on ||\Delta||_2. Under additional assumptions, one can obtain support recovery from the bound on ||\Delta||_2 using techniques that are well established in the literature. In particular, for the L1 norm, under an additional assumption on the boundedness of the minimum element of \delta\theta^* as well as an incoherence assumption, our analysis can be extended to obtain support recovery, but such results follow from the existing literature, and no new analysis/techniques are needed. We will add a remark in the paper regarding support recovery.

As R4 states, from Lemma 3 (proof in A.3, Page 37) in arXiv:1407.0581, the recovery bound ||\Delta||_2 < \sqrt{s}\,\lambda can be obtained (Page 16 in arXiv:1407.0581). However, to obtain such a recovery bound, the conditions in Lemma 3 need to be satisfied, which requires n_2 = n_1^2; we can show a similar bound without needing n_2 to be this large, e.g., in our setting we can have n_2 = n_1. The sample complexity in our analysis depends on satisfying the RSC condition. Interestingly, we show that the sample complexity depends only on n_2 and that the RSC condition is satisfied with high probability once n_2 > w^2(C_r) (Theorem 2). For example, with the L1 norm, the RSC condition is satisfied once n_2 > s \log p.
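To make the gap in the L1 case concrete, here is a small numeric sketch (with hypothetical values s = 10, p = 1000, not taken from either paper) comparing the n_2 required by the two analyses when n_1 = s \log p:

```python
import math

# Hypothetical problem size (illustrative only): s changed parameters, p variables.
s, p = 10, 1000

n1 = s * math.log(p)   # n_1 = s log p, as in the discussion above
n2_liu = n1 ** 2       # Liu et al. (arXiv:1407.0581): n_2 = O(n_1^2) = s^2 (log p)^2
n2_ours = n1           # our analysis allows n_2 of the same order as n_1

print(f"n_1              ~ {n1:,.0f}")
print(f"Liu et al. n_2   ~ {n2_liu:,.0f}")
print(f"our analysis n_2 ~ {n2_ours:,.0f}")
```

Even at this modest problem size, the quadratic requirement on n_2 is roughly two orders of magnitude larger than the linear one.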
Once the RSC condition is satisfied, the estimation error ||\Delta||_2 decreases as \sqrt{s \log p^2 / \min(n_1, n_2)}. Thus, the change can be accurately estimated with \min(n_1, n_2) = O(s \log p^2).

R4 raised the question of the connection between Assumption 3 in (Liu et al. 2014b) and the RSC condition in our analysis; we briefly address this below. Assumption 3 in (Liu et al. 2014b) states that the largest eigenvalue of the sample Fisher information matrix ("I") is bounded. We do not have an equivalent assumption; our analysis relies only on the behavior of the smallest eigenvalue. Since "I" equals the second derivative of the empirical log partition function (\hat{\Psi}), lower bounding the restricted smallest eigenvalue of "I" is related to satisfying the RSC condition. Thus, the RSC condition is more closely related to Assumption 1 in (Liu et al. 2014b). Interestingly, in our analysis, we prove that when the number of samples (n_2) exceeds the sample complexity, the RSC condition holds with high probability (Theorem 2). In contrast, Liu et al. (2014b) only assume that Assumption 1 holds, without proving when such an assumption will hold.
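The behavior of the error bound above can be sketched numerically for the L1 case. This is an illustration only, with hypothetical sizes and constants dropped, and writing \log p for the logarithmic factor (\log p^2 differs only by a constant factor of 2):

```python
import math

def error_bound(s, p, n1, n2):
    """Rate sqrt(s log p / min(n1, n2)), up to constants, for the L1 norm."""
    return math.sqrt(s * math.log(p) / min(n1, n2))

s, p = 10, 1000                  # hypothetical sizes, not from the paper
rsc_threshold = s * math.log(p)  # L1 case: RSC holds w.h.p. once n_2 > s log p

for n in [100, 1_000, 10_000]:
    if n > rsc_threshold:        # the bound is only meaningful once RSC holds
        print(f"n = {n:>6}: error rate ~ {error_bound(s, p, n, n):.3f}")
```

The sketch shows the two regimes in the argument: no guarantee below the RSC threshold, and an error that shrinks as 1/\sqrt{\min(n_1, n_2)} above it.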