Paper ID: 801
Title: Learning Granger Causality for Hawkes Processes

Review #1
=====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a method to learn Granger causality for Hawkes processes. The impact functions of the Hawkes process are modeled with a non-parametric representation involving band-limited sampling functions. The coefficients in the representation are estimated using an EM-like algorithm. The authors also relate the impact function coefficients to the edge set of the Granger causality graph and provide an efficient method to recover that graph (maximum-likelihood estimation with a sparse-group-lasso regularizer). A method for automatically selecting the bandwidth of the basis functions in the impact function representation is also proposed.

Clarity - Justification:
The paper is well-written, with enough background provided for readers to understand and appreciate the proposed approach.

Significance - Justification:
The proposed solution differs from the existing literature in that it models the impact functions with a non-parametric representation and provides a technique for selecting the basis functions. It combines several existing approaches and, as such, can be categorized as an incremental improvement.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is generally well-written: an adequate literature review is provided, which compares and contrasts the current work with existing solutions to the problem. It uses precise and consistent notation and provides preliminaries that make it self-contained. The paper is technically sound and builds on existing literature in the field. All the claims are well supported, either by proofs or by references to previous papers in the domain.
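
For reference, the representation summarized above can be sketched as follows. The notation here is an assumption made for illustration ($\mu_u$ for the base intensity, $\phi_{uu'}$ for the impact function of event type $u'$ on type $u$, $\kappa_m$ for the $m$-th basis function, $a_{uu'}^m$ for its coefficient) and may differ from the submission's:

$$
\lambda_u(t) = \mu_u + \sum_{u'=1}^{U} \int_0^t \phi_{uu'}(t-s)\, dN_{u'}(s),
\qquad
\phi_{uu'}(t) = \sum_{m=1}^{M} a_{uu'}^{m}\, \kappa_m(t).
$$

Under this form, and assuming the basis functions are linearly independent, $\phi_{uu'} \equiv 0$ exactly when $a_{uu'}^{m} = 0$ for all $m$, which is how a zero coefficient group would correspond to a missing edge $u' \to u$ in the Granger causality graph.
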
The paper's major contribution is employing a non-parametric representation for the impact functions and providing a technique to estimate the coefficients. The sparse-group-lasso regularizer makes it possible to recover the structure of the Granger causality graph. The authors have carefully evaluated the strengths and weaknesses of their technique and provided a fair comparison with existing state-of-the-art techniques. The proposed algorithm has low computational complexity, along with the advantage of a convex objective. An EM-like algorithm employing sparse-group-lasso regularization is proposed to efficiently estimate the impact functions of a Hawkes process.

There are a few potential issues:

1. The regularization weights ($\alpha_S$, $\alpha_G$, and $\alpha_P$) still have to be hand-tuned. Further work could set these parameters systematically.

2. It is not clear whether the time complexity given for the ODE-based algorithm, the LS algorithm, and the proposed EM-like algorithm is for a single iteration or for the full procedure. The log-likelihood values in Table 1 are very close for the MLE, MLE-S, and MLE-SGL algorithms, and reporting only the log-likelihood values does not make clear how much better MLE-SGL performs than the other variants, or why.

3. The idea of using a non-parametric representation with basis functions for the impact functions is not exactly new and has been used previously. The major contribution of the paper is a method for automatically selecting the basis functions. The sparse-group-lasso regularizer has also been employed previously and is not a very novel contribution.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The model proposed in this paper aims to learn Granger causality from multidimensional time series generated by Hawkes processes.
The Granger causality impact function is modeled as a linear combination of basis functions, and the corresponding coefficients are learned by imposing a sparse-group-lasso regularizer on the coefficient tensor. The algorithm is tested on both a synthetic dataset and real-world IPTV viewing-record data, and the performance supports the authors' claims.

Clarity - Justification:
The paper is well-written and key concepts are well explained. However, the part about selecting the basis functions (Sec. 4.4) is not clear enough to me. The authors argue that the selection criterion is based on estimating the cut-off frequency of the impact function, but what is the candidate pool of basis functions? Is it the Fourier basis?

Significance - Justification:
The target problem in this paper is novel and well defined. I agree that learning Granger causality could be very useful, yet it is quite hard for general point processes. The authors analyze the problem for Hawkes processes on the theoretical side and propose a reasonable algorithm to learn the underlying impact functions. The learning algorithm is based on existing ones, but the modelling part is valuable.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1. Is it necessary to impose the sparse-group-lasso regularizer instead of just using the group lasso? To my understanding, the motivation is to promote temporal sparsity (line 385), but I am not persuaded that $\|A\|_1$ is really necessary on top of $\|A\|_{1,2}$. I wish the authors would demonstrate the necessity by showing the performance difference between sparse-group-lasso and group-lasso regularization.

=====
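
On the sparse-group-lasso versus group-lasso question raised in Review #2, the difference between the two penalties can be illustrated with their proximal operators. The following is a minimal sketch, not the submission's algorithm: the tensor layout `A[u, v, m]` (coefficient of basis function `m` for the influence of type `v` on type `u`), the function names, and the numeric values are all hypothetical. It relies on the standard result that the proximal operator of $\alpha_S\|A\|_1 + \alpha_G\|A\|_{1,2}$ is elementwise soft-thresholding followed by groupwise shrinkage.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def group_shrink(A, t):
    """Groupwise shrinkage along the last axis: prox of t * sum_g ||A_g||_2."""
    norms = np.linalg.norm(A, axis=-1, keepdims=True)
    # Guard against division by zero for all-zero groups.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return A * scale

def prox_group_lasso(A, alpha_g):
    """Prox of the group-lasso penalty alone."""
    return group_shrink(A, alpha_g)

def prox_sparse_group_lasso(A, alpha_s, alpha_g):
    """Prox of alpha_s * ||A||_1 + alpha_g * ||A||_{1,2}."""
    return group_shrink(soft_threshold(A, alpha_s), alpha_g)

# Hypothetical coefficient tensor with 2 event types and 3 basis functions.
# Each A[u, v, :] is one group (one candidate edge v -> u).
A = np.array([[[1.0, 0.05, 0.8], [0.02, 0.01, 0.03]],
              [[0.0, 0.6, 0.04], [0.9, 0.7, 0.5]]])

A_gl = prox_group_lasso(A, alpha_g=0.1)
A_sgl = prox_sparse_group_lasso(A, alpha_s=0.1, alpha_g=0.1)

print("zero entries, group lasso:       ", int(np.sum(A_gl == 0)))
print("zero entries, sparse group lasso:", int(np.sum(A_sgl == 0)))
```

In this toy example both penalties remove the weak group `A[0, 1, :]` entirely, which is the behavior that matters for recovering graph edges. Only the sparse-group-lasso variant additionally zeroes small coefficients inside the groups it keeps, which is the within-group (temporal) sparsity the reviewer asks the authors to justify empirically.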