We thank the reviewers for their detailed comments; they will help us improve the final version of our paper if accepted.

Reviewer 1: We would like to emphasize the novelty of our work by clarifying the difference between it and existing work, especially Zhou's. First, our work focuses on learning Granger causality for Hawkes processes, a challenging and important problem that is not considered in most existing work, including Zhou's. Our objective is to advance the state of the art and draw attention to research on Granger causality in the context of point processes. Focusing on your comments:

1) The definition of Granger causality for point processes is based on the local independence graph of event types. Definition 3.1 applies to general point processes (Didelez, 2008). Our Theorem 4.1, however, provides an explicit necessary and sufficient condition for Granger causality between event types of a Hawkes process, which is the key starting point of our algorithm.

2) Our work differs from Zhou's in several substantial ways: a) Zhou's work does not consider the problem of learning Granger causality and cannot solve it; the relationship between Granger causality and the impact functions is not discussed at all. Without sparse-group-lasso and pairwise similarity, his method cannot robustly infer Granger causality from the learned impact functions. b) In terms of methodology, by using predefined basis functions, our method is convex and has much lower complexity than Zhou's. We also propose a method for selecting basis functions. In Zhou's work, by contrast, both the basis functions and the coefficients are unknown, and how to set the number of basis functions is not considered. We do not need a smoothness constraint on the basis functions, because the smoothness of the impact functions is obtained naturally by choosing any smooth basis functions satisfying Assumption 4.1.
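To make the representation in 2b concrete: each impact function is a linear combination of predefined basis functions, and Granger non-causality corresponds to an entire coefficient group being zero. A minimal sketch (the function names and the Gaussian-basis choice here are illustrative, not the paper's implementation):

```python
import numpy as np

def gaussian_basis(t, centers, width):
    """Evaluate M Gaussian basis functions kappa_m at times t; shape (len(t), M)."""
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def granger_graph(A, tol=1e-6):
    """Read the Granger-causality graph off the basis coefficients.

    A has shape (U, U, M): the impact function from type v to type u is
    phi_{uv}(t) = sum_m A[u, v, m] * kappa_m(t).  Type v does not
    Granger-cause type u iff the whole group A[u, v, :] is zero, which is
    exactly what the group-lasso penalty drives toward.
    """
    return np.linalg.norm(A, axis=2) > tol  # boolean U-by-U adjacency

# toy example: 3 event types, 4 basis functions; only type 2 influences type 0
A = np.zeros((3, 3, 4))
A[0, 2] = [0.5, 0.2, 0.0, 0.0]
print(granger_graph(A).astype(int))
```

The learned coefficient tensor thus encodes both the impact functions and the causality graph at once, which is why group sparsity on A translates directly into edge pruning.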
c) Besides Gaussian bases, any sampling function satisfying Assumption 4.1 can potentially serve as a basis function and can be specified using our selection method. d) Experimental results show the superiority of our method.

3) The group-lasso in our work is mainly motivated by our problem (learning Granger causality for Hawkes processes) rather than by stationarity. The group sparsity of the impact functions helps us learn Granger causality robustly, and the sparse regularizer is necessary when the impact functions are sparse in the time domain.

4) We have investigated and demonstrated the robustness of our algorithm to its parameters in the supplementary file; it is robust over a wide range. A more principled approach to parameter selection is worthy of further research.

5) The non-negativity assumption on the impact functions is also implicit in many existing works, including Zhou's. Relaxing this constraint is left as an interesting open problem, but it is not evidence that our work lacks novelty. Based on the explanations above, we hope the reviewer will reconsider the overall rating of our work.

Reviewer 2: 1) We agree that setting the parameters systematically would further improve our results. Fortunately, we have shown in the supplementary file that our algorithm is robust to these parameters over a wide range.

2) The time complexity reported for the various methods is per iteration. For the real-world dataset, the ground truth of Granger causality is unavailable, so we use the log-likelihood of the testing sequences (Table 1) for evaluation. We have run further experiments with training sequences of various lengths, and our MLE-SGL consistently obtains the best result. The superiority of MLE-SGL is even more pronounced with shorter training sequences, e.g., viewing records covering merely 3 months rather than the 10 months used in the paper. These results will be added to the final version.

3) Note that we focus on learning Granger causality for Hawkes processes.
To our knowledge, our work is the first attempt to combine a non-parametric Hawkes model with sparse-group-lasso and thereby enhance the robustness of learning Granger causality. The signal-processing-based basis selection method is also new for Hawkes models.

Reviewer 4: 1) The basis functions can be arbitrary functions satisfying Assumption 4.1. From the viewpoint of signal processing, we treat a basis function as a sampling function whose cut-off frequency (and sampling rate) can be set according to an upper bound on the highest frequency of the impact functions. The explanation is in the supplementary file, and we will make it clearer in the final version.

2) The sparse regularizer is necessary when the impact functions have compact support in the time domain, while the group-lasso ||A||_{1,2} regularizes the Granger causality among the impact functions. These two regularizers target the temporal sparsity and the group sparsity of the impact functions, respectively.
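The interplay of the two regularizers in 2) can be sketched as follows (a hypothetical illustration: `sparse_group_penalty`, `lam_s`, and `lam_g` are our names, not the paper's notation):

```python
import numpy as np

def sparse_group_penalty(A, lam_s, lam_g):
    """Sparse-group-lasso penalty on the coefficient tensor A (U x U x M).

    The elementwise l1 term promotes temporal sparsity within each impact
    function; the group term ||A||_{1,2} sums the l2 norms of the groups
    A[u, v, :], zeroing whole impact functions and hence pruning
    Granger-causal edges.
    """
    l1 = np.abs(A).sum()                      # temporal sparsity
    group = np.linalg.norm(A, axis=2).sum()   # ||A||_{1,2}, group sparsity
    return lam_s * l1 + lam_g * group

# toy check: one nonzero coefficient group [3, 4]
A = np.zeros((2, 2, 2))
A[0, 1] = [3.0, 4.0]
print(sparse_group_penalty(A, lam_s=1.0, lam_g=1.0))  # l1 = 7, group = 5, total 12.0
```

The l1 term alone would shrink individual coefficients without removing edges; the group term alone could not produce impact functions with compact support, which is why both are needed.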