We thank the reviewers for the valuable feedback.

R1

Presentation style and clarity: Our objective was to provide an independent, self-contained exposition of the novel statistical guarantees for VAR estimation. Since the topic requires a certain mathematical sophistication, it was necessary to cover some background, including VAR spectral analysis and the regularized estimation error analysis. In terms of new results, we state our contributions at the end of Sec. 1 and present new proof techniques in Sec. 3.4 (with details in Sec. 2.2 & 2.3 of the supplement). Following the reviewer's suggestion, we will improve the exposition by including more illustrative examples and adding some of the proof ideas from the supplement.

Significance of contributions: Our work proposes a unified framework for analyzing the structured VAR estimation problem. The obtained theoretical results not only subsume existing results but also generalize to arbitrary norms. From a practical point of view, our results answer two main questions: (i) Can an estimator of the form (1) (least squares + regularization) be used for VAR estimation, given that the samples are not i.i.d., or do we need a new estimator? (Answer: yes, estimator (1) can be used.) (ii) Do we need a new analysis for every norm used with estimator (1)? (Answer: no, our work covers a general norm R(·).) From a theoretical point of view, we provide guidance in selecting N (the number of samples) and \lambda (the regularization parameter). E.g., (18) gives the minimum N needed to satisfy the restricted strong convexity condition and ensure recovery guarantees; for smaller N, accurate recovery is not possible (as illustrated in the simulations in Fig. 1).

Related work: We thank the reviewer for suggesting several references to include in our paper. Due to space limitations we concentrated on the most relevant references. We will include some of the suggested papers in the main paper and, due to space constraints, may have to defer others to the supplement.
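To make the discussion of estimator (1) concrete, the following is a minimal sketch of regularized least-squares VAR(1) estimation, assuming R(·) is the l1 norm and solving the problem by proximal gradient descent (ISTA). The function names and the solver choice are illustrative, not the paper's implementation.

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def fit_var1_lasso(X, lam, n_iter=500):
    """L1-regularized least-squares estimate of a VAR(1) transition matrix.

    X: (N, p) array of consecutive observations x_0, ..., x_{N-1}.
    Solves min_A (1/2N) ||X_next^T - A X_prev^T||_F^2 + lam ||A||_1
    by proximal gradient descent (ISTA).
    """
    X_prev, X_next = X[:-1], X[1:]
    N = X_prev.shape[0]
    p = X_prev.shape[1]
    A = np.zeros((p, p))
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    L = np.linalg.norm(X_prev, 2) ** 2 / N
    for _ in range(n_iter):
        grad = (A @ X_prev.T - X_next.T) @ X_prev / N
        A = soft_threshold(A - grad / L, lam / L)
    return A
```

The same template applies for other norms R(·) by swapping in the corresponding proximal operator, which is the sense in which the analysis is norm-agnostic.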
Non-Gaussian noise: Our analysis relies on the spectral representation of VAR, on properties of sub-exponential martingales, and on Gaussian widths as the main tool for measuring set sizes, all of which rest on the Gaussian noise assumption. The extension to other noise models (e.g., sub-Gaussian, sub-exponential) is beyond the scope of the current work and will be considered in the future; such a generalization poses several key challenges, including obtaining a suitable spectral representation under general noise models.

Selection of \lambda: Based on Theorem 3.3, our results provide a bound on \lambda (Sec. 3.2): \lambda \geq O(w(\Omega_R)/\sqrt{N}), where w(\Omega_R) is the Gaussian width of the unit norm ball. The bound provides helpful guidance for choosing \lambda, but it is stated in order notation, so there is scope for improving the choice in a data-dependent manner.

Conditions (9) and (13): The interpretation of these conditions is discussed in Sec. 3.2, where we mention that when (9) and (13) are well behaved, the corresponding VAR model is stable and the noise covariance matrix \Sigma is positive definite with bounded largest eigenvalue.

Optimality of bounds in Theorems 3.3 & 3.4: Since we have not derived matching lower bounds, we cannot guarantee that the obtained bounds are optimal. Interestingly, the order of the bounds matches those for the i.i.d. setting, for which matching lower bounds exist, so it may be possible to show that our bounds are indeed optimal. Establishing lower bounds and showing optimality will be explored in future work.

R2

We thank the reviewer for the positive evaluation and for pointing out the strengths of our work.

R3

We thank the reviewer for the accurate summary of the paper, which correctly characterizes all of its main contributions.

Presentation clarity: We agree that some of the material is dense and might be difficult to grasp. Additional intuition and examples will definitely help improve the exposition.
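As an illustration of the \lambda \geq O(w(\Omega_R)/\sqrt{N}) prescription, the following sketch estimates the Gaussian width of the unit l1 ball by Monte Carlo (for the l1 ball, the supremum of a linear functional is the l-infinity norm of its coefficient vector) and plugs it into the rate. The constant c is not pinned down by the order-notation bound and is a tuning assumption.

```python
import numpy as np

def gaussian_width_l1_ball(q, n_samples=2000, seed=0):
    """Monte Carlo estimate of the Gaussian width of the unit l1 ball in R^q:
    w(B_1) = E[ sup_{||u||_1 <= 1} <g, u> ] = E[ ||g||_inf ], g ~ N(0, I_q)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, q))
    return np.abs(g).max(axis=1).mean()

def suggest_lambda(q, N, c=1.0):
    """lambda ~ c * w(Omega_R) / sqrt(N); c is an unspecified absolute
    constant that would be tuned (e.g., by cross-validation) in practice."""
    return c * gaussian_width_l1_ball(q) / np.sqrt(N)
```

For the l1 ball the estimate recovers the familiar O(\sqrt{\log q}) scaling, so the rule reduces to the standard \lambda \propto \sqrt{\log q / N} choice for lasso-type problems.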
In the final draft we will include some of them.

Plot of theoretical bounds: Note that the established bounds are effectively in order notation and carry multiplicative absolute constants. Specific values of these constants are hard to determine since they depend on unknown model/data parameters and on the tightness of the concentration inequalities; related bounds in the literature for i.i.d. data have similar constants. Theory suggests that the errors decay at rate O(1/\sqrt{N}), and this is indeed seen in practice in the provided plots. We will make the discussion clearer to highlight this aspect. See also similar plots in related work, e.g., Fig. 5 in [*] or Figs. 5-7 in [**].

[*] Sivakumar et al. Beyond Sub-Gaussian Measurements: High-Dimensional Structured Estimation with Sub-Exponential Designs, 2015.
[**] Ravikumar et al. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence, 2011.
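The O(1/\sqrt{N}) decay can be checked empirically with a small simulation: generate a stable VAR(1) process and fit the transition matrix by regularized least squares at increasing N. For brevity this sketch uses a closed-form ridge variant of estimator (1) with an assumed \lambda = 1/\sqrt{N} schedule; the dimensions, seed, and regularizer are illustrative.

```python
import numpy as np

def simulate_var1(A, N, seed=0):
    """Generate N samples from a stable VAR(1) process x_{t+1} = A x_t + w_t."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    X = np.zeros((N, p))
    for t in range(N - 1):
        X[t + 1] = A @ X[t] + rng.standard_normal(p)
    return X

def estimation_error(A, N):
    """Fit A by ridge-regularized least squares; return the Frobenius error."""
    X = simulate_var1(A, N)
    X_prev, X_next = X[:-1], X[1:]
    p = A.shape[0]
    lam = 1.0 / np.sqrt(N)  # shrink the regularization as N grows
    A_hat = X_next.T @ X_prev @ np.linalg.inv(X_prev.T @ X_prev + lam * np.eye(p))
    return np.linalg.norm(A_hat - A)

p = 8
A = 0.4 * np.eye(p)  # spectral radius 0.4 < 1, so the process is stable
errors = [estimation_error(A, N) for N in (200, 1600, 12800)]
```

The errors shrink monotonically as N grows, consistent with the O(1/\sqrt{N}) rate, even though the absolute constant in front of the rate is unknown.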