Paper ID: 385
Title: Estimating Structured Vector Autoregressive Models

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper extends the results of [1] to vector autoregressive models.

[1] Banerjee, Chen, Fazayeli, and Sivakumar (2014). Estimation with Norm Regularization.

Clarity - Justification:
There is a lot of notation, but little intuition. The authors also spend a lot of time describing background from Basu and Michailidis and how generalization bounds are obtained in high dimensions. It would be much better to first state the main contributions, then describe how they are proved and what new techniques are needed, and only then present the application to high-dimensional estimation.

Significance - Justification:
It is not clear whether the bounds will lead to new algorithms or practical improvements.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Many references on high-dimensional vector autoregression are missing, including [2-7], and a Google search reveals many others.

The Gaussian assumption on the innovations is very restrictive. How do the results change if only moment conditions are assumed on the epsilon vector? Note that in the case of the lasso, moment bounds alone suffice to obtain the same rate as when the errors are Normal [8].

To obtain the desired generalization and estimation error, lambda must be chosen appropriately. With iid data, one can use one of the methods described in [8-11] to choose the tuning parameter in a data-dependent way. Can the current results be modified so that lambda can be chosen in a data-dependent way here as well? (A minimal sketch of one such data-dependent choice, applied equation by equation to a VAR, is given after the reference list below.)

In the VAR model there is a tight connection between the parameter of interest, the covariance matrix, and the score vector. How large is the model class for which the norm of the parameters is small and the quantities in (13) and (9) are well behaved? In other words, are the models satisfying the theoretical assumptions useful for approximating real-world applications?

Are the bounds obtained in Theorems 3.3 and 3.4 optimal?

References:
[1] Banerjee, Chen, Fazayeli, and Sivakumar (2014). Estimation with Norm Regularization.
[2] Basu, Shojaie, and Michailidis (2015). Network Granger Causality with Inherent Grouping Structure.
[3] Shojaie and Michailidis (2010). Discovering Graphical Granger Causality Using the Truncating Lasso Penalty.
[4] Wang, Li, and Tsai (2007). Regression Coefficient and Autoregressive Order Shrinkage and Selection via the Lasso.
[5] Nicholson, Matteson, and Bien (2015). VARX-L: Structured Regularization for Large Vector Autoregressions with Exogenous Variables.
[6] Nicholson, Matteson, and Bien (2014). Hierarchical Vector Autoregression.
[7] Lozano, Abe, Liu, and Rosset (2009). Grouped Graphical Granger Modeling for Gene Expression Regulatory Networks Discovery.
[8] Belloni, Chen, Chernozhukov, and Hansen (2012). Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain.
[9] Gautier and Tsybakov (2013). Pivotal Estimation in High-Dimensional Regression via Linear Programming.
[10] Belloni, Chernozhukov, and Wang (2011). Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming.
[11] Sun and Zhang (2011). Scaled Sparse Linear Regression.
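Sketch referenced above (data-dependent lambda): as an illustration of the tuning suggestion, in the spirit of the scaled lasso [11] and the square-root lasso [10], here is a minimal sketch applied equation by equation to a VAR(1). This is not the procedure of the paper under review; the function name, the use of scikit-learn's Lasso, and the pivotal rate sqrt(2 log(p)/n) are illustrative assumptions, and whether such a choice retains guarantees under temporal dependence is exactly the open question raised above.

```python
# Minimal sketch (reviewer's illustration, not the paper's method):
# scaled-lasso-style data-dependent lambda for a lasso-regularized VAR(1).
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso_var1(Z, n_iter=20, tol=1e-6):
    """Row-by-row scaled-lasso fit of A in Z_t = A Z_{t-1} + eps_t.

    lambda is chosen in a data-dependent way by alternating between a
    lasso fit and a residual-based estimate of the noise level sigma,
    with lambda = sigma * sqrt(2 log(p) / n).
    """
    T, p = Z.shape
    X, Y = Z[:-1], Z[1:]                 # lagged design and responses
    n = X.shape[0]
    lam0 = np.sqrt(2.0 * np.log(p) / n)  # sigma-free part of the penalty
    A_hat = np.zeros((p, p))
    for j in range(p):                   # one regression per output series
        y = Y[:, j]
        sigma = max(np.std(y), 1e-8)     # crude initial noise estimate
        for _ in range(n_iter):
            model = Lasso(alpha=sigma * lam0, fit_intercept=False)
            model.fit(X, y)
            resid = y - model.predict(X)
            sigma_new = max(np.linalg.norm(resid) / np.sqrt(n), 1e-8)
            if abs(sigma_new - sigma) < tol:
                sigma = sigma_new
                break
            sigma = sigma_new
        A_hat[j] = model.coef_           # j-th row of the transition matrix
    return A_hat
```

Usage would be A_hat = scaled_lasso_var1(Z) for a T x p array Z of observations; the per-equation treatment mirrors the row-wise structure of the VAR transition matrix.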
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper addresses the problem of establishing error bounds for the parameters of vector autoregressive models estimated with regularization. While such bounds have been established for independent data in a regression setting, autoregressive models assume linear temporal dependence between observations. Previously, bounds have been obtained for parameters estimated with regularizers such as the Lasso and Group Lasso. Using tools such as the spectral density and generic chaining, the authors derive general error bounds for any norm used as a regularizer. After completing the analysis for general regularizers, they validate their results by recovering previously known bounds for the Lasso and Group Lasso. Additionally, the general bounds are used to obtain bounds for the Sparse Group Lasso and OWL regularizers, which are novel findings. Finally, both synthetic and real datasets are used to verify the theoretical findings.

Clarity - Justification:
In general, I found this paper hard to follow. It is dense, but the results are technical and require mathematical sophistication. I do wish the authors had spent more effort providing intuition to accompany the technical statements. Also, I may have missed this, but in the "real data" section it seems that the authors only compare different regression methods. Why not plot the theoretical bounds (which are the main point of the paper) and compare them to the empirical findings?

Significance - Justification:
The presented error bounds for estimated VAR parameters appear novel to my knowledge, and their impact is increased by the generality of the results. The techniques (e.g., generic chaining) are modern and powerful, so it is good to see more people putting them to use. Related work is adequately referenced, both in arguing that general error bounds for arbitrary norm regularizers had not yet been established in the VAR setting and in validating the results against previous work.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The claims made in the paper are well supported by theoretical analysis and experimental results. The work could be made more complete by improving the clarity of the experimental results section and by tying the experiments more closely to the theory.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
In this article the authors propose to estimate structured vector autoregressive models by regularizing least squares with a norm penalty such as the lasso, group lasso, or ridge. The authors provide conditions under which the estimation error is guaranteed to be bounded by a known positive constant. The accuracy and performance of these estimators are then assessed on synthetic and real data. (A schematic form of this regularized least-squares estimator is written out after the reviews below.)

Clarity - Justification:
This is a very well written paper.

Significance - Justification:
Although the proposed regularization method is not novel, obtaining the estimation error bounds is interesting.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The probability bound on the regularized estimator is a nice result.

=====
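Schematic estimator referenced in Review #3: the regularized least-squares estimator described in the summaries has, for a VAR(1) written as z_t = A z_{t-1} + eps_t, roughly the form below; the notation is the reviewer's illustration and is not taken verbatim from the paper.

\hat{A} \in \arg\min_{A \in \mathbb{R}^{p \times p}} \; \frac{1}{T-1} \sum_{t=2}^{T} \| z_t - A z_{t-1} \|_2^2 \; + \; \lambda_T \, R(A)

Here R(.) is the chosen regularizer (e.g., the l1 norm for the lasso, or a sum of group norms for the group lasso), and lambda_T is the tuning parameter whose choice drives the error bounds discussed in the reviews.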