We thank all reviewers for their constructive comments. They should strengthen the final version, and we are more than happy to address them.

Reviewer_3
For practical reasons, we restricted ourselves to fitting VAR processes rather than VARMA processes, since parameterizing VARMA processes poses additional difficulties. The parameters R, Q and lambda were chosen to guarantee a random but stationary process. For estimation, we use the Econometrics Toolbox in MATLAB, more precisely the function vgxvarx (http://www.mathworks.com/help/econ/vgxvarx.html), which determines the parameters by maximum likelihood.

Reviewer_5
Indeed, the proof of Proposition 2.1 in the supplement is analogous to the proof in Peters et al. (2009). However, the reverse direction (that non-Gaussian errors lead to time identifiability) is non-trivial, as Theorem 2.2 shows. In addition, Theorem 4.1 establishes consistency for both the univariate and the multivariate algorithm. Regarding possible confounding for LiNGAM, we did not formulate this point clearly: the confounding issue stems from cutting the time series into finite-length time windows (see Figure 5 in https://www.cs.helsinki.fi/u/ahyvarin/papers/JMLR06.pdf). The application to financial time series shows that when the model assumptions are not satisfied, the algorithm is not misled but simply does not decide. We could also include simulation results on model misspecification.

Reviewer_6
Given our theoretical result, we know that in the non-Gaussian case the data can satisfy a VAR (or VARMA) model in at most one time direction (at least for infinite sample size). In the Gaussian case, both directions lead to a good fit. The independence test therefore acts as a goodness-of-fit test: in the non-Gaussian case, it is not rejected for more than one time direction.
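To illustrate the goodness-of-fit argument, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: a univariate AR(1) instead of a VAR, ordinary least squares instead of the maximum-likelihood routine vgxvarx (with Gaussian errors, conditional maximum likelihood for an AR model reduces to least squares), and an HSIC permutation test with Gaussian kernels as the independence test. All helper names (fit_ar1, centered_gram, hsic, hsic_pvalue, direction_pvalues) are hypothetical. A time direction is accepted if the residuals of the fit look independent of the regressor.

```python
import numpy as np


def fit_ar1(x):
    """Least-squares fit of x[t] = a * x[t-1] + e[t] (no intercept).

    Returns the coefficient a, the residuals e[1:], and the regressor x[:-1].
    """
    past, present = x[:-1], x[1:]
    a = np.dot(past, present) / np.dot(past, past)
    return a, present - a * past, past


def centered_gram(v):
    """Double-centered Gaussian-kernel Gram matrix H K H (median-heuristic bandwidth)."""
    d = np.abs(v[:, None] - v[None, :])
    sigma = np.median(d[d > 0]) if np.any(d > 0) else 1.0
    k = np.exp(-d**2 / (2 * sigma**2))
    return k - k.mean(axis=0) - k.mean(axis=1)[:, None] + k.mean()


def hsic(u, v):
    """Biased empirical HSIC, trace(K H L H) / n^2; larger means more dependent."""
    return np.sum(centered_gram(u) * centered_gram(v)) / len(u) ** 2


def hsic_pvalue(u, v, n_perm=200, seed=0):
    """Permutation p-value for H0: u and v are independent."""
    n = len(u)
    kc, lc = centered_gram(u), centered_gram(v)
    stat = np.sum(kc * lc)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting the samples of v breaks any dependence but keeps both marginals;
        # centering commutes with a simultaneous row/column permutation.
        exceed += int(np.sum(kc * lc[np.ix_(p, p)]) >= stat)
    return (1 + exceed) / (1 + n_perm)


def direction_pvalues(x, n_perm=200):
    """Independence-test p-values for the forward fit and the time-reversed fit."""
    _, res_f, reg_f = fit_ar1(x)
    _, res_b, reg_b = fit_ar1(x[::-1])
    return hsic_pvalue(res_f, reg_f, n_perm), hsic_pvalue(res_b, reg_b, n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 300
    noise = rng.uniform(-1.0, 1.0, size=n)  # non-Gaussian innovations
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + noise[t]
    p_fwd, p_bwd = direction_pvalues(x)
    print(f"forward p = {p_fwd:.3f}, backward p = {p_bwd:.3f}")
```

With non-Gaussian innovations such as the uniform noise above, one would expect, in line with the theory and for sufficiently large samples, a large p-value in the forward direction and a small one in the backward direction; with Gaussian innovations both directions should typically pass.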
Although we regard the connection to “Gaussianity measures” as interesting, to the best of our knowledge there is no theoretical result that clarifies under which assumptions the approaches based on “Gaussianity measures” are consistent. Our paper differs from these approaches in that its main thrust is to provide theoretical identifiability results. We will be happy to include a comparison and discussion.