We sincerely thank the reviewers for their insightful and constructive comments. We address the specific concerns raised by each reviewer below.

Reviewer 1

Q: The relationship with the paper “Low rank tensor recovery via iterative hard thresholding.”
A: Thank you for the pointer. We can identify a few differences and connections between that work and ours. At a high level, [Rauhut et al.] solves tensor recovery, while our approach solves tensor regression (i.e., a more supervised setting). At the algorithmic level, in the projection step our method uses tensor power iteration (as opposed to HOSVD), which allows *early stopping* as it sequentially searches for the orthogonal subspaces. We will fix the minor grammatical issues in the final version.

Reviewer 2

Q: Weakness in the main theorem (Theorem 5.3) ... the authors do not show that the RIP assumption with respect to the tensor rank holds.
A: There may be a misunderstanding of our main theorem, which aims to provide a possible explanation for the improved performance in the particular problem setting of our interest. The regression loss function relies on the inner-product mapping <X, W>. In Table 1 on page 3, we give several examples of this mapping, all of which can be transformed into matrix operations defined on the unfoldings of the tensors X and W. Therefore, even though we make the RIP analogy for X as a tensor, we actually impose the RIP assumption with respect to matrix rank rather than tensor rank. A similar assumption can be found in [Rauhut et al. 2015]. Our contribution, however, lies mainly in the low-rank projection, which utilizes *clever* tensor power iterations. Our theorem justifies its merits in terms of (1) fast convergence under reasonably good conditions and (2) robustness to noise.

Q: Lemma 5.2 at most guarantees that Algorithm 2 finds a stationary point. It wasn't clear why it can avoid a saddle point or avoid finding a degenerate solution.
A: We initialize the low-rank projection with the tensor obtained from truncated HOSVD. With this initialization, Algorithm 2 is expected to converge to a local minimum. In fact, for Tucker tensors, convergence to a saddle point or a local maximum is *only* observed in artificially designed numerical examples [Ishteva et al. 2011]. Lemma 5.2 summarizes this behavior, which is further confirmed by the superior empirical performance.

Q: The authors argue that u(W^{t+1}) <= u(W*), but this is not true unless Algorithm 2 finds a *global* minimizer of the projection.
A: The regression problem of our interest admits an approximate low-(Tucker)-rank assumption, which places the pre-projection solution W in a neighborhood of W*. Therefore, with W^{t+1} the minimizer of u in this neighborhood, the inequality follows. As a matter of fact (line #331), the loss values for Tucker tensors at different local minima are very close to each other, so the choice among local minima does not really matter. In our setting, we do not require Algorithm 2 to find a global minimizer of the projection, which is by itself NP-hard.

Q: An unclear assumption C^2 ||E||_F^2 <= L(W^k) (around line #509). Is this realistic? How large do we need C to be?
A: Given the assumption delta_{2R} < 1/3, line #509 follows directly if we select C > (1 + delta_{2R}) / (1 - 3 delta_{2R}). Such a choice is common; see, e.g., [Jain et al. 2010]. We would like to stress that our work is one of the first large-scale tensor regression algorithms based on the projected gradient method.
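To make "how large do we need C to be?" concrete, here is a quick numerical check (our own illustration; the values of delta_{2R} below are hypothetical and not taken from the paper):

    # Hypothetical check of how large C must be under the stated bound
    # C > (1 + delta_{2R}) / (1 - 3 * delta_{2R}), valid for delta_{2R} < 1/3.
    def c_lower_bound(delta_2r):
        assert delta_2r < 1 / 3, "the analysis assumes delta_{2R} < 1/3"
        return (1 + delta_2r) / (1 - 3 * delta_2r)

    for d in (0.05, 0.10, 0.20, 0.30):
        print(f"delta_2R = {d:.2f}  ->  C > {c_lower_bound(d):.2f}")

For delta_{2R} well below 1/3 the required C is a small constant (roughly 1.6 at delta_{2R} = 0.1), and it only grows large as delta_{2R} approaches 1/3.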
As noted by the other reviewers, “the scope of problem is important and our contribution is significant”. While being impressively fast and fairly simple, our algorithm outperforms state-of-the-art methods on spatio-temporal forecasting and multi-linear multi-task learning applications. We also provide a theoretical analysis of the algorithm and guarantee its convergence.

References

Rauhut, Holger, Reinhold Schneider, and Željka Stojanac. "Tensor completion in hierarchical tensor representations." Compressed Sensing and its Applications, Springer International Publishing, 2015, pp. 419-450.

Ishteva, Mariya, P.-A. Absil, Sabine Van Huffel, and Lieven De Lathauwer. "Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme." SIAM Journal on Matrix Analysis and Applications 32.1 (2011): 115-135.

Reviewer 3

Q: Check the more specialized literature on large scale tensor methods to compare with either qualitatively and/or numerically.
A: To the best of our knowledge, only ALS and trace-norm minimization have been applied to tensor regression. Our work bridges existing work on tensor regression with tensor decomposition. In fact, our projection step employs a rank-one update strategy but allows early stopping thanks to the regression setting (a minimal sketch is given below). We also provide a theoretical analysis to gain more insight into this elegant connection.
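To illustrate the rank-one update with early stopping mentioned above, the following is a minimal numpy sketch (function names and the tolerance are our own illustrative choices; this generic greedy deflation omits the orthogonalization performed in our actual projection step and is not Algorithm 2 verbatim):

    import numpy as np

    def rank1_power_iteration(T, n_iters=50):
        # Leading rank-one component of a 3-way tensor via the higher-order
        # power method, initialized from leading singular vectors of the unfoldings.
        u = np.linalg.svd(T.reshape(T.shape[0], -1))[0][:, 0]
        v = np.linalg.svd(T.transpose(1, 0, 2).reshape(T.shape[1], -1))[0][:, 0]
        w = np.linalg.svd(T.transpose(2, 0, 1).reshape(T.shape[2], -1))[0][:, 0]
        for _ in range(n_iters):
            u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
            v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
            w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
        lam = np.einsum('ijk,i,j,k->', T, u, v, w)
        return lam, u, v, w

    def sequential_rank1_projection(T, max_rank, tol=1e-3):
        # Greedy deflation: peel off rank-one components one at a time and
        # stop early once the relative residual is small enough.
        residual, approx = T.copy(), np.zeros_like(T)
        for _ in range(max_rank):
            lam, u, v, w = rank1_power_iteration(residual)
            component = lam * np.einsum('i,j,k->ijk', u, v, w)
            approx += component
            residual -= component
            if np.linalg.norm(residual) <= tol * np.linalg.norm(T):  # early stopping
                break
        return approx

The early-stopping test is what keeps the projection cheap in the regression setting: once the residual is negligible relative to the input, the remaining components need not be computed.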