We thank all the reviewers for their comments and appreciation of the work. As asked, we will include a short motivation for the tensor completion problem in the introduction. Below are detailed answers.

# Reviewer 1

- On r1 bigger than r2*r3
Yes, in this case G_d G_d^T is only PSD. However, the constraint r1 <= r2*r3 is not ad hoc, as it is required to define a notion of a "full-rank" core tensor G, i.e., one whose unfoldings are all full rank. Consequently, G_d G_d^T is PD. A small numerical illustration of this point is given at the end of our answers to this reviewer.

- On Lines 479-481 about global convergence
Yes, it is an empirical observation based on our experiments.

- On the rank-increasing strategy of Kressner et al. (2014)
No, we have not tried that strategy. Proposing effective rank strategies is a future research direction.

- On experiments with Kressner et al. (2014)
The metrics are the same as those of Kressner et al. Our paper uses MSE, whereas they use normalized root MSE (equivalent up to squaring and multiplying by a constant). The resizing of the tensor is the same. While the ranks in that paper are (15,15,6) and (65,65,7), ours is only (15,15,6) due to the space constraint. Also, while their observation ratios are 10%, 30%, and 50%, ours are 5% and 10%; 5% is a more challenging case. Finally, the case with rank (15,15,6) and a 10% observation ratio is exactly the same case as that of Kressner et al.

- On the computational complexity of the batch and online variants
There is a small typo in the computational cost for the online variant. If we assume that we move along n_3 and that we have seen T slices, then the computational cost of the current update is O(... + T r_3^2 + ...) instead of O(... + n_3 r_3^2 + ...). The factor T r_3^2 comes from the fact that we orthonormalize U_3, which is a T x r_3 matrix (this, however, could be done only once in a while in a practical setting), as sketched below. Consequently, after sweeping through n_3 slices, the *total* computational cost is O(... + n_3 n_1 + ...), which is higher than in the batch setting, where it is O(... + n_1 + n_3 + ...). However, this is not completely unexpected, because the online setting is well suited for general problems with *rank-one* gradients. In fact, the computational cost is similar to that of rank-constrained learning in the matrix case, e.g., the work of Shalit et al., "Online learning in the manifold of low-rank matrices," NIPS, 2010.
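To make the T r_3^2 term concrete, here is a minimal sketch of this orthonormalization step only (it is not the full online update; the names U3, T, and r3 simply follow the notation above, and the sizes are illustrative):

```python
import numpy as np

def orthonormalize_U3(U3):
    """Re-orthonormalize the third-mode factor U3 (a T x r3 matrix).

    A thin QR factorization costs O(T * r3^2) flops, which is the term
    discussed above.  In practice this step may be performed only once
    in a while during the online sweep.
    """
    Q, _ = np.linalg.qr(U3)  # thin QR: Q is T x r3 with orthonormal columns
    return Q

# Toy usage with illustrative sizes (not taken from the paper's experiments).
T, r3 = 1000, 6
U3 = np.random.randn(T, r3)
U3 = orthonormalize_U3(U3)
assert np.allclose(U3.T @ U3, np.eye(r3))
```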
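Regarding the first question above (r1 bigger than r2*r3), the following small numerical illustration (a sketch with arbitrary sizes, not taken from the paper) checks that when the mode-1 unfolding of G has full row rank, which requires r1 <= r2*r3, the matrix G_1 G_1^T is positive definite (the case d = 1):

```python
import numpy as np

# Illustrative sizes only; note that r1 <= r2 * r3.
r1, r2, r3 = 4, 3, 2
G = np.random.randn(r1, r2, r3)          # a generic core tensor

# Mode-1 unfolding (up to a column permutation, which does not affect G1 @ G1.T).
G1 = G.reshape(r1, r2 * r3)

# With full row rank, G1 @ G1.T is positive definite (all eigenvalues > 0).
assert np.linalg.matrix_rank(G1) == r1
eigvals = np.linalg.eigvalsh(G1 @ G1.T)
print("smallest eigenvalue:", eigvals.min())   # strictly positive
```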
# Reviewer 2

- On G_d G_d^T being positive semidefinite
Yes, in general, but not under the notion of a full-rank core tensor G, i.e., under conditions like r1 <= r2*r3, it is positive definite. This is also addressed in Lines 227-279 and in the answer to Reviewer 1.

# Reviewer 4

- On approximating the cost function
The approximation is only for finding the inner product (9). Once the inner product is defined, the optimization itself is on the original cost function (1). Hence, there is no relaxation of the original problem.

- On high complexity
We beg to differ on this. The computational complexity of our methods is comparable to that of the state of the art.

- On ground truth comparisons
We do show comparisons based on the ground truth, which corresponds not to the training set \Omega but to the test set \Gamma. These comparison results are denoted by 'test' MSE. In fact, Figure 1(a) is on \Omega, but Figures 1(c), 1(e), 1(f), 1(g), and 1(i) are all on the test set \Gamma. Because of the space constraint, the paper shows only a few plots. Additional plots are in the supplementary section.

- On the extension to the CP decomposition
The preconditioning technique could be applied directly. Similarly, the manifold structure could be exploited in the CP decomposition. However, exploiting both of them *efficiently* is a future research direction.

- On the connection with quasi-Newton
A quasi-Newton method computes an approximation of the Hessian and then applies its inverse to the gradient to obtain a search direction. An equivalent interpretation is that the metric is changed to the one induced by the Hessian approximation, and the gradient is then computed with respect to this modified metric (a worked equation illustrating this equivalence in the Euclidean setting is given at the end of this response). In this sense, there is a link between the paper and quasi-Newton methods. However, here it is on a quotient manifold, which requires special attention in the Riemannian framework.

- On the benefits of working with equivalence classes
The complexity of the problem is not reduced. Practically, however, the quotient manifold approach helps in designing *provably* first- and second-order algorithms, which is not the case if the equivalence classes are not exploited properly.

- On the relationship with coset-space optimization
We are not familiar with the topic. Could you provide some references?

- On convergence with a variable metric
The Riemannian framework does not require the metric to be constant (on the manifold) to show convergence (Absil et al., 2008, Section 4.3).

- On MSE not being defined in the numerical section
MSE is based on the original cost function (1). It is also mentioned in Lines 613 and 489.

- On test error vs. time
The time plot shows the test error against the *training* time in seconds.
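For completeness, here is the worked equation referred to in the quasi-Newton answer. It is a standard Euclidean identity (the symbol B is introduced only for this illustration and denotes a symmetric positive-definite Hessian approximation):

```latex
\[
  d \;=\; -\,B^{-1}\nabla f(x)
  \qquad\text{(quasi-Newton direction, } B \approx \nabla^2 f(x),\; B \succ 0\text{)}.
\]
Under the modified inner product $\langle u, v\rangle_B = u^\top B v$, the gradient
$\operatorname{grad}_B f(x)$ is defined by
\[
  \langle \operatorname{grad}_B f(x), v\rangle_B \;=\; \nabla f(x)^\top v
  \;\;\text{for all } v
  \;\;\Longrightarrow\;\;
  \operatorname{grad}_B f(x) \;=\; B^{-1}\nabla f(x),
\]
so $d = -\operatorname{grad}_B f(x)$: the quasi-Newton direction is the
steepest-descent direction in the metric induced by $B$.
```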