We are very grateful to the reviewers for their insightful comments.

Reviewer 1:

> Connection to the literature

Thank you very much for your helpful comments. We will add a discussion of the connection to the studies that you kindly listed. Indeed, those papers showed that the intrinsic dimension of the manifold and the smoothness of the function affect the exponent of the convergence rate, and an analogous statement applies to our setting as well by exploiting the tensor structure. In addition to the convergence exponent, our analysis gives an explicit dependency of the constant coefficient on the number of components K and the rank d. We think this is an interesting point of the analysis.

Reviewer 2:

> Readability

Thank you for your comment. As you pointed out, we had to shorten the preliminary part in order to include all the results, and some explanations may have been too condensed. Based on your comment, we will strengthen the preliminary part and give much more careful explanations of the notation and the newly introduced concepts, by moving some redundant content of the experimental section to the supplementary material. We would also like to note that our paper contains theoretically advanced results, which require some technical notation to be stated precisely (although the notation itself is standard). We believe that the modification described above will make the presentation much clearer for every reader.

> Computational cost

We iterated the Gibbs sampling 500 times. The estimator based on 500 iterations was not improved by further iterations. Interestingly, the computational cost itself was not much different from that of the alternating minimization. The computation per iteration is linear in K; a minimal sketch illustrating this is given below. We expect that the number of iterations until convergence also has a mild dependency on K, because our theoretical analysis indicates that the posterior concentrates well around the true value.
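To make the linear-in-K claim concrete, here is a hypothetical Python sketch of one Gibbs sweep for a simplified rank-K bilinear model with standard normal priors on the factor entries and known noise variance. It is not the GP-based sampler analyzed in the paper; the function name `gibbs_sweep`, the parameter `sigma2`, and the conjugate entrywise updates are illustrative assumptions. It only shows that a sweep consists of K conditional updates, so the cost per sweep grows linearly in K.

```python
import numpy as np

# Illustrative sketch only: a simplified bilinear rank-K model with Gaussian
# priors, not the GP-prior sampler used in the paper.
def gibbs_sweep(Y, U, V, sigma2=1.0, rng=None):
    """One Gibbs sweep for Y ~ sum_k u_k v_k^T + Gaussian noise (variance sigma2).

    U has shape (n, K), V has shape (m, K); every factor entry has a standard
    normal prior. The sweep performs K conditional updates, so its cost is
    linear in K.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, K = U.shape
    m = V.shape[0]
    for k in range(K):
        # Residual with the k-th rank-1 component removed.
        R = Y - U @ V.T + np.outer(U[:, k], V[:, k])
        # Conjugate update of u_k given v_k (entrywise Gaussian posterior).
        prec_u = 1.0 + (V[:, k] @ V[:, k]) / sigma2
        mean_u = (R @ V[:, k]) / (sigma2 * prec_u)
        U[:, k] = mean_u + rng.standard_normal(n) / np.sqrt(prec_u)
        # Symmetric update of v_k given the freshly sampled u_k.
        prec_v = 1.0 + (U[:, k] @ U[:, k]) / sigma2
        mean_v = (R.T @ U[:, k]) / (sigma2 * prec_v)
        V[:, k] = mean_v + rng.standard_normal(m) / np.sqrt(prec_v)
    return U, V

# Tiny usage example: 500 sweeps, matching the iteration count in our experiments.
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 20))
U, V = rng.standard_normal((30, 3)), rng.standard_normal((20, 3))
for _ in range(500):
    U, V = gibbs_sweep(Y, U, V, rng=rng)
```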
> Connection to Deep Learning

Thank you very much for your interesting discussion. We strongly believe that a similar discussion applies to deep learning as well: intrinsic low dimensionality in a high-dimensional parameter space improves the estimation efficiency. In that sense, we think that our analysis also gives some insight into understanding complex learning systems such as deep learning.

> Model complexity validity

Actually, fitting too large a model to a simple one is not a good idea, because it induces overfitting. This is not limited to tensor estimation but applies to any machine learning task, including the ordinary kernel method. One encouraging implication of our analysis and its slight extension is that, by including a GP prior associated with a simple kernel (such as a linear kernel), the Bayes method can automatically choose the appropriate kernel: it selects the appropriate rank and neglects the redundant components of a highly complicated model associated with a complicated kernel. The upper bound we derived is the best achievable error bound that balances the bias-variance trade-off. Thus, by looking at the posterior distribution over the candidate models (in other words, the candidate kernels), the best model can be found, as our analysis shows.

Reviewer 3:

Thank you very much for your positive comments. Actually, the result is quite natural, but the proof requires several additional techniques to incorporate the tensor structure. In particular, the minimax optimal rate is not straightforwardly derived, because we need to lower-bound the complexity of the model (more precisely, the covering number).

> About typos

Thank you very much for pointing them out. We will fix them.

===================

Additional comment: During the review period, we noticed that the factor K in the coefficient of the upper bound in Theorem 1 can be removed. Thus, the error bound can be further tightened.