Paper ID: 336
Title: Doubly Decomposing Nonparametric Tensor Regression

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The paper proposes a nonparametric model for the tensor regression problem and provides a rate-of-convergence analysis.

Clarity - Justification: The paper is well written and is easy to read and understand.

Significance - Justification: Tensor data are becoming increasingly common, and the paper makes a relevant contribution.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The paper proposes a nonparametric model for the tensor regression problem. The regression function is assumed to be a sum-product of one-dimensional functions; this is made possible by assuming a low-rank structure on the input tensor. This structure yields improved rates of convergence (in terms of posterior convergence) compared to the naive approach. The theoretical results are supplemented with simulations and empirical results on a real-world data set. On the whole, the paper presents a complete study and is well written. One drawback of the paper is the novelty aspect: the model assumed and the results obtained are more or less straightforward and along expected lines.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): In this submission, the authors propose a double decomposition for tensor regression. They consider the problem of learning a function on tensors whose outputs are observed with Gaussian noise. The authors perform the decomposition in both the input space and the functional space -- as a sum of local functions f applied to the vectors x_r^k that would form the rank-1 tensors in standard CP or HOSVD decompositions. The proposed AMNR considers a finite rank R^{*} and M^{*} x K local functions. The authors derive a posterior distribution and use the posterior mean as the Bayes estimator of AMNR. To obtain predictions, they compute the mean of the predictive distribution in a similar manner.

Strengths:
- an interesting decomposition in both the functional and the input space
- a theoretical analysis of, e.g., the convergence rate for finite M^{*} and R^{*}

Weaknesses:
- an experimental section that relies mostly on synthetic data

Clarity - Justification: The authors take the reader step by step to their main proposition.

Significance - Justification: Regression on higher-order models seems to be of significance to machine learning.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Major comments:
1. Can the authors comment further on the non-uniqueness of the CP decomposition and the sign flipping? How is it performed, and how exactly does it limit the sample size? (line 296)
2. How would the authors describe the family of functions this decomposition can learn?
3. The synthetic experiments seem to handle only second-order tensors in Figures 2-4 and third-order 10x10x10 tensors in Figure 6. Could the authors provide experiments on, e.g., 5th-order tensors of size 10x10x...x10, and, for the third-order analysis, on 1000x1000x1000 tensors? These sizes are not uncommon in practical settings.

Overall, the proposed framework is interesting, and regression on tensors is of interest to machine learning. The submission would benefit from stronger experiments.
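For concreteness, the sum-product structure that Reviews #1 and #2 describe can be sketched as follows. This is a minimal illustration, not the authors' code: the rank R, order K, mode sizes, and the RBF-style local functions are placeholder assumptions standing in for the paper's nonparametric local functions.

```python
# Minimal sketch of the AMNR-style sum-product model (placeholder, not the
# authors' implementation): given a CP-decomposed input tensor
# X = sum_r x_r^(1) o ... o x_r^(K), the regression function is
# f(X) = sum_{r=1}^{R} prod_{k=1}^{K} f_r^(k)(x_r^(k)).
import numpy as np

rng = np.random.default_rng(0)
R, K = 3, 2          # assumed CP rank and tensor order
dims = (10, 10)      # assumed mode sizes d_1, ..., d_K

# CP factors of one input tensor: x_r^(k) in R^{d_k}
factors = [[rng.standard_normal(dims[k]) for k in range(K)] for r in range(R)]

# Hypothetical local functions f_r^(k): R^{d_k} -> R, here simple RBF bumps
# as a stand-in for the paper's nonparametrically modeled local functions.
centers = [[rng.standard_normal(dims[k]) for k in range(K)] for r in range(R)]

def local_f(x, c):
    """Placeholder local function with scalar output."""
    return np.exp(-0.5 * np.sum((x - c) ** 2))

# Sum-product evaluation: f(X) = sum_r prod_k f_r^(k)(x_r^(k))
y = sum(
    np.prod([local_f(factors[r][k], centers[r][k]) for k in range(K)])
    for r in range(R)
)
y_noisy = y + rng.normal(scale=0.1)  # Gaussian observation noise, as assumed above
print(f"f(X) = {y:.4f}, observed y = {y_noisy:.4f}")
```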
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The paper develops an additive-multiplicative nonparametric regression model that represents the tensor regression function as a sum-product of local functions. Theoretical properties and strong predictive performance are demonstrated.

Clarity - Justification: The motivation is not entirely clear with respect to how the double decomposition is arrived at. Why will it perform better than existing methods in the literature? I could not find an intuitive interpretation.

Significance - Justification: The new model handles nonlinearity in a high-dimensional tensor space by breaking it into simple local functions through a low-rank tensor decomposition. This approximation has nice theoretical properties in terms of convergence rate and consistency.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Will the model easily suffer from over-fitting? How can the non-identification issue be resolved for low-rank or full-rank data? Can any interpretation of the double decomposition be provided? What about computational efficiency? Is the algorithm scalable?

=====
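On the non-identifiability raised in Reviews #2 and #3, a small numeric check (an illustrative sketch, not taken from the paper) shows why CP factors are determined only up to sign flips and compensating rescalings within each rank-1 component, which is presumably why a sign-fixing convention is needed before the local functions can be learned.

```python
# Illustrative check (not from the paper) of CP non-identifiability:
# sign flips and compensating rescalings of the factors leave the
# rank-1 component, and hence the reconstructed tensor, unchanged.
import numpy as np

rng = np.random.default_rng(1)
u, v, w = (rng.standard_normal(d) for d in (4, 5, 6))

# Rank-1 third-order component u o v o w via outer products
T = np.einsum("i,j,k->ijk", u, v, w)

# Flip signs of two factors: (-u) o (-v) o w == u o v o w
T_flip = np.einsum("i,j,k->ijk", -u, -v, w)

# Rescale one factor and inversely rescale another: (2u) o (v/2) o w == u o v o w
T_scale = np.einsum("i,j,k->ijk", 2 * u, v / 2, w)

print(np.allclose(T, T_flip))   # True
print(np.allclose(T, T_scale))  # True
```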