Paper ID: 468
Title: Low-rank tensor completion: a Riemannian manifold preconditioning approach

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper addresses the problem of tensor completion, a generalization of the more popular matrix completion problem. Extensions of the matrix case to the tensor case have been considered recently, and various tensor completion techniques exist. In the context of a Tucker-decomposition-based approach to tensor completion, the authors consider a matrix manifold optimization scheme, where certain non-convex constraints can be cast into the structure of the Riemannian manifold over which the optimization takes place. The novelty of the proposed algorithm lies in the use of a recent technique (Mishra and Sepulchre, 2014) for preconditioned optimization on quotient manifolds. The algorithm seems to outperform various state-of-the-art approaches.

Clarity - Justification:
The paper is very well written and provides a useful and clear description of manifold optimization.

Significance - Justification:
The preconditioning technique for optimization over quotient manifolds seems to be a promising technique that can be applied to a broad class of constrained optimization problems.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The generalization of the matrix completion problem to tensors is an interesting and challenging problem. Compared with its matrix counterpart, the tensor approach seems to produce higher-quality results and allows the analysis of larger datasets (by reshaping big matrices into a multi-array object). However, the proposed application of the preconditioning technique to this problem seems to require an approximation of the objective function that previous methods do not need. Given the high complexity of the proposed method, it is not completely clear how such a relaxation may affect the quality of the solution. That said, the paper contains an exhaustive series of synthetic and real-world experiments showing the good empirical performance of the method, especially in terms of runtime. Evaluating the algorithm's outputs against some ground truth would make the comparison with other methods more complete and help in establishing the role of the aforementioned relaxation from a practical perspective.
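To make the relaxation concrete, here is a minimal sketch of the two cost functions being contrasted, assuming that problem (1) in the paper is the standard sampled least-squares objective under a fixed multilinear-rank constraint (the notation $\Omega$, $P_\Omega$ and the names $f$, $\tilde f$ are introduced here only for illustration; $P_\Omega$ corresponds to the projector $P$ mentioned in the misc comments below):
$$ f(X) = \tfrac{1}{2}\, \| P_\Omega(X) - P_\Omega(\hat X) \|_F^2 \qquad \text{versus} \qquad \tilde f(X) = \tfrac{1}{2}\, \| X - \hat X \|_F^2 , $$
both minimized over the set of tensors of fixed multilinear rank. The first cost is indifferent to the entries of $X$ outside $\Omega$, whereas, if the second cost were minimized directly, every unobserved entry of $X$ would be pulled towards the corresponding entry of the reference $\hat X$ (e.g., towards zero if $\hat X$ denotes the zero-filled data tensor). This is the source of the concern, raised in the misc comments below, about how the two cost functions relate.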
Misc comments:
* In the introduction, the only motivation for generalizing the matrix completion problem to tensors relates to large-scale issues. It would be nice, if possible, to add some extra arguments more related to the intrinsic properties of the tensor approach.
* The paper uses a completion approach based on the Tucker decomposition, and only Tucker-decomposition-based methods are discussed. Could the proposed method be applied to other decompositions, such as, for example, the canonical decomposition?
* To some extent, the non-standard preconditioning metric, which is the key ingredient of the proposed approach, plays a role analogous to the second-order terms in Newton or quasi-Newton methods. What is the link between the proposed approach and such techniques? (A sketch of this analogy is given after this list of comments.)
* A key feature of the proposed technique is the optimization over equivalence classes of matrices instead of single matrices. From a theoretical point of view, this guarantees that the minima of the objective are isolated. From a more practical perspective, what is the advantage of performing the optimization over the quotient manifold instead of the original manifold? Is the complexity of the problem somehow reduced?
* In line 209 it is stated that "applying this approach directly for the particular cost function of (1) is computationally costly", and hence the authors define the simplified objective $ \| X - \hat X \|$, in which the completion projector $P$, which selects the `given' entries of the tensor, is removed. The simplified objective is convex in $X$, can be written as a quadratic function, and hence allows the proposed Riemannian approach. Naively, the new objective seems to force all unknown entries of the tensor to be small instead of letting them vary freely. Is that justified by the coset-space optimisation? If yes, it would help the reader to have a more detailed comment on how the two cost functions are essentially equivalent.
* Regarding again the cost function approximation of the previous point, it is not completely clear whether it would be possible to follow the proposed preconditioning approach without changing the objective function. Could the authors explain what causes the mentioned extra computational cost, and what the effects of the proposed approximation are in practice?
* In line 476 it is stated that "For fixed rank, theoretical convergence of the Riemannian algorithms are to a stationary point". Do such results extend straightforwardly to the case of (varying) preconditioning metrics? The behavior of preconditioned iterative algorithms can be hard to analyze theoretically; how do the local-convergence properties of the gradient algorithm change if nonlinear, non-constant preconditioning metrics are used?
* The `mean square error' used in the comparison is not explicitly defined in the experiments section. Is it based on the original cost function (1) or on the approximate objective (see above)?
* More generally, it is not completely clear why the `training mean square error' is used as a score to evaluate the performance of the various algorithms. Why do the authors use this score instead of some measure based on the ground truth (when possible)?
* What is the role of the `test' mean square error? What does time represent in the corresponding plot?
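To illustrate the analogy asked about above, a minimal sketch (not the paper's derivation; the operator $H_x$ here is generic): if the Riemannian metric is built from a positive-definite operator $H_x$ approximating part of the Hessian of the cost, e.g. block-diagonal terms of the Hessian of the simplified least-squares cost, then, ignoring the projections associated with the quotient structure, the Riemannian gradient becomes $\mathrm{grad}\, f(x) = H_x^{-1} \nabla f(x)$, so a gradient step reads
$$ x_{k+1} = x_k - \alpha_k\, H_{x_k}^{-1} \nabla f(x_k) , $$
which has the same form as a (quasi-)Newton step $x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)$ with a fixed, structured Hessian approximation $B_k = H_{x_k}$, rather than one built from a full second-order model or from secant updates. A comment in the paper on how close the proposed metric comes to such methods would be helpful.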
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces a novel Riemannian manifold optimization method for tensor completion on the space of fixed low-rank tensors, building substantially on earlier work in this field by Kressner et al. (2014) and Mishra & Sepulchre (2014). The manifold in question is viewed as a quotient space, removing the symmetries involved in the Tucker decomposition. Furthermore, the paper suggests a Riemannian metric based on the block-diagonal components of the Hessian of the least-squares tensor completion problem. This metric effectively acts as a preconditioner for the optimization procedure. Detailed experiments show that the proposed method is faster and achieves local optima as good as or better than those of current state-of-the-art methods.

Clarity - Justification:
The paper takes on the formidable task of succinctly explaining a large body of technical background regarding manifold optimization. I believe the authors do so successfully, given the space constraints. In addition, they provide code for fully reproducing their experiments, which is admirable.

Significance - Justification:
This paper is essentially "lifting" the R3MC method introduced by Mishra & Sepulchre (2014) into the tensor domain. This requires overcoming certain technical difficulties, but none of them are profound. However, the experiments in this paper are thorough (including those presented in the supplemental material) and show a very clear advantage of the proposed method over the state of the art across a wide variety of conditions, in terms of both accuracy and speed.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1. The introduction could be improved by providing a better motivation for solving the problem of low-rank tensor completion.
2. Lines 226-229: what happens if the rank conditions are not satisfied, e.g. if r1 is bigger than r2*r3? Does the matrix G_d G_d^\top become PSD instead of PD? (A short worked note on this point appears after Review #3.)
3. Lines 479-481: how does one know that the global minimum has been reached in these cases? Is that in a simulation setting?
4. The computational complexity of the online setting seems prohibitively expensive, depending on n1, n2 and n3. Am I right that actually running the online setting over (say) n1 slices would be more expensive than just using the batch method?
5. Experiments: Kressner et al. (2014) suggest a somewhat more sophisticated initialization scheme than random. Specifically, in Section 4.3 of their paper, they suggest a heuristic method of starting from a rank-(1,1,1) tensor and incrementally increasing the rank. Have the authors tried that method?
6. Experiments: For the "Ribeira" image, Kressner et al. report slightly different metrics and use slightly different experimental conditions. This paper would be stronger if it compared on exactly the same conditions and metrics as Kressner et al.

minor comments:
* line 81: missing "the" before "Tucker decomposition"
* line 190: the word "endow" is repeated

=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a Riemannian manifold preconditioning approach for solving the low-rank tensor completion problem. This is done by incorporating an approximation of the cost function Hessian into the definition of the inner product. Subsequently, all the ingredients needed for optimization on the manifold, namely the Riemannian gradient, retraction, and vector transport, are derived.

Clarity - Justification:
The paper is well-structured and well-written. The reader can go through all the arguments with no difficulty. In addition to the rigorous description, the paper uses very nice diagrams and tables to convey the information.

Significance - Justification:
The paper studies an important problem in machine learning with applications in recommender systems. Exploiting the simple form of the cost function of the problem, the authors define a Riemannian inner product that uses the second-order information of the (further simplified) cost function. It teaches the reader how a flawless development of a matrix manifold optimization algorithm can outperform state-of-the-art algorithms, in the specific task the authors investigate and potentially in future tasks.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I would like to congratulate the authors on their nice paper. It is well-motivated and well-structured. The authors get the best out of matrix manifold optimization methods by exploiting the simple form of the cost function to define a new Riemannian inner product. The extensive experimental results show the efficacy of the proposed method. I do not have any major comments on the paper.

Minor Comment (line 227): I think the terms G_d G_d^\top are in general only positive semi-definite (see the note below).
=====
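Note on the PSD/PD question (Review #2, item 2, and Review #3's minor comment). A minimal worked argument, under the assumption that $G_d$ denotes the mode-$d$ unfolding of the core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ (this reading of the paper's notation is an assumption): the mode-1 unfolding $G_1$ is an $r_1 \times (r_2 r_3)$ matrix, so
$$ \mathrm{rank}(G_1 G_1^\top) = \mathrm{rank}(G_1) \le \min(r_1,\, r_2 r_3) . $$
Hence $G_1 G_1^\top$ is always positive semi-definite, and it is positive definite exactly when $G_1$ has full row rank $r_1$, which is possible only if $r_1 \le r_2 r_3$ (and holds for a generic core when that condition is met). If $r_1 > r_2 r_3$, then $G_1 G_1^\top$ is necessarily singular, i.e. PSD but not PD, and the same reasoning applies to the other modes. A metric built from these terms would then be degenerate unless some safeguard is used (for instance, adding a small multiple of the identity, which is a generic remedy and not claimed to be the paper's).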