Paper ID: 148
Title: Low-Rank Matrix Approximation with Stability

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes using the stability of the RMS error (over all the reconstructed entries) as a measure of algorithmic stability, instead of the uniform stability of individual entries (a rough sketch of the contrast between the two notions appears at the end of this review). While this seems to be a weaker notion of stability, it allows comparison of reconstructions based on different subsets of the observed data. They show that removing entries that are easily predicted (i.e., entries whose reconstruction error is lower than the RMSE) can lead to solutions that generalize better, and this leads to a low-rank matrix approximation algorithm (SMA) that optimizes for generalization.

Clarity - Justification:
The paper would have been easier to read if there were a clearer summary of the direction at the beginning. While the problem was clearly specified up front, I had to parse the theorem statements to figure out what the authors were trying to do and how they did it.

Significance - Justification:
The use of the overall/average error (compared to the maximum over pointwise/individual entries) seems novel and useful, tying more directly into generalization. It allows results based on different subsets of the observed entries to be compared, and permits using only a subset of the observed entries when that could increase generalization (presumably using less of the observed data would always increase the overall error, but it could potentially result in better generalization). They have compared their algorithm against a reasonable set of candidates, and the results look much better than the competition. So the SMA algorithm seems reasonably motivated theoretically and has good results practically.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is interesting reading, and the results seem useful. Using stability as a measure of generalization, and optimizing for it (instead of always using all the observed entries), is an interesting approach. However, the paper would have been easier to read if it had started with an outline/summary of the direction/algorithm being proposed, so that we could see how the theorems fit into that direction. The experimental results generally look good, but Figure 1 was a little strange: while it does show that stability and generalization both seem monotone in the rank, it seems to indicate an abrupt phase transition for stability, whereas the generalization performance appears to change much more gradually.
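To make the contrast concrete, here is a rough sketch of the two notions as I read them (the paper's formal definitions may well differ in the exact quantifiers and constants). Write \hat{R} for the solution learned from the observed set \Omega, \hat{R}^{\setminus(u,v)} for the solution learned with one observed entry (u,v) removed, and RMSE_W(\hat{R}) = \sqrt{(1/|W|) \sum_{(i,j) \in W} (R_{ij} - \hat{R}_{ij})^2} for the RMS error over a set W of reconstructed entries. Uniform stability asks that the loss change little at every individual entry, roughly max_{(i,j)} |(R_{ij} - \hat{R}_{ij})^2 - (R_{ij} - \hat{R}^{\setminus(u,v)}_{ij})^2| <= \epsilon for every removed (u,v), whereas the RMSE-based notion only asks that the aggregate error change little, |RMSE_W(\hat{R}) - RMSE_W(\hat{R}^{\setminus(u,v)})| <= \epsilon. Only the overall error is constrained, which is what makes solutions trained on different subsets of the observed entries directly comparable.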
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes to enhance the generalization properties of Low-Rank Matrix Approximation (LRMA) algorithms by introducing a new objective function that emphasizes hard-to-predict ratings. The claim is supported both by theoretical properties and by (excellent) experimental results. Interestingly, the objective function can be applied to any LRMA algorithm.

Clarity - Justification:
The paper is addressed to readers already familiar with the topic; otherwise it can quickly become obscure. The theorems and their proofs are correct, yet they are very hard to follow as they are dense and sometimes clumsy. Sections 3 (SMA Algorithms) and 4 (experimental results) are far easier to follow and provide good insight. The figures are also useful for illustrating the behavior of SMA. The final section (related work) is well detailed.
The code is easy to install and read; however, very little information is provided to re-run the experiments from scratch.

Significance - Justification:
This paper provides an incremental advance in collaborative filtering by introducing a simple but effective loss based on a definition of stability.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper applies the concept of stability to collaborative filtering (CF) techniques. This idea is not novel, but the paper succeeds in showing the usefulness of stability in the particular case of LRMA. Compared to the related work cited in the paper, the SMA loss is easy to implement and has good empirical results. More importantly, the loss does not rely on a particular algorithm, which makes it a very promising tool for enhancing algorithms already in use.

It is a pity that the definition of stability is not used consistently throughout the paper. First, stability is defined in terms of how a small change in the training data may affect the error; in Figure 1, however, stability refers to how a small change in a model hyperparameter (the rank) may affect the error. This is all the more misleading given that Figure 1 is meant to support the paper's original claim. In the end, overusing the stability concept weakens the claim and misleads the reader.

The theorems and their proofs fully back the SMA loss. Unfortunately, they are hard to follow as they are dense and sometimes clumsy. Intermediate steps are occasionally omitted, while some definitions could easily have been factored out. Fortunately, the remarks following the theorems provide good intuition about what is at stake. It would have been very helpful to select the most interesting proofs and expand their presentation, even if that means relegating technical lemmas to the supplementary material. A diagram explaining the different sets (R, \omega, w_1, w_2, etc.) would have been highly appreciated (and helpful).

The SMA algorithm is very clear and consistent with the preceding theorems. On the other hand, it is not clear how you define/compute the \lambda that weights D(R)_{\omega_k} (Algorithm 1, line 10). This is very important, as the loss depends heavily on these values. Can you clarify this point?

The benchmark is consistent with the literature and lends credibility to the reported SMA scores. It is also very thorough. To the best of our knowledge, the reported SMA scores are outstanding. It would have been fair to point out that extreme scores (r_ij > 5 or r_ij < 1) are rounded (source code: (double)Recommender::predict(int,int)); this is, of course, good for the final RMSE (a sketch of this behaviour is given at the end of this review).

Once the introduction and proofs are untangled, the paper becomes far more interesting. Having both a theoretical analysis and well-detailed experiments is highly appreciated.

The provided code is easily installed through Maven. The code quality is slightly above average, and it can easily be plugged into other Java projects. A ready-to-run example (with MovieLens-10M) and a more detailed README.txt would have been appreciated; it is very frustrating to have to explore/debug the code just to try running the experiments.

In summary, the paper introduces a simple but effective new loss with theoretical motivation and *excellent* empirical results. The paper is very promising, as the loss is agnostic to the underlying LRMA algorithm, so new empirical work can be built on top of it. The theoretical motivation, even if its presentation is sometimes cumbersome and clumsy, is welcome, and the remarks are very helpful.

It is unfortunate that the paper lacks clarity; this greatly lowers its inherent quality. Moreover, it would be interesting to have an idea of how much, and where, the stability proofs need to be adapted to fit the low-rank hypothesis. Finally, it is also very important to clarify how the \lambda parameters are defined/computed, so that the paper is reusable in practice.
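For clarity, here is a minimal sketch of the clipping behaviour referred to above. The class and field names are illustrative, and the actual Recommender::predict implementation in the submitted code may of course differ; the sketch only shows the general pattern of clamping a raw low-rank prediction into the rating scale before RMSE is computed:

    // Illustrative sketch only -- not the authors' implementation. A raw
    // prediction u_i . v_j from the rank-k factors is clamped into the valid
    // rating range [1, 5] before the error is computed, which mechanically
    // reduces the RMSE on extreme ratings.
    public final class ClampedPredictor {
        private final double[][] U;   // |users| x k latent factors
        private final double[][] V;   // |items| x k latent factors
        private final double minRating;
        private final double maxRating;

        public ClampedPredictor(double[][] U, double[][] V, double minRating, double maxRating) {
            this.U = U;
            this.V = V;
            this.minRating = minRating;
            this.maxRating = maxRating;
        }

        /** Raw low-rank prediction: the (i, j) entry of U V^T. */
        public double rawPredict(int i, int j) {
            double s = 0.0;
            for (int f = 0; f < U[i].length; f++) {
                s += U[i][f] * V[j][f];
            }
            return s;
        }

        /** Prediction clipped to the rating scale, as the submitted code appears to do. */
        public double predict(int i, int j) {
            return Math.min(maxRating, Math.max(minRating, rawPredict(i, j)));
        }
    }

Clamping to the rating scale is standard practice, but since it improves RMSE on extreme ratings it is worth stating explicitly when comparing against the baselines.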
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper introduces a notion of stability for low-rank matrix approximation (LRMA) problems. The authors also propose an algorithm to obtain a stable solution for the LRMA task. They show that their proposed method achieves better RMSE than many matrix-factorization approaches proposed in the literature. They also show empirically that the train and test errors of the proposed method are closer together than those of stochastic SVD, that its performance improves with increased rank, and that its performance does not degrade as much with increased sparsity.

Clarity - Justification:
The paper is well-written.

Significance - Justification:
The paper introduces the notion of stability for low-rank matrix approximation problems and proposes an algorithm to reach a stable solution. The algorithm achieves very good empirical results.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper introduces a notion of stability for low-rank matrix approximation (LRMA) problems. The authors also propose an algorithm to obtain a stable solution for the LRMA task. They show that their proposed method achieves better RMSE than many matrix-factorization approaches proposed in the literature. They also show empirically that the train and test errors of the proposed method are closer together than those of stochastic SVD, that its performance improves with increased rank, and that its performance does not degrade as much with increased sparsity.

The main idea of the algorithm is to first obtain some low-rank matrix-factorization solution (e.g., via regularized stochastic SVD) and then improve it by optimizing a different learning objective that focuses on hard-to-predict sets of observed entries of the training data (see the sketch after this review).

Questions to the authors:
* How long does it take in practice to generate the hard-to-predict sets?
* How long does Step 10 of the algorithm take for MovieLens and Netflix?
* I think there is a typo in line 566 (line 10 of Algorithm 1): is it meant to be UV^T?

=====
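To make the two-stage procedure described in Review #3 concrete, here is a rough, illustrative sketch in Java (the language of the submitted code). It is not the authors' SMA algorithm: the "hard" criterion (per-entry error above the current RMSE, as suggested by Review #1's summary), the up-weighting scheme, and all names and hyperparameters are assumptions made purely for illustration.

    import java.util.List;

    // Rough illustration of the two-stage idea -- NOT the authors' SMA algorithm.
    // Stage 1 is assumed to produce initial factors U, V by any LRMA method
    // (e.g., regularized stochastic SVD). Stage 2 re-optimizes a weighted squared
    // loss that up-weights observed entries that are currently hard to predict.
    public final class StabilityRefineSketch {

        /** One observed entry (i, j, r_ij) of the rating matrix. */
        public record Rating(int user, int item, double value) {}

        public static void refine(double[][] U, double[][] V, List<Rating> observed,
                                  int epochs, double learningRate, double reg, double hardWeight) {
            int k = U[0].length;
            for (int epoch = 0; epoch < epochs; epoch++) {
                double rmse = rmse(U, V, observed);
                for (Rating r : observed) {
                    double err = r.value() - dot(U[r.user()], V[r.item()]);
                    // Up-weight entries whose error exceeds the current overall RMSE
                    // (an illustrative stand-in for the paper's hard-to-predict sets).
                    double w = Math.abs(err) > rmse ? hardWeight : 1.0;
                    for (int f = 0; f < k; f++) {
                        double uf = U[r.user()][f], vf = V[r.item()][f];
                        U[r.user()][f] += learningRate * (w * err * vf - reg * uf);
                        V[r.item()][f] += learningRate * (w * err * uf - reg * vf);
                    }
                }
            }
        }

        static double rmse(double[][] U, double[][] V, List<Rating> observed) {
            double sse = 0.0;
            for (Rating r : observed) {
                double err = r.value() - dot(U[r.user()], V[r.item()]);
                sse += err * err;
            }
            return Math.sqrt(sse / observed.size());
        }

        static double dot(double[] a, double[] b) {
            double s = 0.0;
            for (int f = 0; f < a.length; f++) s += a[f] * b[f];
            return s;
        }
    }

The paper's actual objective, its definition of the hard-predictable sets \omega_k, and the \lambda weights questioned in Review #2 are of course the authors'; the sketch only shows where such weights would enter a standard factorization loop.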