We would like to thank all reviewers for the detailed and insightful reviews, which are very helpful for further improving this work. We clarify below several issues raised by the reviewers and will incorporate these clarifications into the final version if accepted.

1. Stability definition in Figure 1 is not consistent with others (Assigned_Reviewer_3 & Assigned_Reviewer_7)
We agree that Figure 1 may give the impression that stability/generalization error is correlated with the number of model parameters (rank). The purpose of Figure 1 is to show the relationship between stability and generalization error in existing MA methods; a curve is needed to show this trend, and rank is one of the variables used to generate it. To avoid potential misunderstanding, we will revise Figure 1 to show the relationship between stability and generalization error directly, by varying the set of observed entries in the training data, in the final version if accepted.

2. The theorems and their demonstrations are correct but hard to follow (Assigned_Reviewer_3)
Some intermediate steps in the proofs were omitted due to the page limit. We agree with the suggestion: in the final version if accepted, we will select the most interesting proofs and expand their presentation, and provide proof sketches for theorems that can be proved similarly to previous ones.

3. The \lambda value in Line 10 of Algorithm 1 (Assigned_Reviewer_3)
We did not formally optimize the \lambda values; all \lambda values are set to 1 in our experiments. This means that the whole set of entries and the hard-predictable subsets of entries are treated as equally important when learning SMA. Theoretically, equal \lambda values may not be optimal, but tuning them is non-trivial because the \lambda values should depend on the entry sets as well as the learning rate and regularization coefficients. We therefore leave this as future work. We will report the \lambda values in the final version if accepted.

4. It would have been fair to point out that extreme scores were rounded (Assigned_Reviewer_3)
Yes, predicted scores are bounded within [1, 5] when computing the RMSEs in the experiments. We will mention this in the final version if accepted.

5. A ready-to-run example (with MovieLens-10M) and a more detailed README.txt would have been appreciated in the provided code (Assigned_Reviewer_3)
We have created a ready-to-run example on the MovieLens 10M dataset together with a README file. Since we are not allowed to provide further materials during the review process, we will make the example and the README file publicly available right after the review process ends.

6. It would be interesting to have an idea of how much and where the proofs about stability need to be adapted to fit the low-rank hypothesis (Assigned_Reviewer_3)
We believe that the idea of SMA is applicable to general matrix approximation problems, and the key parts of the proofs do not need to be adapted with or without the low-rank hypothesis. We chose to apply SMA to low-rank matrix approximation because 1) it is one of the most popular MA methods and 2) limiting the scope of this paper to LRMA makes the paper more concise and easier to follow.

7. How long in practice does it take to generate hard-predictable sets, and how long does Step 10 of Algorithm 1 take? (Assigned_Reviewer_8)
There are two key steps to select hard-predictable subsets of entries: 1) run an existing MA method and 2) choose entries based on the per-entry RMSEs of that method.
The computation time of the second step is negligible, so the overall time mainly depends on the chosen MA method. For instance, with Regularized SVD (RSVD), this step takes approximately 362s on MovieLens 10M and 3,725s on Netflix with rank = 20 on a PC with a 2.7GHz CPU and 16GB memory. Note that this step can be accelerated by 1) adjusting the learning rate of RSVD so that it converges in fewer iterations or 2) choosing a more efficient MA method. Our experiments show that Step 10 of Algorithm 1 takes approximately 576s on MovieLens 10M and 5,918s on Netflix with rank = 20 and #subsets = 3 on the same PC. The running time of SMA is thus comparable to that of RSVD (362s on MovieLens 10M and 3,725s on Netflix with rank = 20), which indicates that SMA does not dramatically increase running time compared with other methods. We will report an efficiency analysis of SMA in the final version if space permits. A minimal sketch of the subset-selection step is included at the end of this response.

8. I think there is a typo in line 10 in Algorithm 1 (Assigned_Reviewer_8)
Yes, it should be UV^T. Thank you for pointing out the typo.
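To make the subset-selection step in point 7 concrete, here is a minimal sketch in Python. The names `R_obs` (a list of observed (user, item, rating) triples), `predict` (the rating predicted by a pre-trained MA model such as RSVD), and the "hardest half, split into n_subsets groups" rule are all illustrative assumptions, not the exact procedure in the paper.

```python
import numpy as np

def select_hard_subsets(R_obs, predict, n_subsets=3):
    # Step 1 (done beforehand): train an existing MA method, which provides `predict`.
    # Step 2: rank observed entries by their squared prediction error.
    errors = np.array([(r - predict(u, i)) ** 2 for u, i, r in R_obs])
    order = np.argsort(-errors)          # hardest entries (largest error) first
    # Illustrative grouping rule: split the hardest half into n_subsets groups.
    hardest = order[: len(order) // 2]
    return [[R_obs[j] for j in chunk] for chunk in np.array_split(hardest, n_subsets)]
```

Consistent with the cost analysis in point 7, the error computation and sorting above are negligible compared with training the baseline MA model.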