Paper ID: 1036
Title: Towards Faster Rates and Oracle Property for Low-Rank Matrix Estimation

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces a new analysis framework for non-convex low-rank matrix estimation, adapted from Xiao & Zhang (2013). The assumptions made are mild and are satisfied by MCP, SCAD, and other non-convex regularizers. Experiments on both simulated and real datasets demonstrate the proven faster convergence rate.

Clarity - Justification:
The presentation is good.

Significance - Justification:
The topic of this paper is important.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The contributions include:
1. A faster convergence rate, proved under mild assumptions.
2. Experiments on both simulated and real datasets.
3. Applications to matrix completion and matrix sensing, with theoretical analysis.
However:
1. Theorem 3.5 is not verified experimentally.

=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
A unified framework for low-rank estimation with non-convex penalties is proposed in this paper. The authors prove that the proposed estimator can attain a faster convergence rate than traditional low-rank estimation with the nuclear norm penalty. In addition, the proposed estimator enjoys the oracle property if an additional condition is imposed on the magnitude of the singular values. Experiments on both synthetic and real datasets show the superiority of the proposed estimator.

Clarity - Justification:
The paper is well written, but with some room for improvement. For example, the authors may want to include all of their notation in the main paper rather than placing some of it in the appendix. In Section 3, the definition of the subspaces is incomplete. It would be better if the authors moved the definition of the subspaces from the appendix to Section 3, or mentioned where it is defined.

Significance - Justification:
This paper discusses the estimation of a class of low-rank matrices with constraints on their singular values. Under certain conditions, the convergence rate of the estimator is faster than that of traditional low-rank matrix estimation using a convex penalty. I believe the progress in this paper will be well received in the community.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
In this paper, the authors propose an estimator for low-rank matrices with a non-convex penalty. The authors prove that the proposed estimator attains a faster convergence rate than the traditional low-rank matrix estimator with the nuclear norm, provided that restricted strong convexity and restricted strong smoothness hold for the empirical loss function and that several assumptions on the penalty function are satisfied. Furthermore, if the singular values are larger than a certain threshold, the proposed estimator enjoys the oracle property. It is worth noting that the assumptions are clearly attainable. Using the restricted strong convexity and restricted strong smoothness assumptions on the empirical loss function, the estimation problem with the non-convex penalty is equivalent to a convex problem. The authors then obtain the global solution by a proximal gradient homotopy algorithm; a minimal sketch of the proximal step appears below. A deterministic bound is proven in this paper, which upper-bounds the distance between the global optimal solution and the ground truth. This upper bound is smaller than that of the traditional estimator with the nuclear norm penalty.
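To make the proximal step concrete, here is a minimal sketch of one proximal gradient iteration with MCP thresholding applied to the singular values. This assumes the standard scalar MCP prox (firm thresholding); the homotopy stages, step-size rules, and stopping criteria of the authors' algorithm are omitted, so this is an illustration, not their implementation:

    import numpy as np

    def mcp_prox(s, lam, gamma, eta):
        """Prox of eta * MCP_{lam, gamma} at s >= 0 (firm thresholding).

        Uses the standard MCP penalty p(t) = lam*|t| - t^2/(2*gamma) for
        |t| <= gamma*lam and gamma*lam^2/2 otherwise; requires gamma > eta.
        """
        return np.where(s <= eta * lam, 0.0,
               np.where(s <= gamma * lam,
                        (s - eta * lam) / (1.0 - eta / gamma),
                        s))

    def prox_grad_step(Theta, grad, eta, lam, gamma=2.0):
        """One proximal gradient step: a gradient step on the smooth loss,
        followed by MCP thresholding of the singular values."""
        Z = Theta - eta * grad
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return (U * mcp_prox(s, lam, gamma, eta)) @ Vt

Since the penalty is a symmetric function of the spectrum, applying the scalar prox to the singular values (keeping the singular vectors fixed) gives the matrix prox.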
The authors also give two specific low-rank matrix estimation examples: matrix completion and matrix sensing. By directly applying the main theory, a faster convergence rate is theoretically obtained in both cases. To demonstrate the effectiveness of the proposed estimator, the authors generate synthetic datasets to show that the proposed estimator reaches a lower error bound in both the matrix completion and matrix sensing settings. The effectiveness of the estimator for matrix completion is also demonstrated on a real dataset. The results clearly show an advantage of the proposed method over state-of-the-art methods. One shortcoming is the lack of verification of the proposed estimator for matrix sensing on real datasets. The paper is well written, but it would be clearer if the authors introduced all notation/definitions in the main text instead of introducing some of the definitions in the appendix.

=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
Non-convex penalty functions p(x) such as SCAD and MCP have been observed to have superior empirical performance in practice compared to the L1/nuclear norm. However, there is not much theoretical understanding of such estimators in the low-rank setting. For M-estimators L(x) + p(x), where L is an \alpha-restricted strongly convex loss function and p(x) is a non-convex penalty such that p(x) + \zeta ||x||^2 is convex (plus some further regularity assumptions), Loh & Wainwright 2013 (http://arxiv.org/abs/1305.2436) have shown that all stationary points have small statistical error as long as \alpha > \zeta. Further, one can use simple projected gradient descent style techniques to reach stationary points at a geometric rate. This paper is an extension of these results to the low-rank matrix recovery case, where the non-convex regularizer is applied to the singular values. The authors present statistical error rates on closeness to the optimum using non-convex regularizers, under assumptions similar to those of the above paper. In particular, given measurements X_i(\Theta^*) of some rank-r \Theta^*, the goal is to recover it from a small number of measurements. The results in this paper guarantee recovery with rates for a proximal gradient based algorithm. The rates are better than those for nuclear norm regularization, for the singular vectors corresponding to the 'bigger' singular values. This improvement is also observed in simulations on both real and synthetic data.

Clarity - Justification:
The model, assumptions, and results are clearly stated.

Significance - Justification:
The paper provides rates for a certain class of non-convex penalized low-rank estimators. This offers a nontrivial improvement over the statistical rates of convex regularizers such as the nuclear norm.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The result in this paper is a bit surprising. In particular, for the matrix completion problem, if the ground truth is rank-r with all singular values equal to 1, the noise variance is \sigma = 1/m, and we set \lambda = 1/m^2, this implies a rate of (1/m) \sqrt{r/m} for n = r m log(m), whereas nuclear norm regularization has a rate of 1/m. Even though the authors argue that this does not violate minimax bounds, since a specific class of matrices is considered, this difference is big. Am I missing something here? If not, this should be easily noticeable in simulations, by fixing the sample size to be r m log(m) and scaling m.
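For reference, the comparison restated as a display (this is just my transcription of the numbers above, not the paper's theorem statement):

    % Setting: Theta* in R^{m x m}, rank r, all nonzero singular values 1,
    % noise level sigma = 1/m, lambda = 1/m^2, sample size n = r m log(m).
    \[
      \underbrace{\tfrac{1}{m}\sqrt{r/m}}_{\text{non-convex penalty}}
      \quad\text{vs.}\quad
      \underbrace{\tfrac{1}{m}}_{\text{nuclear norm}},
      \qquad
      \text{ratio} \;=\; \sqrt{r/m} \;\longrightarrow\; 0
      \quad\text{as } m \to \infty \text{ for fixed } r,
    \]
    % so the gap grows polynomially in m and should be visible in plots
    % that scale m while keeping n = r m log(m).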
Currently, most simulations in the paper only show the behavior as the sample size increases for a fixed m, where it is hard to observe this phenomenon. Figure 1e has some simulations to this effect, but it is not very clear. The paper should include some simulations with varying m to show this difference in rates clearly; a minimal sketch of such an experiment is given at the end of this review. If such simulations are already available, I encourage the authors to share them through some anonymous link like dropbox or goo.gl in the feedback.

Another issue with the method is that the choice of \lambda depends on the noise variance \sigma. How is it chosen for the experiments in Section 4.2? Is it possible to estimate it reliably and still obtain the improvement in rate in general? Similarly, how should L_min and b be chosen in practice, as they both depend on the restricted strong convexity parameter?

Compare the assumptions on the regularizer (Assumption 3.3) with the assumptions in Loh & Wainwright 2013.
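For concreteness, a minimal sketch of the kind of varying-m experiment requested above. The estimator calls fit_nonconvex / fit_nuclear are hypothetical placeholders for the two methods being compared, not functions from the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    def make_instance(m, r, sigma):
        """Rank-r ground truth with unit singular values and
        n = r*m*log(m) noisy entrywise observations."""
        U, _ = np.linalg.qr(rng.standard_normal((m, r)))
        V, _ = np.linalg.qr(rng.standard_normal((m, r)))
        Theta = U @ V.T                    # all nonzero singular values are 1
        n = int(r * m * np.log(m))         # sample size scaled as r*m*log(m)
        idx = rng.choice(m * m, size=n, replace=False)
        rows, cols = np.unravel_index(idx, (m, m))
        y = Theta[rows, cols] + sigma * rng.standard_normal(n)
        return Theta, rows, cols, y

    for m in [100, 200, 400, 800]:
        sigma = 1.0 / m                    # noise scaling from the example above
        Theta, rows, cols, y = make_instance(m, r=2, sigma=sigma)
        # err_nc = frob(fit_nonconvex(rows, cols, y, m) - Theta)  # hypothetical
        # err_nn = frob(fit_nuclear(rows, cols, y, m) - Theta)    # hypothetical
        # A log-log plot of these errors against m should separate a 1/m rate
        # from a (1/m)*sqrt(r/m) rate if the faster rate is real.

=====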