Paper ID: 1052
Title: Recovery guarantee of weighted low-rank approximation via alternating minimization

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
See below.

Clarity - Justification:
See below.

Significance - Justification:
See below.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper presents a method for low-rank matrix recovery by minimizing a weighted Frobenius norm objective via alternating minimization. The main result shows that alternating minimization recovers the underlying low-rank matrix up to a certain noise norm, provided the weights satisfy a spectral gap property and the low-rank matrix is "non-degenerate" under the given weights. Compared to existing work, the key contributions are the generalization of the spectral gap property to arbitrary weights and the ability to handle noise.

Comments:
a. The paper is well-written and easy to follow.
b. The algorithm is primarily alternating minimization with a whitening step that requires solving an SDP. This makes the algorithm mostly impractical and loses the advantages of alternating minimization.
c. The recovery error is measured in the standard Frobenius norm. Why, then, would one assign non-binary weights to revealed entries (perhaps because the error in the presence of noise is ||W \cdot N||?)? The motivation from Arora et al. (2015) does not fit well within this framework, so it would be useful to provide a concrete application for such a setting.

Overall, the paper presents a nice result, but the lack of a solid application, together with the relatively impractical algorithm, indicates that the paper is not ready for publication yet.
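For concreteness, the weighted objective that Review #1 refers to can be sketched as follows. The notation (ground truth M*, noise N, weight matrix W, rank r) is assumed from the paper's setup rather than quoted from it:

```latex
% Weighted low-rank approximation (sketch): given noisy observations
% \widehat{M} = M^{*} + N and a weight matrix W, find rank-r factors
\min_{U \in \mathbb{R}^{n \times r},\; V \in \mathbb{R}^{m \times r}}
  \bigl\| W \circ \bigl( \widehat{M} - U V^{\top} \bigr) \bigr\|_{F}^{2}
```

Here \(\circ\) denotes the entrywise (Hadamard) product; binary weights recover standard matrix completion, while non-binary weights down-weight entries presumed to be noisier.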
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proves guarantees for weighted low-rank approximation via nonconvex alternating minimization, under a model where the input is corrupted by additive noise. The noise is not i.i.d. element-wise, so a weighted loss function is more appropriate for the estimation. Algorithmically, the paper proposes two novelties: the initialization step and the whitening step, both of which depend on the weight matrix. The weight matrix depends on the noise: the higher the presumed noise at an entry, the lower its weight should be.

Clarity - Justification:
Both the algorithms and the theoretical results are well-presented, and the connection with earlier results is properly established. Detail: write [in]coherenCE, not [in]coherenCY.

Significance - Justification:
Earlier results exist on how to initialize (non-weighted) alternating minimization to guarantee convergence. This work introduces a new initialization procedure that extends previous work but is interesting in its own right. The connection to previous work is well established. The work is well motivated and presented in the context of a body of work to which it relates directly (Srebro, Bhojanapalli) or indirectly (Pennington).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The work is a good theoretical and algorithmic contribution to a problem of importance. In weighted estimation problems, the noise model is often related to the signal (heteroskedasticity); this is not discussed here. It would be worth making at least a remark on whether the result explains standard heteroskedastic noise models, such as the one used in GloVe (Pennington et al.).
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper considers the problem of weighted low-rank matrix completion: instead of the standard least-squares objective, the loss function is weighted over each entry. This is a generalization of the standard matrix completion setting, in that it allows treating each observation differently. This is helpful when the measurement confidence differs across entries, e.g., when the noise is non-uniform across the entries. An alternating minimization algorithm is proposed to minimize this objective, with some modifications to handle the weights. In particular, the iterates in each step go through a whitening step, which constrains them to satisfy certain incoherence properties. This is achieved by solving an SDP with constraints that enforce incoherence on the iterates. While general weighted low-rank approximation is a hard problem, certain assumptions on the underlying matrix and the weight matrix allow solving it in polynomial time. In particular, the assumptions are: 1) a standard incoherence assumption on the matrix M; 2) the weight matrix is close to the all-ones matrix in spectral norm; and 3) each row of the weight matrix satisfies a form of restricted isometry over a k-dimensional subspace. Under these assumptions, the main result guarantees recovery of a low-rank matrix with error proportional to the norm of the weighted noise matrix.

Clarity - Justification:
The setting of the problem, the assumptions, and the results are easy to follow. The modifications and contributions with respect to existing work have been clearly presented.

Significance - Justification:
This paper considers an interesting generalization of the matrix completion problem, weighted matrix completion, which has many applications. The main results strengthen existing results and improve the dependence on the noise.
While the specific algorithm in this paper may not be very appealing, hopefully simpler alternatives building on this work will become available in the future.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1) Assumption A2 requires W to be close to the all-ones matrix in spectral norm. Hence this is not a condition on the spectral gap of W, as it is referred to in the paper. Furthermore, this significantly limits the choices of weight matrix, and hence the noise distributions that can be normalized to minimize ||W o N||. These limitations need to be clearly discussed in the introduction.

2) This paper considers an alternating minimization algorithm for its lower computational complexity. However, the proposed whitening step increases the complexity, as it requires solving an O(nr^3)-size SDP in each iteration. Why is a simple thresholding step (similar to the one in Keshavan et al., 2009) not sufficient to ensure incoherence of the iterates?

3) Finally, the main application of recovery under non-uniform noisy measurements needs more discussion. In particular, compared to unweighted matrix completion, how small can the error be under assumptions A2 and A3? Also, given knowledge of N and under uniform sampling, is there an easy way to set W that both minimizes the error and satisfies the assumptions? Some experiments would have helped to demonstrate the noise reduction.

=====
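To make the alternating scheme described in Review #3 concrete, here is a minimal sketch of the row-wise weighted least-squares updates. This is illustrative only: the paper's SDP-based whitening step and spectral initialization are deliberately omitted, and the function name and arguments are hypothetical, not taken from the paper.

```python
import numpy as np

def weighted_altmin(M, W, r, iters=50, seed=0):
    """Sketch of alternating minimization for the weighted objective
    min_{U,V} ||W o (M - U V^T)||_F^2 (entrywise weights W >= 0).

    NOTE: illustrative only -- the whitening (SDP) step and the
    weight-dependent initialization from the paper are omitted.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    V = rng.standard_normal((m, r))  # random init (the paper uses a spectral one)
    U = np.zeros((n, r))
    for _ in range(iters):
        # Update each row of U by weighted least squares against V:
        # minimize sum_j W[i, j] * (M[i, j] - U[i] . V[j])^2.
        for i in range(n):
            w = np.sqrt(W[i])
            U[i] = np.linalg.lstsq(w[:, None] * V, w * M[i], rcond=None)[0]
        # Symmetric update for each row of V.
        for j in range(m):
            w = np.sqrt(W[:, j])
            V[j] = np.linalg.lstsq(w[:, None] * U, w * M[:, j], rcond=None)[0]
    return U, V
```

With all-ones weights this reduces to plain alternating least squares; non-uniform weights down-weight entries presumed to be noisier, which is the setting the reviews discuss.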