Paper ID: 1033
Title: Robust Principal Component Analysis with Side Information

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper studies an extension of the classic PCP (principal component pursuit) problem, R = L (low-rank) + S (sparse), where L is assumed to be represented by the columns and rows of two basis matrices given in advance, i.e., L = XHY^T. The authors show that both the low-rank and the sparse component can be exactly recovered by a variation of PCP, called PCPF: min_{H,S} |H|_* + \lambda |S|_1, s.t. R = XHY^T + S. In the special case X = Y = I, PCPF recovers the results of PCP. Experimental results on randomly generated matrices and handwritten digits show that PCPF yields a distinct improvement over PCP, as long as the side information encoded in X and Y is useful.
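To make the formulation concrete, here is a minimal sketch of how the PCPF program could be solved with an ADMM scheme. It assumes X and Y have orthonormal columns (which makes the H-update a plain singular value thresholding); this is an illustrative solver under that assumption, not necessarily the algorithm used in the paper, and the names pcpf, svt, and shrink are hypothetical.

    import numpy as np

    def svt(M, tau):
        # Singular value thresholding: the prox operator of the nuclear norm.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def shrink(M, tau):
        # Soft thresholding: the prox operator of the entrywise l1 norm.
        return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

    def pcpf(R, X, Y, lam, mu=1.0, max_iter=500, tol=1e-7):
        # ADMM sketch for  min_{H,S} |H|_* + lam*|S|_1  s.t.  R = X H Y^T + S,
        # assuming X^T X = I and Y^T Y = I.
        H = np.zeros((X.shape[1], Y.shape[1]))
        S = np.zeros_like(R)
        Lam = np.zeros_like(R)  # Lagrange multiplier for the equality constraint
        for _ in range(max_iter):
            B = R - S + Lam / mu
            # With orthonormal X, Y:  argmin_H |H|_* + (mu/2)|B - X H Y^T|_F^2
            # has the closed form  svt(X^T B Y, 1/mu).
            H = svt(X.T @ B @ Y, 1.0 / mu)
            L = X @ H @ Y.T
            S = shrink(R - L + Lam / mu, lam / mu)
            resid = R - L - S
            Lam = Lam + mu * resid
            if np.linalg.norm(resid) <= tol * max(np.linalg.norm(R), 1.0):
                break
        return H, S

With X = Y = I the H-update collapses to singular value thresholding of R - S + Lam/mu, i.e., the usual PCP/ALM iteration, consistent with the remark above that PCPF recovers PCP in that special case; the customary PCP choice lam = 1/sqrt(max(R.shape)) is a reasonable default.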
Clarity - Justification:
The writing is generally good, but I have a few questions that I need the authors to clarify:
1. The idea of L = XHY^T is similar to Low-Rank Representation (LRR), which assumes that the target low-rank matrix L is represented as a linear combination of the columns of a given dictionary X, i.e., L = XH. The main difference is that PCPF uses two dictionaries, X and Y, instead of a single X. The connections to LRR should be made clear.
2. While standard, the incoherence condition is actually not so consistent with the structure of real data (e.g., see "Recovery of Coherent Data via Low-Rank Dictionary Pursuit", Liu et al., NIPS 2014). So it is important to justify the physical meaning of the coherence parameters \mu_0 and \mu_1 defined in (5), (6) and (7). Moreover, there seems to be a contradiction in \mu_1: for PCPF to outperform PCP, the value of d should be small, the smaller the better, but as d decreases, the coherence parameter \mu_1 could increase. It would be better if the authors could show under which conditions \mu_0 and \mu_1 are upper bounded by a numerical constant.

Significance - Justification:
It is interesting to provide a theorem for min_{H,S} |H|_* + \lambda |S|_1, s.t. R = XHY^T + S, which has not been studied before. The analysis is rather standard, though.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Figure 1(c) looks strange to me. Why is PCPF better than PCP when d/n = 1?

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a robust PCA method that uses side information. The side information is given as known low-dimensional row and column subspaces (X and Y), and the low-rank matrix is estimated within this restricted space (3). The authors obtain a theoretical guarantee for this method, which is an extension of Candes et al. (2011). Good performance is empirically shown.

Clarity - Justification:
Clearly written.

Significance - Justification:
The proof of the theoretical guarantee seems nontrivial, with some complications, but the main result is what readers would naturally expect (at least, it is not surprising). The authors seem to overlook the fact that the feasibility condition is so strong that it already provides the relevant (and sufficient) subspaces in which the columns and rows of the low-rank matrix L lie. So the rank of XHY^T is already upper bounded by d in the setting of Theorem 1. All the points that the authors note after Theorem 1 are what readers would already expect immediately after seeing (3) and (4). It is also natural that the proposed method empirically shows good performance when reliable low-dimensional subspaces for the low-rank part are specified. Still, formally proving Theorem 1 is a steady contribution, and the proof seems nontrivial. So I think this work is worth publishing somewhere at least, but I am not sure whether it should be at ICML. My relatively low score is because I see this as an incremental work rather than a breakthrough.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I would suggest that the authors sell this paper as a steady paper that extends the proof of Candes et al. to the case where the low-rank matrix is known to lie in a low-dimensional subspace.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a convex program that incorporates side information about row/column entities within the context of robust principal component analysis (PCA) and concludes that the side information is useful. Under feasibility and incoherence conditions, the authors study theoretically to what extent the side information helps the recovery in robust PCA. The proof of their main result is not trivial compared to the proof structure without side information, since it needs to take into consideration mappings and projections of different dimensions.

Clarity - Justification:
In practice, side information or features almost always exist, and discarding such information would be a loss for further analysis and decision making. The paper models robust PCA by incorporating the side information and proves that such auxiliary information is helpful for the inference.

Significance - Justification:
In the theoretical analysis, this paper has two major differences from the analysis of standard robust PCA without side information: the dual certification, and the different dimensions of the low-rank space and the sparse support. The authors overcome the difficulties in the proof caused by the side information.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The result of this paper is important to support modeling robust PCA with side information. However, the side information raises challenges in the theoretical proof, although the proof structure is standard.

=====
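One point from Review #2 can be made fully explicit. Assuming, as the reviews' use of d suggests, that X and Y each have d columns (the paper's exact dimensions are not reproduced here), the claim that feasibility alone bounds the rank follows from the standard rank inequality for matrix products:

    \operatorname{rank}(X H Y^T) \;\le\; \min\{\operatorname{rank}(X),\, \operatorname{rank}(H),\, \operatorname{rank}(Y)\} \;\le\; d

Hence any feasible L = XHY^T in Theorem 1 has rank at most d, which is the substance of the reviewer's remark.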