We thank all reviewers for their kind comments. Our responses to each reviewer follow.

To Reviewer_1:
1. Thank you for pointing out the related work on LRR. We have read several of these papers and they are indeed related to our work. As you pointed out, the main difference is that our formulation allows two sets of features, one for column entities and one for row entities, whereas the LRR framework assumes that only one dictionary (corresponding to row features) exists. We will state the connections between our algorithm and LRR in the final submission.
2. It is true that a smaller d makes the bound better, but a smaller d does not necessarily imply a larger mu_1. The reason is that when d becomes smaller, max_i {\|x_i\|} (with each x_i \in R^d) can also become smaller, and thus mu_1 does not necessarily increase. We verified this experimentally: we randomly created several problems with n = 1000, rank(L) = r = 100, and feature sets X, Y \in R^{n \times d} that jointly span L, where d ranges from r to n. Specifically, let L = USV'; we then create X = Q_U[U, U'] and Y = Q_V[V, V'], where U' \perp U, V' \perp V, and Q_U and Q_V are orthonormal transformations. We observe that mu_1 is upper bounded by 1.5 in all cases. For example, some values of mu_1 for different d are: d = 200: mu_1 = 1.268, d = 220: mu_1 = 1.297, d = 700: mu_1 = 1.085, d = 760: mu_1 = 1.091. This result shows that a smaller d does not necessarily imply a larger mu_1. On the other hand, it would indeed be better to provide scenarios where mu_0 and mu_1 are bounded by a constant. Currently, we can prove the following: if the underlying L = USV' of an RPCA problem is mu-incoherent (as defined in Candes et al. (2011)) and the feature sets are given by X = [U, U'], Y = [V, V'] with U \perp U' and V \perp V', then the problem is incoherent in PCPF with mu_0 = mu and mu_1 = \max{mu, mu'}, where mu' is the incoherence measurement of U' (0 if no U').
Such a scenario is of course quite special, but it suggests that if the L matrix in the original RPCA problem is incoherent, then H, X, and Y in PCPF will also be incoherent, provided that X and Y are aligned with the true row/column spaces plus some incoherent orthogonal space. We would like to provide a more general scenario in which mu_0 and mu_1 are bounded. In addition, the NIPS paper you mentioned is quite interesting, and we will take the time to read it in detail; it may inspire us in deriving such upper bounds for mu_0 and mu_1.

For Fig 1(c), we checked the values in the experiments and found that the relative error of PCPF with d/n=1 is numerically slightly better than that of PCP, although the difference is not significant. One possible reason is our feature construction procedure for X and Y in the experiment. Specifically, we construct the features so that the first r columns of X and Y jointly span L, and the remaining d-r columns are orthogonal to L. When d=n, such features, which are a rotation plus permutation of I, work numerically slightly better than using I as in PCP, although in theory there is no difference between the two. However, the difference is subtle: if we set the recovery criterion to \eps = 0.01 in eq (9), PCP and PCPF have the same recovery result at d/n=1 in Fig 1(c), and both are recoverable for rank(L)/n <= 0.16, i.e., there is no gray region in the top row of Fig 1(c).

To Reviewer_3: We agree with you that the feasibility condition already restricts the rank of the low-rank matrix. Although not stated explicitly, our discussion does note that, to satisfy feasibility, d can range between r and n (l.347~l.352, l.723). We could state this more clearly in our analysis to avoid any confusion. Regarding your concern about significance, we guess the result is not surprising to the reviewer because, intuitively, having more information generally helps, or at least does not hurt, performance.
However, the extent to which side information can help RPCA was unknown before, and any result beyond r ~ O(n/(log n)^2) could have been possible. In that sense, we answer the question analytically and resolve the conjecture, which is a steady contribution to RPCA, as the reviewer mentioned. Moreover, as all reviewers noticed, we provide a non-trivial approach to incorporating side information into the proof of RPCA, including a carefully designed alternative dual certificate and a more sophisticated construction of certificates, with a fixed refinement to resolve the dimensionality mismatch. We believe the result is important for supporting the usefulness of side information in RPCA in both theory and practice.

To Reviewer_4: Thank you for your comments and supportive review.
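
For reference, the mu_1 experiment described in the response to Reviewer_1 (item 2) can be sketched roughly as follows in Python with NumPy. This is only an illustrative sketch under stated assumptions: it takes mu_1 to be the smallest value with max_i \|x_i\|^2 <= mu_1 * d / n (and likewise for Y), a definition that is not restated in this response; the function names, the smaller default problem size, and the use of a QR factorization to draw orthonormal features are hypothetical choices; and the transformations Q_U, Q_V are omitted for brevity. The numbers it prints are therefore not expected to reproduce the values reported above.

```python
import numpy as np


def random_orthonormal(n, d, rng):
    """Return an n x d matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    return q


def mu1(X, Y):
    """Assumed definition: smallest mu_1 with max_i ||x_i||^2 <= mu_1 * d / n, same for Y."""
    n, d = X.shape
    max_row_x = np.max(np.sum(X ** 2, axis=1))
    max_row_y = np.max(np.sum(Y ** 2, axis=1))
    return max(max_row_x, max_row_y) * n / d


def simulate(n=200, r=20, ds=(20, 40, 100, 200), seed=0):
    """For each d, draw features X, Y spanning a rank-r matrix L and record mu_1."""
    rng = np.random.default_rng(seed)
    results = {}
    for d in ds:
        X = random_orthonormal(n, d, rng)  # first r columns act as U, the rest as U'
        Y = random_orthonormal(n, d, rng)  # first r columns act as V, the rest as V'
        U, V = X[:, :r], Y[:, :r]
        L = U @ np.diag(rng.uniform(1.0, 2.0, size=r)) @ V.T
        # Feasibility check: L lies in the span of the features, i.e. L = X H Y'.
        if not np.allclose(X @ (X.T @ L @ Y) @ Y.T, L):
            raise ValueError("features do not span L")
        results[d] = mu1(X, Y)
    return results


if __name__ == "__main__":
    for d, value in simulate().items():
        print(f"d = {d}: mu_1 = {value:.3f}")
```

Under this assumed definition, mu_1 is always at least 1 (the squared row norms of X sum to d, so their maximum is at least d/n) and equals 1 when d = n, since X is then a square orthogonal matrix.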