We appreciate the reviewers' efforts and thank them for the detailed and encouraging comments.

Both Reviewer 1 and Reviewer 2 raised a possible memory issue for spectral clustering in the experiments. In the literature, a common pipeline has two steps: 1) learn X such that Z = ZX; 2) optionally improve performance by, e.g., spectral clustering. The main point of the paper is that our online method is comparable to, or even better than, batch methods. To examine this, we fix Step 2 and feed it the X produced by each method, so as to focus on our main contribution. This lets us conclude that any improvement comes solely from Step 1.

For Step 2, yes, it is actually easy to derive a fully online, unified pipeline thanks to the specific structure of OLRSC (D, U, V, E). The idea is to perform the well-known online k-means on V. Two benefits: 1) DV' can be seen as the clean data (line 221), so V encodes features of the DENOISED samples, which is better than running k-means on Z; 2) O(kd) memory suffices. Thus, in addition to updating D during OLRSC, we maintain/update k centers for the past v's, which can be done online. Given a new sample, OLRSC solves for v based on D and assigns a cluster based on the k centers (a minimal sketch is appended at the end of this response). Since typically k < p, the total memory is still O(pd).

We also thank Reviewer 2 for suggesting online spectral clustering and the work [B], which gives another way to handle Step 2.

Reviewer_1: Thank you for the review. There appear to be two main concerns:

- Is OLRSC fully online? Thank you for discussing this interesting problem. Yes, as explained at the beginning of this rebuttal, it can be made fully online (with a light change). In the current version we fix Step 2 so that we can compare Step 1 across methods. We hope this answers your concern.

- Table 2 might be confusing? In the caption we wrote "Table 2. Clustering accuracy (%) and computational time (seconds)", so the unit is seconds. When the numbers are too large (for SSC), we use minutes/hours explicitly (e.g., 32 min). We hope this clarifies the confusion; in any case, we will make the caption clearer.

In the paper we clarified why ORPCA cannot be used here (line 207 and Remark 4). Also, Issues 3 and 4 are new challenges not handled in prior work. Moreover, we propose an approximate solution u_t for efficiency and a new technique to establish convergence.

Reviewer_2:

1. Iteration cost, batch vs. online: Yes, it is more informative to report the total cost (see the response to Reviewer_5). We will reword this.

2. Superior to LRR: To see intuitively why LRR is inferior, note that D = YU is a basis of the clean data (line 221). When Y = Z and Z is grossly corrupted, LRR imposes D = ZU, which yields a bad basis estimate since the range of D is a subset of the range of Z. In contrast, (2.2) can use a small lambda_3 so that D is learned mainly from the first term of (2.2). In fact, (Liu and Li 2014) established that a good estimate of the true subspace is important for LRR. They propose a two-step approach: learn Y so that Y contains the true subspace, then solve (1.1). (2.2) shares this merit but learns the subspace and performs subspace clustering simultaneously.

3. Relating the OLRSC solution to (1.1): Thank you for pointing out [A] (arXiv:1404.4667); we are more than happy to include it. Following the proof of Prop. 3 in [A], we find that OLRSC asymptotically fulfills the first-order optimality conditions of (1.1). To see this, set X = UV', W1 = UU', W2 = VV', M1 = M3 = 0.5 I, and M2 = M4 = 0.5 lambda_1 Y'(YX + E - Z). Due to the uniform bound (our Prop. 7), the optimality conditions are justified. See p. 27 of [A] for the definitions of W and M.
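For readability, the substitutions can be written in display form (this merely restates the identities above; W and M follow the notation of p. 27 of [A], and ^\top denotes transpose):

\[
X = UV^\top, \qquad W_1 = UU^\top, \qquad W_2 = VV^\top,
\]
\[
M_1 = M_3 = \tfrac{1}{2} I, \qquad M_2 = M_4 = \tfrac{1}{2}\,\lambda_1\, Y^\top (YX + E - Z).
\]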
Note that we replaced D with YU, since lambda_3 is gradually increased to enforce D = YU.

4. Implementation and parameters: We use the authors' released code for all methods (SSC code from www.ccs.neu.edu/home/eelhami/codes.htm, ADMM version). OLRSC is FULLY written in Matlab for a fair efficiency comparison. lambda_1 and lambda_2 follow ORPCA's default setting. With n samples, lambda_3 grows to the fixed value sqrt(n/p).

5. Problem definition, typos, Sec. 3, more methods/datasets, spectral clustering cost: Thank you for carefully reading our paper. We will clarify the possible confusion, and add the PCP+SSC pipeline, face/motion data, and the spectral clustering cost.

Reviewer_5:

1. Non-asymptotic rate: We proved a numerical convergence rate of O(1/t) for {D_t} in Prop. 13 and will note it in the revision. We agree that a regret bound is important; yet, due to non-convexity (it is hard to characterize the global optima), it remains an open problem.

2. Computation cost: Yes, (3.1) is solved iteratively, at cost O(pd^2 log(1/eps)) if we use the method in [I], noting that (3.1) is strongly convex in v and e (see Tab. 1 in [I]). (3.3) can be solved in O(pd^2) (line 444). So the total cost of OLRSC is O(npd^2 log(1/eps)), which is better than LRR as soon as #iter_LRR > (d^2/p) log(1/eps).

Ref [I]: Richtarik and Takac, "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function", 2014.

[A, B, C]: See the end of Reviewer_2's comments.
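Appendix: the minimal sketch of the fully online pipeline promised at the top of this response, in Python for illustration only (our implementation is in Matlab). The function solve_v is a hypothetical stand-in for the (v, e) step of (3.1), simplified here to a plain ridge step; the real step is solved jointly in v and e (e.g., by block-coordinate descent as in [I]). The center update is the standard online k-means rule.

import numpy as np

def solve_v(D, z, lam=1.0):
    # Stand-in for the (3.1) step: min_v 0.5*||z - D v||^2 + 0.5*lam*||v||^2.
    d = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ z)

def online_cluster(samples, D, k, seed=0):
    # Online k-means over the coefficients v (features of denoised samples).
    # Memory: k centers of dimension d, i.e. O(kd) on top of O(pd) for D.
    rng = np.random.default_rng(seed)
    d = D.shape[1]
    centers = 0.01 * rng.standard_normal((k, d))
    counts = np.zeros(k)
    labels = []
    for z in samples:
        v = solve_v(D, z)                          # encode the new sample
        j = int(np.argmin(np.linalg.norm(centers - v, axis=1)))
        counts[j] += 1
        centers[j] += (v - centers[j]) / counts[j]  # online k-means update
        labels.append(j)
        # (In the full pipeline, D would also be updated here via (3.3).)
    return labels

Since only D and the k centers are kept, the O(pd) total memory claimed above is immediate.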