Assigned_Reviewer_1

We would like to thank the reviewer for the constructive comments. Should this paper be accepted for publication, we will address the reviewer's comments in the final version as detailed next.

1) Thm 1: We apologize for any inaccuracy in comparing to theoretical results in prior work. This is certainly unintentional and will be corrected in the final version should this paper be accepted. We will explicitly indicate that the pioneering work of Mackey et al. does not assume a random orthogonal model for the column space, and that the result of our paper is obtained under this assumption. However, it is important to clarify that the derived sufficient conditions match a lower bound on sample complexity order-wise: the required number of sampled rows/columns scales linearly with the rank and the coherence parameter. Thus, although our analysis uses the randomized model, the presented result matches the necessary conditions. Noise can be accounted for without much change in sample complexity by leveraging the results in [Candès et al., 2011] and the results for sparse vector recovery; due to space limitations, we are unable to include results for the noisy setting.

2) Line 132: We thank the reviewer for pointing this out. In the final version, we will provide a more detailed and accurate description of the approaches presented in Mackey et al. for low-rank approximation.

3) Line 187: Due to space limitations, we are unable to include the proof. However, we plan to include a reference to a technical report with the full proofs in the camera-ready version of this paper.

4) Line 298: In this part, we are concerned with structured data matrices (i.e., coherent data) for which uniform random sampling cannot yield a small sketch that describes the data well.

5) Section 3.2: We apologize for this oversight. We were not aware of the work by Kumar et al. and thank the reviewer for bringing it to our attention. In the final version, we will certainly cite this work and related papers, and detail the novelty of Algorithm 3 over existing approaches, including its applicability to non-symmetric matrices, while underscoring that the focus of our paper is on the L + S decomposition, as the reviewer points out. We would also like to clarify that the presented process for column/row sampling, which exploits side information about the structure of the data, is novel. For instance, consider the matrix G \in R^(2000 x 6150) corresponding to Fig. 1 with n = 60. In this example, we can capture the row space with roughly 60 randomly sampled rows, and these 60 rows allow us to easily locate the informative columns. However, if we were to start with column sampling, we would need to sample roughly 4000 columns to capture the column space. Using other column sampling methods (such as volume sampling) would incur much higher computational complexity. (The sketch at the end of this rebuttal illustrates this two-stage procedure.)

6) We have not yet examined the applicability of adaptive sampling to the approach in Mackey et al., but this is certainly worth investigating.

7) Section 4.3: We thank the reviewer for this question. We would like to clarify that in this section we do not solve a matrix decomposition problem. Here, we use (8) as a randomized linear decoding algorithm. In this simulation, we build the column space using the frames depicted in Fig. 6, so we only need to perform the second step of the algorithm (representation learning). Accordingly, the complexity of our background subtraction step is O(r N_2), where N_2 is the number of frames, which is much smaller than the complexity O(r (N_1 + N_2)) of the matrix decomposition algorithm, where N_1 is the length of a vectorized image.
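To make this concrete, the following is a minimal NumPy sketch of the representation-learning (decoding) step in isolation. The subspace basis U is assumed to have been built beforehand from the frames of Fig. 6, a plain least-squares projection stands in for the decoder in (8), and all dimensions and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sizes: N1 = length of a vectorized image,
# N2 = number of frames, r = dimension of the background subspace.
N1, N2, r = 2000, 500, 5
rng = np.random.default_rng(0)

# U: orthonormal basis of the column (background) space, assumed to
# have been learned in advance from the frames shown in Fig. 6.
U, _ = np.linalg.qr(rng.standard_normal((N1, r)))

# Synthetic frames: background lying in span(U) plus a sparse foreground.
background = U @ rng.standard_normal((r, N2))
foreground = rng.standard_normal((N1, N2)) * (rng.random((N1, N2)) < 0.01)
D = background + foreground

# Representation-learning step only: project each frame onto the
# learned subspace (a least-squares stand-in for the decoder in (8)).
coeffs = U.T @ D          # r x N2 representation matrix
L_hat = U @ coeffs        # recovered background (low-rank) component
S_hat = D - L_hat         # recovered foreground component

# Relative recovery error of the background.
print(np.linalg.norm(L_hat - background) / np.linalg.norm(background))
```

Because the subspace is fixed, no decomposition of the full N_1 x N_2 matrix is needed at this stage.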
------------
Assigned_Reviewer_3

We would like to thank the reviewer for the constructive feedback. Below we address the reviewer's comments using an example. Suppose the low-rank component is equal to the matrix G used for Fig. 1 with n = 60. In this case, capturing the column space directly would require decomposing a 2000 x 4000 matrix, since roughly 4000 columns would need to be sampled at random. However, if we sample roughly 5r = 300 rows at random and decompose the matrix of sampled rows, we can learn the row space, and the informative columns can then be located using the learned row space. We therefore choose 5r = 300 informative columns and decompose the matrix of sampled columns to obtain the column space. Accordingly, in lieu of decomposing a 2000 x 4000 matrix, we decompose a matrix of 300 rows and a matrix of 300 columns (see the sketch at the end of this rebuttal). If both the column and row spaces are coherent, we can use Algorithm 3 instead of sampling c_r r rows at random. In the final version of this paper, we plan to provide more examples to further highlight the benefits of the proposed sampling approach.

---------------
Assigned_Reviewer_4

We would like to thank the reviewer for the positive feedback.
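As referenced above, the following hypothetical NumPy sketch mimics the two-stage row-then-column sampling procedure on a synthetic low-rank matrix. A plain SVD stands in for the decomposition step, leverage-style column scores stand in for the selection rule of Algorithm 3, and all sizes and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a rank-r matrix whose informative columns are
# concentrated in a small block, so starting with random column
# sampling would be inefficient.
n_rows, n_cols, r = 500, 1500, 15

# Build L: the first 100 columns carry the full rank; the remaining
# columns all vary along a single direction of the column space.
basis = rng.standard_normal((n_rows, r))
informative = basis @ rng.standard_normal((r, 100))
redundant = np.outer(basis @ rng.standard_normal(r),
                     rng.standard_normal(n_cols - 100))
L = np.hstack([informative, redundant])

# Stage 1: sample roughly 5r rows at random and learn the row space
# from the sampled rows (a plain SVD stands in for the decomposition).
row_idx = rng.choice(n_rows, size=5 * r, replace=False)
_, _, Vt = np.linalg.svd(L[row_idx, :], full_matrices=False)
row_space = Vt[:r, :]                       # r x n_cols

# Stage 2: score columns by their energy in the learned row space,
# keep the top 5r, and recover the column space from those alone.
scores = np.linalg.norm(row_space, axis=0)
col_idx = np.argsort(scores)[-5 * r:]
Uc, _, _ = np.linalg.svd(L[:, col_idx], full_matrices=False)
col_space = Uc[:, :r]

# The learned column space should capture L almost exactly.
residual = L - col_space @ (col_space.T @ L)
print(np.linalg.norm(residual) / np.linalg.norm(L))
```

The point of the sketch is the order of operations: a small random row sample suffices to learn the row space, which then pinpoints the informative columns, so only two thin matrices are ever decomposed.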