We thank the reviewers for their comments and positive assessment of our work.$
REVIEWER 1
1. As formulated in Eq 2, the problem is a sparse SVD [...]
- We will add a remark on the connection to the sparse SVD problem.
2. The notations are counter-intuitive [...] Usually, u and v are preferred for the canonical variables.
- Indeed. The x,y notation was selected to avoid confusion with the SVD used for the low-rank approximation. We can modify the notation for the final manuscript.
3. [...] the proposed method is of great interest. However, the impact of the proposed method could be broaden [...]
- The focus of this paper was on the CCA application, but, as mentioned above, we will add a remark on the connection to sparse SVD.
4. in that spirit, I believe that the following papers could be referenced [...]
- We thank the reviewer for the pointers. Appropriate citations will be added.
5. The method proposed for the sparse CCA problem is very interesting [...]. However, [...] the paper [...] could be used in a more general setup for sparse SVD [...]
- (See points 1 and 3.)
6. The convergence for the bound is quite interesting but I would also have been interested in empirical experiments [...] to analyse the behaviour of the methods w.r.t changes in r and T.
- Such an evaluation is indeed interesting. In short, we expect the results to be very data-dependent. We will include a short discussion. (See also Rev 4, point 5.)
7 [Detailed comments]
7.1 - in the abstract, to what corresponds the "ambient dimension"? [...]
- It refers to the dimension of the original variables. It will be clarified.
7.2 the change of notation [...] between Eq 3 and Eq 4 is slightly confusing.
- Such inconsistencies have been addressed.
7.3 The sentence "Theorem 1 [...] exponentially in r" is confusing. Should not the sentence be the contrary? [...]
- We thank the reviewer for pointing this out. Indeed, we meant to write that the quality *improves* as r increases.
7.4 [...] I was wondering if the role of x and y could be swapped in the algorithm (using the transpose of A)? would the bounds be the same? [...]
- Indeed, the role of the two variables could be swapped without impacting the guarantees. E.g., as the reviewer points out, if we run the algorithm on A^T, Thm 2 would be the same except with constraints on y instead of x. We will add a remark.
REVIEWER 4
1. [...] The convergence result is helpful [...] while the iteration number is generally hard to attain if an accurate solution is needed. I suggest authors presenting more clarifications [...]
- Indeed, this is an acknowledged weakness of our guarantees. We will include a discussion following the main theorem.
2. Some expressions seem a little strange. E.g., in line 166, what is “the latter”? Also [...], “for any allowable time window” might be better to revised.
- The "latter" refers to the "theoretical guarantees". Such expressions could indeed be improved and will be revised.
3. The contribution part might be too long [...].
- We agree and will modify that section accordingly.
4. The presented parallelization strategy might be a little trivial. Any algorithm [...] can utilize this way: [...] different initializations in parallel and pick up the best one [...]
- The reviewer raises an interesting point. Indeed, any algorithm with a random/arbitrary initialization can be run from multiple initializations in parallel. In general, however, obtaining guarantees this way may require a number of initializations that is exponential in the dimension n. In our case, the multiple "initializations/samples" concern a low-dimensional (r-dimensional) variable, which is what enables our theoretical guarantees.
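To make the distinction concrete, the generic strategy the reviewer describes can be sketched as follows. This is a minimal, illustrative sketch only: the function names and the toy objective are placeholders, not our algorithm, and in our method the sampled variable would be r-dimensional rather than n-dimensional.

```python
import random

def best_of_initializations(objective, sample_initialization, num_trials, seed=0):
    """Run a solver from several sampled starting points and keep the best.

    `objective` and `sample_initialization` are hypothetical placeholders:
    any procedure whose output quality depends on its starting point fits.
    Each trial is independent, so they could equally run in parallel.
    """
    rng = random.Random(seed)
    best_value, best_init = float("-inf"), None
    for _ in range(num_trials):
        init = sample_initialization(rng)       # draw a candidate start
        value = objective(init)                 # evaluate its quality
        if value > best_value:
            best_value, best_init = value, init
    return best_init, best_value

# Toy usage: maximize -(x - 3)^2 over random starts in [0, 10].
best, val = best_of_initializations(
    lambda x: -(x - 3) ** 2,
    lambda rng: rng.uniform(0, 10),
    num_trials=200,
)
```

The key point is that the number of trials needed for a quality guarantee scales with the dimension of the sampled variable, which is why sampling in r dimensions rather than n makes the strategy tractable in our setting.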
5. Since the proposed algorithm introduces another parameter (rank), how to properly tune it in experiments? Is it sensitive to the performance?
- This is an interesting point. In short, in theory, the higher the parameter r, the better the output quality, at the cost of increased runtime. In practice, if T is smaller than the theory requires, the tuning of r needs to be investigated empirically. We will add a brief discussion.
6. Besides PMD, there are many other methods proposed for sparse PCA/CCA tasks. [...]
- We compared with PMD because i) it is widely used in the context of CCA and ii) we consider the same objective function. Further evaluation is indeed required and is ongoing.
REVIEWER 5
1. Does the proposed algorithm outperform that of Witten et al. for [the brain imaging] application? Are there settings where it performs worse than that of Witten et al.?
- The brain-imaging experiment aimed to qualitatively assess the performance of our algorithm on a real dataset. The quantitative comparison with Witten et al. was limited to the breast-cancer dataset (partly because it was used by Witten et al. in their own evaluation). Further evaluation is required and is ongoing work.