Reviewer 5: Incremental PLS is based on a low-rank modification of the thin SVD, so we already compare against the method the reviewer suggests.

Reviewer 7: The reviewer expresses concerns regarding significance and suggests that the contributions are minimal. These concerns stem from misunderstandings on the part of the reviewer and are addressed below. One source of confusion is that the reviewer perceives PLS and PCA to be the same/similar problems (they are not) and misinterprets the empirical comparisons with incremental PCA.

Main points:

Empirical Comparison: The reviewer is concerned that the proposed algorithms are outperformed by incremental PLS. That is not true. The objective-vs-iteration plots show that the proposed algorithms and incremental PLS get close to the optimum in about the same number of iterations. The observed runtime for incremental PLS is slightly better even though both have a per-iteration computational cost of O(dk^2); wall-clock runtime is often not a reliable indicator. Rather than hiding this difference, we highlight it with objective-vs-runtime plots to give a fair picture. More importantly, there are no formal guarantees for incremental PLS. In fact, THERE CANNOT BE ANY GUARANTEES FOR INCREMENTAL PLS, as it has been shown to get stuck for certain distributions; see [1]. Moreover, a small additional runtime is a small price to pay for a guarantee of convergence to the global optimum, almost always. We discuss all of this on lines 796-804.

Novelty: MSG of Sec 3.1 is a new algorithm. It is not similar to the preexisting online PLS algorithm of [1]: the proposed MSG algorithm is additive, whereas online PLS is multiplicative; online PLS is based on a self-adjoint dilation, whereas MSG works with covariance matrices and is more efficient. Online PLS of [1] was presented as a heuristic modification of an online PCA algorithm by Warmuth & Kuzmin. The MEG algorithm of Sec 3.2 is similar to online PLS but is derived in a principled manner as an instance of mirror descent. [1] does not provide any guarantees for online PLS; quoting [1]: "The analysis of Warmuth and Kuzmin no longer applies to this modification…"

Lemma 3.1 is very different from Lemma 3.2 of [2]. While both propose a similar shift-and-clip procedure as an efficient projection, the problems to which they are applied are very different:
- Unlike in PCA, the parameter matrix M in PLS is not symmetric; in fact, it is not even square. Since the nuclear norm is not a linear function of a general (non-PSD) matrix, a nuclear-norm equality constraint would make the problem non-convex. Hence we relax it in Problem 3 to an inequality constraint. In [2], the matrix M is symmetric positive semi-definite, so its nuclear norm is identical to its trace, which is linear.
- Deriving the shift-and-clip procedure in [2] is straightforward: one can easily form the Lagrangian and differentiate the trace of M. Here the same technique does not apply; instead, we have to work with subdifferentials of the nuclear and spectral norms, which leads to a more involved derivation.
- More importantly, in [2] the shift can be positive or negative (due to the equality constraint), whereas in our case we can only shrink the spectrum, since the inequality constraint requires the Lagrange multiplier to be non-negative.

For the same reasons, the convex relaxation presented here differs from the one studied in [2].
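To make the distinction concrete, the following is a minimal illustrative sketch of a shift-and-clip projection of the kind Lemma 3.1 establishes, assuming (for illustration only, not as the exact statement in the paper) that the feasible set of Problem 3 is the intersection of the unit spectral-norm ball with the nuclear-norm ball of radius k. The function name, the use of bisection for the shift, and the tolerance are our own choices for this sketch.

    import numpy as np

    def shift_and_clip_projection(M, k, tol=1e-10):
        # Project a (possibly rectangular) matrix M onto
        # {M : ||M||_2 <= 1 and ||M||_* <= k} in Frobenius norm,
        # by operating only on its singular values.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        clipped = np.clip(s, 0.0, 1.0)
        if clipped.sum() <= k:
            # Nuclear-norm constraint inactive: clipping alone suffices.
            return U @ np.diag(clipped) @ Vt
        # Otherwise, find a non-negative shift t (the Lagrange multiplier of
        # the inequality constraint) such that sum_i clip(s_i - t, 0, 1) = k.
        # The shift only shrinks the spectrum, matching the discussion above.
        lo, hi = 0.0, s.max()
        while hi - lo > tol:
            t = 0.5 * (lo + hi)
            if np.clip(s - t, 0.0, 1.0).sum() > k:
                lo = t
            else:
                hi = t
        return U @ np.diag(np.clip(s - hi, 0.0, 1.0)) @ Vt

An exact shift can also be found by searching over the sorted singular values; bisection is used here only to keep the sketch short.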
For MEG, the standard analysis does not apply because we have to deal with indefinite matrices. Instead, we prove a regret bound (Lemma 3.4) for MEG and take expectations to obtain the bound in the stochastic setting (a short sketch of this standard conversion is included after the references).

In summary, the contributions of the paper are:
(a) We give a convex relaxation for PLS. This is both novel and significant; it is by no means straightforward, since the tricks that apply to PCA do not apply to PLS.
(b) We give two SA algorithms for the convex relaxation. One is multiplicative and resembles online PLS of [1]; the other is additive and novel, and bears no similarity to online PLS, contrary to the reviewer's claim.
(c) For both algorithms we provide an efficient update (including a fast projection step) and present iteration-complexity results, the first for any PLS algorithm, batch or online.
(d) We compare the proposed algorithms against all known variants on both real and simulated data and show that they fare well.

Misc: PLS is preferred over CCA in many settings. From [3]: "The success of PLS resulted in a lot of applications in other scientific areas including bioinformatics, food research, medicine, pharmacology, social sciences, physiology–to name but a few …"

[1] Arora et al., 2012, Allerton.
[2] Arora et al., 2013, NIPS.
[3] Rosipal et al., "Overview and recent advances in partial least squares," Springer, 2006.
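For concreteness, the online-to-batch conversion we allude to above follows the standard argument, sketched here in illustrative notation (f_t denotes the per-iterate stochastic objective, f its expectation, M_t the iterates, \bar M_T their average, and M^\star an optimum; these symbols are not tied to the paper's exact notation, and we state it for a convex objective being minimized):

\[
\mathbb{E}\big[f(\bar M_T)\big] - f(M^\star)
\;\le\; \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\big[f(M_t) - f(M^\star)\big]
\;=\; \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\big[f_t(M_t) - f_t(M^\star)\big]
\;\le\; \frac{\mathbb{E}[\mathrm{Regret}_T]}{T},
\qquad \bar M_T = \frac{1}{T}\sum_{t=1}^{T} M_t,
\]

where the first inequality is Jensen's, the middle equality uses that f_t is an unbiased estimate of f and that M_t does not depend on the t-th sample, and the last inequality uses the definition of regret against the fixed comparator M^\star.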