We thank all the reviewers for taking the time to study our paper and for sharing their instructive and insightful comments.

Assigned_Reviewer_1
-------------------
Thank you for your kind comments. The reviewer mentions some minor comments, but these seem to be missing from the review -- we would appreciate it if they could be included in the final review.

Assigned_Reviewer_5
-------------------
Thank you for your kind comments.

We would like to clarify your comment 'Compared to ...bounds.': whereas Arjevani et al. (2015) consider optimization algorithms whose step sizes are fixed throughout the optimization process (i.e., time-invariant step sizes), our results apply to oblivious optimization algorithms whose step sizes are allowed to change over time. In that sense, our framework is more general.

We will revise the paper according to the comments. Below are some specific responses:

- it is not clear in eq 4) that...little confusing above.
Answer: Our intention was to give a solid grasp of the basic ideas before presenting the generalized framework, so that it is perceived more naturally.

- Stochastic p-CLI, with... it not problematic?
Answer: Whereas our results give a full description of the complexity of the deterministic case, when analyzing stochastic algorithms we are, at the moment, only able to bound the bias term from below. We are currently working on extending the framework to stochastic algorithms.

- how is the quadratic ...(Nesterov 2004)?
Answer: Our quadratic functions are simpler; in fact, we use quadratic functions whose Hessians are scalar matrices. We are able to use simpler functions because, whereas Nesterov's function is uniformly hard for a sub-class of first-order algorithms (but with a dependence on the dimension), ours are tailored for each algorithm and do not depend on the dimension (our bound also applies to algorithms which incorporate certain conditioning mechanisms).

- line 575: aren't we ... than xk-x*?
Answer: As our proof sketch applies to the smooth and strongly convex case, both quantities are interchangeable, up to a logarithmic factor which does not affect the linear convergence rate.

- Connection to Chebyshev ... gradient.
Answer: Indeed, once one establishes that CG chooses the optimal candidate in the corresponding Krylov subspace, its convergence rate can be bounded from above using Chebyshev polynomials.
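For completeness, the standard form of this Chebyshev-based bound for CG on a quadratic with Hessian A is sketched below (generic notation, not taken from the paper; $\kappa$ denotes the condition number of A):

$$\|x_k - x^*\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k}\|x_0 - x^*\|_A, \qquad \kappa = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)},$$

which follows by bounding the resulting min-max polynomial problem over the spectrum of A with suitably scaled Chebyshev polynomials.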
- line 585: ... $\vert s(\eta)\eta+1\vert$.
Answer: Will be fixed. Thanks!

- 588: a reference...
Answer: Will be added. Thanks!

- line 614: ...conjugate gradient.
Answer: See the answer above.

- theorem 2: How does this result ... without knowing mu (O'Donoghue - 2012)?
Answer: Note that the restarting scheme in O'Donoghue's 'Adaptive Restart for Accelerated Gradient Schemes' does depend on the strong convexity parameter (see equation (6), $k^*=e\sqrt{8L/\mu}$).

- line 722: However (Flammarion - 2015) ... knowing mu.
Answer: The results in 'From Averaging to Acceleration, There is Only a Step-size' (Corollary 1) address the non-strongly convex case only (nothing is assumed regarding the strong convexity parameter). Note that, in accordance with our results, the $1/n^2$ convergence rate is obtained by using non-stationary algorithms.

- line 784 Schema 4.1...clear.
Answer: We will fix and further clarify this scheme. Thanks!

- line 798: how do the authors ... scheme 4.2 is a p-CLI?
Answer: Essentially, scheme 4.2 executes a given optimization algorithm (not necessarily a p-CLI) over and over again using different initialization points. Thus, running p-CLIs one after the other results in a new p-CLI (since the per-iteration dynamics stated in Definition 2 are preserved). Note that scheme 4.2 not only preserves the p-CLI structure, it also preserves stationarity (which is the key property exploited in the proof of Corollary 1).

- 1) the model is designed... second order information).
Answer: We designed our framework in such a way that it allows one to investigate a wider class of optimization algorithms in the future. More specifically, although our results apply to oblivious p-CLIs, it turns out that quasi-Newton methods and ellipsoid methods are also p-CLIs, though not oblivious ones. This gives hope that in the future we will be able to analyze these important methods using similar ideas.

- 2) Whereas Scheme ...
Answer: See the answer above.

Assigned_Reviewer_6
-------------------
Thank you for your kind comments.

Minor comments:
1. How alpha... problem?
Answer: Apart from the result stated in Lemma 1, i.e., alpha >= 1, it seems that alpha does not exhibit any functional dependence on the parameters of the problem. That is, one can use different values of alpha, e.g., alpha=1 in GD and alpha=2 in AGD, over the whole range of the condition number (>= 1).

2. line 584: ... min-max.
Answer: Will be fixed. Thanks!