Paper ID: 421
Title: On the Iteration Complexity of Oblivious First-Order Optimization Algorithms

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This is a theoretical paper that gives a lower bound on the iteration complexity of a class of first-order optimization methods. The first-order methods considered in this paper generate a solution sequence as a linear combination of the last p solutions and sub-gradients. The lower bound is characterized by the smoothness and the strong convexity of the objective, and the analysis shows how knowledge of these parameters affects the lower bound on the iteration complexity. One particular consequence of the result is that, without knowledge of the strong convexity parameter, we cannot achieve a convergence rate as fast as that of Nesterov's accelerated method.

Clarity - Justification:
The motivation and results are well written, and the paper was relatively easy to follow. In particular, the proof sketch using Chebyshev polynomials gives the reader good insight.

Significance - Justification:
The lower bound analysis is quite important. There have been many papers on complexity lower bounds for first-order methods (the most relevant being Arjevani et al., 2015), and this is one of them. An interesting point of this paper is that it gives a lower bound under limited side information (knowledge of the strong convexity parameter), and it discusses how misspecification of this parameter affects the computational complexity.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall the paper looks interesting. The lower bound analysis is quite important for the development of optimization algorithms. In particular, it is revealed that side information about strong convexity (which is often difficult to obtain in practice) is crucial for obtaining the best convergence rate. Thus the contribution of the paper will be of interest to ICML participants. There still seem to be open problems; for example, a lower bound for stochastic optimization over finite-sum objectives (as in Agarwal and Bottou, ICML 2015) would be interesting to readers.

Minor comments:
1. How is $\alpha$ characterized by the parameters of the problem?
2. line 584: the objective function should appear on the left-hand side of the min-max.

=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors use the new framework introduced in Arjevani et al. (2015) to propose new lower bounds for first-order optimization which are dimension-free, in contrast to the classical ones of Nemirovsky, Yudin, and Nesterov. This allows them to prove two interesting results: it is not possible to accelerate an algorithm 1) without knowing the strong convexity constant in the strongly convex case, or 2) with a time-invariant step size in the non-strongly convex case. Compared to Arjevani et al., they take into account additional constraints on the step size of their general algorithm to avoid mismatches between upper and lower bounds.

Clarity - Justification:
The paper is well written and pleasant to read.

Minor comments:
- It is not clear in Eq. (4) that the step sizes may depend on time; this is resolved in Eq. (10) but is a little confusing above.
- Stochastic p-CLI, with respect to line 394: $f(\mathbb{E}[x_k])$ is a lower bound on $\mathbb{E}[f(x_k)]$, which is the quantity of interest. Is this not problematic?
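  (A note on this point: by Jensen's inequality for convex $f$, $f(\mathbb{E}[x_k]) \le \mathbb{E}[f(x_k)]$, so a lower bound established for $f(\mathbb{E}[x_k])$ would presumably carry over to $\mathbb{E}[f(x_k)]$; it may be worth making this explicit.)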
- How does the quadratic function used to achieve the lower bound compare to the one used in (Nesterov, 2004)?
- line 575: aren't we interested in $f(x_k) - f(x^*)$ rather than $x_k - x^*$?
- Connection to Chebyshev polynomials: it could be interesting to underline the link with the analysis of conjugate gradient.
- line 585: the authors forget the $\vert s(\eta)\eta + 1 \vert$ term.
- line 588: a reference would be helpful.
- line 614: connection with conjugate gradient.
- Theorem 2: how does this result coexist with restart schemes, which make it possible to achieve the optimal rate for quadratic functions without knowing $\mu$ (O'Donoghue, 2012)?
- line 722: however, (Flammarion, 2015) showed that accelerated GD converges for quadratic functions at rate $(1-\mu/L)^n/n^2$ without knowing $\mu$.
- line 784: Scheme 4.1 or 4.2? This scheme is not very clear.
- line 798: how do the authors know that the algorithm obtained through Scheme 4.2 is a p-CLI?

Significance - Justification:
The proven lower bounds are not new, but they resolve issues with previous bounds regarding the dependence on the dimension. The two applications are still interesting.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is interesting and detailed, and the references are well chosen. I have two minor concerns. 1) The model is designed to closely fit the known first-order methods, so it is expected that the resulting lower bounds match the upper bounds already known for these algorithms. It does not leave much freedom to imagine smart modifications of these algorithms that would improve the upper bound toward the lower bound of Nemirovsky or Arjevani (such as incorporating cheap second-order information). 2) Whereas Scheme 4.2 implies that the output of a restarting scheme would remain a p-CLI, Theorem 2 appears to contradict O'Donoghue's results on restarting accelerated methods for quadratic functions.

=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper considers first-order convex optimization methods for smooth and smooth/strongly convex objectives and proves optimal dimension-independent iteration complexity lower bounds.

Clarity - Justification:
As far as I checked, the paper seems technically sound. Regarding the quality of the writing, the paper is reasonably well written, the structure and language are good, and overall the paper is an easy and enjoyable read (except for the minor comments listed in the detailed comments).

Significance - Justification:
I think the paper has nice results from a theoretical point of view, helps us better understand the iteration complexity of most first-order methods, and is worthy of publication.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This paper considers first-order convex optimization methods for smooth and smooth/strongly convex objectives and proves iteration complexity lower bounds. Building on the p-Stationary Canonical Linear Iterative (p-SCLI) model, which was proposed and studied very recently in (Arjevani et al., 2015), the current paper shows an optimal dimension-independent $\sqrt{\kappa} \log (1/\epsilon)$ lower bound for smooth and strongly convex functions, improving on the $\kappa^{1/p} \log (1/\epsilon)$ bound in (Arjevani et al., 2015).
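In other words, writing $\kappa = L/\mu$ for the condition number and ignoring constants, the lower bound is sharpened roughly as
$$\kappa^{1/p} \log(1/\epsilon) \;\;\text{(Arjevani et al., 2015)} \quad\longrightarrow\quad \sqrt{\kappa}\,\log(1/\epsilon) \;\;\text{(this paper)},$$
which matches the $O(\sqrt{\kappa}\log(1/\epsilon))$ upper bound attained by Nesterov's accelerated gradient method and hence justifies the optimality claim.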
To this end, the present paper introduces the p-Canonical Linear Iterative (p-CLI) model of optimization, a relaxation of p-SCLI that exploits strong convexity and/or smoothness parameters, which is then utilized to derive lower bounds on the iteration complexity of a wide variety of optimization algorithms. As far as I checked, the paper seems technically sound. Regarding the quality of the writing, the paper is reasonably well written, the structure and language are good, and overall the paper is an easy and enjoyable read (except for the minor comments listed in the detailed comments). I think the paper has nice results from a theoretical point of view, helps us better understand the iteration complexity of most first-order methods, and is worthy of publication.
=====