Paper ID: 315
Title: Fast Rate Analysis of Some Stochastic Optimization Algorithms
Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper looks at the problem of regret analysis for stochastic online proximal gradient, RDA and online version of ADMM and shows that we can obtain the regret rates of O(ln T/T) under local strong convexity assumption on the expected value of the objective around the optimum, along with Lipschitz condition on the loss and convexity assumptions on the loss and the regularization term. They modify their analysis by considering the difference between the term to optimize (which they call Diff_T) and the regret and showing that part of the difference is a Martingale and the difference has a strongly convex quadratic term using the notion of local strong convexity which helps them obtain the O(ln T/T) rates finally. The analysis of RDA and online ADMM are similar with an extra strongly convex term added in the updates for RDA. They perform some experimental simulations to show that the experimental results seem to agree with the theoretical results.
Clarity - Justification: The math is well explained and easy to follow. The connections with the previous papers is properly explained.
Significance - Justification: This paper mainly looks at the results of online methods and obtains similar results as strongly convex optimization in a weakened condition of local strong convexity around the optimum. They exploit the fact that the local strong convexity is enough since they can use the other terms to show a martingale property so that there is a probabilistic upper bound on the remaining terms of O(lnT/T). The method seems closely related to previous analysis, however there seems to be reasonable novelty. It would be interesting to see how this method performs in the context of other classes of functions particularly the ones with smoothness and whether notions of local smoothness can be similarly exploited to get equivalent results for existing smooth function classes.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): See above.
=====
Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper analyzes three popular optimization algorithms: OPG, RDA, and OPG-ADMM. These were shown in previous works to achieve log(T)/T rate under a strongly convex loss function and 1/sqrt(T) rate under a weakly convex loss function. The present work analyzes a case of interest in between --- where the loss function is locally strongly convex in expectation (but may be weakly convex globally). This is the case for example for LASSO and logistic regression. The main result is that under this weakened condition of local strong convexity, the three analyzed algorithms achieve the same rate bounds as in the strongly convex case. In other words, the 1/sqrt(T) rate is very conservative. Synthetic experiments are used to validate the improved bound.
Clarity - Justification: The paper is well organized, contributions clearly explained and well justified. The paper should however be proof-read because there are some grammatical errors. e.g. lines 171-174, lines 260-270, and others.
Significance - Justification: The paper shows that three popular algorithms actually perform better in practice than one might expect based on existing bounds, because functions that are weakly convex are often locally strongly convex in cases of practical interest. This is a nice result because the proof approach has broad applicability (beyond the three algorithms presented) and because they are well validated by numerical experiments.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): I have no major comments. The paper should be proof-read, as I mentioned above. Another typo that I found; in line 105: should read O(1/T) rather than O(1/N) to be consistent.
=====
Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper "Fast Rate analysis of some stochastic optimization" extends the result of convergence of three stochastic approximation algorithms from strongly convex function to locally strongly convex function (around the optimum) and provides some experiments taking the Lasso framework as example.
Clarity - Justification: The paper is well written and pleasant to read. However I do not understand why the experiments corresponds to the setting studied in the paper. Since the function f is quadratic, its Hessian is constant and thus local and global strong convexity are equivalent. Moreover the result of convergence for strongly convex function of type 1/T/mu are not so meaningful in machine learning since mu is often of order 1/T. This point should have been discussed. It would be interesting to investigate if this dependence could be decreased.
Significance - Justification: I am unconvinced of this paper contributions since it just extends previous results to local strong convexity and it is in this sense to incremental. It forgets to cite that it has already been done for logistic regression in "Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression" (Bach 2014). Moreover since the authors assume the domain is bounded, the logistic function would be strongly convex. It would have been interesting to provide some interesting and used examples of functions f defined on a bounded space which are locally strongly convex and not globally.
Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The paper is well organized and seriously written however I doubt its contributions are good enough to be accepted at ICML.
=====