

Poster

Tuning-Free Stochastic Optimization

Ahmed Khaled · Chi Jin


Abstract:

Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned Stochastic Gradient Descent (SGD) at only a polylogarithmic cost, given loose hints on the relevant problem parameters. We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions on the stochastic gradient noise under which tuning-free optimization is possible. In particular, we show that the recently proposed DoG and DoWG algorithms are tuning-free when the noise distribution is sufficiently well-behaved. For the task of finding a stationary point of a smooth and potentially nonconvex function, we give a variant of SGD that matches the best-known high-probability convergence rate for tuned SGD at only an additional polylogarithmic cost. However, we also give an impossibility result showing that no algorithm can hope to match the optimal expected convergence rate for tuned SGD with high probability.
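To make the "tuning-free" idea concrete, below is a minimal sketch of the DoG (Distance over Gradients) step-size rule mentioned in the abstract: the step size is computed from the distance travelled and the accumulated gradient norms, with no learning rate to tune. This is an illustrative re-implementation on a toy noisy quadratic, not the authors' code; the objective, the noise level, and the small initial-movement constant `r_bar` are assumptions made for the demo.

```python
# Sketch of the DoG step-size rule on a noisy quadratic (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((d, d))
H = A.T @ A / d + np.eye(d)          # positive-definite quadratic: f(x) = 0.5 x^T H x

def stochastic_grad(x, noise_std=0.1):
    """Exact gradient of the quadratic plus Gaussian noise (stochastic oracle)."""
    return H @ x + noise_std * rng.standard_normal(d)

x0 = rng.standard_normal(d)
x = x0.copy()
r_bar = 1e-4 * (1.0 + np.linalg.norm(x0))   # loose hint: small initial movement scale
grad_sq_sum = 0.0

for t in range(1000):
    g = stochastic_grad(x)
    grad_sq_sum += np.dot(g, g)                  # running sum of squared gradient norms
    r_bar = max(r_bar, np.linalg.norm(x - x0))   # max distance travelled from x0 so far
    eta = r_bar / np.sqrt(grad_sq_sum)           # DoG step size: no tuned learning rate
    x = x - eta * g

print("final f(x) =", 0.5 * x @ H @ x)
```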
