Stochastic gradient descent (SGD) and its many variants serve as the workhorses of deep learning. One of the foremost pain points in using these methods in practice is hyperparameter tuning, especially the learning rate (step size). We propose a statistical adaptive procedure called SALSA to automatically schedule the learning rate for a broad family of stochastic gradient methods. SALSA first uses a smoothed line-search procedure to find a good initial learning rate, then automatically switches to a statistical method, which detects stationarity of the learning process under a fixed learning rate, and drops the learning rate by a constant factor whenever stationarity is detected. The combined procedure is highly robust and autonomous, and it matches the performance of the best hand-tuned methods in several popular deep learning tasks.
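The stationarity-triggered schedule described in the abstract can be illustrated with a small sketch. This is not the authors' SALSA procedure; it is a hypothetical simplification in which "stationarity" is detected by a crude test, comparing the mean loss over the older and newer halves of a sliding window, and the learning rate is dropped by a constant factor whenever progress stalls. The class name, window size, and tolerance are all illustrative assumptions.

```python
class StatisticalLRScheduler:
    """Hypothetical sketch of a stationarity-triggered LR schedule.

    Not the authors' SALSA implementation: stationarity is approximated
    here by comparing the mean loss over the first and second halves of
    a sliding window of recent losses. When the newer half is no longer
    clearly below the older half, the learning rate is dropped by a
    constant factor and the statistics are restarted.
    """

    def __init__(self, lr=1.0, drop_factor=0.1, window=20, tol=1e-3):
        self.lr = lr                    # current learning rate
        self.drop_factor = drop_factor  # constant multiplicative drop
        self.window = window            # sliding-window length (assumed)
        self.tol = tol                  # minimum required improvement
        self.losses = []

    def step(self, loss):
        """Record one training loss; return the (possibly updated) LR."""
        self.losses.append(loss)
        if len(self.losses) >= self.window:
            half = self.window // 2
            recent = self.losses[-self.window:]
            old_mean = sum(recent[:half]) / half
            new_mean = sum(recent[half:]) / half
            # Crude stationarity test: losses no longer decreasing.
            if old_mean - new_mean < self.tol:
                self.lr *= self.drop_factor
                self.losses = []  # restart statistics at the new rate
        return self.lr
```

In use, the scheduler holds the rate fixed while the loss keeps falling and only cuts it once the windowed statistics flatten out, mirroring the drop-on-stationarity behavior the abstract describes; the initial smoothed line search for a good starting rate is omitted here.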
Author Information
Lin Xiao (Microsoft Research)
More from the Same Authors
- 2020 Poster: Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
  Hadrien Hendrikx · Lin Xiao · Sebastien Bubeck · Francis Bach · Laurent Massoulié
- 2019 Poster: A Composite Randomized Incremental Gradient Method
  Junyu Zhang · Lin Xiao
- 2019 Oral: A Composite Randomized Incremental Gradient Method
  Junyu Zhang · Lin Xiao
- 2018 Poster: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
  Bo Dai · Albert Shaw · Lihong Li · Lin Xiao · Niao He · Zhen Liu · Jianshu Chen · Le Song
- 2018 Oral: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
  Bo Dai · Albert Shaw · Lihong Li · Lin Xiao · Niao He · Zhen Liu · Jianshu Chen · Le Song
- 2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation
  Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou
- 2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation
  Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou
- 2017 Poster: Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms
  Jialei Wang · Lin Xiao
- 2017 Talk: Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms
  Jialei Wang · Lin Xiao