
Industry Panel - Talk by Lin Xiao - Statistical Adaptive Stochastic Gradient Methods
Lin Xiao

Fri Jul 17 02:45 PM -- 03:00 PM (PDT)

Stochastic gradient descent (SGD) and its many variants serve as the workhorses of deep learning. One of the foremost pain points in using these methods in practice is hyperparameter tuning, especially the learning rate (step size). We propose a statistical adaptive procedure called SALSA to automatically schedule the learning rate for a broad family of stochastic gradient methods. SALSA first uses a smoothed line-search procedure to find a good initial learning rate, then automatically switches to a statistical method, which detects stationarity of the learning process under a fixed learning rate, and drops the learning rate by a constant factor whenever stationarity is detected. The combined procedure is highly robust and autonomous, and it matches the performance of the best hand-tuned methods in several popular deep learning tasks.
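The statistical phase of the schedule can be sketched as follows. This is a minimal illustrative stand-in, not the paper's method: it substitutes Pflug's classical diagnostic (the running sum of inner products of successive stochastic gradients, which becomes negative on average once the iterates reach stationarity) for SALSA's statistical test, and runs plain SGD on a toy quadratic. The function name, step counts, and noise model are all assumptions made for the sketch.

```python
import numpy as np

def sgd_with_stationarity_drops(grad, x0, lr=0.5, drop_factor=0.1,
                                steps=2000, noise=1.0, seed=0):
    """SGD that drops the learning rate by a constant factor whenever a
    stationarity diagnostic fires.  The diagnostic here is Pflug's
    running sum of inner products of successive stochastic gradients
    (negative in expectation at stationarity) -- an illustrative
    stand-in for SALSA's statistical test, not the actual method."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    g_prev = None
    stat = 0.0                         # running stationarity statistic
    for _ in range(steps):
        g = grad(x) + noise * rng.standard_normal(x.shape)
        if g_prev is not None:
            stat += g @ g_prev         # Pflug-style diagnostic term
            if stat < 0.0:             # stationarity detected
                lr *= drop_factor      # drop the learning rate
                stat = 0.0             # restart the test at the new rate
        x -= lr * g
        g_prev = g
    return x, lr

# Toy problem: f(x) = 0.5 * ||x||^2, so grad(x) = x.
x_final, lr_final = sgd_with_stationarity_drops(lambda x: x,
                                                x0=[5.0, -3.0])
```

On this toy quadratic the diagnostic turns negative once the iterates settle into the noise floor of the current learning rate, so `lr_final` ends up smaller than the initial rate after one or more drops.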

Author Information

Lin Xiao (Microsoft Research)
