Poster
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali · Edgar Dobriban · Ryan Tibshirani
Thu Jul 16 09:00 AM -- 09:45 AM & Thu Jul 16 08:00 PM -- 08:45 PM (PDT) @ Virtual
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).
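The correspondence described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it runs mini-batch stochastic gradient descent (the discrete algorithm, not the stochastic differential equation itself) on synthetic data and compares the result with ridge regression at $\lambda = 1/t$. Several choices here are assumptions rather than details stated above: the loss scaling $\frac{1}{2n}\|y - X\beta\|_2^2$, the ridge objective $\frac{1}{n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2$, the identification of time $t$ with step size times number of iterations, initialization at zero, and all function and variable names.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate, assuming the objective (1/n)||y - Xb||^2 + lam * ||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)


def minibatch_sgd(X, y, step, n_iters, batch_size, rng):
    """Mini-batch SGD on the assumed loss (1/(2n))||y - Xb||^2, started at zero."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        # Draw a mini-batch without replacement and form its gradient estimate.
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ beta - y[idx]) / batch_size
        beta = beta - step * grad
    return beta


# Synthetic data (illustrative only; not taken from the paper).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + rng.standard_normal(n)

step, n_iters, batch_size = 0.01, 500, 20
t = step * n_iters  # assumed mapping from iterations to continuous time

beta_sgd = minibatch_sgd(X, y, step, n_iters, batch_size, rng)
beta_ridge = ridge(X, y, lam=1.0 / t)

# Relative distance between the two coefficient vectors.
rel_gap = np.linalg.norm(beta_sgd - beta_ridge) / np.linalg.norm(beta_ridge)
print(f"t = {t:.2f}, lambda = {1.0 / t:.3f}, relative coefficient gap = {rel_gap:.3f}")
```

Varying `n_iters` traces out the optimization path, so the comparison can be repeated at several values of $t$ rather than only at convergence.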
Author Information
Alnur Ali (Stanford University)
Edgar Dobriban (University of Pennsylvania)
Ryan Tibshirani (Carnegie Mellon University)
More from the Same Authors
- 2022 Poster: Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces
  Yinshuang Xu · Jiahui Lei · Edgar Dobriban · Kostas Daniilidis
- 2022 Spotlight: Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces
  Yinshuang Xu · Jiahui Lei · Edgar Dobriban · Kostas Daniilidis
- 2020 Poster: One-shot Distributed Ridge Regression in High Dimensions
  Yue Sheng · Edgar Dobriban
- 2020 Poster: DeltaGrad: Rapid retraining of machine learning models
  Yinjun Wu · Edgar Dobriban · Susan B Davidson
- 2017 Poster: A Semismooth Newton Method for Fast, Generic Convex Programming
  Alnur Ali · Eric Wong · Zico Kolter
- 2017 Talk: A Semismooth Newton Method for Fast, Generic Convex Programming
  Alnur Ali · Eric Wong · Zico Kolter