

Poster in Workshop: HiLD: High-dimensional Learning Dynamics Workshop

Fast Test Error Rates for Gradient-based Algorithms on Separable Data

Puneesh Deora · Bhavya Vasudeva · Vatsal Sharan · Christos Thrampoulidis


Abstract:

Recent works have shown that when a linear predictor is trained over separable data using a gradient method and an exponentially-tailed loss function, the predictor asymptotically converges in direction to the max-margin predictor; consequently, it asymptotically does not overfit. A recent study established finite-time generalization bounds for gradient flow, gradient descent (GD), and stochastic GD, showing that for these methods overfitting does not occur even non-asymptotically. In this work, we continue this line of research and obtain finite-time generalization bounds for other first-order methods, namely normalized GD and Nesterov's accelerated GD. Our results show that for these methods, faster convergence of the training loss aligns with improved generalization in terms of the test error.
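To make the setting concrete, below is a minimal sketch (not the paper's experiments) of normalized GD with a logistic, i.e. exponentially-tailed, loss on a synthetic linearly separable dataset. The data generation, step size, and iteration count are illustrative assumptions; the quantity tracked is the minimum normalized margin of the iterate direction, which is expected to grow as the direction approaches the max-margin predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative separable toy data: labels from a ground-truth direction,
# then each point is shifted toward its label to widen the margin.
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
X += 0.5 * y[:, None] * w_star / np.linalg.norm(w_star)

def logistic_loss_grad(w):
    """Average logistic loss and its gradient at w (numerically stable)."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d/dw log(1 + exp(-y x^T w)) = -sigmoid(-margin) * y * x
    coeff = -np.exp(-np.logaddexp(0.0, margins))
    grad = (coeff * y) @ X / n
    return loss, grad

w = np.zeros(d)
eta = 0.1  # illustrative step size
for t in range(1, 2001):
    loss, grad = logistic_loss_grad(w)
    # Normalized GD step: fixed step length along the unit gradient direction.
    w -= eta * grad / (np.linalg.norm(grad) + 1e-12)
    if t % 500 == 0:
        min_margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(f"iter {t:4d}  loss {loss:.4f}  min normalized margin {min_margin:.4f}")
```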
