Poster in Workshop: HiLD: High-dimensional Learning Dynamics Workshop
Fast Test Error Rates for Gradient-based Algorithms on Separable Data
Puneesh Deora · Bhavya Vasudeva · Vatsal Sharan · Christos Thrampoulidis
Recent works have shown that when training a linear predictor on separable data using a gradient method and an exponentially-tailed loss function, the predictor asymptotically converges in direction to the max-margin predictor. Consequently, the predictor asymptotically does not overfit. A recent study showed that, for certain gradient methods, overfitting does not occur even non-asymptotically, by establishing finite-time generalization bounds for gradient flow, gradient descent (GD), and stochastic GD. In this work, we continue this line of research and obtain finite-time generalization bounds for other first-order methods, namely normalized GD and Nesterov's accelerated GD. Our results show that for these methods, faster training-loss convergence aligns with improved generalization in terms of the test error.
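The sketch below is a minimal, illustrative toy example (not the authors' code) of the setting the abstract describes: a linear predictor trained on linearly separable data with the logistic (exponentially-tailed) loss, using normalized GD and Nesterov's accelerated GD, with the normalized margin reported as a proxy for convergence toward the max-margin direction. All data, step sizes, and function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical illustration (not the authors' implementation): train a linear
# predictor on linearly separable data with the logistic (exponentially-tailed)
# loss, using normalized GD and Nesterov's accelerated GD, and report the
# normalized margin of the resulting predictor.

rng = np.random.default_rng(0)

# Separable toy data: labels are the signs of projections onto a ground-truth direction.
n, d = 200, 20
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

def loss_and_grad(w):
    # Logistic loss mean over the sample, computed in a numerically stable way.
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d/dm log(1 + exp(-m)) = -sigmoid(-m); sigmoid(-m) = 0.5 * (1 - tanh(m / 2)).
    sigma = 0.5 * (1.0 - np.tanh(margins / 2.0))
    grad = -(X.T @ (y * sigma)) / n
    return loss, grad

def normalized_gd(T=2000, eta=1.0):
    # Each step moves a fixed distance eta along the negative gradient direction.
    w = np.zeros(d)
    for _ in range(T):
        _, g = loss_and_grad(w)
        w -= eta * g / (np.linalg.norm(g) + 1e-12)
    return w

def nesterov_gd(T=2000, eta=1.0):
    # Nesterov's accelerated GD with the standard momentum schedule (t - 1) / (t + 2).
    w = np.zeros(d)
    w_prev = np.zeros(d)
    for t in range(1, T + 1):
        v = w + (t - 1) / (t + 2) * (w - w_prev)
        _, g = loss_and_grad(v)
        w_prev, w = w, v - eta * g
    return w

for name, algo in [("normalized GD", normalized_gd), ("Nesterov GD", nesterov_gd)]:
    w = algo()
    min_margin = np.min(y * (X @ w)) / np.linalg.norm(w)  # normalized (geometric) margin
    print(f"{name}: train loss {loss_and_grad(w)[0]:.2e}, normalized margin {min_margin:.3f}")
```

Under this setup, both methods drive the training loss toward zero and the normalized margin toward a positive value, which is the qualitative behavior (fast loss convergence alongside margin growth) that the abstract's finite-time bounds quantify.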