Generalization and Stability of Interpolating Neural Networks with Minimal Width
Hossein Taheri · Christos Thrampoulidis
We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where the model weights can achieve arbitrarily small training error $\epsilon$ while their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2/T)$ and generalization error $O(g(1/T)^2/n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses, as a special case, data that are separable by the model's neural tangent kernel. For this case, under logistic-loss minimization, we prove that the training loss decays at a rate of $\tilde O(1/T)$ given a polylogarithmic number of neurons $m=\Omega(\log^4(T))$. Moreover, with $m=\Omega(\log^{4}(n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results contrast with existing generalization guarantees obtained via the algorithmic-stability framework, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Ultimately, despite the non-convexity of the objective, this yields convergence and generalization-gap bounds that resemble those of the convex setting of linear logistic regression. Although our primary focus is on two-layer neural networks, the analysis we present applies broadly and can be extended to other non-convex problems, including deep networks.
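To make the rate statements easier to scan, the bounds above can be collected in one display. This is only a restatement of the abstract's claims under assumed notation: $\widehat{L}$ and $L$ denote training and test loss, $w_T$ the iterate after $T$ gradient-descent steps on $n$ samples, and the step-size and activation conditions are those of the paper:

$$\widehat{L}(w_T)=O\!\left(\frac{g(1/T)^2}{T}\right),\qquad L(w_T)-\widehat{L}(w_T)=O\!\left(\frac{g(1/T)^2}{n}\right),\qquad \text{whenever } m=\Omega\big(g(1/T)^4\big).$$

In the NTK-separable special case with logistic loss, these bounds specialize (consistent with $g(\epsilon)$ growing only logarithmically in $1/\epsilon$, though the abstract does not state this explicitly) to $\widehat{L}(w_T)=\tilde O(1/T)$ for $m=\Omega(\log^4(T))$, and, taking $T\approx n$ and $m=\Omega(\log^4(n))$, to $L(w_T)=\tilde O(1/n)$.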
Author Information
Hossein Taheri (UC Santa Barbara)
Christos Thrampoulidis (University of British Columbia)
More from the Same Authors
- 2021: Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
  Ke Wang · Vidya Muthukumar · Christos Thrampoulidis
- 2021: Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization
  Ke Wang · Christos Thrampoulidis
- 2021: Label-Imbalanced and Group-Sensitive Classification under Overparameterization
  Ganesh Ramachandra Kini · Orestis Paraskevas · Samet Oymak · Christos Thrampoulidis
- 2023: Supervised-Contrastive Loss Learns Orthogonal Frames and Batching Matters
  Ganesh Ramachandra Kini · Vala Vakilian · Tina Behnia · Jaidev Gill · Christos Thrampoulidis
- 2023: Fast Test Error Rates for Gradient-based Algorithms on Separable Data
  Puneesh Deora · Bhavya Vasudeva · Vatsal Sharan · Christos Thrampoulidis
- 2023: On the Training and Generalization Dynamics of Multi-head Attention
  Puneesh Deora · Rouzbeh Ghaderi · Hossein Taheri · Christos Thrampoulidis
- 2023 Poster: On the Role of Attention in Prompt-tuning
  Samet Oymak · Ankit Singh Rawat · Mahdi Soltanolkotabi · Christos Thrampoulidis
- 2022 Poster: FedNest: Federated Bilevel, Minimax, and Compositional Optimization
  Davoud Ataee Tarzanagh · Mingchen Li · Christos Thrampoulidis · Samet Oymak
- 2022 Oral: FedNest: Federated Bilevel, Minimax, and Compositional Optimization
  Davoud Ataee Tarzanagh · Mingchen Li · Christos Thrampoulidis · Samet Oymak
- 2021 Poster: Safe Reinforcement Learning with Linear Function Approximation
  Sanae Amani · Christos Thrampoulidis · Lin Yang
- 2021 Spotlight: Safe Reinforcement Learning with Linear Function Approximation
  Sanae Amani · Christos Thrampoulidis · Lin Yang
- 2020 Poster: Quantized Decentralized Stochastic Learning over Directed Graphs
  Hossein Taheri · Aryan Mokhtari · Hamed Hassani · Ramtin Pedarsani