Many modern learning tasks involve fitting nonlinear models that are trained in an overparameterized regime where the number of model parameters exceeds the size of the training dataset. Due to this overparameterization, the training loss may have infinitely many global minima, and it is critical to understand the properties of the solutions found by first-order optimization schemes such as (stochastic) gradient descent starting from different initializations. In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first-order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optimum even when the loss is nonconvex, (2) among all global optima of the loss, the iterates converge to one with a near-minimal distance to the initial point, and (3) the iterates take a near-direct route from the initial point to this global optimum. As part of our proof technique, we introduce a new potential function that captures the tradeoff between the loss function and the distance to the initial point as the iterations progress. The utility of our general theory is demonstrated for a variety of problem domains ranging from low-rank matrix recovery to shallow neural network training.
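The simplest setting in which these claims can be observed concretely is overparameterized linear least squares, where gradient descent is known to converge to the global minimizer closest to its initialization while the loss decays geometrically. The sketch below is only an illustration of that behavior under assumed, arbitrary choices of problem size, step size, and iteration count; it is not the paper's experiment nor the general nonconvex setting the paper analyzes.

```python
import numpy as np

# Illustrative sketch: gradient descent on an overparameterized least-squares
# problem (more parameters p than samples n). All sizes and hyperparameters
# below are arbitrary choices for demonstration purposes.

rng = np.random.default_rng(0)
n, p = 50, 500                                  # overparameterized: p > n
X = rng.standard_normal((n, p)) / np.sqrt(p)    # data matrix
y = rng.standard_normal(n)                      # targets

theta0 = rng.standard_normal(p)                 # initialization
theta = theta0.copy()
eta = 1.0 / np.linalg.norm(X, 2) ** 2           # step size 1/L for f(theta) = 0.5*||X theta - y||^2

for t in range(2000):
    grad = X.T @ (X @ theta - y)                # gradient of the least-squares loss
    theta -= eta * grad

# The global minimizer nearest to theta0 is the projection of theta0
# onto the solution set {theta : X theta = y}.
theta_star = theta0 + np.linalg.pinv(X) @ (y - X @ theta0)

loss = 0.5 * np.linalg.norm(X @ theta - y) ** 2
print(f"final loss: {loss:.2e}")                                          # ~0: a global minimum is reached
print(f"gap to nearest global min: {np.linalg.norm(theta - theta_star):.2e}")  # ~0: GD finds the closest one
print(f"distance traveled ||theta - theta0||: {np.linalg.norm(theta - theta0):.3f}")
```

In this toy run the final iterate essentially coincides with the global minimizer nearest the initialization, matching claims (1) and (2) above; the paper's contribution is establishing analogous guarantees for nonconvex, nonlinear models.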
Author Information
Samet Oymak (University of California, Riverside)
Mahdi Soltanolkotabi (University of Southern California)
Mahdi Soltanolkotabi is an assistant professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. His research focuses on developing the mathematical foundations of data analysis at the confluence of optimization, machine learning, signal processing, high dimensional statistics, computational imaging and artificial intelligence. Mahdi is the recipient of the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship, an NSF CAREER award, an Air Force Office of Scientific Research Young Investigator Program award (AFOSR YIP), and a Google faculty research award.
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? »
  Fri. Jun 14th 01:30 -- 04:00 AM, Room: Pacific Ballroom #95
More from the Same Authors
- 2021 Poster: PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models »
  Chaoyang He · Shen Li · Mahdi Soltanolkotabi · Salman Avestimehr
- 2021 Spotlight: PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models »
  Chaoyang He · Shen Li · Mahdi Soltanolkotabi · Salman Avestimehr
- 2021 Poster: Generalization Guarantees for Neural Architecture Search with Train-Validation Split »
  Samet Oymak · Mingchen Li · Mahdi Soltanolkotabi
- 2021 Spotlight: Generalization Guarantees for Neural Architecture Search with Train-Validation Split »
  Samet Oymak · Mingchen Li · Mahdi Soltanolkotabi
- 2021 Poster: Data augmentation for deep learning based accelerated MRI reconstruction with limited data »
  Zalan Fabian · Reinhard Heckel · Mahdi Soltanolkotabi
- 2021 Spotlight: Data augmentation for deep learning based accelerated MRI reconstruction with limited data »
  Zalan Fabian · Reinhard Heckel · Mahdi Soltanolkotabi
- 2020 Poster: Compressive sensing with un-trained neural networks: Gradient descent finds a smooth approximation »
  Reinhard Heckel · Mahdi Soltanolkotabi
- 2018 Poster: Learning Compact Neural Networks with Regularization »
  Samet Oymak
- 2018 Oral: Learning Compact Neural Networks with Regularization »
  Samet Oymak