Many modern learning tasks involve fitting nonlinear models which are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Due to this overparameterization, the training loss may have infinitely many global minima and it is critical to understand the properties of the solutions found by first-order optimization schemes such as (stochastic) gradient descent starting from different initializations. In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optima even when the loss is nonconvex, (2) among all global optima of the loss the iterates converge to one with a near minimal distance to the initial point, (3) the iterates take a near direct route from the initial point to this global optimum. As part of our proof technique, we introduce a new potential function which captures the tradeoff between the loss function and the distance to the initial point as the iterations progress. The utility of our general theory is demonstrated for a variety of problem domains spanning low-rank matrix recovery to shallow neural network training.
Samet Oymak (University of California, Riverside)
Mahdi Soltanolkotabi (University of Southern California)
Mahdi Soltanolkotabi is an assistant professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. His research focuses on developing the mathematical foundations of data analysis at the confluence of optimization, machine learning, signal processing, high dimensional statistics, computational imaging and artificial intelligence. Mahdi is the recipient of the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship, an NSF Career award, an Airforce Office of Research Young Investigator award (AFOSR-YIP), and a Google faculty research award.
Related Events (a corresponding poster, oral, or spotlight)
2019 Oral: Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? »
Thu Jun 13th 09:35 -- 09:40 AM Room Room 104
More from the Same Authors
2018 Poster: Learning Compact Neural Networks with Regularization »
2018 Oral: Learning Compact Neural Networks with Regularization »