Recovery Guarantees for One-hidden-layer Neural Networks
Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon

Mon Aug 7th 01:30 -- 01:48 PM @ C4.8

In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to local strong convexity of the 1NN squared-loss objective in the neighborhood of the ground-truth parameters. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees of gradient descent under a resampling rule. For homogeneous activations, we show that tensor methods are able to initialize the parameters so that they fall into the local strong convexity region. As a result, for smooth homogeneous activations, tensor initialization followed by gradient descent is guaranteed with high probability to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k,\lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k,\lambda)$, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
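The local phase described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the model is taken to be $y = \sum_{i=1}^{k} \phi(w_i^\top x)$ with unit output weights, $\phi$ is the squared ReLU, inputs are standard Gaussian, labels are noiseless, and the ground-truth rows are orthonormal. Tensor initialization is not implemented; a small random perturbation of the ground truth stands in for it. The resampling rule is read here as drawing a fresh batch at every gradient step. The function names and default values are hypothetical.

```python
import numpy as np

def squared_relu(z):
    # Smooth, homogeneous activation: phi(z) = max(z, 0)^2.
    return np.maximum(z, 0.0) ** 2

def squared_relu_grad(z):
    # Derivative of the squared ReLU: phi'(z) = 2 * max(z, 0).
    return 2.0 * np.maximum(z, 0.0)

def loss_and_grad(W, X, y):
    # W: (k, d) hidden-layer weights, X: (n, d) inputs, y: (n,) targets.
    Z = X @ W.T                                    # pre-activations, (n, k)
    residual = squared_relu(Z).sum(axis=1) - y     # prediction error, (n,)
    loss = 0.5 * np.mean(residual ** 2)            # empirical squared loss
    # Gradient w.r.t. row i of W: mean of residual * phi'(z_i) * x.
    grad = (residual[:, None] * squared_relu_grad(Z)).T @ X / X.shape[0]
    return loss, grad

def run_local_gd(d=10, k=3, batch=4000, steps=300, lr=0.05,
                 init_dist=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed ground truth: k orthonormal rows (a well-conditioned case).
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    W_star = Q.T
    # Stand-in for tensor initialization (NOT implemented here):
    # start at Frobenius distance init_dist from the ground truth.
    noise = rng.standard_normal((k, d))
    W = W_star + init_dist * noise / np.linalg.norm(noise)
    init_err = np.linalg.norm(W - W_star)
    for _ in range(steps):
        # Resampling: a fresh batch of Gaussian inputs for every step.
        X = rng.standard_normal((batch, d))
        y = squared_relu(X @ W_star.T).sum(axis=1)  # noiseless labels
        _, grad = loss_and_grad(W, X, y)
        W = W - lr * grad
    return init_err, np.linalg.norm(W - W_star)
```

Calling `init_err, final_err = run_local_gd()` returns the parameter error before and after the descent steps. Under the assumptions above, the error should shrink roughly geometrically with the number of steps, which is the behaviour the local linear convergence claim describes. The step size and batch size were chosen by hand for this toy setting and carry no guarantee.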

Author Information

Kai Zhong (University of Texas at Austin)
Zhao Song (UT-Austin)
Prateek Jain (Microsoft Research)
Peter Bartlett (UC Berkeley)
Inderjit Dhillon (UT Austin & Amazon)

Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra and optimization. He received his B.Tech. degree from IIT Bombay and his Ph.D. from UC Berkeley. Inderjit has received several awards, including the ICES Distinguished Research Award, the SIAM Outstanding Paper Prize, the Moncrief Grand Challenge Award, the SIAM Linear Algebra Prize, the University Research Excellence Award, and the NSF CAREER Award. He has published over 160 journal and conference papers, and has served on the Editorial Board of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning and the SIAM Journal on Matrix Analysis and Applications. Inderjit is an ACM Fellow, an IEEE Fellow, a SIAM Fellow and an AAAS Fellow.
