

Talk

Recovery Guarantees for One-hidden-layer Neural Networks

Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon

C4.8

Abstract: In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill properties of activation functions that lead to \emph{local strong convexity} in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective, and we show that most popular nonlinear activation functions satisfy these properties, including rectified linear units (\ReLU s), leaky \ReLU s, squared \ReLU s, and sigmoids. For activation functions that are also smooth, we show \emph{local linear convergence} guarantees for gradient descent under a resampling rule. For homogeneous activations, we show that tensor methods are able to initialize the parameters so that they fall into the local strong convexity region. As a result, for smooth homogeneous activations, tensor initialization followed by gradient descent is guaranteed, with high probability, to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \poly(k,\lambda)$ and computational complexity $n \cdot d \cdot \poly(k,\lambda)$, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision, and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity \emph{linear} in the input dimension and \emph{logarithmic} in the precision.
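
To make the recovery recipe described above concrete, the sketch below (a rough illustration, not the authors' implementation) runs gradient descent on the 1NN squared loss with a smooth homogeneous activation, the squared ReLU, drawing a fresh batch of samples at every iteration in the spirit of the resampling rule. The tensor-method initialization is replaced here by a small perturbation of the ground truth, and all names, dimensions, and step sizes (one_hidden_layer, gradient_step, d, k, lr) are illustrative assumptions rather than anything specified in the paper.

import numpy as np

def squared_relu(z):
    # Smooth, homogeneous activation phi(z) = max(z, 0)^2.
    return np.maximum(z, 0.0) ** 2

def squared_relu_grad(z):
    # phi'(z) = 2 * max(z, 0).
    return 2.0 * np.maximum(z, 0.0)

def one_hidden_layer(X, W):
    # 1NN output f(x; W) = sum_{j=1}^{k} phi(w_j^T x); X is (n, d), W is (d, k).
    return squared_relu(X @ W).sum(axis=1)

def sample_batch(W_star, n, d, rng):
    # Fresh Gaussian inputs with noiseless labels generated by the ground truth W*.
    X = rng.standard_normal((n, d))
    return X, one_hidden_layer(X, W_star)

def gradient_step(W, X, y, lr):
    # One gradient step on the squared loss (1/(2n)) * sum_i (f(x_i; W) - y_i)^2.
    residual = one_hidden_layer(X, W) - y                           # shape (n,)
    grad = X.T @ (residual[:, None] * squared_relu_grad(X @ W)) / len(y)
    return W - lr * grad

rng = np.random.default_rng(0)
d, k, n = 20, 3, 2000
W_star = rng.standard_normal((d, k))
W_star /= np.linalg.norm(W_star, axis=0)         # unit-norm ground-truth columns
W = W_star + 0.05 * rng.standard_normal((d, k))  # stand-in for tensor initialization
for _ in range(300):
    X, y = sample_batch(W_star, n, d, rng)       # resample a fresh batch each iteration
    W = gradient_step(W, X, y, lr=0.05)
print("relative error:", np.linalg.norm(W - W_star) / np.linalg.norm(W_star))

The per-iteration resampling mirrors the resampling rule under which the local linear convergence guarantee is stated; in practice one would also need the tensor-based initialization to land inside the local strong convexity region, which this toy perturbation only imitates.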
