Skip to yearly menu bar Skip to main content


Talk

Recovery Guarantees for One-hidden-layer Neural Networks

Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon

C4.8

Abstract: In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to \emph{local strong convexity} in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective and most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (\ReLU s), leaky \ReLU s, squared \ReLU s and sigmoids. For activation functions that are also smooth, we show \emph{local linear convergence} guarantees of gradient descent under a resampling rule. For homogeneous activations, we show tensor methods are able to initialize the parameters to fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity dlog(1/ϵ)\poly(k,λ) and computational complexity nd\poly(k,λ) for smooth homogeneous activations with high probability, where d is the dimension of the input, k (kd) is the number of hidden nodes, λ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, ϵ is the targeted precision and n is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity \emph{linear} in the input dimension and \emph{logarithmic} in the precision.

Live content is unavailable. Log in and register to view live content