ICML Talk Recovery Guarantees for One-hidden-layer Neural Networks

Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon

C4.8

Abstract: In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees of gradient descent under a resampling rule. For homogeneous activations, we show tensor methods are able to initialize the parameters to fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k,\lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k,\lambda)$ for smooth homogeneous activations with high probability, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
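The local convergence claim can be illustrated with a small numerical sketch. This is not the authors' implementation. It makes several assumptions that are not stated in the abstract: the model is $y = \sum_j \phi(w_j^\top x)$ with output weights fixed to 1, inputs are standard Gaussian, the activation is the squared ReLU (smooth and homogeneous), the ground-truth matrix has orthonormal rows, and the "resampling rule" is read as drawing a fresh batch at every step. The tensor initialization is not implemented; a small perturbation of the ground truth stands in for it. All names (`phi`, `grad`, `run`) and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np

def phi(z):
    # Squared ReLU: smooth and homogeneous (assumed activation for this sketch).
    return np.maximum(z, 0.0) ** 2

def dphi(z):
    # Derivative of the squared ReLU.
    return 2.0 * np.maximum(z, 0.0)

def predict(W, X):
    # W: (k, d) hidden-layer weights, X: (n, d) inputs -> (n,) outputs.
    # Output-layer weights are assumed to be all ones.
    return phi(X @ W.T).sum(axis=1)

def grad(W, X, y):
    # Gradient of the squared loss (1 / 2n) * sum_i (f(x_i) - y_i)^2 w.r.t. W.
    Z = X @ W.T                       # (n, k) pre-activations
    r = phi(Z).sum(axis=1) - y        # (n,) residuals
    return (dphi(Z) * r[:, None]).T @ X / X.shape[0]

def run(d=10, k=3, n=2000, steps=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Ground truth with orthonormal rows (a well-conditioned case, assumed).
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    W_star = Q.T
    # Stand-in for tensor initialization: start close to the ground truth.
    W = W_star + 0.01 * rng.standard_normal((k, d))
    errs = [np.linalg.norm(W - W_star)]
    for _ in range(steps):
        # Fresh samples at every iteration (our reading of "resampling").
        X = rng.standard_normal((n, d))
        y = predict(W_star, X)
        W = W - lr * grad(W, X, y)
        errs.append(np.linalg.norm(W - W_star))
    return errs

if __name__ == "__main__":
    errs = run()
    for t in (0, 50, 100, 150, 200):
        print(t, errs[t])
```

Plotting `errs` on a logarithmic scale should give a roughly straight line if the iterates stay inside the region where the objective behaves like a strongly convex function, which is what local linear convergence predicts. The step size and batch size here were chosen by hand and carry no guarantee outside this toy setting.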
