Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we provide theoretical and empirical evidence that, in certain cases, over-parameterized convolutional networks generalize better than small networks because of an interplay between weight clustering and feature exploration at initialization. We demonstrate this theoretically for a 3-layer convolutional neural network with max-pooling, in a novel setting which extends the XOR problem. We show that this interplay implies that with over-parameterization, gradient descent converges to global minima with better generalization performance than the global minima of small networks. Empirically, we demonstrate these phenomena for a 3-layer convolutional neural network on the MNIST task.
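As a concrete illustration of the setup the abstract describes, below is a minimal PyTorch sketch (not the authors' implementation) of a 3-layer convolution -> max-pooling -> linear-readout network trained by gradient descent on an XOR-style patch-detection task. The data generator, channel count, and hyperparameters are illustrative assumptions; increasing `num_channels` corresponds to the over-parameterized regime whose generalization behavior the paper analyzes.

```python
# Minimal sketch, assuming a conv -> max-pool -> linear architecture
# and an XOR-like patch-detection task; all names and hyperparameters
# here are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn

class ThreeLayerConvNet(nn.Module):
    def __init__(self, num_channels: int, patch_size: int = 2):
        super().__init__()
        # Layer 1: 1-D convolution over non-overlapping patches.
        self.conv = nn.Conv1d(1, num_channels, kernel_size=patch_size,
                              stride=patch_size)
        # Layer 2: global max-pooling over patch locations.
        self.pool = nn.AdaptiveMaxPool1d(1)
        # Layer 3: linear readout to a single logit.
        self.fc = nn.Linear(num_channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv(x))   # (batch, channels, patches)
        h = self.pool(h).squeeze(-1)   # (batch, channels)
        return self.fc(h).squeeze(-1)  # (batch,)

def make_batch(n: int, num_patches: int = 4):
    # Illustrative XOR-style data over {-1, +1} inputs: the label is 1
    # iff some length-2 patch has entries of opposite sign (XOR pattern).
    x = torch.randint(0, 2, (n, 1, 2 * num_patches)).float() * 2 - 1
    patches = x.view(n, num_patches, 2)
    y = (patches[..., 0] * patches[..., 1] < 0).any(dim=1).float()
    return x, y

# Plain gradient descent training; a larger num_channels gives the
# over-parameterized network the abstract compares against small ones.
model = ThreeLayerConvNet(num_channels=32)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()
for step in range(500):
    x, y = make_batch(64)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Comparing the test accuracy of this sketch for small versus large `num_channels` is the kind of experiment the abstract alludes to when contrasting small and over-parameterized networks.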
Author Information
Alon Brutzkus (Tel Aviv University)
Amir Globerson (Tel Aviv University, Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem
  Fri Jun 14, 01:30--04:00 AM, Pacific Ballroom
More from the Same Authors
- 2019 Poster: Low Latency Privacy Preserving Inference (Alon Brutzkus · Ran Gilad-Bachrach · Oren Elisha)
- 2019 Oral: Low Latency Privacy Preserving Inference (Alon Brutzkus · Ran Gilad-Bachrach · Oren Elisha)
- 2018 Poster: Learning to Optimize Combinatorial Functions (Nir Rosenfeld · Eric Balkanski · Amir Globerson · Yaron Singer)
- 2018 Poster: Predict and Constrain: Modeling Cardinality in Deep Structured Prediction (Nataly Brukhim · Amir Globerson)
- 2018 Oral: Learning to Optimize Combinatorial Functions (Nir Rosenfeld · Eric Balkanski · Amir Globerson · Yaron Singer)
- 2018 Oral: Predict and Constrain: Modeling Cardinality in Deep Structured Prediction (Nataly Brukhim · Amir Globerson)
- 2017 Poster: Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs (Alon Brutzkus · Amir Globerson)
- 2017 Poster: Learning Infinite Layer Networks without the Kernel Trick (Roi Livni · Daniel Carmon · Amir Globerson)
- 2017 Talk: Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs (Alon Brutzkus · Amir Globerson)
- 2017 Talk: Learning Infinite Layer Networks without the Kernel Trick (Roi Livni · Daniel Carmon · Amir Globerson)