Empirical evidence suggests that neural networks with ReLU activations generalize better with overparameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we provide theoretical and empirical evidence that, in certain cases, overparameterized convolutional networks generalize better than small networks because of an interplay between weight clustering and feature exploration at initialization. We demonstrate this theoretically for a 3-layer convolutional neural network with max-pooling, in a novel setting that extends the XOR problem. We show that this interplay implies that with overparameterization, gradient descent converges to global minima with better generalization performance compared to global minima of small networks. Empirically, we demonstrate these phenomena for a 3-layer convolutional neural network on the MNIST task.
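To make the architecture in the abstract concrete, below is a minimal sketch of a 3-layer network of the kind the analysis considers: a convolution over non-overlapping patches, a global max-pool, and a fixed ±1 readout whose filter count k controls the degree of overparameterization. The class name, patch size, and readout convention are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class ThreeLayerConvNet(nn.Module):
    """Hedged sketch of a 3-layer CNN with max-pooling in the spirit of
    the paper's setting. All names and sizes here are illustrative
    assumptions, not the authors' exact architecture."""

    def __init__(self, k: int, patch_size: int = 2):
        super().__init__()
        self.k = k
        # 2k filters over non-overlapping patches: k "positive" channels
        # and k "negative" channels (no bias, matching a pure pattern match).
        self.conv = nn.Conv1d(1, 2 * k, kernel_size=patch_size,
                              stride=patch_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length), with length a multiple of the patch size.
        h = torch.relu(self.conv(x.unsqueeze(1)))  # (batch, 2k, n_patches)
        m = h.max(dim=2).values                    # global max-pool per channel
        # Fixed readout: +1 on the first k channels, -1 on the rest.
        return m[:, : self.k].sum(dim=1) - m[:, self.k :].sum(dim=1)

# Example usage (hypothetical sizes):
# net = ThreeLayerConvNet(k=8)
# y = net(torch.randn(4, 10))   # 4 inputs of length 10 -> 4 scalar outputs
```

In this sketch, increasing k adds more randomly initialized filters, which is one way to read the "feature exploration at initialization" in the abstract: with more filters, it is more likely that some start close to the patch patterns the task requires detecting.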
Author Information
Alon Brutzkus (Tel Aviv University)
Amir Globerson (Tel Aviv University, Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem
  Thu. Jun 13th, 06:00–06:20 PM, Grand Ballroom
More from the Same Authors
- 2022 Poster: Efficient Learning of CNNs using Patch Based Features
  Alon Brutzkus · Amir Globerson · Eran Malach · Alon Regev Netser · Shai Shalev-Shwartz
- 2022 Spotlight: Efficient Learning of CNNs using Patch Based Features
  Alon Brutzkus · Amir Globerson · Eran Malach · Alon Regev Netser · Shai Shalev-Shwartz
- 2021 Poster: On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
  Shahar Azulay · Edward Moroshko · Mor Shpigel Nacson · Blake Woodworth · Nati Srebro · Amir Globerson · Daniel Soudry
- 2021 Oral: On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
  Shahar Azulay · Edward Moroshko · Mor Shpigel Nacson · Blake Woodworth · Nati Srebro · Amir Globerson · Daniel Soudry
- 2021 Poster: Compositional Video Synthesis with Action Graphs
  Amir Bar · Roi Herzig · Xiaolong Wang · Anna Rohrbach · Gal Chechik · Trevor Darrell · Amir Globerson
- 2021 Spotlight: Compositional Video Synthesis with Action Graphs
  Amir Bar · Roi Herzig · Xiaolong Wang · Anna Rohrbach · Gal Chechik · Trevor Darrell · Amir Globerson
- 2021 Poster: Towards Understanding Learning in Neural Networks with Linear Teachers
  Roei Sarussi · Alon Brutzkus · Amir Globerson
- 2021 Spotlight: Towards Understanding Learning in Neural Networks with Linear Teachers
  Roei Sarussi · Alon Brutzkus · Amir Globerson
- 2019 Poster: Low Latency Privacy Preserving Inference
  Alon Brutzkus · Ran Gilad-Bachrach · Oren Elisha
- 2019 Oral: Low Latency Privacy Preserving Inference
  Alon Brutzkus · Ran Gilad-Bachrach · Oren Elisha
- 2018 Poster: Learning to Optimize Combinatorial Functions
  Nir Rosenfeld · Eric Balkanski · Amir Globerson · Yaron Singer
- 2018 Poster: Predict and Constrain: Modeling Cardinality in Deep Structured Prediction
  Nataly Brukhim · Amir Globerson
- 2018 Oral: Learning to Optimize Combinatorial Functions
  Nir Rosenfeld · Eric Balkanski · Amir Globerson · Yaron Singer
- 2018 Oral: Predict and Constrain: Modeling Cardinality in Deep Structured Prediction
  Nataly Brukhim · Amir Globerson
- 2017 Poster: Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
  Alon Brutzkus · Amir Globerson
- 2017 Poster: Learning Infinite Layer Networks without the Kernel Trick
  Roi Livni · Daniel Carmon · Amir Globerson
- 2017 Talk: Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
  Alon Brutzkus · Amir Globerson
- 2017 Talk: Learning Infinite Layer Networks without the Kernel Trick
  Roi Livni · Daniel Carmon · Amir Globerson