Mean-field theory is widely used in theoretical studies of neural networks. In this paper, we analyze the role of depth in the concentration of mean-field predictions for Gram matrices of hidden representations in deep multilayer perceptrons (MLPs) with batch normalization (BN) at initialization. It has been postulated that mean-field predictions suffer from layer-wise errors that amplify with depth. We demonstrate that BN avoids this error amplification. When the chain of hidden representations is rapidly mixing, we establish a concentration bound for a mean-field model of Gram matrices. To our knowledge, this is the first concentration bound that does not become vacuous with depth for standard MLPs of finite width.
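To make the object of study concrete, the sketch below simulates a finite-width BN-MLP at initialization and tracks the Gram matrix of its hidden representations across depth. This is a minimal illustration, not the paper's construction or its concentration bound: the activation-free layers, the Gaussian 1/width initialization, and the batch/width/depth values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(H, eps=1e-5):
    # Standardize each coordinate across the batch: zero mean, unit variance.
    return (H - H.mean(axis=0)) / (H.std(axis=0) + eps)

def gram(H):
    # n x n matrix of inner products between the n samples, scaled by width.
    return H @ H.T / H.shape[1]

n, width, depth = 4, 512, 100        # batch size, layer width, depth (illustrative)
H = rng.standard_normal((n, width))  # random inputs, one sample per row

for layer in range(1, depth + 1):
    # i.i.d. Gaussian weights with variance 1/width, resampled per layer.
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    H = batch_norm(H @ W.T)          # linear layer followed by BN
    if layer % 25 == 0:
        G = gram(H)
        off = G[~np.eye(n, dtype=bool)]
        print(f"layer {layer:3d}: mean diag {G.diagonal().mean():.3f}, "
              f"mean |off-diag| {np.abs(off).mean():.3f}")
```

Running this, the diagonal of the Gram matrix stays near 1 and the off-diagonal entries remain bounded away from ±1 as depth grows, which is the qualitative behavior the abstract attributes to BN; a vanilla MLP chain, by contrast, is known in the mean-field literature to drive these correlations toward a degenerate fixed point with depth.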
Author Information
Amir Joudaki (Swiss Federal Institute of Technology)
Hadi Daneshmand (MIT)
Francis Bach (INRIA - Ecole Normale Supérieure)
More from the Same Authors
- 2023: Differentiable Clustering and Partial Fenchel-Young Losses
  Lawrence Stewart · Francis Bach · Felipe Llinares-Lopez · Quentin Berthet
- 2023 Poster: Efficient displacement convex optimization with particle gradient descent
  Hadi Daneshmand · Jason Lee · Chi Jin
- 2023 Poster: Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy
  Blake Woodworth · Konstantin Mishchenko · Francis Bach
- 2022 Poster: Convergence of Uncertainty Sampling for Active Learning
  Anant Raj · Francis Bach
- 2022 Spotlight: Convergence of Uncertainty Sampling for Active Learning
  Anant Raj · Francis Bach
- 2022 Poster: Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto · Hans Kersting · Frank Proske · Francis Bach · Aurelien Lucchi
- 2022 Spotlight: Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto · Hans Kersting · Frank Proske · Francis Bach · Aurelien Lucchi
- 2021 Poster: Disambiguation of Weak Supervision leading to Exponential Convergence rates
  Vivien Cabannes · Francis Bach · Alessandro Rudi
- 2021 Spotlight: Disambiguation of Weak Supervision leading to Exponential Convergence rates
  Vivien Cabannes · Francis Bach · Alessandro Rudi
- 2020: Q&A with Francis Bach
  Francis Bach
- 2020: Talk by Francis Bach - Second Order Strikes Back - Globally convergent Newton methods for ill-conditioned generalized self-concordant Losses
  Francis Bach
- 2020 Poster: Stochastic Optimization for Regularized Wasserstein Estimators
  Marin Ballu · Quentin Berthet · Francis Bach
- 2020 Poster: Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
  Hadrien Hendrikx · Lin Xiao · Sebastien Bubeck · Francis Bach · Laurent Massoulié
- 2020 Poster: Consistent Structured Prediction with Max-Min Margin Markov Networks
  Alex Nowak · Francis Bach · Alessandro Rudi
- 2020 Poster: Structured Prediction with Partial Labelling through the Infimum Loss
  Vivien Cabannes · Alessandro Rudi · Francis Bach
- 2019 Invited Talk: Online Dictionary Learning for Sparse Coding
  Julien Mairal · Francis Bach · Jean Ponce · Guillermo Sapiro
- 2018 Poster: Escaping Saddles with Stochastic Gradients
  Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann
- 2018 Oral: Escaping Saddles with Stochastic Gradients
  Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann
- 2017 Poster: Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
  Kevin Scaman · Francis Bach · Sebastien Bubeck · Yin Tat Lee · Laurent Massoulié
- 2017 Talk: Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
  Kevin Scaman · Francis Bach · Sebastien Bubeck · Yin Tat Lee · Laurent Massoulié