The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate choice can lead to the loss of input information during forward propagation and to exponentially vanishing or exploding gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks can be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters, known as the 'Edge of Chaos', leads to good performance. While Schoenholz et al. (2017) address trainability, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate training and improve performance.
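A minimal numerical sketch of the signal-propagation effect the abstract describes, assuming a plain fully connected ReLU network (the `signal_propagation` helper and its parameter names are illustrative, not from the paper): for ReLU, the Edge of Chaos reduces to the single point (sigma_b, sigma_w) = (0, sqrt(2)), i.e. He initialization, under which the pre-activation variance stays of order one at any depth, whereas other choices make it vanish or explode exponentially.

```python
import numpy as np

def signal_propagation(depth, width, sigma_w, sigma_b, n_inputs=64, seed=0):
    """Propagate random inputs through a deep ReLU MLP initialized with
    W ~ N(0, sigma_w^2 / fan_in) and b ~ N(0, sigma_b^2); return the
    empirical pre-activation variance at each layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_inputs, width))
    variances = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        h = x @ W + b              # pre-activations
        variances.append(h.var())
        x = np.maximum(h, 0.0)     # ReLU activation
    return variances

# On the Edge of Chaos for ReLU, (sigma_b, sigma_w) = (0, sqrt(2)):
# the variance remains of order one even after 50 layers.
eoc = signal_propagation(depth=50, width=512, sigma_w=np.sqrt(2), sigma_b=0.0)

# In the ordered phase (sigma_w < sqrt(2)) the signal decays exponentially
# with depth, illustrating the loss of input information.
ordered = signal_propagation(depth=50, width=512, sigma_w=1.0, sigma_b=0.0)
```

Running the sketch, `eoc` hovers around 1 across all layers while `ordered` shrinks by roughly a factor of two per layer, which is the forward-propagation pathology that motivates initializing on the Edge of Chaos.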
Author Information
Soufiane Hayou (University of Oxford)
Arnaud Doucet (Oxford University)
Judith Rousseau (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: On the Impact of the Activation function on Deep Neural Networks Training
  Thu. Jun 13th 01:30 -- 04:00 AM, Room Pacific Ballroom #95
More from the Same Authors
- 2022: Riemannian Diffusion Schrödinger Bridge
  James Thornton · Valentin De Bortoli · Michael Hutchinson · Emile Mathieu · Yee Whye Teh · Arnaud Doucet
- 2023: Diffusion Generative Inverse Design
  Marin Vlastelica · Tatiana Lopez-Guevara · Kelsey Allen · Peter Battaglia · Arnaud Doucet · Kimberly Stachenfeld
- 2023: Categorical SDEs with Simplex Diffusion
  Pierre Richemond · Sander Dieleman · Arnaud Doucet
- 2023 Poster: Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
  Yilun Du · Conor Durkan · Robin Strudel · Josh Tenenbaum · Sander Dieleman · Rob Fergus · Jascha Sohl-Dickstein · Arnaud Doucet · Will Grathwohl
- 2023 Poster: SE(3) diffusion model with application to protein backbone generation
  Jason Yim · Brian Trippe · Valentin De Bortoli · Emile Mathieu · Arnaud Doucet · Regina Barzilay · Tommi Jaakkola
- 2022 Poster: Feature Learning and Signal Propagation in Deep Neural Networks
  Yizhang Lou · Chris Mingard · Soufiane Hayou
- 2022 Spotlight: Feature Learning and Signal Propagation in Deep Neural Networks
  Yizhang Lou · Chris Mingard · Soufiane Hayou
- 2021 Poster: Monte Carlo Variational Auto-Encoders
  Achille Thin · Nikita Kotelevskii · Arnaud Doucet · Alain Durmus · Eric Moulines · Maxim Panov
- 2021 Spotlight: Monte Carlo Variational Auto-Encoders
  Achille Thin · Nikita Kotelevskii · Arnaud Doucet · Alain Durmus · Eric Moulines · Maxim Panov
- 2021 Poster: Differentiable Particle Filtering via Entropy-Regularized Optimal Transport
  Adrien Corenflos · James Thornton · George Deligiannidis · Arnaud Doucet
- 2021 Oral: Differentiable Particle Filtering via Entropy-Regularized Optimal Transport
  Adrien Corenflos · James Thornton · George Deligiannidis · Arnaud Doucet
- 2021 Poster: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
  Yangjun Ruan · Karen Ullrich · Daniel Severo · James Townsend · Ashish Khisti · Arnaud Doucet · Alireza Makhzani · Chris Maddison
- 2021 Oral: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
  Yangjun Ruan · Karen Ullrich · Daniel Severo · James Townsend · Ashish Khisti · Arnaud Doucet · Alireza Makhzani · Chris Maddison
- 2020 Poster: Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows
  Rob Cornish · Anthony Caterini · George Deligiannidis · Arnaud Doucet
- 2019 Poster: Replica Conditional Sequential Monte Carlo
  Alex Shestopaloff · Arnaud Doucet
- 2019 Poster: Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets
  Rob Cornish · Paul Vanetti · Alexandre Bouchard-Côté · George Deligiannidis · Arnaud Doucet
- 2019 Oral: Replica Conditional Sequential Monte Carlo
  Alex Shestopaloff · Arnaud Doucet
- 2019 Oral: Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets
  Rob Cornish · Paul Vanetti · Alexandre Bouchard-Côté · George Deligiannidis · Arnaud Doucet