A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. Although the problem has largely been overcome through carefully constructed initializations and batch normalization, architectures incorporating skip-connections, such as highway networks and resnets, perform much better than standard feedforward architectures even with well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise. In contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence supporting the analysis is presented for both fully-connected networks and convnets. Finally, we present a new "looks linear" (LL) initialization that prevents shattering, with preliminary experiments showing that the new initialization allows training of very deep networks without the addition of skip-connections.
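The abstract's central quantity, the correlation between input-output gradients at nearby inputs, can be estimated directly. The sketch below is not the authors' code; the layer widths, depths, and input grid are illustrative assumptions. It builds random fully-connected ReLU networks of increasing depth and prints the correlation between gradients at neighbouring input points, which the analysis predicts decays towards zero (white-noise-like gradients) as depth grows in plain feedforward nets.

```python
# Minimal sketch of the shattered-gradients measurement (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def make_net(depth, width=200):
    """Random ReLU net mapping R -> R with He-style initialization."""
    dims = [1] + [width] * depth + [1]
    return [rng.normal(0.0, np.sqrt(2.0 / dims[i]), size=(dims[i + 1], dims[i]))
            for i in range(len(dims) - 1)]

def input_gradient(weights, x):
    """Compute df/dx by a manual forward pass and Jacobian chain (backward pass)."""
    h = np.array([[x]])
    masks = []
    for W in weights[:-1]:          # hidden layers with ReLU
        h = W @ h
        masks.append((h > 0).astype(h.dtype))
        h = h * masks[-1]
    g = weights[-1]                 # final layer is linear
    for W, m in zip(reversed(weights[:-1]), reversed(masks)):
        g = (g * m.T) @ W           # apply ReLU mask, then layer Jacobian
    return g.item()                 # scalar, since input and output are 1-D

xs = np.linspace(-2, 2, 256)
for depth in (2, 10, 30):
    net = make_net(depth)
    grads = np.array([input_gradient(net, x) for x in xs])
    # Correlation between gradients at neighbouring inputs: near 1 for shallow
    # nets, dropping towards 0 (resembling white noise) as depth increases.
    corr = np.corrcoef(grads[:-1], grads[1:])[0, 1]
    print(f"depth {depth:2d}: neighbour gradient correlation = {corr:.3f}")
```

The same measurement applied to a network with skip-connections, or to one using the "looks linear" initialization, would be expected to show a much slower decay of this correlation with depth, which is the paper's central claim.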
Author Information
David Balduzzi (Victoria University Wellington)
Marcus Frean (Victoria University Wellington)
Wan-Duo Ma (Victoria University of Wellington)
Brian McWilliams (Disney Research)
Lennox Leary (VUW)
John Lewis (Frostbite Labs and Victoria University)
J.P.Lewis is a numerical programmer and researcher working in computer graphics and computer vision. He has received credits on a few movies including Avatar and The Matrix Sequels, and several of his algorithms have been adopted in commercial software (Maya, Matlab). He has also worked in academic research, most recently with Victoria University in New Zealand. Lewis is currently Principal Technical Director at Frostbite Labs, Electronic Arts.
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Talk: The Shattered Gradients Problem: If resnets are the answer, then what is the question? »
Mon. Aug 7th 01:42 -- 02:00 AM Room C4.8
More from the Same Authors
-
2021 Poster: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization »
Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls
-
2021 Spotlight: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization »
Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls
-
2019 Poster: Open-ended learning in symmetric zero-sum games »
David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel
-
2019 Oral: Open-ended learning in symmetric zero-sum games »
David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel
-
2018 Poster: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel
-
2018 Oral: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel
-
2017 Poster: Strongly-Typed Agents are Guaranteed to Interact Safely »
David Balduzzi
-
2017 Poster: Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks »
David Balduzzi · Brian McWilliams · Tony Butler-Yeoman
-
2017 Talk: Strongly-Typed Agents are Guaranteed to Interact Safely »
David Balduzzi
-
2017 Talk: Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks »
David Balduzzi · Brian McWilliams · Tony Butler-Yeoman