Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex; standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets, which furthermore matches a lower bound for convex nonsmooth functions. The key technical tool is the neural Taylor approximation -- a straightforward application of Taylor expansions to neural networks -- and the associated Taylor loss. Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The second half of the paper applies the Taylor approximation to isolate the main difficulty in training rectifier nets -- that gradients are shattered -- and investigates the hypothesis that, by exploring the space of activation configurations more thoroughly, adaptive optimizers such as RMSProp and Adam are able to converge to better solutions.
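To make the abstract's central object concrete, here is a minimal sketch (not the authors' code) of a first-order neural Taylor approximation in JAX: the loss of a tiny ReLU network is linearized in its parameters around the current iterate, yielding the convex "Taylor loss" to which standard convex-optimization guarantees can be applied. The network shape, data, and all names are illustrative assumptions.

```python
# Sketch of a first-order neural Taylor approximation (illustrative, not from the paper).
import jax
import jax.numpy as jnp

def net(params, x):
    W1, b1, W2, b2 = params
    h = jax.nn.relu(x @ W1 + b1)   # rectifier layer
    return h @ W2 + b2

def loss(params, x, y):
    return jnp.mean((net(params, x) - y) ** 2)

# Toy parameters and data (assumed shapes, for illustration only).
key = jax.random.PRNGKey(0)
k1, k2, kx = jax.random.split(key, 3)
params = (jax.random.normal(k1, (3, 8)) * 0.5, jnp.zeros(8),
          jax.random.normal(k2, (8, 1)) * 0.5, jnp.zeros(1))
x = jax.random.normal(kx, (16, 3))
y = jnp.sin(x[:, :1])

# Expand the loss around the current parameters.
loss_t, grads = jax.value_and_grad(loss)(params, x, y)

def taylor_loss(new_params):
    # Taylor loss: loss(params_t) + <grad, new_params - params_t>,
    # i.e. the loss linearized in the parameters at the current iterate.
    deltas = jax.tree_util.tree_map(lambda p_new, p: p_new - p, new_params, params)
    inner = sum(jnp.vdot(g, d) for g, d in zip(jax.tree_util.tree_leaves(grads),
                                               jax.tree_util.tree_leaves(deltas)))
    return loss_t + inner

# Sanity check: at the expansion point the Taylor loss equals the true loss.
print(float(loss(params, x, y)), float(taylor_loss(params)))
```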
Author Information
David Balduzzi (Victoria University of Wellington)
Brian McWilliams (Disney Research)
Tony Butler-Yeoman (Victoria University of Wellington)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Talk: Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks »
Mon. Aug 7th 12:48 -- 01:06 AM Room C4.8
More from the Same Authors
- 2021 Poster: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization »
  Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls
- 2021 Spotlight: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization »
  Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls
- 2019 Poster: Open-ended learning in symmetric zero-sum games »
  David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel
- 2019 Oral: Open-ended learning in symmetric zero-sum games »
  David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel
- 2018 Poster: The Mechanics of n-Player Differentiable Games »
  David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel
- 2018 Oral: The Mechanics of n-Player Differentiable Games »
  David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel
- 2017 Poster: Strongly-Typed Agents are Guaranteed to Interact Safely »
  David Balduzzi
- 2017 Poster: The Shattered Gradients Problem: If resnets are the answer, then what is the question? »
  David Balduzzi · Marcus Frean · Wan-Duo Ma · Brian McWilliams · Lennox Leary · John Lewis
- 2017 Talk: Strongly-Typed Agents are Guaranteed to Interact Safely »
  David Balduzzi
- 2017 Talk: The Shattered Gradients Problem: If resnets are the answer, then what is the question? »
  David Balduzzi · Marcus Frean · Wan-Duo Ma · Brian McWilliams · Lennox Leary · John Lewis