Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks
In this talk I will describe several phenomena related to learning dynamics in deep networks. Among these are (a) large transient training-error spikes during full-batch gradient descent, with implications for the structure of the training error surface; (b) surprisingly strong generalization performance of large networks under modest label noise, even with infinite training time; (c) a trade-off between training speed and test accuracy in vanilla deep networks; (d) the inability of deep networks to learn known efficient representations of certain functions; and finally (e) a trade-off between training speed and multitasking ability.
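As a concrete illustration of phenomenon (a), the sketch below runs full-batch gradient descent on a toy two-layer network and logs the training loss at every step. Because full-batch updates are deterministic, any upward jumps in the loss trace reflect the error surface itself rather than minibatch noise. Everything here (synthetic data, tanh activations, squared loss, the step size) is an illustrative assumption, not the experimental configuration from the talk.

```python
import numpy as np

# Minimal sketch: full-batch gradient descent on a two-layer tanh network,
# logging the training loss at every step so transient spikes can be inspected.
rng = np.random.default_rng(0)
n, d, h = 64, 10, 32                      # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal((n, 1)))  # random binary targets

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
W2 = rng.standard_normal((h, 1)) / np.sqrt(h)
lr = 0.3  # a fairly large step size; reduce it if the loss overflows

losses = []
for step in range(2000):
    # Forward pass on the full batch.
    a = np.tanh(X @ W1)
    pred = a @ W2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backward pass for the mean squared loss.
    g_pred = 2 * err / n
    gW2 = a.T @ g_pred
    gW1 = X.T @ ((g_pred @ W2.T) * (1 - a ** 2))

    W1 -= lr * gW1
    W2 -= lr * gW2

# Count steps where the (deterministic) loss jumps upward.
spikes = [t for t in range(1, len(losses)) if losses[t] > 1.5 * losses[t - 1]]
print(f"final loss: {losses[-1]:.4f}, upward jumps: {len(spikes)}")
```

Plotting `losses` against the step index makes any transient spikes visible directly; varying `lr` and the network width changes how often they appear in this toy setting.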
Author Information
Andrew Saxe (University of Oxford)
More from the Same Authors
- 2022 Poster: Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation
  Sebastian Lee · Stefano Sarao Mannelli · Claudia Clopath · Sebastian Goldt · Andrew Saxe
- 2022 Poster: The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
  Andrew Saxe · Shagun Sodhani · Sam Lewallen
- 2022 Spotlight: Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation
  Sebastian Lee · Stefano Sarao Mannelli · Claudia Clopath · Sebastian Goldt · Andrew Saxe
- 2022 Spotlight: The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
  Andrew Saxe · Shagun Sodhani · Sam Lewallen
- 2017 Poster: Hierarchy Through Composition with Multitask LMDPs
  Andrew Saxe · Adam Earle · Benjamin Rosman
- 2017 Talk: Hierarchy Through Composition with Multitask LMDPs
  Andrew Saxe · Adam Earle · Benjamin Rosman