Invited talk in Workshop: Identifying and Understanding Deep Learning Phenomena
Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks
Andrew Saxe
Abstract:
In this talk I will describe several phenomena related to learning dynamics in deep networks. Among these are (a) large transient training-error spikes during full-batch gradient descent, with implications for the shape of the training error surface; (b) surprisingly strong generalization of large networks under modest label noise, even with infinite training time; (c) a trade-off between training speed and test accuracy in vanilla deep networks; (d) the inability of deep networks to learn known efficient representations of certain functions; and finally (e) a trade-off between training speed and multitasking ability.