

Invited Talk in Workshop: Identifying and Understanding Deep Learning Phenomena

Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks

Andrew Saxe


Abstract:

In this talk I will describe several phenomena related to learning dynamics in deep networks. Among these are (a) large transient training error spikes during full-batch gradient descent, with implications for the training error surface; (b) surprisingly strong generalization performance of large networks under modest label noise, even with infinite training time; (c) a training-speed/test-accuracy trade-off in vanilla deep networks; (d) the inability of deep networks to learn known efficient representations of certain functions; and finally (e) a trade-off between training speed and multitasking ability.
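The first phenomenon can be observed in very small experiments. Below is a minimal sketch, not the speaker's experimental setup: a two-layer tanh network trained by full-batch gradient descent on random regression data, with the training loss logged at every step so that non-monotone behavior (transient spikes) can be detected. The architecture, data, and learning rate are illustrative assumptions.

import numpy as np

# Minimal full-batch gradient descent on a small two-layer tanh network.
# Hypothetical setup: random regression data, illustrative learning rate.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 10))          # entire training set (full batch)
y = rng.standard_normal((128, 1))

W1 = rng.standard_normal((10, 64)) * 0.3
W2 = rng.standard_normal((64, 1)) * 0.3
lr = 0.05

losses = []
for step in range(500):
    # Forward pass on the full batch; loss is mean squared error.
    h = np.tanh(X @ W1)
    pred = h @ W2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backward pass: gradients of the mean squared error.
    g_pred = 2 * err / len(X)
    g_W2 = h.T @ g_pred
    g_h = g_pred @ W2.T
    g_W1 = X.T @ (g_h * (1 - h ** 2))   # tanh'(z) = 1 - tanh(z)^2

    W1 -= lr * g_W1
    W2 -= lr * g_W2

# With a large enough learning rate, the loss curve need not be monotone:
# transient spikes can appear even though every step uses the full batch.
spikes = [t for t in range(1, len(losses)) if losses[t] > 1.5 * losses[t - 1]]
print(f"final loss {losses[-1]:.4f}; steps with >50% loss jumps: {spikes[:10]}")

Because the gradient here is exact rather than stochastic, any loss spike reflects the geometry of the training error surface and the step size, not minibatch noise.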
