

Talk in Workshop: Principled Approaches to Deep Learning

Invited Talk 2 - Surya Ganguli

2017 Talk

Abstract:

On the Beneficial Role of Dynamic Criticality and Chaos in Deep Learning

What does a generic deep function “look like,” and how can we understand and exploit such knowledge to obtain practical benefits in deep learning? By combining Riemannian geometry with dynamical mean field theory, we show that generic nonlinear deep networks exhibit an order-to-chaos phase transition as synaptic weights vary from small to large. In the chaotic phase, deep networks acquire very high expressive power: measures of functional curvature and the ability to disentangle classification boundaries both grow exponentially with depth, but not with width. Moreover, we apply tools from free probability theory to study the propagation of error gradients through generic deep networks. We find that, at the phase-transition boundary between order and chaos, not only the norms of gradients but also the angles between pairs of gradients are preserved, even in infinitely deep sigmoidal networks with orthogonal weights. In contrast, ReLU networks do not enjoy such isometric propagation of gradients. In turn, this isometric propagation at the edge of chaos yields training benefits: very deep sigmoidal networks outperform ReLU networks, pointing to a potential path toward resurrecting saturating nonlinearities in deep learning.
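The order-to-chaos transition mentioned in the abstract can be illustrated with a short mean-field calculation. The sketch below is not from the talk itself; it assumes a deep tanh network with i.i.d. Gaussian weights of variance sigma_w^2 / N and biases of variance sigma_b^2, iterates the single-input variance map to its fixed point q*, and evaluates the slope chi of the correlation map at that fixed point: chi < 1 corresponds to the ordered phase, chi > 1 to the chaotic phase, and chi = 1 to the edge of chaos where gradient norms neither explode nor vanish with depth. All parameter values chosen here are illustrative.

import numpy as np

def gauss_expect(f, q, n_nodes=61):
    """E_{z ~ N(0,1)}[f(sqrt(q) * z)] via Gauss-Hermite quadrature (probabilists' convention)."""
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)  # normalize the quadrature weights so they sum to 1
    return np.sum(w * f(np.sqrt(q) * z))

def variance_map(q, sigma_w, sigma_b):
    """One layer of the mean-field variance recursion:
    q_{l+1} = sigma_w^2 * E[tanh(sqrt(q_l) z)^2] + sigma_b^2."""
    return sigma_w**2 * gauss_expect(lambda u: np.tanh(u)**2, q) + sigma_b**2

def chi(q_star, sigma_w):
    """Slope of the correlation map at c = 1:
    chi = sigma_w^2 * E[tanh'(sqrt(q*) z)^2], where tanh' = sech^2."""
    return sigma_w**2 * gauss_expect(lambda u: 1.0 / np.cosh(u)**4, q_star)

sigma_b = 0.05  # illustrative bias variance
for sigma_w in (0.5, 1.0, 1.5, 3.0):
    q = 1.0
    for _ in range(200):  # iterate the variance map to its fixed point q*
        q = variance_map(q, sigma_w, sigma_b)
    c = chi(q, sigma_w)
    phase = "chaotic" if c > 1.0 else "ordered"
    print(f"sigma_w = {sigma_w}: q* = {q:.4f}, chi = {c:.3f} ({phase})")

Running this sweep shows small weight variances landing in the ordered phase (chi < 1) and large ones in the chaotic phase (chi > 1), with the critical weight variance where chi crosses 1 marking the edge of chaos discussed in the abstract.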
