Poster
in
Workshop: Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
Analyzing and Improving Surrogate Gradient Training in Discrete Neural Networks Using Dynamical Systems Theory
Rainer Engelken · Larry Abbott
Keywords: [ surrogate gradients ] [ exploding/vanishing gradients ] [ chaos ] [ Jacobian ] [ Lyapunov spectrum ] [ Lyapunov exponents ] [ differentiable linear algebra ] [ RNN ]
Training binary and spiking recurrent networks on tasks that bridge long time horizons is challenging, as the discrete activation function renders the error landscape non-differentiable. Surrogate gradient training addresses this by replacing the discrete activation function with a differentiable surrogate in the backward pass, but it still suffers from exploding and vanishing gradients. Using dynamical systems theory, we establish a link between the vanishing and exploding gradient problem and Lyapunov exponents, which quantify the divergence of nearby trajectories. We leverage differentiable linear algebra to regularize surrogate Lyapunov exponents in a method we call surrogate gradient flossing and show that this creates slow modes in the tangent dynamics. Finally, we show that surrogate gradient flossing improves training speed and success rate on temporally challenging tasks.
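To make the two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes a hypothetical binary-unit RNN (names `spike`, `step`, `W`, `U`, `flossing_penalty` are illustrative) and shows (1) a surrogate gradient, i.e. a Heaviside step in the forward pass with a smooth sigmoid-derivative surrogate in the backward pass, and (2) a surrogate Lyapunov exponent estimate obtained by pushing an orthonormal basis through the surrogate Jacobians with repeated QR decompositions, which is differentiable and could therefore be used as a "flossing"-style regularizer.

```python
import jax
import jax.numpy as jnp


@jax.custom_vjp
def spike(v):
    # Forward pass: non-differentiable Heaviside step.
    return (v > 0.0).astype(v.dtype)

def spike_fwd(v):
    return spike(v), v

def spike_bwd(v, g):
    # Backward pass: surrogate derivative (sigmoid'), replacing the
    # zero/undefined derivative of the step function.
    s = jax.nn.sigmoid(v)
    return (g * s * (1.0 - s),)

spike.defvjp(spike_fwd, spike_bwd)


def step(h, x, W, U):
    # One recurrent update of a binary-unit RNN (hypothetical dynamics).
    return spike(W @ h + U @ x)


def surrogate_lyapunov_exponents(h0, xs, W, U, k=3):
    # Estimate the k leading surrogate Lyapunov exponents: evolve an
    # orthonormal basis Q with the surrogate Jacobian and re-orthonormalize
    # via (differentiable) QR; log |diag(R)| accumulates the exponents.
    n = h0.shape[0]
    Q0 = jnp.eye(n)[:, :k]

    def body(carry, x):
        h, Q, acc = carry
        J = jax.jacrev(lambda hh: step(hh, x, W, U))(h)  # surrogate Jacobian
        Q, R = jnp.linalg.qr(J @ Q)
        acc = acc + jnp.log(jnp.abs(jnp.diag(R)) + 1e-12)
        return (step(h, x, W, U), Q, acc), None

    (_, _, acc), _ = jax.lax.scan(body, (h0, Q0, jnp.zeros(k)), xs)
    return acc / xs.shape[0]


def flossing_penalty(h0, xs, W, U):
    # Regularizer in the spirit of surrogate gradient flossing (sketch):
    # pushing the leading surrogate Lyapunov exponents toward zero
    # encourages slow modes in the tangent dynamics.
    lams = surrogate_lyapunov_exponents(h0, xs, W, U)
    return jnp.sum(lams ** 2)
```

In a training loop, such a penalty would be added to the task loss and differentiated with `jax.grad`; the exact weighting, number of exponents, and schedule are design choices not specified in the abstract.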