Poster
in
Workshop: Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
Analyzing and Improving Surrogate Gradient Training in Discrete Neural Networks Using Dynamical Systems Theory
Rainer Engelken · Larry Abbott
Keywords: [ surrogate gradients ] [ exploding/vanishing gradients ] [ chaos ] [ Jacobian ] [ Lyapunov spectrum ] [ Lyapunov exponents ] [ differentiable linear algebra ] [ RNN ]
Training binary and spiking recurrent networks on tasks that bridge long time horizons is challenging, as the discrete activation function renders the error landscape non-differentiable. Surrogate gradient training addresses this by replacing the discrete activation function with a differentiable surrogate in the backward pass, but it still suffers from exploding and vanishing gradients. Using dynamical systems theory, we establish a link between the vanishing and exploding gradient problem and Lyapunov exponents, which quantify the divergence of nearby trajectories. We leverage differentiable linear algebra to regularize surrogate Lyapunov exponents in a method we call surrogate gradient flossing and show that this creates slow modes in the tangent dynamics. Finally, we show that surrogate gradient flossing improves training speed and success rate on temporally challenging tasks.
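To make the two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes a hypothetical binary-unit RNN (names `spike`, `step`, `W`, `U`, `flossing_penalty` are illustrative) and shows (1) a surrogate gradient, i.e. a Heaviside step in the forward pass with a smooth sigmoid-derivative surrogate in the backward pass, and (2) a surrogate Lyapunov exponent estimate obtained by pushing an orthonormal basis through the surrogate Jacobians with repeated QR decompositions, which is differentiable and could therefore be used as a "flossing"-style regularizer.

```python
import jax
import jax.numpy as jnp


@jax.custom_vjp
def spike(v):
    # Forward pass: non-differentiable Heaviside step.
    return (v > 0.0).astype(v.dtype)

def spike_fwd(v):
    return spike(v), v

def spike_bwd(v, g):
    # Backward pass: surrogate derivative (sigmoid'), replacing the
    # zero/undefined derivative of the step function.
    s = jax.nn.sigmoid(v)
    return (g * s * (1.0 - s),)

spike.defvjp(spike_fwd, spike_bwd)


def step(h, x, W, U):
    # One recurrent update of a binary-unit RNN (hypothetical dynamics).
    return spike(W @ h + U @ x)


def surrogate_lyapunov_exponents(h0, xs, W, U, k=3):
    # Estimate the k leading surrogate Lyapunov exponents: evolve an
    # orthonormal basis Q with the surrogate Jacobian and re-orthonormalize
    # via (differentiable) QR; log |diag(R)| accumulates the exponents.
    n = h0.shape[0]
    Q0 = jnp.eye(n)[:, :k]

    def body(carry, x):
        h, Q, acc = carry
        J = jax.jacrev(lambda hh: step(hh, x, W, U))(h)  # surrogate Jacobian
        Q, R = jnp.linalg.qr(J @ Q)
        acc = acc + jnp.log(jnp.abs(jnp.diag(R)) + 1e-12)
        return (step(h, x, W, U), Q, acc), None

    (_, _, acc), _ = jax.lax.scan(body, (h0, Q0, jnp.zeros(k)), xs)
    return acc / xs.shape[0]


def flossing_penalty(h0, xs, W, U):
    # Regularizer in the spirit of surrogate gradient flossing (sketch):
    # pushing the leading surrogate Lyapunov exponents toward zero
    # encourages slow modes in the tangent dynamics.
    lams = surrogate_lyapunov_exponents(h0, xs, W, U)
    return jnp.sum(lams ** 2)
```

In a training loop, such a penalty would be added to the task loss and differentiated with `jax.grad`; the exact weighting, number of exponents, and schedule are design choices not specified in the abstract.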