Log-Normal Multiplicative Dynamics for Stable Low-Precision Deep Learning
Abstract
We propose a new algorithm that enables stable training under low-precision computation. We call this algorithm Log-Normal Multiplicative Dynamics (LMD) and derive it by taking inspiration from the robustness of biological neural networks. Such networks use synapses whose sizes follow a log-normal distribution and whose fluctuations obey noisy multiplicative dynamics. To date, no scalable algorithm for training modern deep networks has incorporated all of these synaptic properties. LMD incorporates them through a variational formulation that employs a log-normal posterior distribution. We present several results involving low-precision matrix multiplications in the forward pass, including training a Vision Transformer and GPT-2 from scratch. Our findings suggest that biologically inspired multiplicative dynamics offer a promising direction for future energy-efficient hardware.
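To make the abstract's central idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation): each weight's logarithm is given a Gaussian variational posterior, so the weight itself is log-normally distributed, and gradient steps on the variational parameters induce multiplicative dynamics on the weight. All names here (LogNormalWeight, mu, log_sigma) are illustrative assumptions, not identifiers from the paper.

```python
import torch

class LogNormalWeight(torch.nn.Module):
    """Illustrative sketch: a weight matrix with a log-normal posterior,
    parameterized by the mean and log-std of its logarithm."""

    def __init__(self, out_features, in_features):
        super().__init__()
        # Variational parameters of log w (initial values are arbitrary).
        self.mu = torch.nn.Parameter(torch.full((out_features, in_features), -1.0))
        self.log_sigma = torch.nn.Parameter(torch.full((out_features, in_features), -3.0))

    def sample(self):
        # Reparameterization trick: log w = mu + sigma * eps with eps ~ N(0, 1),
        # so w = exp(log w) is log-normal (and strictly positive, like a
        # sign-fixed biological synapse). Updates to (mu, log_sigma) act
        # multiplicatively on w.
        eps = torch.randn_like(self.mu)
        return torch.exp(self.mu + torch.exp(self.log_sigma) * eps)

# Usage: one stochastic forward pass with a sampled log-normal weight.
layer = LogNormalWeight(4, 3)
x = torch.randn(2, 3)
y = x @ layer.sample().t()
```

In a full variational treatment, a KL term between this posterior and a log-normal prior would be added to the training loss; low-precision arithmetic would then apply to the sampled weights in the forward-pass matrix multiplication.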