Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to backpropagation for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. Yet it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA that explains these diverging results remains elusive. Here, we propose a theory of feedback alignment algorithms. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy-breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment, and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR-10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorise process occurs sequentially from the bottom layers of the network to the top.
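For readers unfamiliar with the mechanics of DFA, below is a minimal NumPy sketch of one training step on a two-layer network, together with the gradient-alignment measure discussed in the abstract. The layer sizes, tanh non-linearity, and squared loss are illustrative assumptions, not the paper's experimental setup; the defining feature of DFA is that the hidden layer's error signal comes from a fixed random feedback matrix B1 instead of the transpose of the forward weights.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes (assumptions, not the paper's architectures).
    d_in, d_hid, d_out, n = 784, 256, 10, 64

    # Forward weights (trained) and a fixed random feedback matrix (never trained).
    W1 = rng.normal(0.0, 1 / np.sqrt(d_in),  (d_hid, d_in))
    W2 = rng.normal(0.0, 1 / np.sqrt(d_hid), (d_out, d_hid))
    B1 = rng.normal(0.0, 1 / np.sqrt(d_out), (d_hid, d_out))  # DFA feedback, layer 1

    def dtanh(a):
        return 1.0 - np.tanh(a) ** 2

    # One step on a random batch with squared loss L = ||y - t||^2 / 2.
    x = rng.normal(size=(d_in, n))
    t = rng.normal(size=(d_out, n))

    a1 = W1 @ x            # pre-activations, hidden layer
    h1 = np.tanh(a1)
    y  = W2 @ h1           # linear readout
    e  = y - t             # output error dL/dy

    # Hidden-layer error signal: backpropagation vs. DFA.
    delta_bp  = (W2.T @ e) * dtanh(a1)   # exact gradient signal
    delta_dfa = (B1   @ e) * dtanh(a1)   # random projection of the output error

    grad_bp  = delta_bp  @ x.T / n       # true gradient of W1
    grad_dfa = delta_dfa @ x.T / n       # DFA update for W1

    # Gradient alignment: cosine between the DFA update and the true gradient.
    align = np.sum(grad_bp * grad_dfa) / (
        np.linalg.norm(grad_bp) * np.linalg.norm(grad_dfa))
    print(f"alignment = {align:.3f}")

    # DFA updates: the readout uses its true gradient, the hidden layer the
    # random-feedback update.
    W2 -= 0.1 * (e @ h1.T / n)
    W1 -= 0.1 * grad_dfa

Iterating this step and tracking `align` alongside the loss exhibits, in this toy setting, the two phases described above: the cosine first grows from near zero towards one (alignment) before the loss drops substantially (memorisation).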
Author Information
Maria Refinetti (Laboratoire de Physique de l’Ecole Normale Supérieure Paris)
Stéphane d'Ascoli (ENS / FAIR, Paris)
Ruben Ohana (Ecole Normale Supérieure & LightOn)
Sebastian Goldt (International School of Advanced Studies (SISSA))
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Align, then memorise: the dynamics of learning with feedback alignment »
  Wed. Jul 21st 01:20 -- 01:25 PM
More from the Same Authors
- 2021: ROPUST: Improving Robustness through Fine-tuning with Photonic Processors and Synthetic Gradients »
  Alessandro Cappelli · Ruben Ohana · Julien Launay · Laurent Meunier · Iacopo Poli
- 2021: On the interplay between data structure and loss function: an analytical study of generalization for classification »
  Stéphane d'Ascoli · Marylou Gabrié · Levent Sagun · Giulio Biroli
- 2023 Poster: Neural networks trained with SGD learn distributions of increasing complexity »
  Maria Refinetti · Alessandro Ingrosso · Sebastian Goldt
- 2023 Poster: Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances »
  Ruben Ohana · Kimia Nadjahi · Alain Rakotomamonjy · Liva Ralaivola
- 2022 Poster: The dynamics of representation learning in shallow, non-linear autoencoders »
  Maria Refinetti · Sebastian Goldt
- 2022 Poster: Deep symbolic regression for recurrence prediction »
  Stéphane d'Ascoli · Pierre-Alexandre Kamienny · Guillaume Lample · Francois Charton
- 2022 Poster: Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension »
  Bruno Loureiro · Cedric Gerbelot · Maria Refinetti · Gabriele Sicuro · Florent Krzakala
- 2022 Poster: Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation »
  Sebastian Lee · Stefano Sarao Mannelli · Claudia Clopath · Sebastian Goldt · Andrew Saxe
- 2022 Spotlight: Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation »
  Sebastian Lee · Stefano Sarao Mannelli · Claudia Clopath · Sebastian Goldt · Andrew Saxe
- 2022 Spotlight: Deep symbolic regression for recurrence prediction »
  Stéphane d'Ascoli · Pierre-Alexandre Kamienny · Guillaume Lample · Francois Charton
- 2022 Spotlight: The dynamics of representation learning in shallow, non-linear autoencoders »
  Maria Refinetti · Sebastian Goldt
- 2022 Spotlight: Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension »
  Bruno Loureiro · Cedric Gerbelot · Maria Refinetti · Gabriele Sicuro · Florent Krzakala
- 2021 Poster: Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed »
  Maria Refinetti · Sebastian Goldt · Florent Krzakala · Lenka Zdeborova
- 2021 Spotlight: Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed »
  Maria Refinetti · Sebastian Goldt · Florent Krzakala · Lenka Zdeborova
- 2021 Poster: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases »
  Stéphane d'Ascoli · Hugo Touvron · Matthew Leavitt · Ari Morcos · Giulio Biroli · Levent Sagun
- 2021 Poster: Continual Learning in the Teacher-Student Setup: Impact of Task Similarity »
  Sebastian Lee · Sebastian Goldt · Andrew Saxe
- 2021 Spotlight: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases »
  Stéphane d'Ascoli · Hugo Touvron · Matthew Leavitt · Ari Morcos · Giulio Biroli · Levent Sagun
- 2021 Spotlight: Continual Learning in the Teacher-Student Setup: Impact of Task Similarity »
  Sebastian Lee · Sebastian Goldt · Andrew Saxe
- 2020 Poster: Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime »
  Stéphane d'Ascoli · Maria Refinetti · Giulio Biroli · Florent Krzakala
- 2019: Poster discussion »
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shotaro Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019: Analyzing the dynamics of online learning in over-parameterized two-layer neural networks »
  Sebastian Goldt