Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective. This Monotonic Linear Interpolation (MLI) property, first observed by Goodfellow et al. (2014), persists in spite of the non-convex objectives and highly non-linear training dynamics of neural networks. Extending this work, we evaluate several hypotheses for this property that, to our knowledge, have not yet been explored. Using tools from differential geometry, we draw connections between the interpolated paths in function space and the monotonicity of the network, providing sufficient conditions for the MLI property under mean squared error. While the MLI property holds under various settings (e.g., network architectures and learning problems), we show in practice that networks violating the MLI property can be produced systematically by encouraging the weights to move far from initialization. The MLI property raises important questions about the loss landscape geometry of neural networks and highlights the need to further study their global properties.
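As a minimal illustration of what the MLI property asserts, the sketch below evaluates a loss at evenly spaced points on the straight line between an initial parameter vector and a final one, then checks whether the loss never increases along that line. It is not the authors' experimental code. The helper names (`interpolate`, `losses_along_path`, `satisfies_mli`), the flat-vector parameter layout, and the toy least-squares problem are all assumptions made for illustration. The toy loss is convex, so monotonicity is expected here; the non-convex neural network case studied in the abstract is where the property is non-trivial.

```python
import numpy as np


def interpolate(theta_0, theta_T, alpha):
    """Point on the straight line from theta_0 (alpha=0) to theta_T (alpha=1)."""
    return (1.0 - alpha) * theta_0 + alpha * theta_T


def losses_along_path(loss_fn, theta_0, theta_T, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the interpolation path."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array(
        [loss_fn(interpolate(theta_0, theta_T, a)) for a in alphas]
    )
    return alphas, losses


def satisfies_mli(losses, tol=1e-12):
    """True if the loss never increases (beyond tol) between consecutive points."""
    return bool(np.all(np.diff(losses) <= tol))


# Toy problem (an assumption for illustration): linear regression under
# mean squared error, with parameters stored as a flat vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)


def mse(theta):
    return float(np.mean((X @ theta - y) ** 2))


theta_0 = rng.normal(size=3)                     # stand-in for the initialization
theta_T = np.linalg.lstsq(X, y, rcond=None)[0]   # stand-in for converged parameters

alphas, losses = losses_along_path(mse, theta_0, theta_T)
print("MLI holds on this path:", satisfies_mli(losses))
```

For a real network, `theta_0` and `theta_T` would be the flattened weights saved before and after training, and `loss_fn` would load an interpolated vector into the model and return the training loss. A finer grid of `alphas` makes the check stricter, since a coarse grid can miss a small bump in the loss.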
Author Information
James Lucas (University of Toronto and Vector Institute)
Juhan Bae (University of Toronto and Vector Institute)
Michael Zhang (University of Toronto)
Stanislav Fort (Google AI)
Richard Zemel (Vector Institute)
Roger Grosse (University of Toronto and Vector Institute)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: On Monotonic Linear Interpolation of Neural Network Parameters »
Tue. Jul 20th 01:25 -- 01:30 PM
More from the Same Authors
-
2021 : Online Algorithmic Recourse by Collective Action »
Elliot Creager · Richard Zemel -
2022 : Towards Environment-Invariant Representation Learning for Robust Task Transfer »
Benjamin Eyre · Richard Zemel · Elliot Creager -
2022 : Robustness to Adversarial Gradients: A Glimpse Into the Loss Landscape of Contrastive Pre-training »
Philip Fradkin · Lazar Atanackovic · Michael Zhang -
2023 : Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift »
Benjamin Eyre · Elliot Creager · David Madras · Vardan Papyan · Richard Zemel -
2023 : Statistics estimation in neural network training: a recursive identification approach »
Ruth Crasto · Xuchan Bao · Roger Grosse -
2023 : Calibrating Language Models via Augmented Prompt Ensembles »
Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba -
2023 Test Of Time: Learning Fair Representations »
Richard Zemel · Yu Wu · Kevin Swersky · Toniann Pitassi · Cynthia Dwork -
2023 Poster: Efficient Parametric Approximations of Neural Network Function Space Distance »
Nikita Dhawan · Sicong Huang · Juhan Bae · Roger Grosse -
2022 : Invited talks 3, Q/A, Amy, Rich and Liting »
Liting Sun · Amy Zhang · Richard Zemel -
2022 : Invited talks 3, Amy Zhang, Rich Zemel and Liting Sun »
Amy Zhang · Richard Zemel · Liting Sun -
2022 Poster: On Implicit Bias in Overparameterized Bilevel Optimization »
Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse -
2022 Spotlight: On Implicit Bias in Overparameterized Bilevel Optimization »
Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse -
2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition »
Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse -
2021 Poster: SketchEmbedNet: Learning Novel Concepts by Imitating Drawings »
Alexander Wang · Mengye Ren · Richard Zemel -
2021 Poster: Learning a Universal Template for Few-shot Dataset Generalization »
Eleni Triantafillou · Hugo Larochelle · Richard Zemel · Vincent Dumoulin -
2021 Poster: Environment Inference for Invariant Learning »
Elliot Creager · Joern-Henrik Jacobsen · Richard Zemel -
2021 Spotlight: Learning a Universal Template for Few-shot Dataset Generalization »
Eleni Triantafillou · Hugo Larochelle · Richard Zemel · Vincent Dumoulin -
2021 Spotlight: Environment Inference for Invariant Learning »
Elliot Creager · Joern-Henrik Jacobsen · Richard Zemel -
2021 Spotlight: SketchEmbedNet: Learning Novel Concepts by Imitating Drawings »
Alexander Wang · Mengye Ren · Richard Zemel -
2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition »
Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse -
2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning »
Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy -
2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning »
Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy -
2020 : Invited Talk 4: Prof. Richard Zemel from University of Toronto »
Richard Zemel -
2020 Workshop: Participatory Approaches to Machine Learning »
Angela Zhou · David Madras · Deborah Raji · Smitha Milli · Bogdan Kulynych · Richard Zemel -
2020 Poster: Causal Modeling for Fairness In Dynamical Systems »
Elliot Creager · David Madras · Toniann Pitassi · Richard Zemel -
2020 Poster: Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach »
Martin Mladenov · Elliot Creager · Omer Ben-Porat · Kevin Swersky · Richard Zemel · Craig Boutilier -
2020 Poster: Evaluating Lossy Compression Rates of Deep Generative Models »
Sicong Huang · Alireza Makhzani · Yanshuai Cao · Roger Grosse -
2020 Poster: Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling »
Will Grathwohl · Kuan-Chieh Wang · Joern-Henrik Jacobsen · David Duvenaud · Richard Zemel -
2019 Workshop: Learning and Reasoning with Graph-Structured Representations »
Ethan Fetaya · Zhiting Hu · Thomas Kipf · Yujia Li · Xiaodan Liang · Renjie Liao · Raquel Urtasun · Hao Wang · Max Welling · Eric Xing · Richard Zemel -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 Poster: Lorentzian Distance Learning for Hyperbolic Representations »
Marc Law · Renjie Liao · Jake Snell · Richard Zemel -
2019 Poster: Flexibly Fair Representation Learning by Disentanglement »
Elliot Creager · David Madras · Joern-Henrik Jacobsen · Marissa Weis · Kevin Swersky · Toniann Pitassi · Richard Zemel -
2019 Oral: Lorentzian Distance Learning for Hyperbolic Representations »
Marc Law · Renjie Liao · Jake Snell · Richard Zemel -
2019 Oral: Flexibly Fair Representation Learning by Disentanglement »
Elliot Creager · David Madras · Joern-Henrik Jacobsen · Marissa Weis · Kevin Swersky · Toniann Pitassi · Richard Zemel -
2019 Poster: Understanding the Origins of Bias in Word Embeddings »
Marc-Etienne Brunet · Colleen Alkalay-Houlihan · Ashton Anderson · Richard Zemel -
2019 Poster: Sorting Out Lipschitz Function Approximation »
Cem Anil · James Lucas · Roger Grosse -
2019 Poster: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis »
Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang -
2019 Oral: Understanding the Origins of Bias in Word Embeddings »
Marc-Etienne Brunet · Colleen Alkalay-Houlihan · Ashton Anderson · Richard Zemel -
2019 Oral: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis »
Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang -
2019 Oral: Sorting Out Lipschitz Function Approximation »
Cem Anil · James Lucas · Roger Grosse -
2018 Poster: Learning Adversarially Fair and Transferable Representations »
David Madras · Elliot Creager · Toniann Pitassi · Richard Zemel -
2018 Oral: Learning Adversarially Fair and Transferable Representations »
David Madras · Elliot Creager · Toniann Pitassi · Richard Zemel -
2018 Poster: Reviving and Improving Recurrent Back-Propagation »
Renjie Liao · Yuwen Xiong · Ethan Fetaya · Lisa Zhang · Kijung Yoon · Zachary S Pitkow · Raquel Urtasun · Richard Zemel -
2018 Poster: Noisy Natural Gradient as Variational Inference »
Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse -
2018 Poster: Distilling the Posterior in Bayesian Neural Networks »
Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel -
2018 Oral: Noisy Natural Gradient as Variational Inference »
Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse -
2018 Oral: Distilling the Posterior in Bayesian Neural Networks »
Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel -
2018 Oral: Reviving and Improving Recurrent Back-Propagation »
Renjie Liao · Yuwen Xiong · Ethan Fetaya · Lisa Zhang · Kijung Yoon · Zachary S Pitkow · Raquel Urtasun · Richard Zemel -
2018 Poster: Neural Relational Inference for Interacting Systems »
Thomas Kipf · Ethan Fetaya · Kuan-Chieh Wang · Max Welling · Richard Zemel -
2018 Poster: Differentiable Compositional Kernel Learning for Gaussian Processes »
Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse -
2018 Oral: Neural Relational Inference for Interacting Systems »
Thomas Kipf · Ethan Fetaya · Kuan-Chieh Wang · Max Welling · Richard Zemel -
2018 Oral: Differentiable Compositional Kernel Learning for Gaussian Processes »
Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse -
2017 Poster: Deep Spectral Clustering Learning »
Marc Law · Raquel Urtasun · Richard Zemel -
2017 Talk: Deep Spectral Clustering Learning »
Marc Law · Raquel Urtasun · Richard Zemel