An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning.
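The loss of invariance at finite step sizes can be seen in a toy setting. The sketch below is an illustration only, not the paper's algorithm: it assumes a one-parameter model N(theta, 1), whose Fisher information with respect to theta is 1, a quadratic loss, and the reparameterization theta = exp(phi). All function names are hypothetical. It takes one plain natural gradient step in each parameterization and compares the results in theta-space.

```python
import math


def loss_grad(theta):
    # Gradient of the assumed toy loss L(theta) = 0.5 * (theta - 3)^2.
    return theta - 3.0


def step_theta(theta, h):
    # Natural gradient step in theta; the Fisher information of
    # N(theta, 1) with respect to theta is 1, so F^{-1} * grad = grad.
    return theta - h * loss_grad(theta)


def step_phi(phi, h):
    # Natural gradient step in phi, where theta = exp(phi).
    jac = math.exp(phi)                 # d theta / d phi
    fisher_phi = jac ** 2               # J^T F J with F = 1
    grad_phi = jac * loss_grad(math.exp(phi))  # chain rule
    return phi - h * grad_phi / fisher_phi


def discrepancy(theta0, h):
    # Distance, measured in theta-space, between one step taken in each
    # parameterization from the same starting model.
    theta_a = step_theta(theta0, h)
    theta_b = math.exp(step_phi(math.log(theta0), h))
    return abs(theta_a - theta_b)


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        print(h, discrepancy(1.0, h))
```

In this toy case the two updates agree only to first order in the step size: halving h shrinks the single-step discrepancy by a factor close to 4. That O(h^2) gap per step is the kind of finite-step error that the higher-order integrators and geodesic corrections described in the abstract aim to reduce.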
Author Information
Yang Song (Stanford University)
Jiaming Song (Stanford)
Stefano Ermon (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Accelerating Natural Gradient with Higher-Order Invariance »
Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #70
More from the Same Authors
-
2022 : Transform Once: Efficient Operator Learning in Frequency Domain »
Michael Poli · Stefano Massaroli · Federico Berto · Jinkyoo Park · Tri Dao · Christopher Re · Stefano Ermon -
2023 : The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-language Models »
Chenwei Wu · Li Li · Stefano Ermon · Patrick Haffner · Rong Ge · Zaiwei Zhang -
2023 : Parallel Sampling of Diffusion Models »
Andy Shih · Suneel Belkhale · Stefano Ermon · Dorsa Sadigh · Nima Anari -
2023 : On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization »
Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Naoki Murata · Yuki Mitsufuji · Stefano Ermon -
2023 : Direct Preference Optimization: Your Language Model is Secretly a Reward Model »
Rafael Rafailov · Archit Sharma · Eric Mitchell · Stefano Ermon · Christopher Manning · Chelsea Finn -
2023 : Invited Talk by Stefano Ermon »
Stefano Ermon -
2023 Workshop: Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators »
Felix Petersen · Marco Cuturi · Mathias Niepert · Hilde Kuehne · Michael Kagan · Willie Neiswanger · Stefano Ermon -
2023 Oral: Hyena Hierarchy: Towards Larger Convolutional Language Models »
Michael Poli · Stefano Massaroli · Eric Nguyen · Daniel Y Fu · Tri Dao · Stephen Baccus · Yoshua Bengio · Stefano Ermon · Christopher Re -
2023 Poster: Geometric Latent Diffusion Models for 3D Molecule Generation »
Minkai Xu · Alexander Powers · Ron Dror · Stefano Ermon · Jure Leskovec -
2023 Poster: Reflected Diffusion Models »
Aaron Lou · Stefano Ermon -
2023 Poster: Long Horizon Temperature Scaling »
Andy Shih · Dorsa Sadigh · Stefano Ermon -
2023 Poster: Hyena Hierarchy: Towards Larger Convolutional Language Models »
Michael Poli · Stefano Massaroli · Eric Nguyen · Daniel Y Fu · Tri Dao · Stephen Baccus · Yoshua Bengio · Stefano Ermon · Christopher Re -
2023 Poster: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration »
Naoki Murata · Koichi Saito · Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon -
2023 Poster: FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation »
Chieh-Hsin Lai · Yuhta Takida · Naoki Murata · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon -
2023 Poster: Deep Latent State Space Models for Time-Series Generation »
Linqi Zhou · Michael Poli · Winnie Xu · Stefano Massaroli · Stefano Ermon -
2023 Oral: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration »
Naoki Murata · Koichi Saito · Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon -
2023 Poster: CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations »
Gengchen Mai · Ni Lao · Yutong He · Jiaming Song · Stefano Ermon -
2022 : FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness »
Tri Dao · Daniel Y Fu · Stefano Ermon · Atri Rudra · Christopher Re -
2022 : Generative Modeling with Stochastic Differential Equations »
Stefano Ermon -
2022 : Neural Geometric Embedding Flows »
Aaron Lou · Yang Song · Jiaming Song · Stefano Ermon -
2022 Workshop: Adaptive Experimental Design and Active Learning in the Real World »
Mojmir Mutny · Willie Neiswanger · Ilija Bogunovic · Stefano Ermon · Yisong Yue · Andreas Krause -
2022 Poster: Imitation Learning by Estimating Expertise of Demonstrators »
Mark Beliaev · Andy Shih · Stefano Ermon · Dorsa Sadigh · Ramtin Pedarsani -
2022 Spotlight: Imitation Learning by Estimating Expertise of Demonstrators »
Mark Beliaev · Andy Shih · Stefano Ermon · Dorsa Sadigh · Ramtin Pedarsani -
2022 Poster: A General Recipe for Likelihood-free Bayesian Optimization »
Jiaming Song · Lantao Yu · Willie Neiswanger · Stefano Ermon -
2022 Oral: A General Recipe for Likelihood-free Bayesian Optimization »
Jiaming Song · Lantao Yu · Willie Neiswanger · Stefano Ermon -
2022 Poster: ButterflyFlow: Building Invertible Layers with Butterfly Matrices »
Chenlin Meng · Linqi Zhou · Kristy Choi · Tri Dao · Stefano Ermon -
2022 Poster: Bit Prioritization in Variational Autoencoders via Progressive Coding »
Rui Shu · Stefano Ermon -
2022 Poster: Modular Conformal Calibration »
Charles Marx · Shengjia Zhao · Willie Neiswanger · Stefano Ermon -
2022 Spotlight: Bit Prioritization in Variational Autoencoders via Progressive Coding »
Rui Shu · Stefano Ermon -
2022 Spotlight: Modular Conformal Calibration »
Charles Marx · Shengjia Zhao · Willie Neiswanger · Stefano Ermon -
2022 Spotlight: ButterflyFlow: Building Invertible Layers with Butterfly Matrices »
Chenlin Meng · Linqi Zhou · Kristy Choi · Tri Dao · Stefano Ermon -
2021 : Invited Talk 5 (Stefano Ermon): Maximum Likelihood Training of Score-Based Diffusion Models »
Stefano Ermon -
2021 Poster: Temporal Predictive Coding For Model-Based Planning In Latent Space »
Tung Nguyen · Rui Shu · Tuan Pham · Hung Bui · Stefano Ermon -
2021 Spotlight: Temporal Predictive Coding For Model-Based Planning In Latent Space »
Tung Nguyen · Rui Shu · Tuan Pham · Hung Bui · Stefano Ermon -
2021 Poster: Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information »
Willie Neiswanger · Ke Alexander Wang · Stefano Ermon -
2021 Spotlight: Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information »
Willie Neiswanger · Ke Alexander Wang · Stefano Ermon -
2021 Poster: Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving »
Yang Song · Chenlin Meng · Renjie Liao · Stefano Ermon -
2021 Spotlight: Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving »
Yang Song · Chenlin Meng · Renjie Liao · Stefano Ermon -
2021 Poster: Reward Identification in Inverse Reinforcement Learning »
Kuno Kim · Shivam Garg · Kirankumar Shiragur · Stefano Ermon -
2021 Spotlight: Reward Identification in Inverse Reinforcement Learning »
Kuno Kim · Shivam Garg · Kirankumar Shiragur · Stefano Ermon -
2020 Poster: Predictive Coding for Locally-Linear Control »
Rui Shu · Tung Nguyen · Yinlam Chow · Tuan Pham · Khoat Than · Mohammad Ghavamzadeh · Stefano Ermon · Hung Bui -
2020 Poster: Bridging the Gap Between f-GANs and Wasserstein GANs »
Jiaming Song · Stefano Ermon -
2020 Poster: Individual Calibration with Randomized Forecasting »
Shengjia Zhao · Tengyu Ma · Stefano Ermon -
2020 Poster: Domain Adaptive Imitation Learning »
Kuno Kim · Yihong Gu · Jiaming Song · Shengjia Zhao · Stefano Ermon -
2020 Poster: Training Deep Energy-Based Models with f-Divergence Minimization »
Lantao Yu · Yang Song · Jiaming Song · Stefano Ermon -
2020 Poster: Fair Generative Modeling via Weak Supervision »
Kristy Choi · Aditya Grover · Trisha Singh · Rui Shu · Stefano Ermon -
2019 Poster: Calibrated Model-Based Deep Reinforcement Learning »
Ali Malik · Volodymyr Kuleshov · Jiaming Song · Danny Nemer · Harlan Seymour · Stefano Ermon -
2019 Poster: Graphite: Iterative Generative Modeling of Graphs »
Aditya Grover · Aaron Zweig · Stefano Ermon -
2019 Poster: Adaptive Antithetic Sampling for Variance Reduction »
Hongyu Ren · Shengjia Zhao · Stefano Ermon -
2019 Oral: Adaptive Antithetic Sampling for Variance Reduction »
Hongyu Ren · Shengjia Zhao · Stefano Ermon -
2019 Oral: Graphite: Iterative Generative Modeling of Graphs »
Aditya Grover · Aaron Zweig · Stefano Ermon -
2019 Oral: Calibrated Model-Based Deep Reinforcement Learning »
Ali Malik · Volodymyr Kuleshov · Jiaming Song · Danny Nemer · Harlan Seymour · Stefano Ermon -
2019 Poster: Multi-Agent Adversarial Inverse Reinforcement Learning »
Lantao Yu · Jiaming Song · Stefano Ermon -
2019 Poster: Neural Joint Source-Channel Coding »
Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon -
2019 Oral: Neural Joint Source-Channel Coding »
Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon -
2019 Oral: Multi-Agent Adversarial Inverse Reinforcement Learning »
Lantao Yu · Jiaming Song · Stefano Ermon -
2018 Poster: Modeling Sparse Deviations for Compressed Sensing using Generative Models »
Manik Dhar · Aditya Grover · Stefano Ermon -
2018 Oral: Modeling Sparse Deviations for Compressed Sensing using Generative Models »
Manik Dhar · Aditya Grover · Stefano Ermon -
2018 Poster: Accurate Uncertainties for Deep Learning Using Calibrated Regression »
Volodymyr Kuleshov · Nathan Fenner · Stefano Ermon -
2018 Oral: Accurate Uncertainties for Deep Learning Using Calibrated Regression »
Volodymyr Kuleshov · Nathan Fenner · Stefano Ermon -
2017 Poster: Learning Hierarchical Features from Deep Generative Models »
Shengjia Zhao · Jiaming Song · Stefano Ermon -
2017 Talk: Learning Hierarchical Features from Deep Generative Models »
Shengjia Zhao · Jiaming Song · Stefano Ermon