We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the dimensionality. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, and under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
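The algorithmic idea in the abstract can be illustrated with a minimal sketch (this is not the authors' exact algorithm): on a toy finite-sum problem with a strict saddle, a plain SGD step taken when the full gradient is small supplies the perturbation that perturbed gradient descent would otherwise obtain from explicit isotropic noise. The toy objective, function names, and all hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10

# Toy finite sum f(w) = (1/n) * sum_i [0.5 * w @ H @ w + b_i @ w] with sum_i b_i = 0:
# the full gradient vanishes at w = 0 and H has one negative eigenvalue, so w = 0 is
# a strict saddle, yet the individual stochastic gradients b_i are nonzero there and
# have a component along the negative curvature direction e_1.
H = np.diag(np.concatenate(([-1.0], np.ones(d - 1))))   # negative curvature along e_1
b = rng.standard_normal((n, d))
b -= b.mean(axis=0)                                       # enforce zero-mean linear terms

def full_grad(w):
    return H @ w                                          # the b_i average out to zero

def stoch_grad(w):
    i = rng.integers(n)                                   # sample one summand
    return H @ w + b[i]

def step(w, eta=0.05, grad_tol=1e-3):
    """Take a plain SGD step whenever the full gradient is small, instead of adding
    explicit isotropic noise as in perturbed gradient descent."""
    g = full_grad(w)
    return w - eta * (stoch_grad(w) if np.linalg.norm(g) <= grad_tol else g)

w = np.zeros(d)                                           # start exactly at the saddle
for _ in range(100):
    w = step(w)
print("escape along the negative-curvature direction |w[0]| =", abs(w[0]))
```

Because the stochastic gradient at the saddle has a nonzero component along e_1, a single SGD step moves the iterate off the saddle, after which the negative eigenvalue of H amplifies that component and ordinary gradient steps carry the iterate away.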
Author Information
Hadi Daneshmand (ETH Zurich)
Jonas Kohler (ETH Zurich)
Aurelien Lucchi (ETH Zurich)
Thomas Hofmann (ETH Zurich)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Escaping Saddles with Stochastic Gradients (Wed. Jul 11th, 12:50 -- 01:10 PM, Room A9)
More from the Same Authors
- 2021: This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks
  Adrian Hoffmann · Claudio Fanconi · Rahul Rade · Jonas Kohler
- 2021: Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces
  Athina Nisioti · Dario Pavllo · Jonas Kohler
- 2023 Poster: The Hessian perspective into the Nature of Convolutional Neural Networks
  Sidak Pal Singh · Thomas Hofmann · Bernhard Schölkopf
- 2023 Poster: Random Teachers are Good Teachers
  Felix Sarnthein · Gregor Bachmann · Sotiris Anagnostidis · Thomas Hofmann
- 2022 Poster: How Tempering Fixes Data Augmentation in Bayesian Neural Networks
  Gregor Bachmann · Lorenzo Noci · Thomas Hofmann
- 2022 Oral: How Tempering Fixes Data Augmentation in Bayesian Neural Networks
  Gregor Bachmann · Lorenzo Noci · Thomas Hofmann
- 2021 Poster: Uniform Convergence, Adversarial Spheres and a Simple Remedy
  Gregor Bachmann · Seyed Moosavi · Thomas Hofmann
- 2021 Spotlight: Uniform Convergence, Adversarial Spheres and a Simple Remedy
  Gregor Bachmann · Seyed Moosavi · Thomas Hofmann
- 2021 Spotlight: Neural Symbolic Regression that scales
  Luca Biggio · Tommaso Bendinelli · Alexander Neitz · Aurelien Lucchi · Giambattista Parascandolo
- 2021 Poster: Neural Symbolic Regression that scales
  Luca Biggio · Tommaso Bendinelli · Alexander Neitz · Aurelien Lucchi · Giambattista Parascandolo
- 2020 Poster: Randomized Block-Diagonal Preconditioning for Parallel Learning
  Celestine Mendler-Dünner · Aurelien Lucchi
- 2020 Poster: An Accelerated DFO Algorithm for Finite-sum Convex Functions
  Yuwen Chen · Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2019 Oral: The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2018 Poster: A Distributed Second-Order Algorithm You Can Trust
  Celestine Mendler-Dünner · Aurelien Lucchi · Matilde Gargiani · Yatao Bian · Thomas Hofmann · Martin Jaggi
- 2018 Oral: A Distributed Second-Order Algorithm You Can Trust
  Celestine Mendler-Dünner · Aurelien Lucchi · Matilde Gargiani · Yatao Bian · Thomas Hofmann · Martin Jaggi
- 2018 Poster: Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
  Octavian-Eugen Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Oral: Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
  Octavian-Eugen Ganea · Gary Becigneul · Thomas Hofmann
- 2017 Poster: Sub-sampled Cubic Regularization for Non-convex Optimization
  Jonas Kohler · Aurelien Lucchi
- 2017 Talk: Sub-sampled Cubic Regularization for Non-convex Optimization
  Jonas Kohler · Aurelien Lucchi