Oral
A Distributed Second-Order Algorithm You Can Trust
Celestine Mendler-Dünner · Aurelien Lucchi · Matilde Gargiani · Yatao Bian · Thomas Hofmann · Martin Jaggi
Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods dominate the field, second-order methods remain attractive because they potentially require fewer communication rounds to converge. However, significant drawbacks impede their wide adoption, such as the computation and communication of a large Hessian matrix. In this paper we present a new algorithm for the distributed training of generalized linear models that requires only the computation of diagonal blocks of the Hessian matrix on the individual workers. To deal with this approximate information we propose an adaptive approach that, akin to trust-region methods, dynamically adapts the auxiliary model to compensate for modeling errors. We provide theoretical rates of convergence for a wide class of problems including $L_1$-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets.
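The core idea from the abstract (each worker forms only a diagonal block of the Hessian, and a trust-region-style parameter rescales the auxiliary model when the block-diagonal approximation proves inaccurate) can be illustrated with a minimal single-machine sketch. This is not the paper's algorithm or notation: the logistic-regression objective, the function names, the parameter `sigma`, and the 0.25/0.75 acceptance thresholds are all illustrative assumptions, borrowed from generic trust-region practice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y, lam):
    # L2-regularized logistic loss, labels y in {-1, +1}
    return np.mean(np.log1p(np.exp(-y * (X @ w)))) + 0.5 * lam * w @ w

def block_newton_adaptive(X, y, n_blocks=4, lam=0.1, sigma=1.0, iters=50):
    """Toy simulation of an adaptive block-diagonal second-order method.
    Each "worker" owns one coordinate block and forms only its diagonal
    Hessian block; sigma scales the local quadratic model and is adapted,
    trust-region style, from the ratio of actual to predicted decrease."""
    n, d = X.shape
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)
    for _ in range(iters):
        z = X @ w
        p = sigmoid(-y * z)                       # per-example loss weights
        grad = -(X.T @ (y * p)) / n + lam * w     # full gradient
        D = p * (1 - p)                           # curvature weights
        step = np.zeros(d)
        for idx in blocks:                        # each block = one "worker"
            Xb = X[:, idx]
            Hb = (Xb.T * D) @ Xb / n + lam * np.eye(len(idx))
            # larger sigma = more conservative local model
            step[idx] = np.linalg.solve(sigma * Hb, -grad[idx])
        # predicted decrease of the block-diagonal quadratic model
        pred = -0.5 * (grad @ step)
        actual = logistic_loss(w, X, y, lam) - logistic_loss(w + step, X, y, lam)
        rho = actual / max(pred, 1e-12)
        if rho > 0.75:
            sigma = max(sigma / 2.0, 0.5)         # model is good: loosen
        elif rho < 0.25:
            sigma *= 2.0                          # model is poor: tighten
        if actual > 0:                            # accept only improving steps
            w = w + step
    return w
```

Because steps are accepted only when the true loss decreases, the iteration is monotone even when the ignored off-diagonal curvature makes the local models overly optimistic; `sigma` then grows until the models become trustworthy again.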
Author Information
Celestine Mendler-Dünner (IBM Research)
Aurelien Lucchi (ETH Zurich)
Matilde Gargiani (University of Freiburg)
Yatao Bian (ETH Zürich)
Thomas Hofmann (ETH Zurich)
Martin Jaggi (EPFL)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: A Distributed Second-Order Algorithm You Can Trust »
  Thu. Jul 12th 04:15 -- 07:00 PM, Room Hall B #219
More from the Same Authors
- 2021 : iFedAvg – Interpretable Data-Interoperability for Federated Learning »
  David Roschewitz · Mary-Anne Hartley · Luca Corinzia · Martin Jaggi
- 2022 : The Gap Between Continuous and Discrete Gradient Descent »
  Amirkeivan Mohtashami · Martin Jaggi · Sebastian Stich
- 2023 : Layerwise Linear Mode Connectivity »
  Linara Adilova · Asja Fischer · Martin Jaggi
- 2023 : Landmark Attention: Random-Access Infinite Context Length for Transformers »
  Amirkeivan Mohtashami · Martin Jaggi
- 2023 : Fast Causal Attention with Dynamic Sparsity »
  Daniele Paliotta · Matteo Pagliardini · Martin Jaggi · François Fleuret
- 2023 Oral: Second-Order Optimization with Lazy Hessians »
  Nikita Doikov · El Mahdi Chayti · Martin Jaggi
- 2023 Poster: The Hessian perspective into the Nature of Convolutional Neural Networks »
  Sidak Pal Singh · Thomas Hofmann · Bernhard Schölkopf
- 2023 Poster: Second-Order Optimization with Lazy Hessians »
  Nikita Doikov · El Mahdi Chayti · Martin Jaggi
- 2023 Poster: Special Properties of Gradient Descent with Large Learning Rates »
  Amirkeivan Mohtashami · Martin Jaggi · Sebastian Stich
- 2023 Poster: Random Teachers are Good Teachers »
  Felix Sarnthein · Gregor Bachmann · Sotiris Anagnostidis · Thomas Hofmann
- 2022 Poster: How Tempering Fixes Data Augmentation in Bayesian Neural Networks »
  Gregor Bachmann · Lorenzo Noci · Thomas Hofmann
- 2022 Oral: How Tempering Fixes Data Augmentation in Bayesian Neural Networks »
  Gregor Bachmann · Lorenzo Noci · Thomas Hofmann
- 2021 : Exact Optimization of Conformal Predictors via Incremental and Decremental Learning (Spotlight #13) »
  Giovanni Cherubin · Konstantinos Chatzikokolakis · Martin Jaggi
- 2021 Poster: Uniform Convergence, Adversarial Spheres and a Simple Remedy »
  Gregor Bachmann · Seyed Moosavi · Thomas Hofmann
- 2021 Poster: Exact Optimization of Conformal Predictors via Incremental and Decremental Learning »
  Giovanni Cherubin · Konstantinos Chatzikokolakis · Martin Jaggi
- 2021 Spotlight: Uniform Convergence, Adversarial Spheres and a Simple Remedy »
  Gregor Bachmann · Seyed Moosavi · Thomas Hofmann
- 2021 Poster: Consensus Control for Decentralized Deep Learning »
  Lingjing Kong · Tao Lin · Anastasiia Koloskova · Martin Jaggi · Sebastian Stich
- 2021 Poster: Quasi-global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data »
  Tao Lin · Sai Praneeth Reddy Karimireddy · Sebastian Stich · Martin Jaggi
- 2021 Spotlight: Exact Optimization of Conformal Predictors via Incremental and Decremental Learning »
  Giovanni Cherubin · Konstantinos Chatzikokolakis · Martin Jaggi
- 2021 Spotlight: Quasi-global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data »
  Tao Lin · Sai Praneeth Reddy Karimireddy · Sebastian Stich · Martin Jaggi
- 2021 Spotlight: Consensus Control for Decentralized Deep Learning »
  Lingjing Kong · Tao Lin · Anastasiia Koloskova · Martin Jaggi · Sebastian Stich
- 2021 Poster: Learning from History for Byzantine Robust Optimization »
  Sai Praneeth Reddy Karimireddy · Lie He · Martin Jaggi
- 2021 Spotlight: Learning from History for Byzantine Robust Optimization »
  Sai Praneeth Reddy Karimireddy · Lie He · Martin Jaggi
- 2021 Spotlight: Neural Symbolic Regression that scales »
  Luca Biggio · Tommaso Bendinelli · Alexander Neitz · Aurelien Lucchi · Giambattista Parascandolo
- 2021 Poster: Neural Symbolic Regression that scales »
  Luca Biggio · Tommaso Bendinelli · Alexander Neitz · Aurelien Lucchi · Giambattista Parascandolo
- 2020 Poster: Extrapolation for Large-batch Training in Deep Learning »
  Tao Lin · Lingjing Kong · Sebastian Stich · Martin Jaggi
- 2020 Poster: Randomized Block-Diagonal Preconditioning for Parallel Learning »
  Celestine Mendler-Dünner · Aurelien Lucchi
- 2020 Poster: Optimizer Benchmarking Needs to Account for Hyperparameter Tuning »
  Prabhu Teja Sivaprasad · Florian Mai · Thijs Vogels · Martin Jaggi · François Fleuret
- 2020 Poster: A Unified Theory of Decentralized SGD with Changing Topology and Local Updates »
  Anastasiia Koloskova · Nicolas Loizou · Sadra Boreiri · Martin Jaggi · Sebastian Stich
- 2020 Poster: An Accelerated DFO Algorithm for Finite-sum Convex Functions »
  Yuwen Chen · Antonio Orvieto · Aurelien Lucchi
- 2019 : Poster Session 1 (all papers) »
  Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel
- 2019 Poster: Overcoming Multi-model Forgetting »
  Yassine Benyahia · Kaicheng Yu · Kamil Bennani-Smires · Martin Jaggi · Anthony C. Davison · Mathieu Salzmann · Claudiu Musat
- 2019 Poster: The Odds are Odd: A Statistical Test for Detecting Adversarial Examples »
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2019 Oral: Overcoming Multi-model Forgetting »
  Yassine Benyahia · Kaicheng Yu · Kamil Bennani-Smires · Martin Jaggi · Anthony C. Davison · Mathieu Salzmann · Claudiu Musat
- 2019 Oral: The Odds are Odd: A Statistical Test for Detecting Adversarial Examples »
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2019 Poster: Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication »
  Anastasiia Koloskova · Sebastian Stich · Martin Jaggi
- 2019 Poster: Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference »
  Yatao Bian · Joachim Buhmann · Andreas Krause
- 2019 Poster: Error Feedback Fixes SignSGD and other Gradient Compression Schemes »
  Sai Praneeth Reddy Karimireddy · Quentin Rebjock · Sebastian Stich · Martin Jaggi
- 2019 Oral: Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication »
  Anastasiia Koloskova · Sebastian Stich · Martin Jaggi
- 2019 Oral: Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference »
  Yatao Bian · Joachim Buhmann · Andreas Krause
- 2019 Oral: Error Feedback Fixes SignSGD and other Gradient Compression Schemes »
  Sai Praneeth Reddy Karimireddy · Quentin Rebjock · Sebastian Stich · Martin Jaggi
- 2018 Poster: On Matching Pursuit and Coordinate Descent »
  Francesco Locatello · Anant Raj · Sai Praneeth Reddy Karimireddy · Gunnar Ratsch · Bernhard Schölkopf · Sebastian Stich · Martin Jaggi
- 2018 Oral: On Matching Pursuit and Coordinate Descent »
  Francesco Locatello · Anant Raj · Sai Praneeth Reddy Karimireddy · Gunnar Ratsch · Bernhard Schölkopf · Sebastian Stich · Martin Jaggi
- 2018 Poster: Escaping Saddles with Stochastic Gradients »
  Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann
- 2018 Poster: Hyperbolic Entailment Cones for Learning Hierarchical Embeddings »
  Octavian-Eugen Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Oral: Escaping Saddles with Stochastic Gradients »
  Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann
- 2018 Oral: Hyperbolic Entailment Cones for Learning Hierarchical Embeddings »
  Octavian-Eugen Ganea · Gary Becigneul · Thomas Hofmann
- 2017 Poster: Guarantees for Greedy Maximization of Non-submodular Functions with Applications »
  Yatao Bian · Joachim Buhmann · Andreas Krause · Sebastian Tschiatschek
- 2017 Talk: Guarantees for Greedy Maximization of Non-submodular Functions with Applications »
  Yatao Bian · Joachim Buhmann · Andreas Krause · Sebastian Tschiatschek
- 2017 Poster: Sub-sampled Cubic Regularization for Non-convex Optimization »
  Jonas Kohler · Aurelien Lucchi
- 2017 Poster: Approximate Steepest Coordinate Descent »
  Sebastian Stich · Anant Raj · Martin Jaggi
- 2017 Talk: Sub-sampled Cubic Regularization for Non-convex Optimization »
  Jonas Kohler · Aurelien Lucchi
- 2017 Talk: Approximate Steepest Coordinate Descent »
  Sebastian Stich · Anant Raj · Martin Jaggi