Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of the convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of diminishing learning rates.
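To make the setting concrete, below is a minimal sketch of SGD with a diminishing learning rate on a strongly convex, L2-regularized least-squares objective. The problem sizes, data, and step-size constants are illustrative assumptions, not taken from the paper; the sketch only shows why a uniform bound on the stochastic gradient norm cannot hold in this setting (the gradient grows with the norm of the iterate), while the gradient is still bounded relative to the true gradient.

```python
# Illustrative sketch (assumed data and constants, not the paper's algorithm):
# SGD with a diminishing step size on a strongly convex regularized
# least-squares objective F(w) = (1/n) * sum_i f_i(w), where
# f_i(w) = 0.5*(x_i^T w - y_i)^2 + 0.5*lam*||w||^2.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regularized empirical risk minimization problem (assumed sizes).
n, d, lam = 1000, 20, 0.1
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def stochastic_grad(w, i):
    """Gradient of f_i(w); its norm grows with ||w||, so it is not uniformly bounded."""
    return (X[i] @ w - y[i]) * X[i] + lam * w

w = np.zeros(d)
eta0, mu = 0.05, lam  # mu is the strong-convexity constant from the regularizer
for t in range(20000):
    i = rng.integers(n)                    # sample one component uniformly at random
    eta_t = eta0 / (1.0 + mu * eta0 * t)   # diminishing O(1/t) step-size schedule
    w -= eta_t * stochastic_grad(w, i)

# Compare against the closed-form minimizer of the regularized least-squares problem.
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print("distance to minimizer:", np.linalg.norm(w - w_star))
```

Hogwild! applies the same update rule asynchronously from multiple threads to a shared iterate without locking; the convergence results in the paper cover that regime with diminishing learning rates.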
Author Information
Lam Nguyen (Lehigh University & IBM T.J. Watson Research Center)
PHUONG_HA NGUYEN (University of Connecticut)
Marten van Dijk (University of Connecticut)
Peter Richtarik (King Abdullah University of Science and Technology (KAUST))
Peter Richtarik is an Associate Professor of Computer Science and Mathematics at KAUST and an Associate Professor of Mathematics at the University of Edinburgh. He is an EPSRC Fellow in Mathematical Sciences, Fellow of the Alan Turing Institute, and is affiliated with the Visual Computing Center and the Extreme Computing Research Center at KAUST. Dr. Richtarik received his PhD from Cornell University in 2007, and then worked as a Postdoctoral Fellow in Louvain, Belgium, before joining Edinburgh in 2009, and KAUST in 2017. Dr. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, high performance computing and applied probability. Through his recent work on randomized decomposition algorithms (such as randomized coordinate descent methods, stochastic gradient descent methods and their numerous extensions, improvements and variants), he has contributed to the foundations of the emerging field of big data optimization, randomized numerical linear algebra, and stochastic methods for empirical risk minimization. Several of his papers attracted international awards, including the SIAM SIGEST Best Paper Award, the IMA Leslie Fox Prize (2nd prize, twice), and the INFORMS Computing Society Best Student Paper Award (sole runner up). He is the founder and organizer of the Optimization and Big Data workshop series.
Katya Scheinberg (Lehigh University)
Martin Takac (Lehigh University)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: SGD and Hogwild! Convergence Without the Bounded Gradients Assumption »
  Wed. Jul 11th 04:15 -- 07:00 PM Room Hall B #116
More from the Same Authors
- 2022 : Fast Convergence for Unstable Reinforcement Learning Problems by Logarithmic Mapping »
  Wang Zhang · Lam Nguyen · Subhro Das · Alexandre Megretsky · Luca Daniel · Tsui-Wei Weng
- 2023 Poster: ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction »
  Wang Zhang · Lily Weng · Subhro Das · Alexandre Megretsky · Luca Daniel · Lam Nguyen
- 2022 Poster: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization »
  Trang Tran · Katya Scheinberg · Lam Nguyen
- 2022 Spotlight: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization »
  Trang Tran · Katya Scheinberg · Lam Nguyen
- 2022 Poster: The power of first-order smooth optimization for black-box non-smooth problems »
  Alexander Gasnikov · Anton Novitskii · Vasilii Novitskii · Farshed Abdukhakimov · Dmitry Kamzolov · Aleksandr Beznosikov · Martin Takac · Pavel Dvurechenskii · Bin Gu
- 2022 Spotlight: The power of first-order smooth optimization for black-box non-smooth problems »
  Alexander Gasnikov · Anton Novitskii · Vasilii Novitskii · Farshed Abdukhakimov · Dmitry Kamzolov · Aleksandr Beznosikov · Martin Takac · Pavel Dvurechenskii · Bin Gu
- 2021 Poster: SMG: A Shuffling Gradient-Based Method with Momentum »
  Trang Tran · Lam Nguyen · Quoc Tran-Dinh
- 2021 Spotlight: SMG: A Shuffling Gradient-Based Method with Momentum »
  Trang Tran · Lam Nguyen · Quoc Tran-Dinh
- 2020 Poster: Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization »
  Quoc Tran-Dinh · Nhan H Pham · Lam Nguyen
- 2019 Poster: Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD »
  Marten van Dijk · Lam Nguyen · PHUONG_HA NGUYEN · Dzung Phan
- 2019 Poster: PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach »
  Tsui-Wei Weng · Pin-Yu Chen · Lam Nguyen · Mark Squillante · Akhilan Boopathy · Ivan Oseledets · Luca Daniel
- 2019 Oral: Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD »
  Marten van Dijk · Lam Nguyen · PHUONG_HA NGUYEN · Dzung Phan
- 2019 Oral: PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach »
  Tsui-Wei Weng · Pin-Yu Chen · Lam Nguyen · Mark Squillante · Akhilan Boopathy · Ivan Oseledets · Luca Daniel
- 2017 Poster: SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient »
  Lam Nguyen · Jie Liu · Katya Scheinberg · Martin Takac
- 2017 Talk: SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient »
  Lam Nguyen · Jie Liu · Katya Scheinberg · Martin Takac