In machine learning applications, it is well known that carefully designed learning rate (step size) schedules can significantly improve the convergence of commonly used first-order optimization algorithms. Therefore, how to set the step size adaptively becomes an important research question. A popular and effective method is the Polyak step size, which sets the step size adaptively for gradient descent or stochastic gradient descent without the need to estimate the smoothness parameter of the objective function. However, there has not been a principled way to generalize the Polyak step size to algorithms with momentum acceleration. This paper presents a general framework for setting the learning rate adaptively for first-order optimization methods with momentum, motivated by the derivation of the Polyak step size. It is shown that the resulting techniques are much less sensitive to the choice of momentum parameter and may avoid the oscillation of the heavy-ball method on ill-conditioned problems. These adaptive step sizes are further extended to the stochastic setting, where they are attractive choices for stochastic gradient descent with momentum. Our methods are demonstrated to be more effective for stochastic gradient methods than prior adaptive step size algorithms in large-scale machine learning tasks.
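For readers unfamiliar with the Polyak step size that the abstract builds on, the following is a minimal sketch of the classical rule for plain gradient descent, step_k = (f(x_k) - f*) / ||∇f(x_k)||², assuming the optimal value f* is known. It does not implement the momentum framework proposed in the paper. The function names, the test quadratic, and its curvature values are hypothetical and chosen only for illustration.

```python
def polyak_gradient_descent(f, grad, x0, f_star, iters=100, eps=1e-12):
    """Gradient descent with the classical Polyak step size.

    step_k = (f(x_k) - f_star) / ||grad f(x_k)||^2
    Assumes f_star, the optimal objective value, is known in advance.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        gap = f(x) - f_star
        # Stop when the gradient vanishes or the optimal value is reached.
        if g_norm_sq <= eps or gap <= 0.0:
            break
        step = gap / g_norm_sq
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical ill-conditioned quadratic: f(x) = 0.5 * (x1^2 + 100 * x2^2), f* = 0.
CURVATURES = [1.0, 100.0]


def quad(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURVATURES, x))


def quad_grad(x):
    return [a * xi for a, xi in zip(CURVATURES, x)]


x_final = polyak_gradient_descent(quad, quad_grad, [1.0, 1.0], f_star=0.0, iters=200)
```

Note that the step is computed from the current objective gap and gradient norm alone, so no smoothness constant has to be estimated. For convex objectives this rule guarantees that the distance to the minimizer does not increase from one iterate to the next.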
Author Information
Xiaoyu Wang (Hong Kong University of Science and Technology)
Mikael Johansson (KTH Royal Institute of Technology)
Tong Zhang (HKUST)

Tong Zhang is a professor of Computer Science and Mathematics at the Hong Kong University of Science and Technology. His research interests are machine learning, big data, and their applications. He obtained a BA in Mathematics and Computer Science from Cornell University and a PhD in Computer Science from Stanford University. Before joining HKUST, Tong Zhang was a professor at Rutgers University, and previously worked at IBM and Yahoo as a research scientist, at Baidu as the director of the Big Data Lab, and at Tencent as the founding director of the AI Lab. Tong Zhang was an ASA fellow and IMS fellow, has served as chair or area chair at major machine learning conferences such as NIPS, ICML, and COLT, and has served as an associate editor for top machine learning journals such as PAMI, JMLR, and Machine Learning Journal.
More from the Same Authors
-
2021 : Efficient Exploration by HyperDQN in Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Hao Liang · Tong Zhang -
2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization »
Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi -
2023 Poster: What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? »
Rui Yang · Yong LIN · Xiaoteng Ma · Hao Hu · Chongjie Zhang · Tong Zhang -
2023 Poster: Learning in POMDPs is Sample-Efficient with Hindsight Observability »
Jonathan Lee · Alekh Agarwal · Christoph Dann · Tong Zhang -
2023 Poster: Delay-agnostic Asynchronous Coordinate Update Algorithm »
Xuyang Wu · Changxin Liu · Sindri Magnússon · Mikael Johansson -
2023 Poster: On the Convergence of Federated Averaging with Cyclic Client Participation »
Yae Jee Cho · PRANAY SHARMA · Gauri Joshi · Zheng Xu · Satyen Kale · Tong Zhang -
2023 Poster: Weakly Supervised Disentangled Generative Causal Representation Learning »
Xinwei Shen · Furui Liu · Hanze Dong · Qing Lian · Zhitang Chen · Tong Zhang -
2023 Poster: Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes »
Chenlu Ye · Wei Xiong · Quanquan Gu · Tong Zhang -
2022 Poster: Delay-Adaptive Step-sizes for Asynchronous Learning »
Xuyang Wu · Sindri Magnússon · Hamid Reza Feyzmahdavian · Mikael Johansson -
2022 Poster: A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games »
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang -
2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang -
2022 Spotlight: Delay-Adaptive Step-sizes for Asynchronous Learning »
Xuyang Wu · Sindri Magnússon · Hamid Reza Feyzmahdavian · Mikael Johansson -
2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang -
2022 Spotlight: A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games »
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang -
2022 Poster: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2022 Spotlight: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint »
Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao -
2022 Poster: A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization »
Renzhe Xu · Xingxuan Zhang · Zheyan Shen · Tong Zhang · Peng Cui -
2022 Poster: Sparse Invariant Risk Minimization »
Xiao Zhou · Yong LIN · Weizhong Zhang · Tong Zhang -
2022 Poster: Model Agnostic Sample Reweighting for Out-of-Distribution Learning »
Xiao Zhou · Yong LIN · Renjie Pi · Weizhong Zhang · Renzhe Xu · Peng Cui · Tong Zhang -
2022 Poster: Probabilistic Bilevel Coreset Selection »
Xiao Zhou · Renjie Pi · Weizhong Zhang · Yong LIN · Zonghao Chen · Tong Zhang -
2022 Spotlight: A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization »
Renzhe Xu · Xingxuan Zhang · Zheyan Shen · Tong Zhang · Peng Cui -
2022 Spotlight: Probabilistic Bilevel Coreset Selection »
Xiao Zhou · Renjie Pi · Weizhong Zhang · Yong LIN · Zonghao Chen · Tong Zhang -
2022 Spotlight: Model Agnostic Sample Reweighting for Out-of-Distribution Learning »
Xiao Zhou · Yong LIN · Renjie Pi · Weizhong Zhang · Renzhe Xu · Peng Cui · Tong Zhang -
2022 Spotlight: Sparse Invariant Risk Minimization »
Xiao Zhou · Yong LIN · Weizhong Zhang · Tong Zhang -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2021 Poster: Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness »
Vien Mai · Mikael Johansson -
2021 Oral: Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness »
Vien Mai · Mikael Johansson -
2020 Poster: Anderson Acceleration of Proximal Gradient Methods »
Vien Mai · Mikael Johansson -
2020 Poster: Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization »
Vien Mai · Mikael Johansson -
2020 Poster: Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization »
Rie Johnson · Tong Zhang -
2019 Poster: $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression »
Hanlin Tang · Chen Yu · Xiangru Lian · Tong Zhang · Ji Liu -
2019 Oral: $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression »
Hanlin Tang · Chen Yu · Xiangru Lian · Tong Zhang · Ji Liu -
2019 Poster: Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI »
Lei Han · Peng Sun · Yali Du · Jiechao Xiong · Qing Wang · Xinghai Sun · Han Liu · Tong Zhang -
2019 Poster: Curvature-Exploiting Acceleration of Elastic Net Computations »
Vien Mai · Mikael Johansson -
2019 Oral: Curvature-Exploiting Acceleration of Elastic Net Computations »
Vien Mai · Mikael Johansson -
2019 Oral: Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI »
Lei Han · Peng Sun · Yali Du · Jiechao Xiong · Qing Wang · Xinghai Sun · Han Liu · Tong Zhang -
2019 Tutorial: Causal Inference and Stable Learning »
Tong Zhang · Peng Cui -
2018 Poster: An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method »
Li Shen · Peng Sun · Yitong Wang · Wei Liu · Tong Zhang -
2018 Poster: Candidates vs. Noises Estimation for Large Multi-Class Classification Problem »
Lei Han · Yiheng Huang · Tong Zhang -
2018 Poster: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar -
2018 Oral: An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method »
Li Shen · Peng Sun · Yitong Wang · Wei Liu · Tong Zhang -
2018 Oral: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar -
2018 Oral: Candidates vs. Noises Estimation for Large Multi-Class Classification Problem »
Lei Han · Yiheng Huang · Tong Zhang -
2018 Poster: Graphical Nonconvex Optimization via an Adaptive Convex Relaxation »
Qiang Sun · Kean Ming Tan · Han Liu · Tong Zhang -
2018 Poster: Composite Functional Gradient Learning of Generative Adversarial Models »
Rie Johnson · Tong Zhang -
2018 Poster: Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization »
Jiaxiang Wu · Weidong Huang · Junzhou Huang · Tong Zhang -
2018 Oral: Graphical Nonconvex Optimization via an Adaptive Convex Relaxation »
Qiang Sun · Kean Ming Tan · Han Liu · Tong Zhang -
2018 Oral: Composite Functional Gradient Learning of Generative Adversarial Models »
Rie Johnson · Tong Zhang -
2018 Oral: Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization »
Jiaxiang Wu · Weidong Huang · Junzhou Huang · Tong Zhang -
2018 Poster: Safe Element Screening for Submodular Function Minimization »
Weizhong Zhang · Bin Hong · Lin Ma · Wei Liu · Tong Zhang -
2018 Poster: End-to-end Active Object Tracking via Reinforcement Learning »
Wenhan Luo · Peng Sun · Fangwei Zhong · Wei Liu · Tong Zhang · Yizhou Wang -
2018 Oral: End-to-end Active Object Tracking via Reinforcement Learning »
Wenhan Luo · Peng Sun · Fangwei Zhong · Wei Liu · Tong Zhang · Yizhou Wang -
2018 Oral: Safe Element Screening for Submodular Function Minimization »
Weizhong Zhang · Bin Hong · Lin Ma · Wei Liu · Tong Zhang -
2017 Poster: Projection-free Distributed Online Learning in Networks »
Wenpeng Zhang · Peilin Zhao · Wenwu Zhu · Steven Hoi · Tong Zhang -
2017 Talk: Projection-free Distributed Online Learning in Networks »
Wenpeng Zhang · Peilin Zhao · Wenwu Zhu · Steven Hoi · Tong Zhang -
2017 Poster: Efficient Distributed Learning with Sparsity »
Jialei Wang · Mladen Kolar · Nati Srebro · Tong Zhang -
2017 Talk: Efficient Distributed Learning with Sparsity »
Jialei Wang · Mladen Kolar · Nati Srebro · Tong Zhang