This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
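The procedure the abstract describes can be sketched in a few lines: run ordinary gradient descent, but when the gradient is small (a candidate stationary point), add a small random perturbation; if the perturbation fails to decrease the function after a while, the point is likely second-order stationary. The sketch below follows that structure, but all function names and hyperparameter values are illustrative choices, not the paper's tuned constants.

```python
import math
import random

def perturbed_gradient_descent(f, grad, x0, eta=0.01, g_thresh=1e-3,
                               radius=0.05, t_thresh=200, f_thresh=1e-5,
                               max_iter=20000, seed=0):
    """Sketch of perturbed gradient descent (PGD).

    Plain gradient steps, except: when the gradient is small (a candidate
    stationary point) and no perturbation was added recently, jump to a
    random nearby point.  If, t_thresh steps after a perturbation, the
    function value has not dropped by at least f_thresh, the perturbation
    failed to escape, so the pre-perturbation point is declared an
    approximate second-order stationary point and returned.
    Hyperparameter values here are illustrative only.
    """
    rng = random.Random(seed)
    norm = lambda v: math.sqrt(sum(c * c for c in v))
    x = list(map(float, x0))
    t_noise, x_noise, f_noise = -t_thresh - 1, x, math.inf
    for t in range(max_iter):
        if norm(grad(x)) <= g_thresh and t - t_noise > t_thresh:
            x_noise, t_noise, f_noise = x[:], t, f(x)
            d = [rng.gauss(0.0, 1.0) for _ in x]      # random direction
            s = radius / norm(d)
            x = [c + s * dc for c, dc in zip(x, d)]   # jump to a nearby random point
        if t - t_noise == t_thresh and f(x) - f_noise > -f_thresh:
            return x_noise  # perturbation did not help: likely second-order stationary
        g = grad(x)
        x = [c - eta * gc for c, gc in zip(x, g)]     # ordinary gradient step
    return x

# f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at (0, 0) and minima at (+-1, 0).
f = lambda v: (v[0] ** 2 - 1) ** 2 + v[1] ** 2
grad = lambda v: [4 * v[0] ** 3 - 4 * v[0], 2 * v[1]]
x_star = perturbed_gradient_descent(f, grad, [0.0, 0.0])
```

Started exactly at the saddle, plain gradient descent would never move (the gradient there is zero); the perturbation lets the iterate slide off along the negative-curvature direction toward one of the local minima at (±1, 0).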
Author Information
Chi Jin (UC Berkeley)
Rong Ge (Duke University)
Praneeth Netrapalli (Microsoft Research)
Sham Kakade (University of Washington)
Sham Kakade is a Gordon McKay Professor of Computer Science and Statistics at Harvard University and a co-director of the recently announced Kempner Institute. He works on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods for reinforcement learning, Conservative Policy Iteration; the mathematical foundations for the widely used linear bandit and Gaussian process bandit models; tensor and spectral methodologies for provable estimation of latent variable models; and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award (2020), the IBM Pat Goldberg Best Paper Award (2007), and the INFORMS Revenue Management and Pricing Prize (2014). He was program chair for COLT 2011. Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He then completed his Ph.D. in computational neuroscience at the Gatsby Unit at University College London, under the supervision of Peter Dayan. He was a postdoc in the Department of Computer Science at the University of Pennsylvania, where he broadened his studies to include computational game theory and economics under the guidance of Michael Kearns. Sham has been a Principal Research Scientist at Microsoft Research, New England; an associate professor in the Department of Statistics at Wharton, UPenn; and an assistant professor at the Toyota Technological Institute at Chicago.
Michael Jordan (UC Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
2017 Poster: How to Escape Saddle Points Efficiently »
Tue. Aug 8th 08:30 AM -- 12:00 PM Room Gallery #139
More from the Same Authors
2021 : A Short Note on the Relationship of Information Gain and Eluder Dimension »
Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2022 : Representation Learning as Finding Necessary and Sufficient Causes »
Yixin Wang · Michael Jordan -
2022 : Robust Calibration with Multi-domain Temperature Scaling »
Yaodong Yu · Stephen Bates · Yi Ma · Michael Jordan -
2023 Poster: Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup »
Muthu Chidambaram · Xiang Wang · Chenwei Wu · Rong Ge -
2023 Poster: Hiding Data Helps: On the Benefits of Masking for Sparse Coding »
Muthu Chidambaram · Chenwei Wu · Yu Cheng · Rong Ge -
2023 Poster: Online Learning in Stackelberg Games with an Omniscient Follower »
Geng Zhao · Banghua Zhu · Jiantao Jiao · Michael Jordan -
2023 Poster: Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression »
Mo Zhou · Rong Ge -
2023 Poster: Federated Conformal Predictors for Distributed Uncertainty Quantification »
Charles Lu · Yaodong Yu · Sai Karimireddy · Michael Jordan · Ramesh Raskar -
2023 Poster: Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization »
Chris Junchi Li · Angela Yuan · Gauthier Gidel · Quanquan Gu · Michael Jordan -
2023 Poster: Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons »
Banghua Zhu · Michael Jordan · Jiantao Jiao -
2022 : Michael I. Jordan: Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control »
Michael Jordan -
2022 Poster: Online Algorithms with Multiple Predictions »
Keerti Anand · Rong Ge · Amit Kumar · Debmalya Panigrahi -
2022 Poster: No-Regret Learning in Partially-Informed Auctions »
Wenshuo Guo · Michael Jordan · Ellen Vitercik -
2022 Spotlight: No-Regret Learning in Partially-Informed Auctions »
Wenshuo Guo · Michael Jordan · Ellen Vitercik -
2022 Spotlight: Online Algorithms with Multiple Predictions »
Keerti Anand · Rong Ge · Amit Kumar · Debmalya Panigrahi -
2022 Poster: Extracting Latent State Representations with Linear Dynamics from Rich Observations »
Abraham Frandsen · Rong Ge · Holden Lee -
2022 Spotlight: Extracting Latent State Representations with Linear Dynamics from Rich Observations »
Abraham Frandsen · Rong Ge · Holden Lee -
2022 Poster: Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging »
Anastasios Angelopoulos · Amit Pal Kohli · Stephen Bates · Michael Jordan · Jitendra Malik · Thayer Alshaabi · Srigokul Upadhyayula · Yaniv Romano -
2022 Poster: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Spotlight: Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging »
Anastasios Angelopoulos · Amit Pal Kohli · Stephen Bates · Michael Jordan · Jitendra Malik · Thayer Alshaabi · Srigokul Upadhyayula · Yaniv Romano -
2022 Spotlight: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2021 Poster: Provable Meta-Learning of Linear Representations »
Nilesh Tripuraneni · Chi Jin · Michael Jordan -
2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong -
2021 Poster: Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data »
Esther Rolf · Theodora Worledge · Benjamin Recht · Michael Jordan -
2021 Poster: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism »
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez -
2021 Spotlight: Provable Meta-Learning of Linear Representations »
Nilesh Tripuraneni · Chi Jin · Michael Jordan -
2021 Oral: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism »
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez -
2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong -
2021 Spotlight: Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data »
Esther Rolf · Theodora Worledge · Benjamin Recht · Michael Jordan -
2021 Poster: Guarantees for Tuning the Step Size using a Learning-to-Learn Approach »
Xiang Wang · Shuai Yuan · Chenwei Wu · Rong Ge -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: Instabilities of Offline RL with Pre-Trained Neural Representation »
Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade -
2021 Spotlight: Guarantees for Tuning the Step Size using a Learning-to-Learn Approach »
Xiang Wang · Shuai Yuan · Chenwei Wu · Rong Ge -
2021 Spotlight: Instabilities of Offline RL with Pre-Trained Neural Representation »
Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2020 : QA for invited talk 8 Kakade »
Sham Kakade -
2020 : Invited talk 8 Kakade »
Sham Kakade -
2020 : Speaker Panel »
Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy -
2020 : Exploration, Policy Gradient Methods, and the Deadly Triad - Sham Kakade »
Sham Kakade -
2020 Poster: Soft Threshold Weight Reparameterization for Learnable Sparsity »
Aditya Kusupati · Vivek Ramanujan · Raghav Somani · Mitchell Wortsman · Prateek Jain · Sham Kakade · Ali Farhadi -
2020 Poster: On Thompson Sampling with Langevin Algorithms »
Eric Mazumdar · Aldo Pacchiano · Yian Ma · Michael Jordan · Peter Bartlett -
2020 Poster: Accelerated Message Passing for Entropy-Regularized MAP Inference »
Jonathan Lee · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2020 Poster: High-dimensional Robust Mean Estimation via Gradient Descent »
Yu Cheng · Ilias Diakonikolas · Rong Ge · Mahdi Soltanolkotabi -
2020 Poster: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems »
Tianyi Lin · Chi Jin · Michael Jordan -
2020 Poster: Calibration, Entropy Rates, and Memory in Language Models »
Mark Braverman · Xinyi Chen · Sham Kakade · Karthik Narasimhan · Cyril Zhang · Yi Zhang -
2020 Poster: The Implicit and Explicit Regularization Effects of Dropout »
Colin Wei · Sham Kakade · Tengyu Ma -
2020 Poster: Continuous-time Lower Bounds for Gradient-based Algorithms »
Michael Muehlebach · Michael Jordan -
2020 Poster: Provable Representation Learning for Imitation Learning via Bi-level Optimization »
Sanjeev Arora · Simon Du · Sham Kakade · Yuping Luo · Nikunj Umesh Saunshi -
2020 Poster: Stochastic Gradient and Langevin Processes »
Xiang Cheng · Dong Yin · Peter Bartlett · Michael Jordan -
2020 Poster: Learning to Score Behaviors for Guided Policy Optimization »
Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Krzysztof Choromanski · Anna Choromanska · Michael Jordan -
2020 Poster: Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games »
Tianyi Lin · Zhengyuan Zhou · Panayotis Mertikopoulos · Michael Jordan -
2020 Poster: Meta-learning for Mixed Linear Regression »
Weihao Kong · Raghav Somani · Zhao Song · Sham Kakade · Sewoong Oh -
2020 Poster: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? »
Chi Jin · Praneeth Netrapalli · Michael Jordan -
2020 Poster: Efficient Domain Generalization via Common-Specific Low-Rank Decomposition »
Vihari Piratla · Praneeth Netrapalli · Sunita Sarawagi -
2020 Poster: Customizing ML Predictions for Online Algorithms »
Keerti Anand · Rong Ge · Debmalya Panigrahi -
2020 Test Of Time: Test of Time: Gaussian Process Optimization in the Bandit Settings: No Regret and Experimental Design »
Niranjan Srinivas · Andreas Krause · Sham Kakade · Matthias Seeger -
2019 : Keynote by Sham Kakade: Prediction, Learning, and Memory »
Sham Kakade -
2019 Poster: Bridging Theory and Algorithm for Domain Adaptation »
Yuchen Zhang · Tianle Liu · Mingsheng Long · Michael Jordan -
2019 Poster: Online Control with Adversarial Disturbances »
Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh -
2019 Oral: Online Control with Adversarial Disturbances »
Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh -
2019 Oral: Bridging Theory and Algorithm for Domain Adaptation »
Yuchen Zhang · Tianle Liu · Mingsheng Long · Michael Jordan -
2019 Poster: Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers »
Hong Liu · Mingsheng Long · Jianmin Wang · Michael Jordan -
2019 Poster: Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation »
Kaichao You · Ximei Wang · Mingsheng Long · Michael Jordan -
2019 Poster: SGD without Replacement: Sharper Rates for General Smooth Convex Functions »
Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli -
2019 Poster: A Dynamical Systems Perspective on Nesterov Acceleration »
Michael Muehlebach · Michael Jordan -
2019 Poster: Theoretically Principled Trade-off between Robustness and Accuracy »
Hongyang Zhang · Yaodong Yu · Jiantao Jiao · Eric Xing · Laurent El Ghaoui · Michael Jordan -
2019 Poster: Provably Efficient Maximum Entropy Exploration »
Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest -
2019 Oral: Provably Efficient Maximum Entropy Exploration »
Elad Hazan · Sham Kakade · Karan Singh · Abby Van Soest -
2019 Oral: A Dynamical Systems Perspective on Nesterov Acceleration »
Michael Muehlebach · Michael Jordan -
2019 Oral: SGD without Replacement: Sharper Rates for General Smooth Convex Functions »
Dheeraj Nagaraj · Prateek Jain · Praneeth Netrapalli -
2019 Oral: Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation »
Kaichao You · Ximei Wang · Mingsheng Long · Michael Jordan -
2019 Oral: Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers »
Hong Liu · Mingsheng Long · Jianmin Wang · Michael Jordan -
2019 Oral: Theoretically Principled Trade-off between Robustness and Accuracy »
Hongyang Zhang · Yaodong Yu · Jiantao Jiao · Eric Xing · Laurent El Ghaoui · Michael Jordan -
2019 Poster: On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms »
Tianyi Lin · Nhat Ho · Michael Jordan -
2019 Poster: Online Meta-Learning »
Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine -
2019 Poster: Rao-Blackwellized Stochastic Gradients for Discrete Distributions »
Runjing Liu · Jeffrey Regier · Nilesh Tripuraneni · Michael Jordan · Jon McAuliffe -
2019 Poster: Maximum Likelihood Estimation for Learning Populations of Parameters »
Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade -
2019 Oral: Rao-Blackwellized Stochastic Gradients for Discrete Distributions »
Runjing Liu · Jeffrey Regier · Nilesh Tripuraneni · Michael Jordan · Jon McAuliffe -
2019 Oral: Maximum Likelihood Estimation for Learning Populations of Parameters »
Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade -
2019 Oral: Online Meta-Learning »
Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine -
2019 Oral: On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms »
Tianyi Lin · Nhat Ho · Michael Jordan -
2018 Poster: Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator »
Maryam Fazel · Rong Ge · Sham Kakade · Mehran Mesbahi -
2018 Poster: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan -
2018 Poster: RLlib: Abstractions for Distributed Reinforcement Learning »
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica -
2018 Oral: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan -
2018 Oral: Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator »
Maryam Fazel · Rong Ge · Sham Kakade · Mehran Mesbahi -
2018 Oral: RLlib: Abstractions for Distributed Reinforcement Learning »
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica -
2018 Poster: SAFFRON: an Adaptive Algorithm for Online Control of the False Discovery Rate »
Aaditya Ramdas · Tijana Zrnic · Martin Wainwright · Michael Jordan -
2018 Poster: Stronger Generalization Bounds for Deep Nets via a Compression Approach »
Sanjeev Arora · Rong Ge · Behnam Neyshabur · Yi Zhang -
2018 Oral: SAFFRON: an Adaptive Algorithm for Online Control of the False Discovery Rate »
Aaditya Ramdas · Tijana Zrnic · Martin Wainwright · Michael Jordan -
2018 Oral: Stronger Generalization Bounds for Deep Nets via a Compression Approach »
Sanjeev Arora · Rong Ge · Behnam Neyshabur · Yi Zhang -
2018 Poster: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation »
Jianbo Chen · Le Song · Martin Wainwright · Michael Jordan -
2018 Oral: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation »
Jianbo Chen · Le Song · Martin Wainwright · Michael Jordan -
2017 Workshop: Principled Approaches to Deep Learning »
Andrzej Pronobis · Robert Gens · Sham Kakade · Pedro Domingos -
2017 Poster: Deep Transfer Learning with Joint Adaptation Networks »
Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan -
2017 Poster: Breaking Locality Accelerates Block Gauss-Seidel »
Stephen Tu · Shivaram Venkataraman · Ashia Wilson · Alex Gittens · Michael Jordan · Benjamin Recht -
2017 Poster: No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis »
Rong Ge · Chi Jin · Yi Zheng -
2017 Poster: Generalization and Equilibrium in Generative Adversarial Nets (GANs) »
Sanjeev Arora · Rong Ge · Yingyu Liang · Tengyu Ma · Yi Zhang -
2017 Talk: No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis »
Rong Ge · Chi Jin · Yi Zheng -
2017 Talk: Deep Transfer Learning with Joint Adaptation Networks »
Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan -
2017 Talk: Breaking Locality Accelerates Block Gauss-Seidel »
Stephen Tu · Shivaram Venkataraman · Ashia Wilson · Alex Gittens · Michael Jordan · Benjamin Recht -
2017 Talk: Generalization and Equilibrium in Generative Adversarial Nets (GANs) »
Sanjeev Arora · Rong Ge · Yingyu Liang · Tengyu Ma · Yi Zhang