How Does Adaptive Optimization Impact Local Neural Network Geometry?
Kaiqi Jiang · Dhruv Malik · Yuanzhi Li
Adaptive optimization methods are well known to achieve superior convergence relative to vanilla gradient methods. The traditional viewpoint explains this improved performance by arguing that adaptive algorithms mimic the behavior of a second-order method by adapting to the global geometry of the loss function. We argue that in the context of neural network optimization, this traditional viewpoint is insufficient. Instead, we advocate for a local trajectory analysis. For iterate trajectories produced by running a generic optimization algorithm OPT, we introduce $R^{\text{OPT}}_{\text{med}}$, a statistic that is analogous to the condition number of the loss Hessian evaluated at the iterates. Through extensive experiments on language models, we show that adaptive methods such as Adam bias the trajectories towards regions where $R^{\text{Adam}}_{\text{med}}$ is small, where one might expect faster optimization. By contrast, SGD (with momentum) biases the trajectories towards regions where $R^{\text{SGD}}_{\text{med}}$ is comparatively large. We complement these empirical observations with a theoretical result that provably demonstrates this phenomenon in the simplified setting of a two-layer linear network.
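The abstract describes $R^{\text{OPT}}_{\text{med}}$ only as a statistic analogous to the condition number of the loss Hessian evaluated at the iterates. As a minimal sketch of how such a trajectory statistic could be measured, assuming it is a median over recorded iterates of a largest-to-$k$-th-largest Hessian eigenvalue ratio (this instantiation, the toy two-layer linear network, and the helper names loss_fn, cond_like_ratio, and r_med are illustrative assumptions, not the authors' definition or code), one could compare Adam and SGD-with-momentum trajectories as follows:

```python
# Sketch only: an assumed instantiation of a condition-number-like trajectory
# statistic, not the paper's actual definition or implementation.
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
n, d_in, d_hid, d_out = 64, 6, 4, 1
X, y = torch.randn(n, d_in), torch.randn(n, d_out)
num_params = d_hid * d_in + d_out * d_hid  # flattened (W1, W2)

def loss_fn(theta):
    # MSE loss of a two-layer linear network, viewed as a function of the flat parameters.
    W1 = theta[: d_hid * d_in].reshape(d_hid, d_in)
    W2 = theta[d_hid * d_in:].reshape(d_out, d_hid)
    return ((X @ W1.T @ W2.T - y) ** 2).mean()

def cond_like_ratio(theta, k=5):
    # One plausible condition-number-like quantity: ratio of the largest Hessian
    # eigenvalue to the k-th largest, computed at the current iterate.
    H = hessian(loss_fn, theta.detach())
    eigs = torch.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return (eigs[-1] / eigs[-1 - k].abs().clamp_min(1e-12)).item()  # guard tiny/negative values

def r_med(opt_name, steps=200, record_every=20):
    # Run an optimizer on the toy loss, record the ratio along the trajectory,
    # and return its median (an assumed reading of the "med" subscript).
    theta = (0.1 * torch.randn(num_params)).requires_grad_(True)
    opt = (torch.optim.Adam([theta], lr=1e-2) if opt_name == "adam"
           else torch.optim.SGD([theta], lr=1e-2, momentum=0.9))
    ratios = []
    for t in range(steps):
        opt.zero_grad()
        loss_fn(theta).backward()
        opt.step()
        if t % record_every == 0:
            ratios.append(cond_like_ratio(theta))
    return torch.tensor(ratios).median().item()

print("median Hessian ratio, Adam trajectory:        ", r_med("adam"))
print("median Hessian ratio, SGD+momentum trajectory:", r_med("sgd"))
```

Under this reading, the paper's claim would correspond to the Adam trajectory reporting a smaller median ratio than the SGD-with-momentum trajectory; the toy run above is only meant to show the shape of such a measurement, not to reproduce the paper's language-model experiments.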
Author Information
Kaiqi Jiang (Princeton University)
Dhruv Malik (Carnegie Mellon University)
Yuanzhi Li (Carnegie Mellon University)
More from the Same Authors
- 2021 : When Is Generalizable Reinforcement Learning Tractable?
  Dhruv Malik · Yuanzhi Li · Pradeep Ravikumar
- 2021 : Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
  Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
- 2021 : Towards understanding how momentum improves generalization in deep learning
  Samy Jelassi · Yuanzhi Li
- 2023 : Plan, Eliminate, and Track --- Language Models are Good Teachers for Embodied Agents
  Yue Wu · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Yuanzhi Li · Tom Mitchell · Shrimai Prabhumoye
- 2023 : SPRING: Studying Papers and Reasoning to play Games
  Yue Wu · Shrimai Prabhumoye · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Tom Mitchell · Yuanzhi Li
- 2023 : How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
  Yuchen Li · Yuanzhi Li · Andrej Risteski
- 2023 Poster: Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
  Dhruv Malik · Conor Igoe · Yuanzhi Li · Aarti Singh
- 2023 Poster: How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
  Yuchen Li · Yuanzhi Li · Andrej Risteski
- 2023 Poster: The Benefits of Mixup for Feature Learning
  Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu
- 2022 Poster: Towards understanding how momentum improves generalization in deep learning
  Samy Jelassi · Yuanzhi Li
- 2022 Spotlight: Towards understanding how momentum improves generalization in deep learning
  Samy Jelassi · Yuanzhi Li
- 2021 Poster: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
  Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
- 2021 Spotlight: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
  Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
- 2021 Poster: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
  Zixin Wen · Yuanzhi Li
- 2021 Spotlight: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
  Zixin Wen · Yuanzhi Li