Timezone: »
Poster
A Kernel-Based View of Language Model Fine-Tuning
Sadhika Malladi · Alexander Wettig · Dingli Yu · Danqi Chen · Sanjeev Arora
It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings. There is minimal theoretical understanding of empirical success, e.g., why fine-tuning a model with $10^8$ or more parameters on a couple dozen training points does not result in overfitting. We investigate whether the Neural Tangent Kernel (NTK)---which originated as a model to study the gradient descent dynamics of infinitely wide networks with suitable random initialization---describes fine-tuning of pre-trained LMs. This study was inspired by the decent performance of NTK for computer vision tasks (Wei et al., 2022). We extend the NTK formalism to Adam and use Tensor Programs (Yang, 2020) to characterize conditions under which the NTK lens may describe fine-tuning updates to pre-trained language models. Extensive experiments on 14 NLP tasks validate our theory and show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning. Finally, we use this kernel view to propose an explanation for the success of parameter-efficient subspace-based fine-tuning methods.
Author Information
Sadhika Malladi (Princeton University)
Alexander Wettig (Princeton University)
Dingli Yu (Princeton University)
Danqi Chen (Princeton University)
Sanjeev Arora (Princeton University)
More from the Same Authors
-
2023 : Scaling In-Context Demonstrations with Structured Attention »
Tianle Cai · Kaixuan Huang · Jason Lee · Mengdi Wang · Danqi Chen -
2023 : Fine-Tuning Language Models with Just Forward Passes »
Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Jason Lee · Danqi Chen · Sanjeev Arora -
2023 : 🎤 Fine-Tuning Language Models with Just Forward Passes »
Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora -
2023 : High-dimensional Optimization in the Age of ChatGPT, Sanjeev Arora »
Sanjeev Arora -
2023 Poster: Task-Specific Skill Localization in Fine-tuned Language Models »
Abhishek Panigrahi · Nikunj Saunshi · Haoyu Zhao · Sanjeev Arora -
2022 : On the SDEs and Scaling Rules for Adaptive Gradient Algorithms »
Sadhika Malladi · Kaifeng Lyu · Abhishek Panigrahi · Sanjeev Arora -
2022 : Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence toMirror Descent »
Zhiyuan Li · Tianhao Wang · Jason Lee · Sanjeev Arora -
2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy -
2022 Poster: Understanding Gradient Descent on the Edge of Stability in Deep Learning »
Sanjeev Arora · Zhiyuan Li · Abhishek Panigrahi -
2022 Spotlight: Understanding Gradient Descent on the Edge of Stability in Deep Learning »
Sanjeev Arora · Zhiyuan Li · Abhishek Panigrahi -
2020 Poster: Provable Representation Learning for Imitation Learning via Bi-level Optimization »
Sanjeev Arora · Simon Du · Sham Kakade · Yuping Luo · Nikunj Umesh Saunshi -
2020 Poster: InstaHide: Instance-hiding Schemes for Private Distributed Learning »
Yangsibo Huang · Zhao Song · Kai Li · Sanjeev Arora -
2020 Poster: A Sample Complexity Separation between Non-Convex and Convex Meta-Learning »
Nikunj Umesh Saunshi · Yi Zhang · Mikhail Khodak · Sanjeev Arora -
2019 : Is Optimization a sufficient language to understand Deep Learning? »
Sanjeev Arora -
2019 Poster: A Theoretical Analysis of Contrastive Unsupervised Representation Learning »
Nikunj Umesh Saunshi · Orestis Plevrakis · Sanjeev Arora · Mikhail Khodak · Hrishikesh Khandeparkar -
2019 Oral: A Theoretical Analysis of Contrastive Unsupervised Representation Learning »
Nikunj Umesh Saunshi · Orestis Plevrakis · Sanjeev Arora · Mikhail Khodak · Hrishikesh Khandeparkar -
2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2018 Poster: Stronger Generalization Bounds for Deep Nets via a Compression Approach »
Sanjeev Arora · Rong Ge · Behnam Neyshabur · Yi Zhang -
2018 Oral: Stronger Generalization Bounds for Deep Nets via a Compression Approach »
Sanjeev Arora · Rong Ge · Behnam Neyshabur · Yi Zhang -
2018 Poster: On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization »
Sanjeev Arora · Nadav Cohen · Elad Hazan -
2018 Oral: On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization »
Sanjeev Arora · Nadav Cohen · Elad Hazan -
2018 Tutorial: Toward Theoretical Understanding of Deep Learning »
Sanjeev Arora -
2017 Poster: Generalization and Equilibrium in Generative Adversarial Nets (GANs) »
Sanjeev Arora · Rong Ge · Yingyu Liang · Tengyu Ma · Yi Zhang -
2017 Talk: Generalization and Equilibrium in Generative Adversarial Nets (GANs) »
Sanjeev Arora · Rong Ge · Yingyu Liang · Tengyu Ma · Yi Zhang