Predicting Task Forgetting in Large Language Models
Anat Kleiman · Jonathan Frankle · Sham Kakade · Mansheej Paul
Event URL: https://openreview.net/forum?id=0BMg0OgNTP
In this paper, we offer a comprehensive evaluation of forgetting in large language models (LLMs) as a pretrained model is finetuned sequentially on a series of tasks. We empirically track the degradation of performance across diverse tasks and find that the validation perplexity can be predicted using a linear function, regardless of the specific task, model architecture, or task order. This finding sheds light on the dynamics of knowledge acquisition and retention, and has practical implications for managing and mitigating task forgetting in LLM-based systems.
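As a rough illustration of what "predicted using a linear function" could look like in practice, the sketch below fits an ordinary least-squares line to validation perplexity and extrapolates it. The abstract does not state which quantity the linear function takes as input, so using the number of finetuning steps as the predictor is an assumption made here, and the numbers are made-up placeholders rather than results from the paper. The helper names `fit_line` and `predict` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Placeholder data, NOT from the paper: validation perplexity on an
# earlier task, measured while finetuning on a later task.
# The choice of "finetuning steps" as the predictor is an assumption.
steps = [0, 100, 200, 300, 400]
perplexity = [10.0, 10.5, 11.0, 11.5, 12.0]

slope, intercept = fit_line(steps, perplexity)
forecast = predict(slope, intercept, 500)
print(f"slope={slope:.4f} intercept={intercept:.2f} forecast@500={forecast:.2f}")
```

If a linear relationship of this kind holds, a few early measurements would be enough to estimate how much an earlier task will have degraded by the end of a later finetuning run.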
Author Information
Anat Kleiman (Harvard University)
Jonathan Frankle (MosaicML / Databricks)
Sham Kakade (Harvard University and Amazon Scholar)
Mansheej Paul (Stanford University)
More from the Same Authors
- 2022: The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
  Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
- 2022: Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training
  Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
- 2023: RapidBERT: How to Train BERT with a Lunch Money Budget
  Alexander Trott · Jacob Portes · Sam Havens · Daniel King · Abhinav Venigalla · Moin Nadeem · Nikhil Sardana · Daya Khudia · Jonathan Frankle
- 2023: Soft prompting might be a bug, not a feature
  Luke Bailey · Gustaf Ahdritz · Anat Kleiman · Siddharth Swaroop · Finale Doshi-Velez · Weiwei Pan
- 2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
  Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade
- 2023 Poster: On Provable Copyright Protection for Generative Models
  Nikhil Vyas · Sham Kakade · Boaz Barak
- 2023 Poster: Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games
  Dylan Foster · Noah Golowich · Sham Kakade
- 2022 Poster: Sparsity in Partially Controllable Linear Systems
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
  Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
- 2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases
  Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
- 2022 Spotlight: Sparsity in Partially Controllable Linear Systems
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
  Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
- 2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases
  Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
- 2022 Poster: Inductive Biases and Variable Creation in Self-Attention Mechanisms
  Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
- 2022 Spotlight: Inductive Biases and Variable Creation in Self-Attention Mechanisms
  Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang