Adversarial Latent Embedding Repair for LLM Continual Learning
Xilin Xia ⋅ Tong Xialiang ⋅ Jie Wang ⋅ Chi Ma ⋅ Shengxue Li ⋅ Yinqi Bai ⋅ Yuhang Jiang ⋅ Xing Li ⋅ Jianye Hao ⋅ Mingxuan Yuan ⋅ Feng Wu
Abstract
Continual learning for LLMs aims to acquire new skills without catastrophic forgetting of established prior knowledge. However, domain-specific fine-tuning still triggers severe, long-tailed forgetting even under narrow updates, particularly when the pre-training data is inaccessible. To tackle this challenge, we propose **ALER**, a data-free continual learning framework that adversarially searches for a small set of latent prompt embeddings that maximize the logit divergence from a frozen reference model, proactively exposing high-risk forgetting modes at each step. ALER then performs online distillation from the frozen reference on the discovered embeddings, retaining prior behaviors while preserving target-domain adaptation. We provide theoretical guarantees on the efficiency of this targeted repair, and extensive experiments demonstrate consistent improvements in the retention–adaptation frontier over representative baselines across $2$ domain-specific fine-tuning datasets and $6$ general-purpose benchmarks, suggesting a more proactive approach to LLM continual learning.
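The two-phase loop sketched in the abstract can be illustrated in miniature. The following is a minimal, self-contained sketch, not the paper's implementation: the toy linear "models", optimizer settings, number of embeddings, and the use of KL divergence as the logit-divergence measure are all assumptions made for illustration.

```python
# Illustrative sketch of an adversarial-search-then-distill loop.
# Toy linear layers stand in for the fine-tuned LLM and its frozen
# reference; all hyperparameters here are assumptions, not ALER's.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, vocab = 16, 32

reference = torch.nn.Linear(d_model, vocab)   # frozen reference model
student = torch.nn.Linear(d_model, vocab)     # model being fine-tuned
student.load_state_dict(reference.state_dict())
for p in reference.parameters():
    p.requires_grad_(False)

# Simulate drift from domain fine-tuning so a divergence exists to expose.
with torch.no_grad():
    student.weight.add_(0.05 * torch.randn_like(student.weight))

# Phase 1: adversarially search latent prompt embeddings z that
# maximize the logit divergence (KL) between student and reference.
z = torch.randn(4, d_model, requires_grad=True)
opt_z = torch.optim.Adam([z], lr=0.1)
for _ in range(20):
    kl = F.kl_div(F.log_softmax(student(z), -1),
                  F.log_softmax(reference(z), -1),
                  log_target=True, reduction="batchmean")
    opt_z.zero_grad()
    (-kl).backward()  # gradient ascent on the divergence
    opt_z.step()

# Phase 2: distill the frozen reference's behavior on the discovered
# embeddings back into the student (targeted retention repair).
z_adv = z.detach()
opt_s = torch.optim.Adam(student.parameters(), lr=0.05)
for _ in range(50):
    loss = F.kl_div(F.log_softmax(student(z_adv), -1),
                    F.log_softmax(reference(z_adv), -1),
                    log_target=True, reduction="batchmean")
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
```

In a full system the distillation loss would be combined with the target-domain fine-tuning objective, so adaptation and repair proceed together rather than in isolation as above.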