Don't Forget Why You Started: Tackling Dual Forgetting in Vision-Language Continual Learning
Abstract
Continual learning for Vision-Language Models (VLMs) aspires to endow foundation models with new expertise without compromising their universal zero-shot capabilities. This pursuit, however, faces a critical "dual-forgetting" challenge: the catastrophic forgetting of newly acquired classes (Incremental Knowledge Forgetting, IKF) and the insidious erosion of foundational zero-shot capabilities (Pre-trained Knowledge Forgetting, PKF). Existing evaluations often ignore PKF or assess it via confounded protocols in which positive transfer on semantically similar domains creates an illusion of retention, masking severe foundational degradation. To address this, we propose the Dual-Forgetting-Aware Class-Incremental Learning (DFA-CIL) framework and the Similarity-Calibrated Retention (SCR) metric. Unlike standard averaging, SCR uses the frozen pre-trained feature space to weight performance inversely by semantic similarity, suppressing confounding gains and thereby stress-testing foundational stability. Building on this, we propose DFA-MoE, a functionally heterogeneous Parameter-Efficient Fine-Tuning (PEFT) method. DFA-MoE architecturally decouples optimization objectives by assigning a momentum-enhanced contrastive expert to feature alignment and separate plasticity experts that combine classification with auxiliary contrastive learning, adapting to new tasks while retaining historical knowledge. Extensive experiments demonstrate that our framework uncovers the hidden fragility of existing methods and achieves a state-of-the-art balance between preserving incremental and pre-trained knowledge.
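To make the inverse-similarity weighting behind SCR concrete, below is a minimal illustrative sketch, not the paper's exact formulation: it assumes each evaluation domain's zero-shot accuracy is re-weighted by one minus the cosine similarity between that domain and the incremental training data, both summarized as mean embeddings in the frozen pre-trained feature space. The function name scr, the mean-embedding summaries, and the clipping and normalization choices are assumptions made here for illustration.

```python
import numpy as np

def scr(domain_accs, domain_feats, task_feats, eps=1e-8):
    """Similarity-Calibrated Retention (illustrative sketch, assumed form).

    domain_accs : (D,)   zero-shot accuracy on each held-out evaluation domain
                         after continual fine-tuning.
    domain_feats: (D, F) mean frozen pre-trained embedding of each evaluation domain.
    task_feats  : (F,)   mean frozen pre-trained embedding of the incremental task data.
    """
    # Cosine similarity between every evaluation domain and the incremental data,
    # computed in the frozen (pre-trained) feature space.
    d = domain_feats / (np.linalg.norm(domain_feats, axis=1, keepdims=True) + eps)
    t = task_feats / (np.linalg.norm(task_feats) + eps)
    sim = d @ t  # shape (D,), values in [-1, 1]

    # Inverse-similarity weights: domains far from the incremental data count more,
    # so positive transfer onto semantically near domains cannot mask
    # degradation of foundational (pre-trained) knowledge.
    w = 1.0 - np.clip(sim, 0.0, 1.0)
    w = w / (w.sum() + eps)
    return float(np.sum(w * np.asarray(domain_accs)))
```

Under this sketch, a method that only improves on domains close to the fine-tuning data gains little SCR credit, whereas drops on dissimilar domains are penalized heavily, which is the stress-testing behavior the abstract describes.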