Persistent Backdoor Attacks in Class-Incremental Learning via Structural Invariant Anchoring
Abstract
Continual learning (CL) models undergo continual parameter updates, which poses a significant challenge to backdoor persistence. In this paper, we reveal that the state-of-the-art attack relies on an implicit assumption that task-critical neurons remain stable across tasks; however, this assumption does not hold in class-incremental learning (CIL). This exposes a critical research gap: backdoor persistence in CIL remains an open problem. Inspired by the observation that model function remains stable despite neuron instability, we find that CIL models preserve task knowledge in shallow, structurally invariant subspaces. Motivated by these findings, we propose PBTO, the first persistent and targeted backdoor attack for CIL. PBTO first trains a surrogate model on proxy tasks to obtain a parameter trajectory. It then optimizes a universal trigger that induces misclassification to the target label across all model states along this trajectory, and anchors the trigger's embedding in the shallow layers. Experimental results show that PBTO remains effective even after the model learns multiple subsequent tasks, whereas the attack success rate of existing methods degrades to below 10\%.
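Since the abstract only sketches the attack pipeline, the following is a minimal, hypothetical PyTorch sketch of the cross-state trigger optimization it describes. All names are assumptions rather than the paper's actual implementation: `surrogate_states` (the saved checkpoints along the surrogate's parameter trajectory), `shallow_features` (a hook returning shallow-layer activations), and `lambda_anchor` (the weight of the anchoring term).

```python
# Hypothetical sketch of PBTO's trigger optimization (all names assumed).
import torch
import torch.nn.functional as F

def optimize_trigger(model, surrogate_states, images, target_label,
                     steps=500, lr=0.01, lambda_anchor=0.1):
    """Optimize a universal trigger that (i) flips predictions to
    `target_label` under every checkpoint along the surrogate parameter
    trajectory and (ii) anchors the trigger's shallow-layer embedding."""
    # Only the trigger is optimized; freeze the surrogate's parameters.
    for p in model.parameters():
        p.requires_grad_(False)
    model.eval()

    trigger = torch.zeros_like(images[0:1], requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    target = torch.full((images.size(0),), target_label,
                        device=images.device, dtype=torch.long)

    for _ in range(steps):
        # Shallow embedding under the earliest checkpoint serves as the
        # fixed anchor for this step (assumed anchoring scheme).
        with torch.no_grad():
            model.load_state_dict(surrogate_states[0])
            poisoned = torch.clamp(images + trigger, 0, 1)
            ref_embed = model.shallow_features(poisoned)

        opt.zero_grad()
        for state in surrogate_states:  # every model state on the trajectory
            model.load_state_dict(state)
            poisoned = torch.clamp(images + trigger, 0, 1)
            # Targeted misclassification loss under this checkpoint.
            loss = F.cross_entropy(model(poisoned), target)
            # Keep the trigger's shallow embedding close to the anchor, so
            # it lives in the structurally invariant subspace and survives
            # later parameter updates.
            loss = loss + lambda_anchor * F.mse_loss(
                model.shallow_features(poisoned), ref_embed)
            # Backward per checkpoint, accumulating gradients into `trigger`.
            loss.backward()
        opt.step()

    return trigger.detach()
```

Calling `loss.backward()` once per checkpoint, rather than summing over all states first, avoids backpropagating through parameters that `load_state_dict` has since overwritten in place.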