Merge to Remember: Sharpness-Aware Isotropic Merging for Continual Learning
Abstract
Continual learning with large pre-trained models offers significant potential for cross-task knowledge accumulation, but it faces critical challenges such as catastrophic forgetting and parameter interference, especially when historical data is unavailable. Existing approaches typically rely on sequential fine-tuning or model merging, yet they often overlook loss-landscape sharpness and dominant singular-value directions, which leads to subspace misalignment and severe forgetting. In this paper, we propose the Sharpness-Aware Isotropic Merging (SAIM) framework, which introduces targeted optimizations in both the fine-tuning and merging stages to address these issues. SAIM consists of two synergistic modules: (1) a Sharpness-Aware Block Coordinate Descent (SA-BCD) optimizer that guides the model toward flatter minima and selectively updates only the most task-sensitive parameters, mitigating parameter interference and enhancing robustness; and (2) an adaptive isotropic merging algorithm that dynamically balances the singular-value spectrum across tasks, preventing any single task direction from dominating, maintaining a balanced knowledge representation, and improving subspace alignment. Extensive experiments on vision and language benchmarks show that SAIM achieves 5–10% higher accuracy than existing methods and remains robust as the number of tasks increases. Ablation studies further validate that SA-BCD fine-tuning promotes flat minima and reduces parameter interference, and that it is compatible with a variety of merging approaches.
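To make the fine-tuning stage concrete, the following is a minimal PyTorch sketch of one sharpness-aware block-coordinate step in the spirit of SA-BCD. It is an illustration under stated assumptions rather than the paper's exact procedure: the names `sa_bcd_step` and `loss_fn(model, batch)` are hypothetical, ranking blocks by gradient norm is an assumed proxy for task sensitivity, and the hyperparameters are placeholders.

```python
import torch

def sa_bcd_step(model, loss_fn, batch, rho=0.05, lr=1e-3, top_k=2):
    # Trainable parameter blocks; assumes every block receives a gradient.
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Gradients at the current point.
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, params)
    norms = [g.norm().item() for g in grads]

    # 2) Block coordinate selection: keep only the most task-sensitive
    #    blocks, ranked here by gradient norm (an illustrative proxy).
    selected = sorted(range(len(params)), key=lambda i: norms[i], reverse=True)[:top_k]

    # 3) SAM-style ascent: perturb the selected blocks toward the
    #    worst-case point inside a rho-ball around the current weights.
    scale = rho / (sum(n ** 2 for n in norms) ** 0.5 + 1e-12)
    with torch.no_grad():
        for i in selected:
            params[i].add_(grads[i], alpha=scale)

    # 4) Descend using the gradient taken at the perturbed point,
    #    after first undoing the perturbation.
    grads_p = torch.autograd.grad(loss_fn(model, batch), params)
    with torch.no_grad():
        for i in selected:
            params[i].sub_(grads[i], alpha=scale)   # restore the weights
            params[i].sub_(grads_p[i], alpha=lr)    # sharpness-aware update
    return loss.item()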
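The merging stage can likewise be sketched as a singular-value rebalancing step applied per 2-D weight matrix. This is a hedged illustration: `isotropic_merge` and the fixed interpolation weight `alpha` are assumptions standing in for the adaptive schedule described above.

```python
import torch

def isotropic_merge(base_weight, task_weights, alpha=0.5):
    # Sum of task vectors (per-task deltas from the shared base weights).
    delta = sum(w - base_weight for w in task_weights)

    # SVD of the combined update.
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)

    # Pull singular values toward their mean so that no single task
    # direction dominates the merged update; alpha=1 gives a fully
    # isotropic spectrum (an assumed rule, not the paper's adaptive one).
    S_iso = (1 - alpha) * S + alpha * S.mean()

    return base_weight + U @ torch.diag(S_iso) @ Vh
```

Smaller values of `alpha` retain more of the original task-specific spectrum; an adaptive variant would choose this balance per task and per layer rather than using one fixed constant.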