Saliency-Aware Model Merging
Abstract
Model merging aims to consolidate multiple task-specific models fine-tuned on different datasets into a unified architecture with cross-domain proficiency. Current data-free model merging methods often struggle to scale because they rely on simple parameter-level heuristics that ignore inter-layer dependencies and the non-uniform distribution of expertise. To address this, we propose SA-Merging, a new basis for model merging that estimates the saliency of each parameter through a differentiable inter-layer interaction function. By leveraging the gradients of this function with respect to the merged parameters, we derive a saliency score that identifies parameters critical to preserving end-to-end information flow. Building on this signal, SA-Merging recursively eliminates non-informative parameters in a purely data-free manner. Notably, our method is inherently modular and integrates seamlessly with existing sign-based and sparsification-based interference-mitigation strategies. Furthermore, we extend SA-Merging with a rank-wise saliency decomposition for LoRA, enabling the integration of low-rank adapters without compromising their structural integrity. Extensive experiments on vision and language tasks demonstrate the effectiveness of our saliency-based approach, further narrowing the gap between data-free and test-time adaptation methods.
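To make the core idea concrete, the following is a minimal, hedged sketch of gradient-based saliency scoring and data-free pruning for a merged two-layer linear model. The interaction function `f = ||W2 @ W1||_F^2` is a hypothetical stand-in for the paper's (unspecified) differentiable inter-layer interaction function, chosen only because it couples the two layers end to end; the first-order saliency `|gradient * parameter|` and the per-layer pruning rule are likewise illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Hypothetical merged two-layer model: h = W2 @ W1 @ x.
# The merged weights would come from, e.g., averaging task vectors;
# here they are random placeholders.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # merged layer-1 weights
W2 = rng.normal(size=(2, 4))   # merged layer-2 weights

# Assumed inter-layer interaction function: squared Frobenius norm
# of the end-to-end product, a simple differentiable proxy for
# information flow through both layers.
P = W2 @ W1
f = np.sum(P ** 2)

# Analytic gradients of f with respect to each weight matrix:
#   df/dW1 = 2 * W2^T @ (W2 @ W1),  df/dW2 = 2 * (W2 @ W1) @ W1^T
gW1 = 2 * W2.T @ P
gW2 = 2 * P @ W1.T

# First-order saliency: |gradient * parameter|, i.e. the magnitude
# of the predicted change in f if that parameter were zeroed.
s1 = np.abs(gW1 * W1)
s2 = np.abs(gW2 * W2)

def prune(W, saliency, keep=0.5):
    """Zero out the lowest-saliency entries, keeping a fraction `keep`."""
    thresh = np.quantile(saliency, 1.0 - keep)
    return np.where(saliency >= thresh, W, 0.0)

# Data-free elimination of non-informative parameters, layer by layer.
W1_pruned = prune(W1, s1)
W2_pruned = prune(W2, s2)
```

In the actual method this scoring would be applied recursively across all layers of the merged network rather than once per layer as shown here.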