Revisiting the Role of Pretrained Weights in Model Merging: On Near-Optimality within the Core Subspace
Abstract
Model merging offers an efficient solution for integrating task-specific knowledge from multiple fine-tuned models. Most existing approaches focus on manipulating the difference vectors between fine-tuned and pretrained weights, often overlooking the generalization capability inherent in the pretrained parameters themselves. In this work, we revisit the role of pretrained weights in model merging and investigate their contribution from a subspace perspective. We find that the components of the pretrained weights residing in the core subspace, defined by their dominant singular vectors, are essential for maintaining generalization across diverse tasks. Specifically, we present empirical evidence that, within this core subspace, pretrained weights are nearly first-order stationary and exhibit predominantly non-negative curvature with respect to multi-task loss landscapes, indicating near-optimality. These findings suggest that task-specific adaptations should be injected primarily into the orthogonal complement of the core subspace, thereby preserving the generalization properties of the pretrained model. Extensive experiments on vision and vision-language tasks show that this subspace-aware strategy consistently improves over state-of-the-art training-free merging methods, including Task Arithmetic, LOT Merging, ISO, and TSV.
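To make the subspace-aware idea concrete, the sketch below illustrates one plausible instantiation under stated assumptions: the core subspace is taken to be the span of the top-k left singular vectors of a pretrained weight matrix, and task vectors (fine-tuned minus pretrained weights, as in Task Arithmetic) are projected onto its orthogonal complement before being summed and added back. The function name, the choice of left singular vectors, the rank k, the simple summation, and the scaling coefficient alpha are illustrative assumptions, not the paper's exact method.

```python
import torch

def merge_outside_core_subspace(W0, task_vectors, k=16, alpha=1.0):
    """Hypothetical sketch: merge task vectors in the orthogonal complement
    of the core subspace spanned by the top-k left singular vectors of W0."""
    # Core subspace of the pretrained weight (assumption: left singular vectors).
    U, S, Vh = torch.linalg.svd(W0, full_matrices=False)
    U_k = U[:, :k]  # (d_out, k) dominant directions
    # Projector onto the orthogonal complement of the core subspace.
    P_perp = torch.eye(W0.shape[0], dtype=W0.dtype, device=W0.device) - U_k @ U_k.T
    # Inject task-specific updates only outside the core subspace,
    # leaving the near-optimal pretrained components untouched.
    merged_update = sum(P_perp @ tau for tau in task_vectors)
    return W0 + alpha * merged_update

# Usage: W0 is a (d_out, d_in) pretrained layer weight; each element of
# task_vectors is a same-shaped difference W_i - W0 from a fine-tuned model.
```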