Oral in Affinity Workshop: LatinX in AI (LXAI) Research Workshop
Non-Uniform Parameter-Wise Model Merging
Albert Manuel Orozco Camacho · Stefan Horoi · Guy Wolf · Eugene Belilovsky
Keywords: [ multi-model merging ] [ model fusion ] [ linear mode connectivity ]
Combining multiple machine learning models has been a long-standing technique for enhancing performance, particularly in distributed settings. Traditional approaches such as model ensembles work well but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have since gained popularity. However, merging models that were initialized differently and do not share part of their training trajectory can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach that merges models by learning the contribution of each parameter to the final model via gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings and extend it to the merging of many models, outperforming existing state-of-the-art techniques. Our findings offer a promising avenue for improving model fusion in distributed learning settings.
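The abstract describes learning each parameter's contribution to the merged model via gradient-based optimization. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' implementation: it assumes one coefficient per parameter element for each of the K base models, normalized with a softmax across models and trained with a cross-entropy loss on a small held-out loader. The function names (`merge_state_dicts`, `learn_merging_coefficients`) and hyperparameters are placeholders.

```python
# Minimal, hypothetical sketch of parameter-wise model merging with learned
# coefficients. Assumptions (not from the paper): coefficients are per element,
# normalized with a softmax across the K models, and trained with cross-entropy
# on a small held-out data loader.
from itertools import cycle

import torch
import torch.nn.functional as F


def merge_state_dicts(state_dicts, logits):
    """Softmax-weighted, element-wise combination of K state dicts."""
    merged = {}
    for name, ref in state_dicts[0].items():
        if name not in logits:          # non-float buffers: keep the first model's copy
            merged[name] = ref
            continue
        stacked = torch.stack([sd[name] for sd in state_dicts])  # (K, ...)
        weights = torch.softmax(logits[name], dim=0)              # sums to 1 over models
        merged[name] = (weights * stacked).sum(dim=0)
    return merged


def learn_merging_coefficients(model, state_dicts, loader, steps=100, lr=1e-2):
    """Learn one merging coefficient per parameter element by gradient descent."""
    logits = {
        name: torch.zeros(len(state_dicts), *t.shape, requires_grad=True)
        for name, t in state_dicts[0].items()
        if t.is_floating_point()
    }
    opt = torch.optim.Adam(logits.values(), lr=lr)
    batches = cycle(loader)
    for _ in range(steps):
        x, y = next(batches)
        merged = merge_state_dicts(state_dicts, logits)
        # Run the base architecture with the merged weights without copying
        # them into the module, so gradients flow back to the logits.
        out = torch.func.functional_call(model, merged, (x,))
        loss = F.cross_entropy(out, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return merge_state_dicts(state_dicts, {k: v.detach() for k, v in logits.items()})
```

Per-element coefficients are only one possible granularity; sharing a single coefficient per tensor or per layer would reduce the number of learned values. That trade-off is a design choice of this sketch, not a detail taken from the paper.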