

Oral in Affinity Workshop: LatinX in AI (LXAI) Research Workshop

Non-Uniform Parameter-Wise Model Merging

Albert Manuel Orozco Camacho · Stefan Horoi · Guy Wolf · Eugene Belilovsky

Keywords: [ multi-model merging ] [ model fusion ] [ linear mode connectivity ]


Abstract:

Combining multiple machine learning models has been a long-standing technique to enhance performance, particularly in distributed settings. Traditional approaches such as model ensembles work well but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have since gained popularity. However, merging independently initialized models that do not share part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach that merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings and extend it to the merging of many models, outperforming existing state-of-the-art techniques. Our findings offer a promising avenue for improving model fusion in distributed learning settings.
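To make the idea of learning per-parameter merging contributions concrete, below is a minimal, hypothetical PyTorch sketch; it is not the authors' implementation. It assumes a two-model case where each parameter entry gets a sigmoid-gated interpolation coefficient, and those coefficients are trained by gradient descent on a small held-out merging set. The names `merge_two_models`, `merge_loader`, and `loss_fn` are illustrative assumptions.

```python
import copy
import torch
from torch.func import functional_call


def merge_two_models(model_a, model_b, merge_loader, loss_fn, steps=100, lr=1e-2):
    """Hypothetical sketch: learn one interpolation coefficient per parameter entry.

    The merged weights are sigmoid(logit) * theta_a + (1 - sigmoid(logit)) * theta_b,
    with the logits optimized by gradient descent on data from `merge_loader`.
    """
    # Frozen endpoint parameters of the two models being merged.
    theta_a = {k: p.detach() for k, p in model_a.named_parameters()}
    theta_b = {k: p.detach() for k, p in model_b.named_parameters()}

    # One learnable logit per parameter entry; logit 0 is a uniform 0.5/0.5 average.
    logits = {k: torch.zeros_like(p, requires_grad=True) for k, p in theta_a.items()}
    optimizer = torch.optim.Adam(logits.values(), lr=lr)

    # Copy of model_a provides the architecture (and buffers) for the merged model.
    base = copy.deepcopy(model_a)

    def mixed_params():
        # Build the merged parameter dictionary from the current coefficients.
        return {
            k: torch.sigmoid(logits[k]) * theta_a[k]
               + (1.0 - torch.sigmoid(logits[k])) * theta_b[k]
            for k in theta_a
        }

    for _, (x, y) in zip(range(steps), merge_loader):
        out = functional_call(base, mixed_params(), (x,))
        loss = loss_fn(out, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Materialize the learned merge into a standalone model.
    merged_state = {k: v.detach() for k, v in mixed_params().items()}
    base.load_state_dict(merged_state, strict=False)
    return base
```

Under these assumptions, the learned coefficients make the merge non-uniform: each parameter can lean toward whichever endpoint model serves the merging objective, rather than using a single global averaging weight.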
