Sketch-Based Low-Rank Model Merging with Shared Circulant Transforms
Abstract
Merging multiple low-rank adapters (LoRAs) provides a practical route to scaling multi-task learning and deployment more efficiently than full-model weight merging, while avoiding reliance on task-specific training data. However, most existing approaches either treat LoRA updates as dense weight deltas or depend on expensive subspace factorizations, making the merge step a primary latency bottleneck. To address this issue, this paper theoretically establishes a positive relationship between merging quality and the effective rank of the matrices being merged. Motivated by this insight, we propose CircuMerge, a sketch-based framework for low-rank model merging built on shared circulant transforms. Specifically, our approach treats each adapter as a pair of low-rank matrices and applies a shared circulant transform to align all tasks in a common coordinate system. This alignment enables efficient sampling, yielding compact sketches that summarize cross-task interactions. Merging rules are then applied directly to these sketches, from which a standard low-rank adapter is reconstructed, preserving the essential information while substantially reducing computational overhead. Across a broad set of multi-task LoRA benchmarks covering both vision and language settings, extensive experiments demonstrate that CircuMerge reduces overall merging time by at least 44\% compared to state-of-the-art approaches, while matching or exceeding their accuracy.
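To make the pipeline concrete, the following minimal NumPy sketch illustrates the kind of procedure the abstract describes: a shared subsampled circulant transform (applied via the FFT, using the circular-convolution theorem), sketch-space merging, and reconstruction of a standard rank-r adapter. All names, dimensions, the plain-averaging merge rule, and the SVD-based reconstruction are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m, T = 512, 8, 32, 3                     # hidden dim, adapter rank, sketch rows, tasks

# Shared randomness: one circulant vector, one sign diagonal, and one row sample,
# reused across all tasks so their sketches live in a common coordinate system.
g = rng.standard_normal(d)                     # defines the circulant matrix C_g
signs = rng.choice([-1.0, 1.0], size=d)        # Rademacher diagonal D
rows = rng.choice(d, size=m, replace=False)    # row-subsampling operator P

def sketch(X):
    """Apply S = P @ C_g @ D to the columns of X via FFT, O(d log d) per column."""
    Y = np.fft.ifft(np.fft.fft(g)[:, None] * np.fft.fft(signs[:, None] * X, axis=0),
                    axis=0).real
    return Y[rows]

# Toy LoRA adapters: each task's update is delta_W_t = B_t @ A_t (never densified).
adapters = [(rng.standard_normal((d, r)) / np.sqrt(d),
             rng.standard_normal((r, d)) / np.sqrt(d)) for _ in range(T)]

# 1) Sketch every update and merge in sketch space (simple averaging as a stand-in rule).
Y = sum(sketch(B) @ A for B, A in adapters) / T           # m x d, cheap to form

# 2) Recover the dominant row space of the merged update from the sketch alone.
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
V = Vt[:r].T                                              # d x r row-space basis

# 3) Rebuild a standard rank-r adapter without ever forming a d x d matrix:
#    B_merged @ A_merged approximates (1/T) * sum_t B_t @ A_t.
B_merged = sum(B @ (A @ V) for B, A in adapters) / T      # d x r
A_merged = V.T                                            # r x d
```

Because every task is sketched with the same g, signs, and rows, the sketches are directly comparable and can be combined entrywise; the dominant cost is a handful of length-d FFTs and an SVD of an m x d matrix rather than any dense d x d factorization.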