Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter
Zhengbao He ⋅ Ruiqi Ding ⋅ Zhehao Huang ⋅ Ruikai Yang ⋅ Tao Li ⋅ Xiaolin Huang
Abstract
Low-rank adaptation (LoRA) enables parameter-efficient specialization of foundation models, but the proliferation of task-specific adapters fragments capabilities across many separate modules, complicating reuse and deployment. We study the problem of merging $T$ LoRAs into **a single rank-$r$ LoRA**, thereby preserving the benefits of low-rank structure. Existing Merge-then-Compress pipelines treat the rank constraint as an afterthought: they merge adapters in the full parameter space and then compress the merged result to rank $r$ via truncated SVD. However, full-parameter merging may destroy the low-rank structure, making it difficult for subsequent compression to recover an effective rank-$r$ LoRA. We propose Compress-then-Merge (CtM), a reversed paradigm that enforces the rank-$r$ bottleneck _before_ merging: CtM computes shared $r$-dimensional subspaces using only the LoRA weights to capture cross-adapter common structure, projects each adapter into these subspaces to obtain $r\times r$ coordinates, and then applies standard merging rules in this reduced space. CtM guarantees a rank-$r$ LoRA by construction, avoiding post-hoc truncation, and enables efficient computation in the core space spanned by the concatenated LoRA factors. Experiments on ViT-B/32 and LLaMA3-8B demonstrate consistent improvements over single-LoRA-output baselines, while remaining competitive with (and in some cases surpassing) full-parameter merging methods.
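To make the Compress-then-Merge ordering concrete, below is a minimal NumPy sketch of the pipeline described in the abstract. It is an illustrative approximation, not the paper's exact procedure: it assumes each LoRA is stored as factors $(B_t, A_t)$ with $\Delta W_t = B_t A_t$, obtains the shared $r$-dimensional subspaces from SVDs of the concatenated factors, and uses plain averaging as the merging rule in the $r\times r$ coordinate space; the paper may construct the subspaces and apply merging rules differently.

```python
import numpy as np

def compress_then_merge(loras, r):
    """Hypothetical sketch of Compress-then-Merge (CtM).

    loras: list of (B_t, A_t) with B_t of shape (d_out, r_t) and
           A_t of shape (r_t, d_in), so that Delta W_t = B_t @ A_t.
    Returns a single rank-r LoRA (B, A) with Delta W = B @ A.
    """
    # Stack the output-side and input-side factors across all adapters.
    B_cat = np.concatenate([B for B, _ in loras], axis=1)   # (d_out, sum r_t)
    A_cat = np.concatenate([A for _, A in loras], axis=0)   # (sum r_t, d_in)

    # Shared r-dimensional subspaces computed only from the LoRA weights
    # (assumed here to be the leading left/right singular subspaces).
    U, _, _ = np.linalg.svd(B_cat, full_matrices=False)
    _, _, Vt = np.linalg.svd(A_cat, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T                             # (d_out, r), (d_in, r)

    # Project each adapter into the shared subspaces: r x r coordinates.
    coords = [U.T @ B @ A @ V for B, A in loras]

    # Apply a standard merging rule (here: simple averaging) in the reduced space.
    C = np.mean(coords, axis=0)

    # The merged update is U @ C @ V.T, which is rank-r by construction.
    return U @ C, V.T

# Toy usage: merge three rank-4 LoRAs for a 64 x 128 weight matrix.
rng = np.random.default_rng(0)
loras = [(rng.standard_normal((64, 4)), rng.standard_normal((4, 128)))
         for _ in range(3)]
B, A = compress_then_merge(loras, r=4)
print(B.shape, A.shape)  # (64, 4) (4, 128)
```

Note that the rank-$r$ constraint is satisfied by construction, since the merged update $U C V^\top$ never leaves the shared subspaces, in contrast to Merge-then-Compress, which truncates after merging in the full parameter space.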