Poster Wed, Jul 8, 2026 • 10:30 PM – 12:15 AM PDT HALL A #4316

ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

Shangqian Gao ⋅ Ting Hua ⋅ Reza Shirkavand ⋅ Chi-Heng Lin ⋅ Zheng Tang ⋅ Zhengao Li ⋅ Longge Yuan ⋅ Fangyi Li ⋅ Zeyu Zhang ⋅ Alireza Ganjdanesh ⋅ Qian Lou ⋅ Jie Xu ⋅ Yen-Chang Hsu

Abstract

Large Language Models (LLMs) demonstrate remarkable capabilities but face deployment challenges due to their high computational demands. Traditional pruning methods reduce these costs by permanently removing parameters, which inevitably leads to performance degradation. To mitigate this issue, we propose ToMoE, a method that transforms dense LLMs into Mixture-of-Experts (MoE) models by uncovering experts inherently present within dense models, without requiring any weight updates. ToMoE leverages dynamic structural pruning to unify expert construction and router training in a single stage, achieving consistently strong performance. Remarkably, even without fine-tuning the model weights, ToMoE consistently outperforms state-of-the-art pruning and MoE techniques across Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5 models. The code for this paper is available at https://github.com/gaosh/ToMoE.
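
To make the idea of experts already being present inside a dense model more concrete, the following is a minimal illustrative sketch, not the ToMoE implementation. It assumes a single ReLU feed-forward block whose hidden neurons are split into contiguous groups that act as experts, and a router with random weights standing in for a trained one. All names (`dense_ffn`, `expert_ffn`, `route_top_k`, `moe_ffn`) and the grouping scheme are hypothetical and are not taken from the paper or its repository. The dense weights `W1` and `W2` are never modified; each input simply uses a subset of them.

```python
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def dense_ffn(W1, W2, x):
    # Original dense block: y = W2 * relu(W1 * x).
    return matvec(W2, relu(matvec(W1, x)))

def expert_ffn(W1, W2, x, neuron_ids):
    # One "expert": only the listed hidden neurons are computed,
    # i.e. selected rows of W1 and the matching columns of W2.
    h = relu(matvec([W1[j] for j in neuron_ids], x))
    return [sum(row[j] * hj for j, hj in zip(neuron_ids, h)) for row in W2]

def route_top_k(R, x, k):
    # Assumed router: linear scores, keep the k highest-scoring experts.
    scores = matvec(R, x)
    order = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return order[:k]

def moe_ffn(W1, W2, R, experts, x, k):
    # Sum the outputs of the selected experts; unselected neurons are skipped.
    out = [0.0] * len(W2)
    for e in route_top_k(R, x, k):
        y = expert_ffn(W1, W2, x, experts[e])
        out = [a + b for a, b in zip(out, y)]
    return out

random.seed(0)
D, H, E = 4, 8, 4  # model width, hidden width, number of experts (toy sizes)
W1 = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(H)]
W2 = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(D)]
R = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(E)]  # untrained router
experts = [list(range(e * H // E, (e + 1) * H // E)) for e in range(E)]
x = [random.uniform(-1, 1) for _ in range(D)]

sparse_out = moe_ffn(W1, W2, R, experts, x, k=1)  # uses 1/4 of the hidden neurons
full_out = moe_ffn(W1, W2, R, experts, x, k=E)    # uses every expert
dense_out = dense_ffn(W1, W2, x)
```

In this toy setup, selecting all experts reproduces the dense output up to floating-point rounding, because the experts partition the hidden neurons and ReLU acts elementwise. With `k=1` only a quarter of the hidden neurons are evaluated for a given input, which is the kind of input-dependent (dynamic) structural sparsity the abstract refers to. How ToMoE actually constructs experts and trains its router is described in the paper and code, not here.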