Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning
Abstract
Combining existing pre-trained LLMs is a promising avenue for tackling diverse reasoning tasks. However, selecting experts at the task level is often too coarse-grained, as heterogeneous tasks may require different expertise for each instance. To enable instance-level mixing of LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE selects experts based on inferred skills, i.e., specialized knowledge such as algebra within mathematics. Each expert is chosen according to how relevant its expertise is to the query and then generates its own reasoning, yielding k outputs from k experts; these are synthesized into a final high-quality response by an aggregator, itself chosen for its ability to integrate diverse outputs. We show that instance-level expert selection improves performance by a large margin but, when implemented naively, introduces high computational overhead due to constant model loading and offloading. To address this, we implement a batch inference strategy that groups instances by their assigned experts, ensuring that each model is loaded only once. This allows us to integrate 16 expert models on a single GPU with a time cost comparable to prior multi-agent baselines that use 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), Symbolic-MoE achieves an absolute average improvement of 8.15% over the best baseline. Moreover, Symbolic-MoE generalizes well to unseen tasks and removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
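To make the routing-and-batching idea concrete, the sketch below illustrates, in Python, how instances could be grouped by their assigned experts so that each expert model is loaded and offloaded only once, with an aggregator then combining the k expert outputs per instance. This is a minimal illustration of the abstract's description, not the paper's implementation; the helper names (select_experts, load_model, unload_model, aggregate) are hypothetical placeholders.

```python
from collections import defaultdict

def run_symbolic_moe(instances, k, select_experts, load_model, unload_model, aggregate):
    """Hedged sketch of skill-based routing with expert-grouped batch inference."""
    # 1) Instance-level routing: pick k experts per instance based on inferred skills.
    assignments = {i: select_experts(inst, k) for i, inst in enumerate(instances)}

    # 2) Invert the assignment: expert name -> indices of instances it must answer.
    jobs = defaultdict(list)
    for i, experts in assignments.items():
        for name in experts:
            jobs[name].append(i)

    # 3) Batched inference: load each expert once, answer all of its instances, then unload.
    outputs = defaultdict(dict)  # instance index -> {expert name: reasoning output}
    for name, idxs in jobs.items():
        model = load_model(name)  # one load per expert, not per instance
        for i in idxs:
            outputs[i][name] = model.generate(instances[i])
        unload_model(model)

    # 4) Aggregation: a chosen aggregator synthesizes the k expert outputs per instance.
    return [aggregate(instances[i], outputs[i]) for i in range(len(instances))]
```

Grouping by expert rather than iterating instance by instance is what keeps the number of model loads equal to the number of distinct experts used, which is the source of the efficiency claim in the abstract.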