Causal Dependency-Aware Unsupervised Routing for Large Reasoning Models
Abstract
As Large Language Model (LLM) ecosystems grow, routing each query to the most suitable model in a diverse pool has become a critical strategy for building efficient, high-performing AI systems. A common approach is to train a supervised router; however, this requires large amounts of expensive human-annotated preference data and yields routers that are notoriously brittle, failing to generalize under the inevitable distribution shifts in user queries. Developing robust, unsupervised routing methods that adapt without retraining is therefore a crucial research frontier. The challenge is amplified by Large Reasoning Models (LRMs), which pose a dual problem for any label-free method: their outputs have a causal “thinking”→“answer” structure that must be modeled, and they exhibit a structural imbalance in which long reasoning text can dominate the signal from the final answer. We introduce ReasoningRouter, a framework that resolves both issues with a length-balanced embedding strategy and a probabilistic model that captures the thinking-to-answer dependency. Our key theoretical result, the Causal Triangulation Property, enables label-free estimation of the component qualities and their causal link. Beyond competitive routing accuracy, ReasoningRouter yields interpretable insights into model behavior, enabling separate quality assessment of the reasoning and answer components while remaining computationally efficient. The code is provided in the supplementary materials.