Sparse by Design: Relevance-Driven Scaling for Recommender Systems
Abstract
Sparse Mixture-of-Experts (SMoE) has emerged as a powerful conditional-computation paradigm for efficiently scaling Large Language Models. While recent efforts have begun exploring SMoE architectures in recommender systems, achieving comparable efficiency-performance trade-offs has proven considerably more challenging than in language modeling. We attribute this difficulty to two structural impediments: (i) conventional token-level routing mechanisms align poorly with the core objective of user-item relevance prediction; and (ii) relevance signals in recommendation models emerge through distributed, multi-stage interactions rather than a single, consistently traversed transformation, limiting the effectiveness of standard expert-selection strategies. To address these challenges, we propose the Massive Routing Network (MRN), a scalable sparse framework that explicitly aligns conditional computation with the distinctive computational topology of recommendation models. Extensive evaluations on public benchmarks and billion-user-scale industrial datasets show that MRN consistently outperforms competitive dense and sparse baselines under comparable compute budgets. Crucially, MRN overcomes the performance saturation commonly observed at scale and exhibits markedly more favorable scaling laws than prior state-of-the-art dense and sparse approaches.