Toward Calibrated Mixture-of-Experts Under Distribution Shift
Abstract
Calibration aligns a model's predictive uncertainty with empirical outcome frequencies and is essential for understanding and trusting reported probabilities. Recent work shows that enforcing calibration at the level of individual predictors can substantially improve ensemble performance, with mixture-of-calibrated-experts (MoCE) models in particular demonstrating strong empirical gains. However, the conditions under which calibration helps MoCE are not well understood. In this work, we study MoCE models from a distributional robustness perspective, focusing on how routing mechanisms interact with expert-level calibration. We show that for hard routing, expert calibration is sufficient to ensure calibration of the overall model under a broad class of distribution shifts, but it does not suffice for a soft-routed model. We characterize the conditions that must hold for a soft-routed MoCE to be calibrated, and we discuss how reframing calibration as a distributionally robust objective recovers robustness guarantees for soft-routed mixtures.
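As a toy illustration of the soft-routing failure mode described above (a hypothetical example constructed here, not taken from the paper), consider two experts that are each individually calibrated, yet whose uniform soft mixture is miscalibrated: among inputs where the mixture predicts 0.75, the outcome occurs with frequency 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(1, 5, size=n)      # x uniform on {1, 2, 3, 4}
y = (x <= 2).astype(float)          # Y = 1 iff x in {1, 2}

# Expert A resolves the partition {1,2} vs {3,4}: predicts 1 or 0,
# which matches the conditional outcome rate exactly -> calibrated.
p_a = (x <= 2).astype(float)

# Expert B resolves the partition {1,3} vs {2,4}: the outcome rate in
# each of its cells is 0.5, so a constant 0.5 prediction is calibrated.
p_b = np.full(n, 0.5)

# Soft routing with uniform weights: average the two calibrated experts.
p_mix = 0.5 * p_a + 0.5 * p_b

# Calibration check: among points where the mixture predicts 0.75, the
# empirical outcome frequency should also be 0.75 if the mixture were
# calibrated -- but it is 1.0 here.
mask = np.isclose(p_mix, 0.75)
print(p_mix[mask].mean(), y[mask].mean())   # prints 0.75 1.0
```

Under hard routing, by contrast, each input would receive a single expert's (calibrated) prediction, so this gap cannot arise for the class of shifts the abstract refers to.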