Sample Margin-Aware Recalibration of Temperature Scaling
Abstract
Deep neural networks frequently exhibit overconfidence, undermining reliability in safety-critical applications. Existing adaptive methods rely on indirectly learned proxies of sample difficulty. We establish the logit margin as a direct and principled hardness indicator. We prove that the margin tightly bounds the feasible temperature range for any target confidence. Empirically, the margin correlates strongly with proximity to the decision boundary and reveals systematic calibration patterns across difficulty levels. We further identify a fundamental flaw in NLL-based optimization: minimizing NLL can paradoxically worsen calibration. To address this, we introduce Charbonnier-Smoothed SoftECE, a smooth objective that provably upper-bounds the smooth calibration error (smCE). Building on these insights, we propose SMART (Sample Margin-Aware Recalibration of Temperature), a lightweight method that learns a sample-wise margin-to-temperature mapping guided by our calibration-centric objective. Experiments demonstrate state-of-the-art calibration across CNNs and ViTs on standard, long-tailed, and distribution-shifted benchmarks, with minimal inference-time overhead. Code: https://anonymous.4open.science/r/SMART-8B11.
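To make the core ideas concrete, the following is a minimal NumPy sketch of two ingredients the abstract names: the logit margin (top-1 minus top-2 logit) used as a hardness indicator, and a Charbonnier penalty as a smooth surrogate for the absolute calibration gap. The `margin_to_temperature` mapping shown here is a hypothetical hand-picked softplus parameterization for illustration only; in SMART this mapping is learned, and the full Charbonnier-Smoothed SoftECE objective additionally involves soft binning, which is omitted here.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def logit_margin(logits):
    # Margin = top-1 logit minus top-2 logit; larger margin
    # means the sample sits farther from the decision boundary.
    s = np.sort(logits, axis=-1)
    return s[..., -1] - s[..., -2]

def margin_to_temperature(margin, a=0.5, b=1.0):
    # Hypothetical monotone margin-to-temperature map (NOT the
    # learned SMART mapping): softplus keeps the temperature > 0.
    return np.log1p(np.exp(a * margin + b))

def charbonnier(x, eps=1e-3):
    # Smooth surrogate for |x|: differentiable at 0, approaches
    # |x| as eps -> 0. Used here on a per-sample calibration gap.
    return np.sqrt(x * x + eps * eps) - eps

# Toy batch: one easy sample (large margin) and one hard sample.
logits = np.array([[4.0, 1.0, 0.5],
                   [2.1, 2.0, 1.9]])
m = logit_margin(logits)                  # per-sample margins
T = margin_to_temperature(m)              # per-sample temperatures
probs = softmax(logits / T[:, None])      # recalibrated confidences

# Per-sample smooth gap between confidence and correctness
# (labels assumed to be the argmax here, purely for illustration).
conf = probs.max(axis=-1)
correct = np.ones_like(conf)              # assume both predictions correct
gap = charbonnier(conf - correct)
```

This illustrates why a single global temperature is restrictive: the low-margin sample receives a different temperature than the high-margin one, so their confidences are adjusted independently.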