Rethinking LLM Confidence: From Calibration to Coherence
Krish Matta ⋅ Atharv Naphade ⋅ Andy Zou
Abstract
Calibration is the primary criterion for evaluating LLM confidence, but it is insufficient: it admits trivially incoherent estimators, depends on the evaluation distribution, and does not test whether the estimates can be interpreted as values of a single, consistent underlying probability function. For real-world use, what matters more is how well LLM confidence estimates satisfy the conditions required of coherent probabilistic beliefs. We formalize these conditions along three axes (structural coherence, faithfulness, and usefulness) and operationalize them in $\textbf{CoherenceBench}$. Widely used estimators systematically violate these conditions despite appearing well-calibrated: models assign lower confidence to logically easier questions 31% of the time, and common interventions that reduce RMSCE leave structural violations unchanged, suggesting that calibration is orthogonal to probabilistic validity. RLHF and chain-of-thought improve usefulness metrics without restoring coherence. To close this gap, we introduce Reinforcement Learning from Exploitation (RLE), which post-trains a model by directly penalizing Dutch-book exploitability across four coherence templates. RLE outperforms Brier-score fine-tuning on structural coherence both in- and out-of-distribution, demonstrating that training against axiom violations is more effective than fitting labelled correctness data alone.
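To make the notion of Dutch-book exploitability concrete, the following is a minimal Python sketch of two textbook coherence checks: negation (confidence in A and in not-A should sum to 1) and conjunction (confidence in "A and B" should not exceed confidence in A). These two checks, the function names, and the example numbers are illustrative assumptions; the abstract does not specify the four templates used by RLE or how CoherenceBench scores violations.

```python
# Illustrative sketch only: these are standard probability-axiom checks,
# not the paper's actual coherence templates or scoring rules.

def negation_exploit(p_a: float, p_not_a: float) -> float:
    """Guaranteed profit per unit stake against prices for A and not-A.

    Exactly one of A / not-A pays out 1. If the two prices sum to more
    than 1, a bookie sells both bets; if less than 1, the bookie buys both.
    Either way the sure profit is the gap from 1.
    """
    return abs(p_a + p_not_a - 1.0)


def conjunction_exploit(p_a: float, p_a_and_b: float) -> float:
    """Guaranteed profit when "A and B" is priced above A.

    Sell the bet on "A and B" and buy the bet on A: the price difference
    is collected up front, and no outcome produces a net payout loss.
    Returns 0.0 when the prices are coherent.
    """
    return max(0.0, p_a_and_b - p_a)


if __name__ == "__main__":
    # Hypothetical confidence values, chosen only for illustration.
    print(negation_exploit(0.7, 0.5))      # prices sum to 1.2
    print(conjunction_exploit(0.4, 0.6))   # conjunction priced above conjunct
    print(conjunction_exploit(0.6, 0.4))   # coherent pair, no sure profit
```

An estimator can score well on calibration and still yield positive values here, since these checks compare confidences on logically related questions instead of comparing confidence with accuracy on a labelled set.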