Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Rethinking LLM Confidence: From Calibration to Coherence

Krish Matta ⋅ Atharv Naphade ⋅ Andy Zou

Abstract

Calibration is the primary criterion for evaluating LLM confidence, but it is insufficient: it admits trivially incoherent estimators, depends on the evaluation distribution, and does not test whether the estimates can be interpreted as values of a single, consistent underlying probability function. For real-world use, what matters more is how well LLM confidence estimates satisfy the conditions required of coherent probabilistic beliefs. We formalize these conditions along three axes (structural coherence, faithfulness, and usefulness) and operationalize them in $\textbf{CoherenceBench}$. Widely used estimators systematically violate these conditions despite appearing well-calibrated: models assign lower confidence to logically easier questions 31% of the time, and common interventions that reduce RMSCE leave structural violations unchanged, suggesting that calibration is orthogonal to probabilistic validity. RLHF and chain-of-thought improve usefulness metrics without restoring coherence. To close this gap, we introduce Reinforcement Learning from Exploitation (RLE), which post-trains a model by directly penalizing Dutch-book exploitability across four coherence templates. RLE outperforms Brier-score fine-tuning on structural coherence both in- and out-of-distribution, demonstrating that training against axiom violations is more effective than fitting labelled correctness data alone.
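
To make the notion of Dutch-book exploitability concrete, here is a minimal Python sketch. The abstract does not specify the four coherence templates, so the two checks below (negation and conjunction) are illustrative assumptions, and the function names are hypothetical. This is not the CoherenceBench or RLE implementation. Each function returns the profit per unit stake that a bettor could lock in by trading against the stated confidences, whatever the true answer turns out to be.

```python
def negation_exploit(p_a: float, p_not_a: float) -> float:
    """Guaranteed profit from confidences on a claim A and its negation.

    Exactly one of A / not-A is true, so a pair of unit bets pays out 1 in total.
    If the confidences sum to more than 1, a bettor sells both bets at those
    prices; if they sum to less than 1, the bettor buys both. Either way the
    sure profit is the gap between the sum and 1.
    """
    return abs(p_a + p_not_a - 1.0)


def conjunction_exploit(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """Guaranteed profit when a conjunction is rated above one of its conjuncts.

    A coherent belief needs P(A and B) <= min(P(A), P(B)). If that fails, a
    bettor sells the bet on (A and B) and buys the bet on the cheaper conjunct:
    the price difference is collected up front, and the bought bet always pays
    at least as much as the sold one.
    """
    return max(0.0, p_a_and_b - min(p_a, p_b))


if __name__ == "__main__":
    # Hypothetical confidences, chosen only to illustrate the checks.
    print(negation_exploit(0.7, 0.6))          # confidences on A and not-A sum to 1.3
    print(conjunction_exploit(0.5, 0.9, 0.6))  # conjunction rated above conjunct A
    print(conjunction_exploit(0.5, 0.9, 0.4))  # coherent case
```

The conjunction check mirrors the abstract's observation about logically easier questions: a single claim is never harder to satisfy than its conjunction with another claim, so a coherent estimator should never give it lower confidence. A training penalty in the spirit of RLE could average such exploit values over many question pairs, though the actual objective used by the authors may differ.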