“Very Likely” Means “Uncertain”? How LLMs Diverge from Humans in Linguistic Uncertainty Quantification
Abstract
Humans express uncertainty verbally via markers (e.g., "possible", "likely"), yet most LLM uncertainty quantification (UQ) relies on costly likelihood- or consistency-based signals. From a cognitive perspective, accurate verbal uncertainty reflects metacognitive monitoring: representing knowledge boundaries ("knowing that you don't know") to support regulation and information seeking. In this paper, we investigate two questions: How do LLMs diverge from humans in verbal uncertainty quantification? Can verbal markers reliably quantify LLM uncertainty? We curate a corpus of human uncertainty markers from the psychology and decision-science literature and benchmark LLMs against it. We observe that LLMs encode verbal uncertainty with numerical levels that differ substantially from those of humans. We then introduce VOCAL, a novel optimization-based algorithm that learns an optimal uncertainty profile over uncertainty markers directly from LLM outputs. By fitting a marker–uncertainty mapping that best explains empirical correctness, VOCAL discovers how much probability mass each verbal marker should convey, rather than estimating uncertainty via repeated sampling. VOCAL enables a direct, marker-level comparison of confidence semantics between humans and LLMs, disentangling marker-level mismatches and revealing systematic confidence disparities in verbal expressions.
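To make the fitting idea concrete, here is a minimal illustrative sketch (not the paper's actual algorithm; all names and data are hypothetical): for each verbal marker, find the confidence level that best explains observed answer correctness. Under a per-marker Bernoulli negative log-likelihood, that optimum is simply the empirical accuracy of answers carrying the marker.

```python
# Hypothetical sketch of learning a marker-uncertainty profile from
# (marker, correctness) observations. Assumes each LLM answer is tagged
# with one verbal marker and a binary correctness label.
from collections import defaultdict

def fit_marker_profile(observations):
    """observations: iterable of (marker, correct) pairs, correct in {0, 1}.
    Returns {marker: fitted confidence in [0, 1]}, i.e. the confidence
    minimizing Bernoulli NLL per marker (= empirical accuracy)."""
    totals = defaultdict(lambda: [0, 0])  # marker -> [num_correct, num_seen]
    for marker, correct in observations:
        totals[marker][0] += int(correct)
        totals[marker][1] += 1
    return {m: c / n for m, (c, n) in totals.items()}

# Toy data: "likely" answers are correct 2/3 of the time, "possible" 1/3.
obs = [("likely", 1), ("likely", 1), ("likely", 0),
       ("possible", 1), ("possible", 0), ("possible", 0)]
profile = fit_marker_profile(obs)
# profile["likely"] -> 2/3, profile["possible"] -> 1/3
```

Comparing such a fitted profile against human-assigned numerical values for the same markers is what enables the marker-level human-vs-LLM comparison described above.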