Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication
Leon Chlon ⋅ Ahmed Karim ⋅ MarcAntonio Awada
Abstract
Transformers used for evidence-grounded question answering with binary adjudication (e.g., support/refute or yes/no) can be highly sensitive to the order in which exchangeable evidence is presented, producing answer dispersion across permutations and unreliable attempted answers ("hallucinations" under a Bernoulli predicate). We treat evidence order as a nuisance variable and show that next-token training minimizes expected conditional description length over orderings, which can be close to Bayes-optimal in expectation while deviating under any fixed ordering. We quantify this expectation–realization gap via a Quantified Martingale Violation (QMV) bound that predicts $O(\log n)$ growth in permutation dispersion under harmonic positional sensitivity. We then derive the Expectation-level Decompression Law (EDFL), which relates the expected information budget to achievable reliability for Bernoulli predicates, and use it to define Bits-to-Trust (B2T), Risk-of-Hallucination (RoH), and the Information Sufficiency Ratio (ISR), together with a fixed ISR-gating rule for answer/abstain decisions under permutation mixtures. On 3,059 grounded items from a five-benchmark evidence-grounded QA suite (FEVER, HotpotQA, NQ-Open, PopQA, and Controls), we observe logarithmic dispersion and Jensen gains from uniform permutation mixtures. In a pre-specified held-out audit (528 items), an $ISR=1$ gate attains a 0.0–0.7% hallucination rate with 20.6–27.9% abstention (95% CIs).
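As a rough illustration of the ISR-gating rule described above, the following is a minimal sketch, assuming B2T is measured as the binary KL divergence needed to move a prior to a target reliability and that the information budget is given in the same units (nats). The function names and the specific KL form are illustrative assumptions, not definitions taken from this paper.

```python
import math

def kl_bernoulli(q: float, p: float) -> float:
    """Binary KL divergence KL(Ber(q) || Ber(p)) in nats."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def bits_to_trust(q_target: float, prior: float) -> float:
    """B2T (assumed form): information needed to move the prior
    success probability up to the target reliability q_target."""
    return kl_bernoulli(q_target, prior)

def isr_gate(budget_nats: float, q_target: float, prior: float):
    """ISR = budget / B2T; answer iff ISR >= 1, else abstain."""
    b2t = bits_to_trust(q_target, prior)
    isr = budget_nats / b2t
    decision = "answer" if isr >= 1.0 else "abstain"
    return decision, isr

# Example: uninformative prior 0.5, target reliability 0.95.
# B2T = KL(Ber(0.95) || Ber(0.5)) ~= 0.495 nats, so a 0.5-nat
# budget clears the gate while a 0.3-nat budget abstains.
print(isr_gate(0.5, 0.95, 0.5)[0])   # answer
print(isr_gate(0.3, 0.95, 0.5)[0])   # abstain
```

The gate trades coverage for reliability: raising `q_target` raises B2T, so more items fall below $ISR=1$ and are abstained on, mirroring the hallucination/abstention trade-off reported in the audit.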