CiteGuard: Conformal False-Discovery Control for Faithful Retrieval-Augmented Generation
Abstract
Large language models increasingly rely on retrieval-augmented generation (RAG) to ground responses in external corpora. Yet even with strong retrievers, generated statements can remain unsupported, and the attached citations are often unreliable indicators of evidence. We introduce CiteGuard, a RAG decoding layer that treats sentence-level factuality as a multiple-testing problem and combines conformal calibration with false-discovery-rate (FDR) control. CiteGuard converts claim–evidence scores into p-values for the null hypothesis that a claim is unsupported, then applies the Benjamini–Hochberg (BH) or Benjamini–Yekutieli (BY) procedure to decide which claims to keep (with citations) and which to abstain on. On FEVER and Natural Questions, CiteGuard reduces the FDR among accepted claims from 28–31% (vanilla RAG) to below 10% at α=0.10, while retaining 86–92% of supported claims. This yields a user-controlled risk budget: practitioners can trade off faithfulness against coverage via α, with finite-sample guarantees under standard exchangeability assumptions.
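The selection step described above can be illustrated with a minimal sketch. It is not the CiteGuard implementation: the function names `conformal_p_values` and `fdr_select` are hypothetical, and the sketch assumes that a higher claim–evidence score means stronger support and that a held-out calibration set of scores for claims known to be unsupported is available. Under those assumptions, each p-value is the conformal rank of a test score among the calibration scores, and the standard BH (or BY) step-up rule picks the claims to keep.

```python
from bisect import bisect_left


def conformal_p_values(null_scores, test_scores):
    """Conformal p-values for the null hypothesis "unsupported".

    Assumption: null_scores are scores of calibration claims known to be
    unsupported, and a higher score means stronger evidence of support.
    """
    cal = sorted(null_scores)
    n = len(cal)
    p_values = []
    for s in test_scores:
        # Count calibration scores that are at least as large as s.
        n_ge = n - bisect_left(cal, s)
        p_values.append((1 + n_ge) / (n + 1))
    return p_values


def fdr_select(p_values, alpha=0.10, method="bh"):
    """Indices of claims accepted by the BH or BY step-up procedure."""
    m = len(p_values)
    if m == 0:
        return []
    # BY divides alpha by the harmonic sum 1 + 1/2 + ... + 1/m; BH uses 1.
    c_m = sum(1.0 / i for i in range(1, m + 1)) if method == "by" else 1.0
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose sorted p-value falls under its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k = rank
    return sorted(order[:k])


# Hypothetical scores, for illustration only.
calibration = [i / 100 for i in range(1, 100)]  # 99 unsupported-claim scores
claims = [1.5, 1.2, 0.5, 0.995, 0.2]            # scores of generated claims
kept = fdr_select(conformal_p_values(calibration, claims), alpha=0.10)
```

Claims whose indices appear in `kept` would be emitted with their citations, and the rest would be abstained on. Lowering `alpha` tightens the risk budget at the cost of coverage. Note that the smallest attainable p-value is 1/(n+1), so a small calibration set limits how strict `alpha` can usefully be.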