Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Fri, Jul 10, 2026 • 12:00 AM – 1:00 AM PDT

Beyond Answer Correctness: Measuring and Reducing Explanation Faithfulness Gaps in Chart Understanding VLMs

Kshitij Dahiya ⋅ Vinay K Saini

Abstract

Vision-language models (VLMs) increasingly generate free-form explanations for chart understanding tasks, yet their evaluation relies almost entirely on answer correctness. We argue that QA accuracy constrains only a low-dimensional projection of the response space, leaving explanation-level hallucinations invisible to standard benchmarks. We formalize this by decomposing the response space into a QA-consistent set $R_{QA}$ and a faithful manifold $R_F(E)$, and define the hallucination region as $H(E) = R_{QA} \setminus R_F(E)$: the responses that are answer-correct but contain claims unsupported by the visual evidence. Empirically, on a 200-instance ChartQA-Pro subset, we find that $87.1\%$ of QA-correct responses from a base VLM lie in $H(E)$, even though the model achieves near-perfect QA scores. To address this, we propose a staged alignment strategy that first ensures domain competence and then enforces faithfulness. This reduces the faithfulness gap $\Delta(\pi_\theta)$ from $87.1\%$ to $35.7\%$. Our results show that faithfulness is an independent evaluation axis: it cannot be inferred from answer correctness alone, and it must be explicitly measured and optimized for reliable VLM reasoning.
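Because $87.1\%$ appears both as the share of QA-correct responses in $H(E)$ and as the initial value of $\Delta(\pi_\theta)$, one reading consistent with the abstract is that the gap is the empirical fraction $\Delta(\pi_\theta) = |H(E)| / |R_{QA}|$ over responses sampled from $\pi_\theta$. The sketch below computes that quantity under this assumption, and further assumes each response has already been labeled for answer correctness and for faithfulness (the labeling procedure is not described here). The names `JudgedResponse`, `hallucination_region`, and `faithfulness_gap` are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class JudgedResponse:
    # Hypothetical per-response labels; how they are obtained is an assumption.
    qa_correct: bool  # True if the response lies in R_QA
    faithful: bool    # True if the response lies in R_F(E)


def hallucination_region(responses):
    """Return the responses in H(E): QA-correct but not faithful (set difference R_QA minus R_F(E))."""
    return [r for r in responses if r.qa_correct and not r.faithful]


def faithfulness_gap(responses):
    """Assumed definition: fraction of QA-correct responses that fall in H(E)."""
    qa_correct = [r for r in responses if r.qa_correct]
    if not qa_correct:
        raise ValueError("no QA-correct responses; the gap is undefined")
    return len(hallucination_region(responses)) / len(qa_correct)


# Toy data (not from the paper): 4 QA-correct responses, 3 of them unfaithful.
toy = [
    JudgedResponse(qa_correct=True, faithful=False),
    JudgedResponse(qa_correct=True, faithful=False),
    JudgedResponse(qa_correct=True, faithful=False),
    JudgedResponse(qa_correct=True, faithful=True),
    JudgedResponse(qa_correct=False, faithful=False),  # outside R_QA, so ignored
]
print(f"{faithfulness_gap(toy):.1%}")  # 75.0%
```

Conditioning on $R_{QA}$ is what makes this metric orthogonal to accuracy: a model can score perfectly on QA while the gap stays high, which is the situation the abstract reports for the base VLM.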