The Truth Stays in the Family: Enhancing Contextual Truthfulness via Inherited Heads in Model Lineages
Abstract
Recent advances in large language models (LLMs) have led to the emergence of specialized multimodal LLMs (MLLMs), forming distinct model families that share a common foundation language model. Despite this evolutionary trend, it remains unexplored whether a fundamental behavioral link exists between derived MLLMs and their foundational LLMs. This work investigates the inheritance of truthfulness traits along this trajectory by quantifying the contextual truthfulness of individual attention heads. Our analysis of the Vicuna and Qwen families reveals a striking finding: MLLMs maintain a high correlation in truthfulness scores with their base LLMs, even after multimodal fine-tuning and when evaluated on disparate data sources. Building on this insight, we propose a Soft Gating strategy that uses these inherited Truth Scores to amplify the influence of context-truthful heads while preserving the contributions of the remaining heads. We first validate our approach on base LLMs using the HaluEval benchmark, demonstrating improved truthful reasoning. We then show that Truth Scores derived from a base LLM can be effectively transferred to its multimodal descendants as a plug-and-play gate, achieving performance gains on the POPE and CHAIR benchmarks comparable to probing the MLLMs directly. Our work highlights a novel, systemic approach to enhancing reliability across an entire model family by leveraging its inherent, inherited traits.
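To make the gating idea concrete, the following is a minimal sketch of how per-head Truth Scores could modulate attention head outputs. The function name, the min-max normalization, and the `alpha` amplification parameter are illustrative assumptions; the abstract does not specify the exact scoring or gating formula.

```python
import numpy as np

def soft_gate_heads(head_outputs, truth_scores, alpha=0.5):
    """Scale each attention head's output by a soft gate derived from
    its truth score: context-truthful heads are amplified, while every
    head keeps a nonzero baseline contribution.

    head_outputs: (num_heads, d) array of per-head output vectors
    truth_scores: (num_heads,) array; higher = more context-truthful
    alpha: maximum extra weight given to the most truthful head
           (hypothetical parameter, not taken from the paper)
    """
    # Normalize scores to [0, 1] (an assumed choice; epsilon avoids
    # division by zero when all scores are equal).
    s = (truth_scores - truth_scores.min()) / (np.ptp(truth_scores) + 1e-8)
    # Soft gate: baseline weight 1 for all heads, plus up to `alpha`
    # extra for the most context-truthful heads.
    gates = 1.0 + alpha * s
    return head_outputs * gates[:, None]
```

Because the gate never drops below 1, no head is suppressed; the transfer experiment in the abstract corresponds to computing `truth_scores` on the base LLM and applying this gate unchanged in its multimodal descendant.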