RLSF-V: Mitigating Hallucinations in MLLMs via Fuzzy Semantic Self-Feedback
Changhao He ⋅ Shuhao Yan ⋅ Shuxian Li ⋅ Xi Peng ⋅ Peng Hu
Abstract
Multimodal large language models (MLLMs) extend large language models (LLMs) with visual perception for open-world understanding, but they also exacerbate LLMs' hallucinations, in which generated text contradicts visual evidence or common sense. A dominant strategy for mitigating hallucinations is Direct Preference Optimization (DPO) on hallucination-labeled responses. Existing pipelines, however, face two key limitations: they either (i) rely on human inspection or proprietary models to correct hallucinated outputs, producing off-policy preference data that violate the basic assumptions of DPO, or (ii) depend on stronger peer models to evaluate responses, leading to an unfavorable trade-off between performance and scalability. Departing from these paradigms, we propose an on-policy \emph{self-feedback} framework that constructs preference data for hallucination mitigation without any external supervision (\textit{e.g.}, large models or humans). Specifically, we present a novel \emph{local fuzzy semantic} evaluation paradigm that derives a hallucination-sensitive confidence signal directly from the model's own logits; this signal is then used to automatically rank diverse generated responses and build high-quality preference pairs for fine-tuning. Trained on a 10k-scale self-generated preference dataset, our self-feedback pipeline achieves over a 50\% relative reduction in \textit{HalRate}$\downarrow$ on AMBER compared to GPT-4V-feedback baselines. Models, code, and datasets will be released upon acceptance.
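The abstract does not spell out the exact form of the local fuzzy semantic evaluation, so the following is only a minimal Python sketch of the general recipe it describes: score each self-generated response from the model's own logits, rank the responses, and pair the most- and least-confident as (chosen, rejected) data for DPO-style fine-tuning. The function names (`sequence_confidence`, `build_preference_pair`) and the geometric-mean token probability used as the confidence signal are illustrative assumptions, not the paper's actual scoring rule.

```python
import torch
import torch.nn.functional as F

def sequence_confidence(logits: torch.Tensor, token_ids: torch.Tensor) -> float:
    """Hypothetical confidence signal: the geometric-mean probability the
    model assigns to its own sampled tokens. A stand-in for the paper's
    local fuzzy semantic evaluation, whose exact form the abstract omits."""
    log_probs = F.log_softmax(logits, dim=-1)                  # (T, V)
    token_lp = log_probs.gather(-1, token_ids.unsqueeze(-1))   # (T, 1)
    return token_lp.mean().exp().item()

def build_preference_pair(responses, scores):
    """Rank self-generated responses by confidence and pair the most- vs.
    least-confident as (chosen, rejected) for preference fine-tuning."""
    order = sorted(range(len(responses)), key=lambda i: scores[i], reverse=True)
    return responses[order[0]], responses[order[-1]]

# Toy usage: random logits stand in for the model's own decoding outputs.
torch.manual_seed(0)
responses = ["response_a", "response_b", "response_c"]
scores = []
for _ in responses:
    logits = torch.randn(12, 32000)   # (seq_len, vocab_size)
    token_ids = logits.argmax(-1)     # pretend these were the sampled tokens
    scores.append(sequence_confidence(logits, token_ids))
chosen, rejected = build_preference_pair(responses, scores)
```

Because both the responses and the confidence scores come from the same model being trained, the resulting preference pairs remain on-policy, which is the property the framework contrasts against human- or GPT-4V-corrected data.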