Poster in Workshop: ICML 2024 Workshop on Foundation Models in the Wild
The Effect of Data Corruption on Multimodal Long Form Responses
Daniel Kaplan · Alexis Roger · Mohamed Osman · Irina Rish
Keywords: [ Data Corruption ] [ Vision-Language Models ] [ Hallucinations ]
Despite significant progress, Vision-Language Models (VLMs) still struggle with hallucinations, especially in long-form responses. Existing mitigation strategies have had limited success in specific cases, and long-form generation remains problematic. In this work, we attempt to establish the link between the data used to train a model and the hallucinations in that model's output. To this end, we examine hallucinations through the lens of data corruption: we develop a method to corrupt training data, then train models on the corrupted data to measure the effect on performance. We show that corrupting only a small portion of the long-form training data significantly impairs the model's performance on long-form tasks, while leaving simpler tasks such as visual question answering and multiple choice relatively intact. All training code and models are released for reproducibility and future research.
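The abstract does not specify the corruption procedure itself, so the following is only a minimal illustrative sketch of one plausible scheme: corrupting a small fraction of long-form examples by swapping their responses with responses from other examples, yielding fluent text that no longer matches the image. The function name, the `corruption_rate` parameter, and the example schema (`"image"`, `"prompt"`, `"response"` fields) are all assumptions, not the authors' actual method.

```python
import random

def corrupt_long_form_data(examples, corruption_rate=0.05, seed=0):
    """Hypothetical corruption sketch: swap the responses of a small
    fraction of long-form training examples with responses drawn from
    other examples, so the answers no longer match their images.

    `examples` is assumed to be a list of dicts with keys
    "image", "prompt", and "response".
    """
    rng = random.Random(seed)
    examples = [dict(ex) for ex in examples]  # shallow copies, leave input intact

    # Choose which examples to corrupt (e.g., 5% of the dataset).
    n_corrupt = int(len(examples) * corruption_rate)
    corrupt_idx = rng.sample(range(len(examples)), n_corrupt)

    # Pool of replacement responses drawn from the full dataset.
    pool = [ex["response"] for ex in examples]
    for i in corrupt_idx:
        # Sample a response from a *different* example, producing text
        # that is fluent but unfaithful to the paired image.
        j = rng.randrange(len(pool))
        while j == i:
            j = rng.randrange(len(pool))
        examples[i]["response"] = pool[j]
    return examples
```

Under this kind of scheme, the corrupted and clean datasets differ only in the swapped responses, so any change in downstream hallucination behavior can be attributed to the mismatched long-form supervision rather than to a change in data volume or format.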