VLFEEDBACK-EEG: Neural Signals as Implicit Feedback for Vision-Language Model Alignment
Abstract
We introduce VLFEEDBACK-EEG, the first dataset to pair EEG recordings with preference annotations for large vision-language models (LVLMs), enabling the study of physiological signals as an implicit feedback source for multimodal alignment. Twenty-one participants evaluated response pairs from the VLFEEDBACK dataset while their neural activity was recorded, yielding a resource that captures neural responses associated with preference judgments. Our analysis suggests that EEG features recorded during pairwise evaluation contain preference information, and that this information aligns more closely with human preference labels than with AI-generated annotations. An exploratory fusion of EEG features with pretrained reward models (RMs) does not yet surpass RM-only baselines; however, real EEG consistently outperforms noise controls, suggesting that physiological signals may provide complementary information for preference modelling. We release VLFEEDBACK-EEG publicly to support future research on cognitively grounded multimodal alignment.