NeuroSupport: EEG-Based Evaluation of Mental Health LLMs for Youth in Muslim-Majority Indonesia
Abstract
Indonesia, the world’s largest Muslim-majority country, faces a severe shortage of accessible mental health support. Large language models (LLMs) offer a scalable path to AI-assisted supportive dialogue, but current evaluation relies heavily on text-based heuristics or subjective self-reports. Neither can capture whether generated responses actually engage the brain’s circuits for validation and emotion regulation. To address this gap, we introduce \emph{NeuroBench}, a dataset of electroencephalogram (EEG) recordings from 60 Indonesian youth (ages 16–24) reading psychologist- and LLM-generated supportive responses to twelve culturally grounded distress vignettes in Bahasa Indonesia. For each response, we record three reward-relevant neural signals, yielding approximately 12,000 response–EEG pairs with full clinician annotations for safety and cultural appropriateness. In the proposed competition, participants build models that predict the EEG signals recorded while the youth read each response. Specifically, given the distress vignette, a candidate supportive response, and basic demographics of the reader, models must predict three well-established neural signatures of therapeutic engagement: the feedback-related negativity, the late positive potential, and frontal alpha asymmetry, which index valuation, emotional salience, and approach motivation, respectively. Submissions will be evaluated by mean squared error and Pearson correlation against the recorded EEG features on a held-out set comprising 20\% of the youth readers, with a bonus track for models that jointly predict the clinician-rated safety and cultural appropriateness scores.
By anchoring evaluation in direct neural evidence rather than surface-level text similarity, NeuroBench aims to catalyze the development of LLMs whose responses engage the brain’s reward and emotion-regulation systems, advancing neurally validated, culturally informed mental health applications.
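The abstract specifies the metrics (mean squared error and Pearson correlation) and the held-out unit (20% of participants), but not how the split is drawn or how scores are aggregated. The following is a minimal Python sketch of such a scoring procedure under stated assumptions: a seeded random split at the participant level, and per-target MSE and Pearson r computed over all held-out trials pooled together. The names `Trial`, `split_participants`, `mse`, `pearson`, `score`, and the target keys `frn`, `lpp`, `faa` are hypothetical, not part of any released NeuroBench API.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical keys for the three EEG targets (FRN, LPP, frontal alpha asymmetry).
TARGETS = ("frn", "lpp", "faa")


@dataclass
class Trial:
    participant_id: str
    true: dict  # target name -> recorded EEG feature value
    pred: dict  # target name -> model prediction


def split_participants(participant_ids, test_frac=0.2, seed=0):
    """Assumed protocol: hold out a fraction of *participants* (not trials),
    so no held-out reader contributes any training data."""
    unique_ids = sorted(set(participant_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_frac)
    return set(unique_ids[:n_test])


def mse(y, yhat):
    """Mean squared error between recorded and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def pearson(y, yhat):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in yhat))
    if sy == 0 or sp == 0:
        return float("nan")  # correlation is undefined without variance
    return cov / (sy * sp)


def score(trials, test_ids):
    """Per-target MSE and Pearson r, pooled over held-out participants' trials
    (pooling is an assumption; per-participant averaging is also plausible)."""
    held_out = [t for t in trials if t.participant_id in test_ids]
    results = {}
    for name in TARGETS:
        y = [t.true[name] for t in held_out]
        yhat = [t.pred[name] for t in held_out]
        results[name] = {"mse": mse(y, yhat), "pearson": pearson(y, yhat)}
    return results
```

With 60 participant IDs and the default `test_frac=0.2`, `split_participants` holds out 12 participants. Splitting by participant rather than by trial (the same idea as group-wise splitting in scikit-learn's `GroupShuffleSplit`) prevents a model from scoring well merely by memorizing an individual reader's EEG idiosyncrasies.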