NeuroCLUS: A Foundation Model with Functional Clustering for Intracranial Neural Decoding
Abstract
Foundation models for intracranial neural recordings aim to learn generalizable representations from large-scale unlabeled data. However, existing approaches rely on suboptimal tokenization schemes -- treating individual electrode channels as independent tokens or aggregating them into a single brain-wide representation -- which fail to capture the brain's inherent functional modularity. We introduce NeuroCLUS, a foundation model that learns to represent neural activity through data-driven functional clusters. NeuroCLUS is built on a novel two-stage pre-training framework. First, a spatiotemporal model learns a functional context graph between channels via a functional context prediction task. Second, this graph guides a soft clustering of channels into a set of learnable prototype tokens, enabling the transformer backbone to process coherent functional units rather than raw channels. Evaluated across diverse decoding paradigms -- including speech perception, speech production, and seizure detection -- NeuroCLUS consistently achieves state-of-the-art performance. The discovered functional clusters align with established neurophysiology and offer enhanced interpretability. Our work demonstrates that explicitly modeling functional neural groupings significantly improves the efficiency, generalization, and interpretability of foundation models for intracranial decoding.
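To make the tokenization idea concrete, the sketch below illustrates one plausible form of the soft-clustering step described above: per-channel embeddings are softly assigned to a small set of learnable prototype tokens, which the transformer backbone would then consume in place of raw channels. This is a minimal NumPy illustration under assumed shapes and names (`soft_cluster_tokens`, the temperature `tau`, and the dimensions are all hypothetical), not the paper's actual implementation; in particular, the learned functional context graph that guides the assignment in NeuroCLUS is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_cluster_tokens(channel_emb, prototypes, tau=1.0):
    """Softly assign channel embeddings to learnable prototype tokens.

    channel_emb: (C, D) per-channel feature vectors
    prototypes:  (K, D) learnable cluster prototypes
    Returns (K, D) cluster tokens (prototype-weighted channel averages)
    and the (C, K) soft-assignment matrix.
    """
    # Similarity between every channel and every prototype: (C, K).
    logits = channel_emb @ prototypes.T / tau
    # Numerically stable softmax over prototypes: each channel
    # distributes a unit of mass across the K clusters.
    logits -= logits.max(axis=1, keepdims=True)
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Aggregate channels into K cluster tokens, normalized by cluster mass.
    mass = assign.sum(axis=0)[:, None]                      # (K, 1)
    tokens = assign.T @ channel_emb / np.maximum(mass, 1e-8)
    return tokens, assign

channels = rng.normal(size=(32, 16))   # e.g. 32 electrode channels, 16-dim features
protos = rng.normal(size=(4, 16))      # e.g. 4 prototype (cluster) tokens
tokens, assign = soft_cluster_tokens(channels, protos)
print(tokens.shape)                          # (4, 16)
print(np.allclose(assign.sum(axis=1), 1.0))  # True: soft assignment per channel
```

Because the assignment is soft, a channel near a functional boundary can contribute to several clusters at once, while the sequence length seen by the transformer drops from the channel count C to the prototype count K.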