Scaling Vision Transformers for Functional MRI with Flat Maps
Abstract
We propose a simple strategy for training a foundation model on functional MRI (fMRI) data: we adapt the standard Vision Transformer to fMRI by first converting each 3D fMRI volume to a 2D map using a standard cortical flat map projection. We train spatiotemporal masked autoencoders (MAE) on 2.3K hours of fMRI flat map videos. Our model (CortexMAE) outperforms otherwise identical MAE models trained on parcel-averaged or native volume data. We perform the first quantitative scaling analyses for fMRI and observe strict power-law scaling. Finally, we develop the first open evaluation suite for fMRI foundation models and use it to perform a comprehensive comparison. On cognitive state decoding, our model outperforms all other models by a wide margin. On clinical trait prediction, however, we report an important mixed result: all models, including our own, show inconsistent performance. We hope that by introducing reproducible benchmarks and a strong, simple baseline, we can help establish a clear frontier for fMRI foundation models. Code is available at \url{https://anonymous.4open.science/r/cortex_mae}.
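To make the spatiotemporal MAE setup concrete, the sketch below (PyTorch) illustrates the core idea of patchifying flat-map fMRI videos into space-time tubelets and randomly masking most of them before the ViT encoder. All shapes, patch sizes, and the 75% mask ratio are illustrative assumptions rather than the paper's actual configuration.

\begin{verbatim}
import torch

B, T, H, W = 2, 16, 128, 256     # batch, frames, flat-map height/width (assumed)
pt, ph, pw = 2, 16, 16           # tubelet size in time and space (assumed)
mask_ratio = 0.75                # typical MAE masking ratio (assumed)

video = torch.randn(B, T, H, W)  # stand-in for flat-map fMRI videos

# Patchify into space-time tubelets: (B, N, pt*ph*pw)
patches = (
    video.reshape(B, T // pt, pt, H // ph, ph, W // pw, pw)
         .permute(0, 1, 3, 5, 2, 4, 6)
         .reshape(B, -1, pt * ph * pw)
)
N = patches.shape[1]
n_keep = int(N * (1 - mask_ratio))

# Per-sample random masking: only the visible patches go to the encoder.
noise = torch.rand(B, N)
ids_keep = noise.argsort(dim=1)[:, :n_keep]
visible = torch.gather(
    patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, patches.shape[-1])
)
print(visible.shape)  # (B, n_keep, pt*ph*pw) -> input to the ViT encoder
\end{verbatim}

The decoder would then reconstruct the masked tubelets from the visible ones, following the standard MAE recipe; the flat-map projection itself is what lets this 2D video machinery be applied to 3D fMRI volumes.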