Multimodal Scaling Laws for Task & Data-Optimized Models of Visual Cortex
Abstract
Task-optimized neural networks are the leading in-silico models of sensory cortex, yet the field lacks a unified understanding of which modeling choices drive improved brain alignment. Prior NeuroAI work is fragmented across datasets and modalities, making it difficult to determine robust scaling trends. Here, we systematically investigate the scaling laws of model-to-brain alignment across 8 neural datasets (spanning electrophysiology, fMRI, EEG, and MEG) and over 600 models with diverse architectures and pretraining configurations. We report three key scaling trends: (1) Pretraining saturation: alignment improves with pretraining compute and data scale but saturates across all recording modalities. (2) Complementary fine-tuning: hybrid task & neural-data optimization yields consistent improvements in alignment that generalize across datasets and modalities. (3) Mapping scaling: increasing the number of neural samples used to fit model-to-brain mappings yields log-linear gains and has the largest impact on alignment of the three factors. Finally, we propose a novel subject-shared cross-attention mapping that drastically reduces parameter count while improving alignment. Taken together, these results establish multimodal scaling laws that guide resource allocation for next-generation brain models.
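To make the subject-shared idea concrete, below is a minimal PyTorch sketch of what such a cross-attention readout could look like. All names here (SharedCrossAttentionReadout, units_per_subject, the chosen dimensions) are illustrative assumptions rather than the paper's actual implementation; the point it demonstrates is that the attention and readout weights are shared across subjects, so each additional subject contributes only one small query embedding per recorded unit instead of a full per-unit linear map over all model features.

```python
# Hypothetical sketch of a subject-shared cross-attention readout.
# Names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class SharedCrossAttentionReadout(nn.Module):
    """Map model feature tokens to per-subject neural responses.

    The key/value projections and final readout are shared across
    subjects; each subject adds only a small set of learnable query
    embeddings, one per recorded unit (neuron/voxel/sensor).
    """

    def __init__(self, feat_dim: int, embed_dim: int, units_per_subject: list[int]):
        super().__init__()
        # Shared projections: model feature tokens -> keys and values.
        self.key_proj = nn.Linear(feat_dim, embed_dim)
        self.val_proj = nn.Linear(feat_dim, embed_dim)
        # Per-subject query embeddings: one query vector per recorded unit.
        self.queries = nn.ParameterList(
            nn.Parameter(torch.randn(n_units, embed_dim) * 0.02)
            for n_units in units_per_subject
        )
        # Shared readout from attended features to a scalar response.
        self.readout = nn.Linear(embed_dim, 1)

    def forward(self, feats: torch.Tensor, subject: int) -> torch.Tensor:
        # feats: (batch, n_tokens, feat_dim) model activations per stimulus.
        k = self.key_proj(feats)   # (B, T, E)
        v = self.val_proj(feats)   # (B, T, E)
        q = self.queries[subject]  # (U, E), U = units for this subject
        # Scaled dot-product attention of unit queries over feature tokens.
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)  # (B, U, T)
        attended = attn @ v                            # (B, U, E)
        return self.readout(attended).squeeze(-1)      # (B, U) predicted responses

# Example: two hypothetical subjects with 412 and 389 recorded units.
readout = SharedCrossAttentionReadout(feat_dim=768, embed_dim=128,
                                      units_per_subject=[412, 389])
pred = readout(torch.randn(8, 196, 768), subject=0)   # -> shape (8, 412)
```

Under these assumptions, the per-subject parameter cost scales with units x embed_dim rather than units x n_tokens x feat_dim as in a standard per-unit linear regression, which is one plausible source of the parameter reduction the abstract describes.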