A Cognitive Battery for Foundation Models: Theory-Grounded Benchmarks for Attention, Learning, Metacognition, Executive Function, and Social Cognition
Abstract
We present a cognitive benchmark battery for foundation models, comprising five procedurally generated evaluations that total 25,390 items across 1,138 sub-task types. Each evaluation operationalises a widely studied family of cognitive constructs: selective attention (Controlled Distractor Injection, CDI), fluid learning (Alien Grammar Induction, ALGIn), metacognitive calibration (Epistemic Calibration Under Uncertainty, ECUU), executive control (Dynamic Rule Override, DRO), and theory of mind (Recursive Belief Tracking, RBT). Rather than reporting a single accuracy number, each benchmark traces a degradation profile along a controlled difficulty axis (e.g., distractor count, rule count, or recursion order). This turns the evaluation into a parametric probe: the shape of the resulting profile can be predicted from a candidate theory of the underlying capability. We describe the design, give worked examples, and discuss how the resulting profiles feed into the workshop's theory–benchmark virtuous cycle. This paper introduces the battery and its rationale; a companion release will report calibration and model scores.