What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code
Abstract
Incorporating code into training corpora has become a widely acknowledged practice in the development of modern foundation language models (LMs). Compared with a general Internet corpus, code offers high-quality, well-structured signals that substantially improve models' coding proficiency. Beyond programming skills, prior research has suggested that code data may also contribute to non-coding capabilities. Nevertheless, through a series of rigorous controlled experiments, we demonstrate that the influence of code on other domains, particularly reasoning, remains limited. Our principal findings are as follows: (1) Code corpora yield substantial gains in programming-related abilities but only limited gains on knowledge-intensive tasks. (2) We identify a core subset of code data that functions as cognitive scaffolding for mathematical reasoning, especially in complex problem-solving scenarios. (3) Formal reasoning data provides more pronounced improvements on challenging mathematical reasoning tasks, whereas natural language–based reasoning proves more effective for simpler reasoning problems. Finally, by probing the internal mechanisms of LMs, we reveal how training data modulates routing patterns, thereby shaping emergent model behavior. Since training data is a central driver of model capability, our findings disentangle domain-specific data into finer-grained, cross-domain ability dimensions and underscore promising directions for future data optimization.