Causally Evaluating the Learnability of Formal Language Tasks
Abstract
Large language models (LLMs) trained on natural language data are capable of translating between languages, predicting chess moves, and writing poetry. Performance on a given task depends on directly relevant training data, yet confounders abound: data in related languages has been shown to help low-resource languages, and training on code has been shown to improve reasoning capabilities in natural language generation. Formal languages have become a common tool for understanding the learnability of language model architectures and their limitations; we argue that models trained on formal languages should also be treated as multi-task learners when studying the learnability of a given \emph{task}. That is, to understand the learnability of a given property of a formal language, confounders from other tasks must be considered. We propose a causal graphical model and an efficient sampling mechanism for probabilistic finite-state automata that together give full control over how often a given task occurs while preserving other language properties. To enable targeted evaluation, we derive task-specific decompositions of the KL divergence. These tools allow us to estimate the \emph{causal} relationship between how often a task appears in the training data and its true learnability. Our experiments confirm that the correlation between task occurrences and learnability does not recover this relationship accurately; for that, the causal analysis and machinery are necessary.