Bits That Count: Quantifying and Predicting Capabilities of Language Models
Abstract
What do language models learn during training, and how? When does learning elicit \textit{existing} knowledge, and when does it primarily teach \textit{new} capabilities? We find that the amount of generalizable information language models learn during training predicts the origins of their emergent capabilities. Minuscule amounts of information---in many cases, a few bits from a single example---can unlock large fractions of models' maximum performance when capabilities are \textit{elicited} rather than \textit{taught}. We quantify these learning regimes using excess description length (EDL), an information-theoretic measure of the generalizable information learned during training. We find that elicitation and teaching exhibit distinct EDL signatures that characterize the predominant learning mechanism as the amount of learned information scales: elicitation requires orders of magnitude less information than teaching to reach comparable performance. We demonstrate that EDL provides a practical tool for quantitatively estimating the maximum amount of predictive information that models can compress from data into trainable parameters during learning. These capacity limits describe optimal tradeoffs between data and parameter count that robustly predict when parameter-efficient fine-tuning methods (\textit{e.g.}, LoRA) will underperform full fine-tuning.
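For orientation, a minimal sketch of how excess description length is commonly formalized in prequential (online) coding terms; this is an assumed illustrative formulation, and the paper's precise definition of EDL may differ:
\[
\mathrm{EDL}(\mathcal{D}) \;=\; \underbrace{\sum_{t=1}^{T} -\log_2 p_{\theta_{t-1}}\!\left(y_t \mid x_t\right)}_{\text{online (prequential) codelength}} \;-\; \underbrace{\sum_{t=1}^{T} -\log_2 p_{\theta_{T}}\!\left(y_t \mid x_t\right)}_{\text{codelength under the final model}},
\]
where $\theta_{t-1}$ denotes the model after training on the first $t-1$ examples and $\theta_{T}$ the final model. Under this reading, EDL counts the extra bits the model pays while still learning relative to what it would pay if it had already learned, so small EDL corresponds to elicitation of existing knowledge and large EDL to teaching of new capabilities.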