Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Thu, Jul 9, 2026 • 7:00 PM – 8:00 PM PDT

Perplexity Cannot Always Tell Right from Wrong

Petar Veličković ⋅ Federico Barbero ⋅ Christos Perivolaropoulos ⋅ Simon Osindero ⋅ Razvan Pascanu

Abstract

Perplexity, a function measuring a model's overall level of "surprise" when encountering a particular output, has gained significant traction in recent years, both as a loss function and as a simple-to-compute metric of model quality. Prior studies have pointed out several limitations of perplexity, often from an empirical perspective. Here we leverage recent results on Transformer continuity to show rigorously how perplexity may be an unsuitable metric for model selection. Specifically, we prove that if there is any sequence that a compact decoder-only Transformer model predicts accurately and confidently (a necessary prerequisite for strong generalisation), then there must exist another sequence with very low perplexity that the same model does not predict correctly. Further, by analytically studying iso-perplexity plots, we find that perplexity will not always select for the more accurate model; rather, any increase in model confidence must be accompanied by a commensurate rise in accuracy for the new model to be selected.
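
To make the model-selection claim concrete, here is a minimal toy sketch. It is not the construction used in the paper. It assumes a binary vocabulary, so a token counts as correctly predicted exactly when the model gives the true token probability above 0.5, and it assumes the standard definition of perplexity as the exponential of the mean negative log-likelihood. The two models and their probabilities are hypothetical values chosen for illustration.

```python
import math


def perplexity(p_correct):
    """Exponential of the mean negative log-probability of the true tokens."""
    mean_nll = -sum(math.log(p) for p in p_correct) / len(p_correct)
    return math.exp(mean_nll)


def accuracy(p_correct):
    """Fraction of tokens predicted correctly.

    Assumption: binary vocabulary, so the argmax is the true token
    exactly when its probability exceeds 0.5.
    """
    return sum(p > 0.5 for p in p_correct) / len(p_correct)


# Hypothetical model A: always right, but never confident.
model_a = [0.55] * 10

# Hypothetical model B: very confident on nine tokens, wrong on the tenth.
model_b = [0.99] * 9 + [0.40]

for name, probs in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: accuracy={accuracy(probs):.2f}, "
          f"perplexity={perplexity(probs):.3f}")
```

In this toy setting model B attains the lower perplexity even though model A is the more accurate of the two, so selecting purely on perplexity would favour the less accurate model.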