
Calibrating Language Models via Augmented Prompt Ensembles
Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba
Event URL: https://openreview.net/forum?id=L0dc4wqbNs

Large Language Models (LLMs) have achieved remarkable success, but often exhibit overconfidence and poor calibration, particularly after instruction-finetuning, which limits their reliability and applicability. To address this, we investigate ensembles, a technique known to enhance neural network calibration but underexplored in LLMs, possibly due to the computational cost of training and evaluating multiple LLMs. We introduce Calibration via Augmented Prompt Ensembles (CAPE), a practical approach to LLM ensembles that leverages the inherent prompt sensitivity of LLMs by augmenting prompts, e.g., by template paraphrasing or option permutation. Our method requires no additional training and can be efficiently evaluated in batch mode, yielding significant calibration improvements for instruction-tuned LLMs.
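The option-permutation augmentation described in the abstract can be sketched in a few lines: present the same multiple-choice question with the options in every order, query the model each time, and average the probabilities assigned to each option across orderings. The snippet below is an illustrative sketch only, not the authors' implementation; `toy_llm_probs` is a hypothetical stand-in for an LLM call, deliberately given a positional bias so the averaging effect is visible.

```python
import itertools

def toy_llm_probs(question, options):
    """Hypothetical stand-in for an LLM query. It returns a probability
    over the options as listed in the prompt, with a purely positional
    bias (over-weighting the first option) and no content sensitivity."""
    bias = [0.55, 0.25, 0.20]
    return {opt: p for opt, p in zip(options, bias)}

def option_permutation_ensemble(question, options, llm=toy_llm_probs):
    """Average each option's probability over all orderings of the
    options (a sketch of CAPE-style option permutation)."""
    totals = {opt: 0.0 for opt in options}
    perms = list(itertools.permutations(options))
    for perm in perms:
        probs = llm(question, list(perm))
        for opt in options:
            totals[opt] += probs[opt]
    return {opt: totals[opt] / len(perms) for opt in options}

probs = option_permutation_ensemble(
    "Which city is the capital of France?", ["Paris", "Rome", "Berlin"]
)
```

Because the stub's bias depends only on position, every option occupies each slot equally often across the permutations, so the ensembled distribution is uniform: the positional miscalibration cancels out. With a real model, content-driven signal survives the averaging while order-driven overconfidence is reduced.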

Author Information

Mingjian Jiang (University of Toronto)
Yangjun Ruan (University of Toronto)
Sicong Huang (University of Toronto)
Saifei Liao (Department of Computer Science)
Silviu Pitis (University of Toronto)
Roger Grosse (University of Toronto and Vector Institute)
Jimmy Ba (University of Toronto / xAI)
