Large Language Models (LLMs) have achieved remarkable success but often exhibit overconfidence and poor calibration, particularly after instruction fine-tuning; this limits their reliability and applicability. To address this, we investigate ensembles, a technique known to improve neural network calibration but underexplored in LLMs, possibly because of the computational cost of training and evaluating multiple LLMs. We introduce Calibration via Augmented Prompt Ensembles (CAPE), a practical approach to LLM ensembles that exploits the inherent prompt sensitivity of LLMs by augmenting prompts, e.g., through template paraphrasing or option permutation. Our method requires no additional training and can be evaluated efficiently in batch mode, yielding significant calibration improvements for instruction-tuned LLMs.
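To make the idea concrete, here is a minimal sketch of a prompt-augmentation ensemble for a multiple-choice question. It assumes a hypothetical batch scorer `score_batch(prompts, n)` that returns, for each prompt, a probability for each displayed option position (in practice this might come from the LLM's probabilities on the option letters). The two `TEMPLATES` stand in for paraphrased templates and are illustrative only. Combining members by averaging their probabilities is an assumption of this sketch, not necessarily the exact aggregation used by CAPE. Each augmented prompt permutes the option order, and each answer distribution is mapped back to the original option order before averaging.

```python
import itertools
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: scores a batch of prompts with n options each and
# returns one probability list per prompt, indexed by displayed position.
BatchScoreFn = Callable[[List[str], int], List[List[float]]]

# Hypothetical paraphrased templates (placeholders, not from the paper).
TEMPLATES = [
    "Question: {q}\n{opts}\nAnswer:",
    "{q}\nChoose one of the following options.\n{opts}\nThe answer is",
]

LETTERS = "ABCDEFGHIJ"  # this sketch supports up to 10 options


def render(template: str, question: str, options: Sequence[str]) -> str:
    """Fill a template with the question and lettered options."""
    opts = "\n".join(f"{LETTERS[i]}. {o}" for i, o in enumerate(options))
    return template.format(q=question, opts=opts)


def ensemble_probs(question: str, options: Sequence[str],
                   score_batch: BatchScoreFn,
                   templates: Sequence[str] = TEMPLATES,
                   max_perms: Optional[int] = None) -> List[float]:
    """Average answer probabilities over templates x option permutations."""
    n = len(options)
    # islice(..., None) keeps every permutation; an int caps the count.
    perms = list(itertools.islice(itertools.permutations(range(n)), max_perms))
    jobs = [(t, p) for t in templates for p in perms]
    # Build all augmented prompts first so they can be scored in one batch.
    prompts = [render(t, question, [options[j] for j in p]) for t, p in jobs]
    batch = score_batch(prompts, n)
    avg = [0.0] * n
    for (_, perm), probs in zip(jobs, batch):
        # Position `pos` displayed original option `perm[pos]`.
        for pos, orig_idx in enumerate(perm):
            avg[orig_idx] += probs[pos] / len(jobs)
    return avg


def position_biased(prompts: List[str], n: int) -> List[List[float]]:
    """Toy scorer that always favors the first displayed option."""
    return [[0.7] + [0.3 / (n - 1)] * (n - 1) for _ in prompts]
```

For example, `ensemble_probs("What is the capital of France?", ["Paris", "Rome", "Oslo", "Bern"], position_biased)` returns approximately `[0.25, 0.25, 0.25, 0.25]`: because every option appears in every position equally often, the toy scorer's pure position bias averages out, whereas a single prompt would assign 0.7 to whichever option happened to be listed first. Enumerating all permutations grows factorially, so `max_perms` (a hypothetical knob in this sketch) caps the ensemble size.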
Author Information
Mingjian Jiang (University of Toronto)
Yangjun Ruan (University of Toronto)
Sicong Huang (University of Toronto)
Saifei Liao (Department of Computer Science)
Silviu Pitis (University of Toronto)
Roger Grosse (University of Toronto and Vector Institute)
Jimmy Ba (University of Toronto / xAI)
More from the Same Authors
2020: Counterfactual Data Augmentation using Locally Factored Dynamics
Silviu Pitis
2021: On Low Rank Training of Deep Neural Networks
Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez
2022: MoCoDA: Model-based Counterfactual Data Augmentation
Silviu Pitis · Elliot Creager · Ajay Mandlekar · Animesh Garg
2022: You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments
Keiran Paster · Sheila McIlraith · Jimmy Ba
2023: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
2023: Statistics estimation in neural network training: a recursive identification approach
Ruth Crasto · Xuchan Bao · Roger Grosse
2023: Training on Thin Air: Improve Image Classification with Generated Data
Yongchao Zhou · Hshmat Sahak · Jimmy Ba
2023: A Generative Model for Text Control in Minecraft
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith
2023: Using Synthetic Data for Data Augmentation to Improve Classification Accuracy
Yongchao Zhou · Hshmat Sahak · Jimmy Ba
2023: Multi-Objective Agency Requires Non-Markovian Rewards
Silviu Pitis
2023: Failure Modes of Learning Reward Models for LLMs and other Sequence Models
Silviu Pitis
2023 Poster: Efficient Parametric Approximations of Neural Network Function Space Distance
Nikita Dhawan · Sicong Huang · Juhan Bae · Roger Grosse
2023 Poster: TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Zhaoyan Liu · Noël Vouitsis · Satya Krishna Gorti · Jimmy Ba · Gabriel Loaiza-Ganem
2022 Poster: Augment with Care: Contrastive Learning for Combinatorial Problems
Haonan Duan · Pashootan Vaezipoor · Max Paulus · Yangjun Ruan · Chris Maddison
2022 Spotlight: Augment with Care: Contrastive Learning for Combinatorial Problems
Haonan Duan · Pashootan Vaezipoor · Max Paulus · Yangjun Ruan · Chris Maddison
2022 Poster: On Implicit Bias in Overparameterized Bilevel Optimization
Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse
2022 Spotlight: On Implicit Bias in Overparameterized Bilevel Optimization
Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse
2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
2021 Poster: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Yangjun Ruan · Karen Ullrich · Daniel Severo · James Townsend · Ashish Khisti · Arnaud Doucet · Alireza Makhzani · Chris Maddison
2021 Oral: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Yangjun Ruan · Karen Ullrich · Daniel Severo · James Townsend · Ashish Khisti · Arnaud Doucet · Alireza Makhzani · Chris Maddison
2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
2021 Poster: On Monotonic Linear Interpolation of Neural Network Parameters
James Lucas · Juhan Bae · Michael Zhang · Stanislav Fort · Richard Zemel · Roger Grosse
2021 Spotlight: On Monotonic Linear Interpolation of Neural Network Parameters
James Lucas · Juhan Bae · Michael Zhang · Stanislav Fort · Richard Zemel · Roger Grosse
2020 Poster: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
Silviu Pitis · Harris Chan · Stephen Zhao · Bradly Stadie · Jimmy Ba
2020 Poster: Improving Transformer Optimization Through Better Initialization
Xiao Shi Huang · Felipe Perez · Jimmy Ba · Maksims Volkovs
2020 Poster: Evaluating Lossy Compression Rates of Deep Generative Models
Sicong Huang · Alireza Makhzani · Yanshuai Cao · Roger Grosse
2019 Poster: Sorting Out Lipschitz Function Approximation
Cem Anil · James Lucas · Roger Grosse
2019 Poster: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang
2019 Oral: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang
2019 Oral: Sorting Out Lipschitz Function Approximation
Cem Anil · James Lucas · Roger Grosse
2018 Poster: Noisy Natural Gradient as Variational Inference
Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse
2018 Poster: Distilling the Posterior in Bayesian Neural Networks
Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel
2018 Oral: Noisy Natural Gradient as Variational Inference
Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse
2018 Oral: Distilling the Posterior in Bayesian Neural Networks
Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel
2018 Poster: Differentiable Compositional Kernel Learning for Gaussian Processes
Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse
2018 Oral: Differentiable Compositional Kernel Learning for Gaussian Processes
Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse