Poster Wed, Jul 16, 2025 • 11:00 AM – 1:30 PM PDT

Can Transformers Learn Full Bayesian Inference in Context?

Arik Reuter · Tim G. J. Rudner · Vincent Fortuin · David Rügamer

Abstract

Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context—without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows and enables us to infer complex posterior distributions for models such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods that do not operate in context. The source code for this paper is available at https://github.com/ArikReuter/ICLforFullBayesianInference

Lay Summary

Large language models (LLMs), such as the one behind ChatGPT, have become widely used and commercially successful. A key reason for their success is their ability to perform in-context learning (ICL): given only a few examples or instructions in the input, they can solve complex tasks without needing to change their internal parameters. In this work, we explore whether the abstract principle of ICL, learning directly from context, can also be applied to a very different challenge: performing full Bayesian inference, a core task in statistics and machine learning. Traditionally, full Bayesian inference either requires very costly computations or relies on approximations that may compromise accuracy. We show that for three widely used statistical models, an ICL-based approach can achieve results comparable to expensive, exact methods while outperforming commonly used approximations. In summary, our results validate that ICL is a meaningful principle for full Bayesian inference and might therefore become a general and promising approach for solving difficult inference problems in science and engineering.
