Poster

An Information-Theoretic Analysis of In-Context Learning

Hong Jun Jeon · Jason Lee · Qi Lei · Benjamin Van Roy

Hall C 4-9 #1109
Tue 23 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

Previous theoretical results pertaining to meta-learning on sequences build on contrived and convoluted mixing time assumptions. We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers and corroborate existing results in a simple linear setting. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.
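The two-component decomposition is compact enough to sketch. The following is an illustration in generic notation of my choosing (θ for the meta-parameter shared across tasks, ψ for a task parameter, X_{1:T} for a length-T sequence), resting only on the chain rule of mutual information; the paper's precise statement may differ in detail:

```latex
% Illustrative sketch (generic notation, not necessarily the paper's):
% theta = meta-parameter shared across tasks, psi = task parameter,
% X_{1:T} = one sequence of length T. The chain rule of mutual
% information splits the information the data carries about the latent
% variables into the abstract's two error components:
\mathbb{I}(\theta, \psi; X_{1:T})
  \;=\; \underbrace{\mathbb{I}(\theta; X_{1:T})}_{\text{meta-learning error}}
  \;+\; \underbrace{\mathbb{I}(\psi; X_{1:T} \mid \theta)}_{\text{intra-task error}}
```

Read this way, the first term is driven down by meta-training (more training sequences pin down θ), while the second is paid within each task and shrinks as the in-context sequence length T grows.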

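To make the "simple linear setting" concrete, here is a minimal, self-contained simulation (my own toy construction, not code or an experiment from the paper): a Bayes-optimal posterior-mean predictor for linear regression with a known Gaussian prior, whose excess squared prediction error, used here as a proxy for the log-loss quantities the paper analyzes, decays with context length t roughly like d·σ²/t for larger t:

```python
# Toy sketch (assumption: not the paper's experiment): intra-task error
# decay for the Bayes-optimal predictor in Bayesian linear regression.
import numpy as np

rng = np.random.default_rng(0)
d, T, noise, n_tasks = 10, 200, 0.5, 200  # dim, seq length, noise std, tasks

errs = np.zeros(T)
for _ in range(n_tasks):
    psi = rng.standard_normal(d)            # task parameter, prior N(0, I)
    X = rng.standard_normal((T, d))
    y = X @ psi + noise * rng.standard_normal(T)
    # Running posterior mean: psi_hat = (sigma^2 I + X_t' X_t)^{-1} X_t' y_t,
    # i.e. ridge regression with lambda = noise**2 under the N(0, I) prior.
    A = noise**2 * np.eye(d)
    b = np.zeros(d)
    for t in range(T):
        psi_hat = np.linalg.solve(A, b)     # Bayes predictor after t examples
        errs[t] += (X[t] @ (psi_hat - psi)) ** 2
        A += np.outer(X[t], X[t])
        b += y[t] * X[t]
errs /= n_tasks

# Excess MSE approaches d * noise^2 / t as t grows.
for t in (1, 10, 50, 100, 199):
    print(f"t={t:3d}  excess MSE={errs[t]:.3f}  d*sigma^2/t={d * noise**2 / t:.3f}")
```

Meta-learning error could be illustrated the same way by drawing ψ from N(θ, ·) for an unknown θ and estimating θ from M training sequences; the gap to the known-prior predictor above would then shrink with M, mirroring the abstract's claim that error decays in both the number of training sequences and sequence length.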