ICML Transformers are Minimax Optimal Nonparametric In-Context Learners

Oral in Workshop: Workshop on Theoretical Foundations of Foundation Models (TF2M)

Transformers are Minimax Optimal Nonparametric In-Context Learners

Juno Kim · Tai Nakamaki · Taiji Suzuki


Abstract: We shed light on the effectiveness of in-context learning (ICL) from the viewpoint of statistical learning theory. We develop approximation and generalization error analyses for a transformer composed of a deep neural network and one linear attention layer, pretrained on nonparametric regression tasks sampled from general function spaces including the Besov space and piecewise $\gamma$-smooth class. In particular, we show that sufficiently trained transformers can achieve -- and even improve upon -- the minimax optimal estimation risk in context by encoding the most relevant basis representations during pretraining. Our analysis extends to high-dimensional or sequential data and distinguishes the \emph{pretraining} and \emph{in-context} generalization gaps, establishing upper and lower bounds with respect to both the number of tasks and the number of in-context examples.
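To make the architecture in the abstract concrete, the following is a minimal sketch of in-context regression with a feature map followed by one linear (softmax-free) attention layer. Everything specific here is an assumption for illustration and is not taken from the abstract: the predictor form $\phi(\tilde{x})^\top \Gamma \, \frac{1}{n}\sum_k y_k \phi(x_k)$, the fixed cosine basis standing in for the pretrained deep network, the identity matrix standing in for the learned attention weights, and the names `features`, `linear_attention_predict`, and `target`.

```python
import numpy as np


def features(x, num_basis=8):
    """Hypothetical stand-in for the pretrained DNN feature map.

    Returns an orthonormal cosine basis on [0, 1] evaluated at x,
    with shape (len(x), num_basis).
    """
    x = np.asarray(x, dtype=float).reshape(-1)
    j = np.arange(num_basis)
    phi = np.sqrt(2.0) * np.cos(np.pi * j[None, :] * x[:, None])
    phi[:, 0] = 1.0  # constant basis function has unit norm without the sqrt(2)
    return phi


def linear_attention_predict(x_ctx, y_ctx, x_query, gamma):
    """One linear attention layer over the context (assumed form).

    Computes phi(x_query)^T @ gamma @ mean_k(y_k * phi(x_k)).
    """
    num_basis = gamma.shape[0]
    phi_ctx = features(x_ctx, num_basis)
    phi_q = features(x_query, num_basis)
    # Context summary: empirical average of y_k * phi(x_k), shape (num_basis,)
    summary = phi_ctx.T @ np.asarray(y_ctx, dtype=float) / len(y_ctx)
    return phi_q @ (gamma @ summary)


def target(x):
    """Hypothetical regression task lying in the span of the basis."""
    return 1.0 + 0.5 * np.sqrt(2.0) * np.cos(2.0 * np.pi * np.asarray(x))


# Usage: noise-free context drawn uniformly on [0, 1].
rng = np.random.default_rng(0)
x_ctx = rng.uniform(0.0, 1.0, size=50_000)
y_ctx = target(x_ctx)
x_query = np.array([0.1, 0.5, 0.9])

# Identity gamma is a placeholder: it is the natural choice when the basis
# is orthonormal under the input distribution, as it is in this toy setup.
pred = linear_attention_predict(x_ctx, y_ctx, x_query, np.eye(8))
print(pred, target(x_query))
```

In this toy setting the context average estimates the basis coefficients of the task, so the prediction approaches the target as the number of in-context examples grows. In the setting the abstract describes, both the feature map and the attention weights would be learned during pretraining rather than fixed by hand.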
