

Poster in Workshop on Theoretical Foundations of Foundation Models (TF2M)

Transformer Efficiently Learns Low-dimensional Target Functions In-context

Yujin Song · Denny Wu · Kazusato Oko · Taiji Suzuki


Abstract: Transformers can efficiently learn in-context from example demonstrations. We study in-context learning (ICL) of a nonlinear function class via a transformer with a nonlinear MLP layer: given a class of single-index target functions $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\beta}\rangle)$, where the index features $\boldsymbol{\beta}\in\mathbb{R}^d$ are drawn from a rank-$r\ll d$ subspace, we show that a nonlinear transformer optimized by gradient descent learns $f_*$ in-context with a prompt length that depends only on the dimension $r$ of the function class. In contrast, an algorithm that directly learns $f_*$ on the test prompt incurs a statistical complexity that scales with the ambient dimension $d$. Our result highlights the adaptivity of ICL to low-dimensional structure in the function class.
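To make the data model concrete, below is a minimal sketch of sampling one in-context prompt for the single-index function class described above. The Gaussian input distribution, the tanh link function, and the choice of the first $r$ coordinate directions as the low-dimensional subspace are illustrative assumptions, not details stated in the abstract.

```python
import numpy as np


def sample_prompt(d=64, r=4, n_examples=32, sigma_star=np.tanh, rng=None):
    """Sample one in-context prompt for a single-index target
    f_*(x) = sigma_*(<x, beta>), where the index feature beta lies in a
    fixed rank-r subspace of R^d.

    Gaussian inputs and the tanh link are illustrative choices only.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Fixed orthonormal basis U in R^{d x r} spanning the rank-r subspace
    # (here simply the first r coordinate directions).
    U = np.eye(d)[:, :r]

    # Index feature beta = U w, so it lies in the low-dimensional subspace.
    w = rng.standard_normal(r)
    beta = U @ (w / np.linalg.norm(w))

    # In-context demonstrations (x_i, y_i) and a held-out query point.
    X = rng.standard_normal((n_examples, d))
    y = sigma_star(X @ beta)
    x_query = rng.standard_normal(d)
    y_query = sigma_star(x_query @ beta)
    return X, y, x_query, y_query
```

The abstract's claim can be read against this sampler: because $\boldsymbol{\beta}$ varies only within the rank-$r$ subspace, the number of demonstrations `n_examples` needed by the pretrained transformer scales with $r$, whereas fitting $f_*$ from scratch on the test prompt would require a sample size growing with $d$.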
