Poster in Workshop on Theoretical Foundations of Foundation Models (TF2M)
Transformer Efficiently Learns Low-dimensional Target Functions In-context
Yujin Song · Denny Wu · Kazusato Oko · Taiji Suzuki
Abstract:
Transformers can efficiently learn in-context from example demonstrations. We study in-context learning (ICL) of a nonlinear function class via a transformer with a nonlinear MLP layer: given a class of single-index target functions $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\beta}\rangle)$, where the index features $\boldsymbol{\beta}\in\mathbb{R}^d$ are drawn from a rank-$r\ll d$ subspace, we show that a nonlinear transformer optimized by gradient descent learns $f_*$ in-context with a prompt length that depends only on the dimension $r$ of the function class. In contrast, an algorithm that directly learns $f_*$ on the test prompt incurs a statistical complexity that scales with the ambient dimension $d$. Our result highlights the adaptivity of ICL to the low-dimensional structure of the function class.
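To make the data model concrete, the following is a minimal sketch (not from the paper) of the setup described in the abstract: a single-index target $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\beta}\rangle)$ whose index feature $\boldsymbol{\beta}$ lies in a rank-$r$ subspace of $\mathbb{R}^d$, used to generate an in-context prompt of (x, y) demonstrations plus a query point. The dimensions d, r, the prompt length N, and the link function sigma_star are illustrative assumptions, not values from the paper.

import numpy as np

def sample_icl_prompt(d=64, r=4, N=32, sigma_star=lambda z: z**2 - 1, rng=None):
    rng = np.random.default_rng(rng)
    # Fixed rank-r subspace; each task draws its index feature beta from it.
    U = np.linalg.qr(rng.standard_normal((d, r)))[0]  # d x r orthonormal basis
    beta = U @ rng.standard_normal(r)
    beta /= np.linalg.norm(beta)  # unit-norm index feature inside the subspace
    # In-context demonstrations (x_i, y_i) with y_i = sigma_star(<x_i, beta>).
    X = rng.standard_normal((N, d))
    y = sigma_star(X @ beta)
    # Held-out query point whose label the model must predict in-context.
    x_query = rng.standard_normal(d)
    y_query = sigma_star(x_query @ beta)
    return X, y, x_query, y_query

X, y, x_query, y_query = sample_icl_prompt()
print(X.shape, y.shape, x_query.shape)  # (32, 64) (32,) (64,)

The claimed separation concerns how long the prompt (N above) must be: a transformer trained over many such tasks can exploit the shared rank-$r$ subspace, whereas a learner fitting $f_*$ from the test prompt alone pays a cost scaling with the ambient dimension $d$.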