Poster in Workshop: 1st ICML Workshop on In-Context Learning (ICL @ ICML 2024)
Task Descriptors Help Transformers Learn Linear Models In-Context
Ruomin Huang · Rong Ge
Large language models (LLMs) exhibit a strong in-context learning (ICL) ability, which allows the model to make predictions on new examples based on the given prompt. Recently, a line of research (Von Oswald et al., 2023; Akyürek et al., 2023; Ahn et al., 2023; Mahankali et al., 2023; Zhang et al., 2023) considered ICL in a simple linear regression setting and showed that the forward pass of Transformers simulates variants of gradient descent (GD) on the in-context examples. In practice, the input prompt usually contains two types of information: in-context examples and a task description. In this work, we theoretically investigate how the task description helps ICL. Specifically, our input prompt contains not only in-context examples but also a “task descriptor”. We empirically show that the trained Transformer achieves significantly lower ICL loss when the task descriptor is provided. We further prove a global convergence theorem, and the converged parameters match our experimental results.
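The following is a minimal sketch of the linear regression ICL setting the abstract refers to, under illustrative assumptions: the dimension, number of in-context examples, and the way the task descriptor is encoded (here, simply exposing the ground-truth task vector) are hypothetical choices, not the paper's construction. It shows how one GD step on the in-context least-squares loss (the kind of update prior work such as Von Oswald et al., 2023 shows a linear-attention layer can simulate) matches an attention-style aggregation of the in-context labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear-regression ICL instance (sizes and prompt format
# are illustrative assumptions, not taken from the paper).
d, n = 5, 20
w_star = rng.normal(size=d)      # ground-truth task vector
X = rng.normal(size=(n, d))      # in-context inputs
y = X @ w_star                   # in-context labels (noiseless)
x_query = rng.normal(size=d)     # query input

# One GD step on the in-context least-squares loss, starting from w = 0.
eta = 1.0 / n
w_gd = eta * X.T @ y             # w_1 = w_0 - eta * grad = eta * X^T y
pred_gd = x_query @ w_gd

# The same prediction written as a linear-attention-style computation:
# the query attends to in-context tokens via inner products x_query^T x_i
# and aggregates their labels y_i.
pred_attn = eta * np.sum((X @ x_query) * y)
assert np.isclose(pred_gd, pred_attn)

# A "task descriptor" could, for instance, expose (part of) w_star in the
# prompt, so the model need not infer the task from examples alone.
prompt = {"descriptor": w_star, "examples": list(zip(X, y)), "query": x_query}
print(pred_gd, x_query @ w_star)
```

The assertion checks that the one-step GD prediction and the attention-style aggregation coincide, which is the intuition behind the "Transformers simulate GD" results cited above; how the task descriptor enters the architecture and loss is the subject of the paper itself.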