How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Xingwu Chen ⋅ Lei Zhao ⋅ Difan Zou

Abstract
