

Invited Talk
Workshop on Theoretical Foundations of Foundation Models (TF2M)

Jason Lee (Princeton): Learning Representations and Associations with Gradient Descent

Sat 27 Jul 12:35 a.m. PDT — 1:05 a.m. PDT

Abstract:

Machine learning has undergone a paradigm shift with the success of pretrained models. Pretraining via gradient descent learns transferable representations that adapt to a wide range of downstream tasks. However, a significant body of prior theoretical work has shown that in many regimes, overparametrized neural networks trained by gradient descent behave like kernel methods and do not learn transferable representations. In this talk, we close this gap by demonstrating that there is a large class of functions that cannot be efficiently learned by kernel methods but can be easily learned by gradient descent on a neural network, which acquires representations relevant to the target task. We also show that these representations enable efficient transfer learning, which is impossible in the kernel regime.
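A minimal sketch of the phenomenon the abstract describes (my own illustration under simplifying assumptions, not the talk's exact construction): a single-index target y = relu(⟨w*, x⟩) depends on one hidden direction w*. A kernel method's features are fixed in advance, but gradient descent on a two-layer network can rotate its first-layer weights toward the task-relevant direction, driving the loss down.

```python
import numpy as np

# Illustration: learning a single-index target with a two-layer ReLU net.
# The direction w_star is the "representation" the network must discover.
rng = np.random.default_rng(0)
d, n, m = 10, 2000, 32                     # input dim, samples, hidden width
w_star = np.zeros(d)
w_star[0] = 1.0                            # the unknown relevant direction

X = rng.normal(size=(n, d))
y = np.maximum(0.0, X @ w_star)            # single-index target

W = rng.normal(size=(m, d)) / np.sqrt(d)   # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / m    # fixed second layer

def mse(W):
    pred = np.maximum(0.0, X @ W.T) @ a
    return float(np.mean((pred - y) ** 2))

init_loss = mse(W)
lr = 0.5
for _ in range(300):                       # plain gradient descent on W
    H = X @ W.T                            # pre-activations, shape (n, m)
    err = (np.maximum(0.0, H) @ a - y) / n # residual, scaled by 1/n
    # chain rule through the ReLU indicator (H > 0)
    grad_W = 2 * ((H > 0) * np.outer(err, a)).T @ X
    W -= lr * grad_W
final_loss = mse(W)

print(init_loss, final_loss)
```

The point of the toy: the first-layer weights move toward w*, something no fixed feature map can do, which is the gap between the kernel regime and genuine feature learning.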

Finally, I will show how pretraining learns associations for in-context learning with transformers. This yields a systematic, mechanistic understanding of how transformers learn causal structures, including the celebrated induction head identified by Anthropic.
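For readers unfamiliar with induction heads, the following sketch captures the *behavior* such a head implements, per Anthropic's interpretability work (the function below is my hypothetical stand-in, not a transformer): on seeing token t, attend back to the position just after an earlier occurrence of t and copy the token found there, completing in-context patterns of the form [A][B] … [A] → [B].

```python
def induction_predict(tokens):
    """Predict the next token by copying the token that followed the
    most recent earlier occurrence of the last token; None if unseen."""
    last = tokens[-1]
    # scan backward for a previous occurrence of the current token
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]   # copy its successor
    return None

print(induction_predict(list("abcab")))  # earlier, 'b' was followed by 'c'
```

A trained transformer implements this with two attention layers (a previous-token head feeding an induction head) rather than an explicit scan, but the input-output behavior is the same.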
