
Plenary Speaker
Workshop: HiLD: High-dimensional Learning Dynamics Workshop

High-dimensional Optimization in the Age of ChatGPT, Sanjeev Arora

Sanjeev Arora


Abstract: The first half of the talk surveys recent results in analyzing optimization methods in deep learning, specifically how different update methods affect the quality of the solution produced. This study of the "implicit bias of training" has led to some quantification of the effects of normalization, weight decay, and learning rates in diverse architectures, as well as of the efficacy of local updates in a distributed environment ("Local SGD").
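For readers unfamiliar with the "local updates" idea, the following is a minimal sketch, not taken from the talk. It assumes a toy noiseless linear-regression problem split across four hypothetical workers, and it uses full local-batch gradients for simplicity where a real Local SGD run would sample minibatches. The names `local_sgd`, `X_parts`, and `y_parts` are illustrative only.

```python
import numpy as np

def local_sgd(X_parts, y_parts, rounds=50, local_steps=5, lr=0.05):
    """Each worker takes several local gradient steps, then models are averaged."""
    theta = np.zeros(X_parts[0].shape[1])
    for _ in range(rounds):
        local_models = []
        for X, y in zip(X_parts, y_parts):
            w = theta.copy()  # every worker starts the round from the shared model
            for _ in range(local_steps):
                # Gradient of the mean squared error on this worker's data only.
                grad = 2 * X.T @ (X @ w - y) / len(y)
                w -= lr * grad
            local_models.append(w)
        # Communication happens once per round: average the local models.
        theta = np.mean(local_models, axis=0)
    return theta

rng = np.random.default_rng(0)
true_theta = rng.standard_normal(4)
X_parts = [rng.standard_normal((32, 4)) for _ in range(4)]  # toy data, 4 workers
y_parts = [X @ true_theta for X in X_parts]

theta = local_sgd(X_parts, y_parts)
error = float(np.linalg.norm(theta - true_theta))
print(f"distance to true parameters: {error:.2e}")
```

The point of the design is that workers communicate once per round instead of once per step; how that trade-off affects the solution found is the kind of question the surveyed results address.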

The second half of the talk surveys the nascent field of applying optimization ideas to large AI models. The nature and size of these models lead to new phenomena that motivate new research directions, especially related to fine-tuning and in-context learning. Recent results highlight the power of a forward pass of today's large models. The first result shows that, in the context of fine-tuning, zeroth-order optimization (doable with forward passes only) can be competitive with the usual first-order optimization. The second result shows that in-context learning (i.e., learning during a single forward pass) can be more powerful than hitherto believed, since it allows the LLM's forward pass to even fine-tune a fairly large "baby transformer" hidden inside.
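To make "doable with forward passes only" concrete, here is a minimal sketch of one common zeroth-order scheme: a two-point estimate along a random direction, in the style of SPSA. This is an illustration under stated assumptions, not the method presented in the talk. The toy quadratic `loss` stands in for a model's forward pass, and `zo_step`, `lr`, and `eps` are hypothetical names and values.

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error of a linear model; stands in for a forward pass."""
    return float(np.mean((X @ theta - y) ** 2))

def zo_step(theta, X, y, rng, lr=0.01, eps=1e-3):
    """One zeroth-order update using two loss evaluations and no backpropagation."""
    z = rng.standard_normal(theta.shape)  # random probing direction
    # Central difference estimates the directional derivative of the loss along z.
    directional = (loss(theta + eps * z, X, y) - loss(theta - eps * z, X, y)) / (2 * eps)
    # Move against the estimated gradient, which is `directional * z`.
    return theta - lr * directional * z

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 5))  # toy data, assumed for illustration
y = X @ rng.standard_normal(5)

theta = np.zeros(5)
initial_loss = loss(theta, X, y)
for _ in range(2000):
    theta = zo_step(theta, X, y, rng)
final_loss = loss(theta, X, y)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3e}")
```

Each update needs only two evaluations of the loss, so no gradients or activations have to be stored; the price is a noisier update direction than a first-order step would give.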
