

Microsoft

Expo Talk Panel

CANCELED: Toward Stateless Training of LLMs: Breaking Memory Barriers Without Sacrificing Performance

Chao Ma

West Ballroom B
Sun 13 Jul 5 p.m. PDT — 6 p.m. PDT

Abstract:

Training foundation models such as LLMs demands immense computational resources, and memory constraints are a critical bottleneck. This challenge stems from optimization, the backbone of LLM training, which requires navigating non-convex, high-dimensional loss landscapes via noisy gradients, a task at which adaptive optimizers like Adam excel. However, Adam must track internal states that consume roughly twice as much memory as the model itself, a luxury many researchers and organizations cannot afford. Can we achieve state-of-the-art LLM optimization without this memory tax? In this talk, we demonstrate how combining insights from optimization and large-model dynamics helps address this challenge. We introduce a class of stateless optimization algorithms that eliminates the need to store optimizer states while achieving strong performance for LLM pretraining.
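
To make the "memory tax" concrete, the sketch below (not the speaker's algorithm) contrasts Adam, which keeps two per-parameter buffers (first- and second-moment estimates) each the size of the model, with a stateless sign-based update used here purely as an illustrative example of an optimizer that persists nothing between steps.

    # Minimal sketch: Adam's 2x optimizer-state memory vs. a stateless update.
    # The sign-based rule is an illustrative stand-in, not the talk's method.
    import numpy as np

    def adam_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """One Adam step; `state` holds the moment buffers m, v and step count t."""
        t = state["t"] + 1
        m = b1 * state["m"] + (1 - b1) * grads        # first-moment buffer, same size as model
        v = b2 * state["v"] + (1 - b2) * grads ** 2   # second-moment buffer, same size as model
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
        return params, {"m": m, "v": v, "t": t}

    def stateless_step(params, grads, lr=1e-3):
        """A stateless update: nothing is carried over between steps."""
        return params - lr * np.sign(grads)

    # Toy comparison of optimizer-state memory for a 1M-parameter "model".
    n = 1_000_000
    params = np.random.randn(n).astype(np.float32)
    adam_state = {"m": np.zeros_like(params), "v": np.zeros_like(params), "t": 0}
    state_bytes = adam_state["m"].nbytes + adam_state["v"].nbytes
    print(f"Adam optimizer state: {state_bytes / params.nbytes:.1f}x model memory")  # -> 2.0x
    print("Stateless optimizer state: 0 bytes")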
