Expo Talk Panel
CANCELED: Toward Stateless Training of LLMs: Breaking Memory Barriers Without Sacrificing Performance
Chao Ma
West Ballroom B
Training foundation models like LLMs demands immense computational resources, and memory constraints are a critical bottleneck. This challenge stems from optimization, the backbone of LLM training, which requires navigating non-convex, high-dimensional loss landscapes via noisy gradients, a task where adaptive optimizers like Adam excel. However, Adam must track internal states whose memory footprint is roughly twice the size of the model itself, a cost many researchers and organizations cannot afford. Can we achieve state-of-the-art LLM optimization without this memory tax? In this talk, we demonstrate how combining insights from optimization and large-model training dynamics helps address this challenge. We will introduce a class of stateless optimization algorithms that eliminate the need to store optimizer states while achieving strong performance on LLM pretraining.
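As a rough illustration of the memory tax described above (not the speaker's method), the minimal sketch below, assuming PyTorch, counts the per-parameter state that Adam allocates and compares it with a stateless baseline such as plain SGD; the toy model, layer sizes, and learning rates are hypothetical stand-ins.

```python
# Sketch: Adam keeps two moment tensors per parameter (~2x the model's
# memory), whereas a stateless update such as plain SGD stores nothing
# between steps. Toy sizes only; a real LLM would be far larger.
import torch

model = torch.nn.Linear(4096, 4096)  # stand-in for a much larger LLM
n_params = sum(p.numel() for p in model.parameters())

# Adam: first- and second-moment tensors are allocated lazily on the
# first optimizer step, one pair per parameter tensor.
adam = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = model(torch.randn(8, 4096)).sum()
loss.backward()
adam.step()
adam_state = sum(v.numel()
                 for s in adam.state.values()
                 for v in s.values() if torch.is_tensor(v))

# Plain SGD without momentum: stateless, no extra per-parameter tensors.
sgd = torch.optim.SGD(model.parameters(), lr=1e-2)

print(f"model parameters:     {n_params}")
print(f"Adam optimizer state: {adam_state}  (~2x the parameters)")
print(f"SGD optimizer state:  0")
```

Eliminating those moment tensors is what the abstract refers to as stateless optimization; the talk's actual algorithms are not shown here.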