Expo Talk Panel
CANCELED: Toward Stateless Training of LLMs: Breaking Memory Barriers Without Sacrificing Performance
Chao Ma
West Ballroom B
Training foundation models like LLMs demands immense computational resources, and memory constraints are a critical bottleneck. This challenge stems from optimization, the backbone of LLM training, which requires navigating non-convex, high-dimensional loss landscapes via noisy gradients, a task at which adaptive optimizers like Adam excel. However, Adam must track internal states that consume twice as much memory as the model itself, a luxury many researchers and organizations cannot afford. Can we achieve state-of-the-art LLM optimization without this memory tax? In this talk, we demonstrate how combining insights from optimization theory and large-model training dynamics helps address this challenge. We will introduce a class of stateless optimization algorithms that eliminate the need to store optimizer states while achieving strong performance for LLM pretraining.
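
The 2x memory overhead mentioned above comes from Adam's two per-parameter state buffers (first and second gradient moments), each the same size as the model. The sketch below illustrates this bookkeeping next to a stateless baseline; it is purely illustrative (plain NumPy, with sign-SGD standing in as a generic stateless update) and is not the algorithm presented in the talk.

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam carries two state buffers between steps, m and v, each the same
    # shape as the model: optimizer state alone costs 2x the model's memory.
    m = b1 * m + (1 - b1) * grads
    v = b2 * v + (1 - b2) * grads**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v              # m and v must persist to the next step

def stateless_step(params, grads, lr=1e-3):
    # A stateless update keeps no buffers between steps, so the extra
    # optimizer memory is zero. Sign-SGD is used here only as an example.
    return params - lr * np.sign(grads)

# Memory comparison for a toy "model" of n float32 parameters.
n = 1_000_000
params = np.zeros(n, dtype=np.float32)
m = np.zeros(n, dtype=np.float32)
v = np.zeros(n, dtype=np.float32)
adam_overhead = (m.nbytes + v.nbytes) / params.nbytes
print(adam_overhead)  # 2.0 -- Adam's state is twice the model size
```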