

Workshop

ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models

Tri Dao · Daniel Y Fu · Max Ryabinin · Daniel Hesslow · Simran Arora · Songlin Yang · Dan Biderman · Beidi Chen · Azalia Mirhoseini · Percy Liang

As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities, unlocking new AI applications. These new capabilities have led to a paradigm shift: large foundation models have become predominant in natural language processing and are growing increasingly common in computer vision, audio processing and even robotics. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.

In response to these challenges, diverse research directions have spawned promising works: (1) training and inference either at large scale or in resource-constrained scenarios (e.g., with higher network latency and lower bandwidth, in a collaborative manner across a fleet of contributed devices, or with a single GPU); (2) large-scale distributed training approaches, such as 3D parallelism and sharding; and (3) deep system optimizations, with custom languages such as TVM and Triton. These novel interdisciplinary research directions directly shape and impact the trajectory of research across machine learning.

Accordingly, these emerging lines of research are increasingly relevant to machine learning researchers. Indeed, researchers are key stakeholders: on the one hand, researchers may contribute algorithmic insights and novel methods for improving training and inference of large models (e.g., recent award-winning papers at ICML and NeurIPS); on the other hand, novel research findings may be best demonstrated at scale, which may require training models as efficiently as possible to make the best use of available resources.

The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. This is the third installment of the ES-FoMo workshop at ICML. This year, we are bringing further focus to two trends observed in 2024 and early 2025: (1) test-time compute, popularized by OpenAI o1 and DeepSeek R1, and (2) the emergence of new modeling paradigms and modalities such as real-time video and decentralized training. We look forward to continuing to grow this community at ICML 2025.
