Workshop
ES-FoMo: Efficient Systems for Foundation Models
Julien Launay · Daniel Y Fu · Tri Dao · Daniel Hesslow · Beidi Chen · Azalia Mirhoseini · Percy Liang
Ballroom A
Sat 29 Jul, 11:55 a.m. PDT
As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.
Machine learning practitioners are key stakeholders here: on the one hand, researchers may contribute algorithmic insights and novel methods to improving training and inference of large models; on the other hand, novel research findings may be best demonstrated at scale—which may require training models as efficiently as possible to make the best use of available resources.
The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. We welcome submissions on training and inference systems and algorithms for foundation models, whether focused on scaling up or on reducing compute, time, memory, bandwidth, and energy requirements. Notably, we encourage submissions concerning the entire spectrum of foundation models, from BERT-sized Transformers to large models with 100B+ parameters. Topics include but are not limited to:
* Training and inference systems, either distributed at large scale or in resource-constrained scenarios;
* Algorithms for improved training and inference efficiency;
* Systems for foundation models, such as novel programming languages or compilers.
Schedule
Sat 11:55 a.m. - 12:00 p.m. | 🤗 Welcome and opening remarks (Opening)
Sat 12:00 p.m. - 12:01 p.m. | 🔥 Session I: Large-Scale Distributed Pretraining (Invited Talks)
Sat 12:01 p.m. - 12:20 p.m. | Using Megatron to Train Large Language Models (Deepak Narayanan, Microsoft Research) (Invited Talk)
Sat 12:20 p.m. - 12:40 p.m. | Distributed Systems for Decentralized AI (Ce Zhang, ETH/Together) (Invited Talk)
Sat 12:40 p.m. - 1:00 p.m. | Training Large Language Models on Cerebras Wafer-Scale Clusters (Natalia Vassilieva, Cerebras) (Invited Talk)
Sat 1:10 p.m. - 1:25 p.m. | ☕️ Coffee break
Sat 1:25 p.m. - 1:40 p.m. | 🎤 SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores (Oral) | Zhiyu Mei · Wei Fu · Guangju Wang · Huanchen Zhang · Yi Wu
Sat 1:40 p.m. - 1:55 p.m. | 🎤 Fine-Tuning Language Models with Just Forward Passes (Oral) | Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora
Sat 1:55 p.m. - 1:56 p.m. | 🚀 Session II: Efficient Inference (Invited Talks)
Sat 1:56 p.m. - 2:25 p.m. | The Case for 4-bit Inference (Tim Dettmers, University of Washington) (Invited Talk)
Sat 2:25 p.m. - 2:55 p.m. | Efficiently Scaling Transformer Inference (Aakanksha Chowdhery, Google Research) (Invited Talk)
Sat 2:55 p.m. - 3:10 p.m. | 🎤 Memory-Efficient Selective Fine-Tuning (Oral) | Antoine Simoulin · Namyong Park · Xiaoyi Liu · Grey Yang
Sat 3:10 p.m. - 4:00 p.m. | 🍱 Lunch break
Sat 4:00 p.m. - 5:15 p.m. | 🧑🎓 Poster Session
Sat 5:15 p.m. - 6:15 p.m. | 💬 Panel: Large Language Models Tooling Across Industry and Academia (Panel)
Sat 6:15 p.m. - 6:30 p.m. | ☕️ Coffee break
Sat 6:30 p.m. - 6:45 p.m. | 🎤 Fast Causal Attention with Dynamic Sparsity (Oral) | Daniele Paliotta · Matteo Pagliardini · Martin Jaggi · François Fleuret
Sat 6:45 p.m. - 6:46 p.m. | ⚙️ Session III: Deep Optimisation (Invited Talks)
Sat 6:46 p.m. - 7:15 p.m. | PyTorch 2.x: Faster, More Pythonic, and as Dynamic as Ever (Natalia Gimelshein, OpenAI) (Invited Talk)
Sat 7:15 p.m. - 7:45 p.m. | High-Performance Kernel Programming with Triton (Philippe Tillet, OpenAI) (Invited Talk)
Sat 7:45 p.m. - 8:00 p.m. | 🏅 Best Paper Award (Awards)
Sat 9:00 p.m. - 12:00 a.m. | 🎉 Post-Workshop Happy Hour (sponsored by Together) (Party)
Mental Calibration: Discovering and Adjusting for Latent Factors Improves Zero-Shot Inference of CLIP (Poster) | Bang An · Sicheng Zhu · Michael-Andrei Panaitescu-Liess · Chaithanya Kumar Mummadi · Furong Huang
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding (Poster) | Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee
Generating Efficient Kernels for Quantized Inference on Large Language Models (Poster) | Tommaso Pegolotti · Elias Frantar · Dan Alistarh · Markus Püschel
SpeedLimit: Neural Architecture Search for Quantized Transformer Models (Poster) | Luke Bailey · Yuji Chai · Yunho Jin · Glenn Ko · Matthew Karle
A Comprehensive Analysis of Adapter Efficiency (Poster) | Nandini Mundra · Sumanth Doddapaneni · Raj Dabre · Anoop Kunchukuttan · Ratish Puduppully · Mitesh Khapra
Less is More: Using Multiple LLMs for Applications with Lower Costs (Poster) | Lingjiao Chen · Matei Zaharia · James Zou
Blockwise Parallel Transformer for Long Context Large Models (Poster) | Hao Liu · Pieter Abbeel
SuperShaper: A Pre-Training Approach for Discovering Efficient Transformer Shapes (Poster) | Vinod Ganesan · Gowtham Ramesh · Pratyush Kumar · Raj Dabre
Continual Pre-Training of Large Language Models: How to re-warm your model? (Poster) | Kshitij Gupta · Benjamin Thérien · Adam Ibrahim · Mats Richter · Quentin Anthony · Eugene Belilovsky · Timothée Lesort · Irina Rish
Implementing block-sparse matrix multiplication kernels using Triton (Poster) | Priya Mishra · Trevor Gale · Matei Zaharia · Cliff Young · Deepak Narayanan
Looped Transformers are Better at Learning Learning Algorithms (Poster) | Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
Accelerating LLM Inference with Staged Speculative Decoding (Poster) | Benjamin F Spector · Christopher Re
Test-Time Training for Speech (Poster) | Sri Harsha Dumpala · Chandramouli Shama Sastry · Sageev Oore
Towards Efficient World Models (Poster) | Eloi Alonso · Vincent Micheli · François Fleuret
The Framework Tax: Disparities Between Inference Efficiency in Research and Deployment (Poster) | Jared Fernandez · Jacob Kahn · Clara Na · Yonatan Bisk · Emma Strubell
Towards Structured Sparsity in Transformers for Efficient Inference (Poster) | Harry Dong · Beidi Chen · Yuejie Chi
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (Poster) | Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Re · Clark Barrett · Zhangyang “Atlas” Wang · Beidi Chen
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs (Poster) | Or Sharir · Anima Anandkumar
Compositional Interfaces for Compositional Generalization (Poster) | Jelena Luketina · Jack Lanchantin · Sainbayar Sukhbaatar · Arthur Szlam
ZipLM: Inference-Aware Structured Pruning of Language Models (Poster) | Eldar Kurtic · Elias Frantar · Dan Alistarh
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training (Poster) | Hong Liu · Zhiyuan Li · David Hall · Percy Liang · Tengyu Ma
RapidBERT: How to Train BERT with a Lunch Money Budget (Poster) | Alexander Trott · Jacob Portes · Sam Havens · DANIEL KING · Abhinav Venigalla · Moin Nadeem · Nikhil Sardana · Daya Khudia · Jonathan Frankle
UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model (Poster) | Youngjo Min · Kwangrok Ryoo · Bumsoo Kim · Taesup Kim
Cramming: Training a Language Model on a single GPU in one day (Poster) | Jonas Geiping · Tom Goldstein
SpecTr: Fast Speculative Decoding via Optimal Transport (Poster) | Ziteng Sun · Ananda Suresh · Jae Ro · Ahmad Beirami · Himanshu Jain · Felix Xinnan Yu · Michael Riley · Sanjiv Kumar
Landmark Attention: Random-Access Infinite Context Length for Transformers (Poster) | Amirkeivan Mohtashami · Martin Jaggi
Dissecting Efficient Architectures for Wake-Word Detection (Poster) | Cody Berger · Juncheng Li · Yiyuan Li · Aaron Berger · Dmitri Berger · Karthik Ganesan · Emma Strubell · Florian Metze
MRMP: Multi-Rate Magnitude Pruning of Graph Convolutional Networks (Poster) | Hichem Sahbi
Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime (Poster) | Don Kurian Dennis · Abhishek Shetty · Anish Sevekari · Kazuhito Koishida · Virginia Smith
Language Models are Weak Learners (Poster) | Hariharan Manikandan · Yiding Jiang · Zico Kolter
On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets (Poster) | Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel
Training Diffusion Models with Reinforcement Learning (Poster) | Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine
GPT-Zip: Deep Compression of Finetuned Large Language Models (Poster) | Berivan Isik · Hermann Kumbong · Wanyi Ning · Xiaozhe Yao · Sanmi Koyejo · Ce Zhang
Reverse Distillation: Training Billion Parameter Models For CTR Prediction (Poster) | Aditya Anantharaman · Aashiq Muhamed · Hemant Pugaliya · Chong Wang · Sujan Perera · Zhen Ge · qingjun cui · Belinda Zeng · Trishul Chilimbi
A Simple and Effective Pruning Approach for Large Language Models (Poster) | Mingjie Sun · Zhuang Liu · Anna Bair · Zico Kolter
Incremental Low-Rank Learning (Poster) | Jiawei Zhao · Yifei Zhang · Beidi Chen · Florian Schaefer · Anima Anandkumar
Deep Fusion: Efficient Network Training via Pre-trained Initializations (Poster) | Hanna Mazzawi · Xavi Gonzalvo · Michael Wunder
ROSA: Random Orthogonal Subspace Adaptation (Poster) | Marawan Gamal · Guillaume Rabusseau
Towards Fair Knowledge Distillation using Student Feedback (Poster) | Abhinav Java · Surgan Jandial · Chirag Agarwal
Audio-Journey: Efficient Visual+LLM-aided Audio Encodec Diffusion (Poster) | Juncheng Li · Jackson Michaels · Laura Yao · Lijun Yu · Zach Wood-Doughty · Florian Metze
Semi-supervised Tabular Classification via In-context Learning of Large Language Models (Poster) | Jaehyun Nam · Woomin Song · Seong Hyeon Park · Jihoon Tack · Sukmin Yun · Jaehyung Kim · Jinwoo Shin
BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation (Poster) | Bo-Kyeong Kim · Hyoung-Kyu Song · Thibault Castells · Shinkook Choi
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection (Poster) | Yu Bai · Fan Chen · Huan Wang · Caiming Xiong · Song Mei
Can Public Large Language Models Help Private Cross-device Federated Learning? (Poster) | Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer
Reasoning Ability Emerges in Large Language Models as Aggregation of Reasoning Paths (Poster) | Xinyi Wang · William Wang
Learned Thresholds Token Merging and Pruning for Vision Transformers (Poster) | Maxim Bonnaerens · Joni Dambre
Three Towers: Flexible Contrastive Learning with Pretrained Image Models (Poster) | Jannik Kossen · Mark Collier · Basil Mustafa · Xiao Wang · Xiaohua Zhai · Lucas Beyer · Andreas Steiner · Jesse Berent · Rodolphe Jenatton · Efi Kokiopoulou
ViT Graph Head Attention for Small Sized Datasets (Poster) | HyeongJin Kim · GyungHyun Lee · Byoung Chul Ko
A Closer Look at In-Context Learning under Distribution Shifts (Poster) | Kartik Ahuja · David Lopez-Paz
Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning (Poster) | Xinyi Wang · Wanrong Zhu · Michael Saxon · Mark Steyvers · William Wang
Constant Memory Attention Block (Poster) | Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Yoshua Bengio · Mohamed Osama Ahmed
Sequence Parallelism: Long Sequence Training from System Perspective (Poster) | Shenggui Li · Fuzhao Xue · Chaitanya Baranwal · Yongbin Li · Yang You
Scaling In-Context Demonstrations with Structured Attention (Poster) | Tianle Cai · Kaixuan Huang · Jason Lee · Mengdi Wang · Danqi Chen