Workshop
ES-FoMo: Efficient Systems for Foundation Models
Julien Launay · Daniel Y Fu · Tri Dao · Daniel Hesslow · Beidi Chen · Azalia Mirhoseini · Percy Liang
Ballroom A
Sat 29 Jul, 11:55 a.m. PDT
As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.
Machine learning practitioners are key stakeholders here: on the one hand, researchers may contribute algorithmic insights and novel methods to improving training and inference of large models; on the other hand, novel research findings may be best demonstrated at scale—which may require training models as efficiently as possible to make the best use of available resources.
The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. We welcome submissions on training and inference systems and algorithms for foundation models, whether focused on scaling up or on reducing compute, time, memory, bandwidth, and energy requirements. Notably, we encourage submissions concerning the entire spectrum of foundation models, from BERT-sized Transformers to large models with 100B+ parameters. Topics include but are not limited to:
* Training and inference systems, either distributed at large scale or in resource-constrained scenarios;
* Algorithms for improved training and inference efficiency;
* Systems for foundation models, such as novel programming languages or compilers.
Schedule
Sat 11:55 a.m. - 12:00 p.m. | 🤗 Welcome and opening remarks (Opening)
Sat 12:00 p.m. - 12:01 p.m. | 🔥 Session I: Large-Scale Distributed Pretraining (Invited Talks)
Sat 12:01 p.m. - 12:20 p.m. | Using Megatron to Train Large Language Models (Deepak Narayanan, Microsoft Research) (Invited Talk)
Sat 12:20 p.m. - 12:40 p.m. | Distributed Systems for Decentralized AI (Ce Zhang, ETH/Together) (Invited Talk)
Sat 12:40 p.m. - 1:00 p.m. | Training Large Language Models on Cerebras Wafer-Scale Clusters (Natalia Vassilieva, Cerebras) (Invited Talk)
Sat 1:10 p.m. - 1:25 p.m. | ☕️ Coffee break
Sat 1:25 p.m. - 1:40 p.m. | 🎤 SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores (Oral) | Zhiyu Mei · Wei Fu · Guangju Wang · Huanchen Zhang · Yi Wu
Sat 1:40 p.m. - 1:55 p.m. | 🎤 Fine-Tuning Language Models with Just Forward Passes (Oral) | Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora
Sat 1:55 p.m. - 1:56 p.m. | 🚀 Session II: Efficient Inference (Invited Talks)
Sat 1:56 p.m. - 2:25 p.m. | The Case for 4-bit Inference (Tim Dettmers, University of Washington) (Invited Talk)
Sat 2:25 p.m. - 2:55 p.m. | Efficiently Scaling Transformer Inference (Aakanksha Chowdhery, Google Research) (Invited Talk)
Sat 2:55 p.m. - 3:10 p.m. | 🎤 Memory-Efficient Selective Fine-Tuning (Oral) | Antoine Simoulin · Namyong Park · Xiaoyi Liu · Grey Yang
Sat 3:10 p.m. - 4:00 p.m. | 🍱 Lunch break
Sat 4:00 p.m. - 5:15 p.m. | 🧑🎓 Poster Session
Sat 5:15 p.m. - 6:15 p.m. | 💬 Panel: Large Language Models Tooling Across Industry and Academia (Panel)
Sat 6:15 p.m. - 6:30 p.m. | ☕️ Coffee break
Sat 6:30 p.m. - 6:45 p.m. | 🎤 Fast Causal Attention with Dynamic Sparsity (Oral) | Daniele Paliotta · Matteo Pagliardini · Martin Jaggi · François Fleuret
Sat 6:45 p.m. - 6:46 p.m. | ⚙️ Session III: Deep Optimisation (Invited Talks)
Sat 6:46 p.m. - 7:15 p.m. | PyTorch 2.x: Faster, More Pythonic, and as Dynamic as Ever (Natalia Gimelshein, OpenAI) (Invited Talk)
Sat 7:15 p.m. - 7:45 p.m. | High-Performance Kernel Programming with Triton (Philippe Tillet, OpenAI) (Invited Talk)
Sat 7:45 p.m. - 8:00 p.m. | 🏅 Best Paper Award (Awards)
Sat 9:00 p.m. - 12:00 a.m. | 🎉 Post-Workshop Happy Hour (sponsored by Together) (Party)
Mental Calibration: Discovering and Adjusting for Latent Factors Improves Zero-Shot Inference of CLIP (Poster) | Bang An · Sicheng Zhu · Michael-Andrei Panaitescu-Liess · Chaithanya Kumar Mummadi · Furong Huang
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding (Poster) | Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee
Generating Efficient Kernels for Quantized Inference on Large Language Models (Poster) | Tommaso Pegolotti · Elias Frantar · Dan Alistarh · Markus Püschel
SpeedLimit: Neural Architecture Search for Quantized Transformer Models (Poster) | Luke Bailey · Yuji Chai · Yunho Jin · Glenn Ko · Matthew Karle
A Comprehensive Analysis of Adapter Efficiency (Poster) | Nandini Mundra · Sumanth Doddapaneni · Raj Dabre · Anoop Kunchukuttan · Ratish Puduppully · Mitesh Khapra
Less is More: Using Multiple LLMs for Applications with Lower Costs (Poster) | Lingjiao Chen · Matei Zaharia · James Zou
Blockwise Parallel Transformer for Long Context Large Models (Poster) | Hao Liu · Pieter Abbeel
SuperShaper: A Pre-Training Approach for Discovering Efficient Transformer Shapes (Poster) | Vinod Ganesan · Gowtham Ramesh · Pratyush Kumar · Raj Dabre
Continual Pre-Training of Large Language Models: How to re-warm your model? (Poster) | Kshitij Gupta · Benjamin Thérien · Adam Ibrahim · Mats Richter · Quentin Anthony · Eugene Belilovsky · Timothée Lesort · Irina Rish
Implementing block-sparse matrix multiplication kernels using Triton (Poster) | Priya Mishra · Trevor Gale · Matei Zaharia · Cliff Young · Deepak Narayanan
Looped Transformers are Better at Learning Learning Algorithms (Poster) | Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
Accelerating LLM Inference with Staged Speculative Decoding (Poster) | Benjamin F Spector · Christopher Re
Test-Time Training for Speech (Poster) | Sri Harsha Dumpala · Chandramouli Shama Sastry · Sageev Oore
Towards Efficient World Models (Poster) | Eloi Alonso · Vincent Micheli · François Fleuret
The Framework Tax: Disparities Between Inference Efficiency in Research and Deployment (Poster) | Jared Fernandez · Jacob Kahn · Clara Na · Yonatan Bisk · Emma Strubell
Towards Structured Sparsity in Transformers for Efficient Inference (Poster) | Harry Dong · Beidi Chen · Yuejie Chi
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (Poster) | Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Re · Clark Barrett · Zhangyang “Atlas” Wang · Beidi Chen
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs (Poster) | Or Sharir · Anima Anandkumar
Compositional Interfaces for Compositional Generalization (Poster) | Jelena Luketina · Jack Lanchantin · Sainbayar Sukhbaatar · Arthur Szlam
ZipLM: Inference-Aware Structured Pruning of Language Models (Poster) | Eldar Kurtic · Elias Frantar · Dan Alistarh
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training (Poster) | Hong Liu · Zhiyuan Li · David Hall · Percy Liang · Tengyu Ma
RapidBERT: How to Train BERT with a Lunch Money Budget (Poster) | Alexander Trott · Jacob Portes · Sam Havens · DANIEL KING · Abhinav Venigalla · Moin Nadeem · Nikhil Sardana · Daya Khudia · Jonathan Frankle
UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model (Poster) | Youngjo Min · Kwangrok Ryoo · Bumsoo Kim · Taesup Kim
Cramming: Training a Language Model on a single GPU in one day (Poster) | Jonas Geiping · Tom Goldstein
SpecTr: Fast Speculative Decoding via Optimal Transport (Poster) | Ziteng Sun · Ananda Suresh · Jae Ro · Ahmad Beirami · Himanshu Jain · Felix Xinnan Yu · Michael Riley · Sanjiv Kumar
Landmark Attention: Random-Access Infinite Context Length for Transformers (Poster) | Amirkeivan Mohtashami · Martin Jaggi
Dissecting Efficient Architectures for Wake-Word Detection (Poster) | Cody Berger · Juncheng Li · Yiyuan Li · Aaron Berger · Dmitri Berger · Karthik Ganesan · Emma Strubell · Florian Metze
MRMP: Multi-Rate Magnitude Pruning of Graph Convolutional Networks (Poster) | Hichem Sahbi
Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime (Poster) | Don Kurian Dennis · Abhishek Shetty · Anish Sevekari · Kazuhito Koishida · Virginia Smith
Language Models are Weak Learners (Poster) | Hariharan Manikandan · Yiding Jiang · Zico Kolter
On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets (Poster) | Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel
Training Diffusion Models with Reinforcement Learning (Poster) | Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine
GPT-Zip: Deep Compression of Finetuned Large Language Models (Poster) | Berivan Isik · Hermann Kumbong · Wanyi Ning · Xiaozhe Yao · Sanmi Koyejo · Ce Zhang
Reverse Distillation: Training Billion Parameter Models For CTR Prediction (Poster) | Aditya Anantharaman · Aashiq Muhamed · Hemant Pugaliya · Chong Wang · Sujan Perera · Zhen Ge · qingjun cui · Belinda Zeng · Trishul Chilimbi
A Simple and Effective Pruning Approach for Large Language Models (Poster) | Mingjie Sun · Zhuang Liu · Anna Bair · Zico Kolter
Incremental Low-Rank Learning (Poster) | Jiawei Zhao · Yifei Zhang · Beidi Chen · Florian Schaefer · Anima Anandkumar
Deep Fusion: Efficient Network Training via Pre-trained Initializations (Poster) | Hanna Mazzawi · Xavi Gonzalvo · Michael Wunder
ROSA: Random Orthogonal Subspace Adaptation (Poster) | Marawan Gamal · Guillaume Rabusseau
Towards Fair Knowledge Distillation using Student Feedback (Poster) | Abhinav Java · Surgan Jandial · Chirag Agarwal
Audio-Journey: Efficient Visual+LLM-aided Audio Encodec Diffusion (Poster) | Juncheng Li · Jackson Michaels · Laura Yao · Lijun Yu · Zach Wood-Doughty · Florian Metze
Semi-supervised Tabular Classification via In-context Learning of Large Language Models (Poster) | Jaehyun Nam · Woomin Song · Seong Hyeon Park · Jihoon Tack · Sukmin Yun · Jaehyung Kim · Jinwoo Shin
BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation (Poster) | Bo-Kyeong Kim · Hyoung-Kyu Song · Thibault Castells · Shinkook Choi
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection (Poster) | Yu Bai · Fan Chen · Huan Wang · Caiming Xiong · Song Mei
Can Public Large Language Models Help Private Cross-device Federated Learning? (Poster) | Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer
Reasoning Ability Emerges in Large Language Models as Aggregation of Reasoning Paths (Poster) | Xinyi Wang · William Wang
Learned Thresholds Token Merging and Pruning for Vision Transformers (Poster) | Maxim Bonnaerens · Joni Dambre
Three Towers: Flexible Contrastive Learning with Pretrained Image Models (Poster) | Jannik Kossen · Mark Collier · Basil Mustafa · Xiao Wang · Xiaohua Zhai · Lucas Beyer · Andreas Steiner · Jesse Berent · Rodolphe Jenatton · Efi Kokiopoulou
ViT Graph Head Attention for Small Sized Datasets (Poster) | HyeongJin Kim · GyungHyun Lee · Byoung Chul Ko
A Closer Look at In-Context Learning under Distribution Shifts (Poster) | Kartik Ahuja · David Lopez-Paz
Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning (Poster) | Xinyi Wang · Wanrong Zhu · Michael Saxon · Mark Steyvers · William Wang
Constant Memory Attention Block (Poster) | Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Yoshua Bengio · Mohamed Osama Ahmed
Sequence Parallelism: Long Sequence Training from System Perspective (Poster) | Shenggui Li · Fuzhao Xue · Chaitanya Baranwal · Yongbin Li · Yang You
Scaling In-Context Demonstrations with Structured Attention (Poster) | Tianle Cai · Kaixuan Huang · Jason Lee · Mengdi Wang · Danqi Chen