Workshop
2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)
Julia Gusak · Jean Kossaifi · Alena Shilova · Rocco Sedona · Jan Kautz
Join HPC and AI experts to learn how to train neural networks at an unprecedented scale with your existing infrastructure.
Timezone: America/Los_Angeles
Schedule
Sat 12:00 a.m. - 12:00 a.m. | Coffee & Poster placement
Sat 12:00 a.m. - 12:10 a.m. | Welcome speech from organizers (Talk) | Julia Gusak
Sat 12:10 a.m. - 12:40 a.m. | Online Training from Numerical Simulations (Invited Talk) | Bruno Raffin
Sat 12:40 a.m. - 1:10 a.m. | Making device-agnostic ML training and inference easy at scale (Invited Talk) | Zach Mueller
Sat 1:10 a.m. - 1:30 a.m. | Contributed Talks (Contributed Talk)
Sat 1:30 a.m. - 2:30 a.m. | Poster session (Poster session)
Sat 2:30 a.m. - 3:00 a.m. | Enabling extremely fast inference and training performance using dataflow and custom chip (Invited Talk) | Urmish Thakker
Sat 3:00 a.m. - 3:30 a.m. | Contributed Talks (Contributed Talk)
Sat 3:30 a.m. - 4:30 a.m. | Lunch
Sat 4:30 a.m. - 5:00 a.m. | Poster session (Poster session)
Sat 5:00 a.m. - 5:30 a.m. | Structured matrices for memory-efficient training and finetuning (Invited Talk) | Beidi Chen
Sat 5:30 a.m. - 6:00 a.m. | Architecting and deploying compute clusters for large language models (Invited Talk) | Adam DeConinck
Sat 6:00 a.m. - 6:20 a.m. | Contributed Talks (Contributed Talk)
Sat 6:20 a.m. - 6:30 a.m. | Best Paper Awards (Talk) | Jean Kossaifi
Sat 6:30 a.m. - 7:00 a.m. | Coffee & Poster session (Poster session)
Sat 7:00 a.m. - 7:50 a.m. | Panel discussion (Panel) | Adam DeConinck · Zach Mueller · Bruno Raffin · Max Ryabinin · Julia Gusak
Sat 7:50 a.m. - 8:00 a.m. | Closing remarks (Talk)
Optimistic Asynchrony Control: Achieving Synchronous Convergence With Asynchronous Throughput for Embedding Model Training (Poster) | Roger Waleffe
An Analytical Approach to Enhancing DNN Efficiency and Accuracy Using Approximate Multiplication (Poster) | Salar Shakibhamedan · Anice Jahanjoo · Amin Aminifar · Nima Amirafshar · Nima TaheriNejad · Axel Jantsch
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel (Poster) | J. Wang · Jan Ebert · Oleg Filatov · Stefan Kesselheim
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity (Poster) | Wentao Guo · Jikai Long · Yimeng Zeng · Zirui Liu · Xinyu Yang · Yide Ran · Jacob Gardner · Osbert Bastani · Chris De Sa · Xiaodong Yu · Beidi Chen · Zhaozhuo Xu
DiLoCo: Distributed Low-Communication Training of Language Models (Poster) | Arthur Douillard · Qixuan Feng · Andrei Rusu · Rachita Chhaparia · Yani Donchev · Adhiguna Kuncoro · Marc'Aurelio Ranzato · Arthur Szlam · Jiajun Shen
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values through Differentiable Bayesian Gates (Poster) | Cristian Meo · Ksenia Sycheva · Anirudh Goyal · Justin Dauwels
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (Poster) | Zhengqing Yuan · Zhaoxu Li · Weiran Huang · Yanfang Ye · Lichao Sun
Multi-objective Differentiable Neural Architecture Search (Poster) | Rhea Sukthanker · Arber Zela · Benedikt Staffler · Samuel Dooley · Josif Grabocka · Frank Hutter
Resource-constrained Neural Architecture Search on Language Models: A Case Study (Poster) | Andreas Paraskeva · Joao Reis · Suzan Verberne · Jan Rijn
SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models (Poster) | Zhaoxu Luo · Bowen Song · Liyue Shen
Accelerating Best-of-N via Speculative Rejection (Poster) | Ruiqi Zhang · Momin Haider · Ming Yin · Jiahao Qiu · Mengdi Wang · Peter Bartlett · Andrea Zanette
Adaptive Model Pruning in Federated Learning through Loss Exploration (Poster) | Christian Internò · Elena Raponi · Niki van Stein · Thomas Bäck · Markus Olhofer · Yaochu Jin · Barbara Hammer
Single Train Multi Deploy on Topology Search Spaces using Kshot-Hypernet (Poster) | Jingyue Zhuge · Christian Mayr · Anand Subramoney · David Kappel
Enhancing Fine-grained Multi-modal Alignment via Adapters: A Parameter-Efficient Training Framework for Referring Image Segmentation (Poster) | Zunnan Xu · Jiaqi Huang · Ting Liu · Yong Liu · Haonan Han · Kehong Yuan · Xiu Li
Liouna: Biologically Plausible Learning for Efficient Pre-Training of Transferrable Deep Models (Poster) | Fady Rezk · Antreas Antoniou · Henry Gouk · Timothy Hospedales
Boolean Logic for Low-Energy Deep Learning (Poster) | Van Minh Nguyen · Cristian Ocampo · Aymen Askri · Ba-Hien Tran
Class-aware Initialization of Early Exits for Pre-training Large Language Models (Poster) | Alperen Gormez · Erdem Koyuncu
Communication Efficient Federated Learning with Differentiated Aggregation (Poster) | Peyman Gholami · Hulya Seferoglu
Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough (Poster) | Konstantin Dobler · Gerard de Melo
MoReDrop: Dropout without Dropping (Poster) | Li Jiang · Duo Li · Yichuan Ding · Xue Liu · Victor Chan
ECO: Efficient Computational Optimization for Exact Machine Unlearning in Deep Neural Networks (Poster) | Yu-Ting Huang · Pei-Yuan Wu · Chuan-Ju Wang
DrJAX: Scalable and Differentiable MapReduce Primitives in JAX (Poster) | J K Rush · Zachary Charles · Zachary Garrett · Sean Augenstein · Nicole Mitchell
Variational Stochastic Gradient Descent for Deep Neural Networks (Poster) | Haotian Chen · Anna Kuzina · Babak Esmaeili · Jakub Tomczak
Coarse-to-Fine Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition (Poster) | Hichem Sahbi
Towards Efficient and Scalable Training of Differentially Private Deep Learning (Poster) | Sebastian Rodriguez Beltran · Marlon Tobaben · Niki Loppi · Antti Honkela
Lowering PyTorch's Memory Consumption for Selective Differentiation (Poster) | Samarth Bhatia · Felix Dangel
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies (Poster) | Brian Bartoldson · James Diffenderfer · Konstantinos Parasyris · Bhavya Kailkhura
DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity (Poster) | Baekrok Shin · Junsoo Oh · Hanseul Cho · Chulhee Yun
Efficient Adaptive Federated Optimization (Poster) | Su Hyeong Lee · Sidharth Sharma · Manzil Zaheer · Tian Li
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives (Poster) | Huanrui Yang · Yafeng Huang · Zhen Dong · Denis Gudovskiy · Tomoyuki Okuno · Yohei Nakata · Yuan Du · Kurt Keutzer · Shanghang Zhang
Efficient Document Ranking with Learnable Late Interactions (Poster) | Himanshu Jain · Ziwei Ji · Ankit Singh Rawat · Andreas Veit · Sadeep Jayasumana · Sashank J. Reddi · Aditya Menon · Felix Xinnan Yu
Effective Layer Pruning Through Similarity Metric Perspective (Poster) | Ian Pons · Bruno L. Yamamoto · Anna Reali · Artur Jordao Lima Correia
Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance (Poster) | Mika Jain · Stefanie Jegelka · Ishani Karmarkar · Luana Ruiz · Ellen Vitercik
Scalify: scale propagation for efficient low-precision LLM training (Poster) | Paul Balanca · Sam Hosegood · Carlo Luschi · Andrew Fitzgibbon
u-μP: The Unit-Scaled Maximal Update Parametrization (Poster) | Charlie Blake · Constantin Eichenberg · Josef Dean · Lukas Balles · Luke Prince · Björn Deiseroth · Andres Felipe Cruz Salinas · Carlo Luschi · Samuel Weinbach · Douglas Orr
Resolving Discrepancies in Compute-Optimal Scaling of Language Models (Contributed Talk & Poster) | Tomer Porian · Mitchell Wortsman · Jenia Jitsev · Ludwig Schmidt · Yair Carmon
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs (Contributed Talk & Poster) | Ashwinee Panda · Berivan Isik · Xiangyu Qi · Sanmi Koyejo · Tsachy Weissman · Prateek Mittal
Asynchronous Local-SGD Training for Language Modeling (Contributed Talk & Poster) | Bo Liu · Rachita Chhaparia · Arthur Douillard · Satyen Kale · Andrei Rusu · Jiajun Shen · Arthur Szlam · Marc'Aurelio Ranzato
AdaMeM: Memory Efficient Momentum for Adafactor (Contributed Talk & Poster) | Nikhil Vyas · Depen Morwani · Sham Kakade
Can LLMs Enhance Performance Prediction for Deep Learning Models? (Contributed Talk & Poster) | Karthick Panner Selvam · Phitchaya Phothilimthana · Sami Abu-El-Haija · Bryan Perozzi · Mats Brorsson
LoQT: Low Rank Adapters for Quantized Training (Contributed Talk & Poster) | Sebastian Loeschcke · Mads Toftrup · Michael Kastoryano · Serge Belongie · Vésteinn Snæbjarnarson
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors (Contributed Talk & Poster) | Vijay Lingam · Atula Tejaswi · Aditya Vavre · Aneesh Shetty · Gautham Krishna Gudur · Joydeep Ghosh · Eunsol Choi · Alexandros Dimakis · Aleksandar Bojchevski · Sujay Sanghavi