Workshop
2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)
Julia Gusak · Jean Kossaifi · Alena Shilova · Rocco Sedona · Jan Kautz
Hall A1
Sat 27 Jul, midnight PDT
Join HPC and AI experts to learn how to train neural networks at an unprecedented scale with your existing infrastructure.
Timezone: America/Los_Angeles
Schedule
Sat 12:00 a.m. - 12:00 a.m. | Coffee & Poster placement
Sat 12:00 a.m. - 12:10 a.m. | Welcome speech from organizers ( Talk ) | Julia Gusak
Sat 12:10 a.m. - 12:40 a.m. | Online Training from Numerical Simulations ( Invited Talk ) | Bruno Raffin
Sat 12:40 a.m. - 1:10 a.m. | Making device-agnostic ML training and inference easy at scale ( Invited Talk ) | Zach Mueller
Sat 1:10 a.m. - 1:30 a.m. | Contributed Talks ( Contributed Talk )
Sat 1:30 a.m. - 2:30 a.m. | Poster session ( Poster session )
Sat 2:30 a.m. - 3:00 a.m. | Enabling extremely fast inference and training performance using dataflow and custom chip ( Invited Talk ) | Urmish Thakker
Sat 3:00 a.m. - 3:30 a.m. | Contributed Talks ( Contributed Talk )
Sat 3:30 a.m. - 4:30 a.m. | Lunch
Sat 4:30 a.m. - 5:00 a.m. | Poster session ( Poster session )
Sat 5:00 a.m. - 5:30 a.m. | Structured matrices for memory-efficient training and finetuning ( Invited Talk ) | Beidi Chen
Sat 5:30 a.m. - 6:00 a.m. | Architecting and deploying compute clusters for large language models ( Invited Talk ) | Adam DeConinck
Sat 6:00 a.m. - 6:20 a.m. | Contributed Talks ( Contributed Talk )
Sat 6:20 a.m. - 6:30 a.m. | Best Paper Awards ( Talk ) | Jean Kossaifi
Sat 6:30 a.m. - 7:00 a.m. | Coffee & Poster session ( Poster session )
Sat 7:00 a.m. - 7:50 a.m. | Panel discussion ( Panel ) | Adam DeConinck · Zach Mueller · Bruno Raffin · Max Ryabinin · Julia Gusak
Sat 7:50 a.m. - 8:00 a.m. | Closing remarks ( Talk )
Optimistic Asynchrony Control: Achieving Synchronous Convergence With Asynchronous Throughput for Embedding Model Training ( Poster ) > link | Roger Waleffe 🔗 |
An Analytical Approach to Enhancing DNN Efficiency and Accuracy Using Approximate Multiplication ( Poster ) > link | Salar Shakibhamedan · Anice Jahanjoo · Amin Aminifar · Nima Amirafshar · Nima TaheriNejad · Axel Jantsch 🔗 |
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel ( Poster ) > link | J. Wang · Jan Ebert · Oleg Filatov · Stefan Kesselheim 🔗 |
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity ( Poster ) > link | Wentao Guo · Jikai Long · YIMENG ZENG · Zirui Liu · Xinyu Yang · Yide Ran · Jacob Gardner · Osbert Bastani · Chris De Sa · Xiaodong Yu · Beidi Chen · Zhaozhuo Xu |
DiLoCo: Distributed Low-Communication Training of Language Models ( Poster ) > link | Arthur Douillard · Qixuan Feng · Andrei Rusu · Rachita Chhaparia · Yani Donchev · Adhiguna Kuncoro · Marc'Aurelio Ranzato · Arthur Szlam · Jiajun Shen 🔗 |
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values through Differentiable Bayesian Gates ( Poster ) > link | Cristian Meo · Ksenia Sycheva · Anirudh Goyal · Justin Dauwels 🔗 |
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ( Poster ) > link | Zhengqing Yuan · Zhaoxu Li · Weiran Huang · Yanfang Ye · Lichao Sun 🔗 |
Multi-objective Differentiable Neural Architecture Search ( Poster ) > link | Rhea Sukthanker · Arber Zela · Benedikt Staffler · Samuel Dooley · Josif Grabocka · Frank Hutter 🔗 |
Resource-constrained Neural Architecture Search on Language Models: A Case Study ( Poster ) > link | Andreas Paraskeva · Joao Reis · Suzan Verberne · Jan Rijn 🔗 |
SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models ( Poster ) > link | Zhaoxu Luo · Bowen Song · Liyue Shen 🔗 |
Accelerating Best-of-N via Speculative Rejection ( Poster ) > link | Ruiqi Zhang · Momin Haider · Ming Yin · Jiahao Qiu · Mengdi Wang · Peter Bartlett · Andrea Zanette 🔗 |
Adaptive Model Pruning in Federated Learning through Loss Exploration ( Poster ) > link | Christian Internò · Elena Raponi · Niki van Stein · Thomas Bäck · Markus Olhofer · Yaochu Jin · Barbara Hammer 🔗 |
Single Train Multi Deploy on Topology Search Spaces using Kshot-Hypernet ( Poster ) > link | Jingyue Zhuge · Christian Mayr · Anand Subramoney · David Kappel 🔗 |
Enhancing Fine-grained Multi-modal Alignment via Adapters: A Parameter-Efficient Training Framework for Referring Image Segmentation ( Poster ) > link | Zunnan Xu · Jiaqi Huang · Ting Liu · Yong Liu · Haonan Han · Kehong Yuan · Xiu Li 🔗 |
Liouna: Biologically Plausible Learning for Efficient Pre-Training of Transferrable Deep Models ( Poster ) > link | Fady Rezk · Antreas Antoniou · Henry Gouk · Timothy Hospedales 🔗 |
Boolean Logic for Low-Energy Deep Learning ( Poster ) > link | Van Minh NGUYEN · Cristian Ocampo · Aymen Askri · Ba-Hien Tran 🔗 |
Class-aware Initialization of Early Exits for Pre-training Large Language Models ( Poster ) > link | Alperen Gormez · Erdem Koyuncu 🔗 |
Communication Efficient Federated Learning with Differentiated Aggregation ( Poster ) > link | Peyman Gholami · Hulya Seferoglu 🔗 |
Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough ( Poster ) > link | Konstantin Dobler · Gerard de Melo 🔗 |
MoReDrop: Dropout without Dropping ( Poster ) > link | Li Jiang · Duo Li · Yichuan Ding · Xue Liu · Victor Chan 🔗 |
ECO: Efficient Computational Optimization for Exact Machine Unlearning in Deep Neural Networks ( Poster ) > link | Yu-Ting Huang · Pei-Yuan Wu · Chuan-Ju Wang 🔗 |
DrJAX: Scalable and Differentiable MapReduce Primitives in JAX ( Poster ) > link | J K Rush · Zachary Charles · Zachary Garrett · Sean Augenstein · Nicole Mitchell 🔗 |
Variational Stochastic Gradient Descent for Deep Neural Networks ( Poster ) > link | Haotian Chen · Anna Kuzina · Babak Esmaeili · Jakub Tomczak 🔗 |
Coarse-to-Fine Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition ( Poster ) > link | Hichem Sahbi 🔗 |
Towards Efficient and Scalable Training of Differentially Private Deep Learning ( Poster ) > link | Sebastian Rodriguez Beltran · Marlon Tobaben · Niki Loppi · Antti Honkela 🔗 |
Lowering PyTorch's Memory Consumption for Selective Differentiation ( Poster ) > link | Samarth Bhatia · Felix Dangel 🔗 |
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies ( Poster ) > link | Brian Bartoldson · James Diffenderfer · Konstantinos Parasyris · Bhavya Kailkhura 🔗 |
DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity ( Poster ) > link | Baekrok Shin · Junsoo Oh · Hanseul Cho · Chulhee Yun 🔗 |
Efficient Adaptive Federated Optimization ( Poster ) > link | Su Hyeong Lee · Sidharth Sharma · Manzil Zaheer · Tian Li 🔗 |
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives ( Poster ) > link | Huanrui Yang · Yafeng Huang · Zhen Dong · Denis Gudovskiy · Tomoyuki Okuno · Yohei Nakata · Yuan Du · Kurt Keutzer · Shanghang Zhang 🔗 |
Efficient Document Ranking with Learnable Late Interactions ( Poster ) > link | Himanshu Jain · Ziwei Ji · Ankit Singh Rawat · Andreas Veit · Sadeep Jayasumana · Sashank J. Reddi · Aditya Menon · Felix Xinnan Yu 🔗 |
Effective Layer Pruning Through Similarity Metric Perspective ( Poster ) > link | Ian Pons · Bruno L. Yamamoto · Anna Reali · Artur Jordao Lima Correia 🔗 |
Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance ( Poster ) > link | Mika Jain · Stefanie Jegelka · Ishani Karmarkar · Luana Ruiz · Ellen Vitercik 🔗 |
Scalify: scale propagation for efficient low-precision LLM training ( Poster ) > link | Paul Balanca · Sam Hosegood · Carlo Luschi · Andrew Fitzgibbon 🔗 |
u-μP: The Unit-Scaled Maximal Update Parametrization ( Poster ) > link | Charlie Blake · Constantin Eichenberg · Josef Dean · Lukas Balles · Luke Prince · Björn Deiseroth · Andres Felipe Cruz Salinas · Carlo Luschi · Samuel Weinbach · Douglas Orr 🔗 |
Resolving Discrepancies in Compute-Optimal Scaling of Language Models ( Contributed Talk & Poster ) > link | Tomer Porian · Mitchell Wortsman · Jenia Jitsev · Ludwig Schmidt · Yair Carmon 🔗 |
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs ( Contributed Talk & Poster ) > link | Ashwinee Panda · Berivan Isik · Xiangyu Qi · Sanmi Koyejo · Tsachy Weissman · Prateek Mittal 🔗 |
Asynchronous Local-SGD Training for Language Modeling ( Contributed Talk & Poster ) > link | Bo Liu · Rachita Chhaparia · Arthur Douillard · Satyen Kale · Andrei Rusu · Jiajun Shen · Arthur Szlam · Marc'Aurelio Ranzato 🔗 |
AdaMeM: Memory Efficient Momentum for Adafactor ( Contributed Talk & Poster ) > link | Nikhil Vyas · Depen Morwani · Sham Kakade 🔗 |
Can LLMs Enhance Performance Prediction for Deep Learning Models? ( Contributed Talk & Poster ) > link | Karthick Panner Selvam · Phitchaya Phothilimthana · Sami Abu-El-Haija · Bryan Perozzi · Mats Brorsson 🔗 |
LoQT: Low Rank Adapters for Quantized Training ( Contributed Talk & Poster ) > link | Sebastian Loeschcke · Mads Toftrup · Michael Kastoryano · Serge Belongie · Vésteinn Snæbjarnarson 🔗 |
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors ( Contributed Talk & Poster ) > link | Vijay Lingam · Atula Tejaswi · Aditya Vavre · Aneesh Shetty · Gautham Krishna Gudur · Joydeep Ghosh · Eunsol Choi · Alexandros Dimakis · Aleksandar Bojchevski · Sujay Sanghavi 🔗 |