Workshop
2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)
Julia Gusak · Jean Kossaifi · Alena Shilova · Rocco Sedona · Jan Kautz
Hall A1
Sat 27 Jul, midnight PDT
Join HPC and AI experts to learn how to train neural networks at an unprecedented scale with your existing infrastructure
Timezone: America/Los_Angeles
Schedule
Sat 12:00 a.m. - 12:00 a.m. | Coffee & Poster placement
Sat 12:00 a.m. - 12:10 a.m. | Welcome speech from organizers ( Talk ) | Julia Gusak
Sat 12:10 a.m. - 12:40 a.m. | Online Training from Numerical Simulations ( Invited Talk ) | Bruno Raffin
Sat 12:40 a.m. - 1:10 a.m. | Making device-agnostic ML training and inference easy at scale ( Invited Talk ) | Zach Mueller
Sat 1:10 a.m. - 1:30 a.m. | Contributed Talks ( Contributed Talk )
Sat 1:30 a.m. - 2:30 a.m. | Poster session ( Poster session )
Sat 2:30 a.m. - 3:00 a.m. | Enabling extremely fast inference and training performance using dataflow and custom chip ( Invited Talk ) | Urmish Thakker
Sat 3:00 a.m. - 3:30 a.m. | Contributed Talks ( Contributed Talk )
Sat 3:30 a.m. - 4:30 a.m. | Lunch
Sat 4:30 a.m. - 5:00 a.m. | Poster session ( Poster session )
Sat 5:00 a.m. - 5:30 a.m. | Structured matrices for memory-efficient training and finetuning ( Invited Talk ) | Beidi Chen
Sat 5:30 a.m. - 6:00 a.m. | Architecting and deploying compute clusters for large language models ( Invited Talk ) | Adam DeConinck
Sat 6:00 a.m. - 6:20 a.m. | Contributed Talks ( Contributed Talk )
Sat 6:20 a.m. - 6:30 a.m. | Best Paper Awards ( Talk ) | Jean Kossaifi
Sat 6:30 a.m. - 7:00 a.m. | Coffee & Poster session ( Poster session )
Sat 7:00 a.m. - 7:50 a.m. | Panel discussion ( Panel ) | Adam DeConinck · Zach Mueller · Bruno Raffin · Max Ryabinin · Julia Gusak
Sat 7:50 a.m. - 8:00 a.m. | Closing remarks ( Talk )
Optimistic Asynchrony Control: Achieving Synchronous Convergence With Asynchronous Throughput for Embedding Model Training ( Poster ) | Roger Waleffe
An Analytical Approach to Enhancing DNN Efficiency and Accuracy Using Approximate Multiplication ( Poster ) | Salar Shakibhamedan · Anice Jahanjoo · Amin Aminifar · Nima Amirafshar · Nima TaheriNejad · Axel Jantsch
Memory and Bandwidth are All Your Need for Fully Sharded Data Parallel ( Poster ) | J. Wang · Jan Ebert · Oleg Filatov · Stefan Kesselheim
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity ( Poster ) | Wentao Guo · Jikai Long · YIMENG ZENG · Zirui Liu · Xinyu Yang · Yide Ran · Jacob Gardner · Osbert Bastani · Chris De Sa · Xiaodong Yu · Beidi Chen · Zhaozhuo Xu
DiLoCo: Distributed Low-Communication Training of Language Models ( Poster ) | Arthur Douillard · Qixuan Feng · Andrei Rusu · Rachita Chhaparia · Yani Donchev · Adhiguna Kuncoro · Marc'Aurelio Ranzato · Arthur Szlam · Jiajun Shen
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates ( Poster ) | Cristian Meo · Ksenia Sycheva · Anirudh Goyal · Justin Dauwels
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ( Poster ) | Zhengqing Yuan · Zhaoxu Li · Weiran Huang · Yanfang Ye · Lichao Sun
Multi-objective Differentiable Neural Architecture Search ( Poster ) | Rhea Sukthanker · Arber Zela · Benedikt Staffler · Samuel Dooley · Josif Grabocka · Frank Hutter
Resource-constrained Neural Architecture Search on Language Models: A Case Study ( Poster ) | Andreas Paraskeva · Joao Reis · Suzan Verberne · Jan Rijn
SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models ( Poster ) | Zhaoxu Luo · Bowen Song · Liyue Shen
Accelerating Best-of-N via Speculative Rejection ( Poster ) | Ruiqi Zhang · Momin Haider · Ming Yin · Jiahao Qiu · Mengdi Wang · Peter Bartlett · Andrea Zanette
Adaptive Model Pruning in Federated Learning through Loss Exploration ( Poster ) | Christian Internò · Elena Raponi · Niki van Stein · Thomas Bäck · Markus Olhofer · Yaochu Jin · CITEC Barbara Hammer
Single Train Multi Deploy on Topology Search Spaces using Kshot-Hypernet ( Poster ) | Jingyue Zhuge · Christian Mayr · Anand Subramoney · David Kappel
Enhancing Fine-grained Multi-modal Alignment via Adapters: A Parameter-Efficient Training Framework for Referring Image Segmentation ( Poster ) | Zunnan Xu · Jiaqi Huang · Ting Liu · Yong Liu · Haonan Han · Kehong Yuan · Xiu Li
Liouna: Biologically Plausible Learning for Efficient Pre-Training of Transferrable Deep Models ( Poster ) | Fady Rezk · Antreas Antoniou · Henry Gouk · Timothy Hospedales
Boolean Logic for Low-Energy Deep Learning ( Poster ) | Van Minh NGUYEN · Cristian Ocampo · Aymen Askri · Ba-Hien Tran
Class-aware Initialization of Early Exits for Pre-training Large Language Models ( Poster ) | Alperen Gormez · Erdem Koyuncu
Communication Efficient Federated Learning with Differentiated Aggregation ( Poster ) | Peyman Gholami · Hulya Seferoglu
Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough ( Poster ) | Konstantin Dobler · Gerard de Melo
MoReDrop: Dropout without Dropping ( Poster ) | Li Jiang · Duo Li · Yichuan Ding · Xue Liu · Victor Chan
ECO: Efficient Computational Optimization for Exact Machine Unlearning in Deep Neural Networks ( Poster ) | Yu-Ting Huang · Pei-Yuan Wu · Chuan-Ju Wang
DrJAX: Scalable and Differentiable MapReduce Primitives in JAX ( Poster ) | J K Rush · Zachary Charles · Zachary Garrett · Sean Augenstein · Nicole Mitchell
Variational Stochastic Gradient Descent for Deep Neural Networks ( Poster ) | Haotian Chen · Anna Kuzina · Babak Esmaeili · Jakub Tomczak
Coarse-to-Fine Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition ( Poster ) | Hichem Sahbi
Towards Efficient and Scalable Training of Differentially Private Deep Learning ( Poster ) | Sebastian Rodriguez Beltran · Marlon Tobaben · Niki Loppi · Antti Honkela
Lowering PyTorch's Memory Consumption for Selective Differentiation ( Poster ) | Samarth Bhatia · Felix Dangel
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies ( Poster ) | Brian Bartoldson · James Diffenderfer · Konstantinos Parasyris · Bhavya Kailkhura
DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity ( Poster ) | Baekrok Shin · Junsoo Oh · Hanseul Cho · Chulhee Yun
Efficient Adaptive Federated Optimization ( Poster ) | Su Hyeong Lee · Sidharth Sharma · Manzil Zaheer · Tian Li
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives ( Poster ) | Huanrui Yang · Yafeng Huang · Zhen Dong · Denis Gudovskiy · Tomoyuki Okuno · Yohei Nakata · Yuan Du · EECS Kurt Keutzer · Shanghang Zhang
Efficient Document Ranking with Learnable Late Interactions ( Poster ) | Himanshu Jain · Ziwei Ji · Ankit Singh Rawat · Andreas Veit · Sadeep Jayasumana · Sashank J. Reddi · Aditya Menon · Felix Xinnan Yu
Effective Layer Pruning Through Similarity Metric Perspective ( Poster ) | Ian Pons · Bruno L. Yamamoto · Anna Reali · Artur Jordao Lima Correia
Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance ( Poster ) | Mika Jain · Stefanie Jegelka · Ishani Karmarkar · Luana Ruiz · Ellen Vitercik
Scalify: scale propagation for efficient low-precision LLM training ( Poster ) | Paul Balanca · Sam Hosegood · Carlo Luschi · Andrew Fitzgibbon
u-μP: The Unit-Scaled Maximal Update Parametrization ( Poster ) | Charlie Blake · Constantin Eichenberg · Josef Dean · Lukas Balles · Luke Prince · Björn Deiseroth · Andres Felipe Cruz Salinas · Carlo Luschi · Samuel Weinbach · Douglas Orr
Resolving Discrepancies in Compute-Optimal Scaling of Language Models ( Contributed Talk & Poster ) | Tomer Porian · Mitchell Wortsman · Jenia Jitsev · Ludwig Schmidt · Yair Carmon
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs ( Contributed Talk & Poster ) | Ashwinee Panda · Berivan Isik · Xiangyu Qi · Sanmi Koyejo · Tsachy Weissman · Prateek Mittal
Asynchronous Local-SGD Training for Language Modeling ( Contributed Talk & Poster ) | Bo Liu · Rachita Chhaparia · Arthur Douillard · Satyen Kale · Andrei Rusu · Jiajun Shen · Arthur Szlam · Marc'Aurelio Ranzato
AdaMeM: Memory Efficient Momentum for Adafactor ( Contributed Talk & Poster ) | Nikhil Vyas · Depen Morwani · Sham Kakade
Can LLMs Enhance Performance Prediction for Deep Learning Models? ( Contributed Talk & Poster ) | Karthick Panner Selvam · Phitchaya Phothilimthana · Sami Abu-El-Haija · Bryan Perozzi · Mats Brorsson
LoQT: Low Rank Adapters for Quantized Training ( Contributed Talk & Poster ) | Sebastian Loeschcke · Mads Toftrup · Michael Kastoryano · Serge Belongie · Vésteinn Snæbjarnarson
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors ( Contributed Talk & Poster ) | Vijay Lingam · Atula Tejaswi · Aditya Vavre · Aneesh Shetty · Gautham Krishna Gudur · Joydeep Ghosh · Eunsol Choi · Alexandros Dimakis · Aleksandar Bojchevski · Sujay Sanghavi