Poster
Layer-wise Quantization for Quantized Optimistic Dual Averaging
Anh Duc Nguyen · Ilia Markov · Zhengqing Wu · Ali Ramezani-Kebrya · Kimon Antonakopoulos · Dan Alistarh · Volkan Cevher
West Exhibition Hall B2-B3 #W-613
Training advanced AI models across many machines often stalls because of the sheer volume of data that must be exchanged between them. We introduce a technique that compresses the information shared during training by assigning each layer its own compression level based on its importance: the most sensitive layers keep higher precision, while the rest are represented with fewer bits. We integrate this layer-wise scheme into a training algorithm called Quantized Optimistic Dual Averaging (QODA), which works directly with compressed data and avoids extra synchronization steps. We rigorously prove that, despite the reduced communication, our method converges as reliably as standard uncompressed training. In experiments on image generation and large language models across dozens of GPUs, our approach more than doubles end-to-end training speed while matching final accuracy. By cutting communication costs and shortening each training round, our work makes distributed deep learning faster, more scalable, and more energy-efficient.
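To make the layer-wise idea concrete, here is a minimal Python sketch of gradient compression with per-layer bit-widths. It is an illustrative assumption, not the paper's exact scheme: importance is approximated by gradient norm, and the helper names (`quantize_uniform`, `assign_bits`, `compress_gradients`) and the specific bit-widths are hypothetical.

```python
import numpy as np

def quantize_uniform(grad, bits):
    """Uniformly quantize a gradient tensor to the given bit-width (illustrative sketch)."""
    scale = np.abs(grad).max() + 1e-12          # per-tensor scale
    levels = 2 ** (bits - 1) - 1                # symmetric signed integer levels
    q = np.round(grad / scale * levels)         # map values onto the integer grid
    return q * scale / levels                   # dequantize so the effect is visible

def assign_bits(layer_grads, high_bits=8, low_bits=4, top_fraction=0.3):
    """Heuristic: give more bits to the layers with the largest gradient norms."""
    norms = {name: np.linalg.norm(g) for name, g in layer_grads.items()}
    cutoff = max(1, int(np.ceil(top_fraction * len(norms))))
    important = set(sorted(norms, key=norms.get, reverse=True)[:cutoff])
    return {name: (high_bits if name in important else low_bits) for name in layer_grads}

def compress_gradients(layer_grads):
    """Quantize each layer's gradient with its assigned bit-width before communication."""
    bits = assign_bits(layer_grads)
    return {name: quantize_uniform(g, bits[name]) for name, g in layer_grads.items()}

# Example: two "layers" with very different gradient magnitudes.
grads = {"attention.weight": np.random.randn(64, 64),
         "output.bias": 0.01 * np.random.randn(64)}
compressed = compress_gradients(grads)
```

In a distributed setting, only the quantized representation (integer codes plus per-layer scales) would be sent over the network; the full-precision reconstruction above is included only to show what each worker effectively receives.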