TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control
Yuxiang Chen ⋅ Yifan Liu ⋅ Xiaoming Xu ⋅ Pengle Zhang ⋅ Michael Beyer ⋅ Martin Rapp ⋅ Jun Zhu ⋅ Jianfei Chen
Abstract
Training Large Language Models (LLMs) is prohibitively expensive, driving interest in low-precision fully-quantized training (FQT). While novel 4-bit formats like NVFP4 offer substantial efficiency gains, achieving near-lossless training at such low precision remains challenging. We introduce **TetraJet-v2**, an end-to-end 4-bit FQT method that leverages NVFP4 for activations, weights, and gradients in all linear layers. We identify two critical issues hindering low-precision LLM training: weight oscillation and outliers. To address these, we propose: 1) an unbiased double-block quantization method for NVFP4 linear layers, 2) **OsciReset**, an algorithm to suppress weight oscillation, and 3) **OutControl**, an algorithm to retain outlier accuracy. **TetraJet-v2** outperforms prior methods on FP4 pre-training of LLMs across models up to 370M parameters trained on up to 212B tokens, reducing the performance gap to BF16 by an average of $51.3$% while enabling a $1.67\times$ end-to-end speedup over FP8.
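To make the double-block scaling structure of NVFP4 concrete, the following is a minimal round-to-nearest simulation sketch in PyTorch: a per-tensor FP32 scale combined with per-block (16-element) FP8 E4M3 scales over FP4 (E2M1) values. This illustrates the format only; it does not reproduce the paper's unbiased quantization, OsciReset, or OutControl, and the function and constant names are illustrative assumptions.

```python
import torch

# FP4 (E2M1) representable magnitudes and format constants (assumed layout).
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_MAX = 6.0
FP8_E4M3_MAX = 448.0
BLOCK = 16  # NVFP4 micro-block size


def quantize_nvfp4(x: torch.Tensor) -> torch.Tensor:
    """Round-to-nearest simulation of double-block NVFP4 quantization.

    Level 1: a per-tensor FP32 scale maps block scales into E4M3 range.
    Level 2: a per-block E4M3 scale maps the 16 block values into FP4 range.
    Requires x.numel() to be divisible by BLOCK.
    """
    orig_shape = x.shape
    blocks = x.reshape(-1, BLOCK).float()

    # Per-tensor scale chosen so every per-block scale fits in E4M3.
    tensor_scale = (x.abs().max() / (FP4_MAX * FP8_E4M3_MAX)).clamp(min=1e-12)

    # Per-block scale, itself quantized to FP8 E4M3.
    block_amax = blocks.abs().amax(dim=1, keepdim=True)
    block_scale = (block_amax / FP4_MAX) / tensor_scale
    block_scale = block_scale.to(torch.float8_e4m3fn).to(torch.float32)
    block_scale = block_scale.clamp(min=1e-12)

    # Map each element to the nearest FP4 (E2M1) value.
    scaled = blocks / (block_scale * tensor_scale)
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * scaled.sign()

    # Dequantize back to FP32 for simulation purposes.
    return (q * block_scale * tensor_scale).reshape(orig_shape)


# Usage: quantize a weight matrix and inspect the quantization error.
w = torch.randn(256, 256)
w_q = quantize_nvfp4(w)
print("max abs quantization error:", (w - w_q).abs().max().item())
```

A stochastic-rounding variant of the same two-level scheme would be one natural way to obtain an unbiased quantizer, which is what the paper's double-block method targets for the linear layers.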