

Poster in Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models

AdaNF: Quantization Group Adaptive NormalFloat for Low Bit Fine-tuning of LLMs

Yeojoon Youn · Sehoon Kim · Suhong Moon · Sang Keun Choe · Ce Zhang


Abstract:

The integration of quantization and Low-Rank Adaptation (LoRA) presents a promising avenue for memory-efficient fine-tuning of large language models (LLMs) within GPU memory constraints. QLoRA, introduced by Dettmers et al. (2024), successfully demonstrates high-fidelity 4-bit fine-tuning using an information-theoretically optimal datatype, NormalFloat. However, challenges arise with lower-bit fine-tuning, such as 2-bit, where QLoRA often struggles to converge due to the significant information loss from quantization. In this study, we address these challenges by adjusting the cumulative distribution function (CDF) offset of NormalFloat, which substantially reduces information loss through improved NormalFloat initialization. Furthermore, we introduce quantization group Adaptive NormalFloat (AdaNF), a technique that dynamically adjusts the NormalFloat CDF offset based on the statistical characteristics of each quantization group of the parameters. This adaptive approach minimizes the Lp norm of the quantization error through a grid search, allowing for customized quantization that preserves more information. Our empirical investigations across various models and downstream tasks in the low-bit fine-tuning regime confirm that our method achieves performance comparable to existing methods while effectively mitigating the limitations of prior approaches.
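To make the mechanism concrete, the following is a minimal NumPy/SciPy sketch of the idea the abstract describes: NormalFloat code levels are built from standard-normal quantiles at an adjustable CDF offset, and the offset is grid-searched per quantization group so as to minimize the Lp norm of the quantization error. The function names, the offset grid, and the simplified level construction (plain absmax scaling, no pinned zero code, no double quantization) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm


def normalfloat_levels(num_bits: int, offset: float) -> np.ndarray:
    """Build NormalFloat code levels as quantiles of N(0, 1) taken at
    probabilities evenly spaced in [1 - offset, offset], rescaled to [-1, 1].
    (Simplified: the original NF4 construction also pins an exact zero code
    and builds the positive and negative halves asymmetrically.)"""
    probs = np.linspace(1.0 - offset, offset, 2 ** num_bits)
    levels = norm.ppf(probs)
    return levels / np.abs(levels).max()


def quantize_group(group: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Absmax-normalize a 1-D quantization group, snap each value to the
    nearest code level, and rescale back (quantize + dequantize)."""
    scale = np.abs(group).max() + 1e-12
    normalized = group / scale
    idx = np.abs(normalized[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx] * scale


def adaptive_offset_quantize(group: np.ndarray, num_bits: int = 2,
                             offsets=None, p: float = 2.0):
    """Per-group grid search over the CDF offset: keep whichever offset
    minimizes the Lp norm of this group's quantization error."""
    if offsets is None:
        offsets = np.linspace(0.95, 0.9995, 32)  # illustrative grid, not from the paper
    best = (np.inf, None, None)
    for offset in offsets:
        levels = normalfloat_levels(num_bits, offset)
        dequantized = quantize_group(group, levels)
        err = np.linalg.norm(group - dequantized, ord=p)
        if err < best[0]:
            best = (err, offset, dequantized)
    return best  # (error, chosen offset, dequantized group)


# Example: a 64-element group of roughly normal weights, 2-bit codes.
group = np.random.default_rng(0).normal(size=64).astype(np.float32)
err, chosen_offset, dq = adaptive_offset_quantize(group, num_bits=2)
```

Under these assumptions, each group keeps only its chosen offset and absmax scale as side information, while the per-group search lets heavier- or lighter-tailed groups use differently spread code levels.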
