Navigating the Pareto Frontier of Alignment: Spectrum-Adaptive Fine-Tuning for LLMs
Abstract
Supervised Fine-Tuning (SFT) with Negative Log-Likelihood (NLL) remains the standard post-training paradigm for Large Language Models, yet it imposes an excessive penalty on low-probability target tokens. This forces the model to prioritize minimizing loss on difficult samples over overall generation quality, often leading to unwarranted overconfidence. On the other hand, alternatives like Dynamic Fine-Tuning (DFT) suffer from vanishing gradients on these tokens, which severely hinders the acquisition of new concepts. To bridge this gap, we propose SAFT (Spectrum-Adaptive Fine-Tuning), a unified framework that interpolates between the aggressive learning signal of NLL and the robust nature of probability-weighted optimization. By adaptively balancing these objectives, SAFT effectively mitigates outlier sensitivity without sacrificing learning efficiency. Empirically, our method achieves state-of-the-art performance on mathematical reasoning benchmarks, demonstrating superior generalization on out-of-distribution tasks. Our anonymized code is available at https://anonymous.4open.science/r/SAFT-9FEB.
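To make the trade-off described above concrete, the following is a minimal, hypothetical sketch of an interpolated per-token objective. It is not the authors' actual SAFT loss (the abstract does not specify the formula); it only illustrates how a mixing weight `alpha` (an assumed parameter) can blend plain NLL with a DFT-style probability-weighted term, and why the latter's learning signal vanishes on low-probability tokens.

```python
import numpy as np

def interpolated_loss(p, alpha=0.5):
    """Illustrative per-token loss blending NLL and a probability-weighted term.

    p     : predicted probability of each target token (array in (0, 1]).
    alpha : hypothetical interpolation weight.
            alpha=1 recovers plain NLL (unbounded as p -> 0, hence the
            excessive penalty on low-probability tokens);
            alpha=0 recovers a DFT-style weighted objective, which is
            bounded (-p*log p <= 1/e) and thus gives a vanishing signal
            on the very tokens that encode new concepts.
    """
    nll = -np.log(p)            # standard NLL term
    weighted = p * nll          # probability-weighted (DFT-style) term
    return alpha * nll + (1.0 - alpha) * weighted
```

With `alpha = 1` a token at `p = 1e-6` contributes a loss of about 13.8, while with `alpha = 0` the same token contributes only about 1.4e-5, showing the two failure modes the abstract contrasts.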