Don't Force the Fit: Bounded Log-Likelihood Loss for Enhanced Reasoning in Large Language Models
Abstract
Supervised fine-tuning (SFT) is central to aligning large language models (LLMs) with instruction following and task-specific reasoning. Despite its success, SFT optimizes token-level likelihoods under the implicit assumption that strictly fitting all tokens in expert demonstrations induces the desired downstream behavior. However, in reasoning tasks where correctness is defined by logical validity or final outcomes rather than exact token realizations, this assumption can lead to optimization misalignment. We empirically observe that low-probability tokens in reasoning demonstrations often correspond to realization-specific or stylistic variations, and that reducing their influence during training consistently improves generalization on reasoning benchmarks. Motivated by this insight, we propose the Bounded Log-Likelihood Loss (BLL-Loss), a simple and parameter-free alternative to standard likelihood training that bounds gradient contributions from low-probability tokens while preserving conventional optimization behavior. We provide theoretical insights and extensive empirical results demonstrating that BLL-Loss improves reasoning generalization across diverse model scales and challenging benchmarks.
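The core mechanism the abstract describes, capping the per-token log-likelihood loss so that low-probability tokens cannot dominate the gradient, can be sketched in a few lines. This is an illustrative simplification, not the paper's exact formulation: the `bound` threshold below is a hypothetical parameter introduced for clarity, whereas the actual BLL-Loss is described as parameter-free.

```python
import math

def bounded_nll(token_probs, bound=4.0):
    """Per-token negative log-likelihood, capped at `bound`.

    Tokens whose model probability falls below exp(-bound) contribute
    a constant loss, so their gradient contribution is zeroed out and
    they cannot force the model to fit realization-specific tokens.
    (Hypothetical sketch; `bound` is an assumed hyperparameter.)
    """
    return [min(-math.log(p), bound) for p in token_probs]

# A well-predicted token keeps its usual NLL; a rare token is clipped.
losses = bounded_nll([0.5, 1e-9])
```

Under standard maximum likelihood, the `1e-9` token would contribute a loss of about 20.7 and a correspondingly large gradient; here it is capped at 4.0, while the `0.5` token's loss (about 0.69) passes through unchanged.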