Task-Aware Preference Calibration for Direct Preference Optimization
Abstract
Direct Preference Optimization (DPO) has become a predominant approach for aligning large language models with human preferences. Recent work has used perplexity differentials to identify unreliable preference labels, but these methods apply uniform calibration strategies across all samples. We observe that the reliability of perplexity signals varies substantially across task types: perplexity differentials strongly correlate with preference quality for factual tasks but provide weak signals for creative tasks where novelty is valued. Based on this observation, we propose Task-Aware Preference Calibration (TAPC), which learns task-conditioned calibration functions that adapt to the characteristics of different prompt types. TAPC employs a task encoder to extract prompt representations and learns task-specific slope and bias parameters for mapping perplexity signals to confidence targets. Through meta-learning on a small reference dataset, TAPC discovers how to weight perplexity signals appropriately for each task category. Experiments on Llama-3-8B and Qwen2-7B demonstrate that TAPC outperforms existing methods across multiple benchmarks, with particularly large improvements on creative and open-ended tasks where uniform calibration strategies fail.
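The abstract does not give TAPC's exact functional form, but the described mapping from perplexity signals to confidence targets via task-specific slope and bias parameters can be sketched as follows. This is a minimal illustration under the assumption that calibration is an affine transform of the perplexity differential followed by a sigmoid; the function name, parameter values, and task categories below are all hypothetical, and the paper learns the per-task parameters via meta-learning rather than fixing them by hand.

```python
import math

def calibrated_confidence(delta_ppl: float, slope: float, bias: float) -> float:
    """Map a perplexity differential between the chosen and rejected
    responses to a confidence target in (0, 1) using task-conditioned
    slope/bias parameters (hypothetical form)."""
    return 1.0 / (1.0 + math.exp(-(slope * delta_ppl + bias)))

# Hypothetical per-task parameters: factual tasks trust the perplexity
# signal (large slope), while creative tasks discount it (slope near
# zero keeps confidence close to a neutral 0.5 regardless of delta_ppl).
TASK_PARAMS = {
    "factual":  (2.0, 0.0),
    "creative": (0.2, 0.0),
}

delta = 1.5  # chosen response has noticeably lower perplexity
factual_conf = calibrated_confidence(delta, *TASK_PARAMS["factual"])
creative_conf = calibrated_confidence(delta, *TASK_PARAMS["creative"])
```

With the same perplexity differential, the factual-task parameters produce a confident label while the creative-task parameters stay near 0.5, mirroring the paper's observation that the signal's reliability varies by task type.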