Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Abstract
Aligning large language models to human preferences is inherently multidimensional, yet most pipelines collapse heterogeneous signals into a single objective. We ask what it takes to align a model simultaneously across domains with verifiable rewards, non-verifiable subjective preferences, and complex interactive scenarios. Such multi-objective alignment setups are often plagued by individual objectives that are at odds with one another, resulting in inefficient training and little user control at inference. To address these issues, we propose a unified framework that standardizes process reward model (PRM) training across verifiable and non-verifiable settings to provide step-level supervision, performs vectorized multi-objective alignment via Multi-Action-Head DPO, and enables controllable inference through objective-specific weighting and PRM-guided decoding. Experiments across math reasoning, human value alignment, and multi-turn tutoring show that our framework improves multiple objectives simultaneously with limited interference, while generalizing across domains and offering flexible user control at inference time.