Pluralistic AI Alignment Requires Inference-Time Multi-Objective Control
Abstract
Pluralistic AI alignment, which accommodates diverse human values rather than a single canonical preference, requires agents to reason under multiple, often conflicting objectives, such as helpfulness, honesty, harmlessness, fairness, and context-specific user preferences. Whereas classical learning methods optimize a fixed scalar objective, pluralistic alignment must distinguish between objectives that may be flexibly traded off and constraints that should remain non-negotiable. These two categories map onto two existing lines of research: offline multi-objective reinforcement learning provides tools for representing and navigating trade-offs among multiple objectives, and offline safe reinforcement learning formalizes safety-critical constraints and feasible policy regions. A third line, multi-objective LLM alignment, exposes a further requirement: because retraining large models for each preference configuration is infeasible, controllability must shift from training time to deployment time. Taken together, these observations motivate our central position: \emph{inference-time multi-objective control should be a central goal of pluralistic AI alignment}. We argue that unifying these perspectives yields a framework in which training-time learning produces reusable objective and safety representations, while inference-time control enables adaptation to diverse and changing preferences without retraining.