

Poster

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang · Xiaoman Pan · Feng Luo · Shuang Qiu · Han Zhong · Dong Yu · Jianshu Chen


Abstract: We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning, and the heterogeneity, multi-dimensionality, and conflicting nature of human preferences further complicate the alignment process. In this paper, we introduce \textbf{R}ewards-\textbf{i}n-\textbf{C}ontext (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context and applies supervised fine-tuning for alignment. The salient features of RiC are simplicity and adaptivity: it only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment of user preferences at inference time. Inspired by the analytical solution of an abstracted convex optimization problem, our dynamic inference-time adjustment method approaches the Pareto-optimal solution for multiple objectives. Empirical evidence demonstrates the efficacy of our method in aligning both Large Language Models (LLMs) and diffusion models to accommodate diverse rewards, using only around $10\%$ of the GPU hours of multi-objective RLHF (MORLHF) baselines.
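The reward-conditioned prompting described above can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the authors' implementation: the tag format, the function names (format_ric_prompt, preference_to_rewards), and the simple linear mapping from preference weights to target rewards are all hypothetical stand-ins for the paper's actual prompt template and convex-optimization-inspired mapping.

```python
# Illustrative sketch of reward-in-context conditioning (names and tag format
# are hypothetical, not taken from the paper).

def format_ric_prompt(prompt: str, rewards: dict[str, float]) -> str:
    """Embed multi-dimensional reward values into the prompt context."""
    reward_str = " ".join(f"<{name}> {value:.2f}" for name, value in rewards.items())
    return f"{prompt} {reward_str}"

# Training: label each (prompt, response) pair with its measured reward scores
# and fine-tune the single model with an ordinary supervised objective.
train_prompt = format_ric_prompt(
    "How do I stay safe online?",
    {"helpfulness": 0.92, "harmlessness": 0.88},
)

# Inference: map a user-specified preference over objectives to target reward
# values (here a simplified linear scaling of per-objective maxima, standing in
# for the paper's optimization-derived mapping) and condition generation on them.
def preference_to_rewards(
    weights: dict[str, float], max_rewards: dict[str, float]
) -> dict[str, float]:
    return {name: weights[name] * max_rewards[name] for name in weights}

inference_prompt = format_ric_prompt(
    "How do I stay safe online?",
    preference_to_rewards(
        {"helpfulness": 0.7, "harmlessness": 0.3},
        {"helpfulness": 1.0, "harmlessness": 1.0},
    ),
)
```

Because the preference enters only through the conditioning rewards in the prompt, adjusting the weights at inference time requires no further fine-tuning of the model.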
