

Poster in the Workshop on Theoretical Foundations of Foundation Models (TF2M)

Decoding-Time Language Model Alignment with Multiple Objectives

Ruizhe Shi · Yifang Chen · Yushi Hu · Alisa Liu · Hannaneh Hajishirzi · Noah Smith · Simon Du


Abstract: Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose $\textbf{multi-objective decoding (MOD)}$, a decoding-time algorithm that combines a set of base models for any given weighting over the objectives. We exploit a common form shared by a family of $f$-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution via the Legendre transform, and derive an efficient decoding strategy that greedily outputs the next token from the predicted probabilities of all base models. Theoretically, we show why existing approaches can be highly sub-optimal even in natural settings, and obtain optimality guarantees for our method, which are validated by empirical results.
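As a rough illustration of the decoding rule the abstract describes, the sketch below combines next-token distributions from several aligned base models at decoding time. It assumes the KL-regularized special case, in which the combined policy is a weighted log-linear mixture of the base policies relative to a shared reference model; the function and argument names are illustrative assumptions, not from the paper.

```python
import numpy as np

def mod_next_token_probs(base_logprobs, ref_logprobs, weights):
    """Combine next-token distributions from several aligned base models.

    A minimal sketch, assuming the KL-regularized case where the combined
    policy is a weighted log-linear mixture of base policies relative to a
    shared reference model. Names here are hypothetical, not the paper's API.

    base_logprobs : (k, V) next-token log-probs, one row per aligned model.
    ref_logprobs  : (V,) next-token log-probs of the shared reference model.
    weights       : (k,) non-negative objective weights summing to 1.
    """
    weights = np.asarray(weights, dtype=float)
    # Weighted sum of each model's log-prob "advantage" over the reference,
    # added back onto the reference log-probs.
    combined = ref_logprobs + weights @ (base_logprobs - ref_logprobs)
    # Renormalize to a proper distribution over the vocabulary.
    combined -= np.logaddexp.reduce(combined)
    return np.exp(combined)

# Toy usage: two aligned models, a 5-token vocabulary, equal weights.
rng = np.random.default_rng(0)
base = np.log(rng.dirichlet(np.ones(5), size=2))
ref = np.log(rng.dirichlet(np.ones(5)))
probs = mod_next_token_probs(base, ref, weights=[0.5, 0.5])
next_token = int(np.argmax(probs))  # greedy choice of the next token
```

In this toy setup, greedy decoding simply picks the argmax of the combined distribution at each step; changing the weight vector re-targets the trade-off between objectives without retraining any model.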
