PIPE: Personalized Image-generation via Preference Encoding
Moonkyung Ryu ⋅ Chih-wei Hsu ⋅ Avinab Saha ⋅ Ofir Nabati ⋅ Guy Tennenholtz ⋅ Junfeng He ⋅ Craig Boutilier
Abstract
While modern text-to-image (T2I) models excel at generating high-quality images, they are typically trained to optimize for generalized, population-level preferences. This homogeneous approach ignores the diverse, individual tastes and aesthetic judgments of different users. In this work, we propose a novel framework that learns fine-grained user preferences without relying on computationally expensive visual language models (VLMs) or prompt-sensitive text profiles. Instead, we introduce a robust, continuous user representation that models a user’s reward function as a linear combination of $K$ base user types. We learn user-specific weights $\lambda_u$ via logistic regression on pairwise preference data to construct a continuous user embedding. This embedding is integrated into the diffusion process via an IP-Adapter and fine-tuned using Diffusion-DPO. Our approach consistently generates images aligned with individual reward functions, achieving a 66.2% win rate against a pre-trained SDXL baseline and a 63.2% win rate against the state-of-the-art PPD framework.
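To make the weight-learning step concrete, the following is a minimal sketch of fitting $\lambda_u$ by logistic regression on pairwise preferences. It is an illustration under stated assumptions, not the authors' implementation: it assumes each image already has a score from each of the $K$ base user types, and that the probability of preferring image $a$ over image $b$ is $\sigma(\lambda_u^\top (r(a) - r(b)))$, a Bradley-Terry style model that the abstract does not spell out. The function names `simulate_pairs` and `fit_user_weights`, the synthetic data, and the optimizer settings are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_pairs(true_lam, n_pairs=2000, seed=0):
    """Synthetic stand-in for real preference data (an assumption for this sketch).

    Returns diff with shape (n_pairs, K), the base-reward difference r(a) - r(b),
    and y with shape (n_pairs,), where y = 1 means the user preferred image a.
    """
    rng = np.random.default_rng(seed)
    k = len(true_lam)
    r_a = rng.normal(size=(n_pairs, k))  # base-type scores of image a
    r_b = rng.normal(size=(n_pairs, k))  # base-type scores of image b
    diff = r_a - r_b
    p_a_wins = sigmoid(diff @ true_lam)
    y = (rng.random(n_pairs) < p_a_wins).astype(float)
    return diff, y


def fit_user_weights(diff, y, lr=0.5, steps=2000, l2=1e-3):
    """Logistic regression without intercept, fit by plain gradient descent."""
    lam = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = sigmoid(diff @ lam)
        # Gradient of the mean negative log-likelihood plus an L2 penalty.
        grad = diff.T @ (p - y) / len(y) + l2 * lam
        lam -= lr * grad
    return lam


if __name__ == "__main__":
    true_lam = np.array([1.5, -0.5, 0.8, 0.0])  # hypothetical user, K = 4
    diff, y = simulate_pairs(true_lam)
    lam_hat = fit_user_weights(diff, y)
    print("estimated weights:", np.round(lam_hat, 2))
```

In this sketch the fitted vector plays the role of the continuous user embedding; how the embedding is passed to the IP-Adapter and used during Diffusion-DPO fine-tuning is outside what the abstract specifies and is not modeled here.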