PortraitRL: Reinforcement Learning for Personalized Portrait Pose Transfer with Multi-Objective Reward Modeling
Abstract
Portrait pose transfer (PPT) requires generative models to preserve fine-grained identity details while following complex pose and layout modification instructions. Existing methods either demand extensive data annotation or rely on optimization objectives that are poorly suited to PPT's two key challenges: identity preservation and instruction following. In this work, we propose PortraitRL, a novel post-training framework that addresses these challenges with a multi-objective reward mechanism. Specifically, we employ LVLM-based reward functions to evaluate both challenges and apply within-group standardization to eliminate scale differences among rewards, allowing them to jointly and effectively guide optimization. More importantly, we devise a novel reinforcement learning algorithm, Negative-aware Score Preference Optimization (NaSPO), which automatically identifies positive and negative preference samples through within-group advantages, eliminating annotation requirements while fully leveraging both positive and negative learning signals. Extensive experiments show state-of-the-art performance, with significant improvements in both detail preservation and editing accuracy.
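The abstract does not give NaSPO's exact formulation, but the within-group reward standardization and advantage-sign partitioning it describes can be sketched as follows. This is a minimal illustration, assuming GRPO-style per-group z-scoring of each reward objective; all function and variable names here are ours, not the paper's:

```python
import math

def within_group_advantages(rewards, eps=1e-8):
    """Standardize per-objective rewards within a group of K rollouts.

    `rewards` is a K x M list: K sampled edits for one prompt, scored by
    M reward objectives (e.g. an identity-preservation score and an
    instruction-following score; the objectives are illustrative).
    Each objective is z-scored within the group to remove scale
    differences, then averaged into one scalar advantage per sample.
    """
    k, m = len(rewards), len(rewards[0])
    adv = [0.0] * k
    for j in range(m):
        col = [rewards[i][j] for i in range(k)]
        mu = sum(col) / k
        sigma = math.sqrt(sum((x - mu) ** 2 for x in col) / k) + eps
        for i in range(k):
            adv[i] += (col[i] - mu) / (sigma * m)
    return adv

# The sign of each advantage partitions the group into positive and
# negative preference samples, with no human annotation required.
adv = within_group_advantages([[0.9, 0.9], [0.1, 0.1], [0.5, 0.2]])
positives = [i for i, a in enumerate(adv) if a > 0]
negatives = [i for i, a in enumerate(adv) if a <= 0]
```

Because each objective is standardized before aggregation, no single reward's scale dominates the advantage, which is the stated motivation for within-group standardization in the multi-objective setting.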