StyleDistillation: A New Insight into Image Style Enables Personalized Aesthetic Manipulation
Abstract
Text-guided stylized image generation has yielded promising advances by leveraging the powerful capabilities of text-to-image diffusion models. However, the inherent coupling of style and content information within the reference image presents a significant challenge. To address this, we propose StyleDistillation, a novel approach grounded in two key observations about the CLIP embedding space from a style perspective. By leveraging a lightweight StyleDistiller module, combined with carefully designed optimization objectives based on geometric and semantic priors, we extract a fine-grained style representation from the reference image. Additionally, we introduce a Prompt Alignment Enhancement mechanism at inference time, which significantly improves the control that text prompts exert over the generated images. Extensive experiments demonstrate that our method achieves outstanding performance in both style reproduction and prompt alignment. Furthermore, StyleDistillation supports various personalized operations, including style editing and style fusion, highlighting its substantial potential for diverse applications.