Dual Optimal Transport for Multi-Concept Composition: Structural Alignment and Texture Injection in Diffusion Models
Abstract
Diffusion models have shown impressive capabilities in text-to-image synthesis, but multi-concept personalized generation remains challenging, particularly in aligning multiple reference concepts while preserving their fidelity. In this work, we propose a novel framework that addresses this challenge with a two-stage Sketch-to-Rendering process, using Dual Optimal Transport (OT) for structural alignment and texture injection. Our approach consists of two key components: (1) Structural Guidance via OT, which ensures shape alignment by using mass-preserving OT for spatial consistency; and (2) Texture Injection via Geometry-Guided OT, which leverages low-frequency structure alignment to inject high-frequency texture details via OT-based residual transfer, preserving texture fidelity without distorting structure. Extensive experiments demonstrate that our method significantly enhances both conceptual fidelity and visual quality in multi-concept generation. Ablation studies further confirm the effectiveness of the proposed optimal transport guidance and of decoupling structure and texture during the generation process.
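
As a rough illustration of the two OT ingredients named in the abstract (a mass-preserving transport plan, and residual transfer along that plan), the following is a minimal sketch only. It assumes an entropic-regularized Sinkhorn solver and a barycentric projection, which are standard OT tools but are not confirmed to be the solver or transfer rule used in this work. The function names `sinkhorn_plan` and `transport_residual`, the toy point sets, and the values of `eps` and `n_iters` are all hypothetical.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.5, n_iters=500):
    """Entropic-regularized OT plan between histograms a (n,) and b (m,).

    Assumption: Sinkhorn iterations stand in for whatever OT solver
    the paper actually uses.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows toward marginal a
        v = b / (K.T @ u)              # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

def transport_residual(plan, src_residual):
    """Barycentric projection: carry per-source residuals (n, d) to m targets."""
    col_mass = plan.sum(axis=0, keepdims=True)      # shape (1, m)
    return (plan / col_mass).T @ src_residual       # shape (m, d)

# Toy data (hypothetical): 6 source and 5 target locations in the unit square.
rng = np.random.default_rng(0)
src = rng.random((6, 2))
tgt = rng.random((5, 2))
a = np.full(6, 1.0 / 6)                # uniform source mass
b = np.full(5, 1.0 / 5)                # uniform target mass
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)  # squared distances

plan = sinkhorn_plan(a, b, cost)
residual = rng.standard_normal((6, 3))  # stand-in for high-frequency detail
moved = transport_residual(plan, residual)
```

Here "mass-preserving" means the rows and columns of `plan` sum to the prescribed marginals `a` and `b`, so no source mass is created or discarded. Each row of `moved` is a convex combination of source residuals weighted by the plan.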