AnyCanvas: Potential Field Guidance for Training-Free Spatial Control in Text-to-Image Diffusion
Abstract
Diffusion-based text-to-image (T2I) models have demonstrated remarkable progress in generating high-quality images. However, real-world applications such as product packaging and logo design require synthesis within irregular geometries, a constraint that existing methods struggle to handle. Generating complete images that conform to arbitrarily shaped canvas constraints while preserving semantic integrity thus remains a significant challenge. To address this, we introduce AnyCanvas, a training-free framework that leverages a Mask-to-Potential Field paradigm to convert binary masks into a differentiable potential field, which guides content to converge naturally within target regions. Extensive experiments demonstrate that AnyCanvas achieves 4.23\% higher spatial adherence to user-specified constraints while retaining 99.45\% of the semantic fidelity measured by CLIP score, yielding a superior harmonic mean of spatial and semantic metrics. AnyCanvas also generalizes robustly across model backbones and diverse spatial control objectives.
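The mask-to-field conversion can be illustrated with a minimal sketch. Here we assume the potential is a Euclidean distance transform of the mask (zero inside the allowed region, growing with distance outside it), whose negative gradient points content back toward the target region; the function name `mask_to_potential` and the brute-force distance computation are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

def mask_to_potential(mask: np.ndarray) -> np.ndarray:
    """Map a binary mask (1 = allowed canvas region) to a potential field:
    zero inside the region, Euclidean distance to the nearest inside pixel
    elsewhere. Brute force over inside pixels -- fine for small grids."""
    ys, xs = np.nonzero(mask)
    inside = np.stack([ys, xs], axis=1).astype(float)          # (M, 2)
    H, W = mask.shape
    grid = np.stack(
        np.meshgrid(np.arange(H), np.arange(W), indexing="ij"),
        axis=-1,
    ).reshape(-1, 2).astype(float)                             # (H*W, 2)
    # Pairwise distances from every pixel to every inside pixel.
    d = np.sqrt(((grid[:, None, :] - inside[None, :, :]) ** 2).sum(-1))
    potential = d.min(axis=1).reshape(H, W)
    potential[mask.astype(bool)] = 0.0
    return potential

def guidance_field(potential: np.ndarray):
    """Negative spatial gradient of the potential: at each pixel, a vector
    pointing toward the allowed region (a natural guidance direction)."""
    gy, gx = np.gradient(potential)
    return -gy, -gx

# Toy example: an 8x8 canvas whose allowed region is a 4x4 square.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
potential = mask_to_potential(mask)
```

Because the field is differentiable almost everywhere, its gradient can be folded into the sampling loop as an extra score term, which is what makes the approach training-free.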