Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
Abstract
We present Causal-Adapter, a modular framework that adapts frozen text-to-image diffusion backbones for counterfactual image generation. Our method enables causal interventions on target attributes while preserving all other aspects of the image, including core identity. In contrast to prior approaches that rely on prompt engineering without explicit causal structure, Causal-Adapter leverages structural causal modeling augmented with two attribute-regularization strategies: prompt-aligned injection, which aligns causal attributes with textual embeddings for precise semantic control, and a conditioned-token contrastive loss, which disentangles attribute factors and reduces spurious correlations. Causal-Adapter achieves state-of-the-art results on both synthetic and real-world datasets, outperforming prior baselines in effectiveness, composition, realism, and minimality. These results demonstrate that our approach enables efficient, robust, and generalizable counterfactual image editing with faithful attribute modification and strong preservation of core identity.