

Poster

CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models

Jie Xiao · Kai Zhu · Han Zhang · Zhiheng Liu · Yujun Shen · Zhantao Yang · Ruili Feng · Yu Liu · Xueyang Fu · Zheng-Jun Zha

Hall C 4-9 #410
[ Paper PDF ] [ Slides ] [ Poster ]
Wed 24 Jul 4:30 a.m. PDT — 6 a.m. PDT

Abstract:

Consistency Models (CMs) have shown promise for creating high-quality images in only a few sampling steps, but how to add new conditional controls to pre-trained CMs has not been explored. In this paper, we study how to leverage the generative capacity and efficiency of consistency models for controllable visual content creation via ControlNet. First, we observe that a ControlNet trained for diffusion models (DMs) can be directly applied to CMs and provides high-level semantic control, but it sacrifices low-level detail and realism. To address this issue, we develop a CM-tailored training strategy for ControlNet based on consistency training, and show that ControlNet can be successfully trained from scratch in this way. In addition, a unified adapter can be trained with consistency training to improve the transfer of a DM-trained ControlNet. We quantitatively and qualitatively evaluate all strategies across diverse conditional controls, including sketch, HED, Canny edge, depth, human pose, low-resolution image, and masked image, using pre-trained text-to-image latent consistency models.
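To make the consistency-training strategy concrete, the sketch below shows one training step of a self-consistency objective for a ControlNet branch attached to a frozen consistency model: the model's outputs at two adjacent noise levels on the same trajectory are pulled together, with gradients flowing only into the ControlNet. This is a minimal illustration, not the authors' code; the names `cm`, `cm_ema` (an EMA target copy), `controlnet`, and the `control_residuals` keyword are hypothetical stand-ins, and details such as noise-level discretization and the distance metric follow standard consistency training rather than the paper.

```python
# Minimal sketch of consistency training for a ControlNet branch on a
# frozen text-to-image consistency model. All module names are hypothetical.
import torch
import torch.nn.functional as F

def consistency_training_step(cm, cm_ema, controlnet, optimizer,
                              x0, text_emb, control, sigmas, n):
    """One training step; only the ControlNet parameters are updated.

    x0:      clean latents, shape (B, C, H, W)
    control: conditioning image (e.g., a Canny edge map or depth map)
    sigmas:  discretized noise levels with sigmas[n] < sigmas[n + 1]
    """
    noise = torch.randn_like(x0)
    # Two adjacent points on the same probability-flow ODE trajectory.
    x_hi = x0 + sigmas[n + 1] * noise
    x_lo = x0 + sigmas[n] * noise

    # ControlNet residuals are injected into the (frozen) consistency model.
    res_hi = controlnet(x_hi, sigmas[n + 1], text_emb, control)
    pred = cm(x_hi, sigmas[n + 1], text_emb, control_residuals=res_hi)

    with torch.no_grad():
        # EMA target network; no gradients flow through this branch.
        res_lo = controlnet(x_lo, sigmas[n], text_emb, control)
        target = cm_ema(x_lo, sigmas[n], text_emb, control_residuals=res_lo)

    # Self-consistency: outputs at adjacent noise levels should match.
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same loop could also be used to train a small unified adapter (instead of a full ControlNet) on top of a DM-trained ControlNet, as the abstract describes; only the set of trainable parameters passed to the optimizer would change.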
