

Poster

dPOD: On Discrete Prompt Optimization for Diffusion Models

Ruochen Wang · Ting Liu · Cho-Jui Hsieh · Boqing Gong


Abstract:

This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently solving this problem: 1) Enormous Domain Space: Setting the domain to the entire language space poses significant difficulty to the optimization process. 2) Text Gradient: Computing the text gradient incurs prohibitively high memory and runtime complexity, as it requires backpropagating through all inference steps of the diffusion model. Beyond the problem formulation, our main technical contributions lie in solving these challenges. First, we design a family of dynamically generated compact subspaces composed of only the words most relevant to the user input, substantially restricting the domain. Second, we introduce the "Shortcut Gradient", an effective replacement for the text gradient that can be obtained with constant memory and runtime. Empirical evaluation on prompts collected from diverse sources (DiffusionDB, ChatGPT, COCO) suggests that our method can discover prompts that substantially improve (prompt enhancement) or destroy (adversarial attack) the faithfulness of images generated by the text-to-image diffusion model.
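
The abstract's two ideas, restricting the search to a compact word subspace and replacing the full text gradient with a cheap surrogate, can be illustrated with a short sketch. The code below is not the authors' released method: the subspace, the few-step generator, and the faithfulness score are all toy stand-ins, and the Gumbel-softmax relaxation is just one plausible way to push gradients through discrete word choices.

```python
# Minimal sketch of discrete prompt optimization over a compact subspace with a
# truncated ("shortcut"-style) gradient. All names here (few_step_generate,
# faithfulness_score, subspace sizes) are illustrative assumptions.

import torch
import torch.nn.functional as F

# Hypothetical compact subspace: a small set of candidate word embeddings
# deemed relevant to the user prompt (stand-in for the paper's dynamically
# generated subspaces).
vocab_size, embed_dim, num_slots = 200, 32, 3
subspace = torch.randn(vocab_size, embed_dim)                    # candidate words
logits = torch.zeros(num_slots, vocab_size, requires_grad=True)  # learnable word choices

def few_step_generate(prompt_embed):
    """Stand-in for running only a few denoising steps of a diffusion model,
    keeping backprop cost roughly constant (the 'shortcut' idea)."""
    return torch.tanh(prompt_embed.sum(dim=0))  # toy 'image' features

def faithfulness_score(features):
    """Stand-in for an image-text faithfulness metric (e.g., a CLIP-style score)."""
    return -features.pow(2).mean()

opt = torch.optim.Adam([logits], lr=0.1)
for step in range(100):
    # Gumbel-softmax: a differentiable relaxation of discrete word selection.
    probs = F.gumbel_softmax(logits, tau=1.0, hard=True)  # (num_slots, vocab_size)
    prompt_embed = probs @ subspace                       # (num_slots, embed_dim)
    loss = -faithfulness_score(few_step_generate(prompt_embed))
    opt.zero_grad()
    loss.backward()  # gradient flows through only the cheap surrogate generator
    opt.step()

# Decode the optimized discrete prompt as indices into the compact subspace.
print(logits.argmax(dim=-1))
```

For an adversarial attack rather than prompt enhancement, the same loop applies with the sign of the loss flipped, so the optimizer searches for words that minimize rather than maximize faithfulness.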
