Poster in Workshop: Challenges in Deployable Generative AI

Seeing Through the Facade: Understanding the Realism, Expressivity, and Limitations of Diffusion Models

Christopher Pondoc · Joseph O'Brien · Joseph Guman

Keywords: [ Computer Vision ] [ Deepfakes ] [ ICML ] [ Machine Learning ] [ Image Classification ] [ Diffusion Model ]


Abstract:

While text-to-image generation models such as DALL-E 2 and Stable Diffusion 2.0 have captured the public imagination with their ability to create photorealistic images, just how "fake" are their outputs? To better understand this question, we present a three-pronged process for extracting insights from diffusion models. First, we show strong results in classifying real vs. fake images using transfer learning with a nearly decade-old model, setting an initial benchmark of realism that these diffusion models have yet to achieve. Second, after visualizing the classifier's inference decisions, we conclude that concrete, singular subjects, such as buildings and hands, helped distinguish real images from fake ones. However, we found no consensus on which features were distinctive of DALL-E 2 versus Stable Diffusion. Finally, after dissecting the prompts used to generate the fake images, we found that prompts that failed to trick our classifier contained similar types of nouns, while the prompts that succeeded differed for each model. We believe our work can serve as the first step in an iterative process that establishes increasingly difficult benchmarks of realism for diffusion models to overcome. The code for our project is open source: https://github.com/cpondoc/diffusion-model-analysis.
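To make the classification prong concrete, below is a minimal sketch of the kind of transfer-learning setup the abstract describes: an ImageNet-pretrained backbone frozen as a feature extractor, with a new two-way head fine-tuned to separate real from generated images. The choice of PyTorch, a ResNet-18 backbone, and the hyperparameters shown are assumptions for illustration; the abstract says only that "a nearly decade-old model" was used, and the actual pipeline is in the linked repository.

```python
# Minimal sketch of transfer learning for real-vs-fake image classification.
# Assumptions (not specified in the abstract): PyTorch/torchvision, an
# ImageNet-pretrained ResNet-18 backbone, a frozen feature extractor, and a
# new two-class head. The authors' actual backbone and settings may differ.
import torch
import torch.nn as nn
from torchvision import models

# Load a pretrained backbone and freeze its weights.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a two-class head: real (0) vs. fake (1).
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Placeholder batch standing in for preprocessed 224x224 RGB images; a real
# run would draw from a DataLoader over real and diffusion-generated images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

# One fine-tuning step on the classification head only.
model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"one fine-tuning step, loss = {loss.item():.4f}")
```

Freezing the backbone keeps the number of trainable parameters small, which is what makes an older pretrained model a cheap but meaningful baseline for this benchmark.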
