Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles
Xiao Li ⋅ Yixuan Jia ⋅ Zekai Zhang ⋅ Xiang Li ⋅ Lianghe Shi ⋅ Jinxin Zhou ⋅ Zhihui Zhu ⋅ Liyue Shen ⋅ Qing Qu
Abstract
Diffusion models are effective generative frameworks with strong representation learning capabilities, yet the intrinsic properties that govern their semantic structure and generalization remain poorly understood. Drawing inspiration from self-supervised representation learning (SSL), we introduce an evaluation framework that decomposes diffusion features into a perturbation-invariant component and a residual component induced by noise and augmentations. From this decomposition we derive the Invariant Contamination Ratio (ICR), a Fisher-based metric that measures how residual, augmentation-sensitive energy contaminates the invariant signal in feature space. We use this framework to analyze both discriminative and generative behavior. On the representation side, we find that invariance peaks at intermediate noise levels, which also yield the best downstream classification performance. On the generative side, we study how training transitions from genuine generalization to memorization in data-limited regimes, and find that ICR serves as a sensitive training-time indicator of the early-learning phenomenon: rising residual energy along Fisher directions marks the onset of memorization, detectable from training features alone, without external evaluators or held-out test sets. Overall, our results show that diffusion models can be monitored from a self-supervised perspective via the geometry of their learned representations.
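The abstract does not spell out the exact formula for ICR, but the described decomposition admits a natural reading: the invariant component of a sample's feature is its mean over perturbed views, the residual is each view's deviation from that mean, and "Fisher directions" are taken as generalized eigenvectors of the invariant scatter against the residual scatter. The sketch below computes an ICR-style quantity under those assumptions; the function name `invariant_contamination_ratio` and the parameter `n_dirs` are illustrative, not from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def invariant_contamination_ratio(features, n_dirs=16):
    """ICR-style sketch (assumed form, not necessarily the paper's definition).

    features: array of shape (n_samples, n_views, d) holding diffusion
    features of each sample under several perturbations (noise draws
    and/or augmentations).
    """
    n, k, d = features.shape

    # Decomposition: invariant component = per-sample mean over views;
    # residual component = each view's deviation from that mean.
    invariant = features.mean(axis=1)                        # (n, d)
    residual = (features - invariant[:, None, :]).reshape(-1, d)

    # Scatter matrices of the two components.
    ic = invariant - invariant.mean(axis=0)
    S_inv = ic.T @ ic / n                                    # invariant "signal" scatter
    S_res = residual.T @ residual / (n * k)                  # residual scatter

    # Fisher-style directions: generalized eigenvectors maximizing the ratio
    # of invariant to residual energy (our assumed reading of "Fisher directions").
    reg = 1e-6 * np.eye(d)
    evals, evecs = eigh(S_inv, S_res + reg)                  # eigenvalues ascending
    W = evecs[:, -n_dirs:]                                   # top Fisher directions

    # ICR: residual (augmentation-sensitive) energy contaminating the
    # invariant signal along the Fisher directions.
    return np.trace(W.T @ S_res @ W) / (np.trace(W.T @ S_inv @ W) + 1e-12)
```

Under this reading, tracking the returned ratio over training checkpoints would surface the behavior the abstract describes: a rise in residual energy along the Fisher directions, computed from training features alone, flagging the onset of memorization.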