Tightening the Score Matching Gap for Diffusion Models
Abstract
Diffusion models (DMs) are a state-of-the-art generative method for approximately sampling from an unknown distribution. Their training and evaluation primarily rely on an Evidence Lower Bound (ELBO), which bounds the Kullback-Leibler (KL) divergence between the data distribution and the model samples by the score matching loss along the diffusion path, a tractable surrogate. The slack in this bound, the difference between the divergence of the model samples and the score matching loss, is the score matching gap; it is known to be tight in the worst case but is not descriptive of sample quality in general. In this work, we provide a theoretical analysis of this gap, developing tighter bounds for three metrics: the KL divergence, the reverse KL divergence, and the Wasserstein distance, by exploiting the regularity of the class of score estimators. Our results suggest that the quality of the score approximation matters most for closing the score matching gap at low noise scales. Our key technical insight in obtaining these bounds is to exploit the contraction properties of the backward processes. In particular, we rely on entropy flows, logarithmic Sobolev inequalities, and reflection couplings, rigorously linking the ergodicity of the Langevin diffusion to the score matching gap problem.
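For context, one common schematic form of the ELBO-type bound referenced above (the notation here is illustrative and is our assumption, not fixed by the abstract) is
\[
\mathrm{KL}\big(p_0 \,\|\, q_0\big)
\;\le\;
\mathrm{KL}\big(p_T \,\|\, \pi\big)
\;+\;
\frac{1}{2}\int_0^T g(t)^2\,
\mathbb{E}_{x \sim p_t}\!\left[\big\| \nabla_x \log p_t(x) - s_\theta(x, t) \big\|^2\right] dt,
\]
where $p_t$ denotes the marginal of the forward noising SDE $\mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t$ started at the data distribution $p_0$, $\pi$ is the reference prior, $s_\theta$ is the learned score, and $q_0$ is the law of the samples produced by the reverse-time model. In this notation, the score matching gap is the slack in this inequality.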