Inference-Time Forward-Process Alignment in Diffusion Models
Abstract
The prevailing inference framework for diffusion models formulates generation as a numerical-integration problem. This perspective treats the model's prediction at each step as exact, neglecting the inherent statistical uncertainty of the denoising process. In this work, we propose inference-time \textbf{F}orward-process \textbf{A}lignment for \textbf{Di}ffusion models (\textbf{DiFA}), a training-free inference framework that reformulates diffusion sampling as a sequential state estimation problem. Instead of discarding historical predictions, DiFA treats the inference trajectory as a sequence of correlated observations with varying variances. We derive a principled fading-memory Kalman filtering strategy that fuses historical predictions to minimize estimation variance. Crucially, to counteract the over-smoothing typically associated with variance reduction, we introduce a deviation boosting mechanism that adaptively restores high-frequency details. Empirically, DiFA yields significant improvements in FID, IS, and FD-DINOv2 scores on CIFAR-10 and ImageNet, demonstrating that aligning inference with the forward statistical structure substantially improves generative fidelity.
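The core idea of fading-memory fusion can be sketched in a simplified scalar setting. The snippet below is a generic illustration under two stated assumptions, not the paper's exact DiFA update: it treats the observations as independent (the paper handles correlated ones), and the discount factor `lam` is a hypothetical parameter controlling how quickly old information is forgotten.

```python
def fading_memory_fuse(observations, variances, lam=0.9):
    """Fuse a sequence of noisy scalar estimates of one quantity
    with a fading-memory, inverse-variance-weighted Kalman update.

    lam in (0, 1] discounts older information: lam = 1 recovers plain
    inverse-variance weighting; smaller lam forgets faster.
    Generic sketch only -- not the paper's exact DiFA algorithm.
    """
    est = observations[0]
    prec = 1.0 / variances[0]                 # precision = 1 / variance
    for z, r in zip(observations[1:], variances[1:]):
        prec *= lam                           # fade accumulated precision
        k = (1.0 / r) / (prec + 1.0 / r)      # gain: new observation's precision share
        est = est + k * (z - est)             # precision-weighted correction
        prec = prec + 1.0 / r                 # accumulate precision
    return est, 1.0 / prec                    # fused estimate and its variance
```

With `lam = 1` and equal variances this reduces to the sample mean, whose variance shrinks as 1/n; smaller `lam` trades some variance reduction for responsiveness to the most recent predictions.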