Budget-Constrained Step-Level Diffusion Caching
Abstract
Step-level caching offers a promising avenue for accelerating diffusion models by exploiting temporal redundancy. However, existing strategies rely predominantly on heuristic, threshold-based metrics to trigger cache updates. This reactive paradigm is inherently myopic, because it optimizes only for local feature consistency, and it yields unpredictable deployment latency. In this work, we propose BudCache, a budget-constrained optimization framework that inverts this paradigm: instead of letting error thresholds dictate the cost, we enforce a strict computational budget and globally search for the caching policy that maximizes generation fidelity. To tackle the combinatorial complexity of step selection, we employ a hybrid strategy that combines Simulated Annealing with deterministic Hill Climbing. This approach efficiently escapes local optima and locates globally optimized cache masks within minutes; because the search runs offline, it adds no overhead at inference time. Crucially, to address the trajectory drift induced by aggressive caching, we introduce a cache-aware schedule alignment mechanism. By refining the timestep discretization through lightweight, data-free distillation, this mechanism significantly improves performance in low-NFE (number of function evaluations) regimes. Extensive experiments on FLUX.1-dev and Wan2.1 demonstrate that BudCache consistently outperforms heuristic baselines, achieving superior perceptual quality under rigid latency constraints.
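To make the budget-constrained search concrete, the following is a minimal sketch of Simulated Annealing followed by deterministic Hill Climbing over binary cache masks. It is not the BudCache implementation. A mask marks each denoising step as either fully computed (`True`) or served from cache (`False`). The budget fixes how many steps are computed, and step 0 is always computed because there is nothing to reuse yet. The objective `proxy_cost` and the per-step `drift` values are hypothetical stand-ins for a fidelity measure: a cached step is charged the drift accumulated since the last full computation. Every function name here is illustrative.

```python
import math
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Mask = List[bool]  # True = run the full model at this step, False = reuse the cache


def proxy_cost(mask: Sequence[bool], drift: Sequence[float]) -> float:
    # HYPOTHETICAL objective: each cached step pays the drift accumulated
    # since the most recent fully computed step (lower is better).
    total, stale = 0.0, 0.0
    for computed, d in zip(mask, drift):
        if computed:
            stale = 0.0
        else:
            stale += d
            total += stale
    return total


def random_mask(num_steps: int, budget: int, rng: random.Random) -> Mask:
    # Step 0 is always computed; the remaining budget is placed at random.
    chosen = {0} | set(rng.sample(range(1, num_steps), budget - 1))
    return [i in chosen for i in range(num_steps)]


def swap_neighbors(mask: Mask) -> Iterator[Mask]:
    # Move one compute slot to a cached step, so the budget stays fixed.
    on = [i for i, m in enumerate(mask) if m and i != 0]
    off = [i for i, m in enumerate(mask) if not m]
    for i in on:
        for j in off:
            nb = mask.copy()
            nb[i], nb[j] = False, True
            yield nb


def anneal_then_climb(cost: Callable[[Mask], float], num_steps: int, budget: int,
                      iters: int = 2000, t0: float = 1.0, alpha: float = 0.995,
                      seed: int = 0) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    cur = random_mask(num_steps, budget, rng)
    cur_cost = cost(cur)
    best, best_cost = cur, cur_cost
    temp = t0
    # Phase 1: Simulated Annealing over random budget-preserving swaps.
    for _ in range(iters):
        on = [i for i, m in enumerate(cur) if m and i != 0]
        off = [i for i, m in enumerate(cur) if not m]
        if not on or not off:
            break  # no swap is possible (budget is 1 or equals num_steps)
        cand = cur.copy()
        cand[rng.choice(on)] = False
        cand[rng.choice(off)] = True
        c = cost(cand)
        # Always accept improvements; accept worse masks with Metropolis probability.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
        temp *= alpha  # geometric cooling schedule
    # Phase 2: deterministic steepest-descent Hill Climbing from the best mask.
    while True:
        nb = min(swap_neighbors(best), key=cost, default=None)
        if nb is None or cost(nb) >= best_cost:
            break  # local optimum: no single swap improves the cost
        best, best_cost = nb, cost(nb)
    return best, best_cost
```

A usage example with a made-up drift profile that is larger at early steps:

```python
drift = [1.0 / (1 + i) for i in range(20)]  # hypothetical per-step drift
mask, c = anneal_then_climb(lambda m: proxy_cost(m, drift), num_steps=20, budget=8)
print("".join("C" if m else "." for m in mask), round(c, 3))
```

The annealing phase accepts occasional uphill moves so that it can leave poor basins. The final hill-climbing pass is deterministic and guarantees that the returned mask cannot be improved by any single budget-preserving swap. Because the search runs entirely offline, the resulting mask costs nothing extra at inference time.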