Trajectory-Aware Spiking DiTs Conversion via Membrane Potential Error-Feedback
Abstract
Diffusion Transformers (DiTs) have achieved state-of-the-art generative performance, yet their iterative denoising process remains computationally expensive and energy-intensive. Spiking Neural Networks (SNNs) offer a promising neuromorphic alternative for energy efficiency; however, the non-differentiable nature of spiking neurons makes direct training difficult, positioning ANN-to-SNN conversion as a more practical, training-free solution. In this paper, we identify a critical challenge unique to converting DiTs: standard fixed-scale spiking neurons fail to accommodate activation ranges that vary sharply across denoising steps. This mismatch leads to cumulative errors that significantly degrade generation fidelity. To resolve this, we propose a novel conversion framework featuring Multi-Threshold (MT) neurons and a Membrane Potential Error-Feedback (MPEF) mechanism. MT neurons expand the expressive capacity of discrete spikes through a multi-level firing strategy. In parallel, MPEF exploits the temporal correlation between successive denoising steps to recycle residual membrane potential, compensating for information loss and mitigating distribution shifts without retraining. Extensive experiments on ImageNet demonstrate that our framework achieves competitive generative quality with superior energy efficiency, establishing a new performance benchmark for spiking Diffusion Transformers.
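To make the two mechanisms concrete, the following is a minimal illustrative sketch of a multi-threshold neuron whose soft reset retains residual membrane potential across steps, in the spirit of the error-feedback idea described above. The class name, parameter choices (`theta`, `levels`), and the specific soft-reset rule are assumptions for exposition only, not the paper's actual implementation.

```python
import math

class MTNeuronWithMPEF:
    """Illustrative multi-threshold spiking neuron with membrane
    potential error feedback. Names and update rules here are
    expository assumptions, not the framework's implementation."""

    def __init__(self, theta=1.0, levels=4):
        self.theta = theta    # base firing threshold
        self.levels = levels  # number of discrete firing levels
        self.v = 0.0          # membrane potential

    def step(self, x):
        # Accumulate the input current into the membrane potential.
        self.v += x
        # Multi-level firing: emit the largest multiple of theta not
        # exceeding the potential, capped at `levels`. A fixed-scale
        # neuron corresponds to levels=1.
        n = min(max(math.floor(self.v / self.theta), 0), self.levels)
        out = n * self.theta
        # Soft reset: subtract only what was fired. The residual
        # potential is carried into the next step instead of being
        # discarded, so quantization error feeds back over time.
        self.v -= out
        return out
```

A fixed-scale neuron with a hard reset would clip large activations and throw away the sub-threshold residue at every step; the multi-level output and the retained residual are what let the discrete spikes track the widely varying activation ranges across denoising steps.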