$\sigma$: Sigmoid Modulation for Ultra High Resolution Diffusion
Bingxuan Zhao ⋅ Qing Zhou ⋅ Yu Wang ⋅ Chuang Yang ⋅ Qi Wang
Abstract
While Diffusion Transformers (DiTs) have revolutionized high-fidelity image synthesis, the prohibitive computational costs of training at ultra-high resolutions necessitate robust inference-time extrapolation. Existing extrapolation methods typically operate under a *scale-agnostic* assumption, treating the denoising dynamics identically across resolutions. In this work, we identify a critical oversight in this paradigm: the spectral evolution of the diffusion process, transitioning from low-frequency structural construction to high-frequency texture refinement, is inherently scale-dependent. Consequently, applying a uniform strategy across scales causes a spectral misalignment, manifesting as *structural collapse* or *textural degradation*. To bridge this gap, we introduce **SigMa ($\sigma$)**, a training-free framework that utilizes Sigmoid Modulation for *scale-adaptive* calibration of the extrapolation process. SigMa orchestrates the spectral evolution via a parameterized schedule with two core mechanisms: *Decoupled Geometric Center Alignment*, which synchronizes the transition timing to secure global structure, and *Iso-Variance Rate Adaptation*, which scales the transition velocity to ensure a smooth feature handover. Extensive experiments demonstrate that SigMa effectively rectifies spectral deviations, enabling training-free extrapolation up to 16 megapixels and achieving state-of-the-art performance on standard benchmarks.
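To make the two mechanisms concrete, below is a minimal, hypothetical sketch of what a parameterized sigmoid schedule with a transition center (timing, cf. geometric center alignment) and a transition rate (velocity, cf. iso-variance rate adaptation) could look like. The function names, the log-scale rate adaptation, and all parameter values are illustrative assumptions, not the paper's actual formulas.

```python
import math

def sigmoid_weight(t: float, center: float, rate: float) -> float:
    """Sigmoid modulation weight in (0, 1) over a normalized timestep t in [0, 1].

    Blends low-frequency structural denoising (weight near 0, early t) into
    high-frequency texture refinement (weight near 1, late t). `center` sets
    *when* the transition happens; `rate` sets *how fast* it happens.
    """
    return 1.0 / (1.0 + math.exp(-(t - center) / rate))

def scale_adapted_params(scale_factor: float,
                         base_center: float = 0.5,
                         base_rate: float = 0.1) -> tuple[float, float]:
    """Hypothetical scale adaptation (illustrative only).

    Keeps the transition center fixed across scales (synchronized timing),
    while widening the rate with the extrapolation factor so the structure-
    to-texture handover is proportionally slower at larger resolutions.
    """
    center = base_center                                  # timing stays aligned
    rate = base_rate * (1.0 + math.log(scale_factor))     # slower transition at larger scales
    return center, rate
```

For example, at the training resolution (`scale_factor = 1`) the schedule reduces to the base sigmoid, while at a 4x extrapolation the same center is kept but the transition is stretched, avoiding an abrupt structure-to-texture switch.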