MAMBO-G: Magnitude-Aware Mitigation for Boosted Guidance
Shangwen Zhu ⋅ Qianyu Peng ⋅ Zhilei Shu ⋅ Yuting Hu ⋅ Han Zhang ⋅ Andy Zheng ⋅ Xinyu Cui ⋅ Jian Zhao ⋅ Ruili Feng ⋅ Fan Cheng
Abstract
High-fidelity text-to-image and text-to-video generation typically relies on Classifier-Free Guidance (CFG), but achieving optimal results often demands computationally expensive sampling schedules. In this work, we propose MAMBO-G, a training-free acceleration framework that significantly reduces computational cost by dynamically optimizing guidance magnitudes. We observe that standard CFG schedules are inefficient: they apply disproportionately large updates in early steps, which slows convergence. MAMBO-G mitigates this by modulating the guidance scale according to the ratio of the guidance-update magnitude to the prediction magnitude, stabilizing the sampling trajectory and enabling rapid convergence. This efficiency is particularly vital for resource-intensive tasks such as video generation. Our method serves as a universal plug-and-play accelerator, achieving up to a 3$\times$ speedup on Stable Diffusion v3.5 (SD3.5) and 4$\times$ on Lumina. Most notably, MAMBO-G accelerates the 14B-parameter Wan2.1 video model by 2$\times$ while preserving visual fidelity, offering a practical solution for efficient large-scale video synthesis. Our implementation builds on a mainstream open-source diffusion framework and integrates with existing pipelines without modification.
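The magnitude-aware modulation sketched in the abstract can be illustrated as follows. This is a minimal NumPy sketch under stated assumptions: the function name `mambo_g_step`, the specific damping rule (capping the update-to-prediction norm ratio), and the `max_ratio` threshold are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def mambo_g_step(uncond, cond, base_scale, max_ratio=0.5):
    """Illustrative magnitude-aware CFG update (names and rule are assumptions).

    Standard CFG applies: uncond + base_scale * (cond - uncond).
    Here the scale is damped whenever the guidance update would be large
    relative to the conditional prediction's magnitude.
    """
    update = cond - uncond  # CFG guidance direction
    eps = 1e-8
    ratio = np.linalg.norm(update) / (np.linalg.norm(cond) + eps)
    # Shrink the guidance scale so the applied update stays within
    # max_ratio times the prediction magnitude; leave it untouched
    # when the update is already small (ratio <= max_ratio).
    scale = base_scale * min(1.0, max_ratio / (ratio + eps))
    return uncond + scale * update
```

When the update-to-prediction ratio is below `max_ratio` (typical in late denoising steps), this reduces to standard CFG; in early steps, where the ratio is large, the effective scale is damped, which is the trajectory-stabilizing behavior the abstract describes.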