Semantic Granularity Navigation in Image Editing
Abstract
Despite the generative capabilities of diffusion models, real-image editing remains constrained by a persistent trade-off between semantic editability and structural fidelity. We identify the implicit coupling of editing progress with noise scale in existing paradigms as a primary cause of this limitation. This coupling creates a budget misallocation: achieving stronger semantic changes often necessitates initializing from high-noise states, which can waste computation disrupting global layout before semantic modification even begins. To address this, we introduce NaviEdit, a training-free framework that decouples the editing trajectory from the denoising schedule via a strict Time-Axis Consistency principle. By reformulating editing as controlled vector-field navigation along a distinct task axis, NaviEdit concentrates the computational budget within semantically responsive intermediate noise scales while limiting exposure to destructive high-noise regimes. Experiments on PIE-Bench show that NaviEdit outperforms strong state-of-the-art baselines, achieving larger semantic edits with better structure preservation under comparable compute budgets, without requiring model tuning.
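The budget-misallocation argument above can be illustrated with a minimal sketch. The schedules, band endpoints, and function names below are hypothetical illustrations of the idea, not the paper's actual method: a baseline schedule spends its step budget across the full noise range starting from the high-noise regime, whereas a banded schedule concentrates the same number of steps inside an assumed semantically responsive intermediate band.

```python
# Hypothetical sketch of the budget-allocation idea from the abstract.
# All names and the band endpoints (0.3, 0.7) are illustrative
# assumptions, not values from the paper.

def full_noise_schedule(n_steps, t_max=1.0):
    """Baseline: steps descend across the entire noise range from t_max."""
    return [t_max * (n_steps - i) / n_steps for i in range(n_steps)]

def banded_schedule(n_steps, t_lo=0.3, t_hi=0.7):
    """Spend the same step budget inside an intermediate noise band
    [t_lo, t_hi], never visiting the destructive high-noise regime."""
    assert 0.0 <= t_lo < t_hi <= 1.0 and n_steps >= 2
    return [t_hi - (t_hi - t_lo) * i / (n_steps - 1) for i in range(n_steps)]

baseline = full_noise_schedule(10)  # starts at t = 1.0 (full noise)
banded = banded_schedule(10)        # confined to t in [0.3, 0.7]
```

Under this framing, both schedules consume ten denoising steps, but the banded one allocates every step to noise levels where edits are (by assumption) semantically effective, rather than first re-synthesizing global layout from near-pure noise.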