Enhanced Latent-Space Adversarial Training for Super-Resolution
Abstract
Real-world super-resolution (SR) is challenging because of complex, unknown degradations. HYPIR, a recent state-of-the-art diffusion-based restoration model, struggles to handle this task in a single step. A naive two-step cascade improves the results, but it still suffers from over-saturation, limited fine-grained detail, and high inference latency. To address these limitations, we present HYPIR++. HYPIR++ removes the degradation-removal encoder and noise augmentation to better preserve fidelity cues from the low-quality input. To enhance fine-grained detail restoration and local structural fidelity, HYPIR++ introduces a tailored latent ConvNeXt and a latent patch discriminator, which together enable adversarial learning directly in the latent space. HYPIR++ also improves inference efficiency by shortening the text sequence and replacing full attention with sparse neighbor attention, which allows high-resolution images to be processed directly without block-based tiling. Extensive experiments show that HYPIR++ achieves superior perceptual quality and a 1.71× speedup over HYPIR, establishing a new state of the art for real-world SR.
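The abstract does not say how the sparse neighbor attention is defined, so the sketch below is only an illustration under stated assumptions. It assumes a neighborhood-attention scheme in which each latent token attends to a `window × window` patch centered on itself, with the patch shifted inward at image borders so that every query sees exactly `window**2` keys. The function names `full_attention` and `neighbor_attention` are hypothetical, and the choices of window size, border handling, and single head are assumptions rather than HYPIR++ details. For an `H × W` latent grid, full attention computes `(H*W)**2` scores, whereas the neighbor variant computes `H*W*window**2`, so its cost grows linearly with image area. This is the property that makes tiling unnecessary.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    """Hypothetical baseline: every token in the (H, W, C) latent attends to all H*W tokens."""
    H, W, C = q.shape
    qf, kf, vf = q.reshape(-1, C), k.reshape(-1, C), v.reshape(-1, C)
    attn = softmax(qf @ kf.T / np.sqrt(C))  # (H*W, H*W) score matrix
    return (attn @ vf).reshape(H, W, C)


def neighbor_attention(q, k, v, window):
    """Hypothetical sparse variant: each query attends to a window x window neighborhood.

    Windows are clamped (shifted inward) at borders so each query always sees
    exactly window**2 keys, as in neighborhood attention.
    """
    H, W, C = q.shape
    assert 1 <= window <= min(H, W)
    half = window // 2
    out = np.empty_like(q)
    for i in range(H):
        i0 = min(max(i - half, 0), H - window)  # top row of the clamped window
        for j in range(W):
            j0 = min(max(j - half, 0), W - window)  # left column of the clamped window
            kw = k[i0:i0 + window, j0:j0 + window].reshape(-1, C)
            vw = v[i0:i0 + window, j0:j0 + window].reshape(-1, C)
            scores = kw @ q[i, j] / np.sqrt(C)  # (window**2,) local scores only
            out[i, j] = softmax(scores) @ vw
    return out


# Usage on a toy 8x8 latent with 4 channels.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 8, 4)) for _ in range(3))
local = neighbor_attention(q, k, v, window=3)  # 8*8*9 = 576 scores vs 4096 for full attention
# When the window covers the whole grid, neighbor attention reduces to full attention.
assert np.allclose(neighbor_attention(q, k, v, window=8), full_attention(q, k, v))
```

The equivalence check at the end is a sanity test of the sketch only. It says nothing about the actual window size or kernel used in HYPIR++.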