SDiD: Shared diffusion prior for efficient distributed stereo image compression
Abstract
Stereo vision is widely used in automotive imaging and 3D reconstruction, creating a demand for efficient stereo image compression. Existing stereo image compression methods often employ VAE-like architectures optimized for distortion, leading to subpar perceptual quality at low bitrates. While generative compression achieves high perceptual fidelity at low bitrates, it struggles to maintain consistency across viewpoints, making the decoded images less useful for critical downstream tasks. To address this, we introduce SDiD, a distributed stereo image compression architecture based on a shared pre-trained diffusion prior. We employ a diffusion prior alignment module to efficiently obtain the main-view prior from the foundation diffusion model, and a prior transformation structure that enables the auxiliary view to achieve reliable and fast perceptual enhancement while maintaining cross-view consistency. Extensive experiments demonstrate that SDiD outperforms existing methods in perceptual quality across multiple datasets. Even at extremely low bitrates, SDiD accurately recovers depth information between decoded images. On the InStereo2K dataset, SDiD requires only one third of the bits of the state-of-the-art baseline (0.02 bpp vs. 0.06 bpp) to reconstruct image pairs with comparable depth information.