Geometric Reciprocity: Unlocking Self-Supervision for Stereoscopic Video Generation
Abstract
Monocular-to-stereo conversion synthesizes stereoscopic content from 2D videos for immersive 3D experiences. Modern Depth-Image-Based Rendering (DIBR) pipelines identify inpainting of stereo disocclusions as the critical bottleneck. Training-based methods achieve superior quality but rely on scarce stereo pairs or on synthetic data that suffers from domain gaps. We address this with the first self-supervised framework that learns stereo inpainting from monocular videos alone via cycle consistency. Our key contribution is the Geometric Reciprocity Theorem (GRT): the disocclusion mask produced when synthesizing a target view exactly equals the mask of pixels lost when warping back from the target to the source, enabling test-time disocclusion masks to be computed analytically from monocular images. This yields exact train-test consistency, enables self-supervised learning from unlimited monocular videos, and delivers substantial improvements over training-free and supervised state-of-the-art methods.
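The analytical mask computation the abstract refers to can be illustrated with a toy one-dimensional sketch. Here, source pixels are forward-splatted into the target view by their horizontal disparity, and target positions that no source pixel reaches form the disocclusion mask. The function name, the `x - d` shift convention, and the forward-splatting scheme are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def disocclusion_mask(disparity):
    """Toy 1D sketch (hypothetical, not the paper's method):
    forward-splat each source pixel to the target view; target
    positions no source pixel covers are disoccluded."""
    width = disparity.shape[0]
    covered = np.zeros(width, dtype=bool)
    for x in range(width):
        # Horizontal shift from source to target view.
        xt = x - int(round(disparity[x]))
        if 0 <= xt < width:
            covered[xt] = True
    # True where the target view has no source correspondence
    # and must therefore be inpainted.
    return ~covered

# A depth edge: background (disparity 0) next to a foreground
# object (disparity 3). The foreground shifts left in the target
# view, revealing a 3-pixel region with no source pixel to fill it.
d = np.array([0, 0, 0, 0, 3, 3, 3, 3, 3, 3], dtype=float)
mask = disocclusion_mask(d)
print(np.nonzero(mask)[0])  # → [7 8 9]
```

Under the reciprocity stated in the GRT, this mask coincides with the set of target pixels lost when warping back from target to source, which is what allows it to be computed from the monocular input alone at test time.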