Aggregate Metrics Hide Shortcut Regimes: A Complexity-Stratified Benchmark for Novel View Synthesis
Abstract
Standard novel-view synthesis (NVS) benchmarks report aggregate metrics across heterogeneous object sets, obscuring systematic performance differences tied to object appearance complexity. We introduce a view-change complexity score, defined as the mean VGG perceptual distance between views of the same object separated by a fixed angle, and use it to stratify the 100-object COIL-100 turntable dataset into quartiles. Evaluating two training-free baselines reveals a sharp regime crossover under VGG-16 distance: on low-complexity objects (Q1), copying the source image achieves a VGG distance of 0.157 while nearest-neighbour retrieval scores 0.353 (125% worse); on high-complexity objects (Q4), this reverses, with retrieval outperforming copy-source by 45%. A conditional DDPM trained for 30,000 steps confirms the stratified picture: the model consistently underperforms copy-source on Q1 while outperforming it on Q4, and conditioning on the rotation angle provides no measurable benefit at intermediate training checkpoints. These findings demonstrate that complexity-stratified evaluation is essential for exposing and avoiding shortcut regimes in rotational NVS.
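As a rough illustration of the scoring and stratification described in the abstract, the sketch below computes a per-object complexity score from precomputed per-view feature vectors and assigns quartile labels. It is a minimal sketch under stated assumptions, not the implementation used in this work: the names `complexity_score`, `assign_quartiles`, and `step` are hypothetical, plain Euclidean distance between feature vectors stands in for the VGG perceptual distance, and views are assumed to be evenly spaced around a full turntable so that pairs can wrap around.

```python
import numpy as np


def complexity_score(features: np.ndarray, step: int) -> float:
    """Mean distance between views `step` positions apart.

    `features` has shape (n_views, d): one feature vector per view,
    ordered by turntable angle. ASSUMPTION: Euclidean distance on
    precomputed features is a stand-in for a VGG perceptual distance.
    """
    # Pair view i with view (i + step) mod n_views (closed turntable).
    shifted = np.roll(features, -step, axis=0)
    return float(np.linalg.norm(features - shifted, axis=1).mean())


def assign_quartiles(scores: np.ndarray) -> np.ndarray:
    """Label each object 1..4, where 1 (Q1) is the lowest complexity."""
    edges = np.quantile(scores, [0.25, 0.5, 0.75])
    # searchsorted maps each score to the number of quartile edges strictly below it.
    return np.searchsorted(edges, scores, side="left") + 1


# Toy usage with synthetic features (illustrative data, not COIL-100).
rng = np.random.default_rng(0)
toy_features = [rng.normal(scale=s, size=(12, 8)) for s in np.linspace(0.1, 2.0, 100)]
toy_scores = np.array([complexity_score(f, step=2) for f in toy_features])
toy_labels = assign_quartiles(toy_scores)
```

Per-quartile metrics can then be reported by averaging each method's error over the objects sharing a label, which is what allows a Q1 versus Q4 comparison instead of a single aggregate number.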