R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations
Zhifeng Gu ⋅ Yuqi Wang ⋅ Bing Wang
Abstract
Relative spatial relations provide a compact description of spatial structure and serve as a key component of relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer these relations, but the inferred relations are often unreliable and must be repaired by post-hoc heuristics at the cost of semantic fidelity. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. Our key observation is that multi-hop reasoning requires repeated reference-frame shifts, which accumulate errors and lead to semantic and metric drift. To mitigate this, we propose invariant spatial decomposition to shorten relation chains, and consistent spatial imagination, which uses an imagine-and-revise loop to encourage self-consistency during relation inference. We further design supportive spatial optimization, which eases pose optimization through global-to-local coordinate re-parameterization. Extensive experiments across diverse scene types and instructions demonstrate that R$^3$L improves layout feasibility and semantic consistency. Notably, our analysis shows that resolving frame-induced inconsistencies during reasoning is crucial for reliable multi-hop relative spatial reasoning. Code will be released upon acceptance.
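To make the idea of global-to-local coordinate re-parameterization concrete, the following is a minimal sketch of one plausible reading: a child object's pose is expressed as an offset in its supporting (parent) object's local frame, so that moving the parent during optimization carries the child along consistently. All function names and the 2D yaw-only setup are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def to_local(parent_pos, parent_yaw, child_pos):
    """Express child_pos as an offset in the parent's local frame
    (2D layout, yaw-only rotation; illustrative, not the paper's code)."""
    c, s = np.cos(-parent_yaw), np.sin(-parent_yaw)
    rot_inv = np.array([[c, -s], [s, c]])  # inverse of the parent's rotation
    return rot_inv @ (child_pos - parent_pos)

def to_global(parent_pos, parent_yaw, local_offset):
    """Recover the child's global position from its local offset."""
    c, s = np.cos(parent_yaw), np.sin(parent_yaw)
    rot = np.array([[c, -s], [s, c]])
    return parent_pos + rot @ local_offset

# A child one unit to the parent's global +x side, parent rotated 90 degrees.
parent, child = np.array([2.0, 1.0]), np.array([3.0, 1.0])
offset = to_local(parent, np.pi / 2, child)
# Round-tripping through the local frame recovers the global position.
assert np.allclose(to_global(parent, np.pi / 2, offset), child)
```

Optimizing the parent's pose directly while holding `offset` fixed is what "eases" the problem: relative placements are preserved by construction instead of being enforced through extra constraints.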