SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
Abstract
With the current surge in spatial reasoning, researchers have made significant progress in understanding indoor scenes, but they still struggle with more diverse applications. This paper aims to advance all-scale spatial reasoning by tackling two key challenges: 1) the heavy reliance on indoor 3D scans and labor-intensive annotations for dataset curation; 2) the absence of all-scale modeling, which often leads to overfitting to a single type of scene. In this paper, we introduce a holistic solution that integrates a structured spatial reasoning knowledge system, scale-aware modeling, and a progressive training paradigm, as the first attempt to broaden the scope of all-scale spatial intelligence. Using a task-specific, specialist-driven automated pipeline, we curate over 38K video scenes across 5 spatial scales to create SpaceVista-1M, a dataset comprising 1M spatial QAs spanning 19 diverse tasks. While specialist models offer valuable domain knowledge, they are often unreliable evaluators. We therefore build an all-scale benchmark with precise annotations by manually recording and retrieving videos. Nevertheless, naive training on SpaceVista-1M often yields suboptimal results due to potential knowledge conflicts. Accordingly, we introduce SpaceVista-7B, a spatial reasoning model that accepts inputs beyond semantics and uses scale as an anchor for scale-aware experts and progressive rewards. Finally, extensive evaluations across 5 benchmarks, including our SpaceVista-Bench, demonstrate competitive performance, showcasing generalization across all scales and scenarios. All materials will be released at https://mm2km.github.io/.