Poster Wed, Jul 8, 2026 • 1:00 AM – 2:45 AM PDT HALL A #1213

SpaCeFormer: Space-Curve Transformer for Open-Vocabulary 3D Instance Segmentation without Proposals

Christopher Choy ⋅ Junha Lee ⋅ Chunghyun Park ⋅ Minsu Cho ⋅ Jan Kautz


Abstract

Open-vocabulary 3D segmentation is crucial for real-world applications, yet existing methods are constrained by fragmented masks and inconsistent captions in dataset generation, and by multi-stage pipelines prone to error propagation. We present SpaCeFormer-3M, the largest open-vocabulary 3D instance segmentation dataset with 846K instances from 15K scenes, and SpaCeFormer (Space-Curve Transformer), a proposal-free segmentation architecture. Our data pipeline leverages multi-view mask clustering to produce geometry-consistent 3D instances and employs multi-view VLM prompting for view-consistent captions. On the modeling side, SpaCeFormer combines spatial window attention with Morton curve serialization for spatially coherent features, and a RoPE-enhanced decoder to predict instance masks directly from learned queries without external proposals. On ScanNet200, our approach achieves 11.1 zero-shot mAP, a 2.8× improvement over prior proposal-free methods, while requiring only 0.21 seconds per scene.
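To illustrate what Morton curve serialization means in general, the following is a minimal sketch of generic Z-order sorting of 3D points, not the SpaCeFormer implementation. The function names (`spread_bits`, `morton3d`, `serialize`), the `voxel_size` parameter, and the 10-bits-per-axis grid limit are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math


def spread_bits(n: int, bits: int = 10) -> int:
    """Place bit i of n at position 3*i, leaving two empty bits between each."""
    out = 0
    for i in range(bits):
        out |= ((n >> i) & 1) << (3 * i)
    return out


def morton3d(ix: int, iy: int, iz: int) -> int:
    """Interleave three grid indices into one Morton (Z-order) code.

    Assumes each index fits in 10 bits (0..1023); higher bits are ignored.
    """
    return spread_bits(ix) | (spread_bits(iy) << 1) | (spread_bits(iz) << 2)


def serialize(points, voxel_size: float = 0.05):
    """Return point indices ordered along the Morton curve.

    points: sequence of (x, y, z) floats.
    voxel_size: hypothetical quantization step, not taken from the paper.
    """
    # Shift coordinates so every grid index is non-negative.
    mins = [min(p[d] for p in points) for d in range(3)]

    def code(p):
        ix, iy, iz = (math.floor((p[d] - mins[d]) / voxel_size) for d in range(3))
        return morton3d(ix, iy, iz)

    return sorted(range(len(points)), key=lambda i: code(points[i]))


if __name__ == "__main__":
    pts = [(3.0, 3.0, 3.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(serialize(pts, voxel_size=1.0))
```

Sorting by Morton code tends to place spatially nearby points close together in the resulting 1D sequence, which is the property that makes such orderings useful for grouping points before attention.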