UniMapping: Unified SLAM Framework for Map-Centric Embodied Perception
Abstract
Simultaneous Localization and Mapping (SLAM) is increasingly expected to provide reusable spatial representations for downstream perception. However, existing approaches often struggle to maintain scale consistency and produce maps that lack the geometric fidelity required for reliable perception. We propose UniMapping, a unified SLAM framework that constructs a persistent neural-descriptor map from multimodal observations. We introduce a Spatial-Aware Deformable Transformer that injects explicit geometric inductive bias for scale-invariant feature extraction, alongside a Spatial Fusion strategy that decouples feature aggregation from temporal sequences. Extensive experiments on indoor and outdoor benchmarks demonstrate competitive SLAM performance. Notably, by leveraging accumulated multi-view context, our method significantly improves downstream perception tasks (+3.1% mAP and +7.1% mIoU).
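To make the core idea concrete, the sketch below shows one plausible reading of the Spatial-Aware Deformable Transformer's attention step: sampling offsets are predicted jointly from features and 3D anchor coordinates, then normalized by a learned metric scale so the attended neighborhood does not depend on the absolute scale of the reconstruction. This is a minimal illustration under our own assumptions; the module name, the offset/scale parameterization, and the volumetric value grid are hypothetical and may differ from the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialDeformableAttention(nn.Module):
    """Hypothetical sketch of geometry-conditioned deformable attention.

    Offsets are predicted in metric 3D space and rescaled by a learned
    scene-scale parameter, approximating the scale-invariant feature
    extraction described in the abstract.
    """

    def __init__(self, dim: int, n_heads: int = 4, n_points: int = 8):
        super().__init__()
        self.n_heads, self.n_points = n_heads, n_points
        # Offsets and weights are conditioned on features AND 3D anchors,
        # injecting an explicit geometric inductive bias (assumption).
        self.offset_net = nn.Linear(dim + 3, n_heads * n_points * 3)
        self.attn_net = nn.Linear(dim + 3, n_heads * n_points)
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.log_scale = nn.Parameter(torch.zeros(1))  # metric scale normalizer

    def forward(self, queries, anchors, value_grid):
        # queries: (B, N, C); anchors: (B, N, 3) in [-1, 1] grid coordinates
        # value_grid: (B, C, D, H, W) volumetric map features
        B, N, C = queries.shape
        q = torch.cat([queries, anchors], dim=-1)
        # Scale-normalized offsets: the learned scale keeps the sampled
        # neighborhood consistent across reconstructions of different scale.
        offsets = torch.tanh(self.offset_net(q)) * torch.exp(self.log_scale)
        offsets = offsets.view(B, N, self.n_heads * self.n_points, 3)
        weights = F.softmax(
            self.attn_net(q).view(B, N, self.n_heads, self.n_points), dim=-1)
        # Trilinear sampling of projected values at anchor + offset locations.
        loc = (anchors.unsqueeze(2) + offsets).clamp(-1, 1)   # (B, N, HP, 3)
        v = self.value_proj(value_grid.permute(0, 2, 3, 4, 1))
        v = v.permute(0, 4, 1, 2, 3)                          # (B, C, D, H, W)
        sampled = F.grid_sample(
            v, loc.view(B, N, self.n_heads * self.n_points, 1, 3),
            align_corners=True)                               # (B, C, N, HP, 1)
        sampled = sampled.squeeze(-1).permute(0, 2, 3, 1)     # (B, N, HP, C)
        sampled = sampled.view(B, N, self.n_heads, self.n_points, C)
        out = (weights.unsqueeze(-1) * sampled).sum(dim=3).mean(dim=2)
        return self.out_proj(out)
```

In such a design, the module would replace standard self-attention inside the map encoder, with anchors taken from the current map estimate; the learned `log_scale` is one simple way to realize the scale normalization, not necessarily the authors' choice.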