De4D-SLAM: Gradient-Isolated Static-Dynamic Decoupling for Monocular SLAM in Dynamic Environments
Abstract
Conventional dynamic SLAM approaches typically discard dynamic objects as outliers based on pre-defined semantic categories, creating perceptual blind spots that prevent the comprehensive environmental perception required by embodied agents. Integrating Gaussian Splatting into SLAM enables holistic scene representation, but it introduces an optimization paradox: without categorical priors, flexible dynamic primitives rapidly overfit the residuals of the static reconstruction, absorbing the very self-supervised error signals needed to distinguish motion. To resolve this, we present De4D-SLAM, a framework for decoupled 4D reconstruction from monocular video. Its Gradient-Isolated Decoupling strategy uses static reconstruction residuals to supervise a Spatially-Aware Kolmogorov-Arnold Network (SA-KAN), enabling robust, category-agnostic motion segmentation. In addition, a Flow-Induced Initialization prior derived from dense optical flow stabilizes the non-convex optimization of 4D Gaussian primitives. Extensive evaluations on the TUM and Bonn benchmarks show that De4D-SLAM achieves state-of-the-art performance in both camera tracking and dynamic reconstruction, delivering robust localization and high-fidelity 4D mapping within a single system.
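The idea of supervising a motion segmenter with static residuals while isolating gradients can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The static renderer is replaced by a hypothetical constant image (`static_render`), SA-KAN is replaced by free per-pixel logits, and the residual-to-label mapping (`pseudo_labels`, with hypothetical threshold `tau` and temperature `temp`) is an assumed soft threshold. The only point it demonstrates is that `jax.lax.stop_gradient` lets residuals act as targets for the mask without any mask-loss gradient reaching the static parameters.

```python
import jax
import jax.numpy as jnp

def static_render(static_params, H, W):
    # Hypothetical stand-in for a static Gaussian renderer: a constant image.
    return jnp.broadcast_to(static_params, (H, W))

def pseudo_labels(frame, static_img, tau=0.1, temp=0.02):
    # Assumed mapping: large static residual -> likely dynamic pixel.
    residual = jnp.abs(frame - static_img)
    soft = jax.nn.sigmoid((residual - tau) / temp)
    # Gradient isolation: the mask loss cannot push the static branch.
    return jax.lax.stop_gradient(soft)

def mask_loss(mask_logits, static_params, frame):
    # Binary cross-entropy between predicted mask and residual-derived labels.
    H, W = frame.shape
    target = pseudo_labels(frame, static_render(static_params, H, W))
    p = jax.nn.sigmoid(mask_logits)  # per-pixel logits stand in for SA-KAN
    eps = 1e-6
    bce = -(target * jnp.log(p + eps) + (1.0 - target) * jnp.log(1.0 - p + eps))
    return bce.mean()

# Toy frame: static background at 0.5 with a bright "moving" 2x2 patch.
frame = jnp.full((4, 4), 0.5).at[1:3, 1:3].set(1.0)
static_params = jnp.array(0.5)
logits = jnp.zeros((4, 4))

g_mask, g_static = jax.grad(mask_loss, argnums=(0, 1))(logits, static_params, frame)
labels = pseudo_labels(frame, static_render(static_params, 4, 4))
```

Here `g_static` is exactly zero, so the static parameters receive no update from the segmentation loss, while `g_mask` pushes logits down on background pixels and up on the patch. A full system would additionally optimize the static branch with its own photometric loss; that part is omitted from this sketch.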