SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion
Abstract
Although multi-modal learning has advanced point cloud completion, the theoretical mechanisms behind its gains remain unclear. Recent work attributes this success to the connection between modalities, yet we find that standard hard projection severs this connection, inducing Cross-Modal Entropy Collapse, in which sparse support hinders the propagation of visual priors. To bridge this gap, we propose SplAttN, which maximizes Point-wise Mutual Information via Differentiable Gaussian Splatting. By reformulating projection as continuous density estimation, SplAttN facilitates gradient flow and optimizes the learnability of the cross-modal connection. Extensive experiments show that SplAttN achieves state-of-the-art performance on PCN and ShapeNet-55/34. Crucially, we use the real-world KITTI benchmark as a stress test for multi-modal reliance. Counterfactual evaluation reveals that, while baselines degenerate into unimodal template retrievers that are insensitive to the removal of visual input, SplAttN maintains a robust dependency on visual cues, validating that our method establishes an effective cross-modal connection. Code is available at https://anonymous.4open.science/r/Anonymous-766B/.
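To make the contrast between hard projection and Gaussian soft splatting concrete, the following is a minimal illustrative sketch, not the SplAttN implementation. It assumes the points have already been projected to 2D image coordinates, and the function names (`hard_project`, `soft_splat`, `support`), the grid size, and the bandwidth `sigma` are all hypothetical choices made for illustration. It covers the forward pass only: hard projection writes to a single pixel through an integer cast, whereas the Gaussian weight is a smooth function of the point coordinates and spreads over a much larger support.

```python
import math

def hard_project(points, size):
    """Hard projection: each 2D point marks only the single pixel it falls into."""
    img = [[0.0] * size for _ in range(size)]
    for x, y in points:
        col, row = int(x), int(y)  # integer cast: not smooth in x or y
        if 0 <= row < size and 0 <= col < size:
            img[row][col] = 1.0
    return img

def soft_splat(points, size, sigma=1.0):
    """Soft splatting: each point spreads an isotropic Gaussian over pixel centers."""
    img = [[0.0] * size for _ in range(size)]
    for x, y in points:
        for row in range(size):
            for col in range(size):
                # Squared distance from the point to the center of pixel (row, col).
                d2 = (col + 0.5 - x) ** 2 + (row + 0.5 - y) ** 2
                img[row][col] += math.exp(-d2 / (2.0 * sigma ** 2))
    return img

def support(img, eps=1e-3):
    """Fraction of pixels whose value exceeds eps."""
    total = sum(len(r) for r in img)
    return sum(v > eps for r in img for v in r) / total

# Hypothetical 2D coordinates (assumed to be already projected).
points = [(2.5, 2.5), (5.5, 5.5)]
hard = hard_project(points, size=8)
soft = soft_splat(points, size=8, sigma=1.0)
print("hard support:", support(hard))
print("soft support:", support(soft))
```

In this toy setting the hard projection touches exactly one pixel per point, while the soft splat yields a dense map, which is the sparse-versus-continuous support distinction the abstract refers to.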