HieraScaffold: Learning Compact Hierarchical Representations for Scalable 4D LiDAR Generation
Abstract
Outdoor LiDAR generation has shown strong potential for autonomous driving and large-scale 3D perception. However, existing approaches remain computationally intensive and largely static, lacking explicit modeling of temporal dynamics. This limitation weakens spatiotemporal coherence and reduces the realism of 4D LiDAR generation. We propose a hierarchical recoupling generation framework that explicitly disentangles and reconstructs large-scale geometry and motion within a unified hierarchical structure. First, we design a multi-resolution feature scaffold that predicts time-correlated unsigned distance fields and their spatial gradients, enabling hierarchical decomposition of 4D dynamics into static and motion-varying components. Next, to achieve compact yet expressive modeling, we introduce a neural contourlet representation that prunes redundant scaffolds into a minimal set of directional bases, efficiently capturing essential geometric and motion cues. Finally, we progressively re-couple these hierarchical components to generate realistic and temporally coherent 4D LiDAR data. Extensive experiments demonstrate that our method outperforms baselines in both quality and consistency, achieving 3.3\%, 25.0\%, and 17.8\% improvements in FRD, MMD, and JSD, respectively, over the strong competitors LiDMs and RangeLDM.
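To make the first component concrete, the following is a minimal toy sketch of querying a time-correlated unsigned distance field together with its spatial gradient, the two quantities the feature scaffold is said to predict. The scene (a unit sphere drifting along the x-axis), all function names, and the finite-difference gradient are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def scene_udf(p, t):
    """Toy time-correlated unsigned distance field: distance to a unit
    sphere whose center drifts along x over time. The drift plays the
    role of the motion-varying component; the sphere is the static one.
    (Hypothetical scene, not the paper's learned field.)"""
    center = np.array([0.2 * t, 0.0, 0.0])
    return abs(np.linalg.norm(p - center) - 1.0)

def udf_and_gradient(p, t, eps=1e-4):
    """Query the field at point p and time t, and estimate the spatial
    gradient by central finite differences (a learned scaffold would
    predict both directly)."""
    d = scene_udf(p, t)
    grad = np.zeros(3)
    for i in range(3):
        dp = np.zeros(3)
        dp[i] = eps
        grad[i] = (scene_udf(p + dp, t) - scene_udf(p - dp, t)) / (2 * eps)
    return d, grad

p = np.array([2.0, 0.0, 0.0])
d0, g0 = udf_and_gradient(p, t=0.0)  # distance 1.0, gradient points along +x
d1, g1 = udf_and_gradient(p, t=1.0)  # sphere has moved toward p: distance 0.8
```

Comparing queries at successive times exposes how the distance at a fixed point changes purely due to motion, which is the kind of signal a hierarchical decomposition into static and motion-varying parts relies on.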