DynaTok: Token-Based 4D Reconstruction from Partial Point Clouds
Abstract
We address the problem of 4D reconstruction from partial point cloud sequences, where depth-sensor observations are incomplete, unordered, and lack explicit point correspondence over time. Recovering coherent 4D geometry in this geometry-only setting is challenging due to missing observations and ambiguous dynamics. While recent progress has largely been driven by image-based methods, existing point-based approaches typically focus on single-object scenarios, assume relatively complete inputs, and rely on explicit correspondence. To address these limitations, we propose DynaTok, a point-based framework for correspondence-free 4D reconstruction from partial point cloud sequences that operates without images. DynaTok encodes each frame into compact latent tokens, aggregates the incomplete observations across the sequence with a Transformer-based spatiotemporal encoder, and decouples geometry from motion via a residual token design within a single unified model. Conditioned on the aggregated tokens, a point flow-matching decoder reconstructs complete and temporally consistent 4D point cloud sequences using only point cloud supervision. Experiments on object-level and scene-level benchmarks demonstrate improved reconstruction quality and temporal coherence under partial observations.
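To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the three stages the abstract names: per-frame tokenization, Transformer-based spatiotemporal aggregation with a residual geometry/motion split, and a conditional flow-matching objective trained with point cloud supervision only. All module names, dimensions, the mean-plus-residual token decomposition, and the linear-interpolation flow-matching loss are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a DynaTok-style pipeline. Every architectural choice
# below (slot cross-attention tokenizer, mean/residual token split, rectified
# flow-matching loss) is an assumption for illustration only.
import torch
import torch.nn as nn


class FrameTokenizer(nn.Module):
    """Encode one partial point cloud frame into K latent tokens by
    cross-attending learned token slots to per-point features."""

    def __init__(self, num_tokens: int = 16, dim: int = 128):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.slots = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:  # pts: (B, N, 3)
        feats = self.point_mlp(pts)                         # (B, N, D)
        q = self.slots.unsqueeze(0).expand(pts.size(0), -1, -1)
        tokens, _ = self.attn(q, feats, feats)              # slots attend to points
        return tokens                                       # (B, K, D)


class DynaTok(nn.Module):
    """Tokenize each frame, aggregate tokens over space and time, split them
    into shared geometry tokens plus per-frame motion residuals, and predict
    a flow-matching velocity conditioned on the aggregated tokens."""

    def __init__(self, num_tokens: int = 16, dim: int = 128):
        super().__init__()
        self.tokenizer = FrameTokenizer(num_tokens, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Velocity net: (noisy point, time, pooled condition) -> 3D velocity.
        self.velocity = nn.Sequential(nn.Linear(3 + 1 + dim, 256), nn.ReLU(), nn.Linear(256, 3))

    def encode(self, frames: torch.Tensor):
        # frames: (B, T, N, 3) partial clouds from the sequence.
        B, T, N, _ = frames.shape
        tok = self.tokenizer(frames.reshape(B * T, N, 3))   # (B*T, K, D)
        K, D = tok.shape[1], tok.shape[2]
        tok = self.encoder(tok.reshape(B, T * K, D))        # joint space-time attention
        tok = tok.reshape(B, T, K, D)
        geometry = tok.mean(dim=1, keepdim=True)            # shared geometry tokens (B,1,K,D)
        motion = tok - geometry                             # per-frame residual tokens (B,T,K,D)
        return geometry, motion

    def flow_matching_loss(self, frames: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # target: (B, T, M, 3) complete clouds -- the only supervision signal.
        geometry, motion = self.encode(frames)
        cond = (geometry + motion).mean(dim=2)              # pooled per-frame condition (B,T,D)
        B, T, M, _ = target.shape
        x1 = target
        x0 = torch.randn_like(x1)                           # noise endpoint of the flow
        t = torch.rand(B, T, 1, 1, device=x1.device)        # random interpolation times
        xt = (1 - t) * x0 + t * x1                          # linear probability path
        v_target = x1 - x0                                  # constant-velocity target
        tt = t.expand(-1, -1, M, -1)
        c = cond.unsqueeze(2).expand(-1, -1, M, -1)
        v_pred = self.velocity(torch.cat([xt, tt, c], dim=-1))
        return ((v_pred - v_target) ** 2).mean()


if __name__ == "__main__":
    model = DynaTok()
    partial = torch.randn(2, 4, 256, 3)   # 2 sequences, 4 frames, 256 partial points
    complete = torch.randn(2, 4, 512, 3)  # ground-truth complete clouds (toy data)
    loss = model.flow_matching_loss(partial, complete)
    loss.backward()
    print(loss.item())
```

At inference, one would integrate the learned velocity field from noise to a complete point cloud per frame (e.g., with a simple Euler solver), reusing the same aggregated tokens so that all frames are decoded from a consistent geometry representation; the solver choice here is likewise an assumption.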