Glimpse: Geometry Learning of Multi-scale Structural Priors for 3D Pose Estimation
Abstract
Monocular 3D human pose estimation is fundamentally challenged by severe occlusion and inherent depth ambiguity. To address this, we propose Glimpse, a framework that estimates robust 3D poses from a single image by explicitly modeling anatomical geometry. We recast the problem as geometry learning of multi-scale structural priors, realized through two synergistic components. First, structured sampling captures the body's geometric continuity through dual-level feature extraction: it acquires local appearance features at each joint and, via deformable sampling, continuous features along the skeletal limbs. By propagating limb-level geometric cues to the joints each limb connects, this design bridges information gaps caused by occlusion. Second, geometric correction enforces global 3D consistency by lifting the coherent 2D features into a canonical 3D reference space, where a shared 3D anchor guides a distance-aware fusion mechanism. Experiments on Human3.6M and MPI-INF-3DHP demonstrate that Glimpse achieves state-of-the-art performance, with superior robustness under severe occlusion and complex articulation.
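To make the two components concrete, the following is a minimal NumPy sketch, not the Glimpse implementation. It assumes that limb-level features can be approximated by bilinearly sampling a feature map at points interpolated between two joints (with optional per-point offsets standing in for learned deformable offsets), and that distance-aware fusion can be approximated by a softmax over negative distances to the shared 3D anchor. The function names `bilinear_sample`, `sample_limb`, and `distance_aware_weights`, and the `temperature` parameter, are hypothetical and do not come from the paper.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Bilinearly sample feat (H, W, C) at continuous (x, y) coords xy (N, 2)."""
    h, w, _ = feat.shape
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def sample_limb(feat, joint_a, joint_b, num_points=8, offsets=None):
    """Average features sampled along the segment joint_a -> joint_b.

    Assumption: points are spaced uniformly along the limb; `offsets`
    (num_points, 2) is a stand-in for learned deformable offsets.
    """
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    pts = (1 - t) * joint_a + t * joint_b  # (num_points, 2)
    if offsets is not None:
        pts = pts + offsets
    return bilinear_sample(feat, pts).mean(axis=0)

def distance_aware_weights(points_3d, anchor, temperature=1.0):
    """Softmax over negative distances to the anchor (assumed fusion rule)."""
    d = np.linalg.norm(points_3d - anchor, axis=-1)
    logits = -d / temperature
    logits = logits - logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()
```

Under these assumptions, a limb feature from `sample_limb` could be added to or concatenated with the features of its two endpoint joints, so that an occluded joint still receives evidence from the visible part of the limb. The weights from `distance_aware_weights` give candidates closer to the anchor more influence in the fused 3D estimate.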