DGG-HMR: Multi-Person Human Mesh Recovery with Depth-Guided Geometric Anchoring
Abstract
Multi-person human mesh recovery (HMR) from a single image is inherently ill-posed: due to depth ambiguity, multiple 3D poses can produce identical 2D projections. Existing methods typically regress 3D translation implicitly from image features, which often leads to unreliable depth estimation. To address this issue, we propose a depth-guided multi-person HMR framework that explicitly models instance-level depth cues and integrates them into mesh recovery. Specifically, we first introduce an instance-aware depth estimator that predicts per-person pelvis depth from the full image, providing reliable instance-level 3D anchors and decoupling depth estimation from mesh regression. Building on these anchors, we design a geometry-anchored refinement decoder that injects instance-specific depth and spatial priors into the decoder initialization, guiding mesh refinement under joint 2D-3D supervision. Finally, we adopt a single-stage training strategy that jointly optimizes depth estimation and mesh recovery within one framework. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art performance in both mesh reconstruction accuracy and depth ordering.
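To make the anchoring idea concrete, the sketch below shows how a predicted per-person pelvis depth can be lifted to a camera-space 3D anchor by back-projecting the detected pelvis pixel through a pinhole camera model. This is a minimal illustration, not the paper's implementation: the function name, the assumption of known intrinsics, and the use of the pelvis pixel location are all hypothetical details added for exposition.

```python
import numpy as np

def pelvis_anchor(u, v, z, fx, fy, cx, cy):
    """Back-project a pelvis pixel (u, v) with predicted depth z into a
    camera-space 3D anchor, assuming a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy).

    Hypothetical helper for illustration; in the described framework the
    instance-aware depth estimator would supply z per person, and the
    resulting anchor would initialize the refinement decoder.
    """
    x = (u - cx) * z / fx  # horizontal offset from principal point, scaled by depth
    y = (v - cy) * z / fy  # vertical offset, scaled by depth
    return np.array([x, y, z])

# A person detected exactly at the principal point sits on the optical axis:
anchor = pelvis_anchor(640.0, 360.0, 3.0, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
# → array([0., 0., 3.])
```

Because z is predicted explicitly rather than folded into the mesh regressor, two people with overlapping 2D detections can still receive distinct 3D anchors, which is what enables correct depth ordering downstream.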