Hyperbolic Hierarchical Alignment for Video-Based Visible-Infrared Person Re-Identification
Abstract
Video-based visible-infrared person re-identification (VVI-ReID) aims to learn robust video-level representations despite the discrepancy between the visible and infrared modalities. However, existing methods typically rely on Euclidean geometry, which is suboptimal for modeling the complex temporal dynamics within visible and infrared tracklets because it inevitably distorts the hierarchical structure underlying diverse temporal variations (e.g., occlusion and pose changes). In this paper, we propose Hyperbolic Hierarchical Alignment (HHA), which unifies spatio-temporal modeling and cross-modality alignment on the Poincaré ball. HHA employs a Hyperbolic Hierarchical Spatio-Temporal Aggregator (HHSA) that organizes time-varying cues into low-distortion hierarchical representations via Hyperbolic Geometry Interaction (HGI) and Dual-Geometry Fusion (DGF). Furthermore, we introduce Geometry-Aware Modality Alignment (GMA), which combines two components: Hyperbolic Modality Alignment (HMA), which couples modality centroids for geometric consistency, and Hyperbolic Prototype Alignment (HPA), which anchors both modalities to shared identity prototypes for robust discrimination. Experiments on HITSZ-VCM and BUPTCampus show that HHA achieves state-of-the-art performance.
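
To make the geometric setting concrete, the following minimal sketch implements two standard Poincaré-ball operations: the exponential map at the origin, which lifts a Euclidean feature onto the ball, and the geodesic distance. It illustrates textbook hyperbolic geometry only and is not the HHA implementation; the function names `expmap0` and `poincare_distance` and the curvature parameter `c` are assumptions introduced here for illustration.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean (tangent) vector at the origin onto the
    Poincare ball of curvature -c (hypothetical helper name)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the
    Poincare ball of curvature -c (hypothetical helper name)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)


# The same Euclidean gap (0.1) costs more hyperbolic distance near the
# boundary than near the origin.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

With `c = 1`, the distance from the origin to `expmap0(v)` equals twice the Euclidean norm of `v`, and distances grow rapidly toward the boundary of the ball. This growth is the property that lets tree-like (hierarchical) structure be embedded with low distortion, which is the general motivation for replacing Euclidean geometry in this setting.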