Interleaved Selective State Space Models for Efficient WiFi-Based 3D Multi-Person Pose Estimation
Abstract
WiFi-based human pose estimation offers privacy-preserving, occlusion-robust sensing, but current Transformer-based approaches scale quadratically with sequence length and lack explicit inductive biases for the structure of Channel State Information (CSI). We propose WiFi-Mamba, the first State Space Model (SSM) architecture for WiFi-based 3D multi-person pose estimation. It makes three contributions: (1) a Dual-Stream Selective State Space Model that processes CSI amplitude and phase in parallel pathways, respecting their distinct physical properties, and couples the two streams through their hidden states; (2) Selective State Attention, which decodes pose queries using sequential context derived from the SSM; and (3) Persistent SSM Memory, which maintains temporal consistency across frames while avoiding the memory blow-up of recurrent approaches. Experiments on the Person-in-WiFi 3D dataset, covering both single-person and multi-person scenes, show a 16–27% reduction in MPJPE across different numbers of people while using only 4.4% of the baseline's parameters (2.14M vs. 48.2M). This efficiency-accuracy trade-off is particularly beneficial for edge deployment in privacy-sensitive continuous-monitoring applications.
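To make the dual-stream idea concrete, the following is a minimal sketch of a diagonal selective scan run over an amplitude sequence and a phase sequence in lockstep. It is not the WiFi-Mamba implementation. The names `StreamParams` and `dual_stream_scan` and the coupling weight `gamma` are hypothetical, and two simplifications are assumptions made for brevity: only the step size Δ_t is input-dependent (Mamba-style selective SSMs also make B and C depend on the input), and the cross-stream coupling is a plain additive term on the other stream's previous state. The sketch also shows why the cost is linear in sequence length: each time step updates a fixed-size state rather than attending over all previous steps.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def softplus(z: float) -> float:
    """Numerically stable softplus; keeps the step size Delta_t positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


@dataclass
class StreamParams:
    """Hypothetical parameters of one diagonal SSM stream (amplitude or phase)."""
    a: List[float]   # diagonal of A; negative entries give a decaying state
    b: List[float]   # input projection B (kept fixed here for brevity)
    c: List[float]   # output projection C (kept fixed here for brevity)
    w_delta: float   # Delta_t = softplus(w_delta * x_t + b_delta) is the selective part
    b_delta: float


def _step(h: List[float], h_other: List[float], x: float,
          p: StreamParams, gamma: float) -> Tuple[List[float], float]:
    # Input-dependent step size: this is what makes the scan "selective".
    delta = softplus(p.w_delta * x + p.b_delta)
    new_h = [
        math.exp(delta * a_i) * h_i   # zero-order-hold discretization of A
        + delta * b_i * x             # simplified (Euler) discretization of B
        + gamma * o_i                 # ASSUMED additive cross-stream coupling
        for a_i, b_i, h_i, o_i in zip(p.a, p.b, h, h_other)
    ]
    y = sum(c_i * h_i for c_i, h_i in zip(p.c, new_h))
    return new_h, y


def dual_stream_scan(amp: List[float], phase: List[float],
                     p_amp: StreamParams, p_phase: StreamParams,
                     gamma: float = 0.1) -> Tuple[List[float], List[float]]:
    """Scan amplitude and phase sequences in lockstep.

    Both streams must use the same state size N for the coupling term.
    Each step touches N state entries per stream, so the total cost is
    O(T * N) in the sequence length T, versus O(T^2) for full self-attention.
    """
    h_amp = [0.0] * len(p_amp.a)
    h_ph = [0.0] * len(p_phase.a)
    y_amp: List[float] = []
    y_ph: List[float] = []
    for x_a, x_p in zip(amp, phase):
        # Both updates read the previous states, so the coupling is symmetric.
        new_amp, ya = _step(h_amp, h_ph, x_a, p_amp, gamma)
        new_ph, yp = _step(h_ph, h_amp, x_p, p_phase, gamma)
        h_amp, h_ph = new_amp, new_ph
        y_amp.append(ya)
        y_ph.append(yp)
    return y_amp, y_ph
```

For example, feeding an impulse `[1.0, 0.0, 0.0]` into the amplitude stream and all zeros into the phase stream leaves the phase outputs at exactly zero when `gamma=0.0`, whereas with `gamma=0.1` the amplitude state leaks into the phase stream from the second step onward. Keeping the two streams as separate scans, rather than concatenating amplitude and phase into one input, is what lets each stream keep its own dynamics (`a`, `b`, `c`, Δ_t) while still exchanging information through the coupling term.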