Bodily Emotion Recognition on ABEE: A Diagnostic Baseline Study for Multi-Label Classification and VAD Regression
Abstract
We present a diagnostic baseline study for predicting multi-label emotions and continuous Valence–Arousal–Dominance (VAD) from bodily expressions in video, using the ABEE dataset from the BEEU Challenge 2025. We evaluate a feature-engineered XGBoost pipeline and a lightweight CADResNet (1.043M parameters), and conduct four controlled experiments: temporal frame ablation (5–24 frames), task-interference analysis, an optical-flow motion baseline, and five-fold cross-validation. VAD regression remains fundamentally challenging across all configurations (R² < 0), with severe output range collapse (ratio of predicted to ground-truth standard deviation < 0.27). Crucially, joint training improves VAD R² by 1.84× over VAD-only training, contradicting the task-interference hypothesis. These findings establish reproducible benchmarks and concrete failure diagnostics for future body-only affective computing research.
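The two regression diagnostics named above, R² and the ratio of predicted to ground-truth standard deviation, can be illustrated with a minimal sketch. It assumes the standard definition of the coefficient of determination; the function names `r2_score` and `std_ratio` and the synthetic arrays are hypothetical and are not drawn from ABEE or from the paper's code. The toy predictions are deliberately biased and compressed, which is one way a model can end up with R² < 0 (worse than predicting the ground-truth mean) together with a collapsed output range.

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    # Negative when the model does worse than a constant mean predictor.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def std_ratio(y_true, y_pred):
    # Spread of predictions relative to spread of ground truth.
    # Values far below 1 indicate output range collapse.
    return y_pred.std() / y_true.std()

# Synthetic ground truth (assumption: not ABEE data).
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 1.0, size=1000)

# Hypothetical collapsed predictor: 10% of the true spread, shifted by 0.3.
pred = gt.mean() + 0.3 + 0.1 * (gt - gt.mean())

r2 = r2_score(gt, pred)
ratio = std_ratio(gt, pred)
print(f"R2 = {r2:.3f}, pred std / gt std = {ratio:.3f}")
```

Reporting both numbers separates two failure modes: a low standard deviation ratio alone indicates predictions clustered in a narrow band, while a negative R² additionally indicates that the band is misplaced or misordered relative to the targets.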