Spherical Procrustes Alignment for Reliable Medical Audio Diagnosis
Abstract
Reliable medical audio diagnosis demands models that are not only accurate but also honest about their uncertainty. However, models fine-tuned on small, imbalanced datasets often become overconfident because of norm bias: they rely on feature magnitude rather than semantic alignment. The Equiangular Tight Frame (ETF), a theoretical optimum among class-separating geometric structures, is effective for class-imbalanced learning and calibration because of its maximal angular separability and geometric fairness. Yet existing ETF-based methods perform poorly on noisy medical data: gradient-based rotation leads to instability, while fixed ETFs fail to adapt to drifting prototypes. To address this, we propose Spherical Procrustes Alignment (SPA), the first method to combine spherical constraints with dynamic ETF alignment for medical audio. SPA has two branches: 1) the Spherical branch, which normalizes features and weights to eliminate norm bias, and 2) the Geometric branch, which adapts features, tracks prototypes, and uses Dynamic Procrustes Alignment to align the fixed ETF with the prototypes, generating stable logits. A self-alignment mechanism then fuses the two branches to jointly optimize the logits. Experiments on the ICBHI 2017 and CirCor DigiScope datasets show that SPA achieves new state-of-the-art results, turning large pre-trained models into reliable and efficient clinical tools without extra inference cost.
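
The three geometric ingredients named above (a simplex ETF, norm-free cosine logits, and an orthogonal Procrustes fit of the ETF to class prototypes) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the paper's implementation: the function names, the `scale` parameter, and the use of a closed-form SVD solution are hypothetical choices, and the abstract does not specify how prototypes are tracked or how the two branches are fused.

```python
import numpy as np


def simplex_etf(num_classes, dim, seed=0):
    """Build a (num_classes, dim) simplex ETF with unit-norm rows.

    Assumes dim >= num_classes so that an orthonormal basis can be drawn.
    Every pair of rows has cosine similarity -1 / (num_classes - 1).
    """
    if dim < num_classes:
        raise ValueError("this sketch requires dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Orthonormal columns, shape (dim, num_classes).
    basis, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    etf = np.sqrt(k / (k - 1)) * basis @ centering  # (dim, k)
    return etf.T


def cosine_logits(features, weights, scale=1.0):
    """Logits from L2-normalized features and weights (no norm bias).

    `scale` is a hypothetical temperature-like factor.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return scale * f @ w.T


def procrustes_rotation(etf, prototypes):
    """Orthogonal R minimizing ||etf @ R - prototypes||_F (closed form)."""
    u, _, vt = np.linalg.svd(etf.T @ prototypes)
    return u @ vt


# Usage: recover a rotation applied to the ETF, then score features.
etf = simplex_etf(num_classes=4, dim=16)
rng = np.random.default_rng(1)
true_rotation, _ = np.linalg.qr(rng.standard_normal((16, 16)))
prototypes = etf @ true_rotation          # stand-in for tracked prototypes
aligned_etf = etf @ procrustes_rotation(etf, prototypes)
logits = cosine_logits(rng.standard_normal((8, 16)), aligned_etf)
```

Because the rotation is obtained in closed form from a single SVD rather than by gradient descent, it cannot diverge, which is one plausible reading of why a Procrustes step would avoid the instability attributed to gradient-based rotation.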