ScoreMix: Synthetic Data Generation by Score Composition in Diffusion Models Improves Face Recognition
Abstract
Synthetic data generation is increasingly used in machine learning for training and data augmentation. Yet many current strategies rely on external foundation models or datasets, which can be restricted by policy or legal constraints, especially for sensitive modalities such as human face images and videos. We propose ScoreMix, a self-contained data augmentation method that boosts recognition performance by leveraging score compositionality in class-conditioned diffusion models. ScoreMix mixes class-conditioned scores along reverse diffusion trajectories, yielding domain-specific hard augmentations without external resources. We systematically study class-selection strategies and find that mixing classes that are distant in the discriminator embedding space yields larger gains, providing up to 3% additional average improvement across benchmarks over proximity-based selection. Interestingly, we observe that the learned condition and embedding spaces are largely uncorrelated under standard alignment metrics, and that condition-space distances are weakly correlated with downstream gains. Across 8 public face recognition benchmarks, ScoreMix improves accuracy by up to 7 percentage points without hyperparameter search, highlighting robustness and practicality. Code and dataset will be made publicly available.
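As a rough illustration of the two ideas named above (mixing class-conditioned scores, and pairing each class with one that is distant in the discriminator embedding space), a minimal sketch follows. The abstract does not state the mixing rule or the distance measure, so the convex combination with a fixed weight and the cosine distance used here are assumptions, and the names `mix_scores`, `cosine_distance`, and `farthest_class` are hypothetical helpers rather than the authors' implementation. Scores and embeddings are stood in for by short lists of floats.

```python
import math


def mix_scores(score_a, score_b, weight=0.5):
    """Blend two class-conditioned score (or noise) predictions.

    Assumption: a convex combination with a fixed weight; the abstract
    does not specify the actual mixing rule.
    """
    if len(score_a) != len(score_b):
        raise ValueError("score vectors must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(score_a, score_b)]


def cosine_distance(u, v):
    """Cosine distance between two embeddings (assumed distance measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def farthest_class(anchor, class_embeddings):
    """Return the class label whose embedding is most distant from the anchor."""
    candidates = [label for label in class_embeddings if label != anchor]
    return max(
        candidates,
        key=lambda label: cosine_distance(
            class_embeddings[anchor], class_embeddings[label]
        ),
    )


# Toy data: three identities with 2-D "discriminator" embeddings.
embeddings = {
    "id_0": [1.0, 0.0],
    "id_1": [0.9, 0.1],
    "id_2": [-1.0, 0.2],
}

# Pick the mixing partner for id_0, then blend two toy score predictions.
partner = farthest_class("id_0", embeddings)
mixed = mix_scores([0.2, -0.4], [0.6, 0.0], weight=0.5)
```

In an actual sampler, a blend of this kind would be evaluated at every reverse-diffusion step using the model's predictions for the two selected classes; the toy vectors here only show the arithmetic.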