Poster Tue, Jul 7, 2026 • 6:30 PM – 8:15 PM PDT HALL A #2404

ScoreMix: Synthetic Data Generation by Score Composition in Diffusion Models Improves Recognition

Parsa Rahimi ⋅ Sébastien Marcel

Abstract

Synthetic data generation is increasingly used in machine learning for training and data augmentation. Yet many current strategies rely on external foundation models or datasets, whose use can be restricted by policy or legal constraints, especially for sensitive modalities such as human face images and videos. We propose ScoreMix, a self-contained data augmentation method that boosts recognition performance by leveraging score compositionality in class-conditioned diffusion models. ScoreMix mixes class-conditioned scores along reverse diffusion trajectories, yielding domain-specific hard augmentations without external resources. We systematically study class-selection strategies and find that mixing classes that are distant in the discriminator embedding space yields larger gains, with up to 3% additional average improvement across benchmarks over proximity-based selection. Interestingly, we observe that the learned condition and embedding spaces are largely uncorrelated under standard alignment metrics, and that condition-space distances are only weakly correlated with downstream gains. Across 8 public face recognition benchmarks, ScoreMix improves accuracy by up to 7 percentage points without hyperparameter search, highlighting its robustness and practicality. Project page: https://parsa-ra.github.io/scoremix/
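
The abstract does not spell out the mixing rule or the distance metric, so the following is only a minimal sketch of the two ideas it describes: choosing a partner class that is far away in an embedding space, and combining two class-conditioned scores at every reverse step. Everything named here is an assumption made for illustration: the convex combination in `mix_scores`, the cosine distance in `farthest_class`, the analytic Gaussian `toy_score` standing in for a trained class-conditioned diffusion model, and the noise-free update in `sample_mixed`.

```python
import numpy as np


def mix_scores(score_a, score_b, weight=0.5):
    """Convex combination of two class-conditioned score estimates (assumed rule)."""
    return weight * score_a + (1.0 - weight) * score_b


def farthest_class(class_embeddings, anchor):
    """Index of the class whose embedding has the largest cosine distance to the anchor.

    class_embeddings: array of shape (num_classes, dim), e.g. per-class mean
    embeddings from a recognition model (hypothetical input).
    """
    normed = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    cos_dist = 1.0 - normed @ normed[anchor]
    cos_dist[anchor] = -np.inf  # never pair a class with itself
    return int(np.argmax(cos_dist))


def toy_score(x, class_mean, sigma):
    """Stand-in for a learned score: the exact score of N(class_mean, sigma^2 I)."""
    return (class_mean - x) / sigma**2


def sample_mixed(mean_a, mean_b, weight=0.5, sigmas=(1.0, 0.5, 0.1),
                 steps_per_level=50, seed=0):
    """Follow the mixed score through decreasing noise levels (noise-free updates)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mean_a.shape)  # start from pure noise
    for sigma in sigmas:
        step = 0.1 * sigma**2  # step size scaled to the current noise level
        for _ in range(steps_per_level):
            score = mix_scores(toy_score(x, mean_a, sigma),
                               toy_score(x, mean_b, sigma), weight)
            x = x + step * score
    return x


# Toy usage: three classes in a 2-D embedding space.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
partner = farthest_class(embeddings, anchor=0)
sample = sample_mixed(embeddings[0], embeddings[partner])
```

In this toy setting both class scores are Gaussian with the same variance, so the mixed trajectory settles at the weighted average of the two class means. A trained diffusion model would not reduce to such a closed form; the sketch only shows where the mixing happens in the sampling loop.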

Lay Summary

We analyze and leverage score composition in conditional diffusion models as an effective, self-contained augmentation method. We apply our method to both open and closed classification tasks.