Test-Time Debiasing with Probabilistic Prompts via Wasserstein Distance in Vision-Language Models
Abstract
Vision-Language Models (VLMs) inherit social biases from large-scale pretraining data, and these biases can be amplified in downstream tasks, leading to systematic performance disparities across sensitive groups. Because of the high training cost and the risk of catastrophic forgetting, recent research has increasingly focused on lightweight \emph{test-time} debiasing, which aims to obtain an ideal fair embedding for each query. However, such point-based corrections are often unstable and become notably weaker in multi-class settings, where group structure cannot be adequately captured by a single point. We therefore propose W4D, a distributional debiasing framework that reframes fairness as aligning query embedding distributions with group reference distributions under the Wasserstein distance, which provides a geometry-aware notion of discrepancy beyond mean shifts. To make this alignment practical at test time, W4D introduces probabilistic prompts that induce controlled distributional perturbations and optimizes a Wasserstein-based objective to reduce cross-group disparity while preserving task-relevant semantics. This distributional perspective improves robustness in multi-class debiasing and yields a stronger fairness--utility trade-off across diverse VLM downstream evaluations. Our code is available at https://anonymous.4open.science/r/W4D/.
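The core idea of aligning a query embedding distribution with a group reference distribution under the Wasserstein distance can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes both distributions are diagonal Gaussians (where the squared 2-Wasserstein distance has a closed form) and uses plain gradient descent on the query mean and variance; the dimension, learning rate, and initial values are arbitrary illustrative choices.

```python
import numpy as np

def w2_diag_gaussian(mu1, var1, mu2, var2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians
    (closed form: mean gap plus gap between standard deviations)."""
    return np.sum((mu1 - mu2) ** 2) + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension (illustrative)

# Hypothetical group reference distribution over prompt embeddings.
mu_ref, var_ref = np.zeros(d), np.ones(d)

# Query distribution induced by a probabilistic prompt (mean + variance).
mu_q = rng.normal(0.5, 0.1, d)
var_q = np.full(d, 1.5)

# Gradient descent on the query distribution's parameters to shrink the
# Wasserstein gap -- a stand-in for the distributional alignment step.
lr = 0.1
for _ in range(100):
    grad_mu = 2 * (mu_q - mu_ref)                    # d/dmu of squared W2
    grad_sd = 2 * (np.sqrt(var_q) - np.sqrt(var_ref))  # d/dsigma of squared W2
    mu_q -= lr * grad_mu
    var_q = (np.sqrt(var_q) - lr * grad_sd) ** 2

print(round(w2_diag_gaussian(mu_q, var_q, mu_ref, var_ref), 6))  # → 0.0
```

In the diagonal-Gaussian case both gradients contract the gap geometrically, so the distance reaches numerical zero; the actual method instead trades this alignment term off against preserving task-relevant semantics.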