DDGA: Dirichlet Distributional Gradient Aggregation for Transferable Vision-Language Adversarial Attacks
Abstract
Vision-Language Models (VLMs) achieve remarkable performance on multimodal tasks but remain highly vulnerable to adversarial examples, making transferable attacks essential for realistic robustness evaluation. Recent Adversarial Evolution Triangle (AET) methods improve transferability by interpolating over a simplex formed by clean and historical adversarial samples, yet they rely on finite random sampling to approximate effective perturbation distributions, an approximation that is unstable under limited sampling budgets. In this paper, we propose Dirichlet Distributional Gradient Aggregation (DDGA), a distribution-aware adversarial attack framework that explicitly models and optimizes perturbations over the AET simplex. DDGA parameterizes the simplex mixing weights with a learnable Dirichlet policy and optimizes the expected adversarial objective via policy gradient, replacing heuristic sampling with principled distributional optimization. Moreover, we exploit the closed-form covariance of the learned distribution to construct orthogonal perturbations that enhance gradient diversity. Extensive experiments on image-text retrieval and image captioning demonstrate that DDGA consistently outperforms state-of-the-art transfer-based attacks across multiple VLM architectures.
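The two core ingredients named in the abstract, a learnable Dirichlet policy over simplex mixing weights trained with a score-function (REINFORCE-style) policy gradient, and the closed-form Dirichlet covariance, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the three simplex vertices, the placeholder `loss`, the finite-difference `digamma`, and the update rule are stand-ins, not the paper's actual objective or VLM pipeline.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AET simplex vertices (clean, historical adversarial,
# current adversarial), flattened into toy feature vectors.
x_clean = rng.normal(size=8)
x_hist = x_clean + 0.1 * rng.normal(size=8)
x_curr = x_clean + 0.2 * rng.normal(size=8)
vertices = np.stack([x_clean, x_hist, x_curr])  # shape (3, 8)

def loss(x):
    # Placeholder adversarial objective (higher = more adversarial);
    # in the paper this would be a VLM-based loss.
    return float(np.sum(x ** 2))

def digamma(x):
    # Finite-difference stand-in for the digamma function,
    # needed by the Dirichlet score function.
    h = 1e-6
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def dirichlet_cov(alpha):
    # Closed-form covariance of Dirichlet(alpha):
    # Cov(w_i, w_j) = -a_i a_j / (a0^2 (a0+1)),
    # Var(w_i) = a_i (a0 - a_i) / (a0^2 (a0+1)).
    a0 = alpha.sum()
    c = -np.outer(alpha, alpha) / (a0 ** 2 * (a0 + 1))
    np.fill_diagonal(c, alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1)))
    return c

# Learnable Dirichlet concentration parameters (assumed parameterization).
alpha = np.ones(3)
lr, n_samples = 0.05, 16

for _ in range(50):
    scores, rewards = [], []
    for _ in range(n_samples):
        w = rng.dirichlet(alpha)       # mixing weights on the simplex
        x_mix = w @ vertices           # interpolated adversarial input
        rewards.append(loss(x_mix))
        # Score function: d/d alpha_i log p(w|alpha)
        #   = digamma(sum alpha) - digamma(alpha_i) + log w_i
        s = digamma(alpha.sum()) - np.array([digamma(a) for a in alpha])
        scores.append(s + np.log(w))
    rewards = np.asarray(rewards)
    baseline = rewards.mean()          # simple variance-reduction baseline
    g = np.mean([(r - baseline) * s for r, s in zip(rewards, scores)], axis=0)
    alpha = np.maximum(alpha + lr * g, 1e-3)  # ascent on expected loss
```

Because the weights sum to one, each row of the Dirichlet covariance sums to zero; the paper's orthogonal-perturbation construction would operate on this covariance, which here is available in closed form without extra sampling.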