Hearing Without Noticing? Attention-Aware Stealthy Black-box Adversarial Audio Attacks
Abstract
Automatic Speech Recognition (ASR) systems, such as those in intelligent assistants, are vulnerable to adversarial examples (AEs). Benign audio clips such as music, when embedded with small perturbations, can trick ASR models into recognizing attacker-specified commands. Prior studies focus on minimizing perturbation magnitude to craft AEs. However, they fail to achieve high attack stealthiness against black-box ASR systems in the physical world. In this paper, we introduce the first music carrier selection algorithm and an attention-aware stealthiness loss function to generate stealthy AEs. Extensive evaluations on five commercial ASR APIs and three widely used voice assistants demonstrate that our method significantly outperforms state-of-the-art techniques in both effectiveness and stealthiness. Notably, in a user study involving 200 participants, 55.6\% of participants perceived our physical adversarial examples as benign audio, an improvement of over 20\% compared to existing methods.