You Don't Protect if You Don't Expect: Breaking the Key Assumption behind CLIP's Test-Time Defenses
Abstract
Recent test-time defenses for CLIP claim to preserve zero-shot clean accuracy while improving adversarial robustness. However, we find that the reported robustness of six recently proposed state-of-the-art methods is substantially overestimated: they fail under basic adaptive attacks. We further observe that these defenses share a common reliance on an indicative measurement that is assumed to capture the distributional difference between clean and adversarial samples and thereby determine whether the defense should preserve or alter the static model's prediction. We argue that this assumption is their fundamental weakness, and we propose CLIP-MAD (Manipulating Assumed Difference), an adaptive attack strategy designed to break it. CLIP-MAD efficiently expands the adversarial distribution without costly full gradient computations and can be flexibly combined with existing attack baselines to further boost attack strength. Experiments across 13 datasets demonstrate that CLIP-MAD produces strong adversarial samples that markedly reduce the robustness of diverse test-time defenses, revealing a false sense of security in CLIP's zero-shot robustness.