

Oral in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning

Introducing Vision into Large Language Models Expands Attack Surfaces and Failure Implications

Keywords: [ Large Language Models ] [ Adversarial Examples ] [ Visual Language Models ] [ Foundation Models ]


Abstract:

Recently, there has been a surge of interest in introducing vision into Large Language Models (LLMs). The proliferation of large Visual Language Models (VLMs), such as Flamingo, BLIP-2, and GPT-4, signifies an exciting convergence of advancements in both visual and language foundation models. Yet, the risks associated with this integrative approach remain largely unexamined. We shed light on the security implications of this trend. First, we underscore that the continuous and redundant nature of the additional visual input space makes it a fertile ground for adversarial attacks. This unavoidably expands the attack surfaces of LLMs, complicating defenses. Specifically, we demonstrate that attackers can craft adversarial visual inputs to circumvent the safety mechanisms of LLMs, inducing biased model behaviors in the language domain. Second, we point out that the broad functionality of LLMs, in turn, also presents visual attackers with a wider array of achievable adversarial objectives, extending the implications of security failures beyond mere misclassification. By revealing these risks, we emphasize the urgent need for thorough risk assessment, robust defense strategies, and responsible deployment practices to ensure the secure and safe use of VLMs.
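
For intuition on the first point, the following is a minimal PGD-style sketch in PyTorch, not the paper's code: the toy model, vocabulary size, and perturbation budget are illustrative assumptions. It shows why a continuous image input is attackable by gradient search, since an attacker can take gradient steps directly on pixels to push a vision-language model toward an attacker-chosen output, whereas discrete text admits no such smooth optimization.

# Illustrative sketch only: a PGD-style perturbation on a continuous image input
# that maximizes the probability a toy stand-in "VLM" assigns to an
# attacker-chosen target token. All components here are hypothetical.
import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    """Stand-in for a real VLM: maps an image to logits over a small vocabulary."""
    def __init__(self, vocab_size: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
        self.head = nn.Linear(64, vocab_size)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))

def pgd_attack(model, image, target_token, eps=8 / 255, alpha=2 / 255, steps=40):
    """Projected gradient descent within an L-infinity ball of radius eps around the image."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        logits = model(adv)
        # Minimize the negative log-likelihood of the attacker's target token.
        loss = nn.functional.cross_entropy(logits, target_token)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() - alpha * grad.sign()            # gradient step toward the target
        adv = image + torch.clamp(adv - image, -eps, eps)   # project back into the eps-ball
        adv = torch.clamp(adv, 0.0, 1.0)                    # keep pixel values valid
    return adv.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyVLM()
    clean = torch.rand(1, 3, 32, 32)   # placeholder image in [0, 1]
    target = torch.tensor([7])         # attacker-chosen target token id
    adv = pgd_attack(model, clean, target)
    print("max perturbation:", (adv - clean).abs().max().item())

The design point the sketch conveys is that the attack needs nothing beyond gradient access to the image pathway; the language-side safety mechanisms are never directly optimized against, yet the induced output can still be an attacker-chosen behavior.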
