Timezone: »
The growing interest in integrating vision into Large Language Models (LLMs), exemplified by Visual Language Models (VLMs) like Flamingo and GPT-4, is steering a convergence of vision and language foundation models. Yet, risks associated with this integration are largely unexamined. This paper sheds light on the security and safety implications of this trend. First, we underscore that the continuous and high-dimensional nature of the additional visual input makes it a weak link against adversarial attacks, representing an expanded attack surface of vision-integrated LLMs. Second, we highlight that the versatility of LLMs also presents visual attackers with a wider array of achievable adversarial objectives, extending the implications of security failures beyond mere misclassification. As an illustration, we present a case study in which we exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision. To our surprise, we discover that a single visual adversarial example can universally jailbreak an aligned model, inducing it to heed a wide range of harmful instructions and generate harmful content far beyond merely imitating the derogatory corpus used to optimize the adversarial example. Our study underscores the escalating adversarial risks associated with the pursuit of multimodality. More broadly, our findings connect the long-studied fundamental adversarial vulnerabilities of neural networks to the nascent field of AI alignment. The presented attack suggests a fundamental adversarial challenge for AI alignment, especially in light of the emerging trend towards multimodality in frontier foundation models.
Author Information
Xiangyu Qi (Princeton University)
Kaixuan Huang (Princeton University)
Ashwinee Panda (Princeton University)
Mengdi Wang (Princeton University)
Prateek Mittal (Princeton University)
Related Events (a corresponding poster, oral, or spotlight)
-
2023 : Introducing Vision into Large Language Models Expands Attack Surfaces and Failure Implications »
Dates n/a. Room
More from the Same Authors
-
2021 : A Short Note on the Relationship of Information Gain and Eluder Dimension »
Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei -
2023 : Teach GPT To Phish »
Ashwinee Panda · Zhengming Zhang · Yaoqing Yang · Prateek Mittal -
2023 : Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker »
Sophie Dai · Wenxin Ding · Arjun Nitin Bhagoji · Daniel Cullina · Ben Zhao · Heather Zheng · Prateek Mittal -
2023 : Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations »
Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang -
2023 : A Privacy-Friendly Approach to Data Valuation »
Jiachen Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal -
2023 : On the Reproducibility of Data Valuation under Learning Stochasticity »
Jiachen Wang · Feiyang Kang · Chiyuan Zhang · Ruoxi Jia · Prateek Mittal -
2023 : Scaling In-Context Demonstrations with Structured Attention »
Tianle Cai · Kaixuan Huang · Jason Lee · Mengdi Wang · Danqi Chen -
2023 : Differentially Private Generation of High Fidelity Samples From Diffusion Models »
Vikash Sehwag · Ashwinee Panda · Ashwini Pokle · Xinyu Tang · Saeed Mahloujifar · Mung Chiang · Zico Kolter · Prateek Mittal -
2023 : Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight »
Jiacheng Guo · Minshuo Chen · Huan Wang · Caiming Xiong · Mengdi Wang · Yu Bai -
2023 : Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL »
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Mengdi Wang -
2023 Poster: Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data »
Minshuo Chen · Kaixuan Huang · Tuo Zhao · Mengdi Wang -
2023 Poster: MultiRobustBench: Benchmarking Robustness Against Multiple Attacks »
Sophie Dai · Saeed Mahloujifar · Chong Xiang · Vikash Sehwag · Pin-Yu Chen · Prateek Mittal -
2023 Poster: Effectively Using Public Data in Privacy Preserving Machine Learning »
Milad Nasresfahani · Saeed Mahloujifar · Xinyu Tang · Prateek Mittal · Amir Houmansadr -
2023 Poster: STEERING : Stein Information Directed Exploration for Model-Based Reinforcement Learning »
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Mengdi Wang · Furong Huang · Dinesh Manocha -
2023 Poster: Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP »
Jiacheng Guo · Zihao Li · Huazheng Wang · Mengdi Wang · Zhuoran Yang · Xuezhou Zhang -
2023 Poster: Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories »
Zixuan Zhang · Minshuo Chen · Mengdi Wang · Wenjing Liao · Tuo Zhao -
2023 Poster: Uncovering Adversarial Risks of Test-Time Adaptation »
Tong Wu · Feiran Jia · Xiangyu Qi · Jiachen Wang · Vikash Sehwag · Saeed Mahloujifar · Prateek Mittal -
2022 : Policy Gradient: Theory for Making Best Use of It »
Mengdi Wang -
2022 : Learner Knowledge Levels in Adversarial Machine Learning »
Sophie Dai · Prateek Mittal -
2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Poster: Optimal Estimation of Policy Gradient via Double Fitted Iteration »
Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang -
2022 Poster: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory »
Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang -
2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Spotlight: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory »
Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang -
2022 Spotlight: Optimal Estimation of Policy Gradient via Double Fitted Iteration »
Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang -
2022 Poster: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2022 Spotlight: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2021 Poster: Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries »
Arjun Nitin Bhagoji · Daniel Cullina · Vikash Sehwag · Prateek Mittal -
2021 Spotlight: Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries »
Arjun Nitin Bhagoji · Daniel Cullina · Vikash Sehwag · Prateek Mittal -
2021 Poster: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient »
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang -
2021 Spotlight: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient »
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang -
2021 Poster: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference »
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang -
2021 Spotlight: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference »
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang -
2020 : QA for invited talk 7 Wang »
Mengdi Wang -
2020 : Invited talk 7 Wang »
Mengdi Wang -
2020 Workshop: Theoretical Foundations of Reinforcement Learning »
Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang -
2020 Poster: Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound »
Lin Yang · Mengdi Wang -
2020 Poster: Model-Based Reinforcement Learning with Value-Targeted Regression »
Alex Ayoub · Zeyu Jia · Csaba Szepesvari · Mengdi Wang · Lin Yang -
2020 Poster: Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation »
Yaqi Duan · Zeyu Jia · Mengdi Wang -
2019 Poster: Analyzing Federated Learning through an Adversarial Lens »
Arjun Nitin Bhagoji · Supriyo Chakraborty · Prateek Mittal · Seraphin Calo -
2019 Poster: Sample-Optimal Parametric Q-Learning Using Linearly Additive Features »
Lin Yang · Mengdi Wang -
2019 Oral: Sample-Optimal Parametric Q-Learning Using Linearly Additive Features »
Lin Yang · Mengdi Wang -
2019 Oral: Analyzing Federated Learning through an Adversarial Lens »
Arjun Nitin Bhagoji · Supriyo Chakraborty · Prateek Mittal · Seraphin Calo -
2018 Poster: Estimation of Markov Chain via Rank-constrained Likelihood »
XUDONG LI · Mengdi Wang · Anru Zhang -
2018 Oral: Estimation of Markov Chain via Rank-constrained Likelihood »
XUDONG LI · Mengdi Wang · Anru Zhang -
2018 Poster: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Oral: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2017 Poster: Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions »
Yichen Chen · Dongdong Ge · Mengdi Wang · Zizhuo Wang · Yinyu Ye · Hao Yin -
2017 Talk: Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions »
Yichen Chen · Dongdong Ge · Mengdi Wang · Zizhuo Wang · Yinyu Ye · Hao Yin