Black Box Adversarial Prompting for Foundation Models
Keywords:
generative models
black-box optimization
large language models
text-to-text
foundation models
adversarial attacks
applications of Bayesian optimization
text-to-image
prompting
Abstract
Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language. However, small changes and design choices in the prompt can lead to significant differences in the output. In this work, we develop a black-box framework for generating adversarial prompts for unstructured image and text generation. These prompts, which can be standalone or prepended to benign prompts, induce specific behaviors in the generative process, such as generating images of a particular object or generating high-perplexity text.
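To make the black-box setting concrete, the sketch below shows a minimal attack loop under the assumptions described in the abstract: the attacker can only query the generative model and score its outputs for the target behavior. The functions `generate` and `score` are hypothetical stand-ins (e.g., the generative model itself and an object-detector confidence or a perplexity measure), and plain random search stands in for the optimizer; the paper's framework casts this search as a black-box (Bayesian) optimization problem rather than using random sampling.

```python
import random
import string

VOCAB = list(string.ascii_lowercase + " ")


def random_prefix(length=8):
    """Sample a candidate adversarial prefix of `length` characters."""
    return "".join(random.choice(VOCAB) for _ in range(length))


def attack(benign_prompt, generate, score, budget=200):
    """Search for a prefix that maximizes a black-box target score.

    `generate(prompt)` and `score(output)` are assumed, query-only
    interfaces: no gradients or model internals are used.
    """
    best_prefix, best_score = "", float("-inf")
    for _ in range(budget):
        prefix = random_prefix()
        output = generate(prefix + " " + benign_prompt)  # query the model
        s = score(output)  # e.g., target-object confidence or perplexity
        if s > best_score:
            best_prefix, best_score = prefix, s
    return best_prefix, best_score
```

In this framing, a "standalone" adversarial prompt corresponds to optimizing the prefix with an empty benign prompt, while the prepended case keeps the benign prompt fixed and searches only over the prefix.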