Black Box Adversarial Prompting for Foundation Models
Natalie Maus · Patrick Chao · Eric Wong · Jacob Gardner
Event URL: https://openreview.net/forum?id=aI5QPjTRbS
Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language. However, small changes and design choices in the prompt can lead to significant differences in the output. In this work, we develop a black-box framework for generating adversarial prompts for unstructured image and text generation. These prompts, which can be standalone or prepended to benign prompts, induce specific behaviors in the generative process, such as generating images of a particular object or generating high-perplexity text.
Author Information
Natalie Maus (University of Pennsylvania)
Patrick Chao (University of Pennsylvania)
Eric Wong (University of Pennsylvania)
Jacob Gardner (University of Pennsylvania)
Related Events (a corresponding poster, oral, or spotlight)
-
2023 : Black Box Adversarial Prompting for Foundation Models »
More from the Same Authors
-
2023 : Interventional and Counterfactual Inference with Diffusion Models »
Patrick Chao · Patrick Bloebaum · Shiva Kasiviswanathan -
2023 Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning »
Sijia Liu · Pin-Yu Chen · Dongxiao Zhu · Eric Wong · Kathrin Grosse · Baharan Mirzasoleiman · Sanmi Koyejo -
2023 Oral: Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference »
Kyurae Kim · Kaiwen Wu · Jisu Oh · Jacob Gardner -
2023 Poster: Do Machine Learning Models Learn Statistical Rules Inferred from Data? »
Aaditya Naik · Yinjun Wu · Mayur Naik · Eric Wong -
2023 Poster: Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference »
Kyurae Kim · Kaiwen Wu · Jisu Oh · Jacob Gardner