ICML Soft prompting might be a bug, not a feature

Poster in Workshop: Challenges in Deployable Generative AI

Soft prompting might be a bug, not a feature

Luke Bailey · Gustaf Ahdritz · Anat Kleiman · Siddharth Swaroop · Finale Doshi-Velez · Weiwei Pan

Keywords: [ Soft prompting ] [ Interpretability ] [ LLMs ] [ prompting ]

Abstract:

Prompt tuning, or "soft prompting," replaces text prompts to generative models with learned embeddings (i.e., vectors) and is used as an alternative to parameter-efficient fine-tuning. Prior work suggests analyzing soft prompts by interpreting them as natural language prompts. However, we find that soft prompts occupy regions in the embedding space that are distinct from those containing natural language, meaning that direct comparisons may be misleading. We argue that because soft prompts are currently uninterpretable, they could potentially be a source of vulnerability of LLMs to malicious manipulations during deployment.
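
To make the comparison in the abstract concrete, the sketch below shows one simple way a soft prompt could be compared against a model's token embeddings: project each learned vector onto its nearest vocabulary embedding and record how far away that neighbor is. This is an illustrative assumption, not the authors' analysis. The function name `nearest_token_distances`, the choice of Euclidean distance, and the random toy arrays standing in for a real embedding table and a real trained soft prompt are all hypothetical.

```python
import numpy as np


def nearest_token_distances(soft_prompt, token_embeddings):
    """Return, for each soft-prompt vector, the index of the nearest
    token embedding and the Euclidean distance to it.

    soft_prompt:      array of shape (prompt_len, dim)
    token_embeddings: array of shape (vocab_size, dim)
    """
    # Pairwise differences via broadcasting: (prompt_len, vocab_size, dim).
    diffs = soft_prompt[:, None, :] - token_embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # (prompt_len, vocab_size)
    nearest = dists.argmin(axis=1)          # closest token per prompt vector
    return nearest, dists[np.arange(len(soft_prompt)), nearest]


# Toy stand-ins (assumptions): a random "vocabulary" and a random "soft prompt".
# A real analysis would load a model's embedding matrix and a trained prompt.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(50, 8))
soft_prompt = rng.normal(loc=5.0, size=(4, 8))

nearest_idx, nearest_dist = nearest_token_distances(soft_prompt, token_embeddings)
print("nearest token ids:", nearest_idx)
print("distance to nearest token:", nearest_dist)
```

If the distances to the nearest tokens are large relative to typical token-to-token distances, reading the soft prompt as the sequence of its nearest tokens discards most of the information in the learned vectors, which is the kind of misleading comparison the abstract cautions against.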
