Soft prompting might be a bug, not a feature
Luke Bailey · Gustaf Ahdritz · Anat Kleiman · Siddharth Swaroop · Finale Doshi-Velez · Weiwei Pan
Abstract
Prompt tuning, or "soft prompting," replaces text prompts to generative models with learned embeddings (i.e. vectors) and is used as an alternative to parameter-efficient fine-tuning. Prior work suggests analyzing soft prompts by interpreting them as natural language prompts. However, we find that soft prompts occupy regions in the embedding space that are distinct from those containing natural language, meaning that direct comparisons may be misleading. We argue that because soft prompts are currently uninterpretable, they could potentially be a source of vulnerability of LLMs to malicious manipulations during deployment.
Video
Chat is not available.
Successful Page Load