
Soft prompting might be a bug, not a feature
Luke Bailey · Gustaf Ahdritz · Anat Kleiman · Siddharth Swaroop · Finale Doshi-Velez · Weiwei Pan
Event URL: https://openreview.net/forum?id=MHWDdMEJ5s

Prompt tuning, or "soft prompting," replaces text prompts to generative models with learned embeddings (i.e., vectors) and is used as an alternative to parameter-efficient fine-tuning. Prior work suggests analyzing soft prompts by interpreting them as natural language prompts. However, we find that soft prompts occupy regions in the embedding space that are distinct from those containing natural language, meaning that direct comparisons may be misleading. We argue that because soft prompts are currently uninterpretable, they could be a source of vulnerability in LLMs to malicious manipulation during deployment.
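The mechanism the abstract describes can be sketched minimally: instead of looking up prompt tokens in the model's frozen embedding matrix, prompt tuning prepends freely learned vectors to the embedded input. The sketch below uses a toy NumPy embedding matrix; all sizes and names (`token_embeddings`, `soft_prompt`, `build_inputs`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model = 100, 16
num_soft_tokens = 4

# Toy stand-in for an LLM's frozen input-embedding matrix.
token_embeddings = rng.normal(size=(vocab_size, d_model))

# Soft prompt: learned vectors that need NOT be rows of the embedding
# matrix -- which is why they can drift outside the natural-language region.
soft_prompt = rng.normal(size=(num_soft_tokens, d_model))

def build_inputs(token_ids):
    """Prepend the learned soft-prompt vectors to the embedded text tokens."""
    embedded = token_embeddings[token_ids]          # (seq_len, d_model)
    return np.concatenate([soft_prompt, embedded])  # (num_soft + seq_len, d_model)

inputs = build_inputs([5, 17, 42])
print(inputs.shape)  # (7, 16)
```

In training, only `soft_prompt` receives gradients while the model stays frozen; because those vectors are unconstrained, interpreting them by, say, nearest-neighbor lookup against `token_embeddings` can be misleading, which is the comparison the abstract cautions against.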

Author Information

Luke Bailey (Harvard)

Undergraduate at Harvard College studying computer science and mathematics. Applying for PhD programs in machine learning next fall.

Gustaf Ahdritz (Harvard University)
Anat Kleiman (Harvard University)
Siddharth Swaroop (Harvard University)
Finale Doshi-Velez (Harvard University)

Finale Doshi-Velez is a Gordon McKay Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability. Selected Additional Shinies: BECA recipient, AFOSR YIP and NSF CAREER recipient; Sloan Fellow; IEEE AI Top 10 to Watch

Weiwei Pan (Harvard University)
