Teach GPT To Phish
Ashwinee Panda · Zhengming Zhang · Yaoqing Yang · Prateek Mittal
Event URL: https://openreview.net/forum?id=tGvWCD9BEP

Quantifying privacy risks in large language models (LLMs) is an important research question. We take a step towards answering this question by defining a real-world threat model wherein an entity seeks to augment an LLM with private data they possess via fine-tuning. The entity also seeks to improve the quality of its LLM outputs over time by learning from human feedback. We propose a novel 'phishing attack', a data extraction attack on this system in which an attacker uses blind data poisoning to induce the model to memorize the association between a given prompt and some 'secret' privately held data. We validate that, across multiple scales of LLMs and data modalities, an attacker can inject prompts into a training dataset that induce the model to memorize a 'secret' that is unknown to the attacker, and then easily extract this memorized secret.

Author Information

Ashwinee Panda (Princeton University)
Zhengming Zhang (Southeast University)
Yaoqing Yang (Dartmouth College)
Prateek Mittal (Princeton University)
