Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
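Since the abstract describes the full mechanism, a minimal sketch may help make it concrete. The snippet below follows the scheme as stated: a hash of the preceding token seeds a pseudorandom partition of the vocabulary into a "green" list, green logits receive a soft bias before sampling, and the detector counts green tokens and converts the count into a z-score with an interpretable p-value. All names (GAMMA, DELTA, green_list, the choice of SHA-256) are illustrative assumptions, not the paper's reference implementation.

```python
import hashlib
import numpy as np

# Illustrative hyperparameters; GAMMA and DELTA are our labels for the
# green-list fraction and the logit bias described in the abstract.
GAMMA = 0.5  # fraction of the vocabulary marked "green" at each step
DELTA = 2.0  # soft boost added to green-token logits before sampling

def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> np.ndarray:
    """Pseudorandomly pick the green tokens, seeded by a hash of the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.permutation(vocab_size)[: int(gamma * vocab_size)]

def watermarked_sample(logits: np.ndarray, prev_token: int, delta: float = DELTA) -> int:
    """Softly promote green tokens by adding delta to their logits, then sample."""
    biased = logits.copy()
    biased[green_list(prev_token, len(logits))] += delta
    probs = np.exp(biased - biased.max())  # stable softmax
    probs /= probs.sum()
    return int(np.random.default_rng().choice(len(probs), p=probs))

def detection_z(tokens: list[int], vocab_size: int, gamma: float = GAMMA) -> float:
    """One-proportion z-test: count tokens that fall in their own green list.

    Under the null hypothesis (unwatermarked text) each token lands on the
    green list with probability gamma, so
        z = (hits - gamma * T) / sqrt(T * gamma * (1 - gamma)),
    and a one-sided p-value is the upper normal tail of z.
    """
    hits = sum(
        tok in set(green_list(prev, vocab_size).tolist())
        for prev, tok in zip(tokens, tokens[1:])
    )
    T = len(tokens) - 1
    return (hits - gamma * T) / (T * gamma * (1 - gamma)) ** 0.5
```

Note that the detector needs only the hash function and gamma, not the model's logits or parameters, which is consistent with the abstract's claim that detection works without access to the language model API.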
Author Information
John Kirchenbauer (University of Maryland, College Park)
Researcher at the Center for Machine Learning (ml.umd.edu, umiacs.umd.edu), advised by Professor Tom Goldstein. My current research focuses on understanding model behavior, including robustness, reliability, and efficiency, in the discrete data domains of natural language and graphs. I am interested in how deep learning models generalize under distribution shift, studied from both empirical and theoretical perspectives through dataset curation, architectural changes, and novel analysis techniques. One overarching goal is the development of principled yet practical definitions of what "in" and "out-of" distribution mean in practice for these data domains.
Jonas Geiping (University of Maryland, College Park)
Yuxin Wen (University of Maryland)
Jonathan Katz (University of Maryland)
Ian Miers (Department of Computer Science, University of Maryland, College Park)
Tom Goldstein (University of Maryland)
Related Events (a corresponding poster, oral, or spotlight)
- 2023 Poster: A Watermark for Large Language Models »
  Thu. Jul 27th, 12:00 -- 01:30 AM, Exhibit Hall 1 #416
More from the Same Authors
- 2022: Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning »
  Yuxin Wen · Jonas Geiping · Liam Fowl · Hossein Souri · Rama Chellappa · Micah Goldblum · Tom Goldstein
- 2022: Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch »
  Hossein Souri · Liam Fowl · Rama Chellappa · Micah Goldblum · Tom Goldstein
- 2022: How much Data is Augmentation Worth? »
  Jonas Geiping · Gowthami Somepalli · Ravid Shwartz-Ziv · Andrew Wilson · Tom Goldstein · Micah Goldblum
- 2023: Cramming: Training a Language Model on a single GPU in one day »
  Jonas Geiping · Tom Goldstein
- 2023: Understanding Data Replication in Diffusion Models »
  Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein
- 2023 Poster: GOAT: A Global Transformer on Large-scale Graphs »
  Kezhi Kong · Jiuhai Chen · John Kirchenbauer · Renkun Ni · C. Bayan Bruss · Tom Goldstein
- 2023 Poster: Cramming: Training a Language Model on a single GPU in one day »
  Jonas Geiping · Tom Goldstein
- 2022 Poster: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations »
  Amin Ghiasi · Hamid Kazemi · Steven Reich · Chen Zhu · Micah Goldblum · Tom Goldstein
- 2022 Spotlight: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations »
  Amin Ghiasi · Hamid Kazemi · Steven Reich · Chen Zhu · Micah Goldblum · Tom Goldstein
- 2022 Poster: Certified Neural Network Watermarks with Randomized Smoothing »
  Arpit Bansal · Ping-yeh Chiang · Michael Curry · Rajiv Jain · Curtis Wigington · Varun Manjunatha · John P Dickerson · Tom Goldstein
- 2022 Spotlight: Certified Neural Network Watermarks with Randomized Smoothing »
  Arpit Bansal · Ping-yeh Chiang · Michael Curry · Rajiv Jain · Curtis Wigington · Varun Manjunatha · John P Dickerson · Tom Goldstein
- 2022 Poster: Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification »
  Yuxin Wen · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein
- 2022 Spotlight: Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification »
  Yuxin Wen · Jonas Geiping · Liam Fowl · Micah Goldblum · Tom Goldstein
- 2021: Paper Presentation 1: Analyzing the Security of Machine Learning for Algorithmic Trading »
  Avi Schwarzschild · Micah Goldblum · Tom Goldstein
- 2021 Workshop: ICML Workshop on Representation Learning for Finance and E-Commerce Applications »
  Senthil Kumar · Sameena Shah · Joan Bruna · Tom Goldstein · Erik Mueller · Oleg Rokhlenko · Hongxia Yang · Jianpeng Xu · Oluwatobi O Olabiyi · Charese Smiley · C. Bayan Bruss · Saurabh H Nagrecha · Svitlana Vyetrenko
- 2021 Poster: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks »
  Avi Schwarzschild · Micah Goldblum · Arjun Gupta · John P Dickerson · Tom Goldstein
- 2021 Poster: Data Augmentation for Meta-Learning »
  Renkun Ni · Micah Goldblum · Amr Sharaf · Kezhi Kong · Tom Goldstein
- 2021 Spotlight: Data Augmentation for Meta-Learning »
  Renkun Ni · Micah Goldblum · Amr Sharaf · Kezhi Kong · Tom Goldstein
- 2021 Spotlight: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks »
  Avi Schwarzschild · Micah Goldblum · Arjun Gupta · John P Dickerson · Tom Goldstein
- 2020 Poster: Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness »
  Aounon Kumar · Alexander Levine · Tom Goldstein · Soheil Feizi
- 2020 Poster: Certified Data Removal from Machine Learning Models »
  Chuan Guo · Tom Goldstein · Awni Hannun · Laurens van der Maaten
- 2020 Poster: Adversarial Attacks on Copyright Detection Systems »
  Parsa Saadatpanah · Ali Shafahi · Tom Goldstein
- 2020 Poster: Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks »
  Micah Goldblum · Steven Reich · Liam Fowl · Renkun Ni · Valeriia Cherepanova · Tom Goldstein
- 2020 Poster: The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent »
  Karthik Abinav Sankararaman · Soham De · Zheng Xu · W. Ronny Huang · Tom Goldstein
- 2019 Poster: Transferable Clean-Label Poisoning Attacks on Deep Neural Nets »
  Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein
- 2019 Oral: Transferable Clean-Label Poisoning Attacks on Deep Neural Nets »
  Chen Zhu · W. Ronny Huang · Hengduo Li · Gavin Taylor · Christoph Studer · Tom Goldstein
- 2018 Poster: Linear Spectral Estimators and an Application to Phase Retrieval »
  Ramina Ghods · Andrew Lan · Tom Goldstein · Christoph Studer
- 2018 Oral: Linear Spectral Estimators and an Application to Phase Retrieval »
  Ramina Ghods · Andrew Lan · Tom Goldstein · Christoph Studer
- 2017 Poster: Adaptive Consensus ADMM for Distributed Optimization »
  Zheng Xu · Gavin Taylor · Hao Li · Mario Figueiredo · Xiaoming Yuan · Tom Goldstein
- 2017 Talk: Adaptive Consensus ADMM for Distributed Optimization »
  Zheng Xu · Gavin Taylor · Hao Li · Mario Figueiredo · Xiaoming Yuan · Tom Goldstein
- 2017 Poster: Convex Phase Retrieval without Lifting via PhaseMax »
  Tom Goldstein · Christoph Studer
- 2017 Talk: Convex Phase Retrieval without Lifting via PhaseMax »
  Tom Goldstein · Christoph Studer