Progress in generative AI depends not only on better model architectures, but also on terabytes of scraped Flickr images, Wikipedia pages, Stack Overflow answers, and websites. These models ingest vast quantities of intellectual property (IP), which they can memorize and regurgitate verbatim. Several recently filed lawsuits relate such memorization to copyright infringement. These lawsuits will lead to policies and legal rulings that define our ability, as ML researchers and practitioners, to acquire training data, and our responsibilities towards data owners and curators.
AI researchers will increasingly operate in a legal environment that is keenly interested in their work — an environment that may require future research into model architectures that conform to legal requirements. Understanding the law and contributing to its development will enable us to create safer, better, and practically useful models.
We’re excited to share a series of tutorials from renowned experts in both ML and law, along with panel discussions where researchers in both disciplines can engage in semi-moderated conversation.
Our workshop will begin to build a comprehensive and precise synthesis of the legal issues at play. Beyond IP, the workshop will also address privacy and liability for dangerous, discriminatory, or misleading and manipulative outputs. It will take place on 29 July 2023, in Ballroom B.
Sat 12:00 p.m. - 12:15 p.m.
|
Welcome and Opening Remarks
(
Opening
)
SlidesLive Video » |
🔗 |
Sat 12:15 p.m. - 12:45 p.m.
|
Invited Talk: Pam Samuelson
(
Invited Talk
)
link »
SlidesLive Video » |
🔗 |
Sat 12:45 p.m. - 1:05 p.m.
|
Invited Talk: Mark Lemley
(
Invited Talk
)
link »
SlidesLive Video » |
🔗 |
Sat 1:05 p.m. - 1:40 p.m.
|
Coffee Break
|
🔗 |
Sat 1:40 p.m. - 2:00 p.m.
|
Invited Talk: Miles Brundage
(
Invited Talk
)
link »
SlidesLive Video » |
🔗 |
Sat 2:00 p.m. - 3:00 p.m.
|
Panel Discussion on Intellectual Property
(
Panel
)
SlidesLive Video » Panel on Intellectual Property: Pam Samuelson, Mark Lemley, Luis Villa, Katherine Lee (Moderated by Jack Balkin and A. Feder Cooper) |
🔗 |
Sat 3:00 p.m. - 4:30 p.m.
|
Lunch Break
|
🔗 |
Sat 4:30 p.m. - 4:45 p.m.
|
Invited Talk: Jack Balkin
(
Invited Talk
)
link »
SlidesLive Video » |
🔗 |
Sat 4:45 p.m. - 5:15 p.m.
|
Spotlight Presentations (5 Papers)
(
Talk
)
SlidesLive Video » The Restatement (Artificial) of Torts; The Data Provenance Initiative; Break It Till You Make It: Limitations of Copyright Liability Under a Pre-training Paradigm of AI Development; Diffusion Art or Digital Forgery? Investigating Data Replication in Stable Diffusion; Measuring the Success of Diffusion Models at Imitating Human Artists |
🔗 |
Sat 5:15 p.m. - 6:00 p.m.
|
In Person Poster Session
(
Poster
)
|
🔗 |
Sat 6:00 p.m. - 6:30 p.m.
|
Coffee Break
|
🔗 |
Sat 6:30 p.m. - 6:45 p.m.
|
Invited Talk: Nicholas Carlini
(
Invited Talk
)
link »
SlidesLive Video » |
🔗 |
Sat 6:45 p.m. - 7:00 p.m.
|
Invited Talk: Gautam Kamath
(
Invited Talk
)
link »
SlidesLive Video » |
🔗 |
Sat 7:00 p.m. - 8:00 p.m.
|
Panel Discussion on Privacy
(
Panel
)
SlidesLive Video » Panel on Privacy: Kristen Vaccaro, Nicholas Carlini, Miles Brundage, and Jack Balkin (Moderated by Katherine Lee and Deep Ganguli) |
🔗 |
-
|
Title: Ignore the Law: The Legal Risks of Prompt Injection Attacks on Large Language Models; Author(s): Ram Shankar Siva Kumar, Jonathon Penney
(
Poster
)
|
🔗 |
-
|
Title: Machine Learning Has A Fixation Problem; Author(s): Katrina Geddes
(
Poster
)
|
🔗 |
-
|
Title: From Algorithmic Destruction to Algorithmic Imprint: Generative AI and Privacy Risks Linked to Potential Traces of Personal Data in Trained Models; Author(s): Lydia Belkadi, Catherine Jasserand
(
Poster
)
|
🔗 |
-
|
Title: Developing Methods for Identifying and Removing Copyrighted Content from Generative AI Models; Author(s): Krishna Sri Ipsit Mantri, Nevasini NA Sasikumar
(
Poster
)
|
🔗 |
-
|
Title: Takeaways from Data Extraction and Unlearning for Law; Author(s): Jaydeep Borkar
(
Poster
)
|
🔗 |
-
|
Title: AI and the EU Digital Markets Act: Addressing the Risks of Bigness and Dominance in Generative AI; Author(s): Andrew Chong, et al.
(
Poster
)
|
🔗 |
-
|
Title: How can we manage the risks and liabilities associated with legal translation in the age of machine translation and generative AI?; Author(s): Argyri Panezi, John O Shea
(
Poster
)
|
🔗 |
-
|
Title: Generative AI and the Future of Financial Advice Regulation; Author(s): Talia Gillis, Sarith Felber, Itamar Caspi
(
Poster
)
|
🔗 |
-
|
Title: Exploring Antitrust and Platform Power in Generative AI; Author(s): Konrad Kollnig, Qian Li
(
Poster
)
|
🔗 |
-
|
Title: PoT: Securely Proving Legitimacy of Training Data and Logic for AI Regulation; Author(s): Hongyang Zhang, Haochen Sun
(
Poster
)
|
🔗 |
-
|
Title: When Synthetic Data Met Regulation; Author(s): Georgi Ganev
(
Poster
)
|
🔗 |
-
|
Title: Provably Confidential Language Modelling; Author(s): Xuandong Zhao, Lei Li, Yu-Xiang Wang
(
Poster
)
|
🔗 |
-
|
Title: The Extractive-Abstractive Axis: Measuring Content ’Borrowing’ in Generative Language Models; Author(s): Nedelina Teneva
(
Poster
)
|
🔗 |
-
|
Title: Anticipating and Mitigating Unsafe and Harmful Outcomes with Generative Language Models: The Role and Limits of Laws; Author(s): Inyoung Cheong, Aylin Caliskan, Tadayoshi Kohno
(
Poster
)
|
🔗 |
-
|
Title: Reclaiming the Digital Commons: A Public Data Trust for Training Data; Author(s): Alan Chan, Herbie Bradley, Nitarshan Rajkumar
(
Poster
)
|
🔗 |
-
|
Title: Chain Of Reference prompting helps LLM to think like a lawyer; Author(s): Nikon Rasumov-Rahe, Aditya Kuppa, Marc Voses
(
Poster
)
|
🔗 |
-
|
Title: Compute and Antitrust: Regulatory implications of the AI hardware supply chain, from chip design to foundation model APIs; Author(s): Haydn Belfield, Shin-Shin Hua
(
Poster
)
|
🔗 |
-
|
Title: Consent-to-train Metadata for a Machine Learning World; Author(s): Daphne E Ippolito, Yun William Yu
(
Poster
)
|
🔗 |
-
|
Title: When is Copying Fair? Exploring the Copyright Implications of Andy Warhol Foundation v. Goldsmith for Generative AI; Author(s): Tiffany Georgievski
(
Poster
)
|
🔗 |
-
|
Title: Licensing Training Data and Attributing Copyright of Derivative Content From Large Language Models Can Resolve Up- and Downstream Copyright Issues; Author(s): Brian L Zhou, Lakshmi Sritan R Motati
(
Poster
)
|
🔗 |
-
|
Title: The Limited Relevance of Fair Use: Legal Implications of Training LLMs on Copyrighted Text; Author(s): Noorjahan Rahman
(
Poster
)
|
🔗 |
-
|
Title: Applying Torts to Juridical Persons: Corporate and AI Governance; Author(s): Aaron Tucker
(
Poster
)
|
🔗 |
-
|
Title: Differential Privacy vs Detecting Copyright Infringement: A Case Study Using Normalizing Flows; Author(s): Saba Amiri, Eric Nalisnick, Adam Belloum, Sander Klous, Leon Gommans
(
Poster
)
|
🔗 |
-
|
Title: Gradient Surgery for One-shot Unlearning on Generative Model; Author(s): Seohui Bae, Seoyoon Kim, Hyemin Jung, Woohyung Lim
(
Poster
)
|
🔗 |
-
|
Title: Protecting Visual Artists from Generative AI: A Multidisciplinary Perspective; Author(s): Eunseo Choi
(
Poster
)
|
🔗 |
-
|
Title: The Restatement (Artificial) of Torts; Author(s): Colin Doyle
(
Poster (Spotlight)
)
|
🔗 |
-
|
Title: The Data Provenance Initiative; Author(s): Shayne Longpre, et al.
(
Poster (Spotlight)
)
|
🔗 |
-
|
Title: Break It Till You Make It: Limitations of Copyright Liability Under a Pre-training Paradigm of AI Development; Author(s): Rui-Jie Yew, Dylan Hadfield-Menell
(
Poster (Spotlight)
)
|
🔗 |
-
|
Title: Diffusion Art or Digital Forgery? Investigating Data Replication in Stable Diffusion; Author(s): Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas A. Geiping, Tom Goldstein
(
Poster (Spotlight)
)
|
🔗 |
-
|
Title: Measuring the Success of Diffusion Models at Imitating Human Artists; Author(s): Stephen Casper, et al.
(
Poster (Spotlight)
)
|
🔗 |
-
|
Talk
(
Recorded Talk
)
|
FatemehSadat Mireshghallah 🔗 |
Author Information
Katherine Lee (Google DeepMind)
A. Feder Cooper (Cornell University)
FatemehSadat Mireshghallah (University of California San Diego)
Madiha Zahrah (Cornell Tech)
Madiha is a Ph.D. student at Cornell Tech. Her research interests are at the intersection of technology, law and privacy. Specifically, she is investigating the ways that communities use technical and legal affordances to express and enact their shared values, with a particular emphasis on privacy and openness.
James Grimmelmann (Cornell)

I’m a professor at Cornell Law School and Cornell Tech, where I direct CTRL-ALT, the Cornell Tech Research Lab in Applied Law and Technology. I study how laws regulating software affect freedom, wealth, and power. I try to help lawyers and technologists understand each other. My research interests include search engines, digital copyright, online governance, content moderation, and other topics in computer and Internet law.
David Mimno (Cornell University)
Deep Ganguli (Anthropic)
Ludwig Schubert

Formerly researched mechanistic interpretability with Chris Olah at Google Brain and OpenAI, and supported clear explanations of machine learning at Distill.pub. Currently unaffiliated and on sabbatical while supporting friends' research.
More from the Same Authors
-
2021 : DP-SGD vs PATE: Which Has Less Disparate Impact on Model Accuracy? »
Archit Uniyal · Rakshit Naidu · Sasikanth Kotti · Patrik Joslin Kenfack · Sahib Singh · FatemehSadat Mireshghallah -
2021 : Benchmarking Differential Privacy and Federated Learning for BERT Models »
Priyam Basu · Rakshit Naidu · Zumrut Muftuoglu · Sahib Singh · FatemehSadat Mireshghallah -
2022 : Memorization in NLP Fine-tuning Methods »
FatemehSadat Mireshghallah · Archit Uniyal · Tianhao Wang · David Evans · Taylor Berg-Kirkpatrick -
2023 : Counterfactual Memorization in Neural Language Models »
Chiyuan Zhang · Daphne Ippolito · Katherine Lee · Matthew Jagielski · Florian Tramer · Nicholas Carlini -
2023 : Probing Heterogeneous Pretraining Datasets with Small Curated Datasets »
Gregory Yauney · Emily Reif · David Mimno -
2023 : Data Similarity is Not Enough to Explain Language Model Performance »
Gregory Yauney · Emily Reif · David Mimno -
2023 : Talk »
FatemehSadat Mireshghallah -
2021 Poster: On-the-fly Rectification for Robust Large-Vocabulary Topic Inference »
Moontae Lee · Sungjun Cho · Kun Dong · David Mimno · David Bindel -
2021 Spotlight: On-the-fly Rectification for Robust Large-Vocabulary Topic Inference »
Moontae Lee · Sungjun Cho · Kun Dong · David Mimno · David Bindel -
2020 Poster: Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks »
Ahmed T. Elthakeb · Prannoy Pilligundla · FatemehSadat Mireshghallah · Alexander Cloninger · Hadi Esmaeilzadeh