

Poster in Workshop: 2nd Workshop on Generative AI and Law (GenLaw ’24)

“Heart on My Sleeve”: From Memorization to Duty

Nathan Reitinger


Abstract: Do machine learning models store protected content, and can they infringe copyright? This early-stage law review Article answers those questions with empirical data: yes. A set of unconditional image generators (diffusion models, n = 14) is trained on small slices of the CelebA dataset (up to 30,000 images from a dataset of celebrities' faces). The output of each generator (a synthetic image) is then compared to the training data using a variety of similarity metrics. As the empirical data shows, the question is not whether models can contain copyrighted works, but whether they do. In some cases, there is a 99% chance that a model will generate an image nearly identical to its training data; in other cases, even after 10,000 generations, a model does not produce any image that may be considered identical (though finding similarity is nonetheless possible). The Article uses this empirical data to argue for a series of duties to be placed on model owners, a necessity, the Article argues, for ensuring the continued progress of the sciences and useful arts.
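
The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity metrics used, so the root-mean-square pixel distance below is only an assumed stand-in, and the function names and the 0.05 threshold are hypothetical choices made for illustration. The sketch finds, for each generated image, its closest training image, then reports the share of generations that fall within the threshold.

```python
import numpy as np


def nearest_training_match(generated, training):
    """Return (index, RMS distance) of the closest training image per generated image.

    Both inputs are float arrays of shape (N, H, W, C) with values in [0, 1].
    RMS pixel distance is an assumed stand-in metric, not the Article's method.
    """
    g = generated.reshape(len(generated), -1).astype(np.float64)
    t = training.reshape(len(training), -1).astype(np.float64)
    # Pairwise squared Euclidean distances: ||g||^2 - 2 g.t + ||t||^2
    d2 = (g ** 2).sum(axis=1)[:, None] - 2.0 * (g @ t.T) + (t ** 2).sum(axis=1)[None, :]
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    idx = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(g)), idx] / g.shape[1])
    return idx, dist


def near_duplicate_rate(dist, threshold=0.05):
    """Fraction of generated images within `threshold` of some training image.

    The threshold value is hypothetical; a real study would calibrate it.
    """
    return float((np.asarray(dist) <= threshold).mean())
```

Under this sketch, a rate near 1.0 would correspond to a model that almost always reproduces training images, while a rate of 0.0 over many generations would correspond to a model that never does so at the chosen threshold.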
