Copyright-Bench: Agentic Evaluation of Copyright Law Compliance
Abstract
Large language model (LLM) agents increasingly perform commercial tasks that involve retrieving external content such as images and, where appropriate, reproducing that content. LLM agents should comply with the law, including copyright law. Yet we currently lack adequate tools to assess whether they do. To that end, we introduce Copyright-Bench, a benchmark designed to evaluate the copyright law compliance of LLM agents. Copyright-Bench comprises realistic commercial tasks---website development, merchandise design, and corporate content production---in which agents must choose between freely licensed content (the use of which is lawful) and copyrighted content (the use of which is unlawful, at least in this setting). Notably, the evaluation introduces prompt variations that simulate varying levels of user intent and time pressure. Comparing state-of-the-art agents against a human baseline, we find that: (1) LLM agents take actions that violate copyright law despite the availability of lawful alternatives; and (2) violation rates increase with stronger user intent and under simulated time pressure.