Proximal Decoding: Provably Reducing Copyright Risk for Any Language Model
Jacqueline He ⋅ Jonathan Hayase ⋅ Scott Yih ⋅ Sewoong Oh ⋅ Luke Zettlemoyer ⋅ Pang Wei Koh
Abstract
Modern language models (LMs) tend to memorize portions of their training data and reproduce verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Proximal Decoding, a plug-and-play inference-time method for suppressing verbatim reproduction: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Proximal Decoding does so by adaptively allocating a user-chosen information budget over the generation trajectory and enforcing per-step constraints that yield a sequence-level guarantee, enabling a tunable risk–utility trade-off. To make Proximal Decoding practically useful, we introduce a new permissively trained safe model (Comma 1.7B), as well as Proximal$_{\mathrm{Byte}}$, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler (Hayase et al., 2025) framework. We evaluate our methods across six model pairs on long-form benchmarks of copyright risk and utility. Proximal and Proximal$_{\mathrm{Byte}}$ define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at modest inference overhead.
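The abstract does not spell out the allocation rule or the form of the per-step constraint, but the core idea can be illustrated with a minimal sketch. The code below assumes the constraint is a cap, in nats, on the KL divergence of each step's sampling distribution from the safe model, with per-step caps drawn from a shared sequence-level budget; the geometric interpolation in logit space, the rollover allocation scheme, and the names `proximal_step` and `step_budget` are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q) in nats; both arguments are full-support softmax outputs here.
    return float(np.sum(p * np.log(p / q)))

def proximal_step(risky_logits, safe_logits, step_budget):
    """Return a sampling distribution that moves as far as possible toward
    the risky model while staying within `step_budget` nats of KL from the
    safe model (an assumed form of the per-step constraint)."""
    q_safe = softmax(safe_logits)
    # Geometric interpolation in logit space: lam = 0 recovers the safe model,
    # lam = 1 the risky model. KL from q_safe grows with lam, so binary-search
    # for the largest feasible interpolation weight.
    lo, hi = 0.0, 1.0
    for _ in range(30):
        lam = 0.5 * (lo + hi)
        p = softmax(lam * risky_logits + (1.0 - lam) * safe_logits)
        if kl(p, q_safe) > step_budget:
            hi = lam
        else:
            lo = lam
    p = softmax(lo * risky_logits + (1.0 - lo) * safe_logits)
    return p, kl(p, q_safe)

# Toy usage: spend a total budget B over T steps. Unspent budget rolls over
# to later steps, which is one simple way a sequence-level budget could be
# "adaptively allocated" over the generation trajectory.
rng = np.random.default_rng(0)
B, T, vocab = 2.0, 8, 50
remaining = B
for t in range(T):
    risky = rng.normal(size=vocab)  # stand-ins for real model logits
    safe = rng.normal(size=vocab)
    p, spent = proximal_step(risky, safe, step_budget=remaining / (T - t))
    remaining -= spent
    next_token = rng.choice(vocab, p=p)
```

Because each step's spend is bounded by its cap and the caps sum to at most B, the cumulative KL from the safe model over the whole sequence is bounded by B, which is the sense in which per-step constraints of this kind compose into a sequence-level guarantee.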