Poster in Workshop: 2nd Workshop on Generative AI and Law (GenLaw ’24)
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi · Jaechan Lee · Yangsibo Huang · Sadhika Malladi · Jieyu Zhao · Ari Holtzman · Daogao Liu · Luke Zettlemoyer · Noah Smith · Chiyuan Zhang
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted material. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning specific data points, i.e., retraining the model from scratch without them, is impractical for today's models. This has led to the development of many approximate unlearning algorithms. Evaluation of these algorithms has traditionally been narrow in scope, failing to assess their effectiveness and practicality from the perspectives of both model deployers and data owners.

To address this, we propose MUSE, a comprehensive machine unlearning evaluation benchmark that outlines six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark eight popular unlearning algorithms on 7B-parameter LMs, specifically unlearning content from Harry Potter books and news articles.

Our results show that while most algorithms can prevent verbatim and knowledge memorization to varying degrees, only one algorithm avoids severe privacy leakage. Moreover, existing algorithms often fall short of deployers' expectations: they degrade the model's general utility and struggle to handle successive unlearning requests or large-scale content removal. Our findings highlight significant practical shortcomings of current unlearning algorithms for language models, and we release our benchmark to encourage further evaluations.
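To make the first criterion concrete, the following is a minimal, illustrative sketch of one way to probe verbatim memorization: prompt the unlearned model with the opening tokens of a forget-set passage and measure ROUGE-L overlap between its greedy continuation and the true continuation. It assumes a Hugging Face causal LM and the rouge_score package; the function name, token counts, and model path are hypothetical and do not reflect the benchmark's actual implementation.

# Illustrative sketch of a verbatim-memorization probe; names and parameters
# are hypothetical, not the MUSE codebase.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from rouge_score import rouge_scorer


def verbatim_memorization_score(model, tokenizer, passage,
                                prompt_tokens=100, continuation_tokens=100):
    """Prompt the model with the start of a passage it was asked to unlearn,
    then score ROUGE-L overlap between its greedy continuation and the true
    continuation. Higher scores suggest stronger verbatim memorization."""
    ids = tokenizer(passage, return_tensors="pt").input_ids[0]
    prompt_ids = ids[:prompt_tokens].unsqueeze(0)
    true_continuation = tokenizer.decode(
        ids[prompt_tokens:prompt_tokens + continuation_tokens],
        skip_special_tokens=True,
    )

    with torch.no_grad():
        generated = model.generate(
            prompt_ids,
            max_new_tokens=continuation_tokens,
            do_sample=False,  # greedy decoding for a deterministic probe
        )
    model_continuation = tokenizer.decode(
        generated[0][prompt_ids.shape[1]:], skip_special_tokens=True
    )

    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(true_continuation, model_continuation)["rougeL"].fmeasure


# Example usage (model path is a placeholder):
# tok = AutoTokenizer.from_pretrained("path/to/unlearned-7b-model")
# lm = AutoModelForCausalLM.from_pretrained("path/to/unlearned-7b-model")
# print(verbatim_memorization_score(lm, tok, forget_set_passage))

Greedy decoding is used here so the probe is deterministic; sampled continuations would need several draws per passage to get a stable estimate.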