Workshop
Machine Unlearning for Generative AI
Vaidehi Patil · Mantas Mazeika · Yang Liu · Katherine Lee · Mohit Bansal · Viet Anh Nguyen
Generative AI models are trained on internet-scale datasets, yielding powerful capabilities but also introducing risks such as copyright infringement, leakage of personally identifiable information (PII), and harmful knowledge. Targeted removal of sensitive data is challenging: retraining from scratch on a curated dataset is computationally prohibitive, which has driven research into machine unlearning and model editing. Yet alignment approaches such as RLHF only suppress undesirable outputs, leaving the underlying knowledge vulnerable to adversarial extraction. This raises urgent privacy, security, and legal concerns, especially under the EU’s GDPR “right to be forgotten”. Because neural networks distribute information across billions of parameters, precise deletion without degrading performance is difficult, and adversarial attacks, including white-box ones, can recover ostensibly erased data. This workshop brings together experts in AI safety, privacy, and policy to advance robust, verifiable unlearning methods, standardized evaluation frameworks, and theoretical foundations. By achieving true erasure, we aim to ensure AI can ethically and legally forget sensitive data while preserving broader utility.
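To make the contrast between full retraining and approximate unlearning concrete, the sketch below shows one common baseline from the unlearning literature: gradient ascent on a “forget” set, balanced against ordinary training on a “retain” set to preserve utility. This is a minimal illustration only, not a method proposed by the workshop; the names model, forget_loader, retain_loader, and the weighting alpha are illustrative assumptions.

```python
# Minimal sketch of ascent-on-forget / descent-on-retain unlearning.
# Assumes a standard PyTorch classifier and two DataLoaders yielding
# (inputs, labels) batches; all names here are hypothetical.
import torch
import torch.nn.functional as F

def unlearn_epoch(model, forget_loader, retain_loader, optimizer, alpha=0.5):
    """Run one epoch of approximate unlearning."""
    model.train()
    for (x_f, y_f), (x_r, y_r) in zip(forget_loader, retain_loader):
        optimizer.zero_grad()
        # Ascend the loss on data to be forgotten (negated gradient)...
        forget_loss = -F.cross_entropy(model(x_f), y_f)
        # ...while descending the loss on retained data to keep utility.
        retain_loss = F.cross_entropy(model(x_r), y_r)
        loss = alpha * forget_loss + (1 - alpha) * retain_loss
        loss.backward()
        optimizer.step()
```

Baselines of this kind are far cheaper than retraining on a curated set, but, as the abstract notes, they may only suppress outputs rather than truly erase knowledge, which is precisely the gap robust, verifiable unlearning methods aim to close.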