Forget to Know, Remember to Use: Context-Aware Unlearning for Large Language Models
Abstract
Large language models can memorize information that must be removed--ranging from copyright-sensitive content (e.g., book chapters) to personally identifiable information (e.g., income)--to ensure responsible and compliant behavior. Unlearning has emerged as an efficient alternative to full retraining, aiming to remove such specific knowledge. However, users may still expect the model to leverage the removed information when it is re-introduced in the prompt. Existing evaluations of unlearning methods focus on (1) the extent of forgetting of the target knowledge (forget set) and (2) performance preservation on the retain set (i.e., utility), but overlook this critical usability dimension. Through a systematic evaluation of six state-of-the-art unlearning methods, we show that they consistently degrade such \emph{contextual utility}--the model's ability to use forgotten knowledge when it is provided in context. To address this, we augment unlearning objectives with a plug-in term that explicitly preserves contextual utility. Extensive experiments demonstrate that our approach restores contextual utility to near-original levels while still maintaining effective forgetting and retain-set utility.
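The plug-in formulation described above can be sketched as a weighted combination of per-term losses. This is a minimal illustrative sketch, not the paper's actual objective: the function name, the signs, and the weights `lam_retain` and `lam_context` are all assumptions, and the gradient-ascent-style negated forget term is just one common choice in the unlearning literature.

```python
def combined_unlearning_loss(forget_loss, retain_loss, context_loss,
                             lam_retain=1.0, lam_context=1.0):
    """Hypothetical total objective for context-aware unlearning.

    forget_loss  : language-model loss on the forget set; entered with a
                   negative sign so minimizing the total drives forgetting
                   (a gradient-ascent-style choice, assumed here)
    retain_loss  : standard LM loss on the retain set (utility preservation)
    context_loss : LM loss on prompts where the forgotten information is
                   re-supplied in context -- the plug-in term that
                   preserves contextual utility
    """
    return (-forget_loss
            + lam_retain * retain_loss
            + lam_context * context_loss)
```

In practice each term would be computed from model logits on the corresponding data split; the sketch only shows how the contextual-utility term slots into an existing forget/retain objective.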