Beyond Sample-Level Forgetting: Improving Reliability in Multimodal Unlearning
Abstract
Multimodal unlearning aims to eliminate specific data from pretrained multimodal models, offering significant advantages for data privacy and model efficiency. Current methods struggle to achieve the desired properties of effectiveness, reliability, and locality due to the complex interdependency of unimodal and multimodal knowledge. By introducing a causal perspective, we propose multimodal unlearning with decoupled knowledge components. To promote fine-grained understanding of multimodal context, we introduce Multimodal Variational Inference (MVI), which infers modality-specific and modality-consistent factors from incomplete sample observations. Building on this decoupled knowledge, we propose contrastive semantic editing to steer multimodal unlearning toward refined forgetting. Experiments on privacy- and copyright-sensitive scenarios validate the effectiveness of our method while ensuring the unlearned model maintains high reliability and locality.