Before It Persists: Write-Time Defense for Multimodal Agent Memory
Abstract
Persistent memory makes multimodal agents more capable, but it also creates a new attack surface: once unsupported content is written into memory, later retrieval and consolidation can reuse it as if it were reliable state. We study write-time defense for multimodal agent memory. Our system, SAGE-Mem, separates transient evidence from durable belief: observations may be stored as evidence, but they are promoted to belief only when they are sufficiently supported, independent, and non-conflicting. This targets a gap left by retrieval-time defenses, which act only after poisoned content has already entered memory. We evaluate on LoCoMo-Adv, an adversarial multimodal extension of LoCoMo-10, and on MM-BrowseComp-Adv, a multimodal browsing benchmark covering answer-overwrite, OCR, vision-caption, and visual-prompt attacks. On LoCoMo-Adv, at a conservative operating point, SAGE-Mem eliminates observed write admission and retrieval contamination relative to a retrieval-time baseline, but reduces benign completion under attack (0.460 vs. 0.642). On the canonical browsing overwrite setting, BrowseGuard, a browsing-specific write policy built on the same principle, blocks all 388 direct and paraphrased overwrite attempts while keeping attacked utility near its clean level (0.155 vs. 0.160). On the broader five-attack browsing suite, extending the same guarded write policy across browser, OCR, and caption channels reduces Write ASR from 0.2552 to 0.0369 and Retrieval ASR from 0.5636 to 0.3694. Overall, the results suggest that for memory-bearing agents, robustness should be evaluated not only at retrieval, but also at the point where observations become persistent state.
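To make the evidence/belief separation concrete, the following is a minimal sketch of a write-time promotion gate. It is an illustration under stated assumptions, not SAGE-Mem's actual implementation: the names `Observation`, `Memory`, and `write`, the per-observation `score`, and the thresholds `min_score` and `min_sources` are all hypothetical. "Supported" is approximated here by a score threshold, "independent" by a count of distinct sources, and "non-conflicting" by the absence of any disagreeing evidence or belief for the same claim.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    claim_key: str  # what the observation is about (hypothetical key)
    value: str      # the asserted value
    source: str     # hypothetical provenance id, e.g. "ocr:page3"
    score: float    # hypothetical support score in [0, 1]


@dataclass
class Memory:
    evidence: list = field(default_factory=list)  # transient evidence
    beliefs: dict = field(default_factory=dict)   # durable beliefs


def write(memory, obs, min_score=0.7, min_sources=2):
    """Always store obs as evidence; promote it to belief only if it is
    supported, independent, and non-conflicting. Returns True on promotion."""
    memory.evidence.append(obs)
    same_claim = [e for e in memory.evidence if e.claim_key == obs.claim_key]
    agreeing = [e for e in same_claim if e.value == obs.value]

    # Supported (assumption): some agreeing observation clears the threshold.
    supported = max(e.score for e in agreeing) >= min_score
    # Independent (assumption): enough distinct sources agree.
    independent = len({e.source for e in agreeing}) >= min_sources
    # Non-conflicting: no disagreeing evidence and no disagreeing belief.
    conflicting = any(e.value != obs.value for e in same_claim)
    existing = memory.beliefs.get(obs.claim_key)
    if existing is not None and existing != obs.value:
        conflicting = True

    if supported and independent and not conflicting:
        memory.beliefs[obs.claim_key] = obs.value
        return True
    return False
```

In this sketch a single observation is never promoted on its own, and an overwrite attempt that contradicts existing evidence is kept as evidence but does not change the belief. The gate is deliberately conservative: once conflicting evidence exists for a claim, further promotion for that claim is blocked until the conflict is resolved by some other mechanism, which is outside the scope of the sketch.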