Security--Fidelity Tradeoffs: No Universal Defense Against Prompt Injection
Abstract
We identify a fundamental tension in securing LLMs: the \textbf{security--fidelity tradeoff}. While defenses against indirect prompt injection are becoming more robust, we show that they inevitably impair the model's ability to process benign, instruction-like text. Current evaluations miss this cost because they conflate general utility with fidelity. We address this gap with \textsc{SecFid}, a benchmark that uses behaviorally separable probes to distinguish unambiguously between resisting an attack, succumbing to it, and faithfully processing benign instruction-like text as data. Our evaluation reveals this tradeoff across a diverse set of models and shows that the strongest defenses often achieve security by aggressively suppressing valid content, producing fidelity failure rates of up to 50\% on translation tasks. We ground these results in a decision-theoretic framework, proving that when the distributions of benign and adversarial inputs overlap, no universal defense exists. Optimal robustness is therefore strictly task-dependent, determined by an application's tolerance for fidelity errors relative to security failures.
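As a minimal sketch of the overlap argument (the notation $P$, $Q$, $D$, and the use of total variation distance here are illustrative shorthand, not the paper's formal development), the impossibility claim follows the standard hypothesis-testing lower bound: if benign inputs are drawn from a distribution $P$ and injected inputs from a distribution $Q$, then any defense $D$ that decides whether to suppress or process an input satisfies
\[
\underbrace{\Pr_{x \sim P}\bigl[D(x)=\textsf{suppress}\bigr]}_{\text{fidelity failure}}
\;+\;
\underbrace{\Pr_{x \sim Q}\bigl[D(x)=\textsf{process}\bigr]}_{\text{security failure}}
\;\ge\; 1 - \mathrm{TV}(P, Q).
\]
Whenever $\mathrm{TV}(P, Q) < 1$, i.e., the two distributions overlap, the sum of the failure rates is bounded away from zero, so no choice of $D$ eliminates both; the best achievable operating point depends on how a given application weighs the two errors.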