Think Twice Before You Act: Protecting LLM Agents Against Tool Description Poisoning via Isolated Planning
Abstract
The integration of external tools has substantially expanded the capabilities of large language model (LLM) agents, but also introduced new attack surfaces beyond prompt injection. In particular, cross-tool description poisoning can manipulate planner-visible tool metadata to steer an agent’s trajectory, even if the poisoned tool itself is never chosen. To understand the effectiveness of existing attacks against this emerging threat, we evaluate several existing agent defenses against prompt-injection and find they transfer poorly to cross-tool description poisoning. Building on this insight, we propose Tool-Guard, a novel defense based on a new concept called isolated planning, in which tool invocations that are detected as misaligned or suspicious cause the corresponding tool to be placed in a quarantined list (the influenced list), breaking further influence from poisoned descriptions. With this influence isolated, the tool can continue to be used to support the task, enabling a robust defense that preserves legitimate tool utility. Experiments on the AgentDojo and ASB benchmarks show that \sysname substantially reduces attack success while maintaining high task utility.