Quantifying Frontier LLM Capabilities for Container Sandbox Escape
Abstract
Large Language Models (LLMs) increasingly act as autonomous agents with tool use, code execution, file I/O, and network access. These capabilities create novel security risks. To mitigate these risks, agents are often deployed and evaluated in isolated environments commonly referred to as sandboxes, with Docker and other OCI-compliant container runtimes among the most popular choices for sandbox implementations. We introduce SandboxEscapeBench (\bench), an open benchmark that safely measures an LLM's capacity to break out of these sandboxes. The benchmark is implemented as an \texttt{Inspect AI} Capture the Flag (CTF) evaluation utilising a nested sandbox architecture in which the outer layer contains the flag and has no known vulnerabilities. Following a threat model of a motivated adversarial agent with shell access inside a container, \bench covers a spectrum of sandbox-escape mechanisms spanning misconfigurations, privilege-allocation mistakes, kernel flaws, and runtime/orchestration weaknesses. We find that when vulnerabilities are introduced, LLMs are able to identify and exploit them, showing that evaluations such as \bench are needed to ensure that sandboxing continues to provide the isolation required for highly capable models.