Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Thu, Jul 9, 2026 • 7:00 PM – 8:00 PM PDT

Instruction Bleed: A Theory-Anchored Benchmark for Cross-Module Interference in Prompt-Composed Agents

Ching-Yu Lin ⋅ Yifan Liu

Abstract

Transformer self-attention computes global pairwise interactions across its input, leaving no architectural isolation between concatenated prompt modules. Three architectural inductive biases (proactive interference, coverage-bounded compositional generalization, and format sensitivity) jointly predict cross-module behavioral interference that cannot be derived from per-module testing, yet no current agent benchmark measures it. We contribute a theory-anchored benchmark protocol whose three perturbation channels (volume, content, form) each isolate one of the predicted mechanisms, with paired effect sizes and bootstrap CIs as the calibrated readout. On a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials), only the content channel produces a detectable effect (Cohen's d = 0.63, bootstrap 95% CI [+0.03, +0.31], excluding zero); the volume and form CIs include zero, which localizes the interference to coverage-bounded composition. We formalize compositional behavioral leakage (CBL) and derive falsifiable predictions that frame the multi-system replication program.
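
The readout described above (a paired effect size plus a bootstrap confidence interval per perturbation channel) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which variant of Cohen's d is used or which statistic the interval is computed on, so the sketch assumes the paired form d_z (mean difference divided by the standard deviation of the differences) and a percentile bootstrap on the mean paired difference. The function names and the toy scores are hypothetical.

```python
import random
import statistics


def paired_differences(baseline, perturbed):
    """Per-trial differences (perturbed - baseline) for matched trials."""
    if len(baseline) != len(perturbed):
        raise ValueError("baseline and perturbed must have the same length")
    return [p - b for b, p in zip(baseline, perturbed)]


def paired_cohens_d(baseline, perturbed):
    """Assumed variant d_z: mean difference / sample SD of the differences."""
    diffs = paired_differences(baseline, perturbed)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; d_z is undefined")
    return statistics.mean(diffs) / sd


def bootstrap_ci_mean_diff(baseline, perturbed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (an assumption)."""
    rng = random.Random(seed)
    diffs = paired_differences(baseline, perturbed)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        statistics.mean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the poster.
    baseline = [1.0, 2.0, 3.0, 4.0]
    perturbed = [2.0, 4.0, 3.0, 6.0]
    d = paired_cohens_d(baseline, perturbed)
    lo, hi = bootstrap_ci_mean_diff(baseline, perturbed)
    print(f"d_z = {d:.2f}, 95% CI for mean difference = [{lo:+.2f}, {hi:+.2f}]")
```

Under this reading, a channel counts as showing a detectable effect when its interval excludes zero, which is the criterion the abstract applies to the content channel.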