Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Thu, Jul 9, 2026 • 7:00 PM – 8:00 PM PDT

AIE-Bench: Benchmarking Agents That Build Agents

Abhishek Mishra ⋅ Selvam Palanimalai ⋅ Yogendra Manawat ⋅ Samuel Verboomen ⋅ Prannay Hebbar ⋅ Damir Vrabac ⋅ Deepak Nathani ⋅ Sumeet Motwani ⋅ Kunal Bhatia ⋅ Vignesh Baskaran


Abstract

We introduce AIE-Bench, a benchmark for measuring how well AI agents can build and improve other AI agents. Existing benchmarks evaluate whether an agent can solve tasks; AIE-Bench instead aims to measure whether an agent can modify another agent so that it performs better on those tasks. The benchmark is built around two roles: a meta-agent, which proposes modifications, and a target-agent, which is the agent being improved. This setup covers both meta-improvement, where one agent improves another, and self-improvement, where a single agent plays both roles and improves itself. We instantiate AIE-Bench across two task families, terminal interaction and tool calling, and evaluate frontier agentic systems on their ability to drive performance gains in a target-agent through iterative modification. AIE-Bench aims to make recursive agent improvement a measurable and reproducible research target.
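The abstract does not specify how modifications are proposed, scored, or accepted, so the following is only a minimal sketch of the two-role loop under loudly stated assumptions: the target-agent is reduced to a single integer "skill", each task is reduced to a difficulty level it solves when skill meets difficulty, and a modification is kept only if it strictly raises the solve rate (greedy acceptance). All names (`AgentConfig`, `evaluate`, `increment_meta_agent`, `propose_self_edit`, `improve`) are hypothetical illustrations, not AIE-Bench's actual API or protocol.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical target-agent: reduced to one tunable 'skill' level."""
    skill: int

    def propose_self_edit(self, score: float) -> "AgentConfig":
        # Hypothetical self-modification policy used for self-improvement:
        # the agent proposes an edit to its own configuration.
        return replace(self, skill=self.skill + 2)


# Toy assumption: a task is just a difficulty level, solved when skill >= difficulty.
Task = int
# A meta-agent maps (current target, current score) to a proposed modified target.
MetaAgent = Callable[[AgentConfig, float], AgentConfig]


def evaluate(agent: AgentConfig, tasks: List[Task]) -> float:
    """Return the fraction of tasks the target-agent solves."""
    if not tasks:
        raise ValueError("task suite must be non-empty")
    solved = sum(1 for difficulty in tasks if agent.skill >= difficulty)
    return solved / len(tasks)


def increment_meta_agent(agent: AgentConfig, score: float) -> AgentConfig:
    """Hypothetical meta-agent: always proposes raising skill by one."""
    return replace(agent, skill=agent.skill + 1)


def improve(
    target: AgentConfig, meta: MetaAgent, tasks: List[Task], rounds: int
) -> Tuple[AgentConfig, List[float]]:
    """Run `rounds` propose-evaluate-accept iterations; return final agent and score history."""
    score = evaluate(target, tasks)
    history = [score]
    for _ in range(rounds):
        candidate = meta(target, score)          # meta-agent proposes a modification
        candidate_score = evaluate(candidate, tasks)
        if candidate_score > score:              # greedy: keep only strict improvements
            target, score = candidate, candidate_score
        history.append(score)
    return target, history


if __name__ == "__main__":
    tasks = [1, 2, 3, 4]  # toy task difficulties
    # Meta-improvement: a separate meta-agent edits the target.
    final, history = improve(AgentConfig(skill=0), increment_meta_agent, tasks, rounds=5)
    print(final, history)  # AgentConfig(skill=4) [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
    # Self-improvement: the target proposes edits to itself.
    final, history = improve(
        AgentConfig(skill=0), lambda agent, s: agent.propose_self_edit(s), tasks, rounds=3
    )
    print(final, history)  # AgentConfig(skill=4) [0.0, 0.5, 1.0, 1.0]
```

The only difference between the two settings in this sketch is who supplies the `meta` callable: a separate function for meta-improvement, or a method on the target itself for self-improvement. Greedy acceptance means a proposal that does not strictly raise the score is discarded, so the recorded history never decreases; a real benchmark might instead log every candidate's score to expose regressions.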