IBM

Expo Talk Panel

Otter: Generating Tests from Issues to Validate SWE Patches

Toufique Ahmed

West Ballroom B
Mon 14 Jul 8 a.m. PDT — 9 a.m. PDT

Abstract:

Recent software engineering (SWE) agents generate code to resolve issues. While this is great for productivity, it makes good tests even more important. Unfortunately, most prior work on test generation assumes that the code under test already exists. We instead consider the case where the code patch that resolves the issue has not yet been written. We introduce Otter, an LLM-based solution for generating tests from issues. Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planning stage. As of March 9, 2025, Otter is the state of the art (SOTA) for this scenario, topping the SWT-Bench Verified leaderboard.
