Expo Talk Panel
Otter: Generating Tests from Issues to Validate SWE Patches
Toufique Ahmed
West Ballroom B
Mon 14 Jul, 8 a.m. to 9 a.m. PDT
Abstract:
Recent SWE agents generate code to resolve issues. While great for productivity, such systems make good tests even more important. Unfortunately, most prior work on test generation assumes that the code under test already exists. Instead, we are looking at the case where the code patch that resolves the issue has not yet been written. We introduce Otter, an LLM-based solution for generating tests from issues. Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planning stage. As of March 9, 2025, Otter is the SOTA for this scenario, topping the SWT-Bench Verified leaderboard.