SIA-W: Self-Improving Agents with Test-Time Weight Updates
Abstract
We introduce SIA-W, a self-improving agent framework that jointly optimises the agent scaffold and the model weights. Scaffold iteration is a powerful first lever: by evolving tools, prompts, and execution harnesses across generations, the agent rapidly builds domain-adapted search and reasoning procedures. Once the scaffold has matured, SIA-W compounds these gains with a second lever, test-time RL, which adapts the model weights directly on task feedback. Across three research tasks spanning law (charge classification), systems (GPU kernel optimisation), and biology (single-cell denoising), combining both levers delivers substantial gains over scaffold iteration alone: 16 percentage points on LawBench, a 19% runtime reduction on GPU kernels, and a 19% improvement on denoising. The weight updates surface domain knowledge that complements what the harness builds.
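The two-lever schedule described in the abstract can be illustrated with a toy sketch. It is not the SIA-W implementation. The scaffold and the weights are stood in for by short lists of floats, the task reward is a hypothetical quadratic, and greedy random search replaces both scaffold evolution and the RL update. All names (`hill_climb`, `toy_task_score`, `run_two_stage`) are assumptions made for illustration. The only property it aims to show is the ordering: the scaffold is improved first with the weights frozen, then the weights are adapted on task feedback with the scaffold frozen.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]


def hill_climb(
    params: Params,
    score: Callable[[Params], float],
    rng: random.Random,
    steps: int,
    sigma: float,
) -> Tuple[Params, float]:
    """Greedy random search: keep a perturbation only if the score rises.

    Stand-in (assumption) for both scaffold evolution and test-time RL.
    """
    best = list(params)
    best_score = score(best)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_task_score(scaffold: Params, weights: Params) -> float:
    """Hypothetical task feedback: higher is better, with a maximum of 0."""
    scaffold_term = -sum((x - 1.0) ** 2 for x in scaffold)
    weight_term = -sum((x - t) ** 2 for x, t in zip(weights, (-2.0, 0.5)))
    return scaffold_term + weight_term


def run_two_stage(seed: int = 0) -> Tuple[float, float, float]:
    rng = random.Random(seed)
    scaffold: Params = [0.0, 0.0]
    weights: Params = [0.0, 0.0]
    initial = toy_task_score(scaffold, weights)

    # Lever 1: iterate on the scaffold while the weights stay frozen.
    scaffold, after_scaffold = hill_climb(
        scaffold, lambda s: toy_task_score(s, weights), rng, steps=200, sigma=0.2
    )

    # Lever 2: adapt the weights on task feedback with the scaffold frozen.
    weights, after_weights = hill_climb(
        weights, lambda w: toy_task_score(scaffold, w), rng, steps=200, sigma=0.2
    )
    return initial, after_scaffold, after_weights


if __name__ == "__main__":
    print(run_two_stage(seed=0))
```

Because each stage only accepts strict improvements and the second stage starts from the scaffold the first stage produced, the three returned scores are non-decreasing by construction. The sketch says nothing about the size of the gains reported above.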