Hybrid-Gym: Training Coding Agents to Generalize Across Tasks
Abstract
Coding agents are increasingly used for a wide range of real-world tasks, from adding features and documentation to creating programs from scratch. Ideally, an agent should perform well across all of these diverse tasks. However, most prior work concentrates on issue solving, and such single-task training does not transfer reliably to other coding tasks. In this work, we aim to train coding agents that generalize across tasks. We first analyze task transferability along two axes: the commonalities shared among coding tasks and the capabilities of current agents. Guided by these findings, we derive a set of principles for designing training tasks and verify them through a series of controlled experiments. We then present Hybrid-Gym, a training dataset built on four scalable training tasks that follow these principles. Experiments show that, under zero-shot task transfer, Hybrid-Gym achieves performance comparable to in-domain training. Combining it with existing datasets yields further gains (e.g., +4.85% on SWT-Bench-Verified).