GenCircuit-RL: Reinforcement Learning from Hierarchical Verification for Genetic Circuit Design
Abstract
Designing genetic circuits, biological systems that execute programmed behaviors within living cells, remains a laborious, expert-driven process despite decades of progress in synthetic biology. We introduce GenCircuit-RL, a reinforcement learning framework that trains language models to reason about genetic circuit design through code generation: models produce Python code that uses PySBOL to construct circuits in the standardized Synthetic Biology Open Language (SBOL) format. Our approach addresses sparse feedback in biological design with hierarchical verification rewards that decompose correctness into five levels, from code execution through structural validity to functional behavior. This decomposition provides a dense learning signal, while multiplicative dependencies between levels prevent reward hacking. We contribute SynBio-Reason, a benchmark of approximately 4,753 circuits spanning six canonical circuit types and nine tasks, from code repair to de novo design, with held-out biological parts that enable rigorous out-of-distribution evaluation. A four-stage curriculum progressively shifts optimization pressure from basic code generation toward functional correctness, so that models acquire compositional reasoning capabilities incrementally. Our results show that hierarchical verification combined with curriculum learning enables compact language models to generate functionally correct genetic circuits, to generalize to novel biological parts, and to rediscover canonical designs from the synthetic biology literature.
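To make the reward structure concrete, the following is a minimal sketch of how a five-level hierarchical reward with multiplicative dependencies and a four-stage curriculum could be combined. It is an illustration under stated assumptions, not the implementation used in this work: the function name hierarchical_reward, the per-level scores in [0, 1], the gating rule (each level's contribution is scaled by the product of all scores up to and including that level), and every weight in CURRICULUM_WEIGHTS are hypothetical. The abstract names only code execution, structural validity, and functional behavior; the remaining levels are left unnamed here.

```python
from typing import Dict, List, Sequence

# Hypothetical per-stage weights over the five verification levels, ordered
# from code execution (index 0) to functional behavior (index 4). The values
# are illustrative assumptions; each row sums to 1, and later stages place
# more weight on the later (functional) levels.
CURRICULUM_WEIGHTS: Dict[int, List[float]] = {
    1: [0.40, 0.30, 0.15, 0.10, 0.05],
    2: [0.25, 0.30, 0.20, 0.15, 0.10],
    3: [0.15, 0.20, 0.25, 0.20, 0.20],
    4: [0.05, 0.10, 0.15, 0.25, 0.45],
}


def hierarchical_reward(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of level scores, each gated by all earlier levels.

    The gate is the running product of scores, so a failure (score 0) at any
    level removes the contribution of every later level. This is one assumed
    way to realize "multiplicative dependencies".
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = 0.0
    gate = 1.0
    for score, weight in zip(scores, weights):
        if not 0.0 <= score <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
        gate *= score           # product of this level and all earlier levels
        total += weight * gate  # gated contribution of this level
    return total


if __name__ == "__main__":
    # Code runs and parses, structure is half right, a later level fails.
    example_scores = [1.0, 1.0, 0.5, 0.0, 1.0]
    for stage, stage_weights in CURRICULUM_WEIGHTS.items():
        print(stage, round(hierarchical_reward(example_scores, stage_weights), 3))
```

Under this gating rule, a circuit that scores well on a late level but fails an earlier one earns nothing for the late level, which is the property that discourages reward hacking. Partial progress on early levels still earns a nonzero reward, which keeps the signal dense.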