RetrOrchestrator: A Multi-Step Retrosynthesis Agent Dynamically Orchestrating Single-Step Transition Models
Abstract
Multi-step retrosynthesis planning is a fundamental challenge in organic chemistry, defined by its enormous search space. Existing methods typically formulate it as a Markov Decision Process (MDP) with a fixed choice of transition model (i.e., a single-step retrosynthesis model), and focus on improving the search itself through better policies and value functions. However, how the transition space itself is navigated remains largely unexplored. This gap is particularly pressing given our observation of pronounced skill disparity among single-step prediction models: different models exhibit substantially different performance across molecule states. Motivated by this observation, we introduce RetrOrchestrator, an LLM-powered agent that explicitly accounts for model skill disparity by reframing retrosynthesis planning as a Partially Observable Markov Decision Process (POMDP). By regarding each single-step prediction model as a tool, we further propose a scaffold-aware reinforcement learning algorithm to optimize the navigation policy within the transition space. As a result, RetrOrchestrator jointly decides which molecule to expand and which single-step model to apply to that molecule at the current step. Empirically, RetrOrchestrator significantly outperforms static baselines on the Retro*-190 benchmark, achieving a state-of-the-art 94.21\% success rate while reaching the Pareto front in both wall-clock time and number of model queries.
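To make the joint decision concrete, the following is a minimal illustrative sketch, not the method proposed here: a greedy planner whose action is a (molecule, model) pair rather than a molecule alone. Everything in it is an assumption made for illustration: the function `plan`, the toy models `model_a` and `model_b`, the placeholder molecule names, and the hand-written `skill` table, which stands in for whatever a learned policy would estimate. The sketch also commits to the first successful expansion and does no backtracking.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A single-step model (hypothetical interface): returns a list of precursors
# for a molecule, or None when it has no proposal for that molecule.
SingleStepModel = Callable[[str], Optional[List[str]]]
Step = Tuple[str, str, List[str]]  # (molecule, model name, precursors)


def plan(
    target: str,
    models: Dict[str, SingleStepModel],
    building_blocks: Set[str],
    score: Callable[[str, str], float],
    max_queries: int = 20,
) -> Tuple[Optional[List[Step]], int]:
    """Greedy search over joint (molecule, model) actions.

    Returns (route, number of model queries); route is None on failure.
    """
    frontier = [] if target in building_blocks else [target]
    tried: Set[Tuple[str, str]] = set()
    route: List[Step] = []
    queries = 0
    while frontier and queries < max_queries:
        # Enumerate every untried (molecule, model) pair on the frontier.
        candidates = [
            (score(mol, name), mol, name)
            for mol in frontier
            for name in sorted(models)
            if (mol, name) not in tried
        ]
        if not candidates:
            return None, queries
        _, mol, name = max(candidates)  # pick the best-scoring joint action
        tried.add((mol, name))
        queries += 1
        precursors = models[name](mol)
        if precursors is None:
            continue  # this model cannot expand this molecule
        frontier.remove(mol)
        route.append((mol, name, precursors))
        for p in precursors:
            if p not in building_blocks and p not in frontier:
                frontier.append(p)
    return (route if not frontier else None), queries


# Toy setup: each model can only handle one of the two molecules.
models: Dict[str, SingleStepModel] = {
    "model_a": lambda m: {"T": ["X", "Y"]}.get(m),
    "model_b": lambda m: {"X": ["b1", "b2"]}.get(m),
}
building_blocks = {"Y", "b1", "b2"}

# Skill-agnostic scoring: every (molecule, model) pair looks the same.
uniform_route, uniform_queries = plan(
    "T", models, building_blocks, score=lambda mol, name: 0.0
)

# Skill-aware scoring: a hand-written table standing in for a learned policy.
skill = {("T", "model_a"): 1.0, ("X", "model_b"): 1.0}
aware_route, aware_queries = plan(
    "T", models, building_blocks, score=lambda mol, name: skill.get((mol, name), 0.0)
)

print(uniform_queries, aware_queries)  # 3 2
```

In this toy case both runs find the same two-step route, but the skill-agnostic run wastes one query on a model that cannot expand the target. This is the kind of cost that choosing the model per molecule is meant to avoid.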