STRIDE: Post-Training LLMs to Reason and Refine Bio-Sequences via Edit Trajectories
Abstract
Discrete biological sequence optimization demands iterative refinement under strict syntactic constraints. Diffusion-based approaches offer strong progressive refinement but are not naturally aligned with discrete, grammar-constrained edit operations, whereas autoregressive LLMs readily produce valid sequences yet often lack explicit long-horizon planning. To close this gap, we introduce STRIDE (Sequence Trajectory Refinement via Internalized Denoising Emulation), a post-training framework that recasts optimization as an intrinsic reasoning problem in edit space. Rather than relying on external agentic search loops, STRIDE trains an LLM to emit a full trajectory of atomic edits as explicit Chain-of-Thought, internalizing a trajectory-based refinement policy under discrete constraints. We instantiate STRIDE with a curriculum that combines supervised fine-tuning on Levenshtein-aligned shortest-edit demonstrations with GRPO-style reinforcement learning (and variants) to align edit trajectories with task rewards. Across protein and molecule optimization benchmarks, STRIDE consistently outperforms a diverse set of baselines, producing candidates with high structural validity and improved target properties.
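To make the notion of a Levenshtein-aligned shortest-edit demonstration concrete, the following is a minimal illustrative sketch (not the paper's implementation; the function and operation names are hypothetical): a Wagner-Fischer dynamic program with backtracking that recovers one minimal sequence of atomic substitute/insert/delete edits transforming a source sequence into a target, the kind of trajectory a supervised fine-tuning stage could train on.

```python
def shortest_edit_trajectory(src: str, tgt: str):
    """Return one minimal list of (op, position, char) edits turning src into tgt."""
    m, n = len(src), len(tgt)
    # Wagner-Fischer DP table: dist[i][j] = edit distance src[:i] -> tgt[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete src[i-1]
                             dist[i][j - 1] + 1,         # insert tgt[j-1]
                             dist[i - 1][j - 1] + cost)  # match / substitute
    # Backtrack from the corner to recover one shortest edit script.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] and src[i - 1] == tgt[j - 1]:
            i, j = i - 1, j - 1  # characters match: no edit needed
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", i - 1, tgt[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("ins", i, tgt[j - 1]))
            j -= 1
        else:
            ops.append(("del", i - 1, src[i - 1]))
            i -= 1
    return list(reversed(ops))  # edits ordered from left to right

# Toy example on short amino-acid strings:
print(shortest_edit_trajectory("MKTV", "MKAV"))  # → [('sub', 2, 'A')]
```

Serializing such a script step by step yields the explicit Chain-of-Thought edit trajectory described above; the RL stage would then reweight trajectories by task reward rather than edit-distance optimality.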