Steering Large Language Models through the DMTA Cycle: Structure-Based Drug Design via Knowledge-Driven Bi-Level Thompson Sampling
Abstract
Structure-based drug design (SBDD) can be realized effectively as iterative refinement through the Design-Make-Test-Analyze (DMTA) cycle, a workflow commonly used by human experts. However, most LLMs function as one-shot generators with no feedback mechanism, which leaves the DMTA loop open. In this work, we propose K-BTS, a Knowledge-Driven Bi-level Thompson Sampling framework that formalizes iterative SBDD as a Dynamic Hierarchical Multi-Armed Bandit problem. K-BTS closes the DMTA loop by decoupling decisions into two levels: an upper-level policy that prioritizes high-potential molecular lineages, and a lower-level mechanism that retrieves explicit chemical rules to guide LLM generation. A dual-level Bayesian update turns sparse docking scores into reusable experience. On the CrossDocked2020 benchmark, K-BTS achieves a state-of-the-art Top-1 average docking score. Analyses along multiple dimensions show that K-BTS converges smoothly and monotonically, with structural drift synchronized with affinity improvement, which makes the search more deterministic.
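To make the two-level decision concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes Beta-Bernoulli posteriors at both levels and a binary reward (1 if the docked candidate improves on its parent, 0 otherwise); the abstract does not specify the posterior family or the reward definition. All names (`BetaArm`, `thompson_pick`, `k_bts_step`, `propose_and_dock`) are hypothetical, and `propose_and_dock` stands in for the LLM generation and docking steps.

```python
import random


class BetaArm:
    """Beta(alpha, beta) posterior over an arm's success probability (assumed model)."""

    def __init__(self):
        # Uniform prior Beta(1, 1).
        self.alpha = 1.0
        self.beta = 1.0

    def sample(self, rng):
        # Draw one plausible success probability from the posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Conjugate Beta-Bernoulli update for a reward in {0.0, 1.0}.
        self.alpha += reward
        self.beta += 1.0 - reward


def thompson_pick(arms, rng):
    """Return the key of the arm whose posterior sample is largest."""
    return max(arms, key=lambda name: arms[name].sample(rng))


def k_bts_step(lineages, rules, propose_and_dock, rng):
    """One DMTA iteration under the assumptions stated above.

    lineages, rules: dicts mapping a name to a BetaArm.
    propose_and_dock: hypothetical callable (lineage, rule) -> 0.0 or 1.0,
        standing in for LLM generation followed by docking.
    """
    lineage = thompson_pick(lineages, rng)      # upper level: which lineage to extend
    rule = thompson_pick(rules, rng)            # lower level: which chemical rule to apply
    reward = propose_and_dock(lineage, rule)    # make + test
    lineages[lineage].update(reward)            # analyze: update both levels
    rules[rule].update(reward)
    return lineage, rule, reward
```

Calling `k_bts_step` repeatedly with a shared `random.Random` instance concentrates pulls on the lineages and rules that have produced improvements so far, while the posterior sampling keeps some exploration. A dynamic variant would add a fresh `BetaArm` to `lineages` whenever a new molecule starts its own lineage; that step is omitted here.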