BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization
Abstract
Lay Summary
The rapid advancement of generative models has enabled the creation of large libraries of de novo molecules, yet assessing which of these are truly drug-like remains an unresolved challenge. Traditional rules and property-based filters offer only coarse approximations, and most learning-based models do not integrate biological context, relying largely on molecular structure. Furthermore, approved drugs are highly scattered across chemical space, which makes it difficult to define a boundary that captures drug-likeness without overgeneralizing. To address this, we propose BounDr.E, a deep one-class boundary learner that defines drug-likeness as a compact, data-driven region around approved drugs, without relying on negative samples. Our method embeds molecules into a unified space that integrates both structural and biomedical knowledge through multi-modal mixup, and iteratively refines the drug-like region in that space via an Expectation-Maximization-like (EM-like) optimization. Empirical results show strong and consistent performance across time-based, scaffold-based, and cross-dataset evaluations, as well as in zero-shot toxic compound filtering. These findings suggest that BounDr.E provides a robust and biologically grounded framework for drug-likeness prediction, offering a scalable way to prioritize AI-generated compounds in early-stage drug discovery.
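The abstract does not specify the exact form of the boundary or of the EM-like updates. As a rough illustration only, the following sketch assumes that molecules are already embedded as fixed vectors (the paper instead learns the embedding) and that the drug-like region is a hypersphere given by a center and a radius, in the spirit of Deep SVDD-style one-class learners. It alternates between labeling which training points fall inside the current boundary (an E-like step) and refitting the center and radius from those points (an M-like step). The names `fit_boundary`, `is_inside`, and `inlier_frac` are hypothetical and are not taken from BounDr.E.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean(points: Sequence[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    if not points:
        raise ValueError("need at least one point")
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantile(values: Sequence[float], q: float) -> float:
    """Smallest element v of `values` such that at least a fraction q of them is <= v."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, math.ceil(q * len(s)) - 1))
    return s[k]


def fit_boundary(
    points: Sequence[Vector],
    inlier_frac: float = 0.9,  # hypothetical: fraction of positives kept inside
    n_iter: int = 20,
    tol: float = 1e-9,
) -> Tuple[List[float], float]:
    """Hypothetical EM-like fit of a hyperspherical one-class boundary.

    Uses positive (e.g. approved-drug) embeddings only; no negatives needed.
    """
    center = mean(points)
    radius = quantile([dist(p, center) for p in points], inlier_frac)
    for _ in range(n_iter):
        # E-like step: with (center, radius) fixed, decide which points lie inside.
        inliers = [p for p in points if dist(p, center) <= radius]
        # M-like step: refit the center on the inliers only, then reset the
        # radius so that roughly `inlier_frac` of all points fall inside it.
        new_center = mean(inliers)
        radius = quantile([dist(p, new_center) for p in points], inlier_frac)
        shift = dist(new_center, center)
        center = new_center
        if shift < tol:  # stop once the center no longer moves
            break
    return center, radius


def is_inside(x: Vector, center: Vector, radius: float) -> bool:
    """Score a candidate molecule embedding as drug-like if it lies in the sphere."""
    return dist(x, center) <= radius


if __name__ == "__main__":
    # Toy "approved drug" embeddings: a tight cluster plus one scattered point.
    drugs = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
             [0.5, 0.5], [-0.5, -0.5], [0.5, -0.5], [-0.5, 0.5], [10, 10]]
    center, radius = fit_boundary(drugs, inlier_frac=0.9)
    print(center, radius)
    print(is_inside([0.2, 0.2], center, radius))
    print(is_inside([10, 10], center, radius))
```

In this toy example the first center is pulled toward the scattered point at (10, 10); after one refit on the inliers it settles on the dense cluster and the radius shrinks accordingly, which is the kind of overgeneralization that iterative refinement is meant to curb. A real implementation would also update the encoder, which this sketch deliberately leaves out.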