Deep learning virtual screening with active signature learning improves the identification of small-molecule modulators of complex phenotypes
Abstract
Phenotypic drug discovery holds promise for developing new medicines but is limited by throughput and scalability. Current applications of AI to improve screening efficiency rely on single-use models, each trained on a phenotype-specific high-throughput screen. We introduce a generalizable deep learning framework that leverages omics data to prioritize compounds for virtually any phenotype using a single model. We also developed a novel closed-loop active signature learning procedure to optimize the omics signature associated with a target phenotype. We trained our model on over 425,000 perturbation signatures and validated it using a new single-cell transcriptomics benchmark dataset profiling 88 perturbations across 10 cell lines. Our approach outperformed published methods by 15-80% and led to a 16- to 19-fold increase in productivity in two hematology phenotypic discovery campaigns, providing the first experimental validation that deep learning and omics data can improve the productivity of phenotypic discovery in a real-world setting. We then demonstrated that our active signature learning algorithm can refine hit compound prioritization and yield mechanistic insights through an integrative lab-in-the-loop framework. This approach enables rational drug design targeting complex phenotypes, ushering in a new era of drug discovery.