Poster in Workshop: AI for Science: Scaling in AI for Scientific Discovery
Ensemble Guidance: Towards Generative 3D SBDD in Bioactive Chemical Spaces
Charles Harris · Arian Jamasb · Pietro Liò · Tom Blundell
Keywords: [ Diffusion Models ] [ molecule design ] [ Generative modelling ]
Many works use diffusion generative modelling for 3D Structure-based Drug Design (SBDD). However, one critical issue that has so far gone unaddressed is the data these models are trained on. These datasets are predominantly sourced from the Protein Data Bank (PDB) and capture a severely constrained and skewed subset of chemical space. This heavily biases generated molecules towards being non-drug-like and significantly narrows the diversity of the chemical landscapes that generative models observe during training. While there is some evidence that these methods can generate complementary molecules, this raises concerns about their efficacy in novel hit discovery compared with virtual screening of large molecule libraries. Here, we introduce ensemble guidance, a technique for composing learned distributions from multiple diffusion models to guide SBDD models to generate molecules with more appropriate properties and higher diversity. For example, ensemble guidance reduces the frequency of highly polar phosphate groups from 0.32 per molecule to 0. Finally, we propose many areas of future work and hope that ensemble guidance can be fruitfully applied to a number of other (bio)molecular design tasks in data-limited regimes.
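The abstract does not say how the learned distributions are composed. One common way to compose diffusion models is to sum their (weighted) score estimates, which corresponds to sampling from a tempered product of the component densities. This sketch assumes that approach, and it is not necessarily the authors' formulation. It uses toy 1-D Gaussian "models" in place of a trained SBDD network and a bioactive-chemistry model. It also uses unadjusted Langevin dynamics in place of a full reverse-diffusion sampler. All function names (`gaussian_score`, `ensemble_score`, `langevin_sample`) are hypothetical.

```python
import numpy as np


def gaussian_score(mu, sigma):
    """Toy stand-in for a trained model: the score of N(mu, sigma^2)."""
    def score(x, t):
        # t is ignored: this toy "model" does not depend on the noise level.
        return -(x - mu) / sigma**2
    return score


def ensemble_score(score_fns, weights):
    """Compose score estimates as a weighted sum (product-of-experts style).

    A weighted sum of scores is the score of prod_i p_i(x) ** w_i
    (up to normalisation), so sampling with it targets that product.
    """
    def score(x, t):
        return sum(w * fn(x, t) for fn, w in zip(score_fns, weights))
    return score


def langevin_sample(score_fn, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics driven by the (composed) score."""
    x = x0.copy()
    for k in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # k is passed as the time index; the toy scores above ignore it.
        x = x + step * score_fn(x, k) + np.sqrt(2.0 * step) * noise
    return x


rng = np.random.default_rng(0)
sbdd_model = gaussian_score(0.0, 1.0)   # hypothetical pocket-conditioned model
prior_model = gaussian_score(2.0, 1.0)  # hypothetical bioactive-chemistry model
composed = ensemble_score([sbdd_model, prior_model], [1.0, 1.0])

samples = langevin_sample(composed, rng.standard_normal(20000), 0.01, 2000, rng)
print(samples.mean(), samples.var())  # roughly 1.0 and 0.5
```

With equal weights, the product of N(0, 1) and N(2, 1) is N(1, 0.5), so samples concentrate between the two components. Increasing one model's weight pulls samples towards that model and also sharpens the composed distribution.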