Spotlight in Workshop: AI for Science: Scaling in AI for Scientific Discovery
NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries
Ewa M. Nowara · Pedro O. Pinheiro · Sai Pooja Mahajan · Omar Mahmood · Andrew Watkins · Saeed Saremi · Michael Maser
Keywords: [ Generative Models ] [ Drug discovery ] [ Voxel Structures ] [ Machine Learning ] [ molecules ] [ 3D Generation ]
We present NEBULA, the first latent 3D generative model for scalable generation of large molecular libraries around a seed compound of interest. Such libraries are crucial for scientific discovery, but it remains challenging to generate large numbers of high-quality samples efficiently. 3D-voxel-based methods have recently shown great promise for generating high-quality samples de novo from random noise (Pinheiro et al., 2023). However, sampling in 3D-voxel space is computationally expensive, which makes it prohibitively slow for library generation. Here, we instead perform neural empirical Bayes sampling (Saremi and Hyvärinen, 2019) in the learned latent space of a vector-quantized variational autoencoder. NEBULA generates large molecular libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality. Moreover, NEBULA generalizes better to unseen drug-like molecules, as demonstrated on two public datasets and multiple recently released drugs. We expect this approach to be highly enabling for machine learning-based drug discovery. Code will be publicly released upon acceptance.
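To make the sampling idea concrete, the sketch below illustrates the general walk-jump scheme associated with neural empirical Bayes: noise a seed latent, run Langevin steps on the noise-smoothed density ("walk"), then denoise with Tweedie's formula ("jump"). It is not the NEBULA implementation. It assumes a toy Gaussian latent distribution with an analytic score in place of a learned denoiser, uses plain overdamped Langevin updates, and omits the autoencoder entirely; all function and parameter names are hypothetical.

```python
import numpy as np

def smoothed_score(y, mu, s, sigma):
    """Score of the sigma-smoothed toy density N(mu, (s^2 + sigma^2) I).

    Assumption: latents follow N(mu, s^2 I); a real model would use a
    learned denoiser here instead of this closed form.
    """
    return -(y - mu) / (s**2 + sigma**2)

def jump(y, mu, s, sigma):
    """Tweedie's formula: estimate the clean latent as y + sigma^2 * score(y)."""
    return y + sigma**2 * smoothed_score(y, mu, s, sigma)

def walk_jump(seed_latent, mu, s, sigma, n_steps, step_size, rng):
    """Draw one sample near a seed latent (toy setting, hypothetical names)."""
    # Start the chain from the seed latent with Gaussian noise of scale sigma.
    y = seed_latent + sigma * rng.standard_normal(seed_latent.shape)
    # Walk: unadjusted overdamped Langevin steps on the smoothed density.
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step_size * smoothed_score(y, mu, s, sigma) \
              + np.sqrt(2.0 * step_size) * noise
    # Jump: a single denoising step back toward the clean latent space.
    return jump(y, mu, s, sigma)

# Usage: build a small "library" of latents around one seed.
rng = np.random.default_rng(0)
seed = np.ones(8)            # stand-in for an encoded seed molecule
library = np.stack([
    walk_jump(seed, mu=np.zeros(8), s=1.0, sigma=0.5,
              n_steps=5, step_size=0.01, rng=rng)
    for _ in range(16)
])
```

In this sketch, fewer walk steps keep samples closer to the seed, while more steps let the chain drift further from it. In a full pipeline each returned latent would be passed through a decoder to obtain a molecule, a step left out here.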