
Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson

Wed Jul 20 10:25 AM -- 10:30 AM (PDT) @ Hall F

Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
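The abstract refers to optimizing sequences "at many different points on the Pareto frontier." That notion rests on Pareto dominance. The sketch below is a minimal, hedged illustration and not LaMBO's implementation. It identifies the non-dominated candidates in a set of objective vectors, assuming every objective is maximized. The function names and the example scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` under maximization:
    `a` is at least as good on every objective and strictly better on one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of non-dominated points using an O(n^2) pairwise check."""
    front = []
    for i, p in enumerate(points):
        # Keep p only if no other point dominates it.
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(i)
    return front


if __name__ == "__main__":
    # Hypothetical scores for five candidate sequences on two objectives.
    scores = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9), (0.6, 0.3)]
    print(pareto_front(scores))  # prints [0, 1, 3, 4]
```

In this example, only candidate 2, with scores (0.4, 0.4), is dropped, because candidate 1 is better on both objectives. The quadratic check is adequate for small candidate pools. Multi-objective acquisition functions, such as those the abstract describes, are typically defined relative to a frontier like this one.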

Author Information

Samuel Stanton (Prescient Design, Genentech)
Wesley Maddox (New York University)
Nate Gruver (New York University)
Phillip Maffettone (BigHat Biosciences)
Emily Delaney (BigHat Biosciences)
Peyton Greenside (BigHat Biosciences)
Andrew Wilson (New York University)

Andrew Gordon Wilson is a faculty member in the Courant Institute and the Center for Data Science at NYU. His interests include probabilistic modelling, Gaussian processes, Bayesian statistics, physics-inspired machine learning, and loss surfaces and generalization in deep learning. His webpage is https://cims.nyu.edu/~andrewgw.
