Bayesian optimization, which uses a probabilistic surrogate for an expensive black-box function, provides a framework for protein design that requires only a small amount of labeled data. In this paper, we compare three approaches to constructing surrogate models for protein design on synthetic benchmarks. We find that neural network ensembles trained directly on primary sequences outperform string kernel Gaussian processes and models built on pretrained embeddings. We show that this superior performance is likely due to improved robustness on out-of-distribution data. Transferring these insights into practice, we apply our approach to optimizing the Stokes shift of green fluorescent protein, discovering and synthesizing novel variants with improved functional properties.
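The sketch below illustrates the general recipe the abstract describes: a Bayesian optimization loop in which a deep ensemble trained on one-hot-encoded primary sequences serves as the probabilistic surrogate, and an upper-confidence-bound acquisition selects the next sequence to label. It is a minimal illustration, not the paper's implementation; the toy fitness oracle, sequence length, candidate pool, and all hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch: Bayesian optimization with a deep-ensemble surrogate over
# protein sequences. The fitness oracle, sequence length, and hyperparameters
# are illustrative placeholders, not the setup used in the paper.
import torch
import torch.nn as nn

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
SEQ_LEN = 8                        # toy sequence length (assumption)

def one_hot(seqs):
    """Encode a list of sequences as flat one-hot tensors."""
    idx = {a: i for i, a in enumerate(ALPHABET)}
    x = torch.zeros(len(seqs), SEQ_LEN, len(ALPHABET))
    for n, s in enumerate(seqs):
        for p, a in enumerate(s):
            x[n, p, idx[a]] = 1.0
    return x.reshape(len(seqs), -1)

def toy_fitness(seq):
    """Stand-in for the expensive black-box assay (hypothetical)."""
    return sum(ALPHABET.index(a) for a in seq) / (SEQ_LEN * len(ALPHABET))

def make_net():
    return nn.Sequential(nn.Linear(SEQ_LEN * len(ALPHABET), 64),
                         nn.ReLU(), nn.Linear(64, 1))

def fit_ensemble(x, y, n_members=5, epochs=200):
    """Train independently initialized regressors on the labeled data."""
    members = []
    for _ in range(n_members):
        net = make_net()
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(net(x).squeeze(-1), y)
            loss.backward()
            opt.step()
        members.append(net)
    return members

def ensemble_mean_std(members, x):
    """Predictive mean and disagreement across ensemble members."""
    with torch.no_grad():
        preds = torch.stack([m(x).squeeze(-1) for m in members])
    return preds.mean(0), preds.std(0)

# Random candidate pool and a small labeled seed set.
rng = torch.Generator().manual_seed(0)
pool = ["".join(ALPHABET[i] for i in torch.randint(20, (SEQ_LEN,), generator=rng))
        for _ in range(500)]
labeled = pool[:16]
labels = torch.tensor([toy_fitness(s) for s in labeled])

# BO loop: fit the surrogate, pick the highest-UCB candidate, query the oracle.
for round_ in range(5):
    ensemble = fit_ensemble(one_hot(labeled), labels)
    candidates = [s for s in pool if s not in labeled]
    mu, sigma = ensemble_mean_std(ensemble, one_hot(candidates))
    best = candidates[int(torch.argmax(mu + 2.0 * sigma))]
    labeled.append(best)
    labels = torch.cat([labels, torch.tensor([toy_fitness(best)])])
    print(f"round {round_}: queried {best}, best label so far {labels.max():.3f}")
```

The ensemble standard deviation plays the role of the surrogate's predictive uncertainty; swapping in a string kernel Gaussian process or a regressor on pretrained embeddings would only change the surrogate-fitting step, which is the comparison the paper makes.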
Author Information
Nate Gruver (New York University)
More from the Same Authors
- 2022: On Feature Learning in the Presence of Spurious Correlations
  Pavel Izmailov · Polina Kirichenko · Nate Gruver · Andrew Wilson
- 2023: Protein Design with Guided Discrete Diffusion
  Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson
- 2022 Spotlight: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
  Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson
- 2022 Poster: Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
  Samuel Stanton · Wesley Maddox · Nate Gruver · Phillip Maffettone · Emily Delaney · Peyton Greenside · Andrew Wilson