
Effective Surrogate Models for Protein Design with Bayesian Optimization
Nate Gruver

Bayesian optimization, which uses a probabilistic surrogate model of an expensive black-box function, provides a framework for protein design that requires only a small amount of labeled data. In this paper, we compare three approaches to constructing surrogate models for protein design on synthetic benchmarks. We find that ensembles of neural networks trained directly on primary sequences outperform both string kernel Gaussian processes and models built on pretrained embeddings. We show that this advantage is likely due to improved robustness on out-of-distribution data. Transferring these insights into practice, we apply our approach to optimizing the Stokes shift of green fluorescent protein (GFP), discovering and synthesizing novel variants with improved functional properties.
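The abstract does not specify the surrogate architecture or the acquisition function, so the following is only a minimal sketch of the general idea (an ensemble whose members' disagreement serves as the uncertainty estimate inside Bayesian optimization), not the authors' implementation. Assumptions: sequences are one-hot encoded; each ensemble member is a linear model fit by SGD on a bootstrap resample, standing in for the neural networks described above so the example stays dependency-free; and candidates are ranked by an upper confidence bound (ensemble mean plus `beta` times ensemble standard deviation) with a hypothetical default `beta=2.0`. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev

# Standard 20 amino acids; used to one-hot encode primary sequences.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flatten a protein sequence into a one-hot vector of length len(seq) * 20."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def fit_member(X, y, rng, epochs=200, lr=0.05, l2=1e-3):
    """Fit one ensemble member (a linear model, standing in for a neural
    network) by SGD on a bootstrap resample of the labeled data."""
    n, d = len(X), len(X[0])
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
    w = [rng.gauss(0.0, 0.1) for _ in range(d)]  # random init diversifies members
    b = 0.0
    for _ in range(epochs):
        for i in idx:
            err = b + sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            b -= lr * err
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
    return w, b


def predict(member, x):
    w, b = member
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_ensemble(seqs, y, n_members=5, seed=0):
    """Train n_members independently initialized, bootstrapped members."""
    rng = random.Random(seed)
    X = [one_hot(s) for s in seqs]
    return [fit_member(X, y, rng) for _ in range(n_members)]


def ucb_scores(ensemble, candidates, beta=2.0):
    """Upper confidence bound: ensemble mean + beta * ensemble std. dev."""
    scores = []
    for s in candidates:
        x = one_hot(s)
        preds = [predict(m, x) for m in ensemble]
        scores.append(mean(preds) + beta * pstdev(preds))
    return scores


def propose(ensemble, candidates, beta=2.0):
    """Return the candidate sequence with the highest UCB score."""
    scores = ucb_scores(ensemble, candidates, beta)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    # Toy labels: the number of tryptophans (W) in each sequence.
    seqs = ["AAAA", "WAAA", "WWAA", "WWWA", "AWAW"]
    labels = [0.0, 1.0, 2.0, 3.0, 2.0]
    ens = fit_ensemble(seqs, labels)
    print(propose(ens, ["WWWW", "AAAA", "ACDE"]))
```

In a real design loop, the proposed sequence would be measured, appended to the labeled set, and the ensemble refit before the next round. Raising `beta` favors candidates the members disagree on (exploration); lowering it favors candidates with high predicted value (exploitation).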

Author Information

Nate Gruver (New York University)
