
Workshop Poster
Workshop: ICML 2021 Workshop on Computational Biology

Effective Surrogate Models for Protein Design with Bayesian Optimization

Nate Gruver


Bayesian optimization, which uses a probabilistic surrogate model of an expensive black-box function, provides a framework for protein design that requires only a small amount of labeled data. In this paper, we compare three approaches to constructing surrogate models for protein design on synthetic benchmarks. We find that neural network ensembles trained directly on primary sequences outperform string kernel Gaussian processes and models built on pretrained embeddings. We show that this superior performance is likely due to improved robustness on out-of-distribution data. Transferring these insights into practice, we apply our approach to optimizing the Stokes shift of green fluorescent protein, discovering and synthesizing novel variants with improved functional properties.
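To make the idea of an ensemble surrogate concrete, here is a minimal sketch of a Bayesian optimization loop over protein sequences, written under several loud assumptions. The ensemble members are small linear models on one-hot encoded sequences (standing in for the neural networks the abstract describes), the ensemble mean and spread serve as the surrogate's prediction and uncertainty, and an upper confidence bound (UCB) acquisition picks the next sequence to "measure". The objective `black_box`, the helper names (`one_hot`, `train_member`, `ensemble_stats`, `ucb`, `bo_loop`), and the choice of UCB are all hypothetical illustrations, not the paper's method or code.

```python
import random
import statistics

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Flatten a protein sequence into a one-hot feature vector."""
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == a else 0.0 for a in ALPHABET)
    return vec


def train_member(X, y, seed, epochs=200, lr=0.05):
    """Fit one linear ensemble member by SGD on a bootstrap resample.

    Assumption: a linear model replaces the paper's neural networks; diversity
    comes from the random initialization and the bootstrap sample.
    """
    rng = random.Random(seed)
    idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
    w = [rng.gauss(0.0, 0.1) for _ in range(len(X[0]))]
    b = 0.0
    for _ in range(epochs):
        for i in idx:
            pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            err = pred - y[i]  # gradient of squared error (up to a factor 2)
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(member, x):
    w, b = member
    return b + sum(wj * xj for wj, xj in zip(w, x))


def ensemble_stats(ensemble, x):
    """Surrogate mean and uncertainty from ensemble disagreement."""
    preds = [predict(m, x) for m in ensemble]
    return statistics.mean(preds), statistics.pstdev(preds)


def ucb(ensemble, seq, beta=2.0):
    """Hypothetical acquisition choice: upper confidence bound."""
    mu, sigma = ensemble_stats(ensemble, one_hot(seq))
    return mu + beta * sigma


def black_box(seq):
    # Hypothetical toy objective standing in for an expensive lab assay:
    # it simply counts hydrophobic residues.
    return float(sum(aa in "AILMFVW" for aa in seq))


def bo_loop(init_seqs, pool, rounds=3, n_members=5):
    """Refit the ensemble, query the best UCB candidate, repeat."""
    data = {s: black_box(s) for s in init_seqs}
    for r in range(rounds):
        X = [one_hot(s) for s in data]
        y = list(data.values())
        ensemble = [train_member(X, y, seed=r * 100 + k) for k in range(n_members)]
        candidates = [s for s in pool if s not in data]
        if not candidates:
            break
        best = max(candidates, key=lambda s: ucb(ensemble, s))
        data[best] = black_box(best)  # "run the experiment" on the chosen sequence
    return max(data, key=data.get), data
```

For example, `bo_loop(["AAAA", "GGGG", "KKKK"], pool, rounds=3)` starts from three labeled sequences and spends a budget of three additional queries on candidates from `pool`. The ensemble spread plays the role that a Gaussian process posterior variance would play in a kernel-based surrogate, which is why ensembles can slot into the same acquisition machinery.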
