Workshop Poster in ICML 2021 Workshop on Computational Biology
Effective Surrogate Models for Protein Design with Bayesian Optimization
Nate Gruver
Bayesian optimization, which uses a probabilistic surrogate for an expensive black-box function, provides a framework for protein design that requires only a small amount of labeled data. In this paper, we compare three approaches to constructing surrogate models for protein design on synthetic benchmarks. We find that neural network ensembles trained directly on primary sequences outperform string kernel Gaussian processes and models built on pretrained embeddings. We show that this superior performance is likely due to improved robustness on out-of-distribution data. Transferring these insights into practice, we apply our approach to optimizing the Stokes shift of green fluorescent protein, discovering and synthesizing novel variants with improved functional properties.
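To make the idea of an ensemble surrogate concrete, the following is a minimal sketch of one Bayesian optimization step on primary sequences. It is not the poster's implementation: as a stated assumption, each ensemble member is a simple additive per-position model trained by SGD on a bootstrap resample, standing in for a neural network, and the acquisition function is an upper confidence bound (UCB), which the abstract does not specify. All function names and the toy data are hypothetical.

```python
import random
import statistics

def predict_one(weights, seq):
    """Additive model: a bias plus one learned weight per (position, residue)."""
    return weights.get("bias", 0.0) + sum(
        weights.get((i, aa), 0.0) for i, aa in enumerate(seq)
    )

def fit_member(seqs, ys, rng, epochs=200, lr=0.5):
    """Train one ensemble member by SGD on a bootstrap resample of the data."""
    n = len(seqs)
    idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
    w = {}
    for _ in range(epochs):
        for j in idx:
            err = predict_one(w, seqs[j]) - ys[j]
            # Normalise by the number of active features to keep the step stable.
            step = lr * err / (len(seqs[j]) + 1)
            w["bias"] = w.get("bias", 0.0) - step
            for i, aa in enumerate(seqs[j]):
                w[(i, aa)] = w.get((i, aa), 0.0) - step
    return w

def fit_ensemble(seqs, ys, n_members=8, seed=0):
    """Fit several members; their disagreement serves as the uncertainty estimate."""
    rng = random.Random(seed)
    return [fit_member(seqs, ys, rng) for _ in range(n_members)]

def predict(members, seq):
    """Return the ensemble mean and standard deviation for one sequence."""
    preds = [predict_one(w, seq) for w in members]
    return statistics.fmean(preds), statistics.pstdev(preds)

def ucb(mean, std, beta=1.0):
    """Upper confidence bound: favours high predictions and high uncertainty."""
    return mean + beta * std

def propose(members, candidates, beta=1.0):
    """Pick the candidate sequence with the highest acquisition value."""
    return max(candidates, key=lambda s: ucb(*predict(members, s), beta))

# Toy labelled data (hypothetical sequences and measurements).
train_seqs = ["ACD", "ACE", "GCD", "GHE"]
train_ys = [1.0, 2.0, 0.5, 1.5]
members = fit_ensemble(train_seqs, train_ys)
candidates = ["ACD", "AHE", "GCE", "WWW"]
next_seq = propose(members, candidates)
```

In a full loop, the proposed sequence would be measured, appended to the training data, and the ensemble refit. Candidates containing residues never seen in training (such as `"WWW"` here) fall back to the bias term alone, which gives a small-scale picture of why behaviour on out-of-distribution inputs matters for the surrogate.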