Poster in Workshop: Structured Probabilistic Inference and Generative Modeling
Optimizing protein fitness using Bi-level Gibbs sampling with Graph-based Smoothing
Andrew Kirjner · Jason Yim · Raman Samusevich · Tommi Jaakkola · Regina Barzilay · Ila R. Fiete
Keywords: [ discrete optimization ] [ MCMC ] [ Protein Design ] [ proteins ]
The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Bi-level Gibbs sampling with Graph-based Smoothing (BiGGS), which uses the gradients of a trained fitness predictor to sample many mutations towards higher fitness. Bi-level Gibbs first samples sequence locations, then sequence edits. We introduce graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We evaluate on the GFP and AAV design problems, with ablations and baselines to elucidate the results.
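To make the two-level proposal concrete, below is a minimal sketch of a single gradient-guided Gibbs step in the style the abstract describes (sample a position first, then an edit at that position), written in PyTorch. All names (`bilevel_gibbs_step`, `predictor`, `temp`) are hypothetical, and the first-order gain approximation is an assumption; it is not the authors' released implementation, and the graph-based smoothing of the predictor's training signal mentioned in the abstract is not shown here.

```python
import torch
import torch.nn.functional as F

def bilevel_gibbs_step(seq_onehot, predictor, temp=1.0):
    """Hedged sketch of one bi-level, gradient-guided Gibbs proposal.

    seq_onehot: (L, A) one-hot tensor for a sequence of length L over an
                alphabet of size A (e.g. A = 20 amino acids).
    predictor:  a differentiable fitness model mapping (1, L, A) -> scalar.
    """
    x = seq_onehot.clone().requires_grad_(True)
    fitness = predictor(x.unsqueeze(0)).squeeze()
    # Gradient of predicted fitness w.r.t. the one-hot encoding, shape (L, A).
    grad = torch.autograd.grad(fitness, x)[0]

    # First-order estimate of the fitness change for every single substitution:
    # delta[i, a] ~ grad[i, a] - sum_a' grad[i, a'] * x[i, a'].
    current = (grad * x).sum(dim=-1, keepdim=True)
    delta = grad - current

    # Level 1: sample a sequence position, weighting each by its best attainable gain.
    pos_logits = delta.max(dim=-1).values / temp
    pos = torch.distributions.Categorical(logits=pos_logits).sample()

    # Level 2: sample an amino-acid edit at that position from the per-residue gains.
    aa = torch.distributions.Categorical(logits=delta[pos] / temp).sample()

    new_seq = seq_onehot.clone()
    new_seq[pos] = F.one_hot(aa, num_classes=seq_onehot.shape[-1]).float()
    return new_seq
```

Iterating such steps (with an acceptance rule or annealed temperature) would let the sampler accumulate many mutations rather than being confined to a small radius around the reference sequence, which is the behavior the abstract attributes to BiGGS.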