Optimizing protein fitness using Bi-level Gibbs sampling with Graph-based Smoothing
Andrew Kirjner · Jason Yim · Raman Samusevich · Tommi Jaakkola · Regina Barzilay · Ila R. Fiete
Event URL: https://openreview.net/forum?id=kuJ1d8r07a

The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Bi-level Gibbs sampling with Graph-based Smoothing (BiGGS), which uses the gradients of a trained fitness predictor to sample many mutations towards higher fitness. Bi-level Gibbs first samples sequence locations, then sequence edits at those locations. We introduce graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results.
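To make the "locations first, then edits" idea concrete, here is a minimal sketch of one two-level, gradient-guided proposal step. It is not the authors' implementation: the names `bilevel_step` and `grad_fn`, the linear toy predictor `W`, the softmax temperature, and the use of the best first-order gain as a position score are all assumptions made for illustration. The sketch also omits any acceptance step and the graph-based smoothing of the predictor.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def softmax(x, temperature=1.0):
    """Numerically stable softmax over a 1-D array."""
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()


def one_hot(seq):
    """Encode a sequence as a (length, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(ALPHABET)))
    x[np.arange(len(seq)), [ALPHABET.index(a) for a in seq]] = 1.0
    return x


def bilevel_step(seq, grad_fn, rng, temperature=1.0):
    """Propose one mutation: sample a position, then a residue there.

    grad_fn (hypothetical) returns the gradient of a fitness predictor
    with respect to the one-hot input, with shape (length, 20).
    """
    x = one_hot(seq)
    g = grad_fn(x)
    # First-order estimate of the fitness change for each substitution:
    # gradient at the new residue minus gradient at the current residue.
    current = (g * x).sum(axis=1, keepdims=True)
    delta = g - current
    # Level 1: sample a position, scored by its best estimated gain.
    i = rng.choice(len(seq), p=softmax(delta.max(axis=1), temperature))
    # Level 2: sample a residue at that position. The current residue has
    # delta 0, so the step may leave the sequence unchanged.
    b = rng.choice(len(ALPHABET), p=softmax(delta[i], temperature))
    return seq[:i] + ALPHABET[b] + seq[i + 1:]


# Toy usage: a linear predictor f(x) = sum(W * x), whose gradient is W.
rng = np.random.default_rng(0)
seq = "ACDEFGHI"
W = rng.normal(size=(len(seq), len(ALPHABET)))
new_seq = bilevel_step(seq, lambda x: W, rng)
```

Repeating the step would accumulate several mutations, which is the sense in which such a sampler can move beyond a small mutational radius. With a real predictor, `grad_fn` would recompute the gradient at each new sequence instead of returning a constant.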

Author Information

Andrew Kirjner (Massachusetts Institute of Technology)
Jason Yim (Massachusetts Institute of Technology)
Raman Samusevich (Czech Technical University in Prague)
Tommi Jaakkola (MIT)
Regina Barzilay (MIT CSAIL)

Regina Barzilay is an Israeli-American computer scientist. She is a professor at the Massachusetts Institute of Technology and a faculty lead for artificial intelligence at the MIT Jameel Clinic. Her research interests are in natural language processing and applications of deep learning to chemistry and oncology.

Ila R. Fiete (MIT)
