Learning inverse folding from millions of predicted structures
Chloe Hsu · Robert Verkuil · Jason Liu · Zeming Lin · Brian Hie · Tom Sercu · Adam Lerer · Alexander Rives

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #114

We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety of more complex tasks including design of protein complexes, partially masked structures, binding interfaces, and multiple states.
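The headline numbers (51% overall recovery, 72% for buried residues) use native sequence recovery: the fraction of designed residues that match the native sequence at the same backbone position. Below is a minimal sketch of that metric; `sequence_recovery` is a hypothetical helper written for illustration, not part of the authors' released code.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences are aligned to the same backbone, i.e. equal length.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned to the same backbone length")
    matches = sum(1 for a, b in zip(native, designed) if a == b)
    return matches / len(native)

# Example: 3 of 4 residues recovered.
print(sequence_recovery("MKTA", "MKTV"))  # 0.75
```

A recovery of 0.51 on structurally held-out backbones thus means that, on average, just over half of the residues proposed by the model coincide with the native sequence.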

Author Information

Chloe Hsu (University of California, Berkeley)
Robert Verkuil (Facebook AI Research)
Jason Liu (Facebook AI Research)
Zeming Lin (New York University)
Brian Hie (Stanford University)
Tom Sercu (Facebook AI Research)
Adam Lerer (Facebook AI Research)
Alexander Rives (Facebook AI Research)
