Poster in Workshop: ICML 2024 Workshop on Foundation Models in the Wild
Extrapolative Protein Design through Triplet-based Preference Learning
Mostafa Karimi · Sharmi Banerjee · Tommi Jaakkola · Bella Dubrov · Shang Shang · Ron Benson
Keywords: [ Extrapolative Biological Design ] [ Protein Language Models ] [ Protein Design ] [ preference learning ]
Extrapolative protein design is a crucial task for automated drug discovery: the goal is to design proteins with higher fitness than any seen in training (e.g., higher stability or tighter binding affinity). Current state-of-the-art methods assume that one can safely steer protein design into the extrapolation region by learning from pairs alone. We hypothesize that (1) noisy pairs do not accurately approximate the gradient needed to improve fitness, and (2) it is challenging for models to learn higher-order relationships among designs (triplets, etc.) from noisy pairs alone. Motivated by the success of alignment in large language models, we have developed an extrapolative protein design framework based on triplet-based preference learning, which both better approximates the gradient and directly models the fitness ranking within each triplet. We evaluated our model's performance in designing AAV and GFP proteins and demonstrated that the proposed framework significantly improves the generative models' effectiveness in extrapolation tasks.
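
The abstract does not state the training objective, so the following is only an illustrative sketch of how a triplet ranking loss can differ from a pairwise one. It assumes a Bradley-Terry loss for pairs and a Plackett-Luce loss for ordered triplets; both choices, and the function names `pairwise_loss` and `triplet_ranking_loss`, are assumptions made for illustration and are not taken from the paper. The scores stand in for whatever scalar a model assigns to a candidate design.

```python
import math


def pairwise_loss(s_better: float, s_worse: float) -> float:
    """Bradley-Terry negative log-likelihood that the better design wins.

    Equals log(1 + exp(-(s_better - s_worse))), computed in a stable form.
    """
    d = s_better - s_worse
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def _logsumexp(values) -> float:
    """Numerically stable log(sum(exp(v))) over an iterable of floats."""
    values = list(values)
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def triplet_ranking_loss(s_best: float, s_mid: float, s_worst: float) -> float:
    """Plackett-Luce negative log-likelihood of the ranking best > mid > worst.

    The ranking is modeled as two sequential choices: first the best design
    is picked among all three, then the middle design among the remaining two.
    """
    first_pick = s_best - _logsumexp([s_best, s_mid, s_worst])
    second_pick = s_mid - _logsumexp([s_mid, s_worst])
    return -(first_pick + second_pick)


if __name__ == "__main__":
    # Hypothetical model scores for three designs, ordered by measured fitness.
    print(pairwise_loss(2.0, 0.0))
    print(triplet_ranking_loss(2.0, 1.0, 0.0))
```

Under these assumptions, a single triplet term constrains all three scores jointly, whereas pairwise terms constrain them two at a time. This is one way to read the abstract's phrase about directly modeling the rank of a triplet.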