Spotlight in Workshop: Accessible and Efficient Foundation Models for Biological Discovery
Likelihood-based fine-tuning of protein language models for few-shot fitness prediction and design
Alex Hawkins-Hooker · Jakub Kmec · Oliver Bent · Paul Duckworth
Keywords: [ Protein Language Models ] [ fine-tuning ] [ Bayesian optimisation ] [ fitness prediction ]
Although various schemes have been proposed for exploiting the distributional knowledge captured by protein language models (PLMs) to enhance supervised fitness prediction and design, the lack of head-to-head comparisons across different prediction strategies and different classes of PLM has made it challenging to identify the best-performing methods. Here, we extend previously proposed ranking-based loss functions to adapt the likelihoods of family-based and masked protein language models, and demonstrate that the best configurations outperform state-of-the-art approaches based on frozen embeddings in the low-data setting. Furthermore, we propose ensembling strategies that exploit the strong dependence of the mutational distributions learned by PLMs on sequence context, and show that they can guide efficient optimisation over fitness landscapes.
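As one concrete illustration of a ranking-based loss of the kind the abstract describes, the sketch below implements a Bradley-Terry-style pairwise objective over scalar sequence scores. This is a minimal, hypothetical example, not the paper's exact formulation: in a real fine-tuning setup the scores would be PLM log-likelihoods of the mutant sequences, and the loss would be backpropagated through the model.

```python
import math
from itertools import combinations

def pairwise_ranking_loss(scores, fitness):
    """Bradley-Terry-style pairwise ranking loss (illustrative sketch).

    For every pair (i, j) with fitness[i] > fitness[j], penalise
    -log sigmoid(scores[i] - scores[j]), i.e. push the model to assign
    a higher likelihood-derived score to the fitter sequence.
    """
    loss, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if fitness[i] == fitness[j]:
            continue  # ties contribute no ranking signal
        hi, lo = (i, j) if fitness[i] > fitness[j] else (j, i)
        # -log sigmoid(d) = log(1 + exp(-d)), computed stably via log1p
        loss += math.log1p(math.exp(-(scores[hi] - scores[lo])))
        n_pairs += 1
    return loss / max(n_pairs, 1)

# Scores concordant with fitness yield a lower loss than discordant ones.
good = pairwise_ranking_loss([3.0, 2.0, 1.0], [3, 2, 1])
bad = pairwise_ranking_loss([1.0, 2.0, 3.0], [3, 2, 1])
```

Because only score differences within ranked pairs enter the loss, the objective adapts the model's relative likelihoods to the observed fitness ordering rather than regressing absolute fitness values, which suits the few-shot setting where labels are scarce and on an arbitrary scale.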