Spotlight in Workshop: Accessible and Efficient Foundation Models for Biological Discovery
Likelihood-based fine-tuning of protein language models for few-shot fitness prediction and design
Alex Hawkins-Hooker · Jakub Kmec · Oliver Bent · Paul Duckworth
Keywords: [ Protein Language Models ] [ fine-tuning ] [ Bayesian optimisation ] [ fitness prediction ]
Although various schemes have been proposed for exploiting the distributional knowledge captured by protein language models (PLMs) to enhance supervised fitness prediction and design, the lack of head-to-head comparisons across different prediction strategies and different classes of PLM has made it challenging to identify the best-performing methods. Here, we extend previously proposed ranking-based loss functions to adapt the likelihoods of family-based and masked protein language models, and demonstrate that the best configurations outperform state-of-the-art approaches based on frozen embeddings in the low-data setting. Furthermore, we propose ensembling strategies that exploit the strong dependence of the mutational distributions learned by PLMs on sequence context, and show that they can guide efficient optimisation over fitness landscapes.
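As one concrete illustration of a ranking-based loss of the kind the abstract describes, the sketch below implements a Bradley-Terry-style pairwise objective over scalar sequence scores. This is a minimal, hypothetical example, not the paper's exact formulation: in a real fine-tuning setup the scores would be PLM log-likelihoods of the mutant sequences, and the loss would be backpropagated through the model.

```python
import math
from itertools import combinations

def pairwise_ranking_loss(scores, fitness):
    """Bradley-Terry-style pairwise ranking loss (illustrative sketch).

    For every pair (i, j) with fitness[i] > fitness[j], penalise
    -log sigmoid(scores[i] - scores[j]), i.e. push the model to assign
    a higher likelihood-derived score to the fitter sequence.
    """
    loss, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if fitness[i] == fitness[j]:
            continue  # ties contribute no ranking signal
        hi, lo = (i, j) if fitness[i] > fitness[j] else (j, i)
        # -log sigmoid(d) = log(1 + exp(-d)), computed stably via log1p
        loss += math.log1p(math.exp(-(scores[hi] - scores[lo])))
        n_pairs += 1
    return loss / max(n_pairs, 1)

# Scores concordant with fitness yield a lower loss than discordant ones.
good = pairwise_ranking_loss([3.0, 2.0, 1.0], [3, 2, 1])
bad = pairwise_ranking_loss([1.0, 2.0, 3.0], [3, 2, 1])
```

Because only score differences within ranked pairs enter the loss, the objective adapts the model's relative likelihoods to the observed fitness ordering rather than regressing absolute fitness values, which suits the few-shot setting where labels are scarce and on an arbitrary scale.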