

Poster in Workshop: Accessible and Efficient Foundation Models for Biological Discovery

2Bits of Protein: Efficient Protein Language Models at the Scale of 2-bits

Ollie Turnbull · Mohamed Baioumy · Charlotte Deane

Keywords: [ Language Model ] [ quantization ] [ pLM ] [ low precision ] [ encoder ]


Abstract:

Protein language models have become an increasingly popular tool for a wide range of biological tasks, from variant effect prediction to novel sequence generation. However, state-of-the-art models often have billions of parameters. Such large architectures restrict usage to groups with the necessary compute infrastructure or necessitate cloud computing, which incurs substantial costs and raises data privacy concerns. In this work, we investigate a ternary protein language model, which uses low-precision weights to reduce model size, energy demand, and computational requirements, making it suitable for edge devices such as laptops. This addresses privacy concerns by keeping data on-device and eliminates the costs associated with cloud services. We train a ternary protein language model and benchmark it against ESM-2 (8M) on the ProteinGym benchmark, demonstrating that our model achieves comparable performance while being more suitable for edge deployment. We conclude with a discussion of ways to improve the ternary model to outperform ESM-2 in accuracy.
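The abstract does not specify the quantization scheme, so as a point of reference, a common recipe for ternary language models is the per-tensor "absmean" quantizer popularized by BitNet b1.58: each weight is divided by the mean absolute weight, then rounded and clipped to {-1, 0, +1}, with the scale kept in floating point. The PyTorch sketch below is a minimal illustration of that idea under this assumption, not the authors' implementation; `ternary_quantize` is a hypothetical helper name.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-8):
    """Quantize a weight tensor to {-1, 0, +1} with one per-tensor scale.

    Hypothetical sketch of absmean ternary quantization (BitNet b1.58
    style); the paper's actual scheme may differ.
    """
    scale = w.abs().mean().clamp(min=eps)          # per-tensor absmean scale
    w_ternary = (w / scale).round().clamp_(-1, 1)  # values in {-1, 0, +1}
    return w_ternary, scale

# Usage: dequantize as w_ternary * scale. At inference, the ternary
# values turn matrix products into additions and subtractions, which is
# what makes such models attractive for edge devices.
w = torch.randn(256, 256)
w_q, s = ternary_quantize(w)
print(sorted(w_q.unique().tolist()), float(s))
```

In training, such a quantizer is typically applied in the forward pass with a straight-through estimator so that gradients flow to the underlying full-precision weights.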
