
Spotlight in Workshop: Accessible and Efficient Foundation Models for Biological Discovery

ProtMamba: a homology-aware but alignment-free protein state space model

Damiano Sgarbossa · Cyril Malbranke · Anne-Florence Bitbol

Keywords: [ protein fitness prediction ] [ protein design ] [ foundation model ] [ state space model ]


Abstract:

Protein design has important implications for drug discovery, personalized medicine, and biotechnology. Models based on multiple sequence alignments efficiently capture the evolutionary information in homologous protein sequences, but the construction of such alignments is imperfect. We present ProtMamba, a homology-aware but alignment-free protein language model based on the Mamba architecture. In contrast to attention-based models, ProtMamba efficiently handles very long contexts comprising hundreds of protein sequences. Trained on a large dataset of concatenated homologous sequences, ProtMamba combines autoregressive and masked language modeling through a fill-in-the-middle objective. We demonstrate ProtMamba’s usefulness for the generation of novel sequences and for fitness prediction. ProtMamba reaches performance competitive with other protein language models despite its smaller size, which sheds light on the importance of long-context conditioning.
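To make the fill-in-the-middle (FIM) objective mentioned in the abstract concrete, the sketch below shows how one such training example could be assembled from concatenated homologs. It is a minimal, hypothetical illustration: the sentinel tokens (`<eos>`, `<mask>`, `<fim>`), the single masked span, and the function name `make_fim_example` are assumptions for exposition, not ProtMamba’s actual tokenization or code.

```python
# Minimal sketch of building one fill-in-the-middle (FIM) training example
# from concatenated homologous sequences. The sentinel tokens and the
# single-span setup are illustrative assumptions, not the paper's scheme.
import random


def make_fim_example(homologs, rng=random.Random(0)):
    """Concatenate homologs as context, mask one span of the last sequence,
    and append that span at the end so a left-to-right model predicts it."""
    # Context: all homologs except the last, separated by end-of-sequence tokens.
    context = "<eos>".join(homologs[:-1]) + "<eos>"
    target = homologs[-1]

    # Pick a random span inside the final sequence to hide.
    i = rng.randrange(0, len(target) - 1)
    j = rng.randrange(i + 1, len(target))
    prefix, middle, suffix = target[:i], target[i:j], target[j:]

    # Autoregressive input: context, then the target with the span replaced
    # by a mask token, then the span itself after a sentinel. The training
    # loss would apply to the span tokens, generated left to right.
    return context + prefix + "<mask>" + suffix + "<fim>" + middle


example = make_fim_example(["MKVLA", "MKILS", "MKVLS"])
print(example)
```

Under this setup, the model sees the full homolog context and both sides of the masked region before generating the missing span, which is how a FIM objective lets a purely autoregressive model exploit bidirectional context, combining the strengths of causal and masked language modeling.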
