Poster
in
Workshop: Accessible and Efficient Foundation Models for Biological Discovery

Compressing the Latent Space of Single-Sequence Protein Predictors for Multimodal Generation

Amy X. Lu · Wilson Yan · Vladimir Gligorijevic · Pieter Abbeel · Kevin Yang · Nathan Frey

Keywords: [ protein embeddings ] [ neural compression ] [ multimodal generation ]


Abstract: ESMFold learns a joint latent space of sequence and structure while requiring only sequence as input. However, the latent space of ESMFold is disorganized, and we find pathologies, similar to those observed in large language models, that render these models unusable for multimodal representation learning. Meanwhile, latent diffusion in both continuous and discrete spaces has improved efficiency and performance in image and multimodal generation, but it builds on an abundance of knowledge about autoencoders for images. To create a protein encoder that captures structural and functional information for generative modeling in the latent space, we introduce CHEAP (Compressed Hourglass Embedding Adaptations of Proteins) representations and find that the channel dimension of ESMFold latent spaces can be compressed by up to $256\times$ while retaining rich structural, sequence, and functional information, as demonstrated on protein understanding benchmarks and by reconstruction performance.
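To make the shape arithmetic of the $256\times$ channel compression concrete, here is a minimal, hypothetical sketch. It assumes per-residue ESMFold latents of channel dimension 1024 (an assumption, not stated in the abstract) and uses simple group mean-pooling as an illustrative stand-in for the learned hourglass autoencoder that CHEAP actually uses; the real method is trained, not a fixed pooling.

```python
# Illustrative sketch only -- NOT the authors' CHEAP implementation.
# Assumption: ESMFold latents have shape (L residues, d channels), d = 1024.
# A 256x channel compression maps each per-residue vector from d to d // 256.
# Mean-pooling over channel groups stands in for the learned compression.

def compress_channels(latents, factor=256):
    """Toy channel compression: mean-pool contiguous groups of `factor` channels.

    latents: list of per-residue channel vectors, each of length d.
    Returns a list of per-residue vectors of length d // factor.
    """
    compressed = []
    for vec in latents:
        d = len(vec)
        assert d % factor == 0, "channel dim must be divisible by the factor"
        compressed.append([
            sum(vec[i * factor:(i + 1) * factor]) / factor
            for i in range(d // factor)
        ])
    return compressed

# Example: a 3-residue protein with 1024 channels per residue.
latents = [[1.0] * 1024 for _ in range(3)]
out = compress_channels(latents, factor=256)
print(len(out), len(out[0]))  # 3 residues, 4 channels each
```

The sequence length is untouched; only the channel axis shrinks, which is why structural and sequence information per residue can still be probed after compression.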