Poster in Workshop: The Synergy of Scientific and Machine Learning Modelling (SynS & ML) Workshop

Understanding Energy-Based Modeling of Proteins via an Empirically Motivated Minimal Ground Truth Model

Peter Fields · Wave Ngampruetikorn · Rama Ranganathan · David Schwab · Stephanie Palmer

Keywords: [ generative model ] [ Amino Acids ] [ DCA ] [ EBM ] [ Statistical Physics ] [ Potts Model ] [ proteins ]


Abstract: Energy-based models (EBMs) fitted to sequences from evolutionarily related protein families can learn the generic constraints necessary to make novel functional sequences, which have been validated by $\textit{in vivo}$ experiments. However, these learned energy functions must be rescaled by a temperature parameter before they can be used to sample novel functional sequences. Here, we generate data from a minimal model motivated by a wide array of empirical evidence for a synergistic cluster of amino acids, or sector, within a sequence. We find that our setting captures salient learning behaviors similar to those exhibited by EBMs fitted to real proteins, namely the necessity of temperature tuning to increase generative performance. We discuss the insight this gives into the functional sequence space of proteins and suggest how our model may be exploited to further the understanding of the essential functional features within protein sequences.
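
As a rough illustration of what temperature rescaling means when sampling from an EBM, the sketch below draws a sequence from $p_T(s) \propto \exp(-E(s)/T)$ with single-site Metropolis updates. It assumes a Potts-type energy with fields `h` and pairwise couplings `J`, which is suggested by the keywords but not specified in the abstract. The function names, the toy parameters, and the choice of Metropolis sampling are all hypothetical and are not the authors' method.

```python
import math
import random


def potts_energy(seq, h, J):
    """Assumed Potts form: E(s) = -sum_i h[i][s_i] - sum_{i<j} J[i][j][s_i][s_j]."""
    L = len(seq)
    e = -sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i][j][seq[i]][seq[j]]
    return e


def metropolis_sample(h, J, q, T, n_steps, rng):
    """Run single-site Metropolis updates targeting p_T(s) ~ exp(-E(s) / T)."""
    L = len(h)
    seq = [rng.randrange(q) for _ in range(L)]
    e = potts_energy(seq, h, J)
    for _ in range(n_steps):
        i = rng.randrange(L)
        old = seq[i]
        seq[i] = rng.randrange(q)  # propose a new state at site i
        e_new = potts_energy(seq, h, J)
        # Accept downhill moves always, uphill moves with prob exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / T):
            e = e_new
        else:
            seq[i] = old  # reject: restore the previous state
    return seq, e


# Toy parameters (hypothetical): 6 sites, 3 states, no fields,
# and couplings that reward any pair of sites sharing the same state.
L, q = 6, 3
h = [[0.0] * q for _ in range(L)]
J = [[[[1.0 if a == b else 0.0 for b in range(q)] for a in range(q)]
      for _ in range(L)] for _ in range(L)]

seq, e = metropolis_sample(h, J, q, T=0.5, n_steps=200, rng=random.Random(0))
```

Lowering `T` below 1 concentrates samples on low-energy sequences, while raising it flattens the distribution; tuning this single parameter is the kind of rescaling the abstract refers to.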
