
Workshop Poster
Workshop: ICML 2021 Workshop on Computational Biology

Improving confident peptide identifications across mass spectrometry runs by learning deep representations of TIMS-MS1 features

Soroor Hediyeh-zadeh


It is known that more than 100,000 detectable peptide species elute in a single shotgun proteomics run. The mass spectrometer, however, selects only a small subset of the most abundant peptides for sequencing at each survey scan in a run. This compromises consistent quantification of peptides across runs, leading to the prevalent problem of missing values. When a peptide is identified by sequencing, its MS1 measurements are known to the experimenter. Therefore, peptide identities can be transferred between runs based on the similarity of MS1 attributes. The accuracy of existing approaches to peptide identity propagation (PIP) is limited by the selection of runs used as references for information propagation and by the specified tolerance for deviation in MS1 measurements. These approaches are also inherently limited by the lack of a probability measure with which to assign confidence and filter out likely false-positive results. We propose to learn the identity of query peptides by mapping them to a latent space of peptide MS1 representations. We then use this embedding space to propagate sequence information between runs. We observed that individual peptide sequences can have very few occurrences, so the embedding network had to be trained with a few-shot learning framework. We also observed that the same peptide can elute at different retention times along the gradient in different studies, which hampers correct identification of peptides. We addressed this challenge by modifying the loss function of prototypical networks. We demonstrated that embedding the MS1 attributes of peptides and propagating sequence information in the embedding space can improve the recovery of low-abundance peptides in a small cancer dataset.
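As a rough illustration of how identities might be propagated in an embedding space with an attached confidence, the sketch below applies the standard prototypical-network decision rule: each peptide's prototype is the mean of its embedded support examples, and a query is scored by a softmax over negative squared Euclidean distances to the prototypes. It is not the modified loss described in the abstract, and it assumes the embeddings were already produced by a trained network. The function names, the 2-D toy embeddings, the peptide labels, and the `min_prob` threshold are all hypothetical.

```python
import numpy as np


def class_prototypes(support_emb, support_labels):
    """Average the support embeddings of each peptide into one prototype."""
    labels = np.unique(support_labels)
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in labels]
    )
    return labels, protos


def assign_identities(query_emb, labels, protos, min_prob=0.9):
    """Assign each query to its nearest prototype, with a softmax confidence.

    Queries whose best probability falls below `min_prob` (a hypothetical
    threshold) are left unassigned (None) rather than forced to a match.
    """
    # Squared Euclidean distance from every query to every prototype.
    diff = query_emb[:, None, :] - protos[None, :, :]
    dist_sq = (diff ** 2).sum(axis=-1)

    # Softmax over negative distances, stabilised by subtracting the row max.
    logits = -dist_sq
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    best = probs.argmax(axis=1)
    conf = probs[np.arange(len(best)), best]
    ids = [str(labels[i]) if p >= min_prob else None for i, p in zip(best, conf)]
    return ids, conf


# Toy embeddings standing in for the output of a trained embedding network.
support_emb = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
support_labels = np.array(["PEPA", "PEPA", "PEPB", "PEPB"])
query_emb = np.array([[0.1, 0.1], [2.5, 2.55]])

labels, protos = class_prototypes(support_emb, support_labels)
ids, conf = assign_identities(query_emb, labels, protos)
print(ids, conf)
```

In this toy case the first query sits next to one prototype and is assigned with high confidence, while the second lies roughly midway between the two prototypes and is left unassigned, which is the kind of filtering a probability measure makes possible.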
