De novo mass spectrometry peptide sequencing with a transformer model
Melih Yilmaz · William Fondrie · Wout Bittremieux · Sewoong Oh · William Noble

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #112

Tandem mass spectrometry is the only high-throughput method for analyzing the protein content of complex biological samples and is thus the primary technology driving the growth of the field of proteomics. A key outstanding challenge in this field involves identifying the sequence of amino acids (the peptide) responsible for generating each observed spectrum, without making use of prior knowledge in the form of a peptide sequence database. Although various machine learning methods have been developed to address this de novo sequencing problem, challenges that arise when modeling tandem mass spectra have led to complex models that combine multiple neural networks and post-processing steps. We propose a simple yet powerful method for de novo peptide sequencing, Casanovo, that uses a transformer framework to map directly from a sequence of observed peaks (a mass spectrum) to a sequence of amino acids (a peptide). Our experiments show that Casanovo achieves state-of-the-art performance on a benchmark dataset using a standard cross-species evaluation framework, which involves testing on spectra with never-before-seen peptide labels. Casanovo not only achieves superior performance but does so at a fraction of the model complexity and inference time required by other methods.
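
To make the input and output of the de novo sequencing problem concrete, the following is a minimal sketch of how a spectrum (a list of peaks plus precursor information) and a candidate peptide could be represented, together with a basic check that the candidate's mass is consistent with the measured precursor. This is not Casanovo's implementation: the `Spectrum` class, the function names, and the 50 ppm tolerance are hypothetical choices for illustration, and only a handful of standard monoisotopic residue masses are included.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Standard monoisotopic residue masses in daltons (subset, for illustration).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of a proton, used to convert m/z to neutral mass


@dataclass
class Spectrum:
    """Hypothetical container for one tandem mass spectrum."""
    precursor_mz: float
    charge: int
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mass(spectrum: Spectrum) -> float:
    """Neutral precursor mass implied by the measured m/z and charge."""
    return (spectrum.precursor_mz - PROTON) * spectrum.charge


def mass_matches(peptide: str, spectrum: Spectrum, tol_ppm: float = 50.0) -> bool:
    """True if the peptide mass is within tol_ppm of the precursor mass."""
    expected = peptide_mass(peptide)
    error_ppm = abs(precursor_mass(spectrum) - expected) / expected * 1e6
    return error_ppm <= tol_ppm
```

A sequencing model would take the `peaks` of a `Spectrum` as input and emit a string such as `"GAS"`; a check like `mass_matches` is one simple way to test whether a predicted sequence is physically plausible for that spectrum.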

Author Information

Melih Yilmaz (University of Washington)
I’m a PhD student at the Paul G. Allen School of Computer Science & Engineering, University of Washington, where I’m fortunate to be advised by William Noble. My research interests are in Machine Learning, Computational Biology and Biomedical Data Science. Focusing on proteomics, my current research builds deep learning methods to analyze mass spectrometry data. Previously, I was a Research Intern at Stanford BMIR with Tina Hernandez-Boussard. I received a B.Sc. in Electrical Engineering from Koc University, where I worked with Murat Tekalp.

William Fondrie (Talus Bioscience)
Wout Bittremieux (University of California San Diego)
Dr. Wout Bittremieux is a postdoctoral researcher at the Dorrestein Laboratory of the University of California San Diego. His research employs advanced computational techniques to address fundamental biological questions by developing algorithmic solutions and machine learning methods for the analysis of mass spectrometry-based proteomics and metabolomics data. He has developed several innovative tools to analyze large mass spectral data volumes, such as the ANN-SoLo tool for extremely fast open modification searching, the falcon tool for efficient spectrum clustering, and the GLEAMS deep neural network to efficiently process hundreds of millions of mass spectra. His work is ideally positioned at the intersection of bioinformatics, machine learning, and mass spectrometry. In 2020, Dr. Bittremieux received the prestigious Postdoctoral Career Development Award from the American Society for Mass Spectrometry, and in 2021 he was named a "Rising Star in Proteomics and Metabolomics" by the Journal of Proteome Research for demonstrating "incredible originality and promise for the future of proteomics and metabolomics." Dr. Bittremieux is an avid contributor to the computational mass spectrometry community. He has published widely in internationally recognized scientific journals, is an active member of the European Bioinformatics Community for Mass Spectrometry (EuBIC-MS), and leads the CompMS interest group of the International Society for Computational Biology (ISCB). Furthermore, he has contributed to the development of mass spectrometry data standards defined by the Proteomics Standards Initiative (PSI) and is an active developer of the Global Natural Products Social Molecular Networking (GNPS) platform, through which his tools reach tens of thousands of monthly users worldwide.

Sewoong Oh (University of Washington)
William Noble (University of Washington)
