Tandem mass spectrometry is the only high-throughput method for analyzing the protein content of complex biological samples and is thus the primary technology driving the growth of the field of proteomics. A key outstanding challenge in this field is identifying the sequence of amino acids (the peptide) responsible for generating each observed spectrum, without making use of prior knowledge in the form of a peptide sequence database. Although various machine learning methods have been developed to address this de novo sequencing problem, the challenges that arise when modeling tandem mass spectra have led to complex models that combine multiple neural networks and post-processing steps. We propose a simple yet powerful method for de novo peptide sequencing, Casanovo, that uses a transformer framework to map directly from a sequence of observed peaks (a mass spectrum) to a sequence of amino acids (a peptide). Our experiments show that Casanovo achieves state-of-the-art performance on a benchmark dataset using a standard cross-species evaluation framework, in which the model is tested on spectra whose peptide labels were never seen during training. Casanovo not only achieves superior performance but does so at a fraction of the model complexity and inference time required by other methods.
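To make the input and output of the de novo sequencing problem concrete, the following is a minimal illustrative sketch, not Casanovo's implementation. It assumes an idealized, noise-free spectrum containing only singly charged b-ion peaks, and it reads residues off the mass gaps between consecutive peaks, which is the classical intuition behind de novo sequencing rather than the transformer approach described above. The names `RESIDUE_MASS`, `b_ion_mzs`, and `read_ladder` are hypothetical, and the table covers only a handful of amino acids using standard monoisotopic residue masses.

```python
# Standard monoisotopic residue masses in daltons (subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O added to the summed residues
PROTON = 1.00728   # mass of one charge-carrying proton


def peptide_mz(peptide: str, charge: int) -> float:
    """Theoretical precursor m/z of a peptide at the given charge."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def b_ion_mzs(peptide: str) -> list[float]:
    """Singly charged b-ion m/z values for every proper prefix of the peptide."""
    total = 0.0
    peaks = []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        peaks.append(total + PROTON)
    return peaks


def read_ladder(peaks: list[float], tol: float = 0.01) -> str:
    """Map gaps between consecutive b-ion peaks to residues (idealized case)."""
    prev = PROTON  # the empty prefix carries only the proton
    residues = []
    for mz in sorted(peaks):
        gap = mz - prev
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        residues.append(matches[0] if matches else "?")
        prev = mz
    return "".join(residues)


if __name__ == "__main__":
    peaks = b_ion_mzs("GASP")
    print(peaks)               # idealized peak list for the toy peptide
    print(read_ladder(peaks))  # prefix recovered from the peak gaps
```

Real spectra contain noise peaks, missing fragments, multiple ion types, and modified residues, so this gap-reading step fails in practice; that difficulty is what motivates learned models such as the one described in the abstract.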
Author Information
Melih Yilmaz (University of Washington)

I’m a PhD student at the Paul G. Allen School of Computer Science & Engineering, University of Washington, where I’m fortunate to be advised by William Noble. My research interests are in Machine Learning, Computational Biology and Biomedical Data Science. Focusing on proteomics, my current research builds deep learning methods to analyze mass spectrometry data. Previously, I was a Research Intern at Stanford BMIR with Tina Hernandez-Boussard. I received a B.Sc. in Electrical Engineering from Koc University, where I worked with Murat Tekalp.
William Fondrie (Talus Bioscience)
Wout Bittremieux (University of California San Diego)

Dr. Wout Bittremieux is a postdoctoral researcher at the Dorrestein Laboratory of the University of California San Diego. His research employs advanced computational techniques to solve fundamental biological questions, by developing algorithmic solutions and machine learning methods for the analysis of mass spectrometry-based proteomics and metabolomics data. He has developed several innovative tools to analyze large mass spectral data volumes, such as the ANN-SoLo tool for extremely fast open modification searching, the falcon tool for efficient spectrum clustering, and the GLEAMS deep neural network to efficiently process hundreds of millions of mass spectra. His work is ideally positioned at the intersection of bioinformatics, machine learning, and mass spectrometry. In 2020, Dr. Bittremieux received the prestigious Postdoctoral Career Development Award from the American Society for Mass Spectrometry, and in 2021 he was named a "Rising Star in Proteomics and Metabolomics" by the Journal of Proteome Research for demonstrating 'incredible originality and promise for the future of proteomics and metabolomics.' Dr. Bittremieux is an avid contributor to the computational mass spectrometry community. He has published widely in internationally recognized scientific journals, is an active member of the European Bioinformatics Community for Mass Spectrometry (EuBIC-MS), and leads the CompMS interest group of the International Society for Computational Biology (ISCB). Furthermore, he has contributed to the development of mass spectrometry data standards defined by the Proteomics Standards Initiative (PSI) and is an active developer of the Global Natural Products Social Molecular Networking (GNPS) platform, through which his tools reach tens of thousands of monthly users worldwide.
Sewoong Oh (University of Washington)
William Noble (University of Washington)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: De novo mass spectrometry peptide sequencing with a transformer model
  Wed. Jul 20th 03:00 -- 03:05 PM Room Hall G
More from the Same Authors
- 2022 Poster: MAML and ANIL Provably Learn Representations
  Liam Collins · Aryan Mokhtari · Sewoong Oh · Sanjay Shakkottai
- 2022 Spotlight: MAML and ANIL Provably Learn Representations
  Liam Collins · Aryan Mokhtari · Sewoong Oh · Sanjay Shakkottai
- 2021 Poster: Defense against backdoor attacks via robust covariance estimation
  Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
- 2021 Spotlight: Defense against backdoor attacks via robust covariance estimation
  Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
- 2021 Poster: KO codes: inventing nonlinear encoding and decoding for reliable wireless communication via deep-learning
  Ashok Vardhan Makkuva · Xiyang Liu · Mohammad Vahid Jamali · Hessam Mahdavifar · Sewoong Oh · Pramod Viswanath
- 2021 Spotlight: KO codes: inventing nonlinear encoding and decoding for reliable wireless communication via deep-learning
  Ashok Vardhan Makkuva · Xiyang Liu · Mohammad Vahid Jamali · Hessam Mahdavifar · Sewoong Oh · Pramod Viswanath
- 2020 Poster: Optimal transport mapping via input convex neural networks
  Ashok Vardhan Makkuva · Amirhossein Taghvaei · Sewoong Oh · Jason Lee
- 2020 Poster: InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs
  Zinan Lin · Kiran Thekumparampil · Giulia Fanti · Sewoong Oh
- 2020 Poster: Meta-learning for Mixed Linear Regression
  Weihao Kong · Raghav Somani · Zhao Song · Sham Kakade · Sewoong Oh