P0884 Revealing Hidden Proteomes and Peptidomes

Nuno F Bandeira, University of California, San Diego, La Jolla, CA
The dominant paradigm for high-throughput protein identification is based on enzymatic digestion of proteins into peptides, followed by tandem mass spectrometry to generate MS/MS spectra that are then computationally matched, one spectrum at a time, against protein sequence databases. While this paradigm has enabled nearly all large-scale proteomics studies to date, its typically low spectrum identification rate of only 15-20% remains a serious limitation that worsens further when analyzing endogenous peptides or species with poorly annotated genomes. For example, the limitations when searching for post-translational modifications (PTMs) are so dire that most labs still allow for only 4-8 PTMs per search (about half of which are due to sample handling procedures), even though more than 500 PTMs are currently known. We will describe new computational developments enabling up to 2-fold increases in peptide identifications, 4-fold improvements in identification of complex PTMs, automated discovery of unexpected and highly modified peptides, and sequencing-grade de novo sequencing of full-length polymorphic proteins. In addition to traditional methods that interpret each spectrum in isolation, we will also focus on a novel paradigm that determines consensus interpretations for multiple spectra from related peptides. Computational methods and tools developed at UCSD’s Center for Computational Mass Spectrometry are easily accessible and freely available at the center’s website (http://proteomics.ucsd.edu) and have already enabled searching hundreds of millions of spectra from over one thousand researchers.