Benchmarking mass spectrometry based proteomics algorithms using a simulated database

Awan, Muaaz Gul; Awan, Abdullah Gul; Saeed, Fahad

doi:10.1007/s13721-021-00298-3

Short Communication
Published: 26 March 2021

Volume 10, article number 23 (2021)

Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Protein sequencing algorithms process data from a variety of instruments, generated under diverse experimental conditions. Currently there is no way to predict the accuracy of an algorithm for a given data set. Most published algorithms and their associated software have been evaluated on a limited number of experimental data sets. However, these performance evaluations do not cover the complete search space that the algorithm and software might encounter in the real world. To this end, we present a database of simulated spectra that can be used to benchmark any spectra-to-peptide search engine. We demonstrate the usability of this database by benchmarking two popular peptide sequencing engines. We show wide variation in the accuracy of peptide deductions, and that a complete quality profile of a given algorithm can be useful for practitioners and algorithm developers. All benchmarking data are available at https://users.cs.fiu.edu/~fsaeed/Benchmark.html
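
Because simulated spectra come with known generating peptides, a search engine's output can in principle be scored directly against that ground truth. The following is only a minimal sketch of such a scoring step, not the authors' evaluation procedure: the function names, the dictionary layout (spectrum identifier mapped to peptide sequence), and the example identifiers are all assumptions made for illustration. Isoleucine and leucine are treated as equivalent because they have identical mass.

```python
from typing import Dict, Optional


def normalize(sequence: str) -> str:
    """Upper-case a peptide and map I to L, since the two residues are isobaric."""
    return sequence.upper().replace("I", "L")


def identification_accuracy(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> float:
    """Fraction of simulated spectra whose reported peptide matches the known one.

    `truth` maps a (hypothetical) spectrum id to the peptide used to simulate it.
    `predicted` maps spectrum ids to the engine's top-ranked peptide; spectra the
    engine did not identify may be absent or mapped to None and count as wrong.
    """
    if not truth:
        raise ValueError("ground truth is empty")
    correct = 0
    for spectrum_id, peptide in truth.items():
        reported = predicted.get(spectrum_id)
        if reported is not None and normalize(reported) == normalize(peptide):
            correct += 1
    return correct / len(truth)


# Toy data: three simulated spectra, two identified correctly
# (scan1 differs only by an I/L swap, scan3 is a genuine mismatch).
truth = {"scan1": "PEPTIDE", "scan2": "LESSER", "scan3": "SAMPLEK"}
predicted = {"scan1": "PEPTLDE", "scan2": "LESSER", "scan3": "SIMPLEK"}
accuracy = identification_accuracy(truth, predicted)
print(f"accuracy = {accuracy:.3f}")
```

Computing this figure separately for each simulated condition would give one possible reading of the per-algorithm quality profile mentioned in the abstract.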

Data availability

The proposed database is available at: https://users.cs.fiu.edu/~fsaeed/Benchmark.html

References

Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422(6928):198
Arbelaez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916
Diament BJ, Noble WS (2011) Faster SEQUEST searching for peptide identification from tandem mass spectra. J Proteome Res 10(9):3871–3879
Ebhardt HA, Root A, Sander C, Aebersold R (2015) Applications of targeted proteomics in systems biology and translational medicine. Proteomics 15(18):3193–3208
Elias JE, Gygi SP (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4(3):207–214
Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M (2018) Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Research 7
Gul Awan M, Saeed F (2016) MS-reduce: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing. Bioinformatics 32(10):1518–1526
Gul Awan M, Saeed F (2018) Mass-simulator: a highly configurable simulator for generating ms/ms datasets for benchmarking of proteomics algorithms. Proteomics 18(20):1800206
Iglesias-Gato D, Wikström P, Tyanova S, Lavallee C, Thysell E, Carlsson J, Hägglöf C, Cox J, Andrén O, Stattin P et al (2016) The proteome of primary prostate cancer. Eur Urol 69(5):942–952
Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ (2007) Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 4(11):923
Käll L, Storey JD, MacCoss MJ, Noble WS (2008) Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res 7(01):29–34
Keller A, Nesvizhskii AI, Kolker E, Aebersold R (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by ms/ms and database search. Anal Chem 74(20):5383–5392
Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI (2017) Msfragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 14(5):513
Ma B (2015) Novor: real-time peptide de novo sequencing software. J Am Soc Mass Spectrom 26(11):1885–1894
McIlwain S, Tamura K, Kertesz-Farkas A, Grant CE, Diament B, Frewen B, Howbert JJ, Hoopmann MR, Käll L, Eng JK et al (2014) Crux: rapid open source protein tandem mass spectrometry analysis. J Proteome Res 13(10):4488–4491
Costa PM, Fadeel B (2016) Emerging systems biology approaches in nanotoxicology: towards a mechanism-based understanding of nanomaterial hazard and risk. Toxicol Appl Pharmacol 299:101–111
Saeed F (2015) Big data proteogenomics and high performance computing: challenges and opportunities. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, pp 141–145
Savitski MM, Wilhelm M, Hahne H, Kuster B, Bantscheff M (2015) A scalable approach for protein false discovery rate estimation in large proteomic data sets. Mol Cell Proteom 14(9):2394–2404
Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N, Mendoza L, Moritz RL, Aebersold R, Nesvizhskii AI (2011) iprophet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteom 10(12):M111-007690
Tsai T-H, Song E, Zhu R, Di Poto C, Wang M, Luo Y, Varghese RS, Tadesse MG, Ziada DH, Desai CS et al (2015) LC-MS/MS-based serum proteomics for identification of candidate biomarkers for hepatocellular carcinoma. Proteomics 15(13):2369–2381
Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513–530

Acknowledgements

Research reported in this paper was supported by NIGMS of the National Institutes of Health under award number R01GM134384. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Fahad Saeed was further supported by the National Science Foundation (NSF) under award number NSF CAREER OAC-1925960. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation.

Author information

Authors and Affiliations

Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Muaaz Gul Awan
Al-Khwarizmi Institute of Computer Science (KICS), University of Engineering & Technology (UET), Lahore, Pakistan
Abdullah Gul Awan
School of Computing and Information Sciences, Florida International University, Miami, FL, USA
Fahad Saeed

Contributions

MA devised the method and database, wrote the software for generation of the database and wrote the manuscript. AA performed experiments and analyzed the results. FS proposed the initial idea, designed the experiments, supervised the research, and wrote and edited the manuscript.

Corresponding author

Correspondence to Fahad Saeed.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Code availability

Proposed databases were generated using the MaSS-Simulator software (Gul Awan and Saeed 2018).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1599 KB)

About this article

Cite this article

Awan, M.G., Awan, A.G. & Saeed, F. Benchmarking mass spectrometry based proteomics algorithms using a simulated database. Netw Model Anal Health Inform Bioinforma 10, 23 (2021). https://doi.org/10.1007/s13721-021-00298-3

Received: 21 September 2020
Revised: 26 February 2021
Accepted: 12 March 2021
Published: 26 March 2021
DOI: https://doi.org/10.1007/s13721-021-00298-3
