Plant Proteogenomics: From Protein Extraction to Improved Gene Predictions

  • Protocol
Proteomics for Biomarker Discovery

Abstract

Historically, many genome annotation strategies have lacked experimental evidence at the protein level and have instead relied heavily on ab initio gene prediction tools, which has resulted in many incorrectly annotated genomic sequences. Proteogenomics aims to address these issues by combining mass spectrometry (MS)-based proteomics with genomic mapping and by providing statistical significance measures, such as false discovery rates (FDRs), to validate the mapped peptides. Presented here is a tool capable of meeting this goal, the UCSD proteogenomic pipeline, which maps peptide-spectrum matches (PSMs) to the genome using the Inspect MS/MS database search tool and uses a target-decoy search approach to assign an estimated FDR to the matches. The pipeline also offers a more reliable approach to proteogenomics: it determines the precise false-positive rate (FPR) and p-value of each PSM by calculating its spectral probability, then rescores each PSM accordingly. Beyond the protein prediction challenges posed by the rapidly growing number of sequenced plant genomes, it is difficult to extract high-quality protein samples from many plant species. For that reason, this chapter also contains methods for protein extraction and trypsin digestion that reliably produce samples suitable for proteogenomic analysis.
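The target-decoy idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the UCSD pipeline's code or the Inspect scoring scheme. It assumes the simple estimator FDR ≈ (decoy hits) / (target hits) above a score threshold, which is one common convention among several, and the function names and PSM scores below are hypothetical.

```python
from typing import List, Optional, Tuple

# A PSM is represented here as (score, is_decoy); higher scores are better.
# This representation is an assumption made for illustration only.
PSM = Tuple[float, bool]


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR at a score threshold as decoy hits / target hits."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    # Guard against division by zero when no target PSM passes the threshold.
    return decoys / max(targets, 1)


def threshold_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    # Scan candidate thresholds from the highest score downwards.
    for candidate in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, candidate) <= max_fdr:
            best = candidate
    return best


# Hypothetical scores; True marks a match to the decoy database.
example_psms: List[PSM] = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True), (7.5, False),
    (6.8, False), (6.1, True), (5.9, False), (5.2, True), (4.8, True),
]

if __name__ == "__main__":
    print(estimate_fdr(example_psms, 7.0))
    print(threshold_for_fdr(example_psms, 0.25))
```

Because the decoy count only estimates the number of false target matches, the result is an estimated FDR for the whole set above the threshold. It is not a per-PSM p-value, which is what the spectral probability approach described above provides.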

Acknowledgments

The authors acknowledge funding support from the Australian Research Council and the NSF Grape Research Coordination Network. P.A.H. acknowledges Robert Black for continued support and encouragement.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Chapman, B. et al. (2013). Plant Proteogenomics: From Protein Extraction to Improved Gene Predictions. In: Zhou, M., Veenstra, T. (eds) Proteomics for Biomarker Discovery. Methods in Molecular Biology, vol 1002. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-360-2_21

  • DOI: https://doi.org/10.1007/978-1-62703-360-2_21

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-359-6

  • Online ISBN: 978-1-62703-360-2

  • eBook Packages: Springer Protocols
