Quality control of gene predictions

Modern Genome Annotation

Abstract

A recent study systematically compared the performance of various computational methods for predicting human protein-coding genes (Guigó et al. 2006). In this study, a set of well-annotated ENCODE sequences was analyzed blind with different gene-finding programs, and the resulting predictions were compared with the annotations. Predictions were evaluated at the nucleotide, exon, transcript and gene levels to determine how well they reproduced the annotation. The study revealed that none of the strategies produced perfect predictions, but methods that rely on mRNA and protein sequences, and those that combine several sources of information (including expressed sequence information), were generally the most accurate. The dual- or multiple-genome methods were less accurate, although they performed better than the single-genome ab initio prediction methods. Importantly, at the nucleotide level no prediction method correctly identified more than ∼90% of nucleotides, and at the transcript level (the most stringent criterion) no prediction method correctly identified more than 45% of the coding transcripts.
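
To illustrate why the transcript level is the most stringent of these criteria, the following minimal sketch scores one predicted transcript against one annotated transcript at the nucleotide, exon and transcript levels. It is an illustration under stated assumptions, not the evaluation code used in the cited study: the function names, the 1-based inclusive coordinates and the toy exon coordinates are all hypothetical, and the two measures follow a common gene-finding convention (sensitivity = TP / annotated items; "specificity" = TP / predicted items, i.e. precision).

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an exon is (start, end), 1-based inclusive.
Exon = Tuple[int, int]
Transcript = Tuple[Exon, ...]


def covered_positions(exons: Iterable[Exon]) -> Set[int]:
    """Return every nucleotide position covered by at least one exon."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def sn_sp(annotated: set, predicted: set) -> Tuple[float, float]:
    """Sensitivity (TP / annotated) and precision-style specificity (TP / predicted)."""
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


def nucleotide_level(annotated: Transcript, predicted: Transcript) -> Tuple[float, float]:
    """Compare the sets of covered nucleotide positions."""
    return sn_sp(covered_positions(annotated), covered_positions(predicted))


def exon_level(annotated: Transcript, predicted: Transcript) -> Tuple[float, float]:
    """An exon counts as correct only if both of its boundaries match exactly."""
    return sn_sp(set(annotated), set(predicted))


def transcript_level(annotated: Iterable[Transcript],
                     predicted: Iterable[Transcript]) -> Tuple[float, float]:
    """A transcript counts as correct only if its whole exon chain matches."""
    return sn_sp(set(annotated), set(predicted))


# Toy data (invented for illustration): the second predicted exon starts
# 10 nucleotides too late.
annotated_tx: Transcript = ((1, 100), (201, 300))
predicted_tx: Transcript = ((1, 100), (211, 300))

print("nucleotide:", nucleotide_level(annotated_tx, predicted_tx))
print("exon:      ", exon_level(annotated_tx, predicted_tx))
print("transcript:", transcript_level([annotated_tx], [predicted_tx]))
```

In this toy case a single misplaced exon boundary leaves nucleotide-level sensitivity at 0.95, halves exon-level sensitivity to 0.5 and reduces transcript-level sensitivity to 0, which mirrors the gap between the nucleotide-level and transcript-level figures quoted above.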

References

  • Bendtsen J, Jensen L, Blom N, Von Heijne G, Brunak S (2004) Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Design Selection 17: 349–356

  • Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247–D251

  • Gnomon description (2003) http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.html

  • Guigó R, Flicek P, Abril J, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic V, Birney E, Castelo R, Eyras E, Ucla C, Gingeras T, Harrow J, Hubbard T, Lewis S, Reese M (2006) EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 7(Suppl 1): S2.1–S2.31

  • Hubbard T, Aken B, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer S, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Overduin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez X, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610–D617

  • Letunic I, Copley R, Schmidt S, Ciccarelli F, Doerks T, Schultz J, Ponting C, Bork P (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res 32: D142–D144

  • Mott R, Schultz J, Bork P, Ponting C (2002) Predicting protein cellular localization using a domain projection method. Genome Res 12: 1168–1174

  • Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Banyai L, Patthy L (2007) MisPred Database for mispredicted and abnormal proteins. http://mispred.enzim.hu/index.html

  • Tordai H, Nagy A, Farkas K, Bányai L, Patthy L (2005) Modules, multidomain proteins and organismic complexity. FEBS J 272: 5064–5078

  • Tress M, Martelli P, Frankish A, Reeves G, Wesselink J, Yeats C, Olason P, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski R, López G, Sadowski M, Watson J, Fariselli P, Rossi I, Nagy A, Kai W, Størling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramírez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis S, Reymond A, Birney E, Brunak S, Casadio R, Guigó R, Harrow J, Hermjakob H, Jones D, Lengauer T, Orengo C, Patthy L, Thornton J, Tramontano A, Valencia A (2007) The implications of alternative splicing in the ENCODE protein complement. P Natl Acad Sci USA 104: 5495–5500

  • Unneberg P, Claverie J (2007) Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data. PLoS ONE 2: e254

  • Wheelan S, Marchler-Bauer A, Bryant S (2000) Domain size distributions can predict domain boundaries. Bioinformatics 16: 613–618

  • Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR (2007) Long-term trends in evolution of indels in protein sequences. BMC Evol Biol 7: 19

Correspondence to L. Patthy.

Copyright information

© 2008 Springer-Verlag/Wien

About this chapter

Cite this chapter

Nagy, A. et al. (2008). Quality control of gene predictions. In: Frishman, D., Valencia, A. (eds) Modern Genome Annotation. Springer, Vienna. https://doi.org/10.1007/978-3-211-75123-7_3
