Quality control of gene predictions

Nagy, A.; Hegyi, H.; Farkas, K.; Tordai, H.; Kozma, E.; Bányai, L.; Patthy, L.

doi:10.1007/978-3-211-75123-7_3

Abstract

A recent study systematically compared the performance of various computational methods for predicting human protein-coding genes (Guigó et al. 2006). In that study, a set of well-annotated ENCODE sequences was blind-analyzed with different gene-finding programs, and the resulting predictions were compared with the annotations. Predictions were analyzed at the nucleotide, exon, transcript and gene levels to evaluate how well they reproduced the annotation. The study revealed that none of the strategies produced perfect predictions, but methods that rely on mRNA and protein sequences and those that use combined information (including expressed sequence information) were generally the most accurate. The dual- or multiple-genome methods were less accurate, although they performed better than the single-genome ab initio prediction methods. Importantly, at the nucleotide level no prediction method correctly identified more than ∼90% of nucleotides, and at the transcript level (the most stringent criterion) no prediction method correctly identified more than 45% of the coding transcripts.
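To make the difference between evaluation levels concrete, the following is a minimal sketch in Python, not the evaluation code used by Guigó et al. (2006). It assumes that exons are given as 1-based inclusive (start, end) pairs on a single strand, that sensitivity is the fraction of annotated features recovered, and that "specificity" follows the usual gene-finding convention of the fraction of predicted features that are annotated. All function names and the example coordinates are hypothetical.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; an assumed convention


def covered_positions(exons: List[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by the exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def nucleotide_accuracy(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Nucleotide level: score every coding position individually."""
    ann = covered_positions(annotated)
    pred = covered_positions(predicted)
    shared = len(ann & pred)
    return ratio(shared, len(ann)), ratio(shared, len(pred))


def exon_accuracy(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Exon level: an exon counts only if both boundaries match exactly."""
    ann, pred = set(annotated), set(predicted)
    exact = len(ann & pred)
    return ratio(exact, len(ann)), ratio(exact, len(pred))


def transcript_correct(annotated: List[Exon], predicted: List[Exon]) -> bool:
    """Transcript level: every exon of the transcript must match exactly."""
    return sorted(annotated) == sorted(predicted)


# Hypothetical two-exon transcript; the prediction misplaces one splice site.
annotated = [(1, 100), (201, 300)]
predicted = [(1, 100), (211, 300)]

print(nucleotide_accuracy(annotated, predicted))
print(exon_accuracy(annotated, predicted))
print(transcript_correct(annotated, predicted))
```

In this toy case a single misplaced exon boundary costs only 10 of 200 annotated nucleotides, yet it halves the exon-level scores and makes the transcript incorrect, which illustrates why the transcript level is the most stringent of the criteria.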

References

Bendtsen J, Jensen L, Blom N, Von Heijne G, Brunak S (2004) Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Design Selection 17: 349–356
Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247–D251
Gnomon description (2003) http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.html
Guigó R, Flicek P, Abril J, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic V, Birney E, Castelo R, Eyras E, Ucla C, Gingeras T, Harrow J, Hubbard T, Lewis S, Reese M (2006) EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 7(Suppl 1): S2.1–S3.1
Hubbard T, Aken B, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer S, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Overduin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez X, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610–D617
Letunic I, Copley R, Schmidt S, Ciccarelli F, Doerks T, Schultz J, Ponting C, Bork P (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res 32: D142–D144
Mott R, Schultz J, Bork P, Ponting C (2002) Predicting protein cellular localization using a domain projection method. Genome Res 12: 1168–1174
Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Bányai L, Patthy L (2007) MisPred Database for mispredicted and abnormal proteins. http://mispred.enzim.hu/index.html
Tordai H, Nagy A, Farkas K, Bányai L, Patthy L (2005) Modules, multidomain proteins and organismic complexity. FEBS J 272: 5064–5078
Tress M, Martelli P, Frankish A, Reeves G, Wesselink J, Yeats C, Olason P, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski R, López G, Sadowski M, Watson J, Fariselli P, Rossi I, Nagy A, Kai W, Størling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramírez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis S, Reymond A, Birney E, Brunak S, Casadio R, Guigo R, Harrow J, Hermjakob H, Jones D, Lengauer T, Orengo C, Patthy L, Thornton J, Tramontano A, Valencia A (2007) The implications of alternative splicing in the ENCODE protein complement. P Natl Acad Sci USA 104: 5495–5500
Unneberg P, Claverie J (2007) Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data. PLoS ONE 2: e254
Wheelan S, Marchler-Bauer A, Bryant S (2000) Domain size distributions can predict domain boundaries. Bioinformatics 16: 613–618
Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR (2007) Long-term trends in evolution of indels in protein sequences. BMC Evol Biol 7: 19

Author information

Authors and Affiliations

Biological Research Center, Hungarian Academy of Sciences, Institute of Enzymology, Budapest, Hungary
A. Nagy, H. Hegyi, K. Farkas, H. Tordai, E. Kozma, L. Bányai & L. Patthy

Corresponding author

Correspondence to L. Patthy.

Editor information

Editors and Affiliations

Wissenschaftszentrum Weihenstephan, TU München, Freising, Germany
Dmitrij Frishman
Structural and Computational Programme, Spanish National Cancer Research Centre, Madrid, Spain
Alfonso Valencia

About this chapter

Cite this chapter

Nagy, A. et al. (2008). Quality control of gene predictions. In: Frishman, D., Valencia, A. (eds) Modern Genome Annotation. Springer, Vienna. https://doi.org/10.1007/978-3-211-75123-7_3

DOI: https://doi.org/10.1007/978-3-211-75123-7_3
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-75122-0
Online ISBN: 978-3-211-75123-7
eBook Packages: Biomedical and Life Sciences (R0)
