Abstract
Correct modeling of protein-coding genes based on genome and cDNA data is a prerequisite for functional studies. Various programs, such as MAKER, Cufflinks, Oases, and Trinity, have been developed for this purpose, each with its own advantages and drawbacks. Manually integrating the different models for a single gene is cumbersome, and it becomes a daunting task for the 14,000–18,000 genes in a typical holometabolous insect. We developed methods to evaluate the output of MAKER, Cufflinks, Oases, and Trinity and to select the best models to constitute the MCOT1.0 set for Manduca sexta, a biochemical model insect. To apply these methods to other organisms, we improved the algorithm (designated MCuNovo Gene Selector) and automated the data processing. In this chapter, we describe the background of the algorithm's development and how to prepare and run the program.
Acknowledgments
This study is supported by NIH grants GM58634 and AI112662. This work was approved for publication by the Director of Oklahoma Agricultural Experimental Station and supported in part under project OKLO2450.
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Cao, X., Jiang, H. (2019). Integrated Modeling of Structural Genes Using MCuNovo. In: Brown, S., Pfrender, M. (eds) Insect Genomics. Methods in Molecular Biology, vol 1858. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8775-7_5
DOI: https://doi.org/10.1007/978-1-4939-8775-7_5
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8774-0
Online ISBN: 978-1-4939-8775-7
eBook Packages: Springer Protocols