Abstract
Bacillus subtilis and Escherichia coli are widely used microbial species of great significance for studying microbial community relationships, adaptive evolution in various niches, the engineering of cell factories that produce specific products, and the design of reduced genomes. Pan-genome analysis is an effective method for studying the characteristics and functions of genes among and within species. Many research directions and conclusions depend on accurate gene identification and reliable pan-genome results. However, few studies currently show how to achieve high-quality pan-genome results between or within particular species. This chapter takes Bacillus subtilis as an example to introduce a stepwise approach that improves the quality of the pan-genome by gradually removing confounding strains, ultimately yielding a reliable, high-quality pan-genome landscape of Bacillus subtilis. The approach can serve as a quality control protocol in a pan-genome analysis pipeline. Finally, we suggest further improving the pan-genome analysis results of Escherichia coli to demonstrate the feasibility and credibility of the quality control protocol for obtaining a high-quality pan-genome landscape.
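The idea of removing confounding strains step by step can be illustrated with a minimal sketch. The abstract does not state how confounding strains are identified, so the criterion below is purely an assumption made for illustration: a strain is dropped when its removal enlarges the core genome by at least `min_gain` gene families. The strain names, gene-family labels, function names, and the `min_gain` parameter are all hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan): families shared by all strains, and families seen in any strain."""
    families = list(genomes.values())
    if not families:
        return set(), set()
    return set.intersection(*families), set.union(*families)


def remove_confounding_strains(
    genomes: Dict[str, Set[str]], min_gain: int
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Greedily drop the strain whose removal enlarges the core genome the most.

    Assumed criterion (not the chapter's protocol): stop as soon as the best
    achievable gain in core size falls below min_gain.
    """
    kept = dict(genomes)
    removed: List[str] = []
    while len(kept) > 2:
        core_size = len(core_and_pan(kept)[0])
        best_name, best_gain = None, 0
        for name in sorted(kept):  # sorted for a deterministic tie-break
            rest = {k: v for k, v in kept.items() if k != name}
            gain = len(core_and_pan(rest)[0]) - core_size
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None or best_gain < min_gain:
            break
        removed.append(best_name)
        del kept[best_name]
    return kept, removed


# Toy presence/absence data: each strain maps to its set of gene families.
toy_genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3"},
    "strain_X": {"g1", "g9"},  # hypothetical confounding strain
}

kept, removed = remove_confounding_strains(toy_genomes, min_gain=2)
core, pan = core_and_pan(kept)
print("removed:", removed)
print("core size:", len(core), "pan size:", len(pan))
```

In this toy data set, the single divergent strain shrinks the core genome to one family, and dropping it restores the three families shared by the remaining strains. A real pipeline would more plausibly rely on independent evidence such as assembly quality or taxonomic placement rather than on core-genome size alone.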
Acknowledgments
The authors would like to thank Prof. Chun-Ting Zhang for invaluable assistance and inspiring discussions. The present work was supported in part by the National Key Research and Development Program of China (grant number 2018YFA0903700) and the National Natural Science Foundation of China (grant numbers 21621004 and 31571358).
Copyright information
© 2022 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Wu, H., Yang, ZK., Yang, T., Wang, D., Luo, H., Gao, F. (2022). An Effective Preprocessing Method for High-Quality Pan-Genome Analysis of Bacillus subtilis and Escherichia coli. In: Zhang, R. (eds) Essential Genes and Genomes. Methods in Molecular Biology, vol 2377. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1720-5_21
DOI: https://doi.org/10.1007/978-1-0716-1720-5_21
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1719-9
Online ISBN: 978-1-0716-1720-5
eBook Packages: Springer Protocols