An Effective Preprocessing Method for High-Quality Pan-Genome Analysis of Bacillus subtilis and Escherichia coli

  • Protocol
Essential Genes and Genomes

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2377))

Abstract

Bacillus subtilis and Escherichia coli are widely used microbial species of great significance for studying microbial community relationships and adaptive evolution in various niches, for engineering cell factories that produce specific products, and for designing reduced genomes. Pan-genome analysis is an effective method for studying the characteristics and functions of genes among and within species. Many research directions and conclusions depend on accurate gene identification and reliable pan-genome results. However, few studies currently show how to achieve high-quality pan-genome results between or within particular species. This chapter takes Bacillus subtilis as an example to introduce a stepwise approach that improves the quality of the pan-genome by progressively removing confounding strains, ultimately yielding a reliable, high-quality pan-genome landscape of Bacillus subtilis. The approach can serve as a quality control protocol in a pan-genome analysis pipeline. Finally, we suggest further improving the pan-genome analysis results of Escherichia coli to demonstrate the feasibility and credibility of the quality control protocol for obtaining a high-quality pan-genome landscape.
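To illustrate why confounding strains matter, the following is a minimal sketch rather than the chapter's actual procedure. It assumes each strain is represented as a set of gene-family identifiers; the strain names, gene identifiers, and helper functions are all hypothetical. The abstract does not state the criteria for flagging a strain as confounding, so the sketch only shows the effect of removing an already-flagged strain on the core genome (gene families shared by every strain) and the pan-genome (gene families found in any strain).

```python
from typing import Dict, Set


def core_genome(strains: Dict[str, Set[str]]) -> Set[str]:
    """Return the gene families present in every strain."""
    gene_sets = list(strains.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def pan_genome(strains: Dict[str, Set[str]]) -> Set[str]:
    """Return the gene families present in at least one strain."""
    return set().union(*strains.values())


def remove_strains(strains: Dict[str, Set[str]], to_drop: Set[str]) -> Dict[str, Set[str]]:
    """Return a copy of the strain table without the named strains."""
    return {name: genes for name, genes in strains.items() if name not in to_drop}


# Hypothetical toy data: three consistent strains and one outlier,
# standing in for a misclassified or poorly annotated genome.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3"},
    "outlier_X": {"g1", "g6", "g7"},
}

core_before = core_genome(strains)

# Assumption: "outlier_X" was flagged by some upstream quality check.
filtered = remove_strains(strains, {"outlier_X"})
core_after = core_genome(filtered)
pan_after = pan_genome(filtered)

print("core size before:", len(core_before))
print("core size after: ", len(core_after))
print("pan size after:  ", len(pan_after))
```

Because the core genome is an intersection, a single aberrant strain can shrink it sharply, while the pan-genome, being a union, is inflated by that strain's unique genes. Removing the outlier in this toy case restores the shared genes to the core and drops the spurious ones from the pan-genome.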

Acknowledgments

The authors would like to thank Prof. Chun-Ting Zhang for the invaluable assistance and inspiring discussions. The present work was supported in part by the National Key Research and Development Program of China (grant number 2018YFA0903700) and the National Natural Science Foundation of China (grant numbers 21621004 and 31571358).

Author information

Corresponding author

Correspondence to Feng Gao.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Wu, H., Yang, ZK., Yang, T., Wang, D., Luo, H., Gao, F. (2022). An Effective Preprocessing Method for High-Quality Pan-Genome Analysis of Bacillus subtilis and Escherichia coli. In: Zhang, R. (eds) Essential Genes and Genomes. Methods in Molecular Biology, vol 2377. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1720-5_21

  • DOI: https://doi.org/10.1007/978-1-0716-1720-5_21

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1719-9

  • Online ISBN: 978-1-0716-1720-5

  • eBook Packages: Springer Protocols
