Review
Single nucleotide polymorphisms in Mycobacterium tuberculosis and the need for a curated database
Section snippets
Why are SNPs important for our understanding of TB?
The declaration of tuberculosis (TB) as a global public health emergency in 1993 [1] led to renewed efforts to study the biology of the Mycobacterium tuberculosis complex (MTBC). For many years, the main research focus was on individual genes and proteins, but the generation of the first M. tuberculosis genome sequence in 1998 [2] opened the door to more comprehensive approaches. In particular, comparative genomics studies have helped us gain a better insight into the genetic diversity and
What are SNPs and how many do we observe?
SNPs are the most common form of genetic variation in MTBC, after insertions and deletions (InDels). A total of 9037 SNPs were discovered by sequencing 21 clinical strains of MTBC [5]. Generally, SNPs represent single nucleotide differences between at least two DNA sequences. The term “SNP” is often used interchangeably with “mutation”, “polymorphism” or “substitution”. Strictly speaking, a change in a single base pair is generally referred to as a (point) mutation, and happens through errors
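The definition above (a single nucleotide difference between at least two DNA sequences) can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to the same length; the function name `find_snps` and the toy sequences are hypothetical and are not taken from the review.

```python
def find_snps(reference: str, query: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, query base) for each mismatch."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    snps = []
    pairs = zip(reference.upper(), query.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        # Gaps ("-") and ambiguous bases ("N") are skipped: they indicate
        # InDels or uncalled sites rather than single nucleotide differences.
        if ref_base in valid and alt_base in valid and ref_base != alt_base:
            snps.append((position, ref_base, alt_base))
    return snps


# Toy aligned sequences (illustrative only, not real MTBC data).
reference = "ATGGCCGTAA"
strain = "ATGACCGTGA"
snps = find_snps(reference, strain)
print(snps)  # [(4, 'G', 'A'), (9, 'A', 'G')]
```

Real SNP calling from sequencing reads involves mapping, quality filtering and coverage thresholds, none of which this sketch attempts.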
SNPs are phylogenetically informative in MTBC
The comparatively low frequency of SNPs and limited ongoing horizontal gene transfer in MTBC result in low levels of homoplasy (i.e. the independent occurrence of the same SNP in phylogenetically unrelated strains) [5, 6]. Hence, SNPs represent robust markers for inferring phylogenies and for strain classification [12]. SNPs can also be used to measure evolutionary distances between strains, i.e. to estimate the time of divergence of strains from their genetic distance, if a mutation rate is known [26].
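As a rough illustration of estimating divergence time from genetic distance, the sketch below assumes a strict molecular clock and that both strains accumulate SNPs at the same rate since their common ancestor, so that the time is T = d / (2r) for d pairwise SNP differences and a rate r in SNPs per genome per year. The function name and the example rate are hypothetical placeholders, not values taken from the review.

```python
def divergence_time_years(snp_distance: int, rate_per_genome_per_year: float) -> float:
    """Estimate years since two strains shared a common ancestor.

    Assumes a strict molecular clock: both lineages accumulate SNPs at the
    same constant rate, so the pairwise distance grows at twice that rate.
    """
    if snp_distance < 0:
        raise ValueError("SNP distance cannot be negative")
    if rate_per_genome_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return snp_distance / (2.0 * rate_per_genome_per_year)


# Illustrative numbers only: 10 SNPs apart, assumed rate of 0.5 SNPs/genome/year.
estimate = divergence_time_years(snp_distance=10, rate_per_genome_per_year=0.5)
print(estimate)  # 10.0
```

Such point estimates are only as good as the assumed rate; in practice the uncertainty in the rate and departures from clock-like evolution would need to be modelled.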
The functional consequences of SNPs
In addition to being useful phylogenetic markers, SNPs carry functional information. The best-characterized “SNPs” in MTBC are drug resistance-conferring mutations. Drug resistance in MTBC is largely caused by single nucleotide mutations [61, 62, 63, 64]. Many drug resistance-conferring mutations have been identified and are publicly available in the TBDReaMDB database [65], which currently contains information on 1447 mutations relevant for most anti-TB drugs (Table 1). This kind of molecular
How do we discover new SNPs in MTBC?
In the coming years, we expect whole genome sequencing to at least partially replace all previous genotyping methods for MTBC. So far, large-scale DNA sequencing projects have usually been performed by specialized sequencing centres, but new benchtop sequencing devices increasingly allow for “do-it-yourself” approaches in the standard laboratory [80]. In this section, we elaborate on some of the technical aspects of NGS genome analysis, with a particular focus on the workflow during the
The need for a new SNP database for MTBC
So far, most SNP data in MTBC have been computed and stored on local workstations, and only made available upon publication. For the raw NGS reads, NCBI, EBI and DDBJ have created Sequence Read Archives (SRA) where these data can be deposited (http://www.ncbi.nlm.nih.gov/sra, http://www.ebi.ac.uk/ena/home, http://trace.ddbj.nig.ac.jp/dra/index_e.shtml). These archives contain the raw sequencing reads in SRA format, which can be downloaded and converted to FastQ files (//www.ncbi.nlm.nih.gov/books/NBK47537/
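Once reads have been converted to FastQ, they can be read with a few lines of code. The following is a minimal sketch that assumes the common four-line record layout and Phred+33 quality encoding; the function name `read_fastq` and the two example reads are hypothetical, and production pipelines would normally rely on an established parsing library instead.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, list[int]]]:
    """Yield (read name, sequence, Phred quality scores) from a FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FastQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


# Two made-up reads, used only to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for name, sequence, scores in records:
    print(name, sequence, scores)
```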
Features of a new MTBC SNP database
Given the existing features of TBDB (discussed above), this database represents an ideal starting point for an extended SNP database for MTBC (Box 1). TBDB already includes important aspects such as the relational tables and annotations. Unique and highly valuable modules such as the phylogenetic context could be extended to deal with larger numbers of taxa (strains) and characters (SNPs). So far, SNPs in TBDB are identified based on their position in the reference genome, but with larger
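To make the idea of relational tables with position-based SNP identifiers more concrete, the sketch below builds a small in-memory SQLite database. All table names, column names, strain names and positions are hypothetical and do not reflect the actual TBDB schema; the point is only to show how strains, SNPs keyed by reference position, and their associations could be linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE strain (
    strain_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    lineage   TEXT
);
CREATE TABLE snp (
    snp_id       INTEGER PRIMARY KEY,
    ref_position INTEGER NOT NULL,  -- coordinate in the reference genome
    ref_base     TEXT NOT NULL,
    alt_base     TEXT NOT NULL,
    UNIQUE (ref_position, alt_base)
);
CREATE TABLE genotype (
    strain_id INTEGER NOT NULL REFERENCES strain(strain_id),
    snp_id    INTEGER NOT NULL REFERENCES snp(snp_id),
    PRIMARY KEY (strain_id, snp_id)
);
""")

# Made-up example rows; IDs are assigned 1, 2, 3 in insertion order.
conn.executemany(
    "INSERT INTO strain (name, lineage) VALUES (?, ?)",
    [("strain_A", "lineage_X"), ("strain_B", "lineage_X"), ("strain_C", "lineage_Y")],
)
conn.executemany(
    "INSERT INTO snp (ref_position, ref_base, alt_base) VALUES (?, ?, ?)",
    [(1000, "G", "A"), (2500, "C", "T")],
)
conn.executemany(
    "INSERT INTO genotype (strain_id, snp_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2)],
)


def strains_with_snp(connection: sqlite3.Connection, position: int) -> list[str]:
    """Return the names of all strains carrying a SNP at the given position."""
    rows = connection.execute(
        """
        SELECT strain.name
        FROM genotype
        JOIN strain USING (strain_id)
        JOIN snp USING (snp_id)
        WHERE snp.ref_position = ?
        ORDER BY strain.name
        """,
        (position,),
    ).fetchall()
    return [name for (name,) in rows]


carriers = strains_with_snp(conn, 1000)
print(carriers)  # ['strain_A', 'strain_B']
```

Keeping genotypes in a separate association table means that adding more strains or SNPs adds rows rather than columns.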
Conclusions
NGS studies of MTBC clinical isolates are discovering thousands of SNPs. Studying the functional effects of these SNPs and their association with phylogenetic clades should become an increasing part of the research portfolio. The MTBC consists of a diverse population of strains, and this diversity should be considered when developing new tools and strategies to combat TB. A new, extended, and well-curated database is necessary to accommodate these rapidly accumulating SNP data in a user-friendly and
Acknowledgements
We thank Mireia Coscollá and Iñaki Comas as well as the other members of our group for the inspiring discussions and comments on the manuscript. The work in our laboratory is supported by the Swiss National Science Foundation (grant number PP0033-119205) and the National Institutes of Health (AI090928 and HSN266200700022C).
References (103)
- et al. Tuberculosis: global approaches to a global disease. Curr Opin Biotechnol (2010)
- et al. Does M. tuberculosis genomic diversity explain disease diversity? Drug Discov Today Dis Mech (2010)
- et al. Global phylogeography of Mycobacterium tuberculosis and implications for tuberculosis product development. Lancet Infect Dis (2007)
- et al. Computational tools to study and understand the intricate biology of mycobacteria. Tuberc (Edinb) (2011)
- et al. Resolving lineage assignation on Mycobacterium tuberculosis clinical isolates classified by spoligotyping with a new high-throughput 3R SNPs based method. Infect Genet Evol (2010)
- et al. Possible underlying mechanisms for successful emergence of the Mycobacterium tuberculosis Beijing genotype strains. Lancet Infect Dis (2010)
- et al. Changing Mycobacterium tuberculosis population highlights clade-specific pathogenic characteristics. Tuberc (Edinb) (2009)
- et al. Scanning of genetic diversity of evolutionarily sequential Mycobacterium tuberculosis Beijing family strains based on genome wide analysis. Infect Genet Evol (2012)
- Genetics of drug resistance in tuberculosis. Clin Chest Med (1997)
- et al. Molecular genetic basis of antimicrobial agent resistance in Mycobacterium tuberculosis: 1998 update. Tuber Lung Dis (1998)
- Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol
- Genetic engineering of Mycobacterium tuberculosis: a review. Tuberculosis
- Strain diversity, epistasis and the evolution of drug resistance in Mycobacterium tuberculosis. Clin Microbiol Infect
- Next-generation DNA sequencing techniques. N Biotechnol
- MTCID: a database of genetic polymorphisms in clinical isolates of Mycobacterium tuberculosis. Tuberc (Edinb)
- Global tuberculosis control
- Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature
- A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A
- Genomic deletions suggest a phylogeny for the Mycobacterium tuberculosis complex. J Infect Dis
- Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat Genet
- High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol
- The genome of Mycobacterium africanum West African 2 reveals a lineage-specific locus and genome erosion common to the M. tuberculosis complex. PLoS Negl Trop Dis
- A role for systems epidemiology in tuberculosis research. Trends Microbiol
- What is systems biology? Front Physiol
- The past and future of tuberculosis research. PLoS Pathog
- The case for cloud computing in genome informatics. Genome Biol
- Principles of population genetics
- Novel genetic polymorphisms that further delineate the phylogeny of the Mycobacterium tuberculosis complex. J Bacteriol
- After the bottleneck: genome-wide diversification of the Mycobacterium tuberculosis complex by mutation, recombination, and natural selection. Genome Res
- Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet
- Sequence-based analysis uncovers an abundance of non-coding RNA in the total transcriptome of Mycobacterium tuberculosis. PLoS Pathog
- The complete genome of an individual by massively parallel DNA sequencing. Nature
- Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol
- Bacterial genetic signatures of human social phenomena among M. tuberculosis from an aboriginal Canadian population. Mol Biol Evol
- Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat Genet
- SNP genotyping: technologies and biomedical applications. Annu Rev Biomed Eng
- SNP and mutation analysis. Adv Exp Med Biol
- Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proc Natl Acad Sci U S A
- Modeling bacterial evolution with comparative-genome-based marker systems: application to Mycobacterium tuberculosis evolution and pathogenesis. J Bacteriol
- Myths and misconceptions: the origin and evolution of Mycobacterium tuberculosis. Nat Rev Microbiol
- Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A
- Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol
- The complete genome sequence of Mycobacterium bovis. Proc Natl Acad Sci U S A
- Silent nucleotide polymorphisms and a phylogeny for Mycobacterium tuberculosis. Emerg Infect Dis
- Evolution and diversity of clonal bacteria: the paradigm of Mycobacterium tuberculosis. PLoS ONE
- Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A
- Global phylogeny of Mycobacterium tuberculosis based on single nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J Bacteriol