Toxin-antitoxin system gene mutations driving Mycobacterium tuberculosis transmission revealed by whole genome sequencing

Background The toxin-antitoxin (TA) system plays a vital role in the virulence and pathogenicity of Mycobacterium tuberculosis (M. tuberculosis). However, the regulatory mechanisms and the impact of gene mutations on M. tuberculosis transmission remain poorly understood. Objective To investigate the influence of gene mutations in the toxin-antitoxin system on M. tuberculosis transmission dynamics. Method We performed whole-genome sequencing on the analyzed strains of M. tuberculosis. The genes associated with the toxin-antitoxin system were obtained from the National Center for Biotechnology Information (NCBI) Gene database. Mutations correlating with enhanced transmission within the genes were identified by using random forest, gradient boosting decision tree, and generalized linear mixed models. Results A total of 13,518 M. tuberculosis isolates were analyzed, with 42.29% (n = 5,717) found to be part of genomic clusters. Lineage 4 accounted for the majority of isolates (n = 6488, 48%), followed by lineage 2 (n = 5133, 37.97%). 23 single nucleotide polymorphisms (SNPs) showed a positive correlation with clustering, including vapB1 G34A, vapB24 A76C, vapB2 T171C, mazF2 C85T, mazE2 G104A, vapB31 T112C, relB T226A, vapB11 C54T, mazE5 T344C, vapB14 A29G, parE1 (C103T, C88T), and parD1 C134T. Six SNPs, including vapB6 A29C, vapB31 T112C, parD1 C134T, vapB37 G205C, Rv2653c A80C, and vapB22 C167T, were associated with transmission clades across different countries. Notably, our findings highlighted the positive association of vapB6 A29C, vapB31 T112C, parD1 C134T, vapB37 G205C, vapB19 C188T, and Rv2653c A80C with transmission clades across diverse regions. Furthermore, our analysis identified 32 SNPs that exhibited significant associations with clade size. Conclusion Our study presents potential associations between mutations in genes related to the toxin-antitoxin system and the transmission dynamics of M. tuberculosis. However, it is important to acknowledge the presence of confounding factors and limitations in our study. Further research is required to establish causation and assess the functional significance of these mutations. These findings provide a foundation for future investigations and the formulation of strategies aimed at controlling TB transmission.


Introduction
Tuberculosis (TB) is a global health threat caused by the highly successful human pathogen Mycobacterium tuberculosis (M.tuberculosis).According to a report by the World Health Organization (WHO), an estimated 10.6 million new TB cases occurred worldwide in 2022, resulting in over 1.3 million deaths (World Health Organization, 2023).Despite the substantial global burden of TB, our knowledge regarding the factors influencing its transmission remains limited.Therefore, it is imperative to delve deeper into the mechanisms underlying the spread of M. tuberculosis.
The toxin-antitoxin (TA) system plays a critical biological role in M. tuberculosis.Composed of toxins and antitoxins, this system forms a small genetic unit that is widely present in prokaryotes (Schuster and Bertram, 2013;Dai et al., 2022).TA systems have been shown to assist cells in stress adaptation, antibiotic resistance, biofilm formation, persisters, and disease development.Toxins are typically translated into proteins, while antitoxins can be either proteins or RNA (Ogura and Hiraga, 1983;Aizenman et al., 1996;Magnuson, 2007;Fineran et al., 2009;Wang and Wood, 2011;Lobato-Márquez et al., 2016).Based on the nature of antitoxins and the mechanisms which inhibit toxin activity, TA modules can be classified into six distinct types (Page and Peti, 2016).Among these types, type II TA systems are wellcharacterized, where antitoxins directly interact with toxins to neutralize their effects.Bioinformatics and phylogenetic analyses have revealed the presence of numerous TA systems encoded in the M. tuberculosis genome.The retention of these TA systems in members of the M. tuberculosis complex suggests their crucial role in regulating metabolic pathways essential for bacterial pathogenesis.Type II TA systems predominate in M. tuberculosis.The abundance of TA loci in the M. tuberculosis genome raises important questions about their functional diversity (Ramage et al., 2009;Tandon et al., 2019).Previous studies have extensively investigated the various functions of TA systems in M. tuberculosis and their potential impact on pathogenic mechanisms (Schippers et al., 2005;Guo et al., 2016).These systems are believed to play a key role in M. tuberculosis 's response to stressors such as nutrient starvation and antibiotic treatment, promoting its survival and drug resistance (Kim et al., 2018).Additionally, TA systems are associated with the formation of persistent cells, a subpopulation exhibiting drug tolerance that plays a crucial role in establishing chronic infections in M. tuberculosis (Merfa et al., 2016).While the importance of toxin-antitoxin systems in M. tuberculosis has been acknowledged, our understanding of their specific mechanisms and functions within this bacterium remains limited.Therefore, comprehensive research is required to explore the roles of TA systems and gain deeper insights into the complex biology of M. tuberculosis.
Driven by the need to better understand the mechanisms underlying M. tuberculosis transmission, we conducted an extensive study investigating the impact of mutations in TA system genes on its spread.Our research aims to elucidate how genetic variations within this system can influence M. tuberculosis strain transmission dynamics.Utilizing whole-genome sequencing (WGS), we analyzed the genetic variations present in M. tuberculosis isolates at a highresolution level.This enabled us to identify specific mutations within the TA system genes that may be associated with M. tuberculosis transmission.Advanced statistical and bioinformatics techniques, including random forest, gradient boosting decision tree, and generalized linear mixed models, were employed for comprehensive analyses to identify key genetic variants linked to transmission dynamics.We acknowledge challenges posed by confounding factors and population dynamics in our analysis.Future research should incorporate social networks and regional interactions for a more comprehensive understanding.Limitations of our study include a focus on gene analysis, potentially overlooking other important genetic influences such as drug resistance mutations or virulence determinants.Therefore, more comprehensive studies are needed to address these limitations adequately.Our study has yielded significant results, identifying multiple single nucleotide polymorphisms (SNPs) within the toxin-antitoxin system genes that positively correlate with clustering, suggesting their potential role in M. tuberculosis transmission.Furthermore, some of these SNPs were found to be associated with transmission clades across different geographical regions, indicating their potential global impact on the spread of M. tuberculosis.These findings provide valuable insights into the transmission dynamics of this pathogen and contribute to a more thorough understanding of M. tuberculosis transmission.

Sample collection
We collected a total of 1,550 samples from patients with culturepositive pulmonary tuberculosis at two medical institutions in China: the Shandong Public Health Clinical Research Center (SPHCC) and Weifang Respiratory Disease Hospital (WRDH).These samples were obtained through analysis of sputum specimens.The sample collection spanned the period from 2011 to 2018.It is important to note that all samples were collected anonymously, and therefore, informed consent was not required as per the approved research protocol.Our study received ethical approval from the Ethics Committee of Shandong Provincial Hospital, which is affiliated with Shandong First Medical University (No.2017-337).This approval ensures that our research Hou et al. 10.3389/fmicb.2024.1398886Frontiers in Microbiology 03 frontiersin.orgadheres to ethical guidelines and safeguards the rights and privacy of the participants involved in the study.

DNA extraction and sequencing
Genomic deoxyribonucleic acid (DNA) was successfully extracted from 1,468 of the 1,550 Shandong M. tuberculosis isolates.Gene sequencing was performed at the Beijing Genomic Institute.The genomic DNA was sequenced using an Illumina HiSeq 4,000 system.The resulting sequence data were deposited in the National Center for Biotechnology Information (NCBI) BioProject PRJNA1002108.Quality control of the sequence reads was conducted using Fast QC software, and a total of 1,447 samples passed the quality control criteria.Low-quality raw reads with a sequencing base ≤20 or sequencing fragments length ≤ 20 were excluded from the paired-end sequencing process.During the analysis, two isolates were accidentally lost, resulting in 1445 isolates being included for further analysis.The reads of these 1,445 strains, along with 12,132 M. tuberculosis isolates downloaded from previous studies and collected from 52 countries and 18 regions worldwide, were aligned to the H37Rv reference genome (NC_000962.3)using BWA-MEM (version 0.7.17-r1188) (Luo et al., 2015;Yang et al., 2017;Coll et al., 2018;Hicks et al., 2018;Koster et al., 2018;Liu et al., 2018;Chen et al., 2019;Huang et al., 2019;Jiang et al., 2020).To improve the alignment quality, clipped alignments and duplicated reads were removed using samclip (v0.4.0) and samtools markdup (v1.15), respectively.Samples with a coverage rate below 98% or a depth less than 20× were excluded from the analysis (Jajou et al., 2019;Yang et al., 2021).Additionally, 55 Mycobacterium bovis isolates, one Mycobacterium caprae isolate, and three Mycobacterium orygis isolates were also excluded.In summary, a total of 13,518 genomes were analyzed in this study.Specific sample numbers can be found in Supplementary Tables 1, 2.

Single nucleotide polymorphism (SNP) analysis
After performing variant calling, we proceeded with additional filtering steps to enhance the quality of the detected variants.This involved employing Free Bayes (version 1.3.2) with an included filter parameter "FMT/GT = "1/1″ && QUAL> = 100 && FMT/DP > = 10 && (FMT/AO)/(FMT/DP) > = 0. " and Bcftools (version 1.15.1) for further refinement of the identified variants.To ensure the accuracy of our analysis, we excluded SNPs located within repetitive regions.This includes polymorphic sequences rich in GC found in PE/PPE genes, directly repeated SNPs, and repetitive bases identified using Tandem Repeat Finder (version 4.09) and RepeatMask (version 4.1.2-P1)(Li et al., 2009;Liu et al., 2019).The annotation for each candidate SNP was determined using SnpEff, version 4.11.The resulting output was obtained by utilizing the Python programming language (Cingolani et al., 2012).

Phylogenetic analysis
Phylogenetic lineages were inferred based on specific SNPs following the methodology described by Coll et al. (2014) (Supplementary Tables 1, 2).Maximum-likelihood phylogenetic and phylogenomic analyses were conducted using IQ-TREE version 1.6.12.The phylogeny was constructed using the general time reversible (GTR) model of nucleotide substitution with the GAMMA model of rate heterogeneity, and bootstrap replicates were performed with 100 iterations.To establish the phylogenetic relationships, the genome of the Mycobacterium canettii strain CIPT 140010059 (NC_15848.1)was used as an outgroup (Nguyen et al., 2015).The resulting phylogenetic tree was visualized and annotated using the online phylogenetic tree visualization tool iTOL. 1

Genotypic drug resistance prediction
We utilized the web-based tool TBProfiler (version 4.3.0) to analyze M. tuberculosis WGS data for drug resistance prediction (Phelan et al., 2019).Drug resistance was predicted using the curated drug-resistance Tuberculosis Database within TBProfiler.This database has undergone extensive testing on over 17,000 samples with genotypic and phenotypic data.The resistanceassociated polymorphisms (SNPs and indels) identified by TBProfiler were further evaluated based on the WHO-endorsed catalog of molecular targets for M. tuberculosis complex drugsusceptibility testing and resistance interpretation (Walker et al., 2022).This additional assessment ensures reliable and accurate interpretation of drug resistance profiles.For more detailed information on the predicted drug resistance results, please refer to Supplementary Table 3.

Propagation analysis
To explore the influence of mutations in toxin-antitoxin system genes on the transmission of M. tuberculosis, we conducted analyses on transmission clusters and transmission clades (Seto et al., 2017).Building upon prior research (Walker et al., 2013), we defined genome-based transmission clusters as pairs of isolates separated by ≤12 SNPs.Genome-based transmission clades were defined as pairs of isolates separated by ≤25 SNPs.To classify the transmission clades into different categories, we adopted a classification system established by previous scholars.The transmission clades were categorized into three groups based on their size: large (above the 75th percentile), medium (between the 25th and 75th percentiles), and small (below the 25th percentile) (Chiner-Oms et al., 2019).For a comprehensive analysis of global distribution patterns and transmission dynamics among M. tuberculosis isolates, we classified them into two main groups: cross-country clades and within-country clades.Crosscountry clades consisted of isolates originating from two or more different countries.Additionally, we further classified the M. tuberculosis isolates into cross-regional and within-regional clades based on their geographic location, using the United Nations standard regions (UN M.49).Cross-regional clades comprised isolates from two or more different regions.

Acquisition of toxin-antitoxin system genes
Initially, our analysis started with the retrieval of all genes correlated with Mycobacterium tuberculosis from the NCBI database, which yielded a comprehensive set of 4,015 genes.We concentrated our study on the specific strain, Mycobacterium tuberculosis H37Rv, and meticulously filtered that list down to 4,009 genes, guided by their respective organism names.Subsequently, our attention was directed toward refining the gene selection, with a focus on identifying those associated with the toxin-antitoxin system.This involved evaluating their functional descriptions and characteristic annotations, resulting in the successful identification of 78 genes directly implicated in the toxin-antitoxin system.To further our investigation on these genes, we employed Python, a versatile programming language with robust data analysis capabilities, to identify mutations within the set of toxinantitoxin system genes (Supplementary Table 4).

Statistical analysis and modeling
Categorical data were presented as frequencies and percentages.In order to improve statistical reliability, Mutations observed fewer than 10 times were discarded prior to continuing analysis.Statistical analyses were performed by generalized linear mixed models (GLMM) in R (version 4.2.3).In addition, Python 3.7.4 with the Scikit-learn library was used to implement random forest and gradient boosting decision tree algorithms for further data analysis.To evaluate the performance of the models, all samples were randomly divided into training and test sets at a ratio of 7:3.Various metrics such as Kappa, sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and area under curve (AUC) were calculated to assess the models' effectiveness (Luo et al., 2022).Importantly, after fitting the models, we assessed the importance of input variables on the model's predictions.By assigning scores to each input feature, we identified the top-performing variables by taking the intersection of both conditions.This approach allowed us to identify the most influential features contributing to the precision of predicting risk factors (Bi et al., 2018;Agarwal et al., 2019).All models included lineage and geographical location as covariates to correct for potential confounding factors.All statistical tests were two-tailed, with p-values less than 0.05 considered statistically significant.

Characteristics of study samples
A total of 13,518 isolates were included in this study.We identified a total of 70,346 SNPs related to the toxin-antitoxin system.Out of the included strains, 6,488 (48%) belonged to lineage 4, 5,133 (37.97%) belonged to lineage 2, and only 10 strains (0.07%) belonged to lineage 6, while 29 strains (0.21%) belonged to lineage 7. By dividing the isolates into clusters based on 12 SNPs, a total of 5,717 strains clustered together, resulting in a clustering rate of 0.42.The M. tuberculosis isolates were further categorized into 1,955 clusters, with the number of isolates per cluster ranging from 2 to 146.Among the lineage 4 group, 3,245 (50.02%) isolates formed clusters, while within the lineage 2 group, 2,043 (39.80%) isolates formed clusters.Additionally, the majority of the M. tuberculosis strains analyzed in this study originated from Eastern Asia (n = 3,170, 23.45%) and Northern America (n = 1,646, 12.18%).Other regions contributing substantial sample sizes include Eastern Africa (n = 1731, 12.81%), Western Europe (n = 1,578, 11.67%), Northern Europe (n = 1,262, 9.34%), and Eastern Europe (n = 1,118, 8.27%), see Figure 1.Applying a threshold of 25 SNPs for clades, a total of 7,808 isolates claded together, resulting in a clading rate of 0.58.The M. tuberculosis isolates were further grouped into 2,218 clades, with the number of isolates per clade ranging from 2 to 192.Among these clades, there were 187 crosscountry clades, consisting of 2 to 3 countries per clade, and 164 crossregional clades, consisting of 2 to 3 regions per clade, as shown in Table 1.The phylogenetic tree of M. tuberculosis isolates was constructed as described in Figure 2.

Relationship between toxin-antitoxin system gene mutations and transmission clusters
We conducted a filtering process to exclude sites with less than 10 mutations, resulting in a final selection of 182 SNPs for further analysis.Our investigation aimed to explore the correlation between these 182 SNPs and clustering by comparing isolates within clusters to those outside clusters.The generalized linear mixed model (GLMM) revealed that 27 SNPs were statistically significant for clustering (p < 0.05) (Supplementary Table 5).Among these significant SNPs, there were 18 nonsynonymous SNPs, one start lost site, one stop gained site, and seven synonymous SNPs.Notably, these genetic variations showed a positive correlation with transmission clusters in M. tuberculosis isolates, see Table 2 for details.Furthermore, we employed random forest and gradient boosting decision tree models to establish prediction models (Table 3; Figure 3; Supplementary Table 14).However, the SNPs Rv0298 G213A, Rv1103c G56A, and Rv2871 G28C did not contribute significantly to the gradient boosting decision tree model.In summary, our findings suggested that the presence of Rv0064A (vapB1, G34A), Rv0239 (vapB24, A76C),Rv0300 (vapB2, T171C), Rv0659c (mazF2, C85T),

Relationship between toxin-antitoxin system gene mutations and transmission clusters of lineages
After excluding sites with less than 10 mutations, a total of 46 SNPs were identified and included for further analysis.Specifically focusing on clustered isolates belonging to lineage 2, we investigated Frontiers in Microbiology 05 frontiersin.orgthe relationship between these 46 SNPs and non-clustered isolates.

Relationship between toxin-antitoxin system gene mutations and clade size
After excluding sites with less than 10 mutations, a total of 128 SNPs within the toxin-antitoxin system were identified and included for analysis.The results revealed that 32 SNPs were significantly associated with clade size (p < 0.05).Among these significant SNPs, there were 21 nonsynonymous SNPs, two stop gained SNPs, and nine synonymous SNPs, all of which displayed a positive correlation with clade size.Notable examples include vapB1 G34A, mazE2 G104A, vapB11 C54T, mazE5 T344C, vapB14 A29G, parE1 C88T, parD1 C134T, vapB15 T6C, parE2 C48G, mazF8 A97G, and vapB46 G70A.For more detailed information, please refer to Supplementary Figure 5.

Discussion
Consistent with prior research findings, our study further emphasizes the diverse functionality of TA systems in M. tuberculosis.These redundant TA systems serve as a backup mechanism enabling cellular adaptation and survival under adverse conditions (Min et al., 2012).They play a critical role in M. tuberculosis's stress response, including nutrient deprivation, by regulating essential cellular processes like DNA replication, protein translation, and cell division.Moreover, TA systems contribute to the formation of drug resistance and persistence in M. tuberculosis.However, it is important to acknowledge that certain studies have reported conflicting results regarding the specific contributions of TA systems to persistence formation and stress conditions (Yu et al., 2020;Sharma et al., 2021).These discrepancies may arise from variations in experimental setups or genetic differences among M. tuberculosis strains used in different investigations.Therefore, additional research is needed to precisely determine the roles of TA systems in persistence formation, stress responses, and their impact on M. tuberculosis pathogenesis.In our study, we focused on examining the relationship between gene mutations in toxin-antitoxin systems and the transmission dynamics of M. tuberculosis.The M. tuberculosis genome contains numerous toxin-antitoxin systems, including VapBC, MazEF, ParDE, and RelBE (Ramage et al., 2009;Tandon et al., 2019).To gain deeper insights into the significance of these toxin-antitoxin systems in M. tuberculosis transmission, we analyzed the prevalence and genetic variation of specific toxin-antitoxin system genes across various clusters and evolutionary branches.Our analysis detected multiple mutations in these genes, suggesting they could be involved in M. tuberculosis transmission.In our study, we have found a strong association between SNPs in the VapB antitoxin-related genes and the transmission of M. tuberculosis.Specifically, we identified several significant SNPs that were linked to transmission, including vapB1 G34A, vapB24 A76C, vapB31 T112C, vapB14 A29G, and vapB15 (T6C, G237A).We observed that vapB24 A76C and vapB23 T2C were particularly associated with transmission, especially in lineage 2. Additionally, vapB2 T171C, vapB11 C54T, vapB14 A29G, vapB15 G237A, vapB17 G213C, and vapB20 A54C were significantly related to transmission, especially in lineage 4. Furthermore, we found that vapB43 G28C was associated with transmission in lineage 4, while vapB6 A29C, vapB31 T112C, and vapB37 G205C were correlated with cross-country and cross-regional transmission.We also found that vapB1 G34A, vapB11 C54T, vapB14 A29G, vapB15 T6C, and vapB46 G70A were related to clade size.The VapBC system is crucial for regulating the behavior and adaptation of M. tuberculosis under diverse environmental stresses.It comprises stable VapC toxins and labile VapB antitoxins, whose interplay is essential for bacterial growth, survival, and response to stress conditions (Robson et al., 2009;Winther and Gerdes, 2011).During periods of stress, antitoxin molecules are degraded, leading to the release of toxins, such as VapC, through their RNase activity (Min et al., 2012).Consequently, these toxins inhibit or slow down cellular metabolism, providing a survival advantage to the bacterium during adverse conditions.The delicate balance between VapB antitoxins and VapC toxins is crucial for maintaining bacterial homeostasis and ensuring appropriate responses to external stimuli (McKenzie et al., 2012).Overall, our study provides compelling evidence for the significant association between SNPs in VapB antitoxin-related genes and M. tuberculosis transmission.These findings shed light on the intricate role of the VapBC toxin-antitoxin system in regulating bacterial behavior and underscore the importance of genetic variations within this system in shaping transmission dynamics.
Our study has revealed the association between SNPs in other TA system genes and the transmission of M. tuberculosis.Specifically, we focused on the MazEF family, which consists of nine TA systems encoded in an operon (Ahmed et al., 2022).We found a close connection between the mazE6 G156A and mazF2 C85T gene polymorphisms and the transmission clusters, particularly within lineage 2. These variants exhibited significant correlations with the formation and expansion of transmission clusters.However, mazE6 G156A is a synonymous mutation (Arg52Arg), which does not directly alter the protein's function but may still affect the protein through other mechanisms.For example, in certain situations, synonymous mutations can lead to changes in transcription regulatory elements, thereby influencing gene expression levels.However, further research is needed to confirm these effects.Similarly, the mazF2 C85T variant may alter the stability of the MazF and modulate the delicate balance between toxin and antitoxin interactions (Leplae et al., 2011).Furthermore, our study identified a strong correlation between the mazE2 G104A, mazE5 T344C, and mazF8 A97G gene polymorphisms and the transmission clusters, especially within lineage 4 and clade size.While it's plausible that these genetic variations influence the MazEF system activity, stability, and domain structure, our ability to fully elucidate these mechanisms is currently limited.Therefore, it's crucial to interpret these functional implications cautiously and consider other potential contributory factors to M. tuberculosis transmission.Furthermore, no SNPs in the MazEF system were found to be associated with cross-country and cross-regional transmission of M. tuberculosis in our study.Future investigations should aim to provide a more comprehensive understanding of these effects, confirm these hypotheses, and uncover the precise impact of these mutations on the dynamics of M. tuberculosis transmission.
The ParDE toxin-antitoxin system in M. tuberculosis plays a crucial role in bacterial transmission dynamics.Our research has identified specific genetic variations in the parE and parD genes, such as parE1 C88T, parE2 C48G, parE1 C103T, parD2 A196G, parE1 C25G, and parD1 C134T, that are closely linked to transmission clusters, particularly within lineage 4 and lineage 2. These genetic variants impact cross-country and cross-regional transmissions, highlighting the significance of the ParDE system in the spread of M. tuberculosis.Variations in the parD gene, including those involving Rv2142A (parD2) and Rv1960c (parD1), can modify the activity and regulatory mechanisms of the ParD antitoxin.Similarly, variations in the parE gene, particularly those affecting Rv1959c (parE1), influence the function and stability of the ParE toxin, thus impacting its interaction with the ParE antitoxin (Xu et al., 2018).Understanding these genetic interactions is crucial for deciphering M. tuberculosis In terms of drug development and therapeutic interventions, our research findings could potentially have significant implications.The diverse functions of TA systems suggest potential targets for novel therapeutic strategies in M. tuberculosis.Understanding the relationship between genetic variations and functional consequences within these TA systems might help us discover new methods to disrupt or modulate their activity, thereby affecting the survival and transmission dynamics of the bacterium.Firstly, interventions targeting specific SNPs in TA systems such as VapBC, MazEF, ParDE, and RelBE could possibly directly alter the stability and activity of their toxins or antitoxins, thus impacting the growth, survival, and adaptability of M. tuberculosis (Robson et al., 2009;Leplae et al., 2011;Winther and Gerdes, 2011;McKenzie et al., 2012;Ahmed et al., 2022).The SNPs we discovered, including vapB24 A76C and vapB23 T2C, have the potential to serve as genetic markers for targeted drug design, allowing for more personalized treatment approaches.Additionally, mutations like parE1 C88T, parE2 C48G, and parE1 C103T show associations with cross-national and cross-regional transmissions of M. tuberculosis, which could aid in the development of more effective treatment plans to reduce global transmission.However, it's important to note that while these genetic insights hold potential, they still require experimental validation to confirm their clinical significance and functional implications.Each mutation may lead to different functional impacts, and there might be other complexities involved, such as drug tolerance or adaptability of the bacterium under different environmental conditions.Therefore, further research is needed to delve deeper into the functional impacts of these genetic variations and precisely determine their roles in new drug development and treatment strategies.It is crucial to validate these findings through rigorous experimental studies and clinical trials before implementing them in clinical practice.Future research should aim to elucidate the specific mechanisms underlying these genetic variations and their contributions to drug response and transmission dynamics.By gaining a better understanding of the functional implications, we can more accurately tailor treatment strategies and contribute to the development of more targeted and effective interventions.
Our findings emphasize that both synonymous and non-synonymous mutations can influence the transmission of M. tuberculosis, suggesting that synonymous mutations in TA system genes are not universally neutral, in line with prior research by Shen et al. (2022).We believe that synonymous mutations may affect mRNA stability, splicing, or secondary structure formation.Changes in these regulatory elements can influence gene expression patterns and protein folding, thereby impacting bacterial adaptability and transmission capacity.Additionally, synonymous mutations may be part of a compensatory mechanism.While synonymous mutations themselves may not directly provide selective advantages, they may be associated with compensatory changes in other regions of the genome.These compensatory mutations could restore proper In our study, we investigated the impact of mutations in TA system genes on tuberculosis transmission.However, it is crucial to acknowledge that these correlations alone do not establish a causal relationship and should be interpreted with caution.Our modeling approach has limitations, notably in addressing potential confounding factors, such as population mobility, social networks, and interregional interactions.These elements may influence M. tuberculosis transmission but were not fully integrated into our models.We recognize that our primary focus on mutations within TA system genes may have led us to overlook other significant genetic influences, including SNPs related to drug resistance mutations or virulence determinants.While our findings contribute to the growing body of knowledge regarding the impact of toxin-antitoxin system gene mutations on tuberculosis transmission, further research is necessary to explore these intersections and understand their functional significance in detail.Limitations also arise from the sheer number of genes and computational resources required, which restricted our Conduct ROC curve analysis to evaluate the performance of models for the relationship between mutations in toxin-antitoxin system genes and transmission clusters.Moreover, we lack a clear understanding of the cross-interactions and mutual regulation among the TA systems of M. tuberculosis, adding another layer of complexity to our study.Additionally, uncertainties inherent in the phylogenetic inference method used, such as homoplasy or recombination events, can present challenges when accurately determining evolutionary relationships.Therefore, future research should consider alternative methods to validate these findings and develop a more nuanced understanding of tuberculosis transmission.Further experimental validation is necessary to confirm the specific impact of TA system gene mutations.Future investigations should focus on refining our models to account for potential biases or shortcomings, and expanding research scope to explore the functional significance of these mutations and their direct influence on tuberculosis transmission.We also discuss the limitations of using H37Rv as a single reference genome for analyzing M. tuberculosis WGS data, particularly regarding virulence and transmission.Recent studies suggest that relying solely on H37Rv may not fully capture the virulence characteristics of M. tuberculosis.H37Rv, commonly used as a reference genome in molecular epidemiology and drug resistance studies, does not represent the genetic diversity and variations present across all M. tuberculosis strains.Polymorphic loci involving genes associated with pathogenicity and host immune response, such as phospholipase C, membrane lipoproteins, adenylate cyclase gene family members, and PE/PPE gene family members, show significant differences between H37Rv and clinical isolates.Several gene families, including PE/PPE, exhibit higher substitution frequencies compared to the entire genome.Widespread genetic variability is observed at these polymorphic loci among M. tuberculosis clinical isolates (Fleischmann et al., 2002;O'Toole and Gautam, 2017).Phylogenetic and epidemiological analyses reveal independent occurrences of these polymorphisms, suggesting selective pressures driving these changes.Future research should incorporate genome sequences of additional reference strains, especially those directly obtained from clinical isolates, to comprehensively understand factors related to M. tuberculosis virulence and enable further investigations.For drug resistance inference, our analysis primarily utilized the TBProfiler platform.While incorporating additional tools/methods such as PhyResSE or bioinformatic SNP analysis could enhance robustness, resource constraints limited their implementation in this study.Thus, our results should be interpreted within the context of utilizing TBProfiler alongside the WHO-endorsed catalog.Future studies with expanded resources could consider alternative tools/methods for validation and complementation.

Conclusion
The results of this study suggest that mutations in toxin-antitoxin genes may increase the risk of M. tuberculosis transmission, underscoring the significance of conducting further research to explore the impact of these mutations on M. tuberculosis control and transmission.These findings offer new insights into the development of drug treatment strategies against tuberculosis.

FIGURE 1
FIGURE 1Distribution of 13,518 strains of Mycobacterium tuberculosis in 18 regions of the world.

FIGURE 2 A
FIGURE 2A phylogenetic tree is depicted for the Mycobacterium tuberculosis isolates, with the outer circle indicating mutation sites of the toxin-antitoxin system genes.(A) Phylogenetic tree for the Mycobacterium tuberculosis isolates of lineage 2. (B) Phylogenetic tree for the Mycobacterium tuberculosis isolates of lineage 4.

FIGURE 3
FIGURE 3 (A) ROC analysis showing the performance of the random forest model.(B) ROC analysis showing the performance of the gradient boosting decision tree.ability to analyze SNPs beyond the scope of our current investigation.

TABLE 1
The characteristics of Mycobacterium tuberculosis isolates.

TABLE 2
Positive correlation between toxin-antitoxin system gene mutations and transmission clusters.

TABLE 3
The performance of various models for discriminating clustered isolates from non-clustered isolates.between proteins, maintain enzyme activity, or optimize cellular functions affected by primary mutations, ultimately enhancing transmission capacity.Although the specific mechanisms and advantages of synonymous mutations in tuberculosis transmission are not yet fully understood, we cannot overlook their potential significance.Future research should consider the functional consequences of synonymous mutations and explore their interactions with other genetic factors, including non-synonymous mutations, drug resistance mutations, or virulence determinants.In our study, we combined local and global datasets to increase sample size for robust analysis of M. tuberculosis genetic variations.This approach helped identify shared and distinct variants across regions, enhancing our understanding of global pathogen diversity.Despite potential interactions limitations such as variability from different protocols and sequencing technologies, stringent quality control measures, including SNP filtering within repetitive regions, were applied to minimize biases.Our novel findings contribute valuable insights into global M. tuberculosis genetic characteristics, advancing knowledge on tuberculosis pathogenesis and evolution.In future research, separate and comparative analyses of local and global data can be considered to highlight region-specific variations.

TABLE 4
Positive correlation between toxin-antitoxin system gene mutations and transmission clusters of lineage4.
OR, odds ratio; CI, confidence interval.*Represents a stop SNP.