Diversity of Mycobacterium tuberculosis Complex Lineages Associated with Pulmonary Tuberculosis in Southwestern, Uganda

Uganda is among the 22 countries in the world with a high burden of tuberculosis. The southwestern region of the country has consistently registered a high TB/HIV incidence rate. This study aimed to characterize the genotypic diversity of the Mycobacterium tuberculosis complex (MTBC) in southwestern Uganda. A total of 283 sputum samples from patients with pulmonary tuberculosis were genotyped using specific single nucleotide polymorphism markers for lineages 3 and 4. Most of the patients were male, with a mean age of 34 years. The lineage 4 Ugandan family was the most dominant, accounting for 59.7% of all cases, followed by lineage 3 at 15.2%. The lineage 4 non-Ugandan family accounted for 14.5% of all cases, while 4.2% showed amplification for both lineage 4 and lineage 3. Eighteen samples (6.4%) remained unclassified since they could not be matched to any lineage with the genotyping technique used. This study demonstrates that a wide diversity of strains causes pulmonary tuberculosis in this region, with those belonging to the lineage 4 Ugandan family predominating. However, further studies using more discriminative genotyping methods are necessary to confirm this.


Introduction
Tuberculosis (TB) is an ancient communicable disease that has existed for millennia and remains a major global public health problem. It is caused by closely related acid-fast bacteria known as the Mycobacterium tuberculosis complex (MTBC), which comprises seven human-adapted lineages (designated lineage 1 through lineage 7) and two lineages adapted to various wild and domestic animal species but capable of causing human infection [1][2][3][4]. Epidemiological studies reveal that this organism shows strong phylogeographical structuring [5][6][7] and is well adapted to sympatric human populations [8,9]. As a result, particular lineages predominate in specific human populations in certain geographical locations [1,7,10]. For instance, lineage 2 (L2) is most predominant in East Asia and is more strongly associated with virulence and drug resistance than other lineages [11,12]. L1 (moderately virulent) and L3 mainly occur in areas around the Indian Ocean, whereas L5 and L6 are highly restricted to West Africa [13,14]. L4 is commonly found in Africa, Central America, Europe, and South America [15], while L7 is restricted exclusively to Ethiopia [16]. This adaptation could be a result of coevolution [17,18]. Studies have also shown that human migration greatly contributes to the spread of TB and increases the genetic diversity of MTBC [9][19][20][21][22][23].
Uganda is listed among the world's 22 countries with the highest TB burden [24] and is the third-largest refugee host country in the world after Turkey and Pakistan, with over 1.36 million refugees [25][26][27]. The refugees come from countries such as the Democratic Republic of the Congo (DRC), Burundi, Ethiopia, Eritrea, Rwanda, Somalia, and South Sudan. Over the past five years, there has been a mass refugee influx from DRC and South Sudan into the country, with the southwestern region of the country accommodating most of these refugees. Ninety-two percent (92%) of these refugees live alongside the local communities, where they are given accommodation and farming land [27]. Earlier studies conducted in central Uganda indicate that the majority of TB cases are due to the MTBC lineage 4 sublineage Ugandan family (L4-U) [28][29][30] of the Euro-American lineage. This sublineage is defined by a deletion of region of difference (RD) 724, a spoligotype fingerprint lacking spacers 33-36, 40, and 43, and several single nucleotide polymorphisms (SNPs). Similarly, a 2010 study conducted in southwestern Uganda by Bazira and colleagues using spoligotyping indicated that the majority of TB strains in the region belong to the Uganda genotype [31]. However, the influx of immigrants into a region that consistently registers high TB incidence rates calls for a greater understanding of the molecular epidemiology and genetic variability of MTBC in the region to enable better control of this causative agent of TB.

Study Setting.
This study was conducted between May 2018 and April 2019 in the southwestern region of Uganda, which comprises fifteen administrative districts and borders Tanzania, Rwanda, and the Democratic Republic of the Congo (Supplementary File 1: Figure 1). The region is heavily affected by the TB/HIV epidemic and consistently registers high TB incidence rates [32][33][34]. Patients were recruited at four centers: two regional referral hospitals (Kabale and Mbarara regional referral hospitals) and two health centers within the refugee camps in the region (Oruchinga and Nakivale health center IV).

Patient Recruitment and Sample Collection.
Patients' clinical information, including age, sex, HIV status, previous history of TB, and economic status, was recorded. Sputum samples were collected from patients aged ≥18 years. Cases of pulmonary tuberculosis (PTB) were diagnosed at the sample collection centers by either the GeneXpert (Cepheid) test, for samples analyzed at the regional referral hospitals, or sputum smear microscopy, for samples analyzed at the health centers. Samples positive for PTB were then transported in a cold box, not more than 72 hours after collection, to the Mbarara University of Science and Technology Genomics and Translational Laboratory for processing and molecular analysis.

DNA Extraction and Confirmation of MTB in Sputum Samples.
Genomic DNA was extracted from each patient sample using standardized protocols [35,36]. All samples were then screened and confirmed as MTB by detection of a 123 bp fragment of the IS6110 insertion sequence, which is common among members of the MTBC.

Single Nucleotide Polymorphism (SNP) Typing.
SNP typing to determine the MTB lineages was performed by real-time PCR (Bio-Rad CFX96 Touch™). The lineage-specific primers, Rv004C for MTB L4-U, Rv2962C for MTB L4-NU, and Rv0129C for MTB L3, and their accompanying hybridization probes were based on a previous study by Wampande et al. [37] (Supplementary File 2: Table 1). The MTBC lineages were identified based on differences in melting temperature (Tm). Briefly, each assay was performed in a 20 μl reaction mixture containing 3.75 μl of PCR water, 1.25 μl (0.5 μM final concentration) of each primer, 0.625 μl (0.25 μM final concentration) of each probe, 9.5 μl of 2X Lunar® Universal genotyping master mix, and 3 μl (5-50 ng) of extracted genomic DNA. The Bio-Rad CFX96 Touch™ Real-Time PCR Detection System was programmed for PCR amplification followed by a melting curve stage. For each of the three uniplex assays, the amplification program consisted of a pre-PCR stage at 95°C for 10 min, followed by 45 cycles of denaturation at 95°C for 30 s, primer annealing (50°C for Rv004C, 52°C for Rv0129C, or 51°C for Rv2962C) for 30 s, and extension at 60°C for 30 s. The melting curve analysis consisted of denaturation of the amplicons at 95°C for 10 s to produce single-stranded DNA, probe annealing at 65°C for 5 s with continuous acquisition to capture the fluorescence, and probe melting over a temperature range of 40 to 80°C. In all the assays, genomic DNA from kc32969 (L4-U), H37Rv (L4-NU), and delicus (L3) (courtesy of Makerere Molecular Labs) was used as the positive control, while a nontemplate mix served as the negative control. The Bio-Rad CFX96 Touch™ software was used to determine the MTB lineages through analysis of the amplicon melting temperature (Tm) (Supplementary File 3: Figure 2).
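The way the three uniplex results combine into a reported lineage category can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: it assumes each assay has already been reduced to a yes/no call from its melting peak, and the names `AssayResult` and `call_lineage` are hypothetical. The category returned when both L4 assays are positive is also an assumption, since the text does not describe that case.

```python
from typing import NamedTuple


class AssayResult(NamedTuple):
    """Yes/no outcome of each uniplex assay for one sample (hypothetical structure)."""
    l4_u: bool   # lineage-specific melting peak seen in the Rv004C assay
    l4_nu: bool  # lineage-specific melting peak seen in the Rv2962C assay
    l3: bool     # lineage-specific melting peak seen in the Rv0129C assay


def call_lineage(result: AssayResult) -> str:
    """Map the three assay outcomes to the categories reported in the study."""
    any_l4 = result.l4_u or result.l4_nu
    if result.l3 and any_l4:
        # Reported in the text as amplification for both lineage 4 and lineage 3.
        return "L3 + L4 (mixed)"
    if result.l3:
        return "L3"
    if result.l4_u and result.l4_nu:
        # Assumption: not described in the text, so flag it instead of guessing.
        return "L4 (sublineage ambiguous)"
    if result.l4_u:
        return "L4-U"
    if result.l4_nu:
        return "L4-NU"
    # No assay matched: the sample cannot be assigned with these three markers.
    return "unclassified"
```

For example, `call_lineage(AssayResult(l4_u=True, l4_nu=False, l3=True))` falls into the mixed category, and a sample negative in all three assays is returned as unclassified.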

Statistical Analysis.
Patient demographic data were compiled in Excel tables and then exported to SPSS version 25 (IBM, Chicago, USA) for analysis. Bivariate analysis was performed using the chi-square and Fisher's exact tests to determine the relationship between categorical independent variables and the dependent variable. Multinomial logistic regression models were fitted to evaluate the relationship between MTB lineage (dependent variable) and patients' country of origin/ethnicity (primary independent variable). Patients' age, gender, and economic status were treated as covariates when fitting the final model. A p value of ≤0.05 was considered statistically significant.

Results
(Figure 1). Among the 37 samples from refugees, L4-U accounted for 8.1% of all cases, while L4-NU, L3, and unclassified strains accounted for 1.1%, 3.2%, and 0.4%, respectively. Among the samples from Ugandan patients, L4-U accounted for 51.6% of all cases, while L4-NU, L3, and unclassified strains accounted for 13.4%, 12%, and 6%, respectively (Table 1). Bivariate analysis taking lineage as the outcome showed that the proportions of patients infected with each MTBC lineage did not differ according to age, sex, HIV status, level of income, or history of TB. Although the Ugandan genotype was found in a higher proportion of the cases confirmed by GeneXpert or microscopy, the difference was not statistically significant (p = 0.074) (Table 1). A multinomial logistic regression analysis was further performed to determine the relationship between the independent variable "patients' nationality" and the MTB lineages circulating in southwestern Uganda (Table 2). The odds ratio for Ugandan patients having L4-U strains, relative to refugee patients, was 0.501 (95% CI: 0.143-1.758; p = 0.281) when other factors were held constant; because the confidence interval includes 1, this association was not statistically significant.
Furthermore, the model indicated that Ugandan patients were less likely than non-Ugandan patients to have strains of lineage 3 (OR = 0.298) or of both lineages 3 and 4 (OR = 0.868), relative to the lineage 4 non-Ugandan family (reference category); however, these associations were not statistically significant (p > 0.05). Conversely, Ugandan patients were 1.342 times more likely than refugee patients to have unclassified strains (95% CI: 0.130-13.54) when other factors were held constant, but this was also not statistically significant (p > 0.05).
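The reported p values can be cross-checked against the confidence intervals. The sketch below assumes the intervals are Wald intervals computed on the log-odds scale, which is the usual form of logistic regression output but is not stated in the text; the function names `wald_p_from_ci` and `ci_includes_one` are illustrative.

```python
import math


def wald_p_from_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Recover a two-sided Wald p value from an odds ratio and its 95% CI."""
    # A 95% Wald interval is exp(log(OR) +/- 1.96 * SE), so the standard error
    # follows from the width of the interval on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def ci_includes_one(ci_low: float, ci_high: float) -> bool:
    """An odds ratio CI that spans 1 means no significant association at that level."""
    return ci_low <= 1.0 <= ci_high


# Values quoted in the text for L4-U (Ugandan versus refugee patients).
p_l4u = wald_p_from_ci(0.501, 0.143, 1.758)
print(f"L4-U: p = {p_l4u:.3f}, CI includes 1: {ci_includes_one(0.143, 1.758)}")
```

With the L4-U figures (OR 0.501, 95% CI 0.143-1.758), the recovered p value is about 0.28, in line with the reported 0.281. The interval spans 1, so the association is not significant at the 0.05 level.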

Discussion
To gain insight into the structure of the MTBC population causing PTB in southwestern Uganda, a region where TB infection is widespread, we used SNP typing to analyze 283 sputum samples from Ugandan and non-Ugandan patients diagnosed with PTB. SNP typing was chosen because it had been optimized and found to be discriminative by Wampande and colleagues [37]. Our findings revealed heterogeneity of the MTBC causing PTB in this area. L4 was the predominant lineage, with the L4-U and L4-NU families accounting for 59.7% and 14.5% of all cases, respectively. Ugandan and non-Ugandan patients contributed 51.6% and 8.1% of all cases to the L4-U family, and 13.4% and 1.1% to L4-NU, respectively. The predominance of L4 in this study is in keeping with reports indicating that the TB epidemic in Africa is primarily caused by L4 strains [15]. L4 is phylogenetically complex, with at least ten distinct sublineages that differ in geographical distribution, and local genotypes account for a larger proportion of circulating strains in certain regions [15]. For instance, the L4-Cameroon family is almost exclusively found in West Africa [14], while the Latin American-Mediterranean (LAM) family is most predominant in Zambia [38,39]. Similarly, the L4-U family is more predominant in Uganda than anywhere else [4,28,31,40]. This is supported by our findings, which show that L4-U is most common among the Ugandan patients, as opposed to the refugees, and that its overall proportion is comparable with the 63% recorded in central Uganda [28] and the 59.2% reported in an earlier study conducted in Mbarara, southwestern Uganda [31]. Studies indicate that the L4-U family is less common in neighboring countries: Tanzania, for example, reported 21.8% L4-U [40], while Kenya reported 11% [41]. It is thus tempting to speculate that local strains are more likely than others to be transmitted in a given local setting.
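The stratified percentages can be checked against the overall lineage shares with a few lines of arithmetic. The sketch below uses only percentages quoted in the text. The sample counts it derives are back-calculated from rounded percentages and are therefore approximate rather than reported values (only the 18 unclassified samples are stated explicitly), and the function names are illustrative.

```python
TOTAL_SAMPLES = 283

# Percentages quoted in the text.
OVERALL = {"L4-U": 59.7, "L3": 15.2, "L4-NU": 14.5, "L3 + L4": 4.2, "unclassified": 6.4}
UGANDAN = {"L4-U": 51.6, "L4-NU": 13.4, "L3": 12.0, "unclassified": 6.0}
REFUGEE = {"L4-U": 8.1, "L4-NU": 1.1, "L3": 3.2, "unclassified": 0.4}


def stratified_shares_match(tolerance: float = 0.05) -> bool:
    """Check that Ugandan + refugee shares reproduce the overall share per lineage."""
    return all(
        abs(UGANDAN[lineage] + REFUGEE[lineage] - OVERALL[lineage]) <= tolerance
        for lineage in UGANDAN
    )


def approximate_counts(total: int = TOTAL_SAMPLES) -> dict:
    """Back-calculate sample counts from rounded percentages (approximate only)."""
    return {lineage: round(pct * total / 100) for lineage, pct in OVERALL.items()}


if __name__ == "__main__":
    print("Stratified shares add up:", stratified_shares_match())
    print("Approximate counts:", approximate_counts())
```

Under these assumptions the Ugandan and refugee shares add up to the overall share for each lineage, which suggests that the stratified percentages are expressed relative to all 283 samples rather than within each patient group.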
Another lineage observed in this study at a relatively high proportion was L3, with a prevalence of 15.2%. Ugandan patients accounted for 12 percentage points of this figure and non-Ugandans for 3.2. This finding is comparable to the results of a study conducted in central Uganda, which found L3 to be at 11% [28]. Studies conducted in Uganda's neighboring countries such as Tanzania, Sudan, Kenya, and Rwanda have also revealed that L3 (particularly the Central Asian (CAS) family) is a widely implicated lineage in PTB [40]. The MTBC is known to remain stable even in broad cosmopolitan areas such as San Francisco or London, where some level of intermingling between locals and immigrants is expected [42]. This is supported by several studies showing that MTBC preferentially transmits in sympatric host populations [7,9,11,43]. Given these observations, it is tempting to speculate that different MTBC lineages have adapted to different human populations, possibly as a result of the long coevolutionary history of MTBC and its human host [6,8,17,18], and that local strains are more likely than others to be transmitted in a given local setting. Our observation that L4-U is predominant in this region is consistent with this hypothesis. However, further work is needed to validate it, including studies exploring the interaction between MTBC genetic variation and humans. In our study, 6.4% of the samples were not placed in either of the

Limitations of the Study
The use of three unique SNP markers in this study may have restricted our ability to detect other MTBC lineages causing PTB in this region. Nonetheless, our research used SNP markers validated by a local study and took into account the common lineages in circulation in east and central Africa, where the patient population is from.

Conclusion
There is heterogeneity of MTBC causing PTB in southwestern Uganda, with lineage 4 Ugandan family strains being predominant. However, this diversity needs to be ascertained further with more discriminative techniques such as Mycobacterial Interspersed Repetitive Units-Variable Number of Tandem Repeats (MIRU-VNTR) typing or whole genome sequencing.

Data Availability
Information used in the study can be accessed at http://www.re3data.org/.

Ethical Approval
Ethical approval was obtained from the Mbarara University of Science and Technology Institutional Review Board, and clearance was obtained from the Uganda National Council for Science and Technology Research, under reference numbers 13/08-17 and HS2379, respectively.

Conflicts of Interest
The authors declare that they have no competing interests.