Genetic diversity of Mycobacterium tuberculosis isolated from tuberculosis patients in the Serengeti ecosystem in Tanzania

Summary
This study was part of a larger cross-sectional survey evaluating tuberculosis (TB) infection in humans, livestock and wildlife in the Serengeti ecosystem in Tanzania. The study aimed to evaluate the genetic diversity of Mycobacterium tuberculosis isolates from TB patients attending health facilities in the Serengeti ecosystem. DNA was extracted from 214 sputum cultures obtained from consecutively enrolled, newly diagnosed, untreated TB patients aged ≥18 years. Spacer oligonucleotide typing (spoligotyping) and Mycobacterial Interspersed Repetitive Units-Variable Number Tandem Repeat (MIRU-VNTR) typing were used to genotype M. tuberculosis and establish the circulating lineages. Of the 214 M. tuberculosis isolates genotyped, 55 (25.7%) belonged to the Central Asian (CAS) family, 52 (24.3%) to the T family (an ill-defined family), 38 (17.8%) to the Latin American Mediterranean (LAM) family, 25 (11.7%) to the East-African Indian (EAI) family, 25 (11.7%) comprised different unassigned ('Serengeti') strain families, and 8 (3.7%) belonged to the Beijing family. A minority group that included Haarlem, X, U, S and MANU together accounted for 11 (5.2%) of all genotypes. MIRU-VNTR typing produced diverse patterns within and between families, indicative of unlinked transmission chains. We conclude that in the Serengeti ecosystem only a few successful families predominate, namely the CAS, T, LAM and EAI families. Other types found at lower prevalence are Beijing, Haarlem, X, S and MANU. The Haarlem, EAI_Somalia, LAM3 and S/convergent and X2 subfamilies found in this study were not reported in previous studies in Tanzania.


Background
Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from infectious diseases worldwide [1]. Over the last two decades, molecular typing methods such as IS6110-RFLP [2], spoligotyping [3] and MIRU-VNTR [4] have revolutionised our understanding of the epidemiology of TB by providing novel insights into the genetic diversity and population structure of the M. tuberculosis complex (MTBC) [5]. Epidemiological data generated through genotyping have been used extensively to further the understanding of TB disease dynamics [6]. For example, at the individual level, cases of recurrence or treatment failure can be explained in terms of reactivation of the same strain, exogenous re-infection or polyclonal infection [7]. At the population level, the origins and transmission dynamics of outbreaks can be determined [8–10], while at the global level, TB genotypic lineages have been defined and used to monitor their geographical distribution and spread [6]. A crucial aspect of any TB control program is the ability to quantify the contribution of transmission, so that policy makers can direct resources to identifying infectious cases, preventing further spread of infection and implementing preventative therapy for those who have been infected (children and HIV positive individuals). The CAS family of M. tuberculosis strains is dominant in Tanzania [11–13], with little variation over time [14,15] and with some anti-tuberculosis drug resistance and multidrug resistance [15]. Although these previous studies have been carried out in northern Tanzania (4) and Dar es Salaam (1), none has targeted the Serengeti ecosystem, and the earlier studies focused largely on TB and its association with HIV/AIDS.
However, one cannot conclude that every location in Tanzania is represented by this earlier data which therefore provides a justification for studies in new locations, where findings might potentially provide data that could influence TB control strategies in the country.
In this study we used molecular epidemiological tools to describe the genetic diversity of mycobacteria in the Serengeti ecosystem where humans, livestock and wildlife are in close contact with the possibility of cross-transmission [16]. Specifically, the genotyping was achieved using spoligotyping and MIRU-VNTR typing methods. We report on the genetic diversity of M. tuberculosis isolated from tuberculosis patients resident in three subdistricts of the Serengeti ecosystem.

Study design and settings
This cross-sectional study was conducted in focal health facilities serving the three districts of Bunda, Ngorongoro and Mugumu-Serengeti in the Serengeti ecosystem, where TB screening is done regularly (Figure 1). The population densities for the three districts according to Tanzanian population statistics (2013) are Bunda (108.3 persons/km²), Serengeti (22.4 persons/km²) and Ngorongoro (11.2 persons/km²), with unpublished and limited information on the incidences of TB and HIV in these areas. These centres included District Designated Hospitals (DDH) in Bunda, Serengeti (Mugumu), Ngorongoro (Waso) and Endulen (in the Ngorongoro Conservation Area). The Bunda DDH provides health services to nearby villages of neighbouring districts, such as Magu district (Lamadi village, very close to Bunda) of Mwanza alongside the Serengeti National Park, some villages of Musoma district and other nearby districts of the Mara region. All patients with symptoms suggestive of TB presenting at these facilities during the study period (October 2010–November 2012) were eligible for the study. Only patients who gave written informed consent were enrolled in the study.

Sample collection
Sputum samples from self-reporting TB suspects were consecutively collected in transport medium, cetyl-pyridinium chloride (CPC) [17] from October 2010 to November 2012. A total of 472 sputum samples were collected from individuals presenting with TB symptoms. The sputum smears were Ziehl Neelsen-stained and examined microscopically for acid-fast bacilli (AFB) at the health centres. Two hundred and thirty-seven (237, 50.2%) were AFB smear-positive. All sputum samples were then transported in cetylpyridinium chloride (CPC) to the Central TB Reference Laboratory (CTRL) in Dar es Salaam for mycobacterial culture within 7 days of collection.

Sputum sample processing
During sample processing, the sputum-CPC mixture was concentrated by centrifuging at 4000 rpm for 15 min. The supernatants were poured off into a splash proof container. Twenty millilitres (20 ml) of sterile distilled water was added to the sediments and the pellets were suspended by inverting the tubes several times, and then centrifuged at 3500 rpm for 15 min. The supernatant was removed; the pellets were used for culture. On culture 214 (out of 237 sputa, 90.3%) yielded M. tuberculosis colony growths which were available for DNA extraction and subsequent molecular analysis.

Culture and identification
Two Löwenstein-Jensen slants, one containing 0.75% glycerol and the other 0.6% pyruvate, were inoculated with the sediments and incubated at 37 °C. Growth was examined weekly for 8 weeks, and cultures with no growth after eight weeks were considered negative.

DNA extraction procedures
Extraction of mycobacterial DNA was performed by boiling a loopful of bacteria in 100 µl H2O at 80 °C for 60 min. Crude DNA extracts were stored at −20 °C until spoligotyping and MIRU-VNTR typing were performed.

Spoligotyping
A commercially available spoligotyping kit (Isogen, Bioscience BV, Maarssen, The Netherlands) was used for spoligotyping as previously described by Kamerbeek et al. [3]. This PCR-based fingerprinting method detects the presence or absence of 43 variable spacer sequences situated between short direct repeat (DR) sequences in the M. tuberculosis genome. DNA from the reference M. tuberculosis H37Rv and Mycobacterium bovis BCG clones was used as positive controls, while autoclaved ultrapure water was used as a negative control. Presence (black squares) or absence (blank squares) of variable spacer sequences was visualized on film after incubation with streptavidin-peroxidase and detection of hybridized DNA using enhanced chemiluminescent (ECL) detection liquid (Amersham, Little Chalfont, United Kingdom), followed by exposure to X-ray film (Hyperfilm ECL; Amersham) as per the manufacturer's instructions (GE Healthcare Life Sciences). Resulting spoligotypes were reported in octal and binary formats (Table 3) and compared to existing patterns in an international spoligotyping database (SpolDB4.0) [18] available at http://www.pasteur-guadeloupe.fr:8081/SITVITDemo/. Spoligotype patterns were grouped as spoligotype international types (SITs) if they were identical to patterns present in the existing database. In previous studies, isolates which could not be assigned to specific SITs were referred to as orphans [19–21]; in our study we decided to name the isolates with no SITs assigned as 'Serengeti strains'. Spoligotype families were assigned as previously described [18,22].
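The octal format mentioned above is a compact encoding of the 43-spacer binary pattern. The following is a minimal sketch of the commonly used conversion, in which spacers 1 to 42 are read as 14 binary triplets (one octal digit each) and spacer 43 is written as a final 0 or 1. The function name and the example pattern are illustrative and are not taken from the study.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character spoligotype ('1' = spacer present,
    '0' = spacer absent) into its 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42 form 14 triplets; each triplet is one octal digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 stands alone and is written as 0 or 1.
    digits.append(binary[42])
    return "".join(digits)


# Illustrative pattern: every spacer present.
all_present = "1" * 43
print(spoligotype_to_octal(all_present))  # 777777777777771
```

An octal code produced this way can then be looked up against database entries; a pattern with no matching SIT would fall into the group the study calls 'Serengeti strains'.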

MIRU-VNTR typing
The standardized 24-locus MIRU-VNTR typing protocol by Supply et al. [23] was followed, using primers that amplify 24 polymorphic loci on the mycobacterial genome per DNA isolate. All Beijing genotype isolates and a selection of isolates representing the spoligotype EAI5, CAS1_Kili, CAS1_DELHI and LAM11_ZWE families were genotyped using this method. In brief, 2 µl of mycobacterial DNA was added to a final volume of 25 µl containing 8.375 µl of RNase-free water (Qiagen, USA), 5 µl of Q solution, 2.5 µl of 10x buffer, 2 µl of 1.5 mM MgCl2 (Roche, USA), 4 µl of 0.2 mM dNTPs (Promega, WI USA), 1 µl primer and 0.125 µl of HotStar Taq polymerase (1U).
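As a quick consistency check, the reagent volumes listed above (taken here as microlitres) can be summed to confirm the stated 25 µl final volume. The sketch below also scales the mix for several reactions; the 10% pipetting overage and the helper name are assumptions for illustration, not part of the published protocol, and because the primer is locus-specific a separate mix would presumably be prepared for each locus.

```python
# Per-reaction volumes in microlitres, as listed in the protocol text.
REACTION_UL = {
    "template DNA": 2.0,
    "RNase-free water": 8.375,
    "Q solution": 5.0,
    "10x buffer": 2.5,
    "MgCl2": 2.0,
    "dNTPs": 4.0,
    "primer": 1.0,
    "HotStar Taq polymerase": 0.125,
}


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale every reagent except template DNA for n reactions.

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 3)
        for name, volume in REACTION_UL.items()
        if name != "template DNA"  # template is added per tube
    }


total = sum(REACTION_UL.values())
print(f"per-reaction total: {total} ul")
for name, volume in master_mix(24).items():
    print(f"{name:<24}{volume:>9.3f} ul")
```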
The PCR conditions included three stages: initial denaturation at 95 °C for 15 min (Stage 1); denaturation at 94 °C for 1 min, annealing at 62 °C for 1 min and extension at 72 °C for 1 min (Stage 2); and final extension at 72 °C for 10 min followed by cooling to 4 °C prior to analysis (Stage 3). A 45-cycle PCR was run on a Veriti™ 96-well Thermal Cycler (Applied Biosystems, Singapore). The laboratory M. tuberculosis H37Rv reference strain DNA was used as positive control and DNA-free water as a negative control. Amplification products were electrophoretically fractionated in 1% agarose gel (SeaKem® LE) in 1x SPE buffer at 160 V for 4 h to allow maximum fragment size separation for clear discrimination. The number of tandem repeat units present at each locus was calculated from the size of DNA fragments according to a standardized table (http://www.MIRU-VNTRplus.org). The results were expressed in digital format, where each number represented the number of repeat copies at a particular locus. Phylogenetic analysis and creation of dendrograms were done using MIRU-VNTRplus (http://www.MIRU-VNTRPlus.org/) to generate a categorical-based NJ-Tree dendrogram, enabling comparison of strain genotypes within the study area [24,25] in an attempt to establish transmission links.
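The conversion from fragment size to repeat number can be illustrated with the general relationship: copies = (amplicon size − flanking sequence) / repeat unit length. The sketch below is only an illustration of that arithmetic. The flank and repeat lengths, the band sizes and the tolerance are hypothetical values, and in practice the locus-specific standardized allele table should be used, since some loci do not follow a simple linear rule.

```python
def repeat_copies(amplicon_bp, flank_bp, repeat_bp, tolerance_bp=10):
    """Estimate the number of tandem-repeat copies from a gel-sized amplicon.

    Returns None when the measured size is further than tolerance_bp from
    any whole number of repeats, so ambiguous bands can be re-run.
    """
    if repeat_bp <= 0:
        raise ValueError("repeat_bp must be positive")
    copies = round((amplicon_bp - flank_bp) / repeat_bp)
    expected_bp = flank_bp + copies * repeat_bp
    if copies < 0 or abs(amplicon_bp - expected_bp) > tolerance_bp:
        return None
    return copies


# Hypothetical locus: 150 bp of flanking sequence and a 50 bp repeat unit.
band_sizes = [250, 305, 380]
calls = [repeat_copies(size, flank_bp=150, repeat_bp=50) for size in band_sizes]
print(calls)  # [2, 3, None]
```

Repeating this call for each of the 24 loci gives the numeric profile referred to above as the digital format.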

Mycobacterial cultures
During the study period sputum samples were collected from 472 individuals presenting with TB symptoms. Of these, 237 (50.2%) were smear-positive and when cultured 214 grew M. tuberculosis on LJ media, with colonies that provided DNA which was available for genotyping.

Spoligotyping and distribution of spoligotypes by district
As shown in Table 1, 88.3% (189 out of 214 isolates) of the spoligotypes could be grouped into 9 known spoligotype families, while 25 (11.7%) were 'Serengeti strains'. The Serengeti type strains resemble the CAS strain family, but have not previously been reported in SpolDB4. The Central Asian Strain family (CAS) accounted for 25.7% (n = 55) of all isolates, followed by an ill-defined (T) family that accounted for 52 (24.3%) isolates. The Latin American Mediterranean (LAM) family accounted for 38 (17.8%) isolates, while the East African-Indian family accounted for 25 (11.7%) isolates. Eight (3.7%) isolates belonged to the Beijing family. The rest of the families were minor and comprised Haarlem (6, 2.8%), X (1, 0.5%), U (2, 0.9%), S (1, 0.5%) and MANU (1, 0.5%). The breakdown of spoligotypes by district (Table 1) showed that the CAS family was overrepresented in Ngorongoro, which accounted for 33 (60%) of CAS isolates, with Serengeti and Bunda districts accounting for 11 (20%) each. A relatively high proportion (40.4%) of T family strains was found in Bunda district compared to Serengeti (32.7%) and Ngorongoro (26.9%). As regards the LAM family, nearly half (47.4%) of the strains in this family were found in Ngorongoro, followed by Bunda (36.8%) and Serengeti (15.8%). The rest of the strains, found in small proportions, were considered minor (Table 1). However, the high number for Bunda could be explained by the relatively larger sample size (almost double that of Serengeti). The distribution of strain families by district is reflected in Table 1 and the finer categorization into subfamilies is presented in Table 2.
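The family proportions reported above can be recomputed from the isolate counts as a simple check. In this sketch Haarlem is taken as 6 isolates and U as 2, the counts that agree with the reported 2.8% and 0.9% and with the total of 214; the dictionary layout and variable names are illustrative.

```python
# Isolate counts per spoligotype family, as reported in the text.
counts = {
    "CAS": 55,
    "T": 52,
    "LAM": 38,
    "EAI": 25,
    "Serengeti (unassigned)": 25,
    "Beijing": 8,
    "Haarlem": 6,  # count consistent with the reported 2.8%
    "U": 2,        # count consistent with the reported 0.9%
    "X": 1,
    "S": 1,
    "MANU": 1,
}

total = sum(counts.values())
shares = {family: round(100 * n / total, 1) for family, n in counts.items()}

for family, n in counts.items():
    print(f"{family:<24}{n:>4}{shares[family]:>7.1f}%")

# Combined share of the four predominant families.
major = sum(counts[f] for f in ("CAS", "T", "LAM", "EAI"))
major_share = round(100 * major / total, 1)
print(f"CAS + T + LAM + EAI: {major} of {total} ({major_share}%)")
```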

MIRU-VNTR typing and phylogenetics
The phylogenetic relationship between subfamilies, indicative of transmission dynamics, from standard 24-locus MIRU-VNTR typing of a few selected isolates is presented in a dendrogram (Figure 2). The standard 24-locus MIRU-VNTR typing results were compared with their respective spoligotyping results (Table 4). The MIRU-VNTR typing patterns of polymorphisms at different loci along the mycobacterial genome are clearly reflected. While spoligotype patterns constituting the same family are largely identical, the MIRU-VNTR patterns often differed in at least one locus. Most related families and subfamilies had identical patterns of polymorphisms within families and subfamilies, respectively. Variability in polymorphisms was observed among the isolates 656_Beijing_Bunda and 588_CAS1_Kili_Bunda, which differed in patterns from their corresponding clades (see also Figure 2). The Beijing isolate 656 from Bunda differed from other members of the family in at least 6 loci (Table 4). This strain also differed in patterns from other Beijing isolates within Bunda (isolates No. 724, 725), as is also reflected in the phylogenetic dendrogram (Figure 2). The CAS1_DELHI isolate from Ngorongoro (N1367) was also different in MIRU-VNTR typing patterns from that from Serengeti (255). The MIRU-VNTR typing showed the CAS1_Kili strains from Serengeti and Ngorongoro to have matching patterns (Table 4, Figure 2).

Discussion

Spoligotyping and strain profile
The main circulating M. tuberculosis strains in the Serengeti ecosystem appear to be the CAS, T, LAM and EAI genotypes, in that order. These four strain families accounted for 79.4% of all genotypes, while all other named families (Beijing, Haarlem, X, U, S and MANU) accounted for only 8.9% of all genotypes. A substantial percentage (11.7%) of our strains could not be linked to any known spoligotype and were therefore designated as 'Serengeti strains'. The predominance of the four families seen in our study is comparable with the findings of similar studies done in Kilimanjaro and Dar es Salaam, but with some differences [14,15]. Members of these families, though not in the same dominance order, have also been reported in countries neighbouring Tanzania [15], such as Ethiopia [19,26], Zambia [20] and Uganda [27–29], indicating that they are widespread in this region.
A comparison of our study findings with the other two studies previously conducted in Tanzania is shown in Table 5. The major difference between our study and the previous two studies done in Tanzania is that we found 6 (2.8%) isolates belonging to the Haarlem family, while the previous studies [14,15] did not find members of this genotype. In addition, the study by Kibiki et al. [15] did not report X and S families, while that by Eldholm et al. [14] did not report any MANU strain family. Strain families not previously reported in Tanzania also included the EAI_Somalia, LAM3 and S/convergent and X2 subfamilies (Table 2). This study also found a higher proportion of T family strains than the other two previous studies. The finding of 11.7% of genotypes with no SITs in the international spoligotype database [18] is interesting, possibly reflecting micro-evolutionary events in the DR region of an existing strain [6,14,15,20,30]. The new strains (named 'Serengeti strains' in this study) appear to be relatively prevalent in Ngorongoro and Bunda but not in Serengeti. This variant is possibly a CAS1-Kili relative. CAS1-Kili is characterised by the absence of spacers 4–7, 10 and 20–35, whereas the Serengeti strains have additionally lost spacer 36.
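The spacer description above can be written out as 43-position presence/absence patterns to make the single-spacer difference explicit. This is a small sketch built only from the spacer numbers given in the text; the function and variable names are illustrative.

```python
N_SPACERS = 43


def pattern(absent_spacers):
    """Return a 43-character string: '1' = spacer present, '0' = absent.

    Spacer numbers are 1-based, as in the text.
    """
    return "".join(
        "0" if spacer in absent_spacers else "1"
        for spacer in range(1, N_SPACERS + 1)
    )


# CAS1-Kili: spacers 4-7, 10 and 20-35 absent.
cas1_kili_absent = set(range(4, 8)) | {10} | set(range(20, 36))
# 'Serengeti strains': the same deletions plus spacer 36.
serengeti_absent = cas1_kili_absent | {36}

cas1_kili = pattern(cas1_kili_absent)
serengeti = pattern(serengeti_absent)

# 1-based positions at which the two patterns differ.
differing = [
    position
    for position, (a, b) in enumerate(zip(cas1_kili, serengeti), start=1)
    if a != b
]

print(cas1_kili)
print(serengeti)
print(differing)  # [36]
```

A difference confined to one spacer adjacent to an existing deletion block is the kind of change the text attributes to micro-evolutionary events in the DR region.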
Our study found some differences in the distribution of strains by district. For example, while the T family is relatively uniformly distributed among the three districts, the CAS family predominated in Ngorongoro while EAI was highest in Bunda district. Furthermore, the lowest proportion of the Beijing family was found in Ngorongoro, compared to the higher proportions found in Serengeti and Bunda districts (Table 1). Other families were confined to single districts, e.g. X strains were found only in Ngorongoro, S in Bunda and MANU in Ngorongoro. Comparisons of genotyping results and study sites for previous and current studies conducted in Tanzania are shown in Table 5 and Figure 3, respectively. Further analysis of the isolates at subfamily level revealed the CAS1_Kili and LAM_ZWE subfamilies to be predominant in Ngorongoro, T2 and CAS1_DELHI in Serengeti, and EAI5 and LAM11_ZWE in Bunda (Table 2). The predominance of various strain families and subfamilies in different districts could be due to geographical isolation. CAS1_Kili, for example, is believed to have emerged from the Horn of Africa and is capable of diversifying into multiple genotypes [31]; together with other members of the CAS family, it belongs to the modern strains, which include genetic group 1 strains belonging to the East Asian lineage (lineage 2) or to the East-African Indian lineage (lineage 3) [32]. In addition, the CAS1_Kili strain has been reported to be the dominant circulating strain in Tanzania [14,15]. CAS1_Kili has clearly been successful in this region, and as a clone it might be evolving independently, acquiring genetic diversity over a long period of time and thus maintaining a high transmission level of its conserved circulating clones [33]. This could be responsible for the dominance of the CAS1_Kili family (68.4% of all CAS family) in the Ngorongoro district.
We also observed variability in strain predominance by district, which can be explained by differences in the characteristics of the populations of the three districts. Bunda, for example, is a town centre linking people travelling from various places; as such, the area is predisposed to the introduction of potentially new strains, notably varying Beijing strains (Figure 2), S and T2-Uganda (Table 2). Changes in the strain family composition of a population have been attributed to increased in- and out-migration as well as travel across regions, which increase the chances of exposure to different strains [34]. The variability in predominance of strains in this study shows that the Serengeti ecosystem contains a diverse group of M. tuberculosis strains, which could have implications for devising TB control strategies. This is because the findings point to potentially different sources of infection, which could determine the type of strategy needed to contain the potentially different transmission chains. This is particularly important where strains have different degrees of resistance to anti-tuberculosis drugs.

MIRU-VNTR typing and phylogenetics
In our study the dendrogram (Figure 2) from selected representative isolates that included Beijing, EAI5, CAS1_Kili, CAS1_DELHI and LAM11_ZWE showed that some of the families had identical patterns of polymorphisms and were located in proximity to each other in the dendrogram. The few exceptions were 656_Beijing_Bunda and 588_CAS1_Kili_Bunda, which had different patterns from their corresponding clades, indicative of different strains that may be new. This underlines the fact that while spoligotyping can provide rapid identification and group spoligotype patterns according to families, MIRU-VNTR typing can finely discriminate strains within and between families and subfamilies as well as establish transmission links [20,23]. In this study, MIRU-VNTR typing revealed differences in at least one locus, resulting in different patterns within a family, as was observed for one of the Beijing isolates (656 from Bunda), which differed from other members of the family, including other Beijing isolates from Bunda (isolates No. 724, 725), in 6 loci. The results, however, showed the Beijing strains in Ngorongoro to be more closely related to Beijing strains in Bunda than to those from Serengeti. These findings demonstrate an absence of clustering, which could be indicative of the importation of new Beijing strains rather than transmission. MIRU-VNTR typing also revealed the CAS1_Kili isolates (Table 4, Figure 2) from Serengeti and Ngorongoro to be closer to each other while differing from that from Bunda. This could mean that the circulating CAS1_Kili strains in Ngorongoro and Serengeti share the same chain of transmission or evolution, or that the same recurring strain [35,36] is circulating in the area.

[Figure caption fragment, beginning lost in extraction: "... [14]) and Kilimanjaro (Kibiki et al. study [15]) are cities and townships with no pastoral activities at all and minimal animal-human contacts. (Map source: www.Googlemaps.com)."]
Similarly, the CAS1_DELHI isolate from Ngorongoro differed from that from Serengeti in at least 3 loci (Table 4). Although few isolates were typed, the EAI5 strains from Bunda and Serengeti differed in at least one locus, a difference that could only be revealed through MIRU-VNTR typing. Differences in patterns were also observed for the LAM11_ZWE strains from the different districts, which differed in at least 4 polymorphic loci.
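The locus-by-locus comparisons described above amount to counting the positions at which two 24-locus profiles differ, which is also the basis of a categorical distance. The sketch below uses hypothetical profiles (the real values are in Table 4 and are not reproduced here), and it computes pairwise distances only; it does not build the NJ tree produced by MIRU-VNTRplus.

```python
from itertools import combinations


def differing_loci(profile_a, profile_b):
    """Return the 0-based indices of loci whose repeat numbers differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return [i for i, (a, b) in enumerate(zip(profile_a, profile_b)) if a != b]


def categorical_distance(profile_a, profile_b):
    """Fraction of loci at which the two profiles differ."""
    return len(differing_loci(profile_a, profile_b)) / len(profile_a)


# Hypothetical 24-locus profiles, for illustration only.
isolate_a = [2, 4, 4, 2, 3, 3, 2, 5, 3, 4, 4, 2,
             2, 3, 3, 5, 2, 1, 3, 3, 2, 4, 3, 2]
isolate_b = list(isolate_a)
isolate_b[5] += 1                      # single-locus variant of A
isolate_c = list(isolate_a)
for locus in (0, 3, 7, 11, 16, 22):    # six-locus variant of A
    isolate_c[locus] += 1

profiles = {"A": isolate_a, "B": isolate_b, "C": isolate_c}
for (name_1, p1), (name_2, p2) in combinations(profiles.items(), 2):
    n_diff = len(differing_loci(p1, p2))
    print(f"{name_1} vs {name_2}: {n_diff} loci differ, "
          f"distance {categorical_distance(p1, p2):.3f}")
```

Under this measure, a pair differing at one locus would sit close together in a dendrogram, while a six-locus difference of the kind reported for the Beijing isolate 656 would place an isolate well apart from its family.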

Conclusion
This study provides, for the first time, information on the prevailing human M. tuberculosis strains at the human–livestock–wildlife interface in the Serengeti ecosystem. Only a few successful families (CAS, T, LAM and EAI) were abundant. The other families, comprising Beijing, Haarlem, X, S and MANU, were less frequent in this study. This study reports for the first time the Haarlem, EAI_Somalia, LAM3 and S/convergent and X2 subfamilies, which were not reported in previous studies in Tanzania.

Acknowledgements
This study was supported by grants from the Wellcome Trust Grant [WT087546MA] and MUHAS Sida Sarec [000/3177]. We cordially acknowledge the participants for consenting to participate in our study and health authorities in Serengeti, Bunda and Ngorongoro for granting permission to conduct our study in the Serengeti Ecosystem. We thank The MUHAS Authorities, particularly the Biochemistry Department Chair, Dr Mselle for allowing part of this work to be done at his Lab and for providing general support.
Authors' contributions
EVM carried out the mycobacterial culture and molecular genetic studies, performed data entry, analysis and interpretation, drafted the manuscript and participated in revising it critically for important intellectual content. BZK participated in mycobacterial culture, molecular genetic studies and subsequent revision of the manuscript. KKS participated in the design, coordination and analysis of the molecular studies results (MIRU-VNTR). JDK participated in critically revising the manuscript for important intellectual content. SK participated in critically revising the manuscript for important intellectual content. HMD participated in critically revising the manuscript for important intellectual content. EMS participated in the design and coordination of the molecular studies and in the initial analysis of the molecular studies (spoligotyping) results, as well as in critically revising the manuscript for important intellectual content. AM contributed expertise to the conception of the study and participated in critically revising the manuscript for important intellectual content. MMR conceived of the study and participated in critically revising the manuscript for its intellectual content. RMW conceived of the study, participated in its design and coordination, and helped to draft and critically revise the manuscript. MIM participated in the conception, design and coordination of the study, helped to draft the manuscript and critically revise it for important intellectual content, and agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. PVH conceived of the study, participated in its design and agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors read and approved the final version of the manuscript.

Competing interests:
The author(s) declare that they have no competing interests.

Ethical approval:
Ethical clearance was obtained from both the Muhimbili University of Health and Allied Sciences (MUHAS) Ethics Review Committee (Ref. MU/PGS/PhD/R/Vol.1) and the Tanzania National Institute for Medical Research (Ref. No. NIMR/HQ/R.8a/Vol. IX/1299). Participants consented to enrol in the study after completing informed consent forms. Patients who were found to have tuberculosis were offered treatment as stipulated in the Tanzanian National Guidelines for management of tuberculosis.