Distribution and Clonality of drug-resistant tuberculosis in South Africa

Studies have shown that drug-resistant tuberculosis (DR-TB) in South Africa (SA) is clonal and is caused mostly by transmission. Identifying transmission chains is important in controlling DR-TB. This study reports on the sentinel molecular surveillance data of Rifampicin-Resistant (RR) TB in SA, aiming to describe the RR-TB strain population and the estimated transmission of RR-TB cases. RR-TB isolates collected between 2014 and 2018 from eight provinces were genotyped using a combination of spoligotyping and 24-loci mycobacterial interspersed repetitive-units-variable-number tandem repeats (MIRU-VNTR) typing. Among the 3007 isolates genotyped, 301 clusters were identified. Cluster size ranged between 2 and 270 cases. Most of the clusters (247/301; 82.0%) were small (< 5 cases), 12.3% (37/301) were medium sized (5–10 cases), 3.3% (10/301) were large (11–25 cases) and 2.3% (7/301) were very large (26–270 cases). The Beijing genotype was responsible for the majority of RR-TB cases in Western and Eastern Cape, while the East-African-Indian-Somalian (EAI1_SOM) genotype accounted for a third of RR-TB cases in Mpumalanga. The overall proportion of RR-TB cases estimated to be due to transmission was 42%, with the highest transmission rate in Western Cape (64%) and the lowest in Northern Cape (9%). Large clusters contribute to the burden of RR-TB in specific geographic areas such as Western Cape, Eastern Cape and Mpumalanga, highlighting the need for community-wide interventions. Most of the clusters identified in the study were small, suggesting close-contact transmission events and emphasizing the importance of contact investigations and infection control as the primary interventions in SA.


Background
South Africa (SA) carries a disproportionate burden of drug-resistant tuberculosis (DR-TB) in Africa. The burden of DR-TB is largely driven by transmission [1][2][3][4][5]. Several studies in SA have reported a high level of clonal DR-TB transmission [6][7][8]. Hence, understanding the transmission dynamics of DR-TB remains critical in controlling this epidemic in SA.
Genotyping of M. tuberculosis strains has proven to be a powerful surveillance tool for understanding the transmission dynamics of TB. Several genotyping techniques have been developed to investigate population structure and transmission of M. tuberculosis. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis was considered the gold standard [9].
Although several clonal outbreaks were reported in SA, knowledge regarding the DR-TB population and transmission at a national level is limited. Currently, genotyping results are not routinely used for TB control in SA. Genotyping is primarily used for research purposes in selected population risk groups and in limited geographic areas. Thus, there is a need to undertake broader molecular epidemiological surveillance of DR-TB in SA to describe the DR-TB population and identify transmission events.
In 2014, the Center for TB (CTB) at the National Institute for Communicable Diseases (NICD) in Johannesburg established the first sentinel molecular surveillance of Rifampicin-Resistant TB (RR-TB) in SA in order to determine the prevalent RR-TB strains in specific provinces and the extent of RR-TB transmission. RR-TB, rather than all TB, was chosen based on feasibility and on the recognition that detection of RR-TB had improved with the introduction of the Xpert MTB/RIF assay as the initial diagnostic test in SA. In the current study we report the RR-TB strain population in selected SA provinces/districts and the estimated proportion of RR-TB transmission.

Strain lineages and diversity
Based on the spoligotype classification, 92.7% (2789/3007) of isolates could be assigned to previously described Shared International Types (SITs), 1.8% (55/3007) could be assigned to a lineage without a SIT, while 5.4% (161/3007) could be assigned to neither. The distribution of strain families stratified by province is shown in Fig. 1 and Table 1. The Beijing family strongly predominated in EC and WC, accounting for 71.5% (517/723) and 66.8% (599/897) of the RR-TB isolates, respectively. The prevalence of Beijing was lower in the remaining provinces, ranging from 22.9 to 32.9%.
The Latin American and Mediterranean (LAM) family was the second most prevalent genotype in five of the eight provinces, representing 22.9% of the strains in NC and about half of that proportion in MP (10.7%). The LAM family was least prevalent in EC (7.6%) and WC (5.6%). The prevalence of the S family was highest in KZN (21.3%); in this study, S was mainly represented by two SITs: SIT34 (67%) and SIT71 (15%). The X family (mainly X1) was most prevalent in WC (9.8%) and was also detected in MP, KZN, GP and NC, but at lower frequencies (5.6-7.6%) (Table 1).
The East-African-Indian (EAI) family was particularly prevalent in MP, accounting for 32.6% (142/435) of all isolates. In GP and NW, the prevalence of EAI was notable, at 8.4 and 6.7%, respectively. However, it was much lower (< 3%) in the remaining provinces. The EAI family in this study was mainly represented by the East-African-Indian-Somalian (EAI1_SOM) sub-lineage (179/208; 86.1%). The T family (mainly T1) was common in all the provinces, with a prevalence between 11 and 15% in NW, GP, FS, KZN and NC and about half of that in MP, EC and WC (5-7%). The Haarlem (H) family seemed to be more prevalent in NW and FS, at approximately 6%, while it was much lower in KZN (3%), GP (2.8%) and WC (1.1%). The CAS, U and MANU families were the least prevalent genotypes.
The estimated transmission rate for the strain families identified in the study is shown in Table 3. The Beijing family had the highest transmission rate (64.2%), followed by X (45.8%) and EAI (42.3%). The distribution of clusters with more than four cases per cluster is shown in Table 4. The Beijing family showed the highest clustering, especially in EC and WC. The two largest clusters identified in this study (containing 113 and 270 isolates) belonged to the Beijing family. The cluster with 113 isolates was mostly detected in EC (100/113 cases), with only a few cases in FS, MP, KZN and NW. Other Beijing clusters (containing between 5 and 65 isolates each) were also identified in EC (Table 4). The Beijing cluster with 270 isolates was mostly found in WC (262 isolates), with a few isolates from EC, GP, MP and KZN (Table 4). The cluster was identified in all three districts of WC, with the majority from City of Cape Town (199/262, 76%), followed by Cape Winelands (42/262, 16%) and West Coast (19/262, 7.3%). The Beijing clusters in the other provinces were small (2-4 cases), with few exceptions (Table 4).
In contrast, the EAI1_SOM sub-lineage showed higher clustering in MP. The majority of the EAI1_SOM clusters (11/15 clusters) were identified in MP, with the largest cluster containing 43 cases (Table 4).
Some clusters were specific to a certain province. Fifteen Beijing clusters (containing 5-44 cases each) were found only in WC, while three clusters (containing 5-16 cases each) were found only in EC. The X clusters (X1 [28 cases] and X3 [8 cases]) and a cluster with unknown genotype (5 cases) were found exclusively in WC. Two other X3 clusters, containing five and six cases, were identified in MP and WC, respectively (Table 4).
The LAM clusters were relatively small. Only five clusters had more than four cases: three LAM3 clusters (15, 7 and 5 cases) and two LAM4 clusters (31 and 13 cases). The LAM4 cluster with 31 cases was mainly detected in KZN (n = 13) and GP (n = 9), while the LAM3 cluster with 15 cases was mostly detected in WC (n = 10) and NW (n = 4).

Discussion
The present study reports the first analysis of sentinel surveillance data on the distribution and transmission of RR-TB lineages in SA. The population structure of RR-TB isolates was dominated by Lineage 2 (Beijing) and Lineage 4 (Euro-American: LAM, T, S and X) strains. These patterns in genotype distribution likely reflect historical movement of strains. For hundreds of years SA occupied a geographically central position on the historical trade route between East and West, which would explain the dominance of the Beijing (Eastern origin) and Euro-American (European origin) strains in SA [14]. The surveillance data showed geographic variation in RR-TB genotype distribution, consistent with previously published studies [15,16]. WC and EC showed a highly homogeneous strain population, with the Beijing genotype representing the majority (67 and 71%, respectively) of the RR-TB isolates. Previous studies in EC and WC reported that Beijing strains account for 54-69% of multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB isolates [15,17]. Interestingly, in MP the EAI family (mainly EAI1_SOM) was the predominant genotype, representing a third of RR-TB cases. The EAI family, however, was underrepresented in most of the other provinces (< 3%), with the exception of GP (8.2%) and NW (6.2%). This is in agreement with a previously published study that showed EAI1_SOM as a predominant genotype in MP and GP [18]. Chihota et al. [15] also reported a higher prevalence of EAI1_SOM in GP compared to WC, EC and KZN provinces.
In contrast, RR-TB was caused by diverse genotypes in the remaining provinces, with predominance of five families (Beijing, LAM, T, S and X). The Beijing family accounted for the largest share of RR-TB cases (22-32%). The LAM family was also common in all the provinces. The LAM family is prevalent in Latin America and the Mediterranean region as well as in African countries such as Zambia and Zimbabwe [19,20]. In SA, it is particularly prevalent in WC and KZN provinces [2,15,21]. LAM3 (F11) is one of the endemic strains among drug-susceptible TB cases in WC [22], while LAM4 (F15) has been reported as the predominant strain among M/XDR-TB cases in KZN [2,23,24,25]. In our study, LAM was least prevalent among RR-TB cases in WC, suggesting that the strain population differs between drug-susceptible and DR-TB cases. In addition, differences in the distribution of the LAM sub-lineages were observed in this study. LAM3 was more common in FS (12.6%), NW (8.1%) and NC (6.7%), while the LAM4 sub-lineage had a higher representation in KZN (14.3%) and GP (10.6%). The frequent movement of KZN residents to GP might explain the higher prevalence of the LAM4 family in GP compared to other provinces.
The prevalence of the T family (mainly T1) was notable in all provinces. The T family is one of the predominant genotypes reported in Africa [19,26]. However, a lower proportion of this lineage was noted in MP, EC and WC (5-7%). The prevalence of the S family ranged from 2 to 21% across the provinces, with the highest prevalence in KZN and the lowest in WC. The S family was previously reported to be prevalent in Algeria and, to a lesser extent, in SA, Madagascar and Egypt [19]. A study in KZN found S to be the predominant family among MDR-TB isolates collected between 2005 and 2006 [21]. The proportion of the X lineage (mainly X1) was highest in WC (9.8%). Similarly, a higher prevalence of this family was reported previously in WC and NC provinces [27].
Cluster analysis showed that almost half (42%) of the RR-TB cases in the selected districts in SA were due to recent transmission. The highest transmission rate was found in WC (64%) and the lowest in NC (9%). Most of the clusters had 2-4 cases (82.0%) and likely represent small close-contact transmission events. The few very large clusters (≥26 cases, 2.3%) identified in WC, EC and MP probably represent community transmission. However, it should be considered that clustering is not always indicative of recent transmission, as it can also reflect the persistence of endemic strains. The majority of large and very large clusters were found in at least two provinces/districts. However, some of the Beijing and X clusters were specific to a certain province/district (Table 4). This may be explained by the lack of strain exchange between geographically separated populations, resulting in localized transmission.
The Beijing family showed the highest clustering rate (64.2%) in this study. The Beijing lineage has been reported to be more transmissible than other lineages [28]. The two largest clusters belonged to the Beijing family and were each found in five provinces, with most cases from WC and EC. The cluster with 113 cases, corresponding to an atypical Beijing strain [29], was mostly detected in EC (100/113 cases). A previous study reported that atypical Beijing strains are over-represented among RR-TB strains in EC [17]. This genotype was detected previously (2008) in 50% of RR-TB isolates in EC and might have gained a selective advantage over other strains to spread in the community [1,15]. Atypical Beijing strains, however, seem to be less prevalent worldwide, with the exception of Japan, Vietnam and Taiwan [30][31][32][33][34]. Conversely, the Beijing cluster with 270 isolates was mostly found in WC (262 isolates), with the majority (199/262, 76%) of these isolates being from City of Cape Town. The presence of large Beijing clusters may indicate successful circulation of the lineage within the population. Nearly 80% of the reported MDR-TB cases in WC are due to primary transmission [35][36][37]. The distinct Beijing populations in EC and WC may be a result of geographically localized outbreaks with limited strain exchange between the two regions. A previous study showed that 75% of MDR isolates of the Beijing genotype in WC belonged to typical Beijing strains, while 92% of the Beijing genotype in EC belonged to atypical Beijing strains [15].
The X lineage showed the second highest clustering rate (45.8%) in this study. It is one of the predominant strain families in the WC. The X1 cluster containing 28 isolates and X3 cluster with eight cases were identified in WC. The X strains have been reported to have caused large DR-TB outbreaks in WC historically [8,38].
The estimated transmission rate for MP in this study was relatively high (33%). The transmission of RR-TB in MP seems to be driven by the EAI lineage, which showed a clustering rate of 42.3%. The two large EAI1_SOM clusters were mostly detected in MP (12/17 and 43/48 cases) (Table 4). The EAI strains are reported to be prevalent in neighbouring Mozambique and in east African countries such as Sudan, Djibouti, Malawi and Madagascar [19]. The EAI lineage may have been introduced to MP from Mozambique through cross-border movement, as there is high Mozambican migration to SA in search of better economic opportunities [39]. Unlike Beijing and LAM, the EAI lineage seems to be geographically restricted to MP, with limited transmission to GP and NW. This geographically restricted transmission may reflect adaptation of the strain to the specific population in that setting and/or the low transmissibility of the EAI lineage compared with other strains [40]. Previous studies reported that the EAI lineage was associated with notably low clustering rates, suggesting that it is less likely to be transmitted [40,41]. The transmission rate in the remaining provinces was lower (10-21%) than in WC and EC. The clusters were mostly small, with a few medium clusters in NW, KZN and GP. The LAM4 cluster with 31 cases was mainly detected in KZN (n = 13) and GP (n = 9) (Table 4). This cluster is likely the same as the F15/LAM4/KZN strain, previously described as endemic in KZN [2,24].
The exact drivers of higher rates of transmission in some provinces (districts) than in others are poorly understood. In WC and EC, the high prevalence of the Beijing genotype may play a role in driving transmission. The Beijing family has most often been associated with transmission of DR-TB in WC and EC [15,17,36]. Lack of ventilation due to the cold weather conditions in WC was also reported to contribute to transmission. Other possible drivers of transmission include population density, socio-economic factors (overcrowded living conditions, patterns of congregation and social mixing, public transportation), high HIV prevalence, and an inadequate TB control programme (inappropriate treatment or non-adherence to treatment, lack of surveillance, diagnostic and treatment delay). Thus, appropriately targeted interventions based on a better understanding of the drivers of RR-TB transmission at district level are needed to design successful control measures. This could be accomplished by strengthening district-level health systems and collecting additional data from patients in order to identify risk factors that facilitate transmission.
This study has several limitations. Firstly, there is a selection bias in the study population because only culture-positive samples in selected districts/provinces were included. Also, the surveillance system included only patients who accessed health care; patients who remained undiagnosed and/or died in the community would not be included. As a result, our findings may not be generalizable to the entire South African population. Secondly, sample collection in the different provinces occurred during different time periods, which could have affected the clustering analysis. Areas that had shorter sampling durations may have missed transmission events, leading to underestimated clustering. Thirdly, the epidemiological data needed to support patient-to-patient transmission within genotypic clusters were not available for this analysis, so cases that share a molecular cluster may instead reflect common endemic strains. Lastly, clustering and recent TB transmission rates may have been overestimated because the clustering analysis was based on traditional typing, whereas whole-genome sequencing (WGS) could have offered better resolution of strains and further discrimination between individuals in clusters. Despite these limitations, our study provides important information on the circulating RR-TB strains and potential transmission hotspots in SA.

Conclusions
Our study provides the first broad insight into RR-TB population structure and transmission in SA. Distinct geographic distributions of RR-TB genotypes were observed in this study, highlighting the need for geographically targeted interventions as well as further research to understand the reasons for such local expansions of specific genotypes. The higher prevalence of the EAI1_SOM genotype in MP is of concern and requires further investigation. Large clusters contribute to the burden of RR-TB in WC, EC and MP, highlighting the need for community-wide interventions that decrease transmission.
The high proportion of small clusters identified in the study suggests close-contact transmission events, emphasizing the importance of contact investigations and infection control as the primary interventions in SA. This highlights the urgent need to implement World Health Organization (WHO) and National Department of Health (NDOH) guidelines on TB preventive therapy for all high-risk contacts exposed to RR-TB at the household level. Doing so will help to reduce household transmission and thus the burden of morbidity and mortality due to TB.

Study population and setting
The surveillance included patients newly diagnosed with RR-TB via the Xpert MTB/RIF assay between 2014 and 2018. The surveillance was implemented in eight of the nine provinces, with at least one district targeted per province. In 2014, the surveillance was initiated in the following districts ( Based on operational and feasibility issues, some of the districts were limited to sentinel hospitals with feeder clinics, while for other districts the surveillance covered all facilities. The staggered timelines were in part due to implementation considerations (approvals, logistics, etc.), and new areas were added sequentially. Additionally, the health system operates differently in each area, which also affected how the surveillance was set up.
All RR-TB samples were submitted to CTB, NICD for culture and genotyping. All the methods were carried out in accordance with relevant guidelines.

Genotyping methods and analysis
All culture-confirmed samples were genotyped by a combination of spoligotyping and 24-loci MIRU-VNTR typing. Spoligotyping was performed using the international standardized method [10], and patterns were analysed and classified by shared international type (SIT) in accordance with the Fourth International Spoligotyping Database [20]. Standard 24-loci MIRU-VNTR typing was performed using the commercial kit (Genoscreen, Lille, France) and a 24-capillary ABI 3500xl genetic analyzer (Applied Biosystems, California, USA) as described by the manufacturer. Sizing of PCR fragments and MIRU-VNTR allele assignment were performed using GeneMapper software 5.0 (Applied Biosystems).

Cluster definition and analysis
A genotype cluster was defined as two or more patients having identical patterns by both spoligotyping and 24-loci MIRU-VNTR typing. A non-clustered (unique) case was defined as any case from the study population having a unique pattern not shared by any other case. The proportion of cases attributed to recent TB transmission (transmission rate) was calculated by the n-1 method according to the formula $(n_c - c)/n$, in which $n$ is the total number of cases in the sample, $c$ is the number of clusters (genotypes represented by at least two cases) and $n_c$ is the total number of cases in clusters of two or more [42]. The genotype diversity was also calculated (diversity = number of SITs divided by the total number of isolates).
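The cluster definition and the n-1 calculation described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. It assumes that each isolate is represented by a (spoligotype, MIRU-VNTR pattern) pair, and the function names, SIT labels and pattern strings below are hypothetical toy values chosen only for illustration.

```python
from collections import Counter


def cluster_summary(genotypes):
    """Summarise clustering for a list of isolates.

    genotypes: one (spoligotype, miru_vntr_pattern) pair per isolate.
    Two isolates fall in the same cluster only if BOTH elements match.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one isolate is required")
    counts = Counter(genotypes)
    # A cluster is a genotype shared by two or more isolates.
    cluster_sizes = [size for size in counts.values() if size >= 2]
    c = len(cluster_sizes)    # number of clusters
    n_c = sum(cluster_sizes)  # number of clustered cases
    return {
        "n": n,
        "clusters": c,
        "clustered_cases": n_c,
        "unique_cases": n - n_c,
        # n-1 method: one case per cluster is treated as the source case.
        "transmission_rate": (n_c - c) / n,
    }


def genotype_diversity(sit_labels):
    """Number of distinct SITs divided by the total number of isolates."""
    if not sit_labels:
        raise ValueError("at least one isolate is required")
    return len(set(sit_labels)) / len(sit_labels)


# Hypothetical toy data: seven isolates, two clusters (sizes 3 and 2).
toy_isolates = [
    ("SIT1", "pattern-A"), ("SIT1", "pattern-A"), ("SIT1", "pattern-A"),
    ("SIT1", "pattern-B"),
    ("SIT33", "pattern-C"), ("SIT33", "pattern-C"),
    ("SIT60", "pattern-D"),
]
summary = cluster_summary(toy_isolates)
diversity = genotype_diversity([sit for sit, _ in toy_isolates])
print(summary)
print(f"diversity = {diversity:.3f}")
```

In the toy data, the isolate with `pattern-B` shares a spoligotype with the first cluster but not a MIRU-VNTR pattern, so it is counted as a unique case. This mirrors the requirement that both typing methods agree.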
Descriptive statistics were used to present the number and proportion of clustered strains and clusters and the distribution of cluster size. Clusters were categorised into four groups according to their size: small (2-4 cases), medium (5-10 cases), large (11-25 cases) and very large (26 or more cases).
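As a hedged illustration, the four size bands reported in the abstract (small, fewer than 5 cases; medium, 5-10; large, 11-25; very large, 26 or more) could be encoded as follows. The function names and the example cluster sizes are hypothetical and are not taken from the study data.

```python
from collections import Counter


def size_category(cluster_size):
    """Map a cluster size to the size band used in this report."""
    if cluster_size < 2:
        raise ValueError("a cluster contains at least two cases")
    if cluster_size < 5:
        return "small"
    if cluster_size <= 10:
        return "medium"
    if cluster_size <= 25:
        return "large"
    return "very large"


def size_distribution(cluster_sizes):
    """Count how many clusters fall into each size band."""
    return Counter(size_category(size) for size in cluster_sizes)


# Hypothetical cluster sizes chosen to hit every band boundary.
example_sizes = [2, 3, 4, 5, 10, 11, 25, 26, 270]
print(size_distribution(example_sizes))
```

Dividing each count by the total number of clusters gives the proportion of clusters in each band.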