Whole genomic sequencing as a tool for diagnosis of drug and multidrug-resistance tuberculosis in an endemic region in Mexico

Background Whole genome sequencing (WGS) has been proposed as a tool for diagnosing drug resistance in tuberculosis. However, reports of its effectiveness in endemic countries with important numbers of drug resistance are scarce. The goal of this study was to evaluate the effectiveness of this procedure in isolates from a tuberculosis endemic region in Mexico. Methods WGS analysis was performed in 81 tuberculosis positive clinical isolates with a known phenotypic profile of resistance against first-line drugs (isoniazid, rifampin, ethambutol, pyrazinamide and streptomycin). Mutations related to drug resistance were identified for each isolate; drug resistant genotypes were predicted and compared with the phenotypic profile. Genotypes and transmission clusters based on genetic distances were also characterized. Findings Prediction by WGS analysis of resistance against isoniazid, rifampicin, ethambutol, pyrazinamide and streptomycin showed sensitivity values of 84%, 96%, 71%, 75% and 29%, while specificity values were 100%, 94%, 90%, 90% and 98%, respectively. Prediction of multidrug resistance showed a sensitivity of 89% and specificity of 97%. Moreover, WGS analysis revealed polymorphisms related to second-line drug resistance, enabling classification of eight and two clinical isolates as pre- and extreme drug-resistant cases, respectively. Lastly, four lineages were identified in the population (L1, L2, L3 and L4). The most frequent of these was L4, which included 90% (77) of the isolates. Six transmission clusters were identified; the most frequent was TC6, which included 13 isolates with a L4.1.1 and a predominantly multidrug-resistant condition. Conclusions The results illustrate the utility of WGS for establishing the potential for prediction of resistance against first and second line drugs in isolates of tuberculosis from the region. They also demonstrate the feasibility of this procedure for use as a tool to support the epidemiological surveillance of drug- and multidrug-resistant tuberculosis.


Introduction
According to the annual report of the World Health Organization in 2017, 10.5 million new cases and 1.7 million deaths were related to tuberculosis (TB), establishing TB as the infectious disease with the greatest impact on human health. [1] Close to 25% of TB cases recorded worldwide every year show resistance (DR-TB) to at least one of the four antibiotics used in first line treatment against this disease; Rifampicin (R), Isoniazid (H), Ethambutol (E) and Pyrazinamide (Z). Of these DR-TB isolates, about 5% evolve into multi-drug resistant tuberculosis; i.e., they present a combined resistance to isoniazid and rifampicin (MDR-TB), which may evolve into an aggravated form known as extreme drug-resistant tuberculosis (XDR-TB), which are MDR-TB isolates that possess simultaneous resistance to a flouroquinolone and at least one of the three second-line injectable drugs (amikacin, kanamycin and capreomicin). The number of isolates with these characteristics is steadily increasing, reducing the possibility of eliminating TB as a global public health problem by 2020. [2] Whole Genome Sequencing (WGS) provides accurate information about polymorphisms, insertions and deletions (indels) of potential relevance to the rapid prediction of drug susceptibility phenotypes of clinical importance [3,4]. The implementation of WGS has potential for improving traditional phenotypic procedures to detect DR-TB cases and to influence clinical decision-making in high-income settings [5][6][7][8][9][10][11], in addition, this technique has also been successfully used to examine transmission dynamics, identify transmission clusters and reconstruct timelines of acquisition of antimicrobial resistance [12][13][14][15].
The current body of knowledge regarding mutations that confer drug resistance has led to the suggestion that identification of true DR-TB isolates may be more complex than previously thought [16], raising the need to further this type of study in order to identify the changes involved and increase diagnostic efficiency. Moreover, there is a lack of evidence regarding the utility of WGS in emerging or poorly-developed countries, which are precisely those most affected by drug-resistant tuberculosis, and where information relating to the polymorphism associated with drug resistance is limited [6,[17][18][19][20][21] In Mexico, tuberculosis is endemic. According to the 2017 Global Report, 22,869 new cases of TB were reported in 2016, with an incidence of 22 cases per 100,000 inhabitants [1]. DR-TB cases were observed in 2.5% of these cases, and there was also a prevalent population of 610 individuals with MDR-TB. Moreover, increasing numbers of XDR-TB cases are reported every year [22]. These figures make Mexico the country in the Americas with the third highest  contribution of tuberculosis, including aggravated forms of TB such as DR-TB, MDR-TB and  possibly XDR-TB. [23] Previous studies using Sanger sequencing protocols have shown that Mexico presents a considerable diversity of mutations associated with DR-TB, as well as variations related to geographical regions [24][25][26][27][28][29][30][31], which act to limit the utility of new genotypic diagnostic procedures and make WGS a very attractive alternative. This study therefore attempts to assess the diagnostic value of WGS for the detection of resistance to first line antituberculosis drugs in isolates circulating in a region of Mexico with a high incidence of DR-TB.

Recovery of mycobacterial strains, culture and drug susceptibility test
An initial collection of 91 isolates of mycobacterial strains, with positive drug resistance determined by the phenotypic test method (BACTEC, MGIT 960 Becton-Dickinson) from 2012 to 2016, was randomly selected from the strain bank of the Public Health Laboratory of Veracruz. This is the state reference laboratory where sensibility test for first line drugs is made to all suspicious cases of tuberculosis drug resistant. Demographic information of the patients bearing the selected strains was obtained from clinical laboratory records and anonymised.

DNA extraction, WGS sequencing and bioinformatics analysis
Extraction and purification of the genomic DNA from isolates was carried out following Van Soolingen et al. [32]. Quantification was conducted initially with a nanodrop (ThermoScientific, USA) with subsequent adjustment to a concentration of 0.2 ng/μL. The libraries were prepared according to the Nextera XT (Illumina, CA., USA) protocol, using 1ng of DNA quantified by Qubit fluorometer (Invitrogen, CA, USA). Quality was determined using TapeStation (Agilent Genomics) and the pool was normalized and loaded into a 300-cycle mid output cartridge at 1.8pM. Sequencing was performed using NexSeq 500 (Illumina, CA., USA), in a 2 × 150 paired-end format.
Due to the possible presence of DNA material that is not of the M. tuberculosis complex (MTBC), Kraken software V2 [33] was used to classify the WGS reads and identify other Mycobacteria species. We further focused only on those reads that belong to the Mycobacterium tuberculosis species. The WGS analysis, including mapping and variant calling (SNP and INDELS), was performed following a previous pipeline [34]. Briefly, sequencing reads were mapped and aligned to an inferred MTBC most likely common ancestor genome using BWA. Variants were then separated into INDELS (small insertions and deletions included) and single nucleotide polymorphism (SNPs) using VarScan. In order to detect drug resistance-associated mutations with different frequencies within the population, single polymorphisms with at least 10 reads in both strands and a quality score of 20 were classified into two categories. We considered those mutations that involved a frequency range of 10 and 89% as variant-SNPs, and those with at least 90% frequency in the sample as constant-SNPs. An INDEL was considered where the mutation was present with at least 10x depth coverage. Variant annotation was performed using M. tuberculosis H37Rv annotation reference (GenBank accession number: AL123456.2). In addition, SNPs annotated in regions that were difficult to map, such as repetitive sequences and PPE/PE-PGRS genes, as well as those detected near INDELS, were excluded from the analysis.

Identification of drug resistance-associated variants
In order to determine mutations related to drug resistance in the isolates included in the study, SNP data obtained by WGS were compared to a list of genes known to confer resistance to each anti-tuberculosis drug [35]:

Phylogenetic analysis and identification of transmission clusters
With the aim to classify the isolates, and to identify whether they had a specific genotype and whether these were related, a phylogenetic analysis was performed. We classified all of the strains according to the presence of the specific lineages or sub-lineages of phylogenetic variants proposed by Coll et al. 2014 [36] and Stucki et al. 2016 [37]. In order to detect highly likely genomic transmission clusters, we performed a pairwise genetic distance analysis based on a concatenated SNP alignment obtained from every constant-SNP from every isolate. A �12 SNP threshold was applied to delineate the transmission clusters, as proposed by Walker et al. 2013 [38]. A phylogeny including all the clinical isolates, with three samples used as an outgroup (one belonging to Lineage (L) 5, another to L6 and Mycobacterium canetti [34]), was inferred using the maximum likelihood phylogenetic approach implemented in MEGA V6 [39] by applying a general time reversible model of nucleotide substitution, with a gamma distribution (GTRGAMMA) and considering 1000 bootstraps. The tree was visualized in iTOL v. 4 (https://itol.embl.de/tree) [40].

Analysis of the diagnostic efficiency of WGS
Drug resistant profiles for each first-line drug, as predicted by WGS, were compared to the BACTEC MGIT phenotypic assay that was used as a reference standard for the diagnostic. The sensitivity (Sen), specificity (Spe), positive predictive values (PPV) and negative predictive values (NPV) were determined with 95% confidence intervals. Data was analyzed using STATA V.13. Inter-test agreement for both results was expressed by the percentage of overall agreement and kappa (κ) statistic, according to Landis et al. [41].

Ethical concerns
No physical interventions took place with the patients and all of the information collected was treated as confidential. The ethics committee of the Public Health Institute at University of Veracruz oversaw and approved the ethical issues involved in this study

Isolate recovery, patient data and resistotyping
Ninety-one clinical isolates, obtained from the same number of individuals with TB and resistance to at least one first-line drug, were initially included in the study. Further genome analysis confirmed the presence of the species M. tuberculosis in 81 of the isolates. Four isolates presented M. fortuitum and two presented M. abscessus and it was not possible to determine the species in a further four isolates. These ten isolates were therefore excluded from the global study.

Polymorphism conferring resistance against first-line drugs
According to the BACTEC MGIT phenotypic assay, 61 isolates showed H resistance, of which 51 (62%) presented polymorphisms that could explain their condition, while no mutations were detected in ten (12%) of the isolates, according to the WGS analysis. The most frequent mutation identified was katG315S/T in 35 (60%) isolates, followed by the intergenic position 1673425 (c/t) of the fabG1 gene in nine (14%) isolates and the intergenic positions 2726105-2726145 of ahpC-OxyR, which were observed in seven (11%) isolates (Table 1).
Of the 51 isolates that were rifampicin-resistant according to the BACTEC MGIT phenotypic assay, 49 (96%) had a mutation in the rpoB gene. Some isolates presented more than one mutation, while two isolates presented no mutations. One sensitive isolate, as determined by the BACTEC assay, showed the rpoB430L/P mutation while another strain had the mutation rpoB432Q/L. Thirteen mutations were observed, the most frequent of which was found at rpoB450S/L in 35 (71%) of the isolates ( Table 2).
Of the 24 isolates with phenotypically E resistance, 17 (71%) presented polymorphisms that explained their condition, according to WGS. However, seven (30%) of the resistant isolates showed no mutation, mine while six isolates, defined as sensitive by the BACTEC MGIT phenotypic assay, showed one mutation related to resistance to this drug. The most frequent mutation was observed at embB306M/V/I in eleven isolates, followed by the embA promoter C-12T mutation at the intergenic genome position 4243221, which was observed in eight isolates (Table 3).
Of the 28 isolates with resistance to Z according to the phenotypic assay, 21 (75%) had mutations at pncA while seven presented a wild-type genotype. Four isolates phenotypically identified as Z-susceptible presented a mutation that could be associated with resistance. In total, 17 mutations were identified in pncA. The most frequent of these was pncA120L/P, which was detected in 14 isolates (Table 4).
Of the 38 isolates with phenotypic S resistance, only eleven (29%) presented a mutation that would explain the resistant condition ( Table 5). The remaining 27 (71%) isolates presented no change. Only one phenotypically-determined sensitive isolate showed one polymorphism associated with resistance. Nine mutations were observed in three genes (gidB, rpsL and rrs); the most abundant of which were rpsL43K/R and a c/t mutation at position 1472362 on the rrs gene, both found in three isolates each. (Table 5).

Polymorphism conferring multidrug and second line drug resistance
Of the 46 isolates identified as MDR-TB by the BACTEC MGIT method, 41 (89%) showed simultaneous mutations in the genes associated with resistance against rifampicin and isoniazid by WGS. Four isolates (7%) lacked any polymorphism associated with isoniazid resistance in the katG, inhA, oxyR-ahpC or FabG1 genes analyzed, and two isolates (3%) did not present a mutation associated with rifampicin resistance in rpoB, rpoA and rpoC genes.
It was not possible to determine phenotypic resistance against second-line drugs in the isolates analyzed because of limitations in reagents and infrastructure. However, the WGS analysis of genes related with resistance against these drugs showed that twelve isolates presented a mutation that would confer resistance to fluoroquinolones; ten had a mutation at gyrA,  conferring resistance to ofloxacin, and two had one mutation at gyrB, conferring resistance to levofloxacin (Table 6). In addition, five isolates presented mutations at eis and rrs, conferring resistance to kanamycin, amikacin and capreomycin. Finally, no polymorphisms were observed in genes associated with resistance to linezolid (rrl and rplC), aminoglycosides (tlyA), ethionamide (ethR, mshA, inhA, ndh and ethA) and p-amino salicylic acid (thyA). A joint analysis of MDR-TB isolates with the mutations related to resistance against second-line drugs showed the presence of eight isolates that could potentially be considered as pre-XDR-TB (with resistance to H and R, associated with resistance to FQ or a second-line injectable, but not both), as well as three isolates that were recognized as XDR-TB (with resistance to H and R, and resistance to any of the fluoroquinolones (such as levofloxacin or moxifloxacin)), and to at least one of the three injectable second-line drugs (amikacin, capreomycin or kanamycin).

Concordance between WGS mutations and phenotypic resistance and kappa test
Considering the diagnosis obtained by BACTEC MGIT phenotypic assay as the reference test, the efficacy of WGS in terms of predicting resistance against H, R, E, Z and S showed
The WGS analysis allowed identification of six transmission clusters (TC); TC1 included two isolates with L3, also sharing the polymorphisms related to a pre-XDR-TB character. The rest of the TC were located within L4; TC2 included three isolates with sub-lineage 4.3 (LAM), TC3 included three isolates, TC4 had two isolates and TC5 had three isolates, all included in the sub-lineage 4.1.2 (H). Finally, TC6 included thirteen isolates with sub-lineage 4.1.1 (X1-X3), all sharing a multidrug resistant condition. One presented the polymorphisms that could potentially classify it as pre-XDR-TB and another as XDR-TB (Fig 1).

Discussion
Mexico is a country that presents endemic tuberculosis and more than 20,000 cases are reported annually, of which close to 1,500 are DR-TB. The state of Veracruz has a population of close to seven million inhabitants, and contributes 13% (2,500) of the total number of TB cases in Mexico, close to 15% of the annually reported number of DR-TB cases in the country and occupies first place in terms of prevalent cases of MDR-TB [22]. For these reasons, Veracruz is positioned as one of the most significant contributors to the DR-TB and MDR-TB problem in Mexico. The epidemiological characteristics of the population included in this study, such as a high proportion of individuals with T2DM (36%) and primary treatment (47%), reflect previous reports for the region [24,25,30,42] and confirm the influence of T2DM as an important factor in the development of DR-TB and MDR-TB, as well as highlighting the difficulties for diagnosis, control and preventive management of DR-TB by the health agencies as well as evidencing the urgent need to develop new diagnostic procedures.
Among the 81 DR-TB isolates subjected to WGS, more than 200 polymorphisms were identified in 60 positions of the more than 35 genes analyzed. This evidences the diversity of mutations present in the drug-resistant isolates in circulation, and is in agreement with previous reports describing mutations identified by Sanger-type sequencing [27][28][29][30][43][44][45].
A sensitivity of 84%, specificity of 100% and kappa concordance of 88% were determined for the diagnosis of resistance to H; these values are similar to that described in reports from different locations [8][9][10][11]21] (Table 7). Ten isolates showed no mutations that could explain their phenotypic condition of resistance. The possibility exists that other mutations or unreported genes, such as pump flux, may be actively participating but further studies are required to confirm this possibility. There was a noteworthy presence of isolated carriers of unusual mutations, such as katGP325S and katGN138H [46], as well as an important number of mutations in the intergenic position of ahpC-OxyR, described as compensatory for isoniazid resistance and associated with katG mutations [47]. Considering all the above, the information obtained supports the proposal of the use of WGS as a useful tool for the diagnosis of isoniazid resistance [47].
The sensitivity of 96% and specificity of 94% observed for the diagnostic of R by WGS coincides with previous reports [8][9][10][11]21] (Table 7). Two isolates were notable for being phenotypically susceptible to rifampicin, despite the fact that they were carriers of the mutations  rpoB432Q/L and rpoB430L/P respectively, which have been considered as "disputed mutations" [48] that confers low-level R resistance [49]. Isolates carrying this type of mutation are phenotypically misclassified as rifampicin susceptible, and routine genotypic tests fail in terms of their detection [48]. This indicates that WGS could improve the local diagnostic of R resistance, and of MDR, considering that rifampicin resistance has become a subrogated marker of MDR-TB [1]. In this context, the analyses of the 51 R-resistant isolates included in this study showed that 46 (90%) of these were also MDR-TB according to the BACTEC MGIT phenotypic test, while 41 (80%) presented, through WGS, the mutations that explain that condition. The sensitivity for the diagnosis of resistance to E by WGS was 71%, which is one of the lowest described to date [8][9][10][11]21] (Table 7). Seven resistant isolates showed no mutations, while six sensitive isolates showed a mutation associated with resistance to this drug. These contradictory results can explain the low sensitivity and positive predictive values observed. Such variations in the occurrence of mutations associated with resistance to E are similar to those reported previously [50]. Resistant isolates lacking in mutations and the presence of mutations related to resistance in sensitive isolates show the possibility of failures in the BAC-TEC MGIT phenotypic assay, and further analysis is required in order to identify the factors that cause this low sensitivity and to improve the efficiency of WGS in predicting resistance against this drug among the isolates from the setting.
The sensitivity of 75% observed for Z was low compared to other reports [8][9][10][11]21] (Table 7); however, the specificity of 90% and kappa agreement of 84% supports the use of WGS for the diagnosis of resistance to this drug. The presence of four susceptible but mutation-carrying isolates and seven resistant but unmutated (pncA or rpsA) isolates can help to explain the low sensitivity found. However, technical problems with in vitro testing of Z resistance associated with inoculum concentration, volume and homogeneity [51], and even changes in the pH of the culture medium, have been considered to influence the diagnostic of false positive resistant Z strains [52]. The diversity and number of mutations and indels observed in pncA was the highest among all of the genes studied, and was similar to values previously described [29]. A deletion in the pncA47 position (2289101; -gttgc) and an insertion in pncA131 (2288850; +cc) were identified; both indels were described in isolates circulating in the region five years ago [29], raising the possibility that these isolates are still actively circulating in the present population.
The 29% sensitivity observed for S through the WGS was among the lowest reported [8][9][10][11]21] (Table 7). The mechanisms of resistance to S are complex and involve several genes. While those considered as canonical (gibB, rpsL and rrs) were characterized here, only a limited number of isolates carried a mutation; a behavior similar to that previously observed in isolates from the region [28], highlighting the need for further studies to identify the mutations involved with resistance against this drug. While S has not been used in Mexico as a first line drug against TB since the 1980s, the fact that 47% of the isolates analyzed presented resistance to this drug evidences the possible presence of "old" S resistant strains still in active circulation in the population and that actually participate in the resistance against first line drugs in TB. This could be of particular relevance in Mexico, considering the WHO recommendation for use of this drug as a second line drug in specific situations [53].
The sensitivity and specificity for the diagnosis of MDR-TB using WGS were 89% and 97%, respectively, and presented a kappa agreement of 92%. These values indicate the feasibility of using this technique as a support for prediction of the diagnosis of these aggravated forms of TB. There is no doubt that this will have a significant value in Mexico, since it is a country that generates an important number of cases of MDR-TB in Latin America.
Through WGS, eight isolates were identified as bearing the mutations that would confer a potential pre-XDR-TB character, and three isolates were predicted as XDR-TB; nevertheless, it was not possible to acquire their phenotypic profiles against second line drugs and confirm the real condition of these isolates. In this sense, previous reports describe that it is possible by means of WGS to identify XDR-TB isolates with sensitivity and specificity values exceeding 85% [6,8,10,18,54]. Thus, the possibility of generating a faster screening diagnosis of pre-and XDR-TB strains using WGS could have important implications for a middle-income country such as Mexico, where at least 6 to 12 months are required to establish a sensitivity profile against second line drugs in isolates suspected of having this condition.
While it was only a relatively small number of isolates studied, four lineages were identified; 90% (76) were placed in L4 (Euro-American), confirming the predominance of this lineage in the population [24,25,42,55,56] and the presence of the "founder effect", where the predominance of specific lineages seems to limit the dispersion of new or less frequent lineages [56][57][58]. Twenty-six (32%) isolates were located within six transmission clusters, of which two were particularly noteworthy: i) the TC1, which included the only two isolates with L3 (CAS) and ii) the TC6, comprising 13 (17%) MDR-TB isolates with sub-lineage 4.1.1 (X), including one isolate identified as pre-XDR-TB and another as XDR-TB. These CAS and X lineages are rarely referred in Mexico, particularly with such a strong tendency towards drug resistance [24,25,42,55,59] and additional studies will be necessary to determine the extension of this sub-lineages in the population.
The main limitation of this study probably relates to the fact that all of the isolates came from a single region of Mexico, even though it is one of the states with the highest number of cases of DR-TB and MDR-TB in the country. It would undoubtedly be necessary to increase the number of isolates sourced from different regions of the country, and to identify the mutations that are most frequently involved in the development of resistance to first-and secondline drugs. This information will guide the application of WGS, for early diagnosis and treatment of DR-TB, as well as the development of epidemiological studies to identify the risk factors associated with the development and transmission of infection generated by DR-TB in the Mexican population.
Finally, it is important to mention that in Mexico, the cost to determine sensitivity to first and second line drugs is close to 150USD, and if an analysis of XpertMTB/RIF is considered first this is increased to 250USD per sample per patient, while the cost to sequence a Mycobacterium genome is approximately 270 USD. These costs reinforce the possibility that the WGS could be used as a support tool for the diagnosis and surveillance of DR-TB.
In conclusion, this study illustrates the utility of WGS in terms of the diagnostic of resistance against first line drugs, MDR-TB and the screening of pre-XDR-and XDR-TB in Mexico; as well as presenting the valuable additional possibility of characterizing the genotypes in circulation and identifying transmission clusters, allowing further development of epidemiological-genomic surveillance studies.