Integrative utility of long read sequencing-based whole genome analysis and phenotypic assay on differentiating isoniazid-resistant signature of Mycobacterium tuberculosis

Background With the advancement of next generation sequencing technologies (NGS), whole-genome sequencing (WGS) has been deployed to a wide range of clinical scenarios. Rapid and accurate classification of drug-resistant Mycobacterium tuberculosis (MTB) would be advantageous in reducing the amplification of additional drug resistance and disease transmission. Methods In this study, a long-read sequencing approach was subjected to the whole-genome sequencing of clinical MTB clones with susceptibility test profiles, including isoniazid (INH) susceptible clones (n = 10) and INH resistant clones (n = 42) isolated from clinical specimens. Non-synonymous variants within the katG or inhA gene associated with INH resistance was identified using Nanopore sequencing coupled with a corresponding analytical workflow. Results In total, 54 nucleotide variants within the katG gene and 39 variants within the inhA gene associated with INH resistance were identified. Consistency among the results of genotypic profiles, susceptibility test, and minimal inhibitory concentration, the high-INH resistance signature was estimated using the area under the receiver operating characteristic curve with the existence of Ser315Thr (AUC = 0.822) or Thr579Asn (AUC = 0.875). Conclusions Taken together, we curated lists of coding variants associated with differential INH resistance using Nanopore sequencing, which may constitute an emerging platform for rapid and accurate identification of drug-resistant MTB clones.


Introduction
Infection with Mycobacterium tuberculosis (MTB) is still considered a health threat with about 10 million incident cases of tuberculosis (TB) and 1 million deaths every year [1]. Efforts of epidemiological control of TB are hampered by increased drug-resistant MTB strains and the absence of rapid and accurate assays for classifying drug resistance profiles. Among the 10 million TB cases, about 19% of newly treated cases or 43% of previously treated cases are resistant to at least one frontline antibiotic [1]. Furthermore, isoniazid (INH) resistance is the most common monoresistance, and it is associated with treatment failure and progression to multidrug-resistant TB [1]. Catalase-peroxidase encoded by the katG gene converts INH to its active form, which in turn acts by blocking the biosynthesis of mycolic acid which is required for cell wall synthesis [2]. Active INH is next targeted by enoyl acyl carrier protein reductase (inhA) and beta-ketoacyl ACP synthase (kasA) [2]. INH resistance is therefore exclusively associated with mutations in the katG or inhA gene [3].
Drug resistance in MTB is routinely evaluated using phenotypic methods, such as susceptibility tests or genotypic approaches [4]; however, discordance between distinct assays was frequently noted [5]. With advancement of high-throughput sequencing, the abovementioned discordance was evaluated using results of whole-genome sequencing (WGS) [6]. Nevertheless, the efficiency and accuracy of amplicon-based analyses toward the MTB genome encounter interference with the high-GC region [7]. Herein, the Oxford Nanopore Technologies (ONT) long-read sequencing platform and corresponding analytic workflow were subjected to the WGS of susceptible, low INH-resistance, and high INH-resistance MTB. Correlations of susceptible test results and minimal inhibitory concentration (MIC) tests with mutant profiles of the katG and inhA genes were monitored. Taken together, the ONT sequencing platform can potentially serve as an auxiliary test toward a precise diagnosis and treatment of MTB disease.

Study overview
We conducted WGS to identify non-synonymous variants within the katG and inhA genes in three groups of MTB clones, including (1) susceptible isolates, (2) low INH-resistant isolates, and (3) high INH-resistant isolates.

Ethics statement of the study cohort and sample collection
Enrollment of clinical isolates was reviewed and approved by the Institutional Review Board of Taipei Medical University (approval no. N201912076). In total, 50 MTB isolates were collected from clinical specimens at Taipei Municipal Wan Fang Hospital.

Susceptible test
All isolates in this study were examined at the Department of Laboratory Medicine, Taipei Municipal Wan Fang Hospital using an agar proportion assay in accordance to the Clinical and Laboratory Standards Institute (CLSI) methods. In brief, bacterial suspensions with a turbidity equivalent to 1.0 McFarland standard were prepared from fresh MTB colonies cultured on Lowenstein-Jensen medium. After examination of the turbidity using a nephelometer, the suspension was applied as the inoculum for all dilutions. One hundred microliters of 10 −2 and 10 −4 dilutions of the standard inoculum was spread on 7H10 agar with or without an anti-TB drugs, including Isoniazid (INH), Rifampin (RIF), Ethambutol (EM), Streptomycin (S) and concentrations in microgram/milliliter. Drug resistance was defined as more than 1% colony growth in the presence of the drug compared to colony growth in the absence of the drug.

Extraction of high-molecular weight (HMW) genomic DNA
MTB isolates were cultured on Lowenstein-Jensen media in the absence of an anti-MTB drug. Multiple colonies from a single isolate were emulsified and inactivated in 200 μL nuclease-free water at 95 °C for 15 min. Genomic (g)DNA was extracted using the Presto ™ Mini gDNA Bacteria Kit (Geneaid, Taipei, Taiwan) according to the manufacturer's instructions. The DNA concentration was measured using a fluorometric kit (GeneCopoeia, Rockville, MD, USA) and a Qubit fluorometer (ThermoFisher Scientific, Wilmington, DE, USA).

Whole genome sequencing and variant identification
WGS of MTB gDNA was conducted using a third-generation long read-sequencing approach. In brief, 500 ng of extracted gDNA was first homogenized with 8000 bps using a g-TUBE device (Covaris, Woburn, MA, USA) according to the manufacturer's instructions. The fractionated gDNA was subjected to library construction using a Ligation Sequencing Kit (SQK-LSK109; Oxford Nanopore Technologies (ONT), Oxford, UK) coupled with a Native Barcoding Expansion kit (EXP-NBD104 and 114; ONT) according the manufacturer's protocol. The barcoded library was captured, washed, and eluted from magnetic beads (AMPure XP, Beckman Coulter, High Wycombe, UK). Seven hundred nanograms of the pooled library was loaded and sequenced on MinION flow cells (FLO-MIN106D R9.4.1; MinION instrument; ONT). The sequence read number of each sample was 160,000 ~ 200,000 to meet a reading depth of 200. For analysis of sequencing results, the MinION-sequenced reads were uploaded through the EPI2ME desktop agent (ONT), and the quality and quantity of sequencing results were assessed using the EPI2ME website algorithm (https:// epi2me. nanop orete ch. com). The coverage rate of sequenced reads to the MTB reference was estimated by applying FastQ custom alignment (ONT). Analytical results of variant classification were aligned to the MTB reference (M. tuberculosis H37Rv, NC_000962.3) using Bacterial Small Variant Calling workflow composed of the Medaka variant calling pipeline with longread datasets via the EPI2ME Labs Launcher (ONT). The sequencing quality, reading depth, and variant calling with long-read datasets were repeatedly assessed using the CLC genomics workbench (Qiagen v21.0.3; CLC bio, Denmark).

Minimal inhibitory concentration (MIC) test
Tests were conducted using Sensititre MYCOTB MIC plates (ThermoFisher Scientific) according to the manufacturer's instructions. In brief, an MTB isolate was subcultured on 7H10 agar (Becton, Dickinson and Co., Sparks, MD, USA). Fresh colonies were resuspended with glass beads in a saline-Tween solution, and the turbidity was adjusted to a McFarland standard of 0.5. After 15 min, 100 µl of the resuspension was poured and mixed with 11 ml 7H9 broth containing oleic acid albumindextrose-catalase (Trek Diagnostic Systems, Cleveland, OH). Diluent (100 µl) was inoculated into each well of the MYCOTB plate that was covered with permanent plastic seals and incubated at 37 °C in 5% CO 2 . Plates were monitored on days 7, 10, 14, and 21 using a mirrored viewer. No visible growth with the lowest concentration was considered the MIC of each antibiotic.

Statistical analysis
Experimental results were statistically analyzed using a one-or two-way analysis of variance (ANOVA) followed by Tukey's multiple-comparison post-hoc test. Analytical results are presented as the mean ± standard error of the mean (SEM) and considered significant at p values of < 0.05 (*p < 0.05; **p < 0.01; ***p < 0.005). The utility of identified variants for predicting INH resistance was evaluated using the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) ratio as implemented in R programming.

Phenotypic resistance patterns of clinical MTB isolates
Drug-resistance patterns determined by the agar proportion method are displayed in Table 1. Ten isolates showed susceptibility to INH, rifampin (RIF), ethambutol (EM), and streptomycin (S). Among the 42 drug-resistant clones, 14 isolates were INH-resistant, and other isolates were multidrug-resistant (MDR)-TB (Table 1).

Statistical analysis of long-read sequencing results
The HMW gDNA extracted from the clinical isolates was subjected to sequencing using a long-read sequencer (MinION, ONT). Numbers with an average of sequenced and aligned reads per sample were filtered and analyzed using the CLC Genomics Workbench (v.21.0.2, Aarhus, Denmark) and EPI2ME desktop agent algorithm (ONT). As shown in Table 2, no statistical differences in sequencing or alignment efficiencies were identified among all groups. On average, results of the custom alignment with sequencing results using the EPI2ME desktop agent algorithm (ONT) showed more than 95% with over 200 × coverage in all samples enrolled in this study, and sequenced reads could be considered to virtually cover the entire MTB genome (Fig. 1A). Nevertheless, low coverage for a highly-repeated or homopolymeric region within the entire MTB genome was noted in this study.

Identification of non-synonymous variants within the katG and inhA genes in INH resistant isolates
The INH resistance of MTB isolates was frequently associated with a mutant katG gene, while inhA mutations were identified in a small number of INH resistant strains [8]. Results of custom alignment with the sequenced results using the EPI2ME desktop agent algorithm (ONT) showed more than 95% with over 200 × coverage to virtually cover the complete MTB genome (Fig. 1A), or over 100 × coverage to virtually cover the entire katG or inhA gene (Fig. 1B). No low coverage for a specific region within the MTB genome, of the katG, or inhA gene was noted.
After depleting the synonymous single nucleotide polymorphism (SNPs) found in all isolates enrolled in this study, five non-synonymous variants within the katG gene were identified in 10 susceptible isolates only (Table 3, upper; susceptible only). Seventeen novel variants (Table 3, upper; susceptible and resistant) and four previously-reported variants ( Table 3, upper, underlined) were characterized in susceptible or INH resistant isolates. Among the variants, the minimal confidence Arg463Leu (R463L) as previously reported was identified in 40% of susceptible isolates and 52.38% of all INH resistant isolates [9]. In addition, 55 amino acid substitutions were identified in INH resistant isolates with the long-reads sequencing results. Among the identified variants, 12 novel amino acid substitutions were identified in low-level INH resistant isolates (  Table 3, lower) [10].
In addition, 17 variants were identified within the coding region of inhA gene in susceptible isolates (Table 4, upper). Eight and ten amino acid substitutions within the coding region of inhA gene were characterized in the low-level or high-level INH resistant strains within the coding region of inhA gene. Among the identified candidates, the Asn139Lys and Asp150Glu variants were frequently identified within the inhA gene in the low-level INH resistant isolates (Table 4, lower; at a frequency of 25% and 35%, respectively). Moreover, 21 amino acid substitutions were identified within

Predictive values of identified variant toward INH resistance is estimated using receiver operating characteristic analysis
Overall agreement between the phenotypic and genotypic approaches for INH-resistance of MTB isolates was estimated in this study. Results of WGS and MIC tests showed better agreement ( Fig. 2A,

Discussion
The turnaround time of DSTs for MTB is a major challenge relevant to the efficacy of precise treatment [11]. Phenotypic methods take up to several weeks due to the low growth rate of MTB, which determines the therapeutic strategy toward MTB disease [12]. Advances in highthroughput sequencing approaches have expanded the practicality of genome analyses in clinical examinations which reduce the turnaround time of DSTs for MTB [13].  (2)  Low INH resistant isolate His24Leu(2)/ Ala58Gly(2)/ Gly83Glu(2)/ Gly102Arg(2)/ Asn139Lys(5)/ Asp150Glu (7) Met232Leu ( Among widely applied platforms, the ONT platform is suitable for sequencing the GC-rich or repetitive prolineglutamate regions within the MTB genome, which might be omitted in results of short-read sequencing [14]. Moreover, the increased accuracy of the ONT platform of assembling a precise MTB genome is achieved with a higher reading depth compared to that of the Illumina platform [7,15]. The MinION quality score does not equate to the same Phred score, a mean quality score of 20.07 regarding the per base quality score of MinION sequencing were assessed using the CLC genomics workbench in this study. In addition to well-characterized variants, emerging non-synonymous variants within the MTB katG or inhA gene were identified using the ONT platform in this study. The putative impacts of these variants on the INH-resistance signature should be further demonstrated. Ser315Thr has been widely demonstrated to be a highconfidence and frequent polymorphism that confers an INH-resistance signature of MTB [16], which was consistently characterized in our study through results of ONT sequencing. It was reported that most INH-resistant isolates harboring the Ser315Thr variant had an MIC of > 5 µg/mL [17], whereas nucleotide polymorphisms within other INH-resistant isolates with high MIC values have been sparsely investigated. In this study, results of the MIC test indicated that the presence of novel amino acid polymorphisms within the C-terminal domain of KatG conferred a high INH-resistance phenomenon.
INH-resistant isolates harboring the Thr579Asn variant consistently had the highest MIC of > 4 µg/mL, which was not previously disclosed in other studies. More functional and structural analysis should be conducted to pursue the biological impacts of the Thr579Asn variant on the KatG enzyme.
Agreement between the results of the phenotypic test and genotyping analysis is considered a critical issue when evaluating the utility of WGS results on determining the drug-resistance signature of MTB infection [18]. Phenotypic drug-resistance of MTB isolates with no resistance-related variants identified using a genotypic assay is the majority of discordances [19]. The discordance could be emphasized as follows. First, the accurate coverage or assembly of a high-quality entire MTB genome using an appropriate sequencing platform, including long read-sequencing strategy, might diminish the discordance between genotyping and phenotypic results [20]. Second, sequencing results of HMW gDNA extracted from an original sputum sample or subcultured isolate is worth further comparison and optimization [21,22]. Third, the DST analysis with MGIT 960 was demonstrated to exert incidences of false-positive drug resistance [23]. Fourth, standardization of the analytic workflow or threshold employed for the naming of nucleotide polymorphisms with sequencing results is required to acquire consistent results [24].

Conclusions
In our study, the long read-sequencing platform was subjected to SNP calling within the entire MTB genome. In addition to the high-confidence and frequent Ser315Thr mutations, the emerging Thr579Asn variant relevant to high INH-resistance was characterized using MIC and WGS analyses. Moreover, Cohen's kappa value indicated better agreement between the MIC and WGS analyses of differentiating high INHresistance of clinical MTB. These results suggested the practicable adoption of the ONT sequencing approach as an alternative to the current diagnostic method for DSTs of MTB.