A complete high quality nanopore-only assembly of an XDR Mycobacterium tuberculosis Beijing lineage strain identifies novel variation in repetitive PE/PPE gene regions

A better understanding of the genomic changes that facilitate the emergence and spread of drug-resistant M. tuberculosis strains is required. Short-read sequencing methods have limited capacity to identify long, repetitive genomic regions and gene duplications. We sequenced an extensively drug-resistant (XDR) Beijing sub-lineage 2.2.1.1 “epidemic strain” from the Western Province of Papua New Guinea using long-read sequencing (Oxford Nanopore MinION®). With up to 274-fold coverage from a single flow cell, we assembled a 4,404,947 bp circular genome containing 3,670 coding sequences that include the highly repetitive PE/PPE genes. Comparison with Illumina reads indicated a base-level accuracy of 99.95%. Mutations known to confer resistance to first- and second-line drugs were identified and concurred with phenotypic resistance assays. We identified mutations in efflux pump genes (Rv0194), transporters (secA1, glnQ, uspA), cell wall biosynthesis genes (pdk, mmpL, fadD) and virulence genes (mce-gene family, mycp1) that may contribute to the drug resistance phenotype and successful transmission of this strain. Using the newly assembled genome as a reference to map raw Illumina reads from representative M. tuberculosis lineages, we detected large insertions relative to the reference genome. We provide a fully annotated genome of a transmissible XDR M. tuberculosis strain from Papua New Guinea using Oxford Nanopore MinION sequencing and provide insight into genomic mechanisms of resistance and virulence.
Data Summary
Sample Illumina and MinION sequencing reads generated and analyzed are available in NCBI under project accession number PRJNA386696 (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA386696).
The assembled complete genome and its annotations are available in NCBI under accession number CP022704.1 (https://www.ncbi.nlm.nih.gov/sra/?term=CP022704.1).

Impact statement
We recently characterized a modern Beijing lineage strain responsible for the drug resistance outbreaks in the Western Province, Papua New Guinea. Although some of the genomic markers responsible for its drug resistance and transmissibility are known, there is a need to elucidate all molecular mechanisms that account for the resistance phenotype, virulence and transmission. Whole genome sequencing using short reads has been widely used to study the MTB genome, but it does not generally capture long repetitive regions, as variants in these regions are eliminated during analysis. Illumina instruments are known to have a GC bias, so that GC-rich or AT-rich regions are under-sampled, and this effect is exacerbated in MTB, which has approximately 65% GC content. In this study, we utilized Oxford Nanopore Technologies (ONT) MinION sequencing to assemble a high-quality complete genome of an extensively drug-resistant strain of a modern Beijing lineage. We were able to assemble all PE/PPE (proline-glutamate/proline-proline-glutamate) gene families, which have high GC content and are repetitive in nature. We show the genomic utility of ONT in offering a more comprehensive understanding of the genetic mechanisms that contribute to resistance, virulence and transmission. This is important for setting up predictive analytics platforms and services to support diagnostics and treatment.
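The GC content figure quoted above can be made concrete with a short calculation. The following is a minimal Python sketch, not part of the study's analysis; the function names `gc_fraction` and `windowed_gc` and the default window size are illustrative assumptions. It computes the GC fraction of a sequence and of fixed-size windows, which is one simple way to locate GC-rich stretches where short-read coverage might be expected to drop.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of called bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        # No unambiguous bases (e.g. all N): report 0.0 rather than dividing by zero.
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(sequence: str, window: int = 100, step: int = 100):
    """Return (start, gc_fraction) pairs for windows along the sequence.

    `window` and `step` are hypothetical defaults chosen for illustration.
    A sequence shorter than `window` yields a single, shorter window.
    """
    last_start = max(len(sequence) - window + 1, 1)
    return [
        (start, gc_fraction(sequence[start:start + window]))
        for start in range(0, last_start, step)
    ]
```

Applied to a whole genome, a value near 0.65 from `gc_fraction` would correspond to the approximately 65% GC content cited for MTB.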


Introduction
Globally, the tuberculosis (TB) incidence rate has shown a slow decline over the last two decades, although absolute case numbers continue to rise due to population growth, with an [...] is the induction of efflux pumps, which may lead to high-level resistance in mycobacteria (14), without any metabolic compromise. While previous studies described efflux pump genes and identified mutations in some of these genes (15, 16), efflux pump mutations in a transmissible XDR strain have not been described. Whole genome sequencing using short reads has elucidated a large number of mutations associated with drug resistance, as well as compensatory mutations, but has limited capacity to

Adaptor-ligated DNA was purified using 0.4X Agencourt® AMPure® XP beads (Beckman Coulter) following the manufacturer's instructions but using Oxford Nanopore supplied buffers (adaptor bead binding and elution buffers). The library was then ready for MinION® sequencing. [...] Oxford Nanopore and loaded onto the flow cell following the manufacturer's instructions, choosing a 48 h sequencing procedure. Illumina data for the strain were available from our previous study. Raw files generated by MinKNOW were base called using Albacore (v2.0) to return Oxford Nanopore Technologies (ONT) fastq files. De novo genome assembly was performed using Canu (30) and the assembly was improved using consensus with nanopolish (methylation-aware option) (31) and Pilon (32). The assembly was circularized using Circlator v1.5.1 (33) and compared with the reference genome H37Rv (NC_000962.3) using MUMmer (34).
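The headline assembly figures from the abstract (a 4,404,947 bp genome, up to 274-fold coverage, 99.95% base-level accuracy) are related by simple arithmetic. The following is a minimal Python sketch of that arithmetic only; it does not reproduce the Canu, nanopolish, or Pilon steps, and the function names are illustrative assumptions.

```python
GENOME_LENGTH_BP = 4_404_947  # assembled circular genome size reported in the abstract
BASE_ACCURACY = 0.9995        # base-level accuracy versus Illumina reads, from the abstract


def fold_coverage(total_sequenced_bases: int, genome_length: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_sequenced_bases / genome_length


def bases_for_coverage(fold: float, genome_length: int) -> float:
    """Total sequenced bases needed to reach a given mean fold coverage."""
    return fold * genome_length


def expected_discordant_bases(genome_length: int, accuracy: float) -> int:
    """Approximate number of positions expected to disagree at a given accuracy."""
    return round(genome_length * (1.0 - accuracy))
```

Under these definitions, 274-fold coverage of this genome corresponds to roughly 1.2 Gb of sequence, and 99.95% accuracy corresponds to about 2,200 discordant positions across the assembly.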

Genome annotation was performed using the NCBI pipeline (35) and circular representation of

No Illumina reads covered the PE_PGRS sub-family genes wag22 and PE_PGRS57. We first evaluated SNP calling in the non-repetitive part of the reference genome by [...] with susceptibility to only amikacin, kanamycin, para-aminosalicylic acid (PAS) and cycloserine. Table 1 reflects phenotypic resistance, as well as mutations in genes known to confer drug resistance to first- and second-line drugs and recognized compensatory mutations.

Genotypic drug resistance profiles concurred with phenotypic results. Ten mutations were identified in seven genes that encode trans-membrane efflux pumps and transporter proteins (Table 2). Table 3 shows 16 SNPs identified in genes that encode virulence proteins; 8 (50%) were from the mce-gene family and a mutation within mycP1 (p.Thr238Ala) was also noted. In [...] (Table S3). [...] with the insertion revealed PE8-PPE15 as the template used to predict the protein structure as a PPE family protein (79% sequence modelled, 100% confidence) consisting of 73% alpha helices (Fig. S6). A BLAST search of this gene sequence revealed 50% query coverage to four Mycobacterium tuberculosis H37Rv genomes (100% identity) and 100% query coverage to 55 Mycobacterium tuberculosis Lineage 2 genomes (Table S4).
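The query coverage values reported for the BLAST search can be read as the percentage of the query sequence spanned by at least one alignment. The following is a minimal Python sketch of that calculation, assuming 1-based inclusive query coordinates as in BLAST tabular output; the function name and the example query length are hypothetical and are not taken from the study.

```python
def query_coverage(query_length: int, aligned_intervals) -> float:
    """Percentage of query positions covered by at least one aligned interval.

    `aligned_intervals` is an iterable of (start, end) pairs in 1-based,
    inclusive query coordinates. Overlapping intervals are counted once.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted(aligned_intervals):
        start = max(start, current_end + 1)  # skip positions counted earlier
        if end >= start:
            covered += end - start + 1
            current_end = end
    return 100.0 * covered / query_length


# Hypothetical 1,000 bp query: a hit spanning only the first half gives 50%,
# as might occur when a reference genome lacks an inserted segment.
half_hit = query_coverage(1000, [(1, 500)])
full_hit = query_coverage(1000, [(1, 1000)])
```

A 50% value combined with 100% identity, as reported for the H37Rv genomes, is consistent with half of the query aligning perfectly while the remainder has no counterpart in those genomes.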

In this study, we utilized Oxford Nanopore MinION sequencing to assemble a [...] In conclusion, the assembly of a complete genome of an XDR "epidemic strain" using nanopore technology did not only provide proof of principle for future deployment of this