Short- and Long-Read Sequencing Reveals the Presence and Evolution of an IncF Plasmid Harboring blaCTX-M-15 and blaCTX-M-27 Genes in Escherichia coli ST131

ABSTRACT Escherichia coli sequence type 131 (ST131) has contributed to the spread of extended-spectrum beta-lactamase (ESBL) and has emerged as the dominant cause of hospital- and community-acquired urinary tract infections. Here, we report for the first time an in-depth analysis of whole-genome sequencing (WGS) of 4 ESBL-producing E. coli ST131 isolates recovered from patients in two hospitals in Armenia using Illumina short-read sequencing for accurate base calling to determine their genotype and to infer their phylogeny and using Oxford Nanopore Technologies long-read sequencing to resolve plasmid and chromosomal genetic elements. Genotypically, the four Armenian isolates were identified as part of the H30Rx/clade C2 (n = 2) and H41/clade A (n = 2) lineages and were phylogenetically closely related to isolates from the European Nucleotide Archive (ENA) database previously recovered from patients in the United States, Australia, and New Zealand. The Armenian isolates recovered in this study had chromosomal integration of the blaCTX-M-15 gene in the H30Rx isolates and a high number of virulence genes found in the H41 isolates associated with the carriage of a rare genomic island (in the context of E. coli ST131) containing the S fimbrial, salmochelin siderophore, and microcin H47 virulence genes. Furthermore, our data show the evolution of the IncF[2:A2:B20] plasmid harboring both blaCTX-M-15 and blaCTX-M-27 genes, derived from the recombination of genes from an IncF[F2:A−:B−] blaCTX-M-15-associated plasmid into the IncF[F1:A2:B20] blaCTX-M-27-associated plasmid backbone seen in two genetically closely related H41 Armenian isolates. IMPORTANCE Combining short and long reads from whole-genome sequencing analysis provided a genetic context for uncommon genes of clinical importance to better understand transmission and evolutionary features of ESBL-producing uropathogenic E. coli (UPEC) ST131 isolates recovered in Armenia. Using hybrid genome assembly in countries lacking genomic surveillance studies can inform us about new lineages not seen in other countries with genes encoding high virulence and antibiotic resistance harbored on mobile genetic elements.

Studies reporting on the transmission and the genetic makeup of E. coli ST131 are well documented globally, including those that have used a combination of standard molecular techniques such as PCR and whole-genome sequencing analysis using short-read sequencing platforms with high throughput and accurate base calling to infer bacterial genotype and phenotype (11,(14)(15)(16). Furthermore, advances in long-read sequencing have enabled us to resolve repetitive elements so that we can identify the structure and positioning of mobile genetic elements (MGE) in chromosomes and plasmids, such as antibiotic resistance and virulence genes (17)(18)(19)(20). However, such genomic studies reporting whole-genome analysis of the E. coli ST131 lineage recovered in health care settings (including patients) in Armenia are absent. In this study, we report for the first time the whole-genome sequencing and analysis of the ST131 lineage of ESBL uropathogenic E. coli (UPEC) ST131 isolates recovered from patients' urine specimens in two hospitals in Armenia. To conduct the genomic analysis on Armenian ESBL-producing UPEC ST131, we used both Illumina and Oxford Nanopore Technologies (ONT) sequencing technologies to provide an in-depth genomic analysis identifying novel or rare genetic features within the Armenian isolates.

RESULTS
Isolates and antibiotic susceptibility testing. Four of 12 ESBL-producing UPEC isolates were identified as ST131 isolates (see Table S1 in the supplemental material). The four isolates identified as ST131 were received from medical microbiology laboratories (they were recovered from urine specimens from patients) of two hospitals, designated H4 (n 5 2) and H5 (n 5 2), in Yerevan, Armenia, between January 2019 and April 2019 (Table 1). These isolates were typed as O16:H5-fimH41 (ARM32 and ARM86) and O25b:H4-fimH30 (ARM42 and ARM46). All four Armenian E. coli ST131 isolates were resistant to the b-lactam antibiotics ampicillin and amoxicillin-clavulanic acid, the cephalosporin antibiotics ceftazidime and cefepime, and the fluoroquinolone antibiotics norfloxacin and levofloxacin. In addition, two isolates (ARM86 and ARM34) were resistant to the b-lactam antibiotic piperacillin-tazobactam; one isolate (ARM32) possessed intermediate resistance to piperacillin-tazobactam; one isolate (ARM46) was resistant to chloramphenicol; three isolates (ARM32, ARM46, and ARM86) possessed intermediate resistance to the aminoglycoside antibiotic amikacin; and one isolate (ARM46) possessed intermediate resistance to the carbapenem antibiotic imipenem. However, all Armenian E. coli ST131 isolates were sensitive to meropenem (a carbapenem antibiotic).
Phylogenetic analysis of ST131. To compare the Armenian E. coli ST131 isolates with those previously reported, a core SNP maximum-likelihood (ML) phylogenetic tree from 11,386 SNP sites with 99 recombination sites filtered out was constructed by aligning the short-read assemblies of our isolates to 2,496 E. coli ST131 isolate draft genomes retrieved from the European Nucleotide Archive (ENA) database that had been recovered from 28 countries (Fig. 1). These isolates were recovered from seven different sources (agriculture animals, avians, domestic animals, the environment, humans, meat, and wild animals), identified as belonging to five clades, 13 serotypes, and 27 fimH types (Table 2). Two Armenian fimH30 isolates (ARM42 and ARM46) were found within clade C2 and were phylogenetically closely related to human isolates recovered from the United States. Specifically, ARM42 was related to an isolate recovered from a blood sample in the United States (DABYPR01), and ARM46 was related to two isolates recovered from human stool samples in New York (DABHLM01 and DABHLJ01), as well as two isolates recovered from humans in New Zealand (ARM42 was related to DABNRT01, and ARM46 was related to DABNRO01, an isolate recovered from a blood sample). All isolates harbored the bla CTX-M-15 gene. The Armenian fimH41 isolates (ARM32 and ARM86) were found to belong to clade A and were phylogenetically closely related to human isolates from the United States that were recovered from urine samples in Pittsburgh, PA (DABAPB01), and the Northeast (DABYPM01), as well as human isolates from Australia that were recovered from fecal samples (DADOMG01 and DABGRC01) and a sample of lung fluid (DADPDJ01). Temporal analysis of the E. coli ST131 phylogenetic tree showed no temporal signal within the data set ( Fig. S1), with a correlation coefficient of 0.2. Accessory genome analysis revealed that 46,216 of 49,081 genes shared between 2,500 E. coli ST131 isolates were accessory genes. The t-SNE (t-distributed stochastic neighbor embedding) plot revealed that the majority of E. coli ST131 isolates shared many accessory genes with isolates from the same clade, indicating that accessory genes are clonally linked (Fig. 2). We found that isolates belonging to Armenian clade A (ARM32 and ARM86) shared many accessory genes (r 5 0.93, Z score 51.94, P 5 0.02) compared to the Armenian clade C2 isolates (ARM42 and ARM46) (r 5 0.88, Z score 5 0.97, P 5 0.16) (Table S4). ARM42 had a significantly high correlation in its accessory genome (r . 0.93, Z score 51.94, P , 0.02) with an isolate recovered from a blood sample in Canada (DABRYR01) and an isolate recovered from a human in New Zealand (DABNRT01).
Virulence gene comparisons. Overall, we identified 191 virulence genes within 2,500 E. coli ST131 isolates, with the mean number of virulence genes being 87 (range, 50 to 117) (Table S5). Among the Armenian isolates, we found that those (ARM32 and ARM86) belonging to clade A had 104 and 105 virulence genes, respectively, which was significantly higher (P , 0.01) than the mean number of virulence genes in clade A (mean, 87; range, 60 to 105). Furthermore, among the Armenian isolates, we also found that two isolates (ARM42 and ARM46) belonging to clade C2 had 88 and 79 virulence genes, respectively, similar to the mean number of virulence genes in clade C2 (mean, 87; range, 51 to 112).
By constructing the virulence-associated-gene hierarchy cluster heat map, we determined that the Armenian isolates (ARM32 and ARM86) found in clade A clustered with clade C1 isolates DADOVK01 (Jaccard similarity index 5 0.87), which was recovered from a rectal swab in Australia, and DABIKV01 (Jaccard similarity index 5 0.84) and DABIJI01 (Jaccard similarity index 5 0.86), which were recovered from a urine sample in Thailand (Fig. 3A). We found that the microcin H47 operon genes mchB, mchC, mchF, and mcmA, the S-fimbrial operon genes sfaB, sfaC, sfaD, sfaE, sfaF, sfaG, sfaH, and sfaY, the salmochelin operon, iroBCDE, and iroN were present in clade A Armenian isolates but were absent in other clade A isolates (n 5 375) in the ENA database. Furthermore, the microcin H47 operon genes were identified in only one other ST131 isolate (DADZMQ01), recovered from Denmark. The S-fimbrial operon and salmochelin operon were present in only 0.2% and 4.3% of the ENA ST131 isolates, respectively. Within clade C2, ARM42 clustered with isolates designated DABNBN01 (Jaccard similarity index 5 0.96) and DABXUG01 (Jaccard similarity index 5 0.95) (recovered from urine in the United States) (Fig. 3B), whereas ARM46 clustered with a clade A isolate, DADRBK01 (Jaccard similarity index 5 0.89) (recovered from urine in New Zealand) (Fig. 3C).
Synteny analysis of the Armenian isolates' plasmids revealed that ARM32p had high genetic similarities with both ARM86p1 and ARM86p2 (Fig. 7). Further analysis of ARM32p identified two regions of reverse synteny with ARM86p2 adjacent to each other and flanked by genomic sequences that had synteny to ARM86p1. These regions of synteny consisted of ARM86p2 between its IncFII replicon and IS15 and contained the ARM86p2 antibiotic resistance genes bla CTX-M-15 , aac(3)-IIe, cat3B, bla OXA-1 , and aac(69)-Ib-cr6. The other region of synteny with ARM86p2 contained the integrating conjugative element (ICE)-associated genes. There were multiple regions of synteny to ARM86p1 within the ends and middle sections of the same ICE-associated gene region. There was also a change in sequence orientation of the ARM86p IS26-bla CTX-M-27 -IS903B-IS15 region in ARM32p. ARM32p had all the antibiotic genes found in ARM86p2 but had only 6 of the 10 antibiotic resistance genes found in ARM86p1; it was missing tetA, APH(6)-Id, APH(39)-Ib, and sul2. Moreover, ARM32p possessed 3 copies in direct repeat of a genetic region found in ARM86p1 consisting of IS26, ISKpn11, the AAA family ATPase gene, aac(3), IIe, and IS26.
Our data analysis showed high genetic similarities of E. coli ST131 draft genomes obtained from the ENA database to the Armenian isolates' plasmids. We found that ARM42p1 had a high score (.0.999) of genetic identity to clade C2 ENA isolates that harbored the IncF [F31/36:A4:B1] genotype and had the same antimicrobial resistance (AMR) as ENA isolates recovered from Japan, Canada, the United Kingdom, New Zealand, Australia, and the United States. The Japanese isolate BGZA01 had the highest genetic similarity contained in its genome, whereas the Canadian isolate DABABP01 (recovered from a blood sample) had the highest BLAST coverage (0.93) relative to ARM42p1. ARM46p1 had a high score (.0.999) of genetic identity to clade C2 ENA isolates that possessed the IncF[F1:A1:B16] genotype and had the same AMR genes as ENA isolates recovered from France, Canada, the United Kingdom, New Zealand, Australia, and the United States. The U.S. isolate AATDBO01

DISCUSSION
In this study, we report for the first time the antibiotic resistance profiles and genetic features of four UPEC ST131 isolates recovered from patients in Armenia, where genomic surveillance is lacking. By combining the short-read sequencing for high base calling accuracy to infer Armenian isolates' genotype and phylogeny and long-read sequencing to resolve repetitive elements and identify structures and positioning of MGE within the chromosome and plasmid, we were able to perform an in-depth genomic analysis of the Armenian E. coli ST131 isolates (20). We found that two Armenian ST131 isolates were genotyped as H30Rx/ clade C2 (ARM42 and ARM46), one of the most prevalent E. coli ST131 genotypes that cause UTI globally (2,21). The other two isolates were genotyped as H41/clade A (ARM32 and ARM86), which has recently emerged as one of the most common ST131 genotypes causing UTI in Australia, New Zealand, and China (15,22,23). Intriguingly, ARM32 and ARM86 are phylogenetically closely related to each other and share many of the same accessory genes but were recovered from patients in two different hospitals, which may indicate a wide community transmission of this strain. Furthermore, we found that the Armenian isolates were phylogenetically more closely related to isolates recovered from Australia, New Zealand, and the United States and were more likely to share accessory genes such as those for virulence factors and antibiotic resistance genotypes than isolates recovered from other countries. Moreover, using long-read sequencing, we were able to determine that genetic elements in the ENA isolates (those that were recovered from countries such as Australia, New Zealand, and the United States) had high genetic similarities with the Armenian isolates. We hypothesize that because there are large Armenian diasporas in Australia, New Zealand, and especially the United States, with close family links in Armenia, it seems plausible that the transmission could have occurred through members of these Armenian communities (24,25). The antibiotic resistance phenotypes of Armenian ST131 isolates were consistent with that of ST131 isolates except in the case of ARM46, which demonstrated intermediate resistance to the carbapenem antibiotic imipenem (26,27). Antibiotic resistance genotyping also showed that carbapenem and chloramphenicol resistance was absent in ARM46, potentially indicating that it may have developed resistance via an unknown mechanism. Moreover, we were unable to determine the antibiotic resistance genotype for amikacin and piperacillin-tazobactam in ARM46, which, however, was determined for other Armenian isolates due to the identification of associated aac(69)-Ib-cr and bla OXA-1 genes. For all isolates, ESBL production was linked to the bla CTX-M-15 gene, which was plasmid borne in Armenian H41 isolates and chromosomal in H30Rx isolates. Chromosomal bla CTX-M-15 is generally considered rare in E. coli ST131 isolates, as IncF bla CTX-M-15 plasmids are very stable within the population even in the absence of antibiotic pressure (28)(29)(30). However, the general prevalence of chromosomal bla CTX-M-15 is still relatively unknown in whole-genome sequencing studies due to the use of short-read sequencing platforms, which have difficulties in differentiating between plasmid-borne and chromosomal genetic features in comparison to long-read sequencing (17,31). The chromosomal integration of bla CTX-M-15 in ARM46 was related to ISEcp1-bla CTX-M-15 , which previously was identified as chromosomal in ST131 isolates recovered in Zambia and Ireland (30,32), whereas the chromosomal integration of bla CTX-M-15 in ARM42 was found within an IS26 antibiotic resistance genomic island which was also present in ST131 isolates' chromosomes (obtained from the NCBI nucleotide database). Moreover, this IS26 antibiotic resistance island was also present in the Armenian H41 isolates' plasmids, which may suggest that this resistance genomic island was common among the Armenian ST131 isolates.
Moreover, we were able to determine that the Armenian H41 isolates harbored both :B20] plasmid, most likely from a recent recombination event between the plasmids' IS26 antibiotic genomic islands (33,34). Recombination events between IncF plasmids have been reported; however, the movement of the bla CTX-M-15 gene into the bla CTX-M-27 plasmid backbone in the E. coli ST131 genetic background is undocumented (13,35,36). The fusion plasmid of ARM32 also harbored many of the antibiotic resistance genes on the two plasmids found in ARM86, which may make the horizontal transfer of multiple antibiotic resistance genes easier. Furthermore, we found an increased copy number, detected by long-read sequencing with the ability to resolve repetitive elements, of the aminoglycoside antibiotic resistance genes aac(3)-IIe (1 copy in ARM86 and 3 copies in ARM32), which has been shown to increase the MIC of tobramycin (37)(38)(39). The evolution observed between these ARM32 and ARM86 plasmids demonstrates that recombination of bla CTX-M-15 can occur in the bla CTX-M-27 plasmid backbone within ST131-fimH41 backgrounds, which has not been seen before in other ST131 lineages, as they are restricted to carrying certain IncF plasmid types (13). Only one ENA isolate (DADOWZ01) that harbored both bla CTX-M-15 and bla CTX-M-27 genes possessed the IncF[F2:A2:B20] type, which suggests that this plasmid genotype may be the only genotype that can harbor both bla CTX-M-15 and bla CTX-M-27 . Moreover, as antibiotic resistance genes within IncF plasmids are acquired via homology recombination with common features in the IncF plasmid rather than via transposition, the development of the IncF[F2:A2:B20] genotype is most likely to occur due to allele exchange of two different IncF plasmids' IncFII replicons (34,36,40).
The Armenian H41 isolates harbored more virulence genes than ENA isolates that belonged to the same genotype. In particular, the Armenian isolates harbored a genomic island which contained the S-fimbrial operon (encoding adhesin factor), the salmochelin operon (encoding an iron siderophore), and a microcin H47 operon (which encodes a bacteriocin that inhibits other enteric pathogens) (41); this genomic island is known but uncommon in ST131 isolates. Previous reports of isolates harboring the microcin H47salmochelin-S-fimbrial genomic island generally showed increased virulence, suggesting that these genes provide a fitness advantage for E. coli, contributing to its survival and acquisition of virulence factors (42,43). Moreover, salmochelin and microcin H47 genes may have a role in promoting urinary tract colonization, as observed by the upregulation of salmochelin and microcin H47 genes when isolates are grown in urine (42)(43)(44)(45). As ST131 isolates are considered the most common sequence type causing UTI, the lack of this pathogenicity island suggests that these virulence factors may not be entirely necessary for them to be successful in causing UTI. Moreover, the mechanism of the success of ST131 in causing UTI compared to other E. coli sequence types is still relatively unknown (46)(47)(48)(49).
In summary, we report for the first time genetic features and transmission of ESBLproducing E. coli ST131 isolates recovered in Armenia using both short-and long-read sequencing technologies to provide in-depth genomic analysis to better understand Armenian isolates' genotype, phylogeny, and MGE structure and position within the chromosome and plasmid. Although we had only 4 isolates belonging to two different ST131 genotypes, we were able to identify the close relatedness of these isolates to those recovered in the United State, Australia, and New Zealand. Our data show that Armenian isolates included in this study possessed genetic features not commonly identified in other E. coli ST131 isolates, including the bla CTX-M-15 gene integrated into the ST131-H30Rx isolates' chromosomes and the acquisition of the pathogenicity island containing the S fimbrial, salmochelin siderophore, and microcin H47 virulence genes in ST131-H41 isolates. Moreover, using hybrid assembly of short-and long-read sequencing in ST131-H41 isolates, we detected the formation of IncF

MATERIALS AND METHODS
Bacterial isolation and identification. Twelve ESBL-producing UPEC isolates were received from medical microbiology laboratories of five hospitals in Armenia between January 2019 and August 2019 (Table S1) (50). All isolates were recovered from urine specimens from hospitalized patients. The isolates were identified as E. coli using matrix-assisted laser desorption ionization-time-of-flight mass spectroscopy (MALDI-TOF MS) as described previously (51). Four of 12 isolates were identified as E. coli ST131 by multilocus sequence typing (MLST) from whole-genome sequencing using mlst (https://github.com/ tseemann/mlst [accessed July 2021]) and were selected for use in this study.
Genome sequencing and assembly. The 12 E. coli isolates were whole-genome sequenced using the Illumina HiSeq platform. Genomic DNA was extracted using the TIANamp bacterial DNA kit (Tiangen, China), and paired-end sequencing libraries were constructed using Nextera XT DNA sample preparation kits or TruSeq DNA HT sample preparation kits (Illumina, USA) following the manufacturer's instructions.
The quality of short reads was checked using FastQC, with low-quality reads trimmed by Trimmomatic (53). Trimmed reads were de novo assembled using SPAdes (54).
Four of 12 isolates that were identified as E. coli ST131 were further sequenced for detailed chromosomal and plasmid genetic investigation using the Oxford Nanopore Technologies MinIon 1B sequencing platform. Genomic DNA for long-read sequencing was extracted using a Qiagen MagAttract HMW DNA kit (Qiagen, Germany), and the long-read library was constructed using a ligation sequencing kit (SQK-LSK109) (Oxford Nanopore Technologies, UK) and run on an R9.4.1 flow cell (Oxford Nanopore Technologies, UK). Guppy v6.3.4 was used for long-read base calling, filtering out reads with a Q score of , 9 (55).
Long and short reads were combined for hybrid genome assembly using two different assembly pipelines depending on the sequencing depth of long reads. All isolates were assembled using the Unicycler assembly pipeline for the detection of plasmids less than ,10 kb (56,57). The genome of one isolate (ARM42) was assembled using only Unicycler. Isolates ARM32, ARM46, and ARM86 (which had a long-read sequence depth greater than 50Â) were also assembled using Canu, Flye, Miniasm1Minipolish, Raven, NECAT, and NextDenovo/NextPolish (58)(59)(60)(61)(62)(63). A consensus of the long-read assemblies was constructed via Trycycler (64). The final Trycycler assemblies were polished with the short-read sequences of the same isolates using Polypolish and POLCA (65, 66).
Short-and Long-Read Sequencing of E. coli ST131 Microbiology Spectrum Genome selection for phylogenetic and genomic comparison. A total of 2,496 draft E. coli ST131 genomes obtained from the ENA database were selected for genomic comparison (accessed January 2022) ( Table S3). The isolates were selected based on the criteria that they (i) belonged to ST131 as determined with the mlst (https://github.com/tseemann/mlst accessed July 2021) typing scheme (PubMLST) (accessed January 2022) and (ii) were accompanied by data on their isolation, including date, source, and country.
Phylogenetic tree analysis. To reconstruct the core SNP maximum-likelihood (ML) phylogenetic tree, the short sequence reads of E. coli ST131 isolates recovered from Armenia and the isolates from the ENA database were first aligned with the ST131 reference genome EC958 (accession no. HG941718.1) using Snippy (https://github.com/tseemann/snippy). Recombination within the genome was filtered out using Gubbins (67). A phylogenetic tree was constructed using IQ-TREE v2.1.2 with the best model selected by ModelFinder and with ultrafast bootstrap replication set to 1,000 (68)(69)(70). The phylogenetic tree was visualized and annotated using iTOL (71). The temporal signal of the phylogenetic tree was determined by TempEst (72).
Genome annotation. E. coli ST131 isolate genomes were annotated using Prokka (73). The serotype was identified using ECtyper (74). Virulence-associated genes for each isolate were screened using ABRicate software (https://github.com/tseemann/ABRicate) in conjunction with the VirulenceFinder and VFDB database combined for virulence detection. Antibiotic resistance genotype was identified using Resistance Gene Identifier (RGI) software in conjugation with the Comprehensive Antibiotic Resistance Database (CARD) (accessed January 2022) (75). Antibiotic resistance related to deletion of outer membrane protein genes was also detected (76,77). A hierarchy cluster heat map of isolated antibiotic resistance genotypes and virulence-associated genes was constructed using the R package Pheatmap. The fimH allele type, presence of the ybbW G723A SNP in H30Rx isolates, and IncF replicon sequence type (RST) were determined using SKA (https://github.com/simonrharris/SKA) in conjunction with the FimTyper and pMLST IncF RST database (accessed August 2022) (78)(79)(80).
Genetic synteny and plasmid similarity analyses. Plasmid synteny between Armenian isolates was identified from the hybrid genome assembly using minimap2 and visualized using the R package gggenomes (https://github.com/thackl/gggenomes) (87). Mash Screen was used to determine whether the E. coli ST131 isolates' draft genomes contained sequences matching those in the hybrid assembly of the Armenian isolates' plasmids, with only those with a Mash Screen identity of .0.995 reported (88). Pyani was used to measure BLAST coverage of the sequence in the ENA database to the Armenian isolates' plasmids (89).
Statistical analysis. Student's t test was used to determine the significant difference between the number of virulence and antibiotic resistance genotypes between isolates. Z-scored normalization was used to determine the significance between accessory gene correlation data. A Jaccard similarity index was used to determine the similarity coefficient between two isolates' virulence genes and antibiotic resistance gene profiles.
Data availability. The short-and long-read sequencing data generated in this study were deposited in the ENA database under the accession numbers ERS14467679 (ARM32), ERS14467680 (ARM42), ERS14467681 (ARM46), and ERS14467682 (ARM86). Individual accession numbers for the genome sequencing data are included in Table S2. Genome assemblies of the isolates are included in Table S2 and deposited in the ENA database under the accession numbers GCA_951802855 (ARM32), GCA_955652485 (ARM42), GCA_951803545 (ARM46), and GCA_951802865 (ARM86). Both Illumina short reads and Oxford Nanopore long reads were uploaded to the ENA repository (project PRJEB51925) (Table S2).