Genomic epidemiology of Escherichia coli isolates from a tertiary referral center in Lilongwe, Malawi

Antimicrobial resistance (AMR) is a global threat, including in sub-Saharan Africa. However, little is known about the genetics of resistant bacteria in the region. In Malawi, there is growing concern about increasing rates of antimicrobial resistance to most empirically used antimicrobials. The highly drug resistant Escherichia coli sequence type (ST) 131, which is associated with the extended spectrum β-lactamase blaCTX-M-15, has been increasing in prevalence globally. Previous data from isolates collected between 2006 and 2013 in southern Malawi have revealed the presence of ST131 and the blaCTX-M-15 gene in the country. We performed whole genome sequencing (WGS) of 58 clinical E. coli isolates at Kamuzu Central Hospital, a tertiary care centre in central Malawi, collected from 2012 to 2018. We used Oxford Nanopore Technologies (ONT) sequencing, which was performed in Malawi. We show that ST131 is observed more often (14.9% increasing to 32.8%) and that the blaCTX-M-15 gene is occurring at a higher frequency (21.3% increasing to 44.8%). Phylogenetics indicates that isolates are highly related between the central and southern geographic regions and confirms that ST131 isolates are contained in a single group. All AMR genes, including blaCTX-M-15, were widely distributed across sequence types. We also identified an increased number of ST410 isolates, which in this study tend to carry a plasmid-located copy of blaCTX-M-15 gene at a higher frequency than blaCTX-M-15 occurs in ST131. This study confirms the expanding nature of ST131 and the wide distribution of the blaCTX-M-15 gene in Malawi. We also highlight the feasibility of conducting longitudinal genomic epidemiology studies of important bacteria with the sequencing done on site using a nanopore platform that requires minimal infrastructure.


INTRODUCTION
Antimicrobial resistance (AMR) is one of the most serious global public health threats [1]. Of specific concern are Enterobacterales (formerly the Enterobacteriaceae) that are resistant to thirdgeneration cephalosporins such as ceftriaxone. The World Health Organization (WHO) has designated ceftriaxone-resistant Enterobacterales as a critical priority [2]. In sub-Saharan Africa (SSA), there is growing evidence that ceftriaxone-resistant Enterobacterales are important pathogens in invasive infections such as bacteremia. Given that ceftriaxone is often used to treat severe infections in SSA, and carbapenems are not often available, this is a major concern. OPEN ACCESS Among the Enterobacterales, Escherichia coli (E. coli) is a common cause of invasive disease, accounting for between 3 and 33 % of positive blood cultures in case series in Africa [3][4][5][6][7][8]. E. coli is becoming more resistant to commonly used antibiotics in SSA, including ceftriaxone. Additionally, recent evidence indicates that the highly drug resistant E. coli sequence type (ST) 131 has been increasing in prevalence globally [9][10][11]. This sequence type is an extraintestinal pathogenic E. coli (ExPEC) that is associated with bloodstream and urinary tract infections, often possessing genes associated with extended-spectrum β-lactamases (ESBLs) [12,13]. The main mechanism of cephalosporin resistance is drug inactivation mediated by hydrolysis of the β-lactam ring by ESBL enzymes. Cefatoxamine-resistance Munich (CTX-M) derivatives are the dominant and most widely distributed ESBL enzymes among E. coli [14]. CTX-M-15 is strongly associated with ST131 [15][16][17][18]. The global spread of ESBL-E. coli is largely attributed to the dissemination of E. coli strains carrying the bla CTX-M- 15 gene, especially E. coli O25b:H4-ST131 [11]. Previously, three major lineages of ST131 have been identified that differed mainly with respect to their fimH alleles: A (mainly fimH41), B (mainly fimH22) and C (mainly fimH30) [19]. Clade C has predominated since the 2000s, corresponding with the rapid dissemination of the bla CTX-M-15 allele [11,19,20]. There is growing evidence in SSA that ST 131 CTX-M-15 E. coli strains are increasing, but there remains a limited number of studies assessing the clonality of E.coli, the distribution of ST131 and the presence of bla CTX-M- 15 genes using whole genome sequencing [21][22][23][24].
In Malawi, specifically, there has been growing concern about increasing rates of antimicrobial resistance to most empirically used antimicrobials [25,26]. A recent genomic epidemiology study of 94 E. coli isolates collected at Queen Elizabeth Central Hospital, a tertiary care centre in southern Malawi, and analysed by whole genome sequencing (WGS), has shown that ST131 is the most common ST in southern Malawi at 14.9% of isolates sequenced. CTX-M-15 was found in 21.4% of ST131 isolates, but occurred across 11 STs [27]. The purpose of our study is to increase our understanding of the genomic epidemiology of E. coli in Malawi by conducting WGS, using Oxford Nanopore Technologies (ONT) sequencing performed in Malawi, of isolates collected at a tertiary care hospital in central Malawi. We use this data to define the clonality, virulence genes and antimicrobial resistance genes in the central region of the country and to compare these results with those for southern Malawi.

Sample selection
Sixty isolates were selected from the UNC Project-Malawi archives for sequencing. Samples were collected between 2012 and 2018. Organisms were isolated using sheep blood agar (trypticase soy agar prepared with 5% defibrinated sheep blood) and MacConkey agar (selective and differential medium for Gram-negative rods) as primary culture media. Depending on the year of sample collection, identification was done either using conventional biochemical tests (TSI, indole, citrate and urease) and or Analytical Profile Index (API) 20E, a biochemical panel for identification and differentiation of members of the family Enterobacteriaceae. Isolates were selected for this study on the basis of diversity of phenotypic resistance pattern and clinical source of isolation, similarly to a previous study in Malawi [27]. One isolate was excluded from further analysis due to poor sequencing coverage, and one was excluded as it was identified as Klebsiella, resulting in 58 isolates included in all analyses.

Antimicrobial resistance testing
The Kirby-Bauer disc diffusion method was used to measure the in vitro susceptibility of bacteria to antimicrobial agents at the UNC Project-Malawi microbiology laboratory at the time of isolate collection. Results were obtained with disc diffusion tests that use the principle of standardized methodology and zone diameter measurements correlated with minimum inhibitory concentrations (MICs) with strains known to be susceptible, intermediate, and resistant to various antibiotics. All aspects of the procedure were standardized as recommended by the Clinical and Laboratory Standards Institute (CLSI) in the document 'Performance Standards for Antimicrobial Susceptibility Testing' [28].

Whole genome sequencing
Overnight cultures of the isolates were grown in 5 ml of LB broth at 37 °C. Cell pellets from the broth culture were recovered from 1.5 ml centrifuged at 10 000 g for 2 min. Cell pellets were resuspended in 100 µl of nuclease-free water. DNA was extracted from the resuspended pellet using the Zymo Quick-DNA Microprep Kit (cell suspensions protocol) as per the manufacturer's instructions (Zymo Research). The DNA was quantified using the dsDNA kit on a Qubit 2.0 (Thermo Fisher). Equal amounts of DNA (~100 ng) from each isolate were used for library preparation using the Rapid Barcoding

Impact Statement
Antimicrobial resistance is a global public health emergency. Although rates of resistance are high in Africa, little is known about the genetics and resistance mechanisms of clinically important bacteria. Here we characterize the molecular epidemiology of Escherichia coli isolates from a tertiary referral hospital in Malawi and compare these with historical isolates from the same country. Consistent with their global expansion, we show that ST131 is observed more often and that the bla CTX-M-15 gene is occurring at a higher frequency between studies. However, phylogenetics indicates that isolates are highly related between the central and southern geographic regions. This study highlights the feasibility of conducting longitudinal genomic epidemiology studies of important bacteria with the sequencing done on site using a nanopore platform that requires minimal infrastructure.
(RBK-004) as per the manufacturer's protocol (Oxford Nanopore Technologies). Pooled libraries of 12 isolates were run on a R9.4.1 flow cell on a MinION/MinIT for 24 h at UNC Project-Malawi. Each flow cell was washed once per protocol and a second set of 12 isolates were run for an additional 24 h or until all pores were exhausted.

Typing of isolates
Each assembly was aligned using minimap2 to databases of known serotypes, fimH type, genomic and plasmid sequence types including plasmid incompatibility, virulence factors, and antimicrobial resistance genes, listed in Table S1 (available in the online version of this article) [35][36][37][38][39][40]. Matches were made for each assembly using similar cutoffs to those used by the Centre for Genomic Epidemiology (CGE; https:// cge. cbs. dtu. dk/ services/ data. php) tools. Specifically, we required 60% of the feature to match at >90% sequence identity to serotype markers (fliC, wzx/wzy), fimH variant, virulence genes, antimicrobial resistance genes, and plasmid incompatibility group markers. For multi-locus sequence type (MLST) and fimH type, we report the closest hit (or multiple hits in case of a tie). Assembled contigs were identified as plasmids if they were found to contain at least one plasmid incompatibility type marker. For bla CTX-M-15 analysis, contigs >600 kbp without plasmid markers are inferred to be genomic, in all but two cases they are ~5 Mbp. No plasmid markers were found on any contig >300 kbp.

Species identification
We detected O and/or H serotypes and multi-locus sequence types in 59 of 60 presumed E. coli samples. The remaining sample appeared to have assembled well and a blast search revealed it was a strain of Klebsiella. We used this sequence as an outgroup in our phylogenetic analysis and otherwise excluded it from further analysis. An additional isolate (#31) was excluded from analysis for poor sequencing coverage, leaving 58 samples for all downstream analyses.

Phylogenomics
Assembled genomes were aligned to a set of single-copy orthologs largely conserved across Enterobacterales (BUSCO v4, https:// busco. ezlab. org/). A subset of 92 genes were selected that appear in all 58 samples that were included in the final analysis. A multiple-sequence alignment was performed for each gene using muscle (3.8.31) [41]. This concatenated alignment includes 108 278 sites, 16 729 informative SNPs. RAxML-ng was used to reconstruct a maximum-likelihood phylogenetic tree for the concatenated alignments with model parameter 'GTR+G' (Fig. 1) [42]. For the comparison to existing Malawi E. coli isolates, we used publicly available sequence data listed in Table S2. For each of these samples, a genome was assembled from Illumina sequence data with SPAdes (3.14.0) using default parameters [43]. A phylogeny incorporating these sequences was generated as described above, using a subset of 80 genes present in all 149 genomes (Fig. 2). This 92 gene alignment includes 125 994 sites and 18 282 informative SNPs. To evaluate the accuracy of these phylogenies based on a limited set of highly conserved genes in representing the global genomic phylogeny, we generated a more liberal 'core' gene alignment using Roary [44] that captures a much larger portion of the genome and found few differences (Supplemental Methods, Figures S1 and S2).

Data analysis
Data were analysed using STATA/SE version 16.1 (StataCorp LLC, College Station, TX, USA). Descriptive statistics were used to describe clinical characteristics, frequencies of gene detections, and disc diffusion results. Given the small overall sample size, two-sided Fisher's exact tests were used to assess for associations between genes detected and phenotypic resistance testing by Kirby-Bauer disc diffusion. P-values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate method [45].

Isolate characteristics and sequencing data
Five sequencing runs generated 9 291492 reads totalling 40.8 Gbp with a mean read length of 4391 bp and N50 of 8732 bp. Sequencing summary statistics across samples are described in Table 1. Clinical and patient characteristics for the isolates are summarized in Table 2. The majority were from a urinary source (59%) and were from females (74%). A significant number of isolates were from sterile sites, including blood (24%), cerebrospinal fluid (3%), and joint (3%). Patient white blood cell counts and haemoglobin levels were available for 20 of the isolates; medians and interquartile ranges are included in Table 2.
Kirby-Bauer antimicrobial susceptibility testing results are summarized in Table 3. Notably, the majority of isolates were resistant to amoxicillin and trimethoprim-sulfamethoxazole (TMP-SMX) and 57 % (33/58) were resistant to ceftriaxone. These are commonly prescribed antibiotics in the outpatient (amoxicillin and TMP-SMX) and inpatient (ceftriaxone) settings in Malawi [46].

Sequence types, H and O groups
Twenty-one different ST groups were found among 53 isolates. Five isolates could not be assigned to a unique ST group (Table S3). ST131 was the most common type identified [19 of 58 isolates (32.8%)], followed by ST410 which was present in 9 of 58 isolates (15.5%). Other STs that occurred in more than one isolate included ST69 (5.2%), ST38 (3.4%), ST617 (5.2%) and ST12 (3.4%). Of the 19 ST131, 18 of them were fimH30 and therefore clade C. One isolate was fimH27 and classified as clade B0. The fimH30 was linked to O25 and O16, while the fimH27 strain was O18 but also ST131. The population contained 23 different O groups (Table S4), with three samples being unable to be called, and 17 different H group calls (Table S5). A complete data set of genotype calls and sequencing characteristics is available in Table S6.

Population structure of E. coli in Malawi
A phylogenetic tree of the isolates with previously reported E. coli genomes from southern Malawi [27] is shown in Fig. 2. Notably, the ST131 isolates cluster into a single group and have relatively little genetic distance between them in the tree.

Genetic determinants of antimicrobial resistance
Our analysis identified 69 unique AMR genes that are known to encode proteins associated with antimicrobial susceptibility across a range of compounds (Table 4). AMR genes occurred across a range of ST and phylogenetic groups (Fig. 1).

β-Lactam and ESBL resistance
We identified 23 genes associated with β-lactam resistance, including 10 ESBL genes (Table 4). Extended-spectrum β-lactamases included bla    and bla  . Forty-eight out of 58 (83%) isolates carried at least one ESBL gene, and a total of 40 (69%) isolates carried more than one ESBL gene. The presence of any ESBL gene was associated with phenotypic resistance to ceftriaxone (P=0.004). Fifteen isolates with an ESBL gene retained phenotypic susceptibility to ceftriaxone by disc diffusion. All of these isolates each had only one of the following ESBL genes: bla EC-15 , bla EC-18 , bla EC- 19 .   (Table S2). Samples cluster by sequence type, but are well mixed between the two geographically and temporally separated studies. In the left column, blue indicates isolates from this study and grey from the previous study [17]. The rightmost column indicates the MLST where sequence types occurring more than once are assigned a unique colour (all others are left grey).

Fluoroquinolone resistance
We extracted the sequence of the gyrA gene from each isolate and translated them into protein sequences. We examined these for the presence of gyrA mutations that have previously been described in Malawi [27]. We identified the S83L mutation in 24 isolates and the D87N mutation in 22 isolates. In all cases, the D87N mutation co-occurs with the S83L mutation; in only two cases was a single mutation (S83L) observed. In addition, we identified the qnrB1 gene and the qnrS1 gene in 1 (2%) and 5 (9%) of isolates, respectively. These genes did not co-occur in isolates with gyr mutations.

Aminoglycoside resistance
We identified 13 genes associated with aminoglycoside resistance in these isolates (Table 4). Several genes were associated with gentamicin resistance by disc diffusion, aac(3)-IIa (P<0.001), aac(3)-IId (P<0.001), and ant(3″)-Ih (P=0.004). We also detected strA or strB in combination in 50 of the isolates, as has been seen in southern Malawi previously [27]. When found together, these genes confer resistance to streptomycin, which is used in tuberculosis therapy in Africa [47].

Resistance to other antimicrobials
We identified six genes associated with chloramphenicol resistance, the most common being catA1 in 11 (19%) of isolates and catB3 in 21 (36%) of isolates. We did not identify isolates with floR, which has previously been identified in southern Malawi [27]. Detection of catA1 was associated with intermediate susceptibility or resistance to chloramphenicol by disc diffusion (P=0.004), but detection of catB3 was not (P=0.613). We identified 11 genes associated with resistance to TMP-SMX, the most common being the trimethoprim resistance gene dfrA17 (60% of isolates) and the sulfonamide resistance genes sul1 (67% of isolates) and sul2 (88% of isolates). Of these three genes, only sul2 detection was associated with TMP-SMX intermediate susceptibility or resistance by disc diffusion (P=0.025); all isolates that were phenotypically intermediate or resistant to TMP-SMX had sul1 and/or sul2. We detected sul1 and sul2 in 35 of the 58 isolates. Finally, we identified a handful of resistance genes to other antimicrobials, including rifamycin, macrolides, fosfomycin, and tetracyclines (Table 4). Interestingly, the most common AMR genes detected were multidrug efflux pumps, acrF and emrD, both of which occurred in all isolates.

Plasmid incompatibility group
We identified eight different plasmid incompatibility groups ( Table 6). The most common were incF incompatibility groups, at least one variant of which (incFIA/incFIB/incFII) was found in all but seven isolates (88%). Although the majority of ST131 were in this group, it was not associated with ST131 (P=0.567) as many non-ST131 isolates had incF  Table S6 for all inc type combinations.

Virulence genes
In total, we identified 37 virulence genes (Table 7) with each isolate carrying a median of seven genes (IQR 3-10). The virulence factors spanned a range of functions, including acid resistance, adhesion, invasins, metalloproteases, and toxins. The most common gene detected was gad, which occurred in all 58 isolates and encodes glutamate decarboxylase, an enzyme linked with bacterial ability to resist environmental stresses [48]. Multiple adhesin proteins were also identified. The most common of these were papC, papH, papG-II, and IpfA. papC (36% of isolates), papH (35% of isolates), papG-II (35% of isolates) are all involved in pili function.
IpfA encodes the long polar fimbriae associated with human diarrheal disease, and occurred in 19 of 58 isolates [49]. A single protectin encoded by iss was identified and was the second most common virulence factor identified in 35 of 58 isolates. The iss (increased serum survival) gene was first identified in a human septicemic E. coli isolate and was associated with a 20-fold increase in complement resistance and a 100-fold increase in virulence toward 1-day-old chicks [50][51][52]. Multiple siderophores were identified, with the most common being iha, occurring in 29 of 58 isolates [53]. We also identified two common toxins, sat which occurred in 23 of 58 isolates and senB in 22 of 58 isolates. The secreted autotransporter toxin (sat) appears to fall within one subgroup of autotransporters recently classified as the serine protease autotransporters of Enterobacteriaceae (SPATE) family. It acts as a vacuolating cytotoxin for bladder and kidney epithelial cells [54]. The senB gene encodes the TieB protein, which may have some role in enterotoxicity of EIEC [55,56].

DISCUSSION
In this study, we sequenced 58 E. coli isolates collected at a tertiary care centre in Lilongwe, Malawi between 2012-2018. To our knowledge, this is one of the first studies from Malawi demonstrating the feasibility of performing all steps of the whole genome sequencing process, including DNA extraction,   library preparation, and the sequencing itself, on site in a local laboratory.
Relatively little data exist concerning the genomics of E. coli in Africa to date [21][22][23][24]. The isolates sequenced as part of our project collected in Lilongwe, in the central region of Malawi, are highly phylogenetically similar to those seen in a previous study conducted in Blantyre, Malawi, in the southern region [27]. Similarly to the previous report, we identified a diverse set of AMR genes, with similar genes occurring across a range of E. coli lineages in Malawi. We also provide additional information on common virulence genes associated with E. coli in Malawi, many of which are shared across a range of lineages.
In this joint phylogenetic analysis, we combined nanoporeonly assembled genomes with solely short-read-based assemblies. This approach has the potential to introduce biases in both gene prediction and single-nucleotide polymorphisms due to the divergent error profiles of the two technologies. Nanopore sequencing in particular is known to suffer from a relatively high error rate among raw reads that is dominated by short insertions and deletions and primarily in homopolymer stretches [57]. Although we sequenced to sufficient depth that the vast majority of these errors were eliminated during the consensus generation and multiple polishing steps that produced the final assembly, undoubtedly there existed some residual errors above the rate typically observed with short-read sequencing platforms like Illumina [58]. Where these platform-specific errors are shared among samples sequenced using the same platform, there is the possibility of these segregating biases making their way into the predicted phylogeny. While we observe appropriate clustering of clades by sequence type regardless of sequencing approach (Fig. 2), there is, in several cases, apparent segregation of samples within STs (notably ST131) that is consistent with either true phylogenetic structure or segregating sequencing platformspecific biases. Since the samples collected in our study and the previous study [27] are separated both temporally and geographically, it is reasonable to expect within-ST variation to be attributable to segregating biological variation, but we cannot conclusively rule out that sequencing bias contributes to this phenomenon without careful manual inspection of the segregating sites.  15 gene, has been increasing in prevalence [9][10][11][15][16][17][18] As expected, there is some subtle structure within ST131, with isolates from this study being primarily localized on a branch with a longer internal branch length (Fig. 2).
Another difference from previous work is the higher number of ST410 isolates in this study, which carried the bla CTX-M-15 gene at the highest frequency of all the sequence types. Overall, there does not appear to be any strong associations between overall pattern of AMR gene content, virulence gene content, isolation site, and ST within the isolates (Fig. 1), similarly to previous results [27].
Although globally there is a strong association between ST131 and presence of the bla CTX-M-15 gene, here we identified the bla CTX-M-15 gene across a diverse set of lineages [11]. This finding is similar to those from other studies in Tanzania and Malawi, where the bla CTX-M-15 gene was found across numerous STs [22,27]. We see the bla CTX-M-15 gene in eight different sequence types, including ST131. The ST that most commonly contained the bla CTX-M-15 gene was ST410. All of the ST131 isolates that contained the bla CTX-M-15 gene were O25b:H4-ST131 in this study. Overwhelmingly, the ST131 isolates were clade C, containing the fimH30 gene, consistent with the global expansion of ST131-H30 [12]. Interestingly, there is a strong association between ST and where the bla CTX-M-15 gene is carried in the ST131 and ST410 isolates. ST131 was much more likely to have a genomic location for the gene, whereas ST410 more frequently carried the gene on a plasmid. This may have implications for how the bla CTX-M-15 gene spreads in Malawi. Given the diverse lineages that carry the bla CTX-M- 15 gene, additional studies are needed to better understand the epidemiology of this gene in Malawi.
Overall, the patterns of AMR gene prevalence were similar to the one previous report from southern Malawi. sul2, strA, strB, dfrA, bla TEM-1 , and sul1 genes remained very common in the population. Interestingly, chloramphenicol resistance gene prevalence was lower than previously reported, with a decrease in the prevalence of catA from 64.9 % to 22 % [27]. This potentially reflects decreased use of the drug in the community with changing treatment guidelines and increasing availability of alternative agents with fewer side effects [46]. Most resistance genes were detected broadly across different genotypes in this study (Fig. 1).
In summary, we confirm that the E. coli population in Malawi is highly diverse, with evidence for increasing prevalence of the ST131 group in the country. We see a higher proportion of ST 131 isolates and a higher prevalence of the bla CTX-M-15 gene in our isolates, which were collected a few years later than those described in previous reports. This expansion is consistent with the global increase in O25b:H4-ST131 bearing the fimH30 gene. E. coli genotypes are similar between two major tertiary care hospitals that are quite distant, with highly related isolates being found between the sites. As previously seen, AMR genes, including the bla CTX-M-15 gene, are broadly contained across sequence types. A high diversity of virulence genes were seen within E. coli isolates. These data were collected by conducting ONT sequencing in Malawi, highlighting the possibility of conducting rapid longitudinal genomic epidemiology studies of consequential bacteria in sub-Saharan Africa where the sequencing is conducted on site.

Funding information
This project was funded by a Yang Biomedical Scholars Award from the University of North Carolina.