Alterations in chromosomal genes nfsA, nfsB, and ribE are associated with nitrofurantoin resistance in Escherichia coli from the United Kingdom

Antimicrobial resistance in enteric or urinary Escherichia coli is a risk factor for invasive E. coli infections. Due to widespread trimethoprim resistance amongst urinary E. coli and increased bacteraemia incidence, a national recommendation to prescribe nitrofurantoin for uncomplicated urinary tract infection was made in 2014. Nitrofurantoin resistance is reported in <6% urinary E. coli isolates in the UK, however, mechanisms underpinning nitrofurantoin resistance in these isolates remain unknown. This study aimed to identify the genetic basis of nitrofurantoin resistance in urinary E. coli isolates collected from north west London and then elucidate resistance-associated genetic alterations in available UK E. coli genomes. As a result, an algorithm was developed to predict nitrofurantoin susceptibility. Deleterious mutations and gene-inactivating insertion sequences in chromosomal nitroreductase genes nfsA and/or nfsB were identified in genomes of nine confirmed nitrofurantoin-resistant urinary E. coli isolates and additional 11 E. coli isolates that were highlighted by the prediction algorithm and subsequently validated to be nitrofurantoin-resistant. Eight categories of allelic changes in nfsA, nfsB, and the associated gene ribE were detected in 12412 E. coli genomes from the UK. Evolutionary analysis of these three genes revealed homoplasic mutations and explained the previously reported order of stepwise mutations. The mobile gene complex oqxAB, which is associated with reduced nitrofurantoin susceptibility, was identified in only one of the 12412 genomes. In conclusion, mutations and insertion sequences in nfsA and nfsB were leading causes of nitrofurantoin resistance in UK E. coli . As nitrofurantoin exposure increases in human populations, the prevalence of nitrofurantoin resistance in carriage E. coli isolates and those from urinary and bloodstream infections should be monitored.


DATA SUMMARY
Sequence reads and genome assemblies of E. coli isolates IN01-09 have been deposited in the European Nucleotide Archive (ENA) under BioProject accession number PRJEB38850. Nucleotide and protein sequences of nfsA, nfsB, and ribE alleles in genomes of these nine isolates can be accessed in our database NITREc ( github. com/ wanyuac/ NITREc). The database also offers nucleotide sequences of genetic elements related to the interrupted nfsA/nfsB regions in IN01-03. All supplementary files are available on figshare (https:// doi. org/ 10. 6084/ m9. figshare. 16638457. v1) [106].

INTRODUCTION
Nitrofurantoin is a synthetic nitrofuran compound that has been widely used as a first-line antimicrobial agent for treating urinary tract infection (UTI) since 1953 and is active against a broad range of Gram-negative and Grampositive bacteria, including Escherichia coli, most species of Staphylococcus and Enterococcus [1,2]. It can reach a urinary concentration of 50-250 mg/L given a regular therapeutic or prophylactic dose while retaining negligible blood (<5 mg/L) and gastrointestinal concentrations, providing an advantage for treating UTI [3][4][5]. Although the antibacterial activity of nitrofurantoin is not fully understood, studies have revealed that metabolic intermediates from nitroreductase-mediated reduction of nitrofurantoin can damage DNA and RNA, as well as disrupting protein production in bacteria [6][7][8].
E. coli is the predominant pathogen of uncomplicated UTI, and the increasing prevalence of isolates resistant to trimethoprim led Public Health England to recommend prescribing nitrofurantoin for UTI since November 2014 [9]. Despite clinical use of nitrofurantoin for nearly 70 years, the prevalence of nitrofurantoin resistance in E. coli remains relatively low in Europe. Up to 2016, fewer than 6% of E. coli isolates collected from urine specimens in Western, Northern, and Southern European countries were resistant to nitrofurantoin [10][11][12][13][14]. Separate UK-based studies showed that nitrofurantoinresistant E. coli accounted for 5% of urinary and bloodstream E. coli isolates collected in London, during 2005London, during -2006 and 2011-2015, despite higher prevalence of resistance (>20%) to other commonly prescribed oral antimicrobials in the same isolates [15][16][17].
Since E. coli was not widely exposed to nitrofurantoin in the UK until the adoption of new prescribing guidelines, it is worthwhile monitoring the prevalence of nitrofurantoin resistance among UK E. coli isolates. The reported low prevalence of nitrofurantoin resistance might be explained by the broad intracellular target range of toxic nitrofurantoin metabolites and a reported fitness cost among nitrofurantoinresistant mutants [6,18]. Two kinds of genetic determinants of nitrofurantoin resistance have been identified in Enterobacteriaceae so far: (a) loss-of-function mutations in chromosomal genes nfsA, nfsB (encoding oxygen-insensitive nitroreductases NfsA and NfsB, respectively), and ribE (encoding 6,7-dimethyl-8-ribityllumazine synthase, involved in the reduction of nitrofurantoin) [7,8]; and (b) acquired gene complex oqxAB (encoding a multidrug efflux pump OqxAB) [19]. By contrast, bacteria with impaired DNA-repair ability showed increased susceptibility to nitrofurantoin [20,21].
Understanding the mechanisms underpinning antimicrobial resistance (AMR) is central to predicting the impact of antimicrobial prescribing guidelines. Nevertheless, there is little or no understanding of the basis of nitrofurantoin resistance in UK E. coli. This is hindered by the fact that nitrofurantoin susceptibility testing is mostly limited to urinary tract isolates rather than enteric or invasive bloodstream isolates, while a vast array of whole-genome sequencing (WGS) has focussed on bloodstream isolates or isolates with multidrug-resistance phenotypes. We set out to uncover the genetic basis of nitrofurantoin resistance in nine E. coli isolated from UTI patients in north west London. We then screened a large genomic dataset of 12412 UK E. coli isolates for resistance-associated genetic alterations to enhance our understanding of the wider distribution of nitrofurantoin resistance and genetic mechanisms that influence nitrofurantoin's interaction with E. coli.

E. coli isolates and nitrofurantoin susceptibility testing
Following the protocol and breakpoint from the European Committee on Antimicrobial Susceptibility Testing (EUCAST), automated disc diffusion tests for nitrofurantoin susceptibility (Oxoid CT0034B 100-µg nitrofurantoin discs, ThermoFisher Scientific, USA) in clinical urinary E. coli isolates were routinely performed by a microbiology laboratory of the Imperial College Healthcare NHS Trust in north west London. The laboratory serves a population of 2.5 million. In the period 2018-2019, 18 viable nitrofurantoinresistant E. coli isolates were collected, however, the susceptibility result could be reproduced for only nine of these isolates

Impact Statement
This study expands knowledge on the genetic basis of nitrofurantoin resistance in E. coli using whole-genome sequencing and in-depth genomic analysis. We report novel and previously discovered deleterious mutations in chromosomal genes nfsA, nfsB, and ribE as well as the interruption of nfsA and nfsB by insertion sequences, recapitulating the roles of oxygen-insensitive nitroreductases in the development of nitrofurantoin resistance in E. coli. We outline and categorise alterations in these three genes identified in a large collection of UK E. coli genomes. A proposed scoring algorithm is able to predict the level of nitrofurantoin susceptibility from genotypes, and the predictions suggest that acquired nitrofurantoin resistance is not of immediate concern in the UK. Our results also suggest a need to monitor nitrofurantoin resistance amongst E. coli.
(IN01-09) when frozen stock was retrieved and then retested using the same disc diffusion method. Aerobic MIC of nitrofurantoin was measured for each of the nine resistant isolates using MIC Test Strips (Liofilchem, Italy). Nitrofurantoinsusceptible E. coli isolate EC0098B (nitrofurantoin MIC: 32 mg/L) was chosen as a negative control. MICs were reported as per conventional two-fold series of concentration increments (2,4,8, … mg/L) and interpreted in accordance with the EUCAST breakpoint (v10.0) [22].

Whole-genome DNA extraction and sequencing
Genomic DNA of the nine resistant isolates IN01-09 was extracted using a phenol/chloroform method [23], checked for integrity using gel electrophoresis, and stored at −20 °C. DNA libraries were prepared in a paired-end layout with NEBNext Ultra II FS DNA Library Prep Kit (New England BioLabs, UK) and sequenced under a 150-bp read length using Illumina HiSeq 4000 systems and HiSeq SBS Kit v4 reagents (Illumina, USA). Read demultiplexing and adapter trimming were performed by the Imperial BRC Genomics Facility using their production pipeline.

Public E. coli genomes used for comparative analysis
In order to identify genetic variants related to nitrofurantoin resistance using a comparative approach, the NCBI Pathogen Detection portal ( www. ncbi. nlm. nih. gov/ pathogens/), NCTC 3000 Project, EnteroBase, NIHR Health Protection Research Unit (HPRU) isolate collection (BioProject accession: PRJEB20357; manuscript in preparation), and literature were searched for E. coli genomes from isolates with known nitrofurantoin-susceptibility profiles [8,[24][25][26][27]. Complete and draft genome assemblies were downloaded from the NCBI nucleotide database and sequence reads of unassembled genomes were downloaded from the NCBI Sequence Read Archive (SRA). In total, 208 E. coli genomes were obtained from the search, consisting of 196 genomes downloaded from the GenBank database ( www. ncbi. nlm. nih. gov/ genbank/) (Table S1 'NCBI') and 12 genomes from clinical isolates collected from north west London by the HPRU in 2015-2016 (Table S1 'HPRU').

Quality control of sequence reads
Quality summaries of all read files were generated using FastQC v0.11.9 and compiled using MultiQC v1.8 [28,29]. With Trimmomatic v0.39, the reads of IN01-09 were trimmed for an average base quality of Phred Q20 in a 5-bp sliding window and were then filtered for a minimum length of 50 bp [30]. Seqkit v0.12.0 was used for randomly sampling the filtered reads of each genome to an 80-fold coverage, assuming an average genome size of 5 Mbp [31,32]. For reads downloaded from the SRA, case-by-case read trimming and filtering was conducted using Trimmomatic in order to deal with large variation in read quality and to obtain a coverage of less than 100 folds. Genomes having less than 40-fold coverage were excluded from further analysis. DNA contamination in reads was evaluated for each genome using Kraken v2.0.8-beta and its full bacterial database (accessed in February 2020) [33].

De novo genome assembly and annotation
Reads of each genome were assembled with Unicycler v0.4.9b, which used SPAdes v3.13 for initiating assembly graphs [34,35]. Quality of assemblies were evaluated based on summary statistics calculated by QUAST v5.0.2 [36]. Complete plasmid sequences were identified in Bandage visualisation of assembly graphs [37]. Assemblies were annotated through Prokka v1.13 with a reference protein database representing proteins extracted from all E. coli genomes that were publicly available in the NCBI RefSeq database by March 2020 [38]. Redundant protein sequences were removed from this database using CD-HIT v4.8.1 [39] under a minimum amino acid identity of 70%.

Genomic comparisons of E. coli with known nitrofurantoin-susceptibility profiles
Multi-locus sequence typing (MLST) of the nine genomes IN01-09 was conducted using ARIBA v2.14.4 [40], and the MLST for other 208 collected genomes of E. coli isolates with known nitrofurantoin-susceptibility profiles was conducted using mlst ( github. com/ tseemann/ mlst). The Achtman scheme for E. coli, which was downloaded from PubMLST database ( pubmlst. org) with ARIBA in March 2020, was used for the MLST. To depict the overall genomic relatedness between the 217 genomes and to facilitate the selection of a reference genome for each of genomes IN01-09, pairwise core-genome and accessory-genome distances were estimated from genome assemblies with PopPUNK v2.0.2 given k-mer lengths increased from 15 bp to 23 bp by a 2-bp step size [41]. Phylogroups of the 217 genomes were determined from genome assemblies using ClermonTyping with contigs shorter than 1 kbp excluded [42]. Single-nucleotide polymorphisms (SNPs) between seemly identical genomes IN01 and IN02 were identified using Bowtie2 v2.4.1 and BCFtools v1.9 as implemented in the cgSNPs mapping pipeline ( github. com/ wanyuac/ cgSNPs) [43][44][45] (Supplementary Method S1). The PopPUNK phylogenetic tree and population data were visualised using iTOL v6.3 [46].

Identifying genetic alterations of nfsA, nfsB, and ribE
For each of sample genomes IN01-09, the most closely related complete genome of a nitrofurantoin-susceptible isolate was chosen from the collection of 208 genomes as a reference according to core-genome distances calculated by PopPUNK. Reads of each sample genome were then mapped to the chromosome sequence of the selected reference genome using Bowtie2 under the mode sensitive-local. SNPs and indels were identified from mapped reads using BCFtools when filtering out low-confidence variant calls (QUAL ≤20, DP ≤10, or MQ ≤20) as well as low-quality base calls (Phred Q <20). Moreover, SNPs and indels within repetitive regions (nucleotide identity ≥90%) identified in the selected reference sequence with nucmer v3.1 were excluded for accuracy [47], and the remaining variants in the sample genome were annotated using SnpEff v4.3t and a customised database built from the selected reference sequence [48].
Bandage v0.8.1 [37] was used to search for loci of nfsA, nfsB, and ribE in genome assemblies of IN01-09 and to extract the identified allele sequences. SNPs and indels were identified in these alleles using web-based megaBLAST ( blast. ncbi. nlm. nih. gov) and compared to those identified through read mapping. Interruptive insertion sequences were inferred from assembly graphs using Bandage and were searched against the ISFinder database for reference insertion sequences and classification [49]. Insertion sites of these insertion sequences were determined with ISMapper v2.0.1 [50]. Copy numbers of nfsA, nfsB, and ribE in each sample genome were predicted from mapped reads using CNOGpro v1.1 (100-bp sliding windows and 1000 bootstrap samples per gene) [51,52]. Translated sequences of these three genes were extracted from Prokka annotations and were aligned using the ClustalW algorithm implemented in MEGA X [53,54]. The functional effect of each amino acid substitution was predicted using PROVEAN Protein (accessed in June 2020).

Validating interrupted nfsA and nfsB
Insertion sequences that might have interrupted genes nfsA and nfsB in genomes IN01-03 were identified by visualising Unicycler assembly graphs using Bandage. The sequence of each interrupted region was extracted from the assembly graph using Bandage and was then confirmed using Sanger sequencing of PCR products. PCR primers (Table 1) were designed with Primer3web v4.1.0 ( primer3. ut. ee) [55] to ensure a complete coverage of each template genomic region. PCR was carried out using a T100 Thermal Cycler (Bio-Rad, USA) and GoTaqDNA Polymerase (Promega, UK). PCR primers and products were sent to Genewiz (UK) for Sanger sequencing. Reads were trimmed to remove ambiguous bases before alignment.

Detecting acquired antimicrobial resistance genes
An ARIBA-compatible reference database of acquired AMR genes was created from the ResFinder database (commit hash: 7e1135b) through quality filtering and sequence clustering (nucleotide identity ≥80%) using ARIBA and CD-HIT-EST v4.8.1 [39,56]. Acquired AMR genes in genomes IN01-09 were detected from sequence reads using ARIBA and this reference database, whereas these genes in other E. coli genomes were identified from genome assemblies using ABRicate ( github. com/ tseemann/ abricate) [57] and the ResFinder database without clustering.

Searching for nitrofurantoin-resistance determinants in the UK E. coli population
A non-redundant reference database NITREc ( github. com/ wanyuac/ NITREc) was created from allelic and translated sequences of nfsA, nfsB, and ribE in all the 217 E. coli isolates of known nitrofurantoin susceptibility. In particular, the allelic and protein sequences from 169 NCBI isolates and 12 HPRU isolates that had nitrofurantoin MICs ≤32 mg/L or were considered nitrofurantoin-susceptible based on disc diffusion methods were used as reference sequences for identifying wildtype alleles or proteins. CD-HIT-EST and CD-HIT were used to deduplicate reference sequences in the database. Computer scripts developed for gene detection and mutation identification are available in the NITREc code repository ( github. com/ wanyuac/ NITREc/ tree/ master/ Script).
Genome assemblies of UK E. coli isolates without available nitrofurantoin susceptibility data were then downloaded from NCBI nucleotide databases (as of August 2020; Table S2 'UK_public_total') or retrieved from an ongoing NIHR HPRU study of E. coli bacteraemia and intestinal colonisation. Collection dates and locations of isolates were retrieved Nucleotide BLAST (megaBLAST) v2.9.0 [58] was used to identify and extract allele sequences of nfsA, nfsB, ribE, oqxA, and oqxB (script searchGenes. pbs), and the extracted sequences were then translated into protein sequences using script translateDNA. py. Particularly, alleles of nfsA, nfsB, and ribE from nitrofurantoin susceptible E. coli strain ATCC25922 and alleles of oqxA and oqxB from the ResFinder database were used as queries for the gene search. Since missense mutations in the start or stop codon may reduce the length of a BLAST alignment, hits showing partial or complete truncation of either codon were manually verified. CD-HIT-EST and CD-HIT were used to determine identical alleles and translated sequences, respectively. Sequence alignments were generated using Clustal Omega ( www. ebi. ac. uk/ Tools/ msa/ clustalo). Nonsense mutations were determined through comparing lengths of predicted protein sequences to the reference sequences. Missense mutations were identified in the alignments of protein sequences using script missense-Finder. py, which identifies amino acid substitutions in each query protein sequence by comparing it to its most closely related reference protein sequence in the NITREc database. Finally, using script findKnownMutations. py, missense mutations were searched for those known to be associated with nitrofurantoin resistance (Tables S3-S5) and those carried by isolates IN01-09.

Evolutionary analysis of missense mutations in nfsA, nfsB, and ribE
For each gene, full-length, indel-free alleles were translated into protein sequences using script translateDNA. py (codon table: 11). Premature proteins (due to nonsense mutations) and their corresponding alleles were identified and then excluded using script rmProteinsByLength. py. A multisequence alignment of remaining alleles was generated using programme MUSCLE in software package MEGA X [59] and was then converted into a codon alignment using script pal2nal. pl [60]. The nucleotide nonsynonymous-substitution rate (dN) over synonymous-substitution rate (dS), denoted as ω=dN/dS, was used to reflect the direction and pressure of natural selection on each of the genes nfsA, nfsB, and ribE [61]. A maximum-likelihood (ML) estimate of ratio ω for each gene was then calculated from the codon alignment using a reference-free approach implemented in GenomegaMap v1.0.1 [62], assuming a constant ω per gene.
For each of genes nfsA, nfsB, and ribE, SNP sites were identified in a deduplicated set of allele sequences that were used for estimating the ω ratios -namely, untruncated alleles that only carried missense mutations. An ML phylogenetic tree was reconstructed for each gene from the alignment of deduplicated alleles. Specifically, nucleotide sequences were deduplicated using CD-HIT-EST before tree reconstruction. An extended selection of substitution models by IQ-Tree v1.6.12 (parameters: -t BIONJ -m MF -mtree) was performed for each gene [63]. Since the best-fit model differed between genes, candidate models of each gene were sorted in an ascending order of models' Bayesian information criterion (BIC) scores, and the model consistently ranked <10 across all three genes was chosen for the tree reconstruction. Then an ML tree was reconstructed for each gene by IQ-Tree with the chosen model (TIM3e+R2), a BIONJ starting tree, ten independent runs (for selecting the tree of the greatest maximum likelihood), and 500 bootstrap replicates (for supporting branches in the selected tree).
Identification of variant sites in the sequence alignment used for reconstructing each gene tree was performed with snp-sites v2.5.1 [64]. Homoplasy sites in the alignment were identified and plotted using R package homoplasyFinder [65], which took as input the gene tree in addition to the allele alignment. A pairwise homoplasy index (PHI) was calculated from the allele alignment of each gene, and the significance of each PHI was tested for using PHIPack [66] with a 80-bp sliding window, given the null hypothesis of no recombination. Recombination within a gene was considered significant if the p-value of a PHI did not exceed 0.05.

Predicting nitrofurantoin susceptibility from detected genetic alterations
For each E. coli genome that had no nitrofurantoin susceptibility data available, the prediction of nitrofurantoin susceptibility was conducted using a scoring algorithm based on sequentially categorising variations at sequence, genetic, and genomic levels. The algorithm also considers whether a gene of interest is intrinsic or acquired, and assumes that the functional loss of an intrinsic gene and the gain of a functional oqxAB complex are both associated with reduced nitrofurantoin susceptibility. Calculation of gene-level and genome-level scores have been implemented in NITREc helper script scoreHitsNITR.R, whereas manual inspections are required to determine sequence-level scores.
Firstly, at the sequence level, supposing n i BLAST hits (matched regions between query and subject sequences) of gene i (i=nfsA, nfsB, ribE, oqxA, or oqxB) are identified in a genome, a score s ij (j=1, …, n i ) is assigned to each hit to represent the probability of this sequence to confer reduced nitrofurantoin susceptibility. Then an arbitrary value is assigned to the score for each hit of intrinsic genes (nfsA, nfsB, and ribE): s ij =1, if the sequence is truncated or interrupted (which cannot be distinguished from each other by BLAST without assembly graphs) or has a start-codon loss, a frameshift or nonsense mutation, or any missense mutation known to be associated with nitrofurantoin resistance; Secondly, a gene score g i (i=nfsA, nfsB, ribE, oqxA, oqxB) is determined from sequence scores. Specifically, for each intrinsic gene, g i =min{s ij }, where j=1, …, n i . Namely, gene i is believed to not confer reduced nitrofurantoin susceptibility when there is at least one wildtype allele. For acquired genes oqxA and oqxB, however, g i =max{s ij } (j=1, …, n i ) as wildtype alleles are believed to confer reduced nitrofurantoin susceptibility. Moreover, g i =1 and g i' =0 when intrinsic gene i and acquired gene i' are not detected, respectively.
Finally, a risk score r is calculated from gene scores for each genome. Let g 1 , …, g 5 represent gene scores of nfsA, nfsB, ribE, oqxA, and oqxB, respectively, then each gene score takes a value of 0, 0.1, or 1. Since the multidrug efflux pump OqxAB requires both functional components OqxA and OqxB to reduce the intracellular concentration of nitrofurantoin, the risk score is calculated by formula: where min ( g4, g5 ) = 1 when g4 = g5 = 1 , indicating a functional oqxAB complex. Hence r takes values between zero (nitrofurantoin susceptible) and four (nitrofurantoin resistant). In our study, genomes were sorted by risk scores to identify possible resistant and susceptible isolates.

Validating predictions of nitrofurantoin resistance
Since nitrofurantoin-susceptible HPRU isolates (based on clinical records) had been used by the prediction algorithm as references, we experimentally validated nitrofurantoinsusceptibility predictions using the remaining HPRU isolates, which had no available nitrofurantoin-susceptibility data. First, 20 HPRU isolates predicted to have reduced nitrofurantoin susceptibility were tested using Etests (Liofilchem MIC Test Strips). Second, as an exploratory step, seven additional HPRU isolates were included for Etests because these isolates carried novel mutations that had also been detected in nfsA or nfsB of IN08, or carried an alternative stop codon or novel mutation in ribE, where the predicted susceptibility was uncertain due to a lack of information from literature. Furthermore, disc diffusion tests (Oxoid CT0034B 100-μg nitrofurantoin discs, used as per EUCAST manual v8.0 and manufacturer's instructions) [67] were conducted for confirming heterogeneous responses to nitrofurantoin, and MALDI-TOF mass spectrometry was used for detecting non-E. coli contamination. E. coli strain ATCC25922 was used as the quality control for Etests.

Genomic context of clinical nitrofurantoin-resistant E. coli isolates from north west London
According to the EUCAST breakpoint, nitrofurantoin resistance (MIC >64 mg/L) was confirmed in nine out of 18 E. coli UTI isolates that had been reported as nitrofurantoinresistant by the NHS microbiology laboratory (Table S1). Each isolate was obtained from a unique patient. Nitrofurantoin MICs of these nine isolates (IN01-09) were ≥128 mg/L.
To contextualise the nine isolates IN01-09, genomes of a further 208 E. coli isolates with known nitrofurantoinsusceptibility profiles were incorporated for genomic comparisons, creating a genome collection of global nitrofurantoin-resistant and -susceptible E. coli isolates (Table S1). Of these 208 isolates, eight were nitrofurantoinresistant, none of which came from the UK, and 200 were nitrofurantoin-susceptible, 14 of which came from the UK.
Population structure of the 217 isolates (including IN01-09) was interpreted using phylogroups, multi-locus sequence types (STs), and genomic similarities estimated using PopPUNK. In total, six E. coli phylogroups (A, B1, B2, C, D, F) and 57 STs constituting 19 clonal complexes (CCs, defined by the MLST scheme) were identified (Fig. 1, Table S1). No novel MLST allele or ST was detected. Isolates showing nitrofurantoin MICs ≥64 mg/L were seen in five phylogroups (A, B1, B2, D, and F). Isolates IN01-09 were classified into four phylogroups (A, B1, B2, D) and seven STs (Fig. 1, Table 2). Notably, SNPs and indels were not identified between the genomes of isolates IN01 and IN02, and these two genomes only slightly differed in accessory genomes by a PopPUNK accessory-genome distance (Jaccard distance based on k-mer matches) of 2.4×10 −4 , although the isolates were collected from two distinct patients at different locations, with UTI episodes that were three weeks apart. Interestingly, while the majority of nitrofurantoin-resistant isolates were sporadically distributed across these phylogroups and CCs (including major clinical lineages such as CC69, CC73, and CC131), clustering of reduced nitrofurantoin susceptibility was mainly seen among the 19 USA isolates of CC38 (Fig. 1, Table S1).

Genetic basis of nitrofurantoin resistance in UK E. coli isolates IN01-09
Neither oqxA nor oqxB was identified in any of the isolates IN01-09, indicating that horizontally acquired nitrofurantoin resistance conferred by the oqxAB complex had not occurred in these isolates. We therefore focused on alterations in intrinsic genes nfsA, nfsB, and ribE. For each genome of isolates IN01-09, the most closely related complete genome (based on the PopPUNK core-genome distances) of a nitrofurantoin-susceptible E. coli isolate was chosen as a reference for the identification of genetic alterations.

Genetic alterations of nfsA and nfsB
Analysis of nfsA and nfsB confirmed the presence of a single copy of both genes in each of the nine genomes IN01-09 (Table S6 'Copy_number'). Further, genetic alterations affecting protein sequences were identified in both nfsA and nfsB. Altogether, these alterations consisted of missense and nonsense mutations, deletions, and interruptions by insertion sequences (Table 2). Correspondingly, predicted sequences of proteins NfsA and NfsB showed amino acid substitutions or truncations or both. Synonymous mutations in nfsA or nfsB were also detected in five genomes (Table  S6). One deleterious missense mutation was predicted by PROVEAN [68] for each NfsA or NfsB sequence that only carried missense mutations (Table 2). However, these deleterious missense mutations were not present in any NfsA or NfsB sequence from the publicly available 208 E. coli isolates with known nitrofurantoin-susceptibility profiles.
Only two deleterious missense mutations (W212R in NfsA and G192D in NfsB) were previously reported as associated with nitrofurantoin resistance (Tables S3 and S4).
Interruption of nfsA or nfsB by insertion sequences was identified in three genomes, IN01-03 (    Relatedness between these isolates is displayed by a core-genome neighbour-joining tree generated using PopPUNK. The scale bar denotes the estimated frequency of nucleotide substitutions in the core genome [42]. The tree is midpoint-rooted and drawn in a circular layout with tip labels coloured by nitrofurantoin susceptibility of isolates (   A decision tree was then developed to categorise the hits for each of genes nfsA, nfsB, and ribE based on predicted protein sequences (Fig. 3). Specifically, 164 partial sequences (39-714 bp) of nfsA (wildtype length: 723 bp) were identified, which failed to encode complete 240-aa (amino acid) NfsA proteins. Frameshift mutations were present in all 107 alleles with indels, resulting in predicted proteins of 18-247 aa. Nonsense mutations identified in 117 alleles encoded partial NfsA proteins of 23-234 aa. Interestingly, amongst missense mutations, the start codon ATG of nfsA was altered in 14 genomes: mutation M1T (2T>C) was found in an allele shared by two BSAC isolates eo393 and eo2899 (collected in 2002 and 2010, respectively), causing a loss of the start codon; startcodon substitution M1I (3G>A) was found in five nfsA alleles from 12 genomes (Table S7) and was considered deleterious to protein structure according to the PROVEAN prediction (score: −3.149).
Genetic alterations of nfsB (654 bp) displayed a similar pattern to nfsA. Particularly, 15 partial sequences (35-647 bp) were identified and no start-codon loss was seen amongst missense mutations (Table S7). By contrast, ribE (471 bp) displayed lower diversity than the other two genes, with no partial allele sequence or nonsense mutation identified.
Among ribE alleles identified in the 12412 genomes, a stopcodon substitution (TGA >TAA) owing to missense mutation 470G>A was found in two ribE alleles that were present in 31 genomes. Both alleles encoded the same RibE protein as the nitrofurantoin-susceptible strain ATCC25922 (NCBI protein accession: WP_001021161.1). A novel deletion of four amino acid residues (KAGN, from position 132 to 135 in the reference RibE sequence from the ATCC25922 genome) was predicted from one (carried by isolate EC0430U) of ten ribE alleles that possessed indels. Notably, this deletion overlapped the deletion of amino acid residues 131-134 (TKAG, from the same reference RibE sequence) that is known to reduce nitrofurantoin susceptibility [8]. Putatively deleterious missense mutations of RibE were identified in three genomes: EC0340B (A16V), EC0444B (A34T), and EC1165B (T131S), with PROVEAN scores of -3.406,-3.893, and −2.742, respectively (Table S7).
Acquired resistance genes oqxA (1176 bp) and oqxB (3153 bp) were rare among the 12412 genomes: both genes were detected in only one genome (eo1692 from the BSAC collection), with exact matches to their reference sequences in the ResFinder database [56]. No other hit for either gene was obtained given a minimum nucleotide coverage and minimum query coverage of 70% and 80%, respectively.
At the gene level, nfsA, nfsB, and ribE alleles in 43% of the 12412 genomes perfectly matched to those in known nitrofurantoin-susceptible or wildtype E. coli isolates; 54% were one-gene mutants or single mutants (namely, each had a mutated nfsA, nfsB, or ribE gene); 3% were two-gene mutants or double mutants; and 0.05% were three-gene mutants or triple mutants (Tables 3 and S7). Of note, isolates NCTC11117 and NCTC11132, each carried a wildtype allele and a frameshifted allele of nfsA, were not considered as mutants because of the presence of the wildtype alleles (Table S7).

Evolutionary patterns of nucleotide substitutions in nfsA, nfsB, and ribE
Noticing the differences between the observed allelic diversities of these three genes in the 12412 E. coli genomes (Fig. 3), we hypothesised that ribE was under stronger negative or purifying selection than nfsA and nfsB. The estimated dN/dS ratio ω was 0.607812, 0.40739, and 0.179068 for nfsA, nfsB, and ribE, respectively, indicating that all three genes were under purifying selection with ascending strengths in the order nfsA<nfsB<ribE.
After obtaining unique alleles by deduplicating identical, full-length hits from the 12412 genomes, 231 (32% of 723 bp), 184 (28% of 654 bp), and 86 (18% of 471 bp) SNP sites were found in 328 nfsA alleles (from 12098 hits), 231 nfsB alleles (from 12350 hits), and 88 ribE alleles (from 12407 hits), respectively (Figs S1-S3). Despite this variation, phylogenetic trees of these genes generally showed extremely low bootstrap values (for instance, 0-2) in the majority of ancestral branches (Dataset S1), suggesting a lack of consensus phylogenetic information in these alleles for reliably reconstructing the evolutionary history of each gene.
PHIs calculated from the allele alignment of each gene revealed significant homoplasy in nfsA (P-value: 0.0434) and nfsB (P-value: 0.0363) but not in ribE (P-value: 0.736).

Prediction of nitrofurantoin susceptibility and experimental validation
For each of the 1997 HPRU isolates without known nitrofurantoin-susceptibility profiles, nitrofurantoin susceptibility was predicted from sequence categories (as illustrated in Fig. 3) of intrinsic genes nfsA, nfsB, ribE and presence-absence of both acquired genes oqxA and oqxB using the scoring algorithm. Since neither oqxA nor functional oqxB was detected in any of these isolates, the algorithm gave a risk score r (0≤r≤4) to each isolate as a summary of its three-gene (nfsA, nfsB, and ribE) sequence-category profiles or genotypes. We hypothesised that a higher score would predict a greater reduction in the nitrofurantoin susceptibility of an isolate, and vice versa.
Using r≥1 as a criterion, a non-redundant set of 62 HPRU E. coli isolates (one isolate per patient) were predicted to have reduced nitrofurantoin susceptibility (Table S7). Since RibE sequences of these isolates were identical to those of wildtype RibE proteins in nitrofurantoin-susceptible  4) Is there any nonsense mutation? (5) Does the predicted protein sequence carry no mutation, or any known resistance-associated amino acid substitution, or any other amino acid substitution? Notations: Δ, partial sequence compared to its reference; cov., query coverage; fs, frameshift mutation; ins/del, insertion or deletion of amino acid residues; R, missense mutations known to be associated with nitrofurantoin resistance; S, proteins identical to those in nitrofurantoin-susceptible isolates; mis, missense mutations of unknown phenotypical impacts.
isolates, 16 genotypes associated with reduced nitrofurantoin susceptibility were identified based on nfsA and nfsB sequences only (Table 4).
Assuming nonsense or frameshift mutations observed in this study always cause a loss of function, 20 representative isolates of the 16 genotypes were selected from the 62 isolates as a 'prediction group' for quantitative antimicrobial susceptibility testing (Table 5). Specifically, one representative was chosen for each genotype that did not involve any missense mutation; otherwise, one representative was chosen for each missense mutation. In this group, all 12 double mutants carried at least one resistance-associated genetic alteration (1<r≤2). Nitrofurantoin susceptibility was correctly predicted for all these 12 isolates. Notably, the double mutants were highly resistant to nitrofurantoin (MIC ≥256 mg/L) when both genetic alterations were known to be associated with nitrofurantoin resistance. Predictions for the eight single mutants of resistance-associated genetic alterations (r=1) were less reliable than those for double mutants, with five single mutants being more susceptible to nitrofurantoin than predicted (Table 5). Isolates EC394_9, EC0880B, and EC0026B showed heterogenous responses to nitrofurantoin as a few colonies of resistant E. coli appeared within inhibition zones. MALDI-TOF did not detect any non-E. coli contamination in cultures of these three isolates.
Seven additional HPRU E. coli isolates were included as an exploratory group to examine potential effects of novel mutations on nitrofurantoin susceptibility (Table 5): NfsA or NfsB of two isolates had a missense mutation that had also been identified in isolate IN08; ribE of one isolate was terminated by an alternative stop codon TAA; and other four isolates had mutations in RibE, while genomes of all these four isolates encoded wildtype NfsA and NfsB. A nitrofurantoin-resistance risk score of 0.1 was assigned to each isolate by the prediction algorithm to denote presence of a novel mutation. Upon testing, all seven isolates were susceptible to nitrofurantoin (MICs ≤64 mg/L), matching our predictions. None of the six missense mutations resulted in nitrofurantoin resistance. By contrast, isolate EC0430U, which had a deletion of KAGN 132-135 in RibE, showed an MIC of 64 mg/L, exceeding the nitrofurantoin MIC of a previously reported isolate having an overlapping 4-aa deletion in RibE [8].

Predicted occurrences of reduced nitrofurantoin susceptibility in UK E. coli bloodstream isolates
As data relating to nitrofurantoin susceptibility amongst bacteriaemia E. coli isolates is seldom available, possible occurrences of reduced nitrofurantoin susceptibility were calculated for a non-redundant collection of 2253 bacteriaemia E. coli isolates that were collected in 2001-2016 from across the UK. This collection comprised 582 HPRU isolates (one isolate was selected per ST per patient), 1509 BSAC or CUH isolates, and 162 SCOT isolates. Overall, 142 (6.3%) of the 2253 isolates were predicted to show reduced nitrofurantoin susceptibility. Specifically, 123 (5.5%) single-mutation isolates had a resistance-associated alteration in either nfsA (117 isolates, 5.2%) or nfsB (six isolates, 0.3%), and 19 (0.8%) double-mutation isolates had resistance-associated alterations in both nfsA and nfsB.  Mutation  BSAC  CHU  HPRU  LTCF  NCBI  NCTC  SCOT  IN01-09  Sum   None  758  308  1390  266  2466  43  127  0  5358   One-gene  269  89  520  31  5724  32  27  0  6692   Two-gene  67  18  99  0  146  9  8  9  356   Three-gene  0  0  0  0  5  1  0  0  6   Sum  1094  415  2009  297  8341  85  162  9 12412  Each entry shows the number of isolates with each genotype, with the number of isolates selected for experimental validation displayed between parentheses. Row and column names represent genotypes of nfsA and nfsB, respectively. A gene was considered fragmented when it was interrupted or truncated. In total, 16 genotypes had at least one isolate each, from which 20 isolates were selected for the validation.  Tables S3-S5,  alt. stop, alternative stop codon; del, deletion or truncation, with its start and end nucleotide positions noted on the left side; int, gene interruption, with nucleotide positions (start : end) of remnants related to their wildtype references noted aside.

DISCUSSION
In this study, we have explored genetic determinants of nitrofurantoin resistance in UK E. coli starting with nine sequenced nitrofurantoin-resistant clinical isolates and a comparative genomics approach. We found four types of alterations in two intrinsic, oxygen-insensitive nitroreductase genes (nfsA and nfsB) that are known to be associated with nitrofurantoin resistance in E. coli: gene interruptions by insertion sequences, frameshift mutations caused by indels, nonsense mutations and missense mutations both caused by single-nucleotide substitutions. Notably, each of these nine isolates had alterations in both nfsA and nfsB, equivalent to double-step or multi-step mutants of both genes, and hence explaining high nitrofurantoin MICs (≥128 mg/L) of these isolates [73].
Both nfsA and nfsB had at most one missense mutation identified in genomes of isolates IN01-07 and IN09, and these mutations were predicted to be deleterious to protein functions by PROVEAN (  [73,77]; substitution of G 192 with alanine (G192A) has also been associated with nitrofurantoin resistance [73]. Located at the end of the fourth β-sheet of NfsB (PDB accession: 1DS7), G 192 constitutes the hydrophobic core that accommodates co-factor flavin mononucleotide [78] and is in close spatial proximity of negatively charged residue D 160 . Therefore, substitution of the neutral, hydrophobic residue G 192 with an aspartic acid residue possibly affects the protein formation as well as its function. A similar impact of the NfsB mutation G153D in isolate IN08 on the protein function is anticipated based on the same difference in hydrophobicity of amino acid residues.
Nonsense mutations, frameshift mutations, and insertionsequence-mediated interruptions of nfsA or nfsB are also common loss-of-function genetic alterations causing nitrofurantoin resistance [73,74,79], and we found examples of all these alterations in isolates IN01-09 ( Table 2). The interruption of nfsA by an IS1-family insertion sequence has been reported by three other studies, two of which also identified insertion of IS1 in nfsB [7,8], while the third study identified integration of the composite transposon Tn10 into nfsA [73]. Interestingly, the position and flanking nucleotides of the IS1 insertion site differ among all three studies, indicating variability of IS1 in gene inactivation and being consistent with the known AT-rich target-site specificity of IS1 [80]. In comparison, the interruption of nfsB by a novel IS10R-like insertion sequence had not been reported before, although a different IS4-family insertion sequence IS186 was found in disrupted nfsA [7]. Since IS10-group elements comprise both ends of Tn10, the presence of Tn10 in isolate IN03 implies transposition of IS10R or its IS10L-like element from Tn10 to nfsA or nfsB, conferring reduced nitrofurantoin susceptibility to host bacteria. Nevertheless, we anticipate that it is less likely for IS10L (the left end of Tn10) to show the same behaviour as IS10R because its transposition function is much weaker than the latter [81]. We might see a growing frequency of insertionsequence-mediated nitrofurantoin resistance in the future, if copy numbers of IS1-and IS4-family elements in E. coli genomes are undergoing similar accumulation trajectories as those in Shigella species [82].
This study has revealed variations of genes nfsA, nfsB, and ribE among UK E. coli isolates, and such variations are consistent with the 194 NCBI isolates (excluding UK isolates EC958 and NCTC13441) collected in other countries. Identified alleles of these genes (Fig. 3) can be summarised into two primary types: functional alleles and pseudogenes. The latter can result from gene truncation, insertion-sequence-mediated gene interruption, frameshift mutations, nonsense mutations, as well as deleterious missense mutations, and confers reduced nitrofurantoin susceptibility to E. coli.
The use of a large and diverse genome collection enabled us to discover evolutionary patterns of nfsA, nfsB, and ribE. Comparisons between dN/dS ratios of these genes support our hypothesis that ribE is subjected to the strongest pressure of negative selection whereas nfsA is subjected to the weakest. Not only does this variation in selective pressures explain the observed diversity differences between sequence variations of these three genes in both this study and literature (Tables  S3-S5), but also explains why stepwise reduction of E. coli nitrofurantoin susceptibility usually starts from a mutation in nfsA (the first-step mutation) -a phenomenon that has been reported by several studies [7,8,73]. The differences in gene lengths of nfsA (723 bp), nfsB (654 bp), and ribE (471 bp) may also contribute to this order of stepwise mutations, as a long gene might be more likely to gain spontaneous mutations than a short gene within the same cell. Notably, nfsA and nfsB are classified as non-essential genes of E. coli, whereas ribE is considered as an essential gene by the PEC database ( shigen. nig. ac. jp/ ecoli/ pec) [83]. The higher level of evolutionary conservation of ribE compared with nfsA and nfsB appears to represent a greater degree of functional constraint on its sequence [84]. Nonetheless, expression levels of these three genes might also be important to consider [85].
Homoplasic SNPs were identified in nfsA, nfsB, and ribE from E. coli genomes in our collection (Figs S1-S3), and recombination within nfsA and nfsB was also inferred by the pairwise homoplasy indexes. Both factors may contribute to the lack of congruent phylogenetic signals in alleles of each gene (shown by low bootstrap values of branches in each gene tree), indicating that, in addition to recombination, resistance mutations can arise independently in each gene across the E. coli population. Because the acquired resistance gene complex oqxAB was extremely rare in our E. coli genome collection, de novo mutations of chromosomal genes nfsA, nfsB, and ribE, and insertion-sequence-mediated interruptions of nfsA and nfsB may constitute the main source of reduced nitrofurantoin susceptibility in UK E. coli isolates.
We developed a decision-tree based algorithm for predicting nitrofurantoin susceptibility from five loci (nfsA, nfsB, ribE, oxqA, and oqxB) that are known to be involved in nitrofurantoin resistance. Nitrofurantoin-susceptibility testing showed that the algorithm correctly predicted susceptibility levels for double mutants with known resistance-associated genetic alterations and for single mutants with novel genetic alterations (Table 5). Consistent results were obtained when applying the algorithm to non-UK E. coli isolates (Tables S7  and S8). The unexpected lower MICs of isolates carrying single resistance-associated alterations (which include confirmed loss-of-function alterations, such as nonsense and frameshift mutations, gene interruptions and truncations) in nfsA or nfsB suggests that impaired function or production of both oxygen-insensitive nitroreductases NfsA and NfsB may be required to render E. coli resistant to nitrofurantoin, or there are unknown mechanisms rendering E. coli susceptible to nitrofurantoin. This discrepancy between the predictions and susceptibility profiles also implies that we might have overestimated the occurrence of reduced nitrofurantoin susceptibility in the collection of sequenced E. coli bloodstream isolates from the UK, although the estimate of 6.3% is close to the reported prevalence (5%) of nitrofurantoinresistant E. coli in England by 2019 [15,16,86].
To provide reference information for future research, we have developed database NITREc ( github. com/ wanyuac/ NITREc), which will facilitate searching for the most closely related reference sequence of a query allele of nfsA, nfsB, or ribE, minimising the number of reported variants and reducing noise in the results. We anticipate improvement in accuracy of both variant identification and susceptibility prediction when new references are incorporated into the database in the future.
Notably, the core-genome identity between ST1463 isolates IN01 and IN02 points more definitively to a transmissive potential or common source for these nitrofurantoin-resistant isolates, albeit further epidemiological data were not available relating to additional patient risk factors or any common exposures. There is a low prevalence (≤6%) of gastrointestinal colonisation by nitrofurantoin-resistant E. coli in human and farm animals [93][94][95], and it is hypothesised that these carriage bacteria may colonise the urethra via periurethral contamination and progress to UTI [96]. Horizontal transmission of nitrofurantoin-resistant isolates might be feasible if the resistance does not compromise bacterial survival or pathogenicity. This potential might be the case of isolates IN01 and IN02, as ST1463 E. coli has adapted to a wide range of niches (including human, companion animals, livestock, poultry, and environmental matrices), carries diverse AMR and virulence determinants, and is reported to transmit between human and animals [97][98][99][100][101][102].
This study has several limitations. First, the impact of novel missense mutations in nfsA and nfsB on nitroreductase function and bacterial fitness were not experimentally confirmed, for example, through mutagenesis experiments. Such confirmation is desirable as the observed variation in MICs of singlemutation isolates carrying missense mutations indicates that PROVEAN-predicted deleterious nfsA/nfsB mutations may not cause reduced nitrofurantoin susceptibility and their effects may be offset by other susceptibility mechanisms. Second, epidemiological analysis of nitrofurantoin-resistant E. coli is limited owing to a lack of available nitrofurantoin-susceptibility profiles of E. coli isolates from the UK or abroad. As such, several key clinical questions have not been addressed in this study, including the association of STs or infection sites (for instance, bloodstream and urinary tract) with nitrofurantoin resistance, co-or crossresistance between nitrofurantoin and other antimicrobials, and temporal trend of nitrofurantoin resistance in the UK. Third, regulatory elements (such as promoters) of the nitroreductase system and DNA-repair mechanisms were not investigated in this study. Furthermore, untargeted analysis was not carried out for identifying novel genetic mechanisms of nitrofurantoin resistance due to the small number of nitrofurantoin-resistant E. coli isolates that were available to this study.

CONCLUSIONS
We predict the major cause of nitrofurantoin resistance in UK E. coli to comprise sporadic de novo mutations in chromosomal genes nfsA, nfsB, and ribE; and interruptions of nfsA and nfsB by insertion sequences. Accordingly, clonal expansion of resistant mutants, mobilisation of insertion sequences, and selection of resistant clones in the presence of nitrofurantoin are believed to be three driving forces in the evolution of nitrofurantoinresistant E. coli in the UK. Previous reports have suggested that nfsA and nfsB mutations are associated with significant fitness cost that may obstruct propagation of resistant E. coli [73,103]. However, it is notable that isolates investigated in these reports and the current research were all identified from clinically significant infections, implying a fitness to survive in enteric microbiota and to survive the host immune response in the lower urinary tract. This fitness was also supported by the possible community transmission identified with clonal isolates IN01 and IN02. The same nitrofurantoin-resistance determinants of IN01-09 were also identified in bloodstream isolates, underlining the ability of these E. coli to adapt to circumstance.
It is unclear whether nitrofurantoin resistance will become more prevalent among E. coli in the UK as a result of increased community nitrofurantoin exposure following the changes in national prescribing guidelines. As such, routine nitrofurantoin susceptibility testing and WGS of E. coli isolates are needed to monitor this trend. The tools provided in our study will facilitate future WGS-based surveillance. Further work is required to assess the fitness cost of resistance-associated genetic alterations and to identify novel mechanisms of nitrofurantoin resistance.
Funding information