Ten-year longitudinal molecular epidemiology study of Escherichia coli and Klebsiella species bloodstream infections in Oxfordshire, UK

The incidence of Gram-negative bloodstream infections (BSIs), predominantly caused by Escherichia coli and Klebsiella species, continues to increase; however, the causes of this are unclear and effective interventions are therefore hard to design. In this study, we sequenced 3468 unselected isolates over a decade in Oxfordshire (UK) and linked this data to routinely collected electronic healthcare records and mandatory surveillance reports. We annotated genomes for clinically relevant genes, contrasting the distribution of these within and between species, and compared incidence trends over time using stacked negative binomial regression. We demonstrate that the observed increases in E. coli incidence were not driven by the success of one or more sequence types (STs); instead, four STs continue to dominate a stable population structure, with no evidence of adaptation to hospital/community settings. Conversely in Klebsiella spp., most infections are caused by sporadic STs with the exception of a local drug-resistant outbreak strain (ST490). Virulence elements are highly structured by ST in E. coli but not Klebsiella spp. where they occur in a diverse spectrum of STs and equally across healthcare and community settings. Most clinically hypervirulent (i.e. community-onset) Klebsiella BSIs have no known acquired virulence loci. Finally, we demonstrate a diverse but largely genus-restricted mobilome with close associations between antimicrobial resistance (AMR) genes and insertion sequences but not typically specific plasmid replicon types, consistent with the dissemination of AMR genes being highly contingent on smaller mobile genetic elements (MGEs). Our large genomic study highlights distinct differences in the molecular epidemiology of E. coli and Klebsiella BSIs and suggests that no single specific pathogen genetic factors (e.g. AMR/virulence genes/sequence type) are likely contributing to the increasing incidence of BSI overall, that association with AMR genes in E. coli is a contributor to the increasing number of E. coli BSIs, and that more attention should be given to AMR gene associations with non-plasmid MGEs to try and understand horizontal gene transfer networks.

Conclusions: Our large genomic study highlights distinct differences in the molecular epidemiology of E. coli and Klebsiella BSIs and suggests that no single specific pathogen genetic factors (e.g. AMR/virulence genes/sequence type) are likely contributing to the increasing incidence of BSI overall, that association with AMR genes in E. coli is a contributor to the increasing number of E. coli BSIs, and that more attention should be given to AMR gene associations with non-plasmid MGEs to try and understand horizontal gene transfer networks.
Keywords: Gram-negative bloodstream infections, Bacteraemia, Whole genome sequencing, Klebsiella pneumoniae, Virulence, Antimicrobial resistance Background Gram-negative bloodstream infections (GNBSI), predominantly caused by Escherichia coli and Klebsiella spp., are a significant and increasing threat to public health. They are now the leading cause of bloodstream infection (BSI) in the UK with a substantial associated burden of morbidity and mortality [1,2]. Despite becoming a significant public health concern and policy focus, their incidence continues to increase and it is unclear how ambitious targets to reduce this can be achieved [3].
There is a significant association between GNBSIs and antimicrobial resistance (AMR), particularly in certain globally successful multi-locus sequence types (STs), such as E. coli ST131 [4,5]. Enterobacteriaceae have relatively open pan-genomes and are able to rapidly adapt to changing selection pressures (including antibiotic usage) [6,7]. Multidrug resistance (MDR)-associated STs have been linked with prolonged hospital stay and adverse outcomes [8]. Whilst infections caused by relatively susceptible isolates still represent the majority of cases, the potential for rapid proliferation of AMRassociated clones and the dissemination of AMR genes on mobile genetic elements between lineages and species is a major concern [9].
Recent molecular epidemiology studies in the UK have replicated global findings that most E. coli BSIs are caused by STs 131, 95, 73 (all phylogroup B2) and ST 69 (phylogroup D) [7,10]. One study has shown that after the emergence and expansion of ST69 and ST131 in the early 2000s, the population structure reached an equilibrium, with STs 131, 95, 73, and 69 predominating [11]. However, this study only sequenced isolates cultured prior to 2012, which may represent a critical inflection point in the incidence rate [1], which continues to increase year on year. It remains unclear whether the population structure remains in equilibrium or whether one or more STs are responsible for this increase.
For Klebsiella spp., it has been hypothesised that isolates broadly cause two categories of BSI [12,13], namely, (i) multidrug resistant healthcare-associated (HA) infections caused by low virulence strains with AMR gene-associated plasmids and (ii) clinically hypervirulent communityassociated (CA) infections caused by strains carrying virulence gene plasmids. The convergence of these two phenotypes might pose a significant risk to human health [14]. Whilst there are multiple studies describing in detail the epidemiology of globally distributed clones associated with AMR and hyper-virulence [5,15], such isolates represent a minority of BSI in the UK and most of Europe [16]. A recent study in Cambridgeshire (UK) of 162 Klebsiella pneumoniae isolates, enriched to over-represent MDR isolates, demonstrated a predominance of globally important STs with apparent cycling of relative incidence over a 2year period [17]. However, MDR infections represent approximately 18% of K. pneumoniae BSIs in England and the molecular epidemiology of most invasive disease caused by Klebsiella spp. has not been systematically studied [2].
In order to investigate underlying potential pathogen genetic factors driving the increasing incidence of these infections, we analysed all E. coli and Klebsiella spp. BSIs over a decade (2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018) in Oxfordshire, UK, linking electronic health records and mandatory reporting data with sequencing data for all isolates, to give a detailed picture of the regional genomic epidemiology of E. coli and Klebsiella spp. BSIs. We used other publicly available sequencing datasets to contextualise our findings.

Methods
Study setting, laboratory procedures, and DNA extraction All isolates causing BSI between September 15, 2008, and December 01, 2018 (de-duplicated to one BSI isolate per 90-day period per patient), were processed by the clinical microbiology laboratory at the John Radcliffe Hospital, Oxford, UK, using standard operating procedures, with sub-cultured stocks stored at −80°C in 10% glycerol nutrient broth. Blood cultures processed by the laboratory were taken as part of routine clinical workup. We attempted to sequence all isolates with no preselection according to, e.g. in vitro antibiotic susceptibility, patient factors, and date of acquisition; we hereby refer to this as an "unselected" sampling frame. 3461 E. coli isolates and 886 Klebsiella spp. isolates were successfully sequenced. All isolates were included in all parts of the relevant analysis except where otherwise indicated below. The microbiology laboratory serves all four hospitals and all community healthcare facilities within Oxfordshire, with a catchment population of ~805,000. For sequencing, isolate stocks were grown overnight on Columbia blood agar at 37°C and DNA was extracted using the QuickGene DNA extraction kit (Autogen, MA, USA) as per the manufacturer's instructions with the addition of a mechanical lysis step (FastPrep, MP Biomedicals, CA, USA; 6m/s for 40 s). Short-read sequencing was performed using the Illumina HiSeq 2500/3000/4000/MiSeq instruments as previously described [18]. Additional, publicly available short-read sequencing data was obtained to facilitate phylogenetic comparison with previous studies as follows : [7]). These isolates were mapped to ST-specific references and phylogenetic trees generated, followed by permutation tests for geographical clustering as described in the statistical analysis section below.

Epidemiological linkage
Isolate data was linked to laboratory and electronic health record data via the Infections in Oxfordshire Research Database (IORD). Data on suspected infectious focus and patient provenance was acquired via linked local infection control records which had been submitted to Public Health England as part of the mandatory surveillance programme; such data was available for 400 Klebsiella spp. and 2773 E. coli isolates. Healthcareassociated (HA) BSI were defined as occurring >48 h post-hospital admission or ≤30 days since hospital discharge; other cases were defined as communityassociated (CA) [21].
Mapping to MLST-specific reference genomes was performed using Snippy (v4.6.0) [41]; genomes were acquired from NCBI (appendix). Gubbins (v2.3.4) was used to build recombination-corrected phylogenies [42]. Time-scaled phylogenies were created using the BactDating [43] library in R (version 3.6.0) under a relaxed gamma model with a minimum of 100,000 Markov chain Monte Carlo iterations [43]. Significance of temporal signal was assessed with root-to-tip plots and 10,000 random permutations of tip dates. Effective sample size was assessed using the Coda package [44]. Evolutionary distinctiveness (ED) was calculated using the Picante package in R [45]; low ED scores indicate closely related genomes. All bioinformatics was performed using the Oxford University Biomedical Research Computing Facility.
Contigs were classified as being of likely chromosomal or plasmid origin using MLplasmids with a 0.7 probability cut-off [46]. All plasmid/chromosomal contigs for a given isolate were binned into separate multi-fasta files and the distances between these calculated using Dashing [47]. The pairwise distance matrix was filtered on a distance of 0.71 (the median plasmidome similarity of bla CTX-M-15 carrying K. pneumoniae ST490 isolates) and then clustered into "plasmidome groups" using the LinkComm package in R [48]. The ecology of categorical groups was compared with a permutational multivariate analysis of variance (permanova) test in the Vegan package in R [49].
A pangenome wide association study (PGWAS) was conducted using PySeer [50] with a linear mixed model utilising relatedness distances inferred from a core genome phylogeny to correct for population structure. A discriminant analysis of principal components (DAPC) was performed using the Adegenet library in R [51]. Gene co-occurrence was examined using graphs constructed using the iGraph library in R [52]. For each species, edge lists with all genes/plasmids/insertion sequences with a Pearson correlation coefficient spp. >0.5 were created and communities detected using single linkage.

Statistical analysis
To describe molecular epidemiological trends, STs were arbitrarily categorised as "rare" (≤10 isolates over the study period, or untypeable STs), "intermediate"  isolates), and for E. coli, "sub-major "(101-300 isolates) and "major" (≥300 isolates). For Klebsiella spp. we considered the most prevalent ST (i.e., ST490) in isolation. Stacked negative binomial regression models with clustered standard errors were used to compare rates of change over time (STATA v16 [53]) [54]. Incomplete years (2008 and 2018) are excluded from this part of the analysis. All other statistical tests were performed in R v3.6 using the Stats package unless indicated [55]. Charlson score was calculated in the Comorbidity [56] R package using all ICD10 codes associated with episodes in the year prior to specimen collection dates. To test for geographical/healthcare setting structure in MLST groups we conducted a permutation test similar to that previously described [57]. In brief, the ratio of median distances between isolates from the same centre and different centres was calculated. This observed ratio was compared to a permuted distribution created by 1000 tip label randomisations. The number of permuted values at least as extreme as the observed value divided by the number of permutations carried out was calculated to give a one-sided test of statistical significance.

Results
Escherichia coli bloodstream infections are stably dominated by major lineages across community and healthcare-associated settings, but with diverse sporadic lineages accounting for almost half of cases, and evidence of sub-lineage replacement in major STs  Fig. S1). The incidence rate ratio per year (IRRy, i.e. the relative increase in incidence per year) of all these categories increased over time (major IRRy=1.13 (95% CI 1.10-1.17), sub-major IRRy=1.16 (1.10-1.22), intermediate IRRy=1. 15 (1.12-1.18), rare IRRy=1.10 (1.06-1.14)) ( Fig. 1A). There was no evidence that the proportion of BSIs caused by major or sub-major STs changed over the study period (Fig. 1B). There was evidence that the incidence of BSIs caused by intermediate STs increased slightly faster versus all other isolates and by rare isolates slightly slower (p heterogeneity <0.05).
No evidence of intra-or inter-regional clustering of major Escherichia coli lineages causing BSI in either communityor healthcare-associated settings For the four largest E. coli STs, we compared the phylogeny of BSI isolates in this study to those in previous studies over a similar timeframe (see Methods). There was no obvious phylogenetic clustering of HA or CA isolates (Fig. 2), supported by permutation testing, confirming that for the major STs these were indeed randomly distributed across the phylogeny by geography and healthcare/community setting (Additional File 1: Table S2). Additionally, for all four of the major E. coli STs, there was no difference in the ED scores between HA and CA cases and therefore no evidence of healthcare-associated adaptation and subsequent transmission (Additional File 1: Table S3). Our PGWAS did not identify any significant hits separating CA and HA isolates (Additional File 1: Fig.S5), including those that had been previously identified in other studies [7].
Urinary and hepatobiliary sources of E. coli and Klebsiella spp. BSI predominate, with major E. coli STs overrepresented in urinary-associated E. coli BSIs The urinary tract was the most common physicianidentified source of infection in both species. In E. coli, these were strongly associated with the presence of the pap group of genes (Additional File 1: Table S4) which were over-represented in major STs (916/1651 (55%) vs 394/1810 (22%) isolates, p < 0.001). The hepatobiliary tract was the second most common source for both E. coli and Klebsiella spp., followed by other gastrointestinal infections for E. coli and the respiratory tract for Klebsiella spp. (Additional File 1: Fig. S6a). For E. coli BSIs, there was some evidence that the incidence of BSIs not attributed to the urinary tract increased faster than those thought to be of urinary origin ( ; Fig. 3); there was only a single isolate carrying a gene encoding for carbapenem resistance (ST10, bla OXA-48 ) in 2014. For ceftriaxone (p heterogeneity =0.003), gentamicin (p heterogeneity =0.03), amoxicillin (p heterogeneity =0.003), and ciprofloxacin (p heterogeneity <0.001), there was evidence that the incidence of isolates with AMR genes encoding resistance to these antimicrobials was increasing faster than those without. Genes conferring resistance to ceftriaxone in E. coli were over-represented in ST131 (180/346) and ST73 (56/346), with significantly lower ED scores for isolates carrying ceftriaxone-resistance genes in these STs (accounting for 236/346 (68%) of these genes, Additional File 1: Fig.S1), consistent with genomes from resistant sub-lineages being more closely related within these sub-lineages, and therefore with their expansion. Findings were similar restricting to CA-BSI. ST127 was the only major/sub-major ST in which no genes conferring resistance to ceftriaxone were detected.
A major contributing factor to this finding was the overall decline of the MDR ST490 lineage and the emergence of a more susceptible ST490 sub-lineage between 2005 and 2010 which had lost the aac(3)-IIa, aac(6')-Ibcr, bla OXA-1 and tet(a) genes (Additional File 1: Fig.S7). Genes encoding for resistance to ceftriaxone were significantly more common in intermediate STs (STs with >10 isolates in the dataset) (  The distribution of virulence factors strongly reflected the underlying clonal population structure in E. coli isolates, with segregation by ST, confirmed with a discriminant analysis of principal components (DAPC) (Fig. 4). Most notably phylogroup B2 isolates were separated from the rest of the population by the presence of chuA, chuX (involved in haem utilisation), and espL1/espX4 (elements of the type 3 secretion system). Given this and the broad equilibrium in the E. coli BSI population structure demonstrated above, it is unlikely that increased carriage of certain virulence factors explains the increasing incidence of these infections. In keeping with this, there was a near-perfect correlation between frequency of carriage of any given virulence gene across all isolates in 2009 vs 2018 (Pearson correlation=0.99; p< 0.001).
No such structuring by virulence factors was observed for Klebsiella spp.; instead, virulence factors were widely distributed amongst 79 known STs and 14 isolates with no assigned ST (Additional File 1: Fig. S8) (Fig. 5).
Plasmid replicon and plasmidome analyses reflect diversity consistent with a highly mobile accessory genome within species Given the known importance of plasmids in the carriage and transmission of AMR genes, we sought to understand the role they might play in the relative success of major STs. Analysis of plasmid replicon profiles showed these were largely genus-restricted, with some overlap (Additional File 1: Fig. S9). However, plasmid replicons were not structured by host lineage in E. coli or Klebsiella spp., with the exception of K. pneumoniae ST490 which was discriminated by the presence of the Fig. 3 Changes in the incidence of clinically relevant groups of antimicrobial resistance genes considering total effects of resistance to individual antibiotics, regardless of mechanism, over the study period. Gene groupings are as defined by ResFinder. Amoxicillin is excluded for Klebsiella spp. which are generally considered intrinsically resistant to this drug. Bars show univariate point estimates and 95% confidence intervals for the incidence rate ratio (per year) of presence of genes belonging to the classes shown. The point estimate and error bar is shown in a darker blue if a Wald test between the incidence rate ratios for the presence and absence of a given set of genes was significant at a 0.05 threshold (i.e. there was evidence that the rate of increase of isolates with a group of genes was different to that for isolates without) IncFIA(HI1) plasmid type (also identified in 23 E. coli isolates; Additional File 1: Fig. S10).
Using our approach to predict plasmidome population structure based on k-mer similarity of contigs binned as plasmid, we observed a striking degree of plasmidome diversity within highly genomically related isolates/STs for all species (Additional File 1: Fig. S9); for 656 isolates with a core genome similarity >0.99 to another isolate in the dataset, the median plasmidome similarity was only 0.51 (IQR 0.28-0.86). Isolates with a near-identical plasmidome (>0.99 similarity, n=115 isolates) did however have highly similar chromosomes (median plasmidome similarity: 0.97 [IQR 0.94-0.99]). Compared with other STs, major E. coli STs had larger plasmidomes (median size 106,766 vs 97,432, p<0.001) which belonged to more "plasmidome groups" (see methods) (median 3 vs 2, p< 0.001). For both E. coli and Klebsiella spp., there was no evidence of different plasmid populations between those associated with HA-vs CA-BSI (p=0.4).
Finally we analysed the co-occurrence of AMR genes, plasmid replicon types, insertion sequences (ISs), and integron-associated markers within E. coli and Klebsiella spp. The networks formed were notable for the widespread co-occurrence of AMR genes, insertion sequences and (for some classes of AMR genes) integrons, but not usually specific plasmid types (Additional File 1: Figs. S11/12). This suggests that most common AMR genes are found in a diverse range of genetic contexts (e.g. multiple plasmid types, chromosomally integrated), and that horizontal gene transfer of important AMR genes in these isolates is largely facilitated by smaller, non-plasmid MGEs such as transposons/ISs/integrons, which are difficult to reliably evaluate with short-read data [58].

Discussion
In this unbiased longitudinal sequencing study of all (90day-deduplicated) E. coli and Klebsiella spp. BSIs in Oxfordshire over a decade (2008-2018), we highlight the similarities and differences in the molecular epidemiology of these species. Overall, the increasing incidence of E. coli and Klebsiella spp. BSIs in Oxfordshire is not explained by the expansion of a single ST, and much of the burden (40%) of Klebsiella spp. disease is caused by non-K. pneumoniae species. Although six lineages stably account for nearly half of all E. coli BSIs and a clonal outbreak was observed in K. pneumoniae, many BSIs in our setting are caused by diverse strains with diverse accessory genomes.
We found no genomic evidence supporting the stratification of E. coli isolates into HA-BSI vs CA-BSI. Strains causing healthcare-associated BSIs may not be specifically healthcare-acquired but may still be healthcareprovoked (for example by the presence of indwelling devices or relative immunosuppression). Interventions targeting only HA infections (such as that of the UK government to reduce these by 50% by 2021 [3]) might have limited efficacy without considering the wider ecology of these species. Whilst the incidence of E. coli BSIs attributed to the urinary tract increased over the study period, there was evidence that the rate of this increase was slower compared to other sources. This possibly reflects some success of infection prevention interventions such as catheter care, but also highlights the multifaceted approach required to reduce the overall incidence rate. More work is required to understand CA-Klebsiella spp. BSIs because known markers of virulence found in CA disease (e.g. aerobactin, yersiniabactin, colibactin) in other studies [14] were not seen in most of our cases. Importantly therefore, existing markers may therefore be insensitive to detect the emergence of clinically hypervirulent, CA, multidrug-resistant strains.
For E. coli, our analysis is consistent with previous studies demonstrating a broadly stable population structure at the MLST level with the predominance of STs 69, 73, 95, and 131 [7,11]. Within this however, we demonstrated some replacement in sub-clades of major STs over the study period. This may represent an adaptive strategy within major STs that allows them to continue to maintain their niche and evolve to survive changing environmental and immunological selection pressures. As has been previously noted, virulence factors in E. coli were structured by the underlying phylogeny [59,60]. Differences in carriage of haem utilisation genes were a notable factor discriminating isolates in major STs from the rest of the population, suggesting that iron metabolism might be an important selective pressure in determining invasive potential, a finding supported by a recent GWAS of a mouse model [61]. In contrast, for Klebsiella spp., we demonstrated that most infections are caused by sporadic STs which individually only accounted for only a small proportion of the overall population causing disease.
In E. coli, the incidence of isolates carrying genes conferring resistance against commonly used antimicrobial classes increased faster than those without; in general, the opposite was true for Klebsiella spp. The most commonly isolated MDR-ST in Klebsiella spp. (ST490) significantly decreased in incidence over the study period, lending credence to the idea that the relentless expansion of MDR clades is not inevitable, although it is unclear what interventions may have helped in causing its decline, particularly given that approximately a third of cases were community onset. The increasing incidence of gentamicin/ceftriaxone resistance in predominantly CA-E. coli BSIs seems paradoxical given that these antibiotics are rarely used in this setting. One explanation is that they are co-located on MGEs with other AMR genes encoding resistance to antibiotics more commonly used in the community (e.g. amoxicillin, trimethoprim). An alternative hypothesis might be increased exposure to third-generation cephalosporins due to the rise of ambulatory/"hospital-at-home" medical pathways where the once-a-day dosing of ceftriaxone makes it a relatively widely prescribed antibiotic in these settings. Regardless, overall, Klebsiella strains causing BSI appear to be exposed to declining antibiotic selection pressures compared to E. coli, suggesting that they are maintained and selected for in distinct ecological niches.
In E. coli BSI isolates, there was no genetic signal of adaptation to either healthcare or community environments, suggesting that these are not relevant niches for selection, contrary to a previous study [7]. However, this study included small numbers of isolates (n=162) and may have been unable to fully account for population structure. Furthermore our analysis suggests that, contrary to true nosocomial pathogens [62], the ecology of the E. coli plasmidome is similar for HA/CA infections. Our findings support the hypothesis that the diversity observed in any given epidemiological strata (e.g. age groups/infection focus/healthcare setting) is sampled from the same common ecological pool of isolates with invasive potential rather than representing specialised adaptation to any given setting. There are of course exceptions to this, such as localised outbreaks within care facilities and hospital environments [63]. However, our data suggests that at least in Oxfordshire, these outbreaks do not contribute significantly to the epidemiology of E. coli BSIs. Importantly, iatrogenic, nonpathogen-associated factors promoting invasive infection (e.g. urinary catheterisation) should be minimised to reduce the incidence of E. coli BSIs.
For Klebsiella spp., the traditional view that HA infections are opportunistic and caused by MDR isolates and CA infections are caused by isolates carrying one or more specific virulence genes (e.g. yersiniabactin, colibactin, aerobactin, salmochelin), which appears to be an over-simplification in our setting [12,13]. The majority of CA Klebisella spp. infections contained none of the known genetic markers of hypervirulence, and these genes are unlikely to be a reliable method of surveillance for emerging strains with a propensity to cause invasive CA disease, at least in our setting. Whilst ESBL and MDR isolates were significantly more common in HA isolates, about a third were CA (even using our fairly conservative definition) which significantly challenges the prevailing dogma that these are largely opportunistic HA strains.
Our study was limited by the inability to reconstruct closed genomes/plasmids using short-read sequencing data and our plasmidome analysis should therefore be interpreted with caution. The widely used PlasmidFinder database allows some inferences to be made using short read data; however, there are many untypeable plasmids that are not reflected in this database [64]. Similarly the MLplasmids classification algorithm used is trained on a limited reference set which may lead to some erroneous contig classifications. Additionally, analysis of the entire plasmidome may be too crude to identify the nuances of similarities/differences, particularly for small, shared plasmids as these represent only a small proportion of the overall plasmidome. Whilst we used a relatively conservative definition of healthcare-associated (i.e. within 30 days of hospital admission), this may have failed to capture the longer-term impacts of earlier hospital admissions/other healthcare exposures.

Conclusions
The contrasting epidemiology of E. coli and Klebsiella spp. BSIs suggests different reservoirs, selection pressures, and modes of acquisition for these genera. Separate strategies to reduce the incidence of these infections are likely to be required, and consideration of the community as a reservoir is important. Given the stable population structure demonstrated here, vaccines may be a promising prospect to reduce the incidence of E. coli and Klebsiella pneumoniae BSI. For E. coli, such a vaccine (ExPEC4/10V, Janssen Pharmaceuticals [65]) is in clinical trials and we have recently demonstrated its potential to provide protection against most E. coli isolates causing BSI in Oxfordshire [66]. The lack of reproducibility of several findings from previous studies and the poor sensitivity of current molecular markers for hyper-virulent Klebsiella surveillance highlights the critical importance of unselected sampling frames when making epidemiological inferences.
Additional file 1: Fig S1. Gene presence/absence heatmap showing AMR gene presence/absence against the core genome phylogeny for E. coli. Fig S2: Time-scaled phylogenies for ST131, ST95, ST73 and ST69. Fig  S3: Possible transmission within nursing homes. Fig S4: Gene presence/ absence heatmap showing AMR gene presence/absence against the core genome phylogeny for Klebsiella spp. Fig S5: Manhattan plots of a pangenome wide association study of the association of genes with community/healthcare associated onset. Fig S6: Proportions of presumed infectious foci for CA and HA BSI. Fig S7: Timescaled phylogeny of Klebsiella pneumoniae ST490 with a heatmap of AMR genes. Fig S8: Phylogenetic tree of Klebsiella spp annotated with species and virulence score. Fig  S9: top panel -plasmid types in the PlasmidFinder database identified in the major/other E. coli/Klebsiella spp., bottom left -plot showing kmer based plasmidome similarity (y-axis) against chromosome similarity (xaxis) for isolates of the same MLST. Fig S10: DAPC plots for Klebsiella spp. Fig S11: Networks of genes/plasmids/insertion sequences commonly cooccurring in E. coli. Fig S12: Networks of genes commonly co-occuring in Klebsiella spp. Table S1: Incidence rate ratios for sub-lineage of major STs identified by fastbaps. Table S2: SNP ratios (median within/between region) and (median HA/all) were calculated for each ST. Table S3: Evolutionary distinctiveness (ED) scores for community-associated (CA) and healthcare-associated (HA) isolates amongst major E. coli STs. Table S4: Top hits from a pangenome-wide association study (PGWAS) performed using Pyseer [50] of the assocation of gene presence/absence with pysician identified BSI source.