Enterobacterales plasmid sharing amongst human bloodstream infections, livestock, wastewater, and waterway niches in Oxfordshire, UK

Plasmids enable the dissemination of antimicrobial resistance (AMR) in common Enterobacterales pathogens, representing a major public health challenge. However, the extent of plasmid sharing and evolution between Enterobacterales causing human infections and other niches remains unclear, including the emergence of resistance plasmids. Dense, unselected sampling is essential to developing our understanding of plasmid epidemiology and designing appropriate interventions to limit the emergence and dissemination of plasmid-associated AMR. We established a geographically and temporally restricted collection of human bloodstream infection (BSI)-associated, livestock-associated (cattle, pig, poultry, and sheep faeces, farm soils) and wastewater treatment work (WwTW)-associated (influent, effluent, waterways upstream/downstream of effluent outlets) Enterobacterales. Isolates were collected between 2008 and 2020 from sites <60 km apart in Oxfordshire, UK. Pangenome analysis of plasmid clusters revealed shared ‘backbones’, with phylogenies suggesting an intertwined ecology where well-conserved plasmid backbones carry diverse accessory functions, including AMR genes. Many plasmid ‘backbones’ were seen across species and niches, raising the possibility that plasmid movement between these followed by rapid accessory gene change could be relatively common. Overall, the signature of identical plasmid sharing is likely to be a highly transient one, implying that plasmid movement might be occurring at greater rates than previously estimated, raising a challenge for future genomic One Health studies.

Many plasmids can transfer between species and are seen across different niches (Redondo-Salvo et al., 2020), but the extent to which they are shared between human and non-human niches remains poorly understood. Previous studies investigating this topic have often been limited in size given the genetic diversity in these niches (Mounsey et al., 2021), and/or restricted to single species (Ludden et al., 2019) or drug-resistant isolates (Shen et al., 2020), or are systematic studies, pooling geographically/temporally disparate samples (Cherak et al., 2021; Bastidas-Caldes et al., 2022). Further, fragmented genome assemblies in many cases make recovering complete plasmids, and other mobile genetic elements (MGEs), impossible (Hilpert et al., 2021).
Instances of cross-niche transfer of plasmids are well described, but the frequency of such events is poorly characterised. There are multiple instances where AMR genes have emerged from non-human niches and subsequently become major clinical problems in human Enterobacterales infections, highlighting the relevance of inter-niche transfer in AMR gene dissemination (e.g. blaCTX-M, mcr-1 [Wang et al., 2018] and blaNDM-1 [Sekizuka et al., 2011]). In general, environmental bacteria are believed to be the original source of AMR genes that eventually become prevalent in clinical settings after transfer into clinical pathogens. However, we know little about natural rates of inter-niche transfer beyond these high-profile examples. It remains unclear how plasmids evolve within natural populations, meaning we understand little about the wider context in which AMR genes emerge and disseminate.
Sampling niche was strongly associated with isolate genus (Fisher's test, p-value <0.001; Appendix 1-figure 2), demonstrated intermixing of human and non-human isolates within clades, consistent with species lineages not being structured by niche. We contextualised our plasmids within known plasmid diversity using 'plasmid taxonomic units' (PTUs; using COPLA, see Materials and methods), designed to be equivalent to a plasmid 'species'. We found 32% (1193/3697) of plasmids were unclassified, highlighting the substantial plasmid diversity within this geographically restricted dataset, whilst the remaining 68% (2504/3697) were assigned a PTU. In total, we found n=67 known PTUs, containing a median 9 plasmids (IQR = 4-30, range = 1-556), with the largest, PTU-FE (556/2504), corresponding to F-type Escherichia plasmids.

Plasmid clustering reveals a diverse but intertwined population structure across niches
Near-identical plasmids shared across niches are a likely signature of recent transfer events, but we also wanted to examine the wider plasmid population structure. We therefore agnostically clustered all plasmids based on alignment-free sequence similarity (clusters were groups of n≥3 plasmids; see Materials and methods and Appendix 1-figures 4 and 5). We defined n=247 plasmid clusters with median 5 members (IQR = 3-10, range = 3-123), recruiting 71% (2627/3697) of the plasmids. The remainder were either singletons (i.e. single, unconnected plasmids; 19% [718/3697]) or doubletons (i.e. pairs of connected plasmids; 10% [352/3697]). By bootstrapping b=1000 accumulation curves (ACs) of the plasmid clusters, doubletons, and singletons found against the number of isolates sampled (Appendix 1-figure 6; see Materials and methods), we estimated that the rarefaction curve had a Heaps' parameter γ=0.75, suggesting further isolate sampling would likely detect more plasmid diversity and clusters.
Overall, plasmid clusters scored high homogeneity (h) but low completeness (c) with respect to biological and ecological characteristics (non-putative PTUs [h=0.99, c=0.66]; replicon haplotype [h=0.92, c=0.69]; bacterial host sequence type (ST) [h=0.84, c=0.14] in Figure 3b; predicted mobility [h=0.93, c=0.20] in Figure 3c). This indicated that clustered plasmids often had similar characteristics, but the same characteristics were often observed in multiple clusters. When scoring plasmid clusters against broad sampling niche (BSI, livestock-associated, or WwTW-associated; Figure 3a), homogeneity was low (h=0.12, c=0.61), indicating mixed clusters. The imperfect homogeneity is to be anticipated as replicon haplotypes and mobilities can vary within plasmid families, and plasmid families can have diverse host ranges (Redondo-Salvo et al., 2020).

An intertwined ecology of plasmids across human and livestock-associated niches
Plasmids can change their genetic content, particularly when subject to new selective pressures (Rodríguez-Beltrán et al., 2021; Pesesky et al., 2019). Many plasmids have a structure with a 'backbone' of conserved core genes and a 'cargo' of variable accessory genes (Orlek et al., 2017b; Matlock et al., 2021a; Coluzzi et al., 2022). We wanted to explore evidence for cross-niche plasmids with minimal mutational evolution in a shared backbone (compatible with approximately years of evolutionary separation) but variable accessory gene repertoires.
Using multiple sequence alignments of the core genes within each cluster, we produced maximum likelihood phylogenies (see Supplementary file 1 and Materials and methods). For this step, we only considered the n=62/69 clusters where each plasmid had ≥1 core gene. With the n=27/62 clusters that contained both BSI and livestock-associated plasmids, we measured the phylogenetic signal for plasmid sampling niche using Fritz and Purvis' D (see Supplementary file 2 and Materials and methods). The analysis indicated that the evolutionary history of plasmid clusters is neither strictly segregated by sampling niche nor completely intermixed, but something intermediate.
Alongside the core-gene phylogenies, we generated gene repertoire heatmaps (example cluster 2 in Figure 4a-b; all clusters and heatmaps in Supplementary file 1). By visualising the genes in a consensus synteny order (see Materials and methods), the putative backbone within each plasmid cluster is shown alongside its accessory gene and transposase repertoire. This highlights how plasmids might gain/lose accessory functions within a persistent backbone. Log-transformed linear regression revealed a significant relationship between Jaccard distance of accessory gene presence and core-gene cophenetic distance ($y = 0.080\log(x) + 0.978$, $R^2 = 0.47$, $F(1, 52988) = 4.75 \times 10^{4}$, p-value <0.001; see Appendix 1-figure 8 and Materials and methods).

Plasmid dissemination between human and livestock-associated niches is not structured by bacterial host
Alongside vertical inheritance, conjugative and mobilisable plasmids are capable of inter-host transfer, crossing between bacterial lineages, species, up to phyla (Redondo-Salvo et al., 2020). Phylogenetic analysis can determine whether plasmid evolution between BSI and livestock-associated niches is driven by host clonal expansion or other means, as well as allow us to explore the early emergence of AMR gene carrying plasmids.
Figure 4b-c shows the plasmid core-gene phylogeny ($T_{\mathrm{plasmid}}$) and the E. coli host core-gene phylogeny ($T_{\mathrm{chromosome}}$). The E. coli phylogeny was structured by six clades corresponding to the six phylogroups (see Materials and methods). We found low congruence between the plasmid core-gene phylogeny and the chromosomal core-gene phylogeny as seen in the central 'tanglegram' (i.e. lines connecting pairs of plasmid and chromosome tips from the same isolate). Additionally, we calculated a Robinson-Foulds distance $RF(T_{\mathrm{plasmid}}, T_{\mathrm{chromosome}}) = 162$, reflecting a high number of structural differences between the phylogenies (see Materials and methods). There was some evidence of plasmid structuring by niche (Fritz and Purvis' D=0.24; see Materials and methods).

Discussion
Studies of plasmid sharing between different niches normally focus on plasmids carrying AMR genes that are of particular current clinical concern, such as extended-spectrum beta-lactamase (ESBL) or carbapenemase genes, meaning we lack information on the vast 'denominator' of background plasmid sharing, and on the dissemination of other AMR genes which are now widespread in clinical isolates and from which important insights might be gained. By analysing a dataset of n=3697 systematically collected Enterobacterales plasmids sampled from human BSI, livestock- and WwTW-associated sources in a geographically and temporally restricted context, we found evidence supporting significant plasmid dissemination across niches, putting those which carry AMR genes of current major clinical concern into context. We found 225 instances of shared, near-identical plasmid groups, 25% of which were found across multiple bacterial STs, 4% across multiple bacterial species, and 8% in both human BSI and ≥1 non-BSI niche. Beyond this near-identical sharing, we analysed 'clusters' of plasmids and found that 73/247 clusters contained plasmids seen in both human BSIs and other contexts. Approximately a fifth (52/247) of plasmid clusters contained plasmids carrying AMR genes (n=550 plasmids). Our results suggest the need for broad, unselected, and detailed sampling frames to fully understand plasmid diversity and evolution, and to evaluate the 'One Health' risk of AMR associated with plasmid sharing across niches.
Whilst many plasmid clusters were strongly structured by host phylogeny and isolate source, some plasmids from human BSIs were highly genetically related to those in other niches, including livestock. However, not all of these carried AMR genes. Our results highlight the potential routes for transfer that exist through similar plasmids. However, recovering these instances of putative sharing is a sampling challenge. Accumulation curve analyses suggested increasing the size of our dataset would have led to further near-identical matches at an approximately linear rate, meaning even a dataset of this size captures only a small fraction of the true extent of plasmid sharing between human clinical and other non-human/clinical niches. This presents a challenge for designing appropriately powered studies. Had we only sampled n=100 livestock-associated isolates (i.e. around 20% of our actual sample), there was only a 39% chance that we would have detected ≥5 matches with BSI plasmids (Appendix 1-figure 9).
Understanding the evolutionary history, distribution, and epidemiology of well-known genes in environmental plasmids may offer insights into the future trajectories of more recently emerged genes. For example, the first plasmid-encoded beta-lactamase to be described was blaTEM-1, identified in 1965 in an E. coli isolate in Greece (Datta and Kontomichalou, 1965) and now widely prevalent in Enterobacterales (Bush and Bradford, 2020). blaTEM-1 has a narrow spectrum of activity and is now less clinically concerning than newer genes which mediate broad-spectrum resistance, but in our dataset blaTEM-1 was strongly associated with plasmid clusters seen in BSI and with the carriage of other AMR genes. blaTEM-1 may continue to play an important role in the spread of AMR-carrying plasmids which can transfer recently emerged genes, and similarities in its association with plasmids and other smaller transposable MGEs may reflect the future trajectory of other AMR genes of more recent clinical concern such as ESBLs and carbapenemases.
Given that plasmids observed in BSI isolates represent a small proportion of human Enterobacterales diversity, many more sharing events may occur in the human gut (Forster et al., 2019), which we only sampled incompletely using wastewater influent as a proxy. The human colon contains around $10^{14}$ bacteria (Sender et al., 2016), with large ranges of Enterobacteriaceae abundance. Further, even small numbers of across-niche sharing events, such as transfer events of important AMR genes from species-to-species or niche-to-niche, may have significant clinical implications, as has been seen with several important AMR genes globally. Future studies need to carefully consider the limitations of sampling frames in detecting any genetic overlap, given both substantial diversity and the effects of niches and geography (Shaw et al., 2021; Hanage, 2019).
By examining plasmid relatedness compared to bacterial host relatedness in E. coli, we demonstrated that plasmids seen across different niches are not necessarily associated with clonal lineages. Using a pangenome-style analysis, we showed that plasmids can share sets of near-identical core genes alongside diverse accessory gene repertoires. While plasmids with more distantly related core genes tended to have dissimilar accessory gene content, plasmids with more closely related core genes shared a wide range of accessory gene content. This would be consistent with a hypothesis of persistent 'backbone' structures gaining and losing accessory functions as they move between hosts and niches. We suggest that this mode of transfer might be worth considering. Evolutionary models for plasmids which can accommodate well-conserved backbone evolution alongside accessory structural changes and gain/loss events are urgently needed. Estimating plasmid evolutionary rates remains a challenge, with little known about appropriate values for mutation rates in plasmids, and even less for non-mutational processes such as gene gain/loss.
Our study had several limitations. Our non-BSI isolates were not as temporally varied as the BSI isolates, meaning we could not fully explore temporal evolution. Although we evaluated four bacterial genera, 72% (1044/1458) of our sequenced isolates were E. coli, and so our analyses and findings are particularly focused on this species. Additionally, we did not sample livestock-associated niches densely enough to explore individual livestock types (cattle/pig/poultry/sheep) sharing plasmids with BSI isolates (see Appendix 1-figure 9). Isolate-based methodologies are limited in evaluating the true diversity of the niches sampled; composite approaches including metagenomics might shed additional insight in future studies. Further, the exact source of an isolate is poorly defined for wastewater/waterway isolates as they act as a confluence of multiple sources, although they represent important niches in their own right. We only analysed plasmids from complete genomes, i.e., where the chromosome and all plasmids were circularised, meaning we disregarded ~23% and ~33% of BSI and non-BSI assemblies, respectively. The exclusive use of complete assemblies was to ensure full plasmid sequences could be examined in their full genomic context. We only focused on plasmids as horizontally transmissible elements here; detailed study of other smaller MGEs across niches would represent interesting future work. We have also investigated a limited subset of Enterobacterales: plasmid sharing likely extends to other bacterial hosts not investigated here. Lastly, our isolate culture methods for livestock-associated samples may not have been as sensitive for the identification of Klebsiella spp. as for other Enterobacterales such as Escherichia, as we did not use enrichment and selective culture on Simmons citrate agar with inositol (Rodrigues et al., 2022). This may have limited our ability to study the epidemiology of livestock Klebsiella plasmids.
In conclusion, this study presents to our knowledge the largest evaluation of systematically collected Enterobacterales plasmids across human and non-human niches within a geographically and temporally restricted context. Near-identical plasmids can be found in different niches, pointing to putative dissemination, although this dynamic likely varies by plasmid cluster; the proportion of near-identical plasmid groups that were found across niches was 8% (17/225) and influenced by sample size. We demonstrate a likely intertwined ecology of plasmids across human and non-human niches, where different plasmid clusters are variably but incompletely structured and putative 'backbone' plasmid structures can rapidly gain and lose accessory genes following cross-niche spread. Future 'One Health' studies require dense and unselected sampling, and complete/near-complete plasmid reconstruction, to appropriately understand plasmid epidemiology across niches.

BSI isolates
Sequenced human BSI Enterobacterales isolates from patients presenting to n=4 hospitals within Oxfordshire, UK, between September 2008 and December 2018, as described in Lipworth et al., 2021, were also included. Although all patients were sampled in Oxfordshire, a total of n=505/738 patients resided in Oxfordshire, n=133/738 in surrounding counties, and n=100/738 had location information omitted. Only complete assemblies (n=738/953 total assembled) were considered.

Other livestock-associated and WwTW-associated isolates
Enterobacterales isolates from faeces from the n=14 non-poultry farms and wastewater influent, effluent, and waterways upstream/downstream of effluent outlets surrounding n=5 WwTWs, across three seasonal timepoints in 2017, were included (as in Shaw et al., 2021). Only complete assemblies (n=558/827 total assembled) were considered.

PTU classification
Plasmids were assigned a PTU using COPLA (Redondo-Salvo et al., 2021) (default parameters except -t circular, -k Bacteria, -p Pseudomonadota, -c Gammaproteobacteria, and -o Enterobacterales) (Redondo-Salvo et al., 2020). COPLA compares query plasmids to a database of PTU reference plasmids, assigning a PTU when both (i) the average nucleotide identity (ANI) is >0.7 along 50% of the length of the smallest plasmid in the comparison and (ii) a graph-neighbouring condition to existing PTU clusters is satisfied. The COPLA reference database contains over 10,000 curated, non-redundant plasmids retrieved from the 84th NCBI RefSeq database in 2017 (Pruitt et al., 2007). We contextualised our plasmids within known plasmid diversity using COPLA to determine each plasmid's 'PTU' (see Materials and methods), which is designed to be equivalent to a 'species' concept for plasmids (Redondo-Salvo et al., 2021). Briefly, COPLA classifies query plasmids based on ANI against a non-redundant reference plasmid database where most plasmids have been assigned to a reference PTU (Pruitt et al., 2007). Within our sample, 64% (2369/3697) of plasmids were assigned a PTU and 4% (135/3697) a putative PTU (i.e. the query plasmid was clustered with three unclassified reference plasmids). This is consistent with a previous COPLA analysis of 1000 Enterobacterales plasmids which found that 63% were classified into a PTU (Redondo-Salvo et al., 2021). The remaining 32% (1193/3697) of plasmids were unclassified (i.e. in a connected set with fewer than four plasmids), highlighting the previously unsampled plasmid diversity within our dataset. In total, we found n=67 known PTUs, containing a median 9 plasmids (IQR = 4-30, range = 1-556), where the largest assigned PTU (556/2504) was PTU-FE, corresponding to F-type Escherichia plasmids (Matlock et al., 2021a; Rozwandowicz et al., 2018). The proportion of unclassified plasmids was higher in environmental/livestock samples (33%; 385/1155) versus BSI samples (26%; 485/1880), emphasising the underrepresentation of non-human plasmids in reference plasmid databases.

Near-identical plasmid screening
Groups of near-identical plasmids were detected as connected components in a plasmid-plasmid network with edges weighted by Mash distance (Ondov et al., 2016) (v. 2.3; default parameters except sketch size -s 1000000), at a threshold d<0.0001. Briefly, Mash distance estimates an evolutionary distance on a reduced-length MinHash sketch of the sequences. Since Mash is a probabilistic estimate of evolutionary distance, we confirmed the probability of seeing any of our pairwise Mash distances in the near-identical groups by chance was 0. For whole genomes, Mash distance has a strong positive correlation with ANI (Figueras et al., 2014). We also required the shortest plasmid to be within 1% length (bp) of the longest plasmid, to account for assembly errors. Network analysis was performed using igraph (Csardi and Nepusz, 2006). The stringency of a k-mer-based distance threshold for near-identical plasmid clustering is equivalent to a threshold on the Jaccard index: rearranging the Mash distance calculation $d = -\frac{1}{k}\ln\left(\frac{2j}{1+j}\right)$ to $j = \frac{1}{2e^{kd}-1}$ with $d=10^{-4}$ and $k=21$ gives a Jaccard index threshold of $j=0.9958$. The effect of this threshold varies with plasmid size: at very small plasmid sizes, clusters contain only identical plasmids because the presence of a single SNP means plasmids are placed in different clusters. For example, two 1552 bp plasmids with a single SNP (e.g. RHB03-C05_6 and RHB02-C22_6) will have a Mash distance of $d=5.0 \times 10^{-4}$ (above the $10^{-4}$ threshold). In contrast, at length = 150 kbp a single SNP (not at the start/end of the plasmid) would lead to $d=5.6 \times 10^{-6}$ (far below the $10^{-4}$ threshold); even two 150 kbp plasmids with ~30 SNPs would have $d \approx 2 \times 10^{-4}$ (above the $10^{-4}$ threshold) and so would be placed in different near-identical groups. Our analysis of plasmid sharing is therefore maximally conservative at small plasmid sizes but remains highly conservative for large plasmids.
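The correspondence between the Mash distance threshold and the Jaccard index threshold can be checked numerically. The following is a minimal sketch of the standard Mash distance formula and its inverse; the function names are illustrative and are not part of Mash or of the analysis code.

```python
import math


def mash_distance_from_jaccard(j: float, k: int = 21) -> float:
    """Mash distance d = -(1/k) * ln(2j / (1 + j)) for a Jaccard index j."""
    if j <= 0.0:
        # Mash reports the maximum distance of 1 when no k-mers are shared.
        return 1.0
    return -math.log(2.0 * j / (1.0 + j)) / k


def jaccard_from_mash_distance(d: float, k: int = 21) -> float:
    """Inverse of the Mash distance formula: j = 1 / (2 * exp(k * d) - 1)."""
    return 1.0 / (2.0 * math.exp(k * d) - 1.0)


# Jaccard index implied by the near-identical threshold d = 1e-4 with k = 21.
threshold_j = jaccard_from_mash_distance(1e-4, k=21)
print(f"{threshold_j:.4f}")  # 0.9958
```

With d=0.0001 and k=21 this reproduces the Jaccard index threshold of 0.9958 stated above.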

Accumulation and rarefaction curves
To generate an accumulation curve, isolates were sampled without replacement in a random order.For each isolate, the new plasmid diversity was recorded.For Appendix 1-figure 3, we recorded the number of new near-identical plasmid groups and singletons.For Appendix 1-figure 9, we recorded the number of near-identical matches with BSI plasmids from only environmental/livestock isolates.
For Appendix 1-figure 6, we recorded the number of new clusters, doubletons, and singletons. A bootstrapped average of b=1000 accumulation curves was plotted for the rarefaction curve. The bootstraps were also used to estimate Heaps' parameter (γ) by fitting a linear regression to log-log transformed data using standard R libraries. For γ<0, it is possible to sample the entire diversity, and for 1>γ>0, the diversity will increase with every additional sample (Tettelin et al., 2008).
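The accumulation and rarefaction procedure can be sketched as follows. This is an illustration in Python rather than the R code used for the analysis, and it makes two assumptions that are not stated in the text: each isolate is represented as a set of plasmid group labels, and γ is taken as the slope of log(cumulative diversity) against log(number of isolates). All function names are hypothetical.

```python
import math
import random


def accumulation_curve(isolates, rng):
    """Cumulative number of distinct labels as isolates are drawn
    without replacement in a random order."""
    order = list(isolates)
    rng.shuffle(order)
    seen = set()
    curve = []
    for labels in order:
        seen.update(labels)
        curve.append(len(seen))
    return curve


def mean_accumulation_curve(isolates, b=1000, seed=0):
    """Average of b randomly ordered accumulation curves (rarefaction curve)."""
    rng = random.Random(seed)
    totals = [0.0] * len(isolates)
    for _ in range(b):
        for i, value in enumerate(accumulation_curve(isolates, rng)):
            totals[i] += value
    return [t / b for t in totals]


def heaps_gamma(curve):
    """Least-squares slope of log(diversity) against log(sample size).
    Points with zero diversity are skipped because log(0) is undefined."""
    pairs = [(math.log(n), math.log(v)) for n, v in enumerate(curve, 1) if v > 0]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


# Toy input: every isolate carries one previously unseen plasmid group,
# so diversity grows linearly with sampling and the fitted slope is 1.
toy_isolates = [{f"group_{i}"} for i in range(50)]
gamma = heaps_gamma(mean_accumulation_curve(toy_isolates, b=100))
```

Under these assumptions, a slope between 0 and 1 corresponds to diversity that keeps growing with additional isolates, as for the γ=0.75 estimate reported for the real data.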

Plasmid similarity
Plasmid Jaccard index (JI) was calculated using Mash (Ondov et al., 2016) (v. 2.3; default parameters except sketch size -s 1000000). The JI is given by
$$\mathrm{JI}(a, b) = \frac{|A \cap B|}{|A \cup B|},$$
where A, B are the sets of k-mers of plasmids a, b, respectively. This measures the extent of k-mer sharing between plasmids, range = 0-1, where 1 indicates an identical k-mer repertoire. Since the sketch size was larger than the plasmid lengths (except for one plasmid in the dataset, OX-ENV-67_2, which was larger than 1 Mbp at 1,310,597 bp and was not clustered; the next largest was OX-WTW-80_2 at 394,284 bp), the calculated Jaccard indices were almost always exact.
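As an illustration of the quantity being estimated, the exact k-mer Jaccard index of two short sequences can be computed directly. This sketch is not how Mash works internally: it assumes circular sequences, ignores reverse complements, and uses full k-mer sets rather than MinHash sketches. The function names are hypothetical.

```python
def kmer_set(seq: str, k: int = 21, circular: bool = True) -> set:
    """Set of k-mers in a sequence; for circular sequences, k-mers that
    span the junction between the end and the start are included."""
    seq = seq.upper()
    extended = seq + seq[: k - 1] if circular else seq
    return {extended[i : i + k] for i in range(len(extended) - k + 1)}


def jaccard_index(a: str, b: str, k: int = 21) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of the two k-mer sets."""
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty k-mer sets are treated as identical
    return len(set_a & set_b) / len(union)


# A circular sequence and a rotation of it share every k-mer.
plasmid = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
rotated = plasmid[5:] + plasmid[:5]
assert jaccard_index(plasmid, rotated, k=5) == 1.0
```

Treating the sequences as circular makes the index independent of where an assembler chose to start the plasmid, which is one reason a k-mer-based measure suits circularised plasmid assemblies.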

Plasmid network and clustering
The determination of the plasmid-plasmid network, threshold, and clusters could be achieved with several alternative methodologies. Plasmid networks have previously been constructed by full sequence alignments (Yamashita et al., 2014), annotated genes (Branger et al., 2018), and alignment-free Mash distances (Matlock et al., 2021a; Acman et al., 2020; Jesus et al., 2019). We chose to use the Jaccard index of entire plasmid 21-mer distributions to capture coding sequences, their immediate contexts (Matlock et al., 2021b; Arcilla et al., 2016), and intergenic regions (Zhi et al., 2015; Delihas, 2009), all of which have known importance to bacterial evolution. Further, our dataset contained previously unsampled diversity, as seen in the PTU analysis, and reference-based classifications such as MOB and replicon typing schemes are known to be incongruent (Orlek et al., 2017a) or unreliable: 16% (602/3697) of our plasmids had an unidentifiable replicon type, which is not uncommon (Rozwandowicz et al., 2018). The evolutionary histories of plasmids can incorporate multiple gain, loss, and rearrangement events in addition to mutations (Kizny Gordon et al., 2020), and as such, traditional measures of genetic relatedness (e.g. single nucleotide variant thresholds) used for genomic epidemiology of whole genomes are likely less appropriate here. These similarities formed the edge weights in a plasmid-plasmid network, which was subsequently thresholded to sparsify the network and allow the detection of clusters. Network thresholding to some extent depends subjectively on the dataset, with trade-offs between successfully revealing the underlying structure of plasmid relationships without excessively separating relatives. We chose a data-driven threshold as adopted by Branger et al., 2018, for their plasmid network, which examined the evolution of connected components within the network. This ensured the threshold was chosen where the regime of connected component evolution approximately stabilises, minimising excessive network breakup. The threshold was chosen at JI = 0.5, meaning that edges between plasmids with JI <0.5 were deleted from the network. From this threshold onwards, both the number of connected components and the number of singletons steadily increased at a similar rate (Appendix 1-figure 4). This regime indicates an approximately stable non-singleton structure from JI = 0.5 onwards.
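The thresholding step can be illustrated with a small sketch that deletes edges with JI <0.5 and counts the resulting connected components, singletons, and doubletons. It uses a union-find structure from the Python standard library rather than the network libraries used in the analysis, and the plasmid names and edge weights below are invented for the example.

```python
from collections import defaultdict


def connected_components(nodes, weighted_edges, threshold=0.5):
    """Connected components after deleting edges with weight < threshold."""
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, ji in weighted_edges:
        if ji >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for node in nodes:
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical plasmids and pairwise Jaccard indices.
plasmids = ["p1", "p2", "p3", "p4", "p5", "p6"]
edges = [("p1", "p2", 0.9), ("p2", "p3", 0.6), ("p3", "p4", 0.2), ("p4", "p5", 0.7)]

components = connected_components(plasmids, edges, threshold=0.5)
singletons = [c for c in components if len(c) == 1]
doubletons = [c for c in components if len(c) == 2]
larger = [c for c in components if len(c) >= 3]
```

Repeating this count over a range of thresholds gives the kind of component and singleton curves used to choose the threshold. Connected components are only the starting point: the plasmid clusters themselves were obtained by community detection on the thresholded network.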
We defined plasmid clusters as groups of n≥3 plasmids with high within-cluster similarities and low between-cluster similarities. Plasmid clusters were detected using the Louvain algorithm, which heuristically optimises the network modularity by iterating greedy local moves of nodes between communities and aggregation of communities (Blondel et al., 2008). This aims to maximise the density of edges within clusters against edges between clusters. The algorithm was implemented using the python-louvain (v. 0.16) Python module. Though non-deterministic, the Louvain algorithm showed low variation in cluster distribution over 50 runs at our chosen network threshold, consistent with reproducible segregation of plasmids in clusters (range of clusters detected: 245-247; Appendix 1-figure 5). Overall, these approaches add to the growing literature describing suitable methodologies for clustering plasmids.
Near-identical plasmid groups were also included in the wider cluster analysis, as many were cross-compartmental and found across bacterial hosts (see earlier, Figure 2). Of the n=194/225 groups which were clustered, 100% (194/194) had all members fall within the same plasmid cluster, with n=30/247 clusters containing multiple near-identical plasmid groups. Only 6% (14/247) of plasmid clusters comprised exclusively near-identical plasmid groups, suggesting that near-identical groups of plasmids often have nearby genetically related plasmids. Examining the entire PTU distribution within clusters, most contained at least one unclassified plasmid (51%; 127/247) or plasmid assigned a putative PTU (9%; 23/247). However, many clusters exclusively contained just one known PTU (42%; 105/247).

Cluster homogeneity and completeness
Homogeneity (h) and completeness (c) are dual conditional entropy-based measures, independent of cluster and metadata label distributions (Rosenberg and Hirschberg, 2007). A cluster partition satisfies homogeneity (h=1) if every cluster contains only a single metadata label type. Homogeneity is defined by
$$h = 1 - \frac{H(M|C)}{H(M)},$$
where $H(M|C)$ and $H(M)$ are the conditional entropy of the metadata given the clusters and the entropy of the metadata, respectively. $H(M|C) = 0$ when the cluster partition coincides with the metadata partition, and no new information is added. A cluster partition satisfies completeness (c=1) if all instances of a metadata label type are assigned the same cluster. Completeness is defined dually by
$$c = 1 - \frac{H(C|M)}{H(C)}.$$
The measures were calculated using the clver library (v. 0.1.1) in R.
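To make the two measures concrete, the following sketch computes them from their entropy definitions in Rosenberg and Hirschberg (2007). It is an illustration in Python, not the R implementation used in the analysis, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(X) of a list of labels (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) for two aligned label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (count / n) * math.log(count / given_counts[g])
        for (g, _), count in joint.items()
    )


def homogeneity(metadata, clusters):
    """h = 1 - H(M|C) / H(M); defined as 1 when H(M) = 0."""
    h_m = entropy(metadata)
    return 1.0 if h_m == 0 else 1.0 - conditional_entropy(metadata, clusters) / h_m


def completeness(metadata, clusters):
    """c = 1 - H(C|M) / H(C); defined as 1 when H(C) = 0."""
    h_c = entropy(clusters)
    return 1.0 if h_c == 0 else 1.0 - conditional_entropy(clusters, metadata) / h_c


# Toy example: each cluster is pure (h = 1), but each metadata label
# is split evenly over two clusters (c = 0.5).
niche = ["BSI", "BSI", "livestock", "livestock"]
cluster = [1, 2, 3, 4]
print(homogeneity(niche, cluster), completeness(niche, cluster))
```

The toy example mirrors the pattern reported for PTUs and replicon haplotypes: clusters that are internally consistent for a label (high h) while the same label is spread over several clusters (low c).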

Fritz and Purvis' D
Fritz and Purvis' D measures phylogenetic signal for binary traits (Fritz and Purvis, 2010). First, we calculate the character state changes required to observe our phylogeny ($d_{\mathrm{obs}}$). To account for phylogeny size and prevalence, $d_{\mathrm{obs}}$ is standardised under the two null models, phylogenetic randomness ($d_r$) and Brownian motion ($d_b$):
$$D = \frac{d_{\mathrm{obs}} - \bar{d}_b}{\bar{d}_r - \bar{d}_b}.$$
Hence, for D≈1, $d_{\mathrm{obs}}$ follows $d_r$ more closely, and for D≈0, $d_{\mathrm{obs}}$ follows $d_b$ more closely. We calculated $d_{\mathrm{obs}}$ n=10,000 times and averaged the result, and calculated p-values for significant deviation from $d_r$ or $d_b$. D was implemented using the R library caper (Orme, 2013). Fritz and Purvis' D is normally used for cross-species analysis so is not benchmarked for plasmids. Results for phylogenies with fewer than 25 tips should be viewed more conservatively due to reduced statistical power in these instances.
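The standardisation step can be shown with a few lines of arithmetic. This sketch only rescales an observed value against the means of the two null distributions, following the definition in Fritz and Purvis (2010); it does not simulate the null models on a phylogeny, which is the part handled by caper. The function name and the input values are hypothetical.

```python
from statistics import mean


def fritz_purvis_d(d_obs, d_random, d_brownian):
    """Scale the observed sum of character state changes so that
    D = 1 matches phylogenetic randomness and D = 0 matches Brownian motion."""
    mean_r = mean(d_random)
    mean_b = mean(d_brownian)
    if mean_r == mean_b:
        raise ValueError("null model means coincide; D is undefined")
    return (d_obs - mean_b) / (mean_r - mean_b)


# Invented null distributions for illustration only.
random_null = [9.5, 10.0, 10.5]
brownian_null = [1.5, 2.0, 2.5]
print(fritz_purvis_d(4.0, random_null, brownian_null))  # 0.25
```

An observed value a quarter of the way from the Brownian mean to the random mean gives D=0.25, close to the D=0.24 reported for plasmid structuring by niche.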

Consensus gene synteny heatmaps
For each cluster, we first generated a list of every possible pair of genes in the pangenome. Then for each plasmid, we counted the distance between these pairs, modulo the number of genes in the plasmid. If a gene was absent in a plasmid, NA was used. We then calculated the median of these values across all plasmids in the cluster. We then built a dendrogram from a hierarchical clustering of the median distances. The order of the tip labels in the dendrogram was then used as the 'consensus gene synteny'.

Accessory gene distances
Plasmid accessory gene distances were calculated using pairwise Jaccard distances on gene presence-absence matrices. For plotting the cluster-wise plasmid core-gene cophenetic distance against accessory gene presence-absence Jaccard distance, only the n=26/62 clusters with at least 50 accessory genes were plotted. The log-transformed linear regression of Jaccard distance of accessory gene presence against core-gene cophenetic distance was fitted in R with standard libraries.
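A minimal sketch of the two calculations follows, written in Python rather than the R used for the analysis. It assumes each plasmid's accessory genes are given as a set of gene names, that the regression uses the natural logarithm, and that plasmid pairs with a cophenetic distance of zero are excluded because their logarithm is undefined; none of these details is specified above. Function names and gene names are hypothetical.

```python
import math


def jaccard_distance(genes_a: set, genes_b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two accessory gene repertoires."""
    union = genes_a | genes_b
    if not union:
        return 0.0  # two empty repertoires are treated as identical
    return 1.0 - len(genes_a & genes_b) / len(union)


def fit_log_linear(xs, ys):
    """Least-squares fit of y = slope * ln(x) + intercept.
    Pairs with x <= 0 are skipped because ln(x) is undefined."""
    pairs = [(math.log(x), y) for x, y in zip(xs, ys) if x > 0]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Two hypothetical plasmids sharing one of three accessory genes.
print(jaccard_distance({"geneA", "geneB"}, {"geneB", "geneC"}))

# Noise-free synthetic points generated from the reported fit.
core_distances = [0.001, 0.01, 0.1, 1.0]
accessory_distances = [0.080 * math.log(x) + 0.978 for x in core_distances]
slope, intercept = fit_log_linear(core_distances, accessory_distances)
```

On the synthetic points the fit recovers the generating slope and intercept; with real data the same function would be applied to all pairwise distances within the plotted clusters.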

Plasmid mutation rate
Mutations in microbes typically arise during DNA replication, and per-base-pair mutation rates tend to be below $m = 10^{-9}$ per site per generation (Drake, 1991) or perhaps as low as $10^{-10}$ per site per generation (Foster et al., 2015; Wielgoss et al., 2013). For a plasmid of size L, one therefore expects L × m mutations per plasmid per generation. For example, if the plasmid has $L = 10^5$ bp and $m = 10^{-9}$, then in each generation 1 in 10,000 plasmids will gain a mutation. The growth rate of E. coli in the human gut has been estimated to be between 6 and 20 generations per day (Ghalayini et al., 2018). For large plasmids that exist at a copy number of ~1, the plasmid generation time is the cell generation time. More generally, for a plasmid copy number p, the number of replications of the plasmid expected for a given number of cell generations g will be p × g (assuming that plasmid copies are simply and linearly related to the realised number of replications per cell). A crude estimate for the expected mutation rate per time period for a plasmid is therefore given by L × m × p × g. For a plasmid of L=100 kbp and p=1, assuming $m = [0.1\text{-}1] \times 10^{-9}$ per site per generation and g=[6-20]×365 generations per year, one would expect it to accumulate ~0.5 mutations a year (between ~0.02 and 0.7 depending on assumptions). One obtains the same result for L=10 kbp and p=10. There is a strong inverse correlation between plasmid size and copy number. This suggests that a suitable upper bound for the expected number of mutations for a typical plasmid per year (under neutral evolution) is of the order of magnitude of 1 SNP a year.

This rough 'SNPs and years' rule-of-thumb appears consistent with known empirical results. For example: 100 kbp I1-type Shigella plasmids isolated between 2007 and 2010 in Vietnam were separated by at most 2 SNPs (Holt et al., 2013); 30 kbp X4-type plasmids carrying mcr-1 isolated between 2016 and 2018 in China were separated by at most 4 SNPs (Shen et al., 2020) (analysis not shown); 63.5 kbp pOXA48-like plasmids (n=202) in K. pneumoniae collected across Europe between 2013 and 2014 as part of EUSCAPE were overwhelmingly within 2 SNPs of each other (176/202) (David et al., 2020); the same was true of 45.4 kbp IncX3 plasmids (n=135) from the EUSCAPE dataset (all were within 6 SNPs of each other; see Figure 4 of that paper); and also of 113.4 kbp pKpQIL-like plasmids (n=91) from the EUSCAPE dataset, although a minority of these plasmids were separated by up to 20 SNPs, which seems suggestive of either ancestry before the 2-year sampling frame or recombination.
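The crude estimate L × m × p × g can be reproduced with a few lines of Python. This is only an illustrative sketch of the arithmetic described above, not code used in the study; the function name and its parameters are hypothetical, and the inputs are the bounds quoted in the text.

```python
def expected_mutations_per_year(length_bp, mutation_rate, copy_number,
                                generations_per_day, days_per_year=365):
    """Crude neutral expectation L x m x p x g (hypothetical helper).

    length_bp           -- plasmid size L in base pairs
    mutation_rate       -- m, mutations per site per generation
    copy_number         -- p, plasmid copies per cell
    generations_per_day -- cell generations per day (g = this x days_per_year)
    """
    generations_per_year = generations_per_day * days_per_year
    return length_bp * mutation_rate * copy_number * generations_per_year


# Bounds quoted in the text for a 100 kbp plasmid at copy number 1.
low = expected_mutations_per_year(100_000, 1e-10, 1, 6)
high = expected_mutations_per_year(100_000, 1e-9, 1, 20)

# A 10 kbp plasmid at copy number 10 gives the same expectation.
small_high = expected_mutations_per_year(10_000, 1e-9, 10, 20)

print(f"100 kbp, p=1: {low:.3f} to {high:.2f} mutations per year")
print(f"10 kbp, p=10 (upper bound): {small_high:.2f} mutations per year")
```

Because the estimate is a simple product, halving the plasmid size while doubling the copy number leaves the expectation unchanged, which is why the inverse correlation between size and copy number supports a single order-of-magnitude bound.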

Pangraph analysis
We used pangraph (Noll et al., 2022) (v. 0.5.0) to build a pangraph of the clade within plasmid cluster 2, using the --circular flag and otherwise default parameters. We removed duplicated blocks from the pangraph. We used pangraph export (--edge-minimum-length 0, otherwise default parameters) to export the graph to GFA format and then visualised this using Bandage (Wick et al., 2015). Supplementary file 4 used Prokka annotations (see above) of the core and accessory pancontigs.

Figure 1. A diverse sample of geographically and temporally restricted Enterobacterales. (a) Number of chromosomes and plasmids by niche, stratified by isolate genus. (b) Map of approximate, relative distances between sampling sites, coloured by niche (human bloodstream infection [BSI], livestock-associated [cattle, pig, poultry, and sheep faeces, soils nearby livestock sites], and wastewater treatment work [WwTW]-associated sources [influent, effluent, waterways upstream/downstream of effluent outlets]). Number in circles indicates how many of the n=1458 isolates are from that location. (c) Sampling timeframe for BSI and REHAB (non-BSI) isolates.

Figure 2. Cross-niche, near-identical plasmids. (a) Size of cross-niche, near-identical plasmid groups, coloured by niche (total n=84 plasmids). Median length (bp) of plasmids within groups increases from left to right. (b) Proportion of plasmid host species by group. (c) Predicted mobility of plasmid. (d) Antimicrobial resistance (AMR) gene carriage in plasmid. For small plasmids, the stringent distance threshold (d<0.0001) becomes an identical threshold, meaning that plasmids of the same length with a single SNP between them are grouped into different groups (e.g. the three groups with

Figure 4. Cluster 2 plasmid and host evolution. (a) Consensus gene ordering for plasmid cluster 2, coloured by gene type (total n=99 plasmids; n=1 Salmonella enterica isolate omitted). Genes are coloured by core, accessory, or transposase. (b) Plasmid core-gene phylogeny with tips coloured by sampling niche. The grey circle highlights the clade of n=44 plasmids which were further analysed. (c) Plasmid host chromosome core-gene phylogeny with tips coloured by sampling niche. Plasmid and host phylogeny tips are connected in a 'tanglegram' which connects pairs of plasmids and chromosomes from the same isolate. (d) Visualisation of the pangraph for n=44 plasmids in the grey-circled clade in (b). Blocks are coloured by presence in plasmids. (e) Core blocks (found in at least 95% of the n=44 plasmids). (f) Accessory blocks (found in less than 95% of the n=44 plasmids).
(i) tip labels are randomly permuted ($d_r$), and (ii) tip labels are distributed under the expectation of a Brownian motion model of evolution ($d_b$). Then, we define $D = (d_{\mathrm{obs}} - d_b)/(d_r - d_b)$.
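The scaling in this definition can be illustrated with a short Python sketch. It only evaluates the formula for D given the three quantities; how $d_{\mathrm{obs}}$, $d_r$, and $d_b$ are obtained from the phylogeny is not shown, and the function and argument names are hypothetical.

```python
def d_statistic(d_obs, d_random, d_brownian):
    """Evaluate D = (d_obs - d_b) / (d_r - d_b) (hypothetical helper).

    d_obs      -- observed value for the real tip labels
    d_random   -- expected value under random permutation of tip labels (d_r)
    d_brownian -- expected value under Brownian motion evolution (d_b)
    """
    denominator = d_random - d_brownian
    if denominator == 0:
        raise ValueError("d_r and d_b are equal, so D is undefined")
    return (d_obs - d_brownian) / denominator
```

By construction, D equals 1 when the observed value matches the random-permutation expectation and 0 when it matches the Brownian motion expectation.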
A clustering satisfies homogeneity (h=1) if all cluster members have the same metadata label type. Consider a network with N nodes, partitioned by a set of metadata labels, $M = \{m_i\}$, and by a set of communities, $C = \{c_j\}$. Let $a_{ij}$ represent the $ij$th entry in the contingency table of partitions. Hence, $a_{ij}$ counts the number of nodes with label $m_i$ in community $c_j$. We then say