Frequent emergence of pathogenic lineages of Klebsiella pneumoniae via mobilisation of yersiniabactin and colibactin

Klebsiella pneumoniae (Kp) is a commensal bacterium that causes opportunistic infections. Evidence is mounting that Kp strains carrying acquired siderophores (yersiniabactin, salmochelin and aerobactin) and/or the genotoxin colibactin are highly pathogenic and can cause invasive disease. Here we explored the diversity of the Kp integrative conjugative element (ICEKp), which mobilises the yersiniabactin locus ybt, by comparing 2499 diverse Kp genomes. We identified 17 distinct ybt lineages and 14 ICEKp structural variants (some of which carry colibactin (clb) or salmochelin synthesis loci). Hundreds of ICEKp transmission events were detected, affecting hundreds of Kp lineages and including more than 20 transfers into the globally disseminated, carbapenem-resistant clonal group CG258. Additionally, we identify a plasmid-encoded lineage of ybt, representing a new mechanism for ybt dispersal in Kp populations. We introduce a novel sequence-based typing approach for identifying ybt and clb variants, to aid the identification of emerging pathogenic lineages and the convergence of antibiotic resistance and hypervirulence.

SIGNIFICANCE
Klebsiella pneumoniae infections are increasingly difficult to treat with antibiotics. Some K. pneumoniae carry extra genes that allow them to synthesise yersiniabactin, an iron-scavenging molecule, which enhances their ability to cause disease. These genes are located on a genetic element that can easily transfer between strains. Here, we screened 2499 K. pneumoniae genome sequences and found substantial diversity in the yersiniabactin genes and the associated genetic elements, including a novel mechanism of transfer, and detected hundreds of distinct yersiniabactin acquisition events between K. pneumoniae strains. We also developed tools to identify and type yersiniabactin genes, to help track the evolution and spread of yersiniabactin in global K. pneumoniae populations and to monitor for acquisition of yersiniabactin in antibiotic-resistant strains.


INTRODUCTION
Multidrug-resistant Klebsiella pneumoniae (Kp) is one of the leading causes of healthcare-associated (HA) infections worldwide and poses significant treatment challenges. The production of carbapenemases and extended-spectrum beta-lactamases (ESBLs) is particularly problematic, and is often associated with clones such as sequence type (ST) 258 or ST15 that cause hospital outbreaks (1-3). The prevalence of community-acquired (CA) invasive infections is also rising. These infections are caused by Kp clones with enhanced pathogenicity, such as ST23 (4-6), which are typically characterised by mobile genetic elements (MGEs) encoding siderophores, the genotoxin colibactin and the rmpA gene, which contributes to the 'hypermucoid' phenotype by upregulating capsule production (4,7). CA Kp infections are also often associated with capsular serotypes that display greater serum resistance (K1, K2, K5); these serotypes are encoded by loci that can be transferred between Kp lineages via homologous recombination (8,9).
Siderophore systems, composed of iron-chelating molecules and their associated receptors, are common in Kp and other bacteria (10,11). They are considered integral to virulence because they allow bacteria to scavenge iron, which is essential for growth, from host transport proteins, thereby enhancing the ability to survive and replicate within the host (12). Nearly all Kp produce the siderophore enterobactin. However, its uptake is inhibited by human lipocalin-2 (Lcn2), which binds ferric and aferric enterobactin with high affinity (13) and induces an inflammatory response upon binding (14). The next most common siderophore in Kp is yersiniabactin, which evades Lcn2 binding and also has iron-independent effects on virulence (14-16).
Yersiniabactin synthesis is encoded by the ybt locus. It was first described in the Yersinia 'high pathogenicity island', variants of which have since been identified in several Enterobacteriaceae species (17). In Kp, ybt is found on an integrative and conjugative element (ICE) that has been characterised in a few completely sequenced genomes (18-20). The ICE is self-transmissible, involving excision, formation of an extrachromosomal circular intermediate (requiring int and 17 bp direct repeats at the outer ends), mobilization to recipient cells (requiring virB1, mobB and oriT) and integration at attO sites present in any of four closely-located copies of tRNA-Asn in the Kp chromosome (18,21). The ICE sometimes includes loci for the synthesis of the siderophore salmochelin (iro) or the genotoxin colibactin (clb) (18,19,22). Colibactin has been shown to induce double strand breaks in human intestinal cells, and has been linked to colorectal cancer (23).
The mobility of the major virulence determinant ybt is highly concerning, as it could theoretically be acquired by any Kp lineage, including those which are multidrug-resistant, leading to the emergence of new problematic clones. Yet detailed studies investigating the evolution, diversity and distribution of ybt in Kp have so far been limited. In this study, we address this gap by analysing 2499 Kp genomes to investigate the evolution of the ICEs (referred to hereafter as ICEKp) responsible for ybt and clb mobilisation, and we develop an approach for easily detecting and tracking these variants from genome data.

Bacterial genome sequences.
We analysed a total of 2499 Kp genomes (2285 Kp sensu stricto, 63 K. quasipneumoniae, 146 K. variicola, 5 undefined or hybrid (4,7)) obtained from various sources representing a diverse geographical and clonal distribution ( Table 1; see Supplementary Table 1 for full list of isolates and their properties). Where available, Illumina short reads were analysed directly and assembled using SPAdes v3.6.1, storing the assembly graphs for further analysis of genetic context. Where reads were unavailable (n=921), publicly available pre-assembled contigs were used. These had been generated using various strategies and assembly graphs were not available for inspection.
An isolate from our collection (strain INF167, isolated from a patient at the Alfred Hospital, Melbourne, Australia, in 2013) was subjected to further sequencing using a MinION Mk1B and R9 Mk1 flow cell (Oxford Nanopore Technologies). A 2D MinION library was generated from 1.5 µg purified genomic DNA using the Nanopore Sequencing Kit (SQK-NSK007). DNA was repaired (NEBNext FFPE RepairMix), prepared for ligation (NEBNext Ultra II End-Repair/dA-tailing Module) and ligated with adapters (NEB Blunt/TA Ligase Master Mix). We sequenced the library for 48 hours, yielding 3862 reads (mean length 3049 bp, maximum 44026 bp) that were used to scaffold the SPAdes assembly graph using a novel hybrid assembly algorithm (http://github.com/rrwick/Unicycler). The resulting assembly included one circular plasmid, which was annotated using Prokka (35) and submitted to GenBank under accession TBA.

Multi-locus sequence typing (MLST) analysis
Genomes were assigned chromosomal sequence types by comparison to the Kp MLST scheme (36) in the Kp BIGSdb database (http://bigsdb.pasteur.fr/klebsiella/klebsiella.html) (24), using SRST2 to analyse reads (37) and BLAST+ to analyse assemblies. Alleles of genes belonging to the yersiniabactin (ybtS, ybtX, ybtQ, ybtP, ybtA, irp2, irp1, ybtU, ybtT, ybtE, fyuA) and colibactin (clbABCDEFGHIJKLMNOPQR) loci were determined by comparison to known alleles in the Kp BIGSdb database. Genomes were excluded from comparative analyses if at least one yersiniabactin allele could not be accurately determined due to data quality. Novel MLST schemes (38) were constructed for the yersiniabactin and colibactin loci, so that each observed combination of alleles was assigned a unique yersiniabactin sequence type (YbST, listed in Supplementary …).

Phylogenetic analysis of the siderophore loci and K. pneumoniae chromosome.
For each YbST, concatenated alignments of the corresponding allele sequences were produced using MUSCLE v3.8.31. Recombination events were identified and removed from the alignment using Gubbins v2.0.0 (39) and visualised using Phandango (https://github.com/jameshadfield/phandango/). Maximum likelihood (ML) trees were inferred from the post-Gubbins alignment by running RAxML v7.7.2 (40) five times with the generalised time-reversible (GTR) model and a Gamma distribution, selecting the final tree with the highest likelihood. The same approach was used to generate a colibactin ML tree.
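As an illustration of how unique allele combinations map to YbSTs, the following is a minimal Python sketch, not the BIGSdb-Kp or Kleborate implementation. It assumes integer allele calls per gene, uses None for an allele that could not be determined (such genomes are excluded, as described above), and numbers novel profiles sequentially in order of first observation; the class name `StRegistry` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# The eleven ybt locus genes, in the order listed in the Methods.
YBT_GENES = ("ybtS", "ybtX", "ybtQ", "ybtP", "ybtA", "irp2",
             "irp1", "ybtU", "ybtT", "ybtE", "fyuA")


class StRegistry:
    """Hypothetical in-memory registry: one sequence type per allele profile."""

    def __init__(self, genes: Tuple[str, ...] = YBT_GENES) -> None:
        self.genes = genes
        self._profiles: Dict[Tuple[int, ...], int] = {}

    def assign(self, calls: Dict[str, Optional[int]]) -> Optional[int]:
        """Return the ST for one genome's allele calls.

        Returns None if any gene lacks a determined allele, so that the
        genome can be excluded from comparative analyses.
        """
        alleles = []
        for gene in self.genes:
            allele = calls.get(gene)
            if allele is None:
                return None
            alleles.append(allele)
        profile = tuple(alleles)
        if profile not in self._profiles:
            # Novel combination of alleles: assign the next unused ST number.
            self._profiles[profile] = len(self._profiles) + 1
        return self._profiles[profile]


if __name__ == "__main__":
    registry = StRegistry()
    genome_a = dict.fromkeys(YBT_GENES, 1)   # hypothetical allele calls
    genome_b = dict(genome_a, irp2=5)        # differs at a single locus
    genome_c = dict(genome_a, fyuA=None)     # one allele undetermined
    print(registry.assign(genome_a), registry.assign(genome_b),
          registry.assign(genome_a), registry.assign(genome_c))
    # prints: 1 2 1 None
```

Sequential numbering is only a convenience here; a shared public scheme must keep ST numbers stable across users, which is why curated databases such as BIGSdb-Kp hold the authoritative definitions.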

Chromosomal insertion sites and ICE structures.
For each ybt-positive (ybt+) genome, the annotated assembly was manually inspected to determine which of the four tRNA-Asn sites was occupied by ICEKp. This was done with reference to the MGH78578 genome, which lacks any genomic islands at tRNA-Asn sites. The Artemis genome viewer was used to inspect the annotation of the region; BLAST+ was used for genome comparison; and when the region failed to assemble into a single contig, Bandage (41) was used to inspect the locus in the assembly graph where available. Once the insertion site was determined, the structure of the ICEKp was inferred by extracting the sequence between the flanking direct 17 bp repeats 'CCAGTCAGAGGAGCCAA', either directly from the contigs using Artemis or from the assembly graph using Bandage. Representative sequences for each ICEKp structure were annotated and deposited in GenBank (accessions TBA) and are included in the Kleborate repository (https://github.com/katholt/Kleborate).
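The repeat-based extraction step above can be sketched in Python as below. This is an illustrative assumption rather than the procedure actually used (which relied on manual inspection in Artemis and Bandage): it requires exact matches to the 17 bp repeat on a single contig, searches both strands, and returns the sequence between the outer-most pair of repeats, excluding the repeats themselves.

```python
from typing import List, Optional

# 17 bp direct repeat quoted in the Methods.
DIRECT_REPEAT = "CCAGTCAGAGGAGCCAA"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq: str, repeat: str = DIRECT_REPEAT) -> List[int]:
    """Start positions of all exact matches to the repeat on this strand."""
    hits: List[int] = []
    pos = seq.find(repeat)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(repeat, pos + 1)
    return hits


def extract_ice(contig: str, repeat: str = DIRECT_REPEAT) -> Optional[str]:
    """Sequence between the outer-most pair of direct repeats, or None.

    The contig is searched as given and then as its reverse complement;
    None is returned if neither strand carries two exact repeat copies.
    """
    forward = contig.upper()
    for strand in (forward, reverse_complement(forward)):
        hits = find_repeats(strand, repeat)
        if len(hits) >= 2:
            return strand[hits[0] + len(repeat):hits[-1]]
    return None


if __name__ == "__main__":
    toy_contig = "AAA" + DIRECT_REPEAT + "GATTACA" + DIRECT_REPEAT + "TTT"
    print(extract_ice(toy_contig))                      # GATTACA
    print(extract_ice(reverse_complement(toy_contig)))  # GATTACA
```

Real assemblies often split an ICE across several contigs, which is why the assembly graph had to be consulted; this sketch only covers the single-contig case.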

Diversity of yersiniabactin genes in K. pneumoniae
A screen of 2499 genomes detected the ybt locus in 39.5% of Kp genomes, but in only 2/146 K. variicola and 0/63 K. quasipneumoniae genomes. Prevalence was 40.0% in CG258, 87.8% in the hypervirulent CG23, and 32.2% in the wider Kp population. Source information was available for 1341 human isolates and demonstrated a strong, statistically significant association between ybt and infection isolates (Table 2), particularly those from invasive infections (OR=33.4 for liver abscess, OR=5.6 for blood isolates), but also those from respiratory (OR=3.4), urinary tract (OR=3.2) and wound infections (OR=3.3).
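For readers unfamiliar with the odds ratios reported here, the sketch below shows how an OR and a Woolf (log-based) 95% confidence interval are computed from a 2×2 table of ybt status by isolate source. The text does not state the statistical test or comparator group used, so the choice of reference group, the Woolf interval and the 0.5 continuity correction for zero cells are assumptions, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio(a: float, b: float, c: float, d: float,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf confidence interval.

    a: ybt+ isolates from the infection type of interest
    b: ybt- isolates from the infection type of interest
    c: ybt+ isolates from the reference group (assumed, e.g. carriage)
    d: ybt- isolates from the reference group
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so that the log OR is defined.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from Table 2.
    est, lo, hi = odds_ratio(30, 10, 20, 60)
    print(f"OR={est:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```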
Next we explored the diversity of the eleven ybt locus genes using phylogenetic and MLST analyses. YbSTs, defined by unique combinations of ybt gene alleles, were successfully assigned to 834 ybt+ isolates (Supplementary Table 1). A total of 329 distinct YbSTs were identified; Figure 1 shows their phylogenetic relationships (excluding a small number of recombination events, Fig. S1). The majority of YbSTs clustered into 17 lineages (referred to hereafter as ybt 1, ybt 2, etc.). Within lineages, nucleotide divergence ranged from 0.004% to 0.457% and YbSTs shared a mean of eight loci; between lineages, divergence ranged from 0.032% to 1.127% and the mean number of shared loci was zero (Fig. S2).
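The two measures used to compare YbSTs (nucleotide divergence of the concatenated ybt alleles and the number of shared loci) can be illustrated with the Python sketch below. The text does not specify how divergence was calculated; here it is assumed to be an uncorrected p-distance over aligned, gap-free columns, and the example sequences and allele profiles are hypothetical.

```python
from typing import Sequence


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites, skipping gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def shared_loci(profile1: Sequence[int], profile2: Sequence[int]) -> int:
    """Number of loci at which two allele profiles carry the same allele."""
    if len(profile1) != len(profile2):
        raise ValueError("profiles must cover the same loci")
    return sum(a == b for a, b in zip(profile1, profile2))


if __name__ == "__main__":
    # Hypothetical concatenated alignment and 11-locus allele profiles.
    d = p_distance("ACGTACGTAC", "ACGTACGTAA")
    print(f"{100 * d:.3f}% divergence")   # 10.000% divergence
    print(shared_loci((1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11),
                      (1, 2, 3, 4, 5, 6, 7, 8, 12, 13, 14)))  # 8
```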

ICEKp structures and insertion sites
With the exception of ybt 4 (see below), the ybt locus was predominantly located within an ICEKp structure integrated into a chromosomal tRNA-Asn site. The four tRNA-Asn genes that serve as integration sites are located within a single chromosomal region, which is 16.4 kbp in size in strains that lack any MGE insertions at these sites (Fig. 2). ICEKp integration was observed at all four sites (Fig. 1). Most ybt lineages were found at multiple integration sites (Fig. 1); thus, we found no evidence that distinct ICEKps preferentially integrate at specific sites. The frequency of ICEKp integration differed substantially by site: 35.7%, 44.7% and 19.5% for sites 1, 3 and 4, respectively, with just one integration at site 2.
The boundaries of each ICEKp were identified by the 17 bp direct repeats formed upon integration. Each ICEKp structure includes (i) a P4-like integrase gene, int, at the left end; (ii) the 29 kbp ybt locus; (iii) a 14 kbp sequence encoding the oriT transfer origin, a virB-type type IV secretion system (T4SS) and the MobBC proteins (responsible for mobilisation) (18); and (iv) a distinct cluster of genes at the right end, which we used to classify the ICE into 14 distinct structures (see Table 4); the main exception was ICEKp10, which carries a clb insertion and was associated with three ybt lineages (see below). BLAST searches of NCBI databases for each ICEKp structure detected only four structures occurring outside Kp (ICEKp3 and ICEKp4 in E. coli, ICEKp5 in Enterobacter hormaechei, and ICEKp10 in Citrobacter koseri and Enterobacter aerogenes), and none outside the Enterobacteriaceae.
A ~34 kbp Zn2+ and Mn2+ metabolism module (KpZM) was found upstream of six different ICEKp structures (most of ICEKp10, 11 and 12, and a small subset of ICEKp2, 4 and 5; see Fig. 3). The KpZM module encodes a P4-like integrase at its left end that shares 97.5% amino acid identity with that of ICEKp, and the same 17 bp direct repeat was found upstream of both integrases and downstream of ICEKp. It is therefore likely that the entire sequence between the outer-most direct repeats (grey bars in Fig. 3), including the KpZM module, ybt locus and variable region, can be mobilised together as a single ICE; we refer to these structures as, e.g., ICEKp2-KpZM to distinguish them from the forms that lack KpZM. Notably, the KpZM ICEs clustered together in the ybt sequence tree (ybt 1, ybt 12-13 and ybt 15-17; Fig. 1), suggesting that KpZM was acquired by the ancestor of each of these three clusters, the latter two of which subsequently diversified into multiple ICEKp structures by acquiring distinct gene modules at their right ends.
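Because KpZM-carrying ICEs contain three copies of the direct repeat, the region can be read either as two adjacent units (KpZM and ICEKp) or as one larger ICE bounded by the outer-most repeats. A hypothetical Python sketch of this segmentation follows; it assumes exact, forward-strand repeat matches in a single contiguous sequence, and the placeholder sequences are not real ICE content.

```python
from typing import List, Optional, Tuple

DIRECT_REPEAT = "CCAGTCAGAGGAGCCAA"  # 17 bp repeat described in the Methods


def repeat_positions(region: str, repeat: str = DIRECT_REPEAT) -> List[int]:
    """Start positions of exact forward-strand matches to the repeat."""
    hits: List[int] = []
    pos = region.find(repeat)
    while pos != -1:
        hits.append(pos)
        pos = region.find(repeat, pos + 1)
    return hits


def split_at_repeats(region: str, repeat: str = DIRECT_REPEAT
                     ) -> Tuple[List[str], Optional[str]]:
    """Return (units between consecutive repeats, span between outer-most repeats)."""
    hits = repeat_positions(region, repeat)
    units = [region[start + len(repeat):end]
             for start, end in zip(hits, hits[1:])]
    outer = region[hits[0] + len(repeat):hits[-1]] if len(hits) >= 2 else None
    return units, outer


if __name__ == "__main__":
    kpzm, icekp = "AAAATTTT", "GGGGCCCC"  # placeholder module sequences
    region = DIRECT_REPEAT + kpzm + DIRECT_REPEAT + icekp + DIRECT_REPEAT
    units, outer = split_at_repeats(region)
    print(units)                                  # ['AAAATTTT', 'GGGGCCCC']
    print(outer == kpzm + DIRECT_REPEAT + icekp)  # True
```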
Two ICEKp structures carried additional known Kp virulence factors. ICEKp1, first described in ST23 strain NTUH-K2044 (18,42), carried ybt 2 genes and was one of only two ICE structures with an additional gene cluster inserted between the ybt and mobilisation genes (see Fig. 3). As previously reported, this ~18 kbp insertion is homologous to a region of Kp plasmid pLVPK encoding iro as well as rmpA (which upregulates capsule production and is associated with the hypermucoid phenotype) and other virulence determinants (18). ICEKp10 is characterised by the ~51 kbp colibactin (clb) module at its right end and is associated with three distinct ybt lineages (1, 12 and 17; see Fig. 1 and the colibactin section below). The ICEKp10 structure corresponds to the genomic island described in ST23 strain 1084 as GM1-GM3 of genomic island KPHPI208, and in ST66 strain Kp52.145 as an ICE-Kp1-like region (19,20).

Plasmid-encoded yersiniabactin
No chromosomal insertion site could be identified in genomes carrying ybt 4. Inspection of the de novo assemblies of these genomes revealed that in all cases, the contigs containing the ybt locus also harboured common Kp plasmid sequences, including the FIBK repA, FIIK repB and/or FIA repE replication genes and the sopAB partitioning genes. It was not possible to resolve complete circular plasmid sequences from the short-read assemblies; however, inspection of the assembly graphs showed that the ybt 4-encoding contigs were disconnected from the chromosomal contigs, consistent with a plasmid location. To confirm this, we subjected one of the isolates (ST2370 strain INF167) to long-read sequencing on a MinION (Oxford Nanopore) device and resolved the complete sequence of a 165 kbp circular FIBK plasmid carrying ybt 4. Annotation of the complete ybt+ plasmid and of the ybt+ contigs from the remaining isolates did not reveal any other genes with identifiable virulence or antimicrobial resistance (AMR)-related functions.
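The assembly-graph reasoning above (a ybt contig that is not connected to any chromosomal contig but shares a graph component with plasmid replicon genes) can be expressed as a simple connected-component check. The Python sketch below is a hypothetical illustration: the contig names, the edge list and the decision rule are assumptions, and in this study the inspection was done visually in Bandage.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from (contig, contig) links."""
    graph: Dict[str, Set[str]] = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def component(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All contigs reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def looks_plasmid_borne(graph: Dict[str, Set[str]], ybt_contig: str,
                        chromosomal: Set[str],
                        replicon_contigs: Set[str]) -> bool:
    """True if the ybt contig's component has replicon genes but no chromosome."""
    reachable = component(graph, ybt_contig)
    return not (reachable & chromosomal) and bool(reachable & replicon_contigs)


if __name__ == "__main__":
    # Hypothetical graph: the ybt contig links only to a repA-carrying contig.
    g = build_graph([("ybt", "repA"), ("chr1", "chr2")])
    print(looks_plasmid_borne(g, "ybt", {"chr1", "chr2"}, {"repA"}))  # True
```

A disconnected component is consistent with, but not proof of, a plasmid location (unresolved repeats can also break graph connections), which is why long-read confirmation was sought.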
These results indicate that ybt 4 is typically plasmid-encoded in Kp, providing an alternative transfer mechanism between different Kp hosts. The ybt 4 sequences were distinct from those of other ybt lineages found in Kp (>0.28% nucleotide divergence; maximum 1 shared allele) (Fig. 1) and shared closer sequence identity with ybt genes found in Yersinia species (0.01% nucleotide divergence). The ICEKp integrase and mobilisation genes were absent from the ybt+ plasmids, and additional complete plasmid sequences will be required to resolve the mechanisms by which

Colibactin diversity
Three distinct ybt lineages (1, 12 and 17) were associated with the clb-positive ICEKp10 (Fig. 1), which was detected in 40% of ST258, 77% of ST23 and 4.0% of other Kp genomes, spanning 25 other STs. Notably, all but three of the ICEKp10 strains carried the KpZM module at the left end, suggesting that the clb locus is usually mobilised within the larger ICEKp10-KpZM. Sixty-five CbSTs were identified, comparable to the number of YbSTs (n=86) detected in ICEKp10 (Supplementary Table 3). The clbJ and clbK genes were excluded from this analysis because of a common 4173 bp deletion, which creates a new open reading frame fusing the 5' end of clbJ with the 3' end of clbK (Fig. 4A, Fig. S3). Phylogenetic analysis of the clb locus revealed three lineages, each associated with a different ybt lineage: clb 1 (ybt 12), clb 2A (ybt 1) and clb 2B (ybt 17) (Fig. 4B). The only exceptions were three isolates with clb 2B that had rare YbSTs not assigned to any lineage: ST258 strain UCI91 and ST48 strains WGLW1 and WGLW3. Two isolates carried clb but not ybt (both ST23). Their clb loci clustered with those of the other ST23 ICEKp10 ybt+clb+ isolates and shared the same ICEKp10 integration site, suggesting a shared ancestral integration event in ST23 followed by subsequent loss of ybt. The clbJ/clbK deletion was detected sporadically in all clb lineages, suggesting that it has arisen on multiple independent occasions and thus may be under positive selection (Fig. 4B).
The clbJ and clbK genes encode multi-domain proteins of 2166 and 2154 amino acids, respectively, whose functions are not yet characterised (Fig. S3). The deletion appears to be mediated by recombination between two copies of a 1480 bp stretch of homologous sequence that occurs with ~95% identity within the clbJ and clbK genes and encodes an amino acid adenylation domain (A-domain), a frequent component of multi-domain non-ribosomal peptide synthetases. The fusion product created by the clbJ/clbK deletion is a 2440 amino acid protein (Fig. S3C) that could potentially be functional; however, its effect on colibactin synthesis is not yet known.
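The proposed deletion mechanism, recombination between two directly repeated homologous sequences, removes one repeat copy together with the DNA between the copies. The toy Python model below illustrates this under the simplifying assumption that the two copies are identical (the clbJ and clbK copies are only ~95% identical) and uses made-up sequences; it is not a model of the actual clb locus.

```python
def delete_between_direct_repeats(seq: str, repeat: str) -> str:
    """Product of recombination between the first two direct copies of `repeat`.

    One copy of the repeat is retained; everything from the start of the
    first copy up to the start of the second copy is lost.
    """
    first = seq.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("a second, non-overlapping copy is required")
    return seq[:first] + seq[second:]


if __name__ == "__main__":
    repeat = "GCGC"  # stands in for the shared A-domain sequence
    gene_j_5prime, spacer, gene_k_3prime = "ATGAAA", "TTTTT", "CCCTAA"
    locus = gene_j_5prime + repeat + spacer + repeat + gene_k_3prime
    fused = delete_between_direct_repeats(locus, repeat)
    print(fused)                    # ATGAAAGCGCCCCTAA
    print(len(locus) - len(fused))  # 9 (one repeat copy plus the spacer)
```

In this model the fused product starts with the 5' part of the first gene and ends with the 3' part of the second, mirroring the clbJ/clbK fusion described above.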

Frequency of yersiniabactin acquisition in K. pneumoniae
We identified at least 206 unique combinations of ICEKp structure, insertion site and chromosomal ST, each representing a distinct ybt acquisition event. Twenty-six chromosomal STs showed evidence of multiple insertion sites and/or ICEKp structures, indicating multiple independent acquisitions of ICEKp within the evolutionary history of these clones (Fig. 5). Most acquisition events (65%) were identified in only a single genome sequence. The more genomes observed per Kp ST, the higher the frequency of ybt carriage and the number of unique ybt acquisitions per ST, suggesting that deeper sampling would continue to uncover further acquisitions (Fig. 5). Notably, of the 35 clonal groups represented by ≥10 genomes, 30 (86%) included at least one ybt acquisition (Fig. 5). Further, the five remaining clonal groups each consisted mostly of isolates from a localised hospital cluster (ST323, Melbourne; ST490, Oxford; ST512, Italy; ST681, Melbourne; ST874, Cambridge), so they do not represent diverse sampling. Of the acquisition events detected in more than one genome, 68% (n=50/73) showed diversity in the YbST, consistent with clonal expansion of ybt-positive Kp strains and diversification of the ybt locus in situ. The greatest YbST diversity within such groups was observed in the hypervirulent clones ST23 (18 YbSTs of ICEKp10/ybt 1 at site 1), ST86 (12 YbSTs of ICEKp3 at site 3) and ST67 K. rhinoscleromatis (5 YbSTs at site 1), followed by the hospital outbreak-associated MDR clones ST15 (six YbSTs of ICEKp4 at site 1 and five at site 3), ST45 (five YbSTs of ICEKp4), ST101 (five YbSTs of ICEKp3 at site 3) and ST258 (detailed below). This level of diversity suggests long-term maintenance of the ICEKp in the genome, allowing time for the ybt genes to diversify.
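The definition of an acquisition event used above (a unique combination of ICEKp structure, insertion site and chromosomal ST) and the derived summaries can be sketched in Python as follows. The record fields and example isolates are hypothetical, and this simple grouping shares the caveat implicit in "at least 206": repeated acquisitions of the same ICEKp at the same site within one ST are merged, so the count is a minimum.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Isolate(NamedTuple):
    st: str       # chromosomal sequence type
    ice: str      # ICEKp structure
    site: int     # tRNA-Asn insertion site (1-4)
    ybst: int     # yersiniabactin sequence type


def summarise_acquisitions(isolates: Iterable[Isolate]) -> Dict[str, object]:
    """Group ybt+ isolates into acquisition events and summarise them."""
    events: Dict[Tuple[str, int, str], List[int]] = defaultdict(list)
    for iso in isolates:
        events[(iso.ice, iso.site, iso.st)].append(iso.ybst)

    events_per_st = Counter(st for (_ice, _site, st) in events)
    multi_genome = [ybsts for ybsts in events.values() if len(ybsts) > 1]
    return {
        "events": len(events),
        "sts_with_multiple_events": sorted(
            st for st, n in events_per_st.items() if n > 1),
        "single_genome_events": sum(len(v) == 1 for v in events.values()),
        "multi_genome_events": len(multi_genome),
        "multi_genome_with_ybst_diversity": sum(
            len(set(v)) > 1 for v in multi_genome),
    }


if __name__ == "__main__":
    demo = [  # hypothetical isolates
        Isolate("ST23", "ICEKp10", 1, 1),
        Isolate("ST23", "ICEKp10", 1, 2),
        Isolate("ST23", "ICEKp1", 3, 5),
        Isolate("ST15", "ICEKp4", 1, 7),
        Isolate("ST15", "ICEKp4", 1, 7),
    ]
    print(summarise_acquisitions(demo))
```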
Given the clinical significance of the carbapenemase-associated CG258, we explored ICEKp acquisition in these genomes in greater detail. The ybt locus was detected in 269 CG258 isolates (40%) from 17 countries; 218 of these also carried clb (nearly all from the USA; see Supplementary Table 1). Fifty-eight YbSTs were identified amongst CG258 isolates; these clustered into seven ybt lineages associated with six ICEKp structures. Comparison of ybt lineage, ICEKp structure and insertion site with a recombination-filtered core genome phylogeny for CG258 indicated dozens of independent acquisitions of ICEKp sequence variants in this clonal group (Fig. 6). Near-identical clb 2B (ICEKp10/ybt 17) sequences were identified in 211 ST258 isolates collected in the USA during 2003-2014, mostly integrated at tRNA-Asn site 3. Most of these isolates carried the clbJ/clbK deletion (n=175, 83%) and transposase insertions within other clb genes (n=173), which may prevent colibactin production (Fig. 6). A total of 27 ybt+clb+ ST258 isolates had an apparently intact clb locus; two were isolated in Colombia in 2009 and the rest in the USA during 2004-2010 (Supplementary Table 1), including KPNIH33 (43).

DISCUSSION
The yersiniabactin synthesis locus ybt was detected in over a third of all human Kp isolates, which is highly concerning given its role in virulence models (15,16,44). Our data strengthen previously reported evidence that ybt is significantly associated with invasive infections in humans (7), such as liver abscess (OR=33.4, p < 2×10^-16) and bacteraemia (OR=5.6, p < 4×10^-15). The detection of significant associations with respiratory tract, urinary tract and wound infections (ORs 3.2-3.4, Table 2) indicates that even these classically opportunistic infections are more likely to occur if ybt is present.
While ybt was first identified in Yersinia spp. (45), the frequency and extensive sequence diversity of ybt and the corresponding ICEKp structures in the Kp population (Figs. 1, 3) reveal that the locus is a long-standing and well-adapted component of the Kp accessory genome. The sheer number of distinct ICEKp insertion events detected (n=214) is remarkable, and reveals that the high frequency of ybt in Kp results from highly dynamic processes. The benefits of gaining ybt are clear, as the ability to scavenge iron is essential to survival in the iron-depleted conditions commonly encountered in a wide range of environmental and host-associated niches (12). However, it appears that loss of the locus is also common, which could be due to the high energy cost of synthesising the polyketide siderophore.
The identification of FIBK plasmid-borne ybt constitutes an entirely novel mechanism for ybt transfer in Kp. The FIBK plasmid replicon is very common (found in over half of all the Kp genomes we surveyed) and appears to be highly stable in Kp (46), suggesting that these plasmids have the potential to rapidly transmit ybt within the Kp population. Indeed, the detection of ybt plasmids in 15 otherwise genetically unrelated Kp isolates from a single hospital, as well as in unrelated isolates from three other countries, shows that plasmid carriage is a significant mechanism for ybt dissemination in Kp. Worryingly, FIBK plasmids frequently carry AMR or virulence genes in Kp (47), suggesting there may be few barriers to the convergence of AMR and virulence genes in a single FIBK plasmid, which could pose a substantial public health threat and deserves careful monitoring.
The functional relevance of the genetic variation in the ybt and clb loci, as well as of the cargo genes in the variable regions of the ICEKp, remains to be explored. Experimental studies demonstrating the contribution of yersiniabactin and/or colibactin to virulence have been conducted with a limited number of strains (14,15,18-20), which we found to harbour either ICEKp1 or ICEKp10 and one of three ybt lineages (ybt 1, 2 or 12): NTUH-K2044 (ST23, ICEKp1/ybt 2), ATCC 43816/KPPR1 (ST493, ICEKp1/ybt 2), B5055/CIP 52.145 (ST66, ICEKp10/ybt 12/clb 1) and 1084 (ST23, ICEKp10/ybt 1/clb 2A). A systematic comparison of different ybt lineages, particularly the plasmid-borne lineage, in the same in vivo model might identify important differences in virulence potential. The genome collection analysed here is a convenience sample of available genome data originally generated for a variety of purposes; however, future genomic epidemiology studies of prospectively collected isolates may reveal associations between distinct ybt or ICEKp variants (identified using the tools developed here) and the extent of clinical risk.
Colibactin is known to be common in the hypervirulent liver abscess clone ST23 (5). The genotoxicity of colibactin has been experimentally demonstrated in strain 1084 (ST23, ICEKp10/ybt 1/clb 2A), and colibactin may be associated with colorectal cancer (19,23,48). The presence of clb in 31 other Kp lineages, particularly the hospital-associated clone ST258, is therefore concerning and warrants further investigation. It remains to be determined whether colibactin is effectively synthesised by these strains, particularly those carrying the common clbJ/K deletion. Notably, most clb+ ST258 strains also carried transposase insertions disrupting the clbB, clbH and/or clbO genes. These likely interrupt colibactin synthesis and may reflect selection against its production, which presumably carries a high metabolic cost for the host bacterium.
The extensive diversity uncovered amongst ybt and clb sequences and ICEKp structures in this study provides several epidemiological markers with which to track their movements in the Kp population through analysis of whole genome sequence data, which is increasingly being generated for infection control and AMR surveillance purposes (49,50). The work presented here provides a clear framework for straightforward detection, typing and interpretation of ybt and clb sequences via the YbST and CbST schemes (Figs. 1, 3), which are publicly available and can be easily interrogated using our Kleborate package (https://github.com/katholt/Kleborate), the BIGSdb-Kp resource, or common tools such as BLAST or SRST2. Application of these tools in genomic surveillance will provide much-needed insights into the emergence and spread of pathogenic Kp lineages, and the convergence of virulence and AMR in this troublesome pathogen.

ACKNOWLEDGMENTS
This work was funded by the NHMRC of Australia (project #1043822 and Fellowship #1061409 to K. E. H.).

… Supplementary Table 4 and GenBank deposited sequences for details of specific genes). (B) Disrupted ICEKp loci. (C) Gene structures for core modules, which are shown in A-B as coloured blocks: yersiniabactin synthesis locus ybt (dark grey, labelled with the most commonly associated ybt lineage if one exists), mobilisation module (blue) and Zn2+/Mn2+ module (purple = usually present, light purple = rarely present). In panels A-B, the variable gene content unique to each ICEKp structure, which is typically separated from the mobilisation module by an antirestriction protein (light grey arrow), is shown in a unique colour as per Figure 1. Grey rectangles represent direct repeats; black rectangles, P4-like integrase genes.

… Recombination events were predicted by Gubbins and are shown as coloured blocks (visualised using Phandango). Coordinates along the ybt locus and gene boundaries are indicated on the x-axis. Each row in the plotting area represents a YbST. Phylogenetic relationships between the YbSTs are shown in the tree to the left, which is a midpoint-rooted, recombination-free YbST phylogeny reproduced from Figure 1. Colours and numbers on the tree indicate ybt lineages as detailed in the text and Figure 1.

Figure S2. Minimum spanning tree of YbSTs, visualised using PhyloViz. Each node represents a YbST; connections between nodes indicate allele sharing between YbSTs. Nodes are coloured by ybt lineage (as defined in Figure 1; black indicates no lineage assigned) and labelled with ICEKp structures (as defined in Figure 2; the clb+ ICEKp10 structure, boxed, is associated with three ybt lineages).