Genetic diversity, mobilisation and spread of the yersiniabactin-encoding mobile element ICEKp in Klebsiella pneumoniae populations

Mobile genetic elements (MGEs) that frequently transfer within and between bacterial species play a critical role in bacterial evolution, and often carry key accessory genes that associate with a bacterium's ability to cause disease. MGEs carrying antimicrobial resistance (AMR) and/or virulence determinants are common in the opportunistic pathogen Klebsiella pneumoniae, which is a leading cause of highly drug-resistant infections in hospitals. Well-characterised virulence determinants in K. pneumoniae include the polyketide synthesis loci ybt and clb (also known as pks), encoding the iron-scavenging siderophore yersiniabactin and genotoxin colibactin, respectively. These loci are located within an MGE called ICEKp, which is the most common virulence-associated MGE of K. pneumoniae, providing a mechanism for these virulence factors to spread within the population. Here we apply population genomics to investigate the prevalence, evolution and mobility of ybt and clb in K. pneumoniae populations through comparative analysis of 2498 whole-genome sequences. The ybt locus was detected in 40 % of K. pneumoniae genomes, particularly amongst those associated with invasive infections. We identified 17 distinct ybt lineages and 3 clb lineages, each associated with one of 14 different structural variants of ICEKp. Comparison with the wider population of the family Enterobacteriaceae revealed occasional ICEKp acquisition by other members. The clb locus was present in 14 % of all K. pneumoniae and 38.4 % of ybt+ genomes. Hundreds of independent ICEKp integration events were detected affecting hundreds of phylogenetically distinct K. pneumoniae lineages, including at least 19 in the globally-disseminated carbapenem-resistant clone CG258. A novel plasmid-encoded form of ybt was also identified, representing a new mechanism for ybt dispersal in K. pneumoniae populations. These data indicate that MGEs carrying ybt and clb circulate freely in the K. pneumoniae population, including among multidrug-resistant strains, and should be considered a target for genomic surveillance along with AMR determinants.


INTRODUCTION
Mobile genetic elements (MGEs) including plasmids, transposons and integrative conjugative elements (ICEs) can generate significant genotypic and phenotypic variation within bacterial populations, driving the emergence of niche- or host-adapted lineages or pathotypes [1,2]. Despite the risk that acquisition of virulence-associated MGEs can pose to pathogen emergence, few studies have explored the diversity, distribution and dynamics of such MGEs within their host bacterial populations.
ICEKp is an integrative conjugative element (ICE) that mobilises the ybt locus, which encodes biosynthesis of the siderophore yersiniabactin and its receptor [3]. Yersiniabactin and other siderophore systems are considered to be key bacterial virulence factors as they provide mechanisms for scavenging iron (an essential nutrient) from host transport proteins, thereby enhancing the ability of bacteria to survive and replicate within the host [4][5][6]. Nearly all Klebsiella pneumoniae produce the siderophore enterobactin; however, its scavenging mechanisms are inhibited by human lipocalin-2 (Lcn2), which has a strong binding affinity for ferric and aferric enterobactin [7] and induces an inflammatory response upon binding [8]. Yersiniabactin escapes Lcn2 binding, thus avoiding the inflammatory response and enhancing bacterial growth and dissemination to the spleen, although it does not provide sufficient iron to allow growth in human serum or urine [8][9][10][11]. Yersiniabactin can also bind other heavy metals besides iron; for example, yersiniabactin expressed by uropathogenic Escherichia coli has been shown to bind Cu2+, providing protection against copper toxicity and redox-based phagocyte defences [12]. Yersiniabactin is by far the most common K. pneumoniae high-virulence determinant, present in roughly a third of clinical isolates, and is significantly associated with strains isolated from bacteraemia and tissue-invasive infections such as liver abscess, compared with those from non-invasive infections or asymptomatic colonisation [3,13]. In contrast, the virulence plasmid pK2044 described in strain NTUH-K2044, which encodes hypermucoidy and the acquired siderophores salmochelin (which modifies enterobactin to escape Lcn2 binding [14]) and aerobactin (which can scavenge iron from the host blood protein transferrin [15]), is present in less than 5 % of K. pneumoniae isolates sampled from infections [13].
The ybt locus was first described in the Yersinia high pathogenicity island (HPI), variants of which have since been reported in other species of the family Enterobacteriaceae [16], including K. pneumoniae, where ybt is located within ICEKp [3,17,18]. ICEKp is self-transmissible, involving excision (requiring the gene xis), formation of an extrachromosomal circular intermediate (requiring the integrase gene int and 17 bp direct repeats at both outer ends), mobilization to recipient cells (requiring virB1, mobB and oriT) and integration at attO sites present in four closely-located tRNA-Asn copies in the K. pneumoniae chromosome [3,19]. ICEKp sometimes carries additional virulence determinants, including iro (encoding salmochelin synthesis) or clb (also known as pks, encoding synthesis of the genotoxic polyketide colibactin [3,17,20], which can induce double-strand DNA breaks in eukaryotic cells [21]). Therefore, ICEKp represents a prominent virulence element that strongly influences the pathogenicity of strains of K. pneumoniae. Although it is significantly associated with invasive infections, the diversity of ICEKp structures and their transmission dynamics within the host bacterial population have not yet been characterized. Here we address this important gap in K. pneumoniae virulence evolutionary dynamics by using comparative genomics. We investigate ybt phylogenetics, ICEKp structure, integration sites and chromosomal genotypes, and use these signals to track the movement of ICEKp in K. pneumoniae populations, and explore the relationship of this MGE with its bacterial host.

RESULTS
Diversity of the ybt locus in K. pneumoniae
First we screened for ybt genes in 2498 genomes of members of the K. pneumoniae complex (data sources are listed in Table S1, available in the online version of this article), and found ybt in 39.5 % of 2289 K. pneumoniae, but only 2 out of 146 Klebsiella variicola and none out of 63 Klebsiella quasipneumoniae. Prevalence was 40.0 % in the carbapenemase-associated K. pneumoniae clonal group (CG) 258, 87.8 % in the hypervirulent K. pneumoniae CG23, and 32.2 % in the wider K. pneumoniae population. Consistent with previous reports [3,13], amongst human isolates with reliable clinical source information (Table S1), the presence of ybt was significantly associated with infection isolates [odds ratio (OR)=3.6, P<1×10⁻⁷], particularly those from invasive infections (OR=28.6 for liver abscess, OR=4.1 for blood isolates; see Table S2).
Each of the 11 ybt locus genes displayed substantial diversity within the K. pneumoniae population (see Supplementary Text; Fig. S1, Table S3). The majority of ybt loci grouped into 17 ybt phylogenetic lineages, with a mean nucleotide divergence of 0.034 % and mean 8 out of 11 shared loci within lineages, compared with a mean 0.478 % nucleotide divergence and mean 0 out of 11 shared loci between lineages. A total of 11 recombination events were identified in the ybt locus (Fig. S2). Nine appeared to involve import of divergent alleles from outside the K. pneumoniae population (1.5-8.37 % nucleotide divergence from ybt alleles detected in other K. pneumoniae) and two involved exchange of irp2 alleles within K. pneumoniae (0.40-0.41 % nucleotide divergence; see Fig. S2). We further explored the genetic diversity of ybt using phylogenetic and multi-locus sequence typing (MLST) analyses (see Methods). Ybt locus sequence types (YbSTs), defined by unique combinations of ybt gene alleles, were assigned to 842 ybt+ isolates (Table S4). A total of 329

IMPACT STATEMENT
Klebsiella pneumoniae infections are becoming increasingly difficult to treat with antibiotics. Some K. pneumoniae strains also carry extra genes that allow them to synthesise yersiniabactin, an iron-scavenging molecule, which enhances their ability to cause disease. These genes are located on a genetic element that can easily transfer between strains. Here, we screened 2498 K. pneumoniae genome sequences and found substantial diversity in the yersiniabactin genes and the associated genetic elements, including a novel mechanism of transfer, and detected hundreds of distinct yersiniabactin acquisition events between strains of K. pneumoniae. We show that these yersiniabactin mobile genetic elements are specifically adapted to the K. pneumoniae population but also occasionally acquired by other bacterial members of the family Enterobacteriaceae such as Escherichia coli. These insights into the movement and genetics of yersiniabactin genes allow tracking of the evolution and spread of yersiniabactin in global K. pneumoniae populations and monitoring for acquisition of yersiniabactin in antibiotic-resistant strains.
Analysis of protein coding sequences yielded dN/dS values below 0.6 for all genes (Table S3), consistent with moderate purifying selection. Nonsense or frameshift mutations were identified in ybt genes in 11 % of isolates carrying the ybt locus; these mostly affected irp2 (6.5 %) or irp1 (4 %) (see Table S3), which encode key structural components in yersiniabactin biosynthesis (Fig. 2a) [22,23]. Inactivation of either of these genes prevents synthesis of yersiniabactin in Yersinia enterocolitica [24], and is predicted to have the same effect in K. pneumoniae. Most of these mutations (85 %) were only observed in a single isolate, indicating that they are not conserved and potentially arose during storage or culture. Consistent with this hypothesis, the presence of inactivating mutations in ybt genes was significantly associated with historical isolates (70 % amongst ybt+ isolates stored since at least 1960, 8 % amongst ybt+ isolates stored since 2000; OR 27 [95 % CI 11-74], P<10⁻¹³ using Fisher's exact test). The notable exceptions were ST67 K. pneumoniae subspecies rhinoscleromatis genomes (conserved frameshift in irp2) and ST3 (conserved nonsense mutation in irp1), indicating negative selection in these lineages.
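The association statistics quoted above (an odds ratio with a Fisher's exact test P value) can be illustrated with a minimal pure-Python sketch. The 2x2 counts below are invented for illustration only and are not taken from the study; the function names are likewise hypothetical, and the two-sided P value uses the common "sum of tables no more probable than the observed one" convention, which may differ from the software the authors used.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value.

    Sums the hypergeometric probabilities of every table sharing the
    observed margins whose probability does not exceed that of the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-12))
    return min(1.0, total)


# Hypothetical counts: rows = historical vs recent ybt+ isolates,
# columns = inactivating mutation present vs absent.
toy_table = (7, 3, 8, 92)
toy_or = odds_ratio(*toy_table)
toy_p = fisher_exact_two_sided(*toy_table)
print(f"OR = {toy_or:.1f}, P = {toy_p:.2e}")
```

With real data, the four cells would be filled from the isolate metadata (for example, storage date and presence of a nonsense or frameshift mutation in any ybt gene).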
Diversity of ICEKp structures and integration sites in K. pneumoniae
With the exception of ybt 4 (which we found to be plasmid-borne, representing a novel mechanism of ybt transfer in K. pneumoniae; see Supplementary Text), all ybt loci detected in the K. pneumoniae genomes were located within an ICEKp structure integrated into one of four copies of tRNA-Asn located in a chromosomal region that spans 16.4 kbp in size in strains lacking MGE insertions at these sites (Fig. S5). Examples of ICEKp integration were observed at all four tRNA-Asn sites (Fig. 1), but the frequency of integration differed substantially by site: 35.7, 44.7 and 19.5 % for sites 1, 3 and 4 respectively, and just one integration at site 2. Multiple ICEKp integration sites were observed for most ybt lineages (Fig. 1); thus, there is no evidence that ICEKp variants target specific tRNA-Asn copies.
The boundaries of each ICEKp variant were identified by the 17 bp direct repeats formed upon integration [3], and their structures were compared. This confirmed that ICEKp structures identified in K. pneumoniae share several features: (i) a P4-like integrase gene, int, at the left end; (ii) the 29 kbp ybt locus; (iii) a 14 kbp sequence encoding the xis excisionase, virB-type 4 secretion system (T4SS), oriT transfer origin and mobBC proteins (responsible for mobilisation) [3]. In addition, we found that each ICEKp carried a distinct cluster of cargo genes at its right end, which we used to classify the ICEs into 14 distinct structures which we labelled as ICEKp2, ICEKp3, etc., preserving the original nomenclature of ICEKp1 [3,25] (see Fig. 2b; Table S6). We detected occasional additional gene content variation between ICEKp sequences, arising from transposases and other insertion or deletion events. Most of the 14 ICEKp structures were uniquely associated with a monophyletic group in the ybt nucleotide-based phylogeny (i.e. a single ybt lineage; see Figs 1 and 2; Table S6); the exception was the ICEKp10 structure, which carries the clb locus in the cargo region and was associated with ybt lineages 1, 12 and 17 (details below).
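As a rough illustration of how element boundaries can be delimited by flanking direct repeats, the sketch below scans a sequence for exact 17-mers that occur more than once on the same strand and reports the span bounded by a given repeat. The repeat and sequence used here are made-up toy strings, not the actual ICEKp repeat, and the function names are hypothetical; a real analysis would also need to tolerate mismatches and filter out low-complexity repeats.

```python
from collections import defaultdict


def find_direct_repeats(seq, k=17):
    """Return {kmer: [0-based start positions]} for k-mers seen at least twice."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def element_span(seq, repeat):
    """Return (start, end) of the region bounded by the first and last copy
    of `repeat`, including both copies (end is exclusive); None if the
    repeat occurs fewer than two times."""
    seq, repeat = seq.upper(), repeat.upper()
    first = seq.find(repeat)
    last = seq.rfind(repeat)
    if first == -1 or first == last:
        return None
    return first, last + len(repeat)


# Toy data: an invented 17 bp repeat flanking an invented 10 bp "element".
toy_repeat = "ACGTTGCAACGTTAGCA"
toy_seq = "GG" + toy_repeat + "TTTTTCCCCC" + toy_repeat + "AA"

repeats = find_direct_repeats(toy_seq, k=17)
span = element_span(toy_seq, toy_repeat)
print(repeats, span)
```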
All ICEKp carried the integrase gene int; however we identified two variant forms of ICEKp lacking the mobilisation genes. First, K. pneumoniae subspecies rhinoscleromatis (ST67) genomes carried ybt 11 but lacked the entire mobilisation module (Fig. 2c); as noted above, they also carried nonsense mutations in irp2. Second, strain NCTC 11697 carried a highly divergent ybt locus (>2 % nucleotide divergence from all other ybt sequences, marked with ** in Fig. 1) and lacked the virB-T4SS and xis genes (Fig. 2c). An approximately 34 kbp Zn2+ and Mn2+ metabolism module (KpZM) was identified upstream of six different ICEKp structures (Fig. 2). This module includes an integrase at the left end that shares 97.5 % amino acid identity with that of ICEKp, and the same 17 bp direct repeat was found upstream of both integrases and downstream of ICEKp. It is therefore likely that the entire sequence between the outermost direct repeats (grey bars in Fig. 2b), including the KpZM module, ybt locus, mobility and cargo regions, can be mobilised together as a single MGE (see Supplementary Text).
[Figure caption fragment: mobilisation module (blue) and Zn2+/Mn2+ module (purple, usually present; light purple, rarely present). In a and b, the variable gene content unique to each ICEKp structure, which is typically separated from the mobilisation module by an antirestriction protein (light grey arrow), is shown in a unique colour as per Fig. 1. Grey rectangles represent direct repeats; black rectangles, P4-like integrase genes. Genes that make up the ybt locus are also shown and shaded according to their overall role in yersiniabactin synthesis (further details provided in Table S3). (b) Genetic structures of apparently intact ICEKp variants (see Table S6 and sequences deposited in GenBank for details of specific genes). (c) Disrupted ICEKp loci.]
ICEKp1 was the first yersiniabactin ICE reported in K. pneumoniae [3,25] and carries an 18 kbp insertion between the ybt and mobilisation genes, which our comparative analyses showed to be quite atypical (see Fig. 2b). As previously reported, the inserted sequence encodes iro and rmpA (which upregulates capsule production and is associated with hypermucoid phenotype) and is homologous to a region on the virulence plasmid pLVPK [3]. The only other ICEKp structure in which we identified known K. pneumoniae virulence determinants was ICEKp10, whose cargo region harbours the approximately 51 kbp colibactin (clb) locus. The ICEKp10 structure corresponds to the genomic island described in ST23 strain 1084 as GM1-GM3 of genomic island KPHPI208, and in ST66 strain Kp52.145 as an ICE-Kp1-like region [17,18]. We detected ICEKp10 in 40 % of ST258 (31 % of CG258), 77 % of ST23 (61 % of CG23) and 4.0 % of other K. pneumoniae genomes including 25 other STs (total 13.85 and 38.43 % of all, and ybt+, K. pneumoniae genomes respectively). Notably, all but three of the ICEKp10 strains carried the KpZM module at the left end, indicating that the clb locus (Table S7) is usually mobilized within a larger structure including KpZM. MLST analysis of clb genes identified 65 CbSTs (Table S8), similar to the number of YbSTs detected in ICEKp10 (n=86). Phylogenetic analysis of the clb locus (excluding clbJ and clbK (due to a 4153 bp deletion that commonly spans the two genes; see
Transmission of ICEKp in the K. pneumoniae host population
We identified n=206 unique combinations of ICEKp structure, chromosomal ST and integration site. These probably represent distinct ybt acquisition events; however, it is possible that the ICEKp could migrate to another insertion site following initial integration; hence a more conservative estimate for distinct ybt acquisition events is the number of unique combinations of ICEKp structure and chromosomal ST, n=189 (ST phylogenetic relationships are given in Fig. S7). The most widely distributed variant was ICEKp4 (found in 37 chromosomal STs) followed by ICEKp10, ICEKp5 and ICEKp3 (n=24, 23 and 23 chromosomal STs, respectively). Conversely, ICEKp7, ICEKp8 and ICEKp13 were found in one K. pneumoniae host strain each (ST111, site 1, n=2; ST37, site 4, n=7; ST1393, site 1, n=1; respectively); ICEKp14 was found only in K. variicola (ST1986, site 4, n=1).
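The two estimates of distinct acquisition events described above amount to counting unique tuples per genome. A minimal sketch of that counting logic follows, assuming each genome has been reduced to a record of ICEKp structure, chromosomal ST and tRNA-Asn integration site; the records and field names are invented for illustration and do not reproduce the study data.

```python
def count_acquisition_events(records):
    """Return (upper, conservative) estimates of distinct acquisition events.

    upper        = unique (ICEKp structure, ST, integration site) combinations
    conservative = unique (ICEKp structure, ST) combinations, which allows for
                   an element moving between sites after its first integration
    """
    triples = {(r["ice"], r["st"], r["site"]) for r in records}
    pairs = {(r["ice"], r["st"]) for r in records}
    return len(triples), len(pairs)


# Hypothetical per-genome records (not real isolates).
toy_records = [
    {"ice": "ICEKp4", "st": "ST15", "site": 1},
    {"ice": "ICEKp4", "st": "ST15", "site": 1},   # same event, second genome
    {"ice": "ICEKp4", "st": "ST15", "site": 3},   # same ICE and ST, new site
    {"ice": "ICEKp3", "st": "ST86", "site": 3},
    {"ice": "ICEKp10", "st": "ST23", "site": 1},
]

upper, conservative = count_acquisition_events(toy_records)
print(upper, conservative)
```

In this toy set the third record raises the upper estimate but not the conservative one, which mirrors the gap between the two figures reported in the text.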
A total of 26 K. pneumoniae chromosomal STs showed evidence of multiple insertion sites and/or ICEKp structures, indicative of multiple integrations of ICEKp within the evolutionary history of these clones (Figs 4, S7 and Table S4). Most unique acquisition events defined by unique combinations of ICEKp structure and chromosomal STs (65 %) were identified in a single genome sequence. The frequency of ybt carriage and unique ybt acquisitions per ST was correlated with the number of genomes observed per ST (R²=0.71, P<1×10⁻⁸ for log-linear relationship; see Fig. 5(c)), indicating that the discovery of novel integrations within lineages is largely a function of sampling. This implies that ICEKp may frequently be gained and lost from all lineages and that deeper sampling would continue to uncover further acquisitions and losses. Notably, of the 35 clonal groups that were represented by at least 10 genomes, 30 (86 %) included at least one ICEKp acquisition (Figs 4 and S7). The five other common clonal groups each consisted mostly of isolates from a localised hospital cluster (ST323, Melbourne; ST490, Oxford; ST512, Italy; ST681, Melbourne; ST874, Cambridge); and we predict that more diverse sampling of these clonal groups would detect ICEKp acquisition events.
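To make the sampling argument concrete, the sketch below fits a least-squares line of acquisition counts against the natural log of genomes sampled per ST and reports R². This is only one reading of "log-linear": the text does not say which axis was log-transformed, so the choice here is an assumption, as are the function name and the toy numbers.

```python
from math import log


def log_linear_fit(x, y):
    """Least-squares fit of y = intercept + slope * ln(x).

    Returns (slope, intercept, r_squared).
    """
    lx = [log(v) for v in x]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: genomes sampled per ST and distinct acquisitions seen.
genomes_per_st = [3, 10, 25, 60, 150]
acquisitions_per_st = [1, 2, 2, 4, 6]

slope, intercept, r2 = log_linear_fit(genomes_per_st, acquisitions_per_st)
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```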
Of the unique ICEKp acquisition events that were detected in more than one genome, 68 % (n=50 out of 73) showed diversity in the YbST. This diversity was attributable to minor allelic changes (SNPs in a median of 2.5 loci), consistent with clonal expansion of ICEKp-positive K. pneumoniae strains and diversification of the ybt locus in situ. The greatest amount of YbST diversity within such groups was observed in hypervirulent clones ST23 (18 YbSTs of ICEKp10/ybt 1 in site 1), ST86 (12 YbSTs of ICEKp3 in site 3) and ST67 K. pneumoniae subspecies rhinoscleromatis (five YbSTs in site 1); followed by hospital-outbreak-associated MDR clones ST15 (six YbSTs of ICEKp4 in site 1 and five in site 3), ST45 (five YbSTs of ICEKp4), ST101 (five YbSTs of ICEKp3 in site 3) and ST258 (detailed below). This level of diversity is indicative of long-term maintenance of the ICEKp in these lineages, allowing time for the ybt genes to accumulate mutations.
Given the clinical significance of the carbapenemase-associated CG258 [26], we explored ICEKp acquisition in these genomes in greater detail. Ybt was detected in 269 CG258 isolates (40 %) from 17 countries; 218 isolates also carried clb (nearly all from the USA; see Table S4). A set of 58 YbSTs were identified amongst CG258 isolates and clustered into seven ybt lineages associated with six ICEKp structures. Comparison of ybt lineage, ICEKp structure and insertion site with a recombination-filtered core genome phylogeny for representative CG258 strains with yersiniabactin indicated dozens of independent acquisitions of ICEKp sequence variants in this clonal complex (Fig. 5). Near-identical clb 2B (ICEKp10/ybt 17) sequences were identified in 212 ST258 strains (40 %), mostly at tRNA-Asn site 3, isolated in the USA during 2003-2014. Most of these isolates carried the clbJ/clbK deletion (n=175, 83 %), and also transposase insertions within other clb genes (n=173, see Table S4) that may prevent colibactin production (Fig. 5). A total of 27 ybt+clb+ ST258 isolates (5 %) had an apparently intact clb locus; two were isolated in Colombia in 2009 and the rest from the USA during 2004-2010 (Table S4), including the previously reported KPNIH33 [27]. The results demonstrate the very high genetic and functional dynamics of ICEKp within a very recently emerged K. pneumoniae epidemic lineage, estimated to have emerged in the mid-1990s [28].
Relationship to ybt in other species
Finally, we sought to understand the relationship between ICEKp, which we have shown to circulate within K. pneumoniae, and the ybt loci found in other bacterial species. BLAST searching NCBI GenBank identified the ybt locus in n=242 genomes from 11 species outside the K. pneumoniae complex (Table S9); all belonged to the family Enterobacteriaceae. The phylogenetic and structural relationships of these loci with those found in K. pneumoniae are shown in Fig. 6, which indicates that ybt sequences mobilised by ICEKp form a subclade that is strongly associated with the species K. pneumoniae. The vast majority of ICEKp sequences were found in K. pneumoniae (97 %); the exceptions were 15 E. coli, 6 Klebsiella aerogenes, 4 Citrobacter koseri, 2 K. variicola and 1 Enterobacter hormaechei [29] (see Fig. 6, Table S9). These include one novel ICEKp variant in E. coli strain C8 (accession CP010125.1); however, this was also detected in a recently sequenced K. pneumoniae draft genome (accession GCF_002248635.1). ICEKp accounted for just 11 % of ybt sequences detected outside K. pneumoniae, and in the remaining 89 % the ybt locus was not associated with ICEKp or any other identifiable conjugative machinery. Notably, ybt sequences from the Yersinia HPI of the Yersinia pseudotuberculosis/pestis complex clustered within the ICEKp clade of ybt sequences, and not with the ybt sequences from Yersinia enterocolitica (Fig. 6). This indicates that the HPI found in the Y. pseudotuberculosis complex is derived from ICEKp. Furthermore, the HPI shares the int (Fig. S8) and xis of ICEKp, facilitating integration and excision of the HPI, but appears to have lost the conjugative machinery that enables its spread between cells from distinct lineages of yersiniae, consistent with the results of previous investigations that showed that the HPI is unable to self-transmit [30,31]. Additionally, the int genes in the HPI of Y. enterocolitica and Y. pseudotuberculosis are unlikely to encode a functional integrase due to frameshift and nonsense mutations.
The most genetically distant ybt loci were found in the chromosomes of Klebsiella oxytoca and the related species Klebsiella michiganensis and Klebsiella grimontii (Fig. 6), with no identifiable mobility-associated genes in their proximity; we propose the hypothesis that the locus may have originated in the ancestor of these species before becoming mobilised. A related form, situated next to a tRNA-Asn but with no identifiable integrase gene, was also present in Klebsiella ornithinolytica (also known as Raoultella ornithinolytica) (Fig. 6). The ybt sequences in members of the genus Salmonella formed two related clades, one chromosomal and one plasmid-borne, but lacked proximal mobility-associated genes (Fig. 6); notably the plasmid-borne form of ybt in K. pneumoniae appears to be derived from that of ICEKp and not the plasmid-borne form from members of the genus Salmonella (Fig. 6). Finally, this analysis sheds some light on the origins of the atypical form of ybt in K. pneumoniae NCTC 11697, which includes the integrase but lacks the T4SS mobility region (Fig. 2), and whose position in the ybt tree indicates that it may be related to a progenitor element from which ICEKp evolved via acquisition of the excisionase and conjugative machinery.

DISCUSSION
This study provides key insights into the evolution, dynamics and structure of ICEKp, the most frequent MGE associated with hypervirulent/invasive K. pneumoniae infections.
The population structure of ICEKp comprises numerous sublineages that are each associated with unique complements of cargo genes in addition to the yersiniabactin synthesis locus ybt (Figs 1 and 2). With the exception of inactivating mutations that appear to arise in culture and occasionally in hospital-associated lineages (discussed below), the ybt and clb loci appear to be under strong purifying selection in the K. pneumoniae population (low dN/dS, see Tables S3 and S7) and all variants are predicted to synthesise the same yersiniabactin and colibactin polyketide molecules.
[Figure caption fragment: Columns on the right are binary indicators of the genetic context of the ybt locus in each case: (i) located on the chromosome, (ii) located at a tRNA-Asn site, (iii) presence of an integrase (grey indicates nonfunctional integrase caused by a frameshift or nonsense mutation), excisionase and/or virB-T4SS+mobBC portion of the mobility module. For ybt sequences in the clade associated with ICEKp, subclades corresponding to the ybt lineages (defined in Fig. 1) are shaded and additional indicator columns are included to show the corresponding ICEKp structure where present (defined in Fig. 2) coloured according to the inset legend (* indicates ICEKp structure not resolvable from available draft genome sequences, ** indicates novel ICE structure). The number of genomes of each species in which the defined ybt lineages was detected is printed on the right.]
The data demonstrate that ICEKp circulates dynamically within the K. pneumoniae population. The sheer number of distinct ICEKp acquisitions detected in K. pneumoniae (n≥189), and the scale of distinct acquisition events within individual K. pneumoniae clonal groups (Figs 4 and 5), indicates this MGE is highly transmissible within the host bacterial population. Genetic separation of the ICEKp form of ybt compared with those in other members of the family Enterobacteriaceae (Fig. 6), and the rarity of ICEKp outside K. pneumoniae, indicates that ICEKp may be specifically adapted to circulate in K. pneumoniae. We propose that this MGE has been a feature of the K. pneumoniae population that predates K. pneumoniae sublineage diversification, because (i) the ybt genes of ICEKp displayed a similar degree of nucleotide diversity as K. pneumoniae core chromosomal genes (mean 0.5 %) [13] and (ii) ICEKp was notably rare or absent from K. pneumoniae's closest relatives, K. variicola and K. quasipneumoniae.
The data indicate that the population prevalence of ICEKp within K. pneumoniae (around one third of the population) is sustained through highly dynamic horizontal gene transfer events rather than stable maintenance within K. pneumoniae lineages by vertical inheritance. The intermediate frequency of ICEKp in K. pneumoniae is indicative of the existence of some form of balancing selection for the encoded traits. This typically occurs when a trait is most beneficial only when it is not shared by the entire population (e.g. antigenic variation resulting in variable susceptibility to predators or host immunity [32]), or the trait has selective advantages only under certain conditions but high costs in others. Acquisition of ybt has benefits in certain iron-depleted conditions, which are presumably encountered in a wide range of environmental and host-associated niches [6]; siderophores including yersiniabactin also confer significant growth advantage in heavy metal contaminated soil irrespective of iron content [33]. Hence the dynamics of ICEKp/ybt may reflect the diverse lifestyles of K. pneumoniae, which can vary between hosts and in the environment. However, loss of ybt also appears to be common, probably occurring due to the high energy cost of synthesising the polyketide hybrid molecule. Inactivation of irp1 or irp2 in historical isolates (70 %) supports strong negative selection against yersiniabactin production in rich media. Notably, K. pneumoniae can almost universally synthesise enterobactin, hence the benefit of ybt depends on not only the availability of iron but also the form of iron, and other factors such as ability to compete with host iron-binding proteins and evasion of the mammalian immune system's targeting of enterobactin via Lcn2 [6,8,9].
ICEKp cargo genes probably contribute additional costs and benefits to host cells, modifying the fitness equation for their bacterial host; further work will be needed to explore the differential effects and functional relevance of these genes. It is notable that clb-carrying ICEKp10 was widespread (detected across 32 K. pneumoniae lineages) but the clb genes were frequently disrupted, indicating subjection to balancing selection. These disruptions were particularly high amongst the hospital-associated MDR clone ST258, and may indicate selection against costly colibactin production in hospital-adapted strains that already benefit from positive selection under antimicrobial exposure.
Concerningly, our data highlight the possibility that the rate of ICEKp transmission in the population may be sufficiently high that ybt is readily available to most K. pneumoniae lineages. Hence new clinically important high-pathogenicity lineages could theoretically arise at any time following introduction of ybt to a strain background that already has features favourable for transmission or pathogenicity in humans, including antimicrobial resistance (AMR). Indeed we found ICEKp to be frequent amongst many of the recognised MDR K. pneumoniae clones, such as CG258, indicating that the convergence of AMR and yersiniabactin production is happening frequently in K. pneumoniae, potentiating the emergence of lineages that pose substantially greater risk to human health than the broader K. pneumoniae population, which typically behaves as an opportunistic, mostly susceptible, pathogen.
FIB K plasmid-borne ybt constitutes an entirely novel mechanism for ybt mobilisation in K. pneumoniae. The FIB K plasmid replicon is very common and highly stable in K. pneumoniae but not E. coli [34,35], indicating that these plasmids are adapted to K. pneumoniae hosts and have the potential to readily transmit ybt within the population. Worryingly, many FIB K plasmids have already acquired AMR transposons [36], indicating that there may be few barriers to convergence of AMR and virulence genes in a single FIB K plasmid replicon. Given its potential transmissibility and stability in K. pneumoniae hosts, this forms another substantial public health threat and warrants careful monitoring.
[Figure caption fragment: (b) Taxonomic relationships between Enterobacteriaceae species in which ybt was detected, in the form of a neighbour-joining tree of concatenated gyrA and rpoB sequences extracted from ybt-positive genomes.]
The extensive diversity uncovered amongst ybt and clb sequences and ICEKp structures in this study provides several epidemiological markers with which to track their movements in the K. pneumoniae population through analysis of whole-genome sequence data, which is increasingly being generated for infection control and AMR surveillance purposes [37,38]. The work presented here provides a clear framework for straightforward detection, typing and interpretation of ybt and clb sequences via the YbST and CbST schemes (Figs 1 and 3), which are publicly available in the BIGSdb K. pneumoniae database and can be easily interrogated using the BIGSdb web application or using common tools such as BLAST (https://github.com/katholt/Kleborate) or SRST2 [39]. In doing so, detection of these key virulence loci provides much-needed insights into the emergence and spread of pathogenic K. pneumoniae lineages, which will be particularly important for tracking the convergence of virulence and AMR in this troublesome pathogen.
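The idea behind an MLST-style scheme such as YbST, where each unique combination of allele numbers across the ybt genes defines a sequence type, can be sketched as follows. This is a toy illustration under stated assumptions: the gene list is supplied here from general knowledge of the ybt locus rather than from the text, the ST numbers are assigned in order of first appearance (the curated scheme in BIGSdb uses its own numbering), and the allele calls are invented.

```python
# Assumed names for the 11 ybt locus genes (not listed in the text itself).
YBT_GENES = (
    "ybtS", "ybtX", "ybtQ", "ybtP", "ybtA",
    "irp2", "irp1", "ybtU", "ybtT", "ybtE", "fyuA",
)


def assign_sequence_types(allele_calls):
    """Map each isolate to a sequence type defined by its allele profile.

    allele_calls: {isolate: {gene: allele_number}} with a call for every gene.
    Returns (st_by_isolate, st_by_profile); ST numbers are illustrative only.
    """
    st_by_profile = {}
    st_by_isolate = {}
    for isolate, calls in allele_calls.items():
        profile = tuple(calls[gene] for gene in YBT_GENES)
        if profile not in st_by_profile:
            st_by_profile[profile] = len(st_by_profile) + 1
        st_by_isolate[isolate] = st_by_profile[profile]
    return st_by_isolate, st_by_profile


def shared_loci(profile_a, profile_b):
    """Number of loci at which two allele profiles carry the same allele."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


# Hypothetical allele calls for three isolates.
base = {gene: 1 for gene in YBT_GENES}
toy_calls = {
    "isolate_A": dict(base),
    "isolate_B": dict(base),                 # identical profile to A
    "isolate_C": {**base, "irp2": 7, "fyuA": 3},
}

st_by_isolate, st_by_profile = assign_sequence_types(toy_calls)
profile_a = tuple(toy_calls["isolate_A"][g] for g in YBT_GENES)
profile_c = tuple(toy_calls["isolate_C"][g] for g in YBT_GENES)
print(st_by_isolate, shared_loci(profile_a, profile_c))
```

Counting shared loci between profiles, as in `shared_loci`, is the kind of comparison that underlies the within-lineage versus between-lineage figures reported in the Results.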
Of broader relevance, the data show that the deepest diversity of ybt sequences is present in the members of the genus Klebsiella and that MGE-borne ybt emerged within K. pneumoniae before spreading to other Enterobacteriaceae (Fig. 6). In particular, the HPI of Y. pestis and Y. pseudotuberculosis, where yersiniabactin was first identified and from which it draws its name, is derived from the ICEKp of K. pneumoniae; hence the name klebsibactin may have been more appropriate. This adds to the growing body of evidence that members of the genus Klebsiella act as a reservoir of AMR and pathogenicity genes for other members of the family Enterobacteriaceae; KPC and NDM-1 being recent examples of AMR genes first identified in members of the genus Klebsiella that have rapidly become widespread [40,41]. We suggest that this unique role of Klebsiella is linked to its more generalist lifestyle, which offers more opportunities to sample accessory genes from a wide array of gene pools. In support of this, members of the genus Klebsiella exhibit extreme differences in gene content within and between species, their accessory genes display a wide range of DNA G+C contents and taxonomic sources [13], and strains from environmental niches, such as K. oxytoca, can have very large genomes that exceed 6 Mbp. The present work further emphasises the clinical importance of the unique position that members of the genus Klebsiella occupy in the broader microbial sphere as a source of important pathogenicity as well as AMR genes for other members of the family Enterobacteriaceae, and should be a motivating factor for further exploration of the ecological and evolutionary mechanisms behind this phenomenon.

Bacterial genome sequences
We analysed a total of 2498 K. pneumoniae genomes (2284 K. pneumoniae sensu stricto, 63 K. quasipneumoniae, 146 K. variicola, 5 undefined or hybrid [13]) obtained from various sources representing a diverse geographical and clonal distribution (Table S1; see Table S4 for full list of isolates and their properties). Just under a third of these genomes had been collected and sequenced in-house during four previous studies of human hospital isolates [13,42–44]. These isolates from genotypically and geographically diverse backgrounds, which had clinical source information and were not associated with outbreaks, were used to estimate the distribution of the yersiniabactin locus amongst human isolates associated with the different types of infections listed in Table S2.
Where available, Illumina short reads were analysed directly and assembled using SPAdes v3.6.1, storing the assembly graphs for further analysis of genetic context. Where reads were unavailable (n=921), publicly available pre-assembled contigs were used. These had been generated using various strategies and assembly graphs were not available for inspection.
One isolate from our collection (strain INF167, isolated from a patient at the Alfred Hospital, Melbourne, Australia in 2013) was subjected to further sequencing using a MinION Mk1B and R9 Mk1 flow cell (Oxford Nanopore Technologies). A 2D MinION library was generated from 1.5 µg purified genomic DNA using the Nanopore Sequencing Kit (SQK-NSK007). DNA was repaired (NEBNext FFPE Repair Mix), prepared for ligation (NEBNext Ultra II End-Repair/dA-tailing Module) and ligated with adapters (NEB Blunt/TA Ligase Master Mix). We sequenced the library for 48 h, obtaining 3862 reads (mean length 3049 bp, maximum 44 026 bp) that were used to scaffold the SPAdes assembly graph using a novel hybrid assembly algorithm (http://github.com/rrwick/Unicycler). The resulting assembly included one circular plasmid, which was annotated using Prokka [45] and submitted to GenBank under the accession number KY454639.
In order to produce novel MLST schemes [48] for the yersiniabactin and colibactin loci, sequences of the alleles for genes belonging to the yersiniabactin (ybtS, ybtX, ybtQ, ybtP, ybtA, irp2, irp1, ybtU, ybtT, ybtE, fyuA) and colibactin (clbABCDEFGHIJKLMNOPQR) synthesis loci were extracted from the K. pneumoniae genomes, by comparison to known alleles in the K. pneumoniae BIGSdb database. To maximise resolution for the novel virulence locus MLST schemes, we included in the definition of sequence types alleles for all 11 genes of the ybt locus and 16 out of 18 genes of the clb locus (clbJ and clbK were excluded as they are subject to a common deletion as described in the Results). Each observed combination of alleles was assigned a unique yersiniabactin sequence type (YbST, listed in Table S5) or colibactin sequence type (CbST, listed in Table S8). The schemes and allele sequences are available from the BIGSdb-K. pneumoniae website and in the Kleborate repository (https://github.com/katholt/Kleborate), which includes a command-line tool for genotyping new genomes. All genomes with detectable ybt or clb sequences were included in the definition of YbSTs or CbSTs, with the exception of 61 genomes for which data quality was too low for accurate calling of all alleles (criteria: read depth <20×; <90 % agreement of alleles at the read level; and/or incomplete assembly of the ybt or clb region, which usually was associated with low read depth and generally poor assembly quality with N50 <100 000 bp).
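The rule that each observed combination of alleles receives its own sequence type can be illustrated with a minimal Python sketch. It is not the BIGSdb or Kleborate implementation: the function name, the input layout and the numbering by order of first appearance are assumptions made for illustration only, and the numbers it produces do not correspond to the published YbSTs.

```python
from typing import Dict, Tuple

# Genes of the ybt scheme, in the order listed in the text.
YBT_GENES = ("ybtS", "ybtX", "ybtQ", "ybtP", "ybtA", "irp2",
             "irp1", "ybtU", "ybtT", "ybtE", "fyuA")


def assign_sequence_types(profiles: Dict[str, Dict[str, int]],
                          genes: Tuple[str, ...] = YBT_GENES) -> Dict[str, int]:
    """Give each distinct allele combination its own sequence-type number.

    `profiles` maps a genome name to {gene: allele number}. Numbers are
    handed out in order of first appearance, starting at 1. This is an
    arbitrary convention for the sketch, not the numbering used in BIGSdb.
    """
    st_by_profile: Dict[Tuple[int, ...], int] = {}
    st_by_genome: Dict[str, int] = {}
    for genome, alleles in profiles.items():
        # Raises KeyError if a gene is missing from the genome's profile.
        key = tuple(alleles[gene] for gene in genes)
        if key not in st_by_profile:
            st_by_profile[key] = len(st_by_profile) + 1
        st_by_genome[genome] = st_by_profile[key]
    return st_by_genome


# Hypothetical example: two genomes share a profile, one differs at fyuA.
base = {gene: 1 for gene in YBT_GENES}
example = {
    "genome_a": base,
    "genome_b": dict(base, fyuA=2),
    "genome_c": dict(base),
}
print(assign_sequence_types(example))
```

A genome with an uncallable allele would simply fail the lookup here; the exclusion criteria described above (read depth, read-level agreement, assembly completeness) are not modelled.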

Phylogenetic analyses
For each YbST, alignments of the concatenated corresponding allele sequences were produced using Muscle v3.8.31. Recombination events were identified using Gubbins v2.0.0 [49], which screens for regions with a high density of single nucleotide polymorphisms (SNPs) that are likely to represent an imported sequence variant. The initial alignment was 28 214 bp long with 2234 variant sites, of which 232 were identified as recombinant and masked from the alignment (regions shown in Fig. S2, visualised using Phandango (https://github.com/jameshadfield/phandango/)). Maximum likelihood (ML) trees were inferred from the recombination-masked alignment by running RAxML v7.7.2 [50] five times with the generalised time-reversible (GTR) model and a Gamma distribution, selecting the final tree with the highest likelihood. Lineages were defined as monophyletic groups of YbSTs whose members shared features within the group (at least six shared YbST loci, same ICEKp structures) but were distinguished from other groups (zero or one shared YbST loci, different ICEKp structures). The same approach was used to generate a colibactin ML tree. ML phylogenies for the ybt loci and concatenated rpoB and gyrA sequences from representative K. pneumoniae and other Enterobacteriaceae bacteria were also generated by running RAxML v7.7.2 [50]. Nodes with bootstrap support below 75 in the Enterobacteriaceae ybt phylogeny were collapsed to polytomies with TreeCollapserCL 4 v3.2 (http://emmahodcroft.com/TreeCollapseCL.html).
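The allele-sharing part of the lineage definition (at least six shared YbST loci within a group, zero or one between groups) can be sketched in Python as a pairwise comparison of allele profiles. The function names and the "intermediate" label are hypothetical, and the sketch deliberately ignores the other two criteria in the definition, monophyly in the ML tree and a shared ICEKp structure, so it cannot assign lineages on its own.

```python
from typing import Sequence


def shared_loci(profile_a: Sequence[int], profile_b: Sequence[int]) -> int:
    """Count the loci at which two allele profiles carry the same allele."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a == b for a, b in zip(profile_a, profile_b))


def allele_sharing_class(profile_a: Sequence[int],
                         profile_b: Sequence[int],
                         within_min: int = 6,
                         between_max: int = 1) -> str:
    """Classify a pair of profiles by the allele-sharing thresholds in the text.

    Returns 'within-lineage range' for >= within_min shared loci,
    'between-lineage range' for <= between_max shared loci, and
    'intermediate' otherwise (a label invented for this sketch).
    """
    n = shared_loci(profile_a, profile_b)
    if n >= within_min:
        return "within-lineage range"
    if n <= between_max:
        return "between-lineage range"
    return "intermediate"


# Hypothetical 11-locus profiles (allele numbers are made up).
p1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
p2 = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]   # shares six loci with p1
p3 = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1]   # shares one locus with p1
print(allele_sharing_class(p1, p2), "|", allele_sharing_class(p1, p3))
```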
Core genome SNP trees were inferred for K. pneumoniae (using one representative genome per unique ST) and for CG258 (which includes ST258, ST11, ST340 and ST512). For CG258, a selection of representative isolates was used; these were subsampled from a maximum-likelihood phylogeny encompassing all CG258 isolates to remove near-identical isolates with the same year and country of isolation that were probably sequenced from an outbreak. The trees were inferred using the mapping pipeline RedDog v1b5 (https://github.com/katholt/reddog) to (i) map short reads against K. pneumoniae ST23 strain NTUH-K2044 [25] and ST258 strain NJST258-1 [51], respectively, using Bowtie 2 v2.2.3, and (ii) identify core gene SNPs using SAMtools v1.1. The resulting SNP alignments were subjected to analysis with Gubbins v2.0.0 (to filter recombinant sites), and RAxML v7.7.2 to infer clonal phylogenies.
Translation of the nucleotide sequence into amino acid sequence, inspection of non-synonymous, frameshift and nonsense mutations, and calculations for dN/dS ratios and significance testing for conservation or positive selection were conducted using MEGA 6.06 [52].

Chromosomal insertion sites and ICE structures
For each ybt-positive (ybt+) genome, the annotated assembly was manually inspected to determine which of the four tRNA-Asn sites was occupied by ICEKp. This was done with reference to the MGH78578 genome, which lacks any genomic islands at tRNA-Asn sites. The Artemis genome viewer was used to inspect the annotation of the region; BLAST+ was used for genome comparison; and when the region failed to assemble into a single contig, Bandage [53] was used to inspect the locus in the assembly graph where available. Once the insertion site was determined, the structure of the ICEKp was inferred by extracting the sequence between the flanking direct 17 bp repeats 'CCAGTCAGAGGAGCCAA', either directly from the contigs using Artemis or from the assembly graph using Bandage. Representative sequences for each ICEKp structure (unless derived from previously assembled genomes) were annotated and deposited in GenBank (accession numbers KY454627-KY454638).
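The extraction of sequence between flanking direct repeats was done manually in Artemis and Bandage; the Python sketch below only illustrates the idea on a single contig. The function name is hypothetical, and the sketch assumes exact, forward-strand copies of the 17 bp repeat lying on the same contig, which will not hold for fragmented assemblies or diverged repeat copies.

```python
from typing import List

# 17 bp direct repeat quoted in the text.
REPEAT = "CCAGTCAGAGGAGCCAA"


def regions_between_repeats(contig: str, repeat: str = REPEAT) -> List[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Only the forward strand of one contig is searched; the repeat copies
    themselves are not included in the returned sequences.
    """
    contig = contig.upper()
    repeat = repeat.upper()
    positions = []
    start = contig.find(repeat)
    while start != -1:
        positions.append(start)
        start = contig.find(repeat, start + 1)
    return [contig[left + len(repeat):right]
            for left, right in zip(positions, positions[1:])]


# Toy contig: a short made-up insert flanked by two copies of the repeat.
toy = "AAAT" + REPEAT + "GATTACAGATTACA" + REPEAT + "TTTC"
print(regions_between_repeats(toy))
```

In practice the candidate region would still need manual inspection, since a pair of repeat copies does not by itself show that the intervening sequence is an intact ICEKp.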

Data availability
Annotated ICE and plasmid sequences generated in this study are available in NCBI GenBank under the accession numbers specified in the text. Yersiniabactin and colibactin MLST schemes are available in the K. pneumoniae BIGSdb database at http://bigsdb.pasteur.fr/klebsiella/klebsiella.html. All whole-genome sequences analysed in this study are freely available in NCBI; accession numbers are given in Table S4.

Conflicts of interest
The authors declare that they have no conflicts of interest.