Worldwide transmission of ST11-KL64 carbapenem-resistant Klebsiella pneumoniae: an analysis of publicly available genomes

ABSTRACT ST11-KL64 is an internationally distributed lineage of carbapenem-resistant Klebsiella pneumoniae (CRKP) and is the most common type in China. The international and interprovincial (in China) transmission of ST11-KL64 CRKP remains to be elucidated. We used both static clusters, defined by a fixed cutoff of ≤21 pairwise single-nucleotide polymorphisms, and dynamic groups, defined by modeling the likelihood to be linked by a transmission threshold, to investigate the transmission of ST11-KL64 strains by mining genome sequences. We analyzed all publicly available genomes (n = 730) of ST11-KL64 strains, almost all of which had known carbapenemase genes, with KPC-2 being dominant. We identified 4 clusters of international transmission and 14 clusters of interprovincial transmission across China of ST11-KL64 strains. We found that dynamic grouping could provide further resolution for determining clonal relatedness in addition to the widely adopted static clustering and therefore increases the confidence for inferring transmission.

IMPORTANCE Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a serious challenge for clinical management and is prone to spread in and between healthcare settings. ST11-KL64 is the dominant CRKP type in China with a worldwide distribution. Here, we used two different methods, the widely used clustering based on a fixed single-nucleotide polymorphism (SNP) cutoff and the recently developed grouping by modeling transmission likelihood, to mine all 730 publicly available ST11-KL64 genomes. We identified international transmission of several strains and interprovincial transmission in China of a few, which warrants further investigations to uncover the mechanisms for their spread. We found that static clustering based on a fixed cutoff of ≤21 SNPs is sensitive for detecting transmission and that dynamic grouping has higher resolution and provides complementary information. We suggest the use of the two methods in combination for analyzing transmission of bacterial strains. Our findings highlight the need for coordinated actions at both international and interprovincial levels for tackling multi-drug resistant organisms.

KEYWORDS Klebsiella pneumoniae, carbapenem resistance, transmission clusters

Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a severe threat to human health globally (1). Currently available data indicate that the international or regional spread of CRKP is largely driven by certain sequence types (STs) such as ST11, ST15, ST37, ST258, and ST307 (2-4). Among these types, ST11 CRKP is mainly seen in Asia and South America (3,5) and can be assigned to several capsular (KL) types. In particular, ST11-KL64 is the dominant CRKP type at present in China (6) and also appears to be commonly seen in Brazil (7). However, the transmission of ST11-KL64 CRKP in the world and across provinces in China remains to be elucidated. Recently, the national transmission of CRKP in the USA has been addressed using both static clusters defined based on a fixed cutoff of ≤21 pairwise single-nucleotide polymorphisms (SNPs) and dynamic clusters defined by modeling the likelihood to be linked by a transmission threshold (8). We therefore employed the described methodology to investigate the transmission of ST11-KL64 CRKP, including the international transmission and the interprovincial transmission in China.
Thirty-two static clusters and 59 dynamic groups were identified from 730 ST11-KL64 genomes
As of 1 June 2022, there were 13,625 K. pneumoniae genome assemblies in NCBI. During quality control, 1,039 genomes were excluded because of duplicated biosample (n = 103), low-quality assembly as defined by NCBI (e.g., excessive frameshifted proteins and fragmented assembly; n = 674), <95% completeness (n = 51), >5% contamination (n = 25), >50% heterogeneity (n = 162), or assignment to species other than K. pneumoniae (n = 24). The remaining 12,586 genomes were included in further study, comprising 730 ST11-KL64 genomes, with STs and capsular types determined using Kleborate v2.2.0 (9). Almost all (n = 704, 96.4%) of the 730 ST11-KL64 strains had genes encoding known carbapenemases, among which KPC-2 was the most common (94.0%, 686/730). Information about the geographical location was available for 713 of the 730 genomes, with most (659/713, 92.4%) seen in China, 50 (7.0%) in Brazil, and 1 in each of Canada, Japan, Spain, and Switzerland (Dataset S1). Recombination regions were removed from these 730 genome assemblies using Gubbins v3.1.6 (10), and core SNPs were then called with Snippy (https://github.com/tseemann/snippy) using strain 090357 (accession no. CP066523) as the reference. A maximum likelihood phylogenomic tree (Fig. 1 for a simplified view and Fig. S1 for more details) was inferred from the core SNPs with RAxML v8.2.4 (11) under the GTR model with site rate variation and 100 bootstrap replicates. Static clusters were defined with the R program TransCluster based on a fixed cutoff of ≤21 pairwise SNPs (12). We use the term dynamic groups instead of dynamic clusters to avoid confusion with static clusters.
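To make the fixed-cutoff definition concrete, the following minimal sketch groups strains whose pairwise core-SNP distance is ≤21. It assumes single-linkage (transitive) clustering and leaves singletons unassigned; it is an illustration in Python and not the TransCluster implementation used in this study. The function name `static_clusters` and the toy distance matrix are hypothetical.

```python
from itertools import combinations

def static_clusters(dist, cutoff=21):
    """Single-linkage clusters: two strains share a cluster when a chain of
    pairs, each differing by <= cutoff SNPs, connects them (assumed linkage rule)."""
    names = sorted(dist)
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[a][b] <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Keep clusters of two or more strains; singletons stay unassigned.
    multi = [c for c in clusters.values() if len(c) > 1]
    return sorted(multi, key=lambda c: (-len(c), c))

# Hypothetical symmetric matrix of pairwise core-SNP distances.
toy = {
    "s1": {"s1": 0, "s2": 10, "s3": 30, "s4": 200},
    "s2": {"s1": 10, "s2": 0, "s3": 20, "s4": 190},
    "s3": {"s1": 30, "s2": 20, "s3": 0, "s4": 180},
    "s4": {"s1": 200, "s2": 190, "s3": 180, "s4": 0},
}
clusters_21 = static_clusters(toy)             # s1, s2, s3 chained via s2
clusters_15 = static_clusters(toy, cutoff=15)  # only s1 and s2 remain linked
```

In the toy data, s1 and s3 differ by 30 SNPs but still fall into one cluster at the 21-SNP cutoff because both are within 21 SNPs of s2, which shows how a large static cluster can contain fairly divergent strains under this linkage assumption.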
Dynamic groups were assigned by modeling the likelihood that isolates were linked by a transmission threshold (T); this likelihood was calculated with the R program TransCluster (12) by combining the rate of SNP accumulation (λ), the collection times of the ST11-KL64 strains, and an estimated transmission rate (β). A λ value of 10.1 substitutions/genome/year was calculated previously using paired longitudinal samples of KPC-producing CRKP (13). A β value of 5.8 was estimated previously from an epidemiologic investigation of nosocomial K. pneumoniae (14). A T value of 5 was selected according to the previous study of CRKP transmission in U.S. hospitals (8).
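A rough, back-of-envelope reading of these parameters can be sketched as follows. This is not the TransCluster likelihood model: it ignores collection times entirely and simply assumes that SNPs and transmissions both accumulate at constant rates over the same elapsed time, and that β is expressed per year (the unit is an assumption, since it is not stated above). The function names are hypothetical.

```python
LAMBDA = 10.1  # SNP accumulation rate, substitutions/genome/year (from the text)
BETA = 5.8     # transmission rate; per-year units are an assumption
T = 5          # transmission threshold (from the text)

def expected_transmissions(snps, lam=LAMBDA, beta=BETA):
    """Crude point estimate: the time needed to accumulate `snps` at rate
    `lam`, multiplied by the transmission rate `beta`."""
    years = snps / lam
    return beta * years

def snps_at_threshold(t=T, lam=LAMBDA, beta=BETA):
    """SNP distance at which the crude estimate reaches `t` transmissions."""
    return t * lam / beta

at_cutoff = expected_transmissions(21)  # about 12 transmissions for 21 SNPs
equiv_snps = snps_at_threshold()        # about 8.7 SNPs for T = 5
```

Under these simplifying assumptions, a threshold of T = 5 corresponds to a smaller SNP distance than the 21-SNP cutoff, which gives some intuition for why dynamic groups could be expected to be finer-grained than static clusters. The actual assignment also depends on collection times and on the probabilistic model.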
Overall, 32 static clusters and 59 dynamic groups were identified, incorporating 650 (89.0%) and 539 (73.8%) strains, respectively. Among the 32 static clusters, cluster 1 was the largest, comprising 515 strains (70.5%), and could be further divided into 32 dynamic groups (Table S1, listing all static clusters and dynamic groups). Strains of nine static clusters could not be assigned to a dynamic group. The remaining 22 static clusters matched one or two dynamic groups (Table S1). On the other hand, each of the 59 dynamic groups matched only a single static cluster (Table S1). Therefore, dynamic groups appeared to generate complementary, yet overlapping, information to static clusters and provide further resolution to infer transmission of CRKP strains.
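The matching between static clusters and dynamic groups described above amounts to a cross-tabulation of per-strain assignments. A minimal sketch is given below, assuming each strain carries a static cluster label and a dynamic group label (either may be missing); the function names and the toy rows are hypothetical and do not reproduce Table S1.

```python
def groups_per_cluster(rows):
    """Map each static cluster to the set of dynamic groups among its strains.
    `rows` holds (strain, static_cluster, dynamic_group); None = unassigned."""
    out = {}
    for _strain, cluster, group in rows:
        if cluster is None:
            continue
        groups = out.setdefault(cluster, set())  # register cluster even without groups
        if group is not None:
            groups.add(group)
    return out

def clusters_per_group(rows):
    """Map each dynamic group to the set of static clusters its strains fall in."""
    out = {}
    for _strain, cluster, group in rows:
        if cluster is not None and group is not None:
            out.setdefault(group, set()).add(cluster)
    return out

# Hypothetical assignments for six strains.
rows = [
    ("a", "c1", "g1"), ("b", "c1", "g1"), ("c", "c1", "g2"),
    ("d", "c2", None), ("e", "c2", None), ("f", None, None),
]
by_cluster = groups_per_cluster(rows)
by_group = clusters_per_group(rows)
# Static clusters whose strains have no dynamic group at all.
unresolved = sorted(c for c, g in by_cluster.items() if not g)
# True when every dynamic group sits inside exactly one static cluster.
nested = all(len(c) == 1 for c in by_group.values())
```

Here cluster c1 splits into two dynamic groups and cluster c2 has none, mirroring in miniature the two situations reported for the real data.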
We evaluated the impact of applying different SNP cutoffs up to 25 on the number of static clusters and the matching with dynamic groups. When 21 SNPs were applied, 89.0% of all strains could be assigned to a static cluster (Table S2) and all but one dynamic group could match a single cluster (Fig. S2). In contrast, when 15 or 16 SNPs were used to define static clusters, all dynamic groups could match a single static cluster (Fig. S2) but the proportion of strains able to be assigned to a cluster dropped to 84.2% or 85.2% (Table S2). We also evaluated the impact of adjusting the transmission threshold (T) and the β parameter on dynamic grouping as described previously (7). When the T threshold was selected from a wide range (1 to 20) with a fixed β value of 5.8 (14), the number of international groups was 2 (T = 20) to 6 (T = 4, 5, or 6) and that of interprovince groups was 6 (T = 13) to 24 (T = 3) (Fig. S3). Conversely, when the β parameter was selected from a wide range (1 to 20) with a fixed T value of 5 (8), the number of international groups was 2 (β = 1) to 7 (β = 7) and that of interprovince groups was 7 (β = 1 or 3) to 28 (β = 9) (Fig. S3). The above analysis highlights that the choice of parameters (the SNP cutoff, the T threshold, and the β parameter) generates varied numbers of clusters and groups. Nevertheless, although the number of groups and clusters varied with the use of different parameters, both international and interprovince transmission were identified. We therefore performed further analyses based on static clusters defined with a fixed cutoff of ≤21 pairwise SNPs, which was applied previously (12), and dynamic groups generated with a λ value of 10.1 substitutions/genome/year, a β value of 5.8, and a T value of 5 as described previously (8).
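The effect of the SNP cutoff on the proportion of clustered strains can be illustrated with a small sweep. The sketch assumes that a strain counts as clustered when at least one other strain lies within the cutoff, which is equivalent to belonging to a single-linkage cluster of two or more strains; whether this matches the exact rule behind Table S2 is an assumption. The function name and the toy distances are hypothetical.

```python
def clustered_fraction(dist, cutoff):
    """Fraction of strains with at least one other strain within `cutoff` SNPs."""
    names = list(dist)
    linked = sum(
        any(dist[a][b] <= cutoff for b in names if b != a)
        for a in names
    )
    return linked / len(names)

# Hypothetical symmetric matrix of pairwise core-SNP distances.
toy = {
    "s1": {"s1": 0, "s2": 10, "s3": 30, "s4": 200},
    "s2": {"s1": 10, "s2": 0, "s3": 20, "s4": 190},
    "s3": {"s1": 30, "s2": 20, "s3": 0, "s4": 180},
    "s4": {"s1": 200, "s2": 190, "s3": 180, "s4": 0},
}
# Proportion of clustered strains at several candidate cutoffs.
sweep = {cutoff: clustered_fraction(toy, cutoff) for cutoff in (5, 15, 21, 25)}
```

As in the real data, a stricter cutoff can only keep or lower the proportion of strains assigned to a cluster, so the choice trades sensitivity against resolution.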

Four clusters associated with international transmission
Next, we aligned the geographic information with the clusters and groups to identify transmission. Four static clusters, clusters 1, 2, 5, and 23, comprised strains from more than one country (

Fourteen static clusters associated with interprovincial transmission in China
Genome sequences of ST11-KL64 K. pneumoniae strains from 16 of the 34 provincial regions of China were available in GenBank (Dataset S1). Fourteen static clusters comprised strains from two or more provinces of China, suggesting possible interprovincial transmission. The three static clusters and the five dynamic groups associated with international transmission were also found to be associated with interprovincial transmission in China (Table 1). Cluster 1 contained strains from 12 provinces, and among the 32 dynamic groups within cluster 1, nine contained strains from more than one province. In particular, three dynamic groups (groups 1, 2, and 3) within cluster 1 contained 84 to 163 strains from 8 to 10 provinces, suggesting wide dissemination across China. Notably, these three dynamic groups were also associated with international transmission and may therefore represent high-risk clones, warranting further studies. Four clusters (2, 4, 7, and 16) contained one or two dynamic groups with strains from more than one province. The remaining nine static clusters comprising strains from two or more provinces contained dynamic groups with strains from a single province or could not be assigned to dynamic groups (Table 1). The possible interprovincial transmission of these nine clusters would therefore be missed by using dynamic grouping alone. This highlights the need for static clusters and dynamic groups to be used in combination for analyzing CRKP transmission.
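Flagging clusters or groups that span more than one country or province reduces to counting distinct locations per cluster. The following minimal sketch assumes one (cluster, region) record per strain, with missing locations skipped; the function name is hypothetical and the records are made up for illustration, not taken from Dataset S1.

```python
def multi_region_clusters(records, min_regions=2):
    """Return {cluster: sorted regions} for clusters spanning at least
    `min_regions` distinct regions. `records` holds (cluster, region);
    None marks an unassigned strain or an unknown location."""
    regions = {}
    for cluster, region in records:
        if cluster is None or region is None:
            continue  # strains without a cluster or a location cannot contribute
        regions.setdefault(cluster, set()).add(region)
    return {c: sorted(r) for c, r in regions.items() if len(r) >= min_regions}

# Made-up records: cluster label and province of isolation.
records = [
    (1, "Sichuan"), (1, "Zhejiang"), (1, "Sichuan"),
    (2, "Henan"), (2, None),
    (3, "Beijing"),
]
spanning = multi_region_clusters(records)
```

The same function applies to countries instead of provinces, and to dynamic group labels instead of static cluster labels. Because strains with unknown location are skipped, the result is a lower bound on the number of multi-region clusters.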

Limitations and conclusion
We are aware of the limitations of this analysis. First, publicly available genomes are highly biased, and the international and interprovincial transmission identified here is therefore likely to represent only the tip of the iceberg, with wider transmission of more ST11-KL64 strains awaiting discovery. Second, the geographical locations of the available ST11-KL64 strains are largely restricted to China and, to a lesser extent, Brazil. This may imply that ST11-KL64 strains are not yet widely distributed in the world at present but could also result from undersampling. Third, metadata sets of the ST11-KL64 strains are not available, preventing further mining to identify the sources, reconstruct the transmission events, and uncover the drivers for the international and interprovincial transmission. Such mining would provide much-needed information for designing countermeasures.

Table S1. In Table 1, only static clusters containing strains from more than one country or more than one province of China are shown. Within cluster 1, only dynamic groups containing strains from more than one country or more than one province of China are shown. For some strains, the information about the country or the Chinese province is not available. Static clusters and dynamic groups with international distribution are in bold. b (Number of genomes [number of genomes without known carbapenemase-encoding genes; otherwise, all strains had known carbapenemase-encoding genes]). c -, could not be assigned to a dynamic group.
multi-drug resistant organisms and both international and interprovincial transmission of bacterial strains have been identified even when different SNP cutoffs and parameter values were used.
In conclusion, we analyzed 730 publicly available genomes of ST11-KL64 strains, almost all of which had known carbapenemase genes, using two methodologies: static clustering based on a fixed SNP cutoff and dynamic grouping by modeling transmission likelihood. We highlighted that ST11-KL64 is an internationally distributed lineage warranting rigorous monitoring and further studies. We identified international transmission of several ST11-KL64 strains and interprovincial transmission in China of a few strains. We also found several ST11-KL64 strains with particularly wide distribution, which warrants further investigations to uncover the mechanisms for their spread. We argue that dynamic grouping could complement the widely adopted method based on a fixed SNP cutoff by enhancing the resolution for determining clonal relatedness, and that the two methods may be used in combination for analyzing transmission of bacterial strains. We believe that coordinated actions at both international and interprovincial levels should be taken to tackle multi-drug resistant organisms like ST11-KL64 CRKP.

The authors declare no conflict of interest.

ADDITIONAL FILES
The following material is available online.