Comparison of carbapenem-susceptible and carbapenem-resistant Enterobacterales at nine sites in the USA, 2013–2016: a resource for antimicrobial resistance investigators

Carbapenem-resistant Enterobacterales (CRE) are an urgent public health threat. Genomic sequencing is an important tool for investigating CRE. Through the Division of Healthcare Quality Promotion Sentinel Surveillance system, we collected CRE and carbapenem-susceptible Enterobacterales (CSE) from nine clinical laboratories in the USA from 2013 to 2016 and analysed both phenotypic and genomic sequencing data for 680 isolates. We describe the molecular epidemiology and antimicrobial susceptibility testing (AST) data of this collection of isolates. We also performed a phenotype–genotype correlation for the carbapenems and evaluated the presence of virulence genes in Klebsiella pneumoniae complex isolates. These AST and genomic sequencing data can be used to compare and contrast CRE and CSE at these sites and serve as a resource for the antimicrobial resistance research community.


INTRODUCTION
Carbapenem-resistant Enterobacterales (CRE) are responsible for approximately 13 100 infections per year in the USA and are considered an urgent public health threat by the Centers for Disease Control and Prevention (CDC) [1,2].Mortality and healthcare costs associated with these infections are high [3,4].The molecular epidemiology of CRE differs by country and region [5,6].For example, the Klebsiella pneumoniae carbapenemase (KPC) is endemic in some regions of the USA, whereas the New Delhi metallo-β-lactamase (NDM) is endemic in regions of the Indian subcontinent [5,6].In the early stages of the CRE epidemic in the USA, KPC-producing K. pneumoniae ST258 was identified as the dominant strain [7,8].Over time, the molecular epidemiology has diversified with multiple species, multiple strains and multiple carbapenemase genes contributing to the carbapenem resistance landscape [9,10].For example, detection of NDM has become more common in the Antibiotic Resistance Laboratory Network in the USA [11].
Genomic sequencing is a critical tool in the investigation of CRE [12].In addition to understanding molecular epidemiology, it can be used in outbreak investigations [12][13][14][15][16][17], and to identify the presence of genes of interest (e.g.known and putative carbapenemase genes [18,19] or hypervirulence markers [20]) in a bacterial isolate.Several investigators have also evaluated the correlation between antimicrobial resistance genes detected by genomic sequencing and antimicrobial susceptibility testing (AST) data [21][22][23].This is foundational work for understanding the potential for genomic results to provide clinically actionable data, although the use of genomic sequencing for this purpose remains aspirational currently due to the complex relationship between genotype and phenotype, cost, and turnaround time [24].Similarly, virulence factors have been correlated with infection severity to provide new insights into why some infections lead to more severe disease than others [25].
There have been many studies performed describing the molecular epidemiology of CRE in the USA, all involving different time periods and different regions [8-11, 18, 22, 26-30].However, few studies have incorporated the molecular epidemiology of contemporaneous and geographically matched carbapenem-susceptible Enterobacterales (CSE).The inclusion of these isolates allows for exploration of the ways in which carbapenem resistance develops (i.e. common strains acquiring carbapenem-resistance genes or new strains harbouring these genes).Through the Division of Healthcare Quality Promotion (DHQP) Sentinel Surveillance system, we collected CRE and CSE isolates from nine clinical laboratories in the USA from 2013 to 2016.Isolates underwent AST and genomic sequencing.We use these data to describe the molecular epidemiology of CRE and CSE at these sites during this time, to evaluate the presence of antimicrobial resistance genes, and to correlate the observed AST phenotype with the genotype for the carbapenems.We also evaluate the presence of virulence genes in the K. pneumoniae complex isolates.

Source of isolates
From October 2013 to June 2016, isolates were collected from nine clinical laboratories across the USA as part of the DHQP Sentinel Surveillance system and sent to CDC.In addition to the isolates, accompanying information included specimen collection date, patient age, specimen source, organism identification and phenotype (CRE or CSE).Participating laboratories were located in California (participated from October 2013 to September 2015), Iowa, Maryland, New Mexico, New York (two laboratories), North Carolina, Pennsylvania and Washington.All were located in metropolitan areas and seven (77.8 %) were affiliated with an academic medical centre.Each site was asked to submit approximately 12 CSE isolates every 3 months (range 10-15 per site every 3 months, n=1160 total) and all their CRE isolates (range 0-24 per site every 3 months, n=514 total).Enterobacterales isolates were considered CRE if they tested resistant to any carbapenem (doripenem, ertapenem, imipenem or meropenem) according to Clinical and Laboratory Standards Institute (CLSI) breakpoints or were documented to produce a carbapenemase at the clinical laboratory [31][32][33].For Morganella morganii, Proteus spp. or Providencia spp. to be classified as CRE, they needed to test resistant to doripenem, ertapenem or meropenem as these species have intrinsically elevated imipenem minimum inhibitory concentrations (MICs) [31].Sites were asked to submit only one isolate per patient per 3 month period for both CRE and CSE.Isolates could be from any source (sterile or non-sterile) but included isolates must have undergone identification and AST as part of the clinical laboratory's routine workflow for clinical purposes.No isolates were collected exclusively for the purposes of this study.

Significance as a BioResource to the community
This unique dataset of 680 isolates significantly bolsters publicly available data for public health, molecular epidemiology and molecular biology efforts.It comprises high-quality genomic data from carbapenem-susceptible Enterobacterales (CSE) and confirmed carbapenem-resistant Enterobacterales (CRE) identified through routine culture at nine clinical laboratories from across the USA.The CSE and CRE were collected contemporaneously and from the same catchments allowing for comparison of the strain dynamics of these populations.This dataset serves as a significant resource for further analyses -the genomic data are curated with reference antimicrobial susceptibility data for each isolate.In addition, these data may be valuable for improving our understanding of phylogenomics, the spread of carbapenem resistance and molecular epidemiology.
was performed using in-house-developed frozen reference broth microdilution panels according to CLSI guidance [34].A list of all the antimicrobials tested can be found in File S1.All categorical interpretations were determined according to CLSI guidance in the M100-S30 document [35].For tigecycline, where no CLSI guidance was available, U.S. Food and Drug Administration (FDA) interpretive criteria were used [36].P-values for comparisons of categorical variables were calculated using the chi-squared test.

Genomic sequencing
A subset of CRE and CSE were selected for genomic sequencing including 421 CRE and 401 CSE.CRE were selected if not already sequenced for other purposes.CSE were selected so the species distribution for CSE would be similar to that of CRE.Only isolates that conformed to the phenotype to which they were submitted were eligible for genomic sequencing.At the Translational Genomics Research Institute North (TGen North), genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen) following the manufacturer's instructions for Gram-negative bacteria.Genomic libraries were prepared with a 500 bp insert size using the KAPA Library Preparation Kit (Roche) and quantified using the KAPA Library Quantification Kit (Roche) and sequenced on a MiSeq System using v2 (2×250) or v3 (2×300) chemistry (Illumina) or a NextSeq System using a High-or Mid-Output kit (2×150) (Illumina).

Bioinformatics analysis
Genomic sequencing reads were processed through CDC's in-house QuAISAR-H pipeline; all publicly available tools and versions can be found on the QuAISAR-H repository [37].Specifically, sequences were processed with BBDuk v38.26 and Trimmomatic v0.35 to remove adapters and PhiX, and to quality trim sequences [38,39].Samples were assembled de novo using SPAdes v3.13.0 with settings --careful and --only-assembler and --phred-offset 33 [40].After assembly, scaffolds of <500 bp were removed.We filtered samples on quality control metrics requiring >25× raw read coverage, <300 contigs after trimming, and the ratio of assembly length to the calculated median National Center for Biotechnology Information (NCBI) RefSeq genome size between +0.20 and −0.20.Filtering removed 125 samples from further analysis.Taxonomic identification was completed by finding the highest average nucleotide identity (ANI) against NCBI's RefSeq database using pyANI v.0.2.11.Samples with incongruous matches between their best taxonomic hit via ANI and MALDI-ToF were also excluded (n=17).
Sequence types (STs) were assigned using the PubMLST database and mlst v2.19 [41,42].ST diversity was measured by calculating Simpson's diversity index (1-D).1-D ranges from 0 to 1 with higher values indicating higher diversity.
For phylogenetic analyses based on SNPs, TGen North used NASP to create an SNP matrix from the sample sets of interest [51].First, sequence reads were aligned with BWA-MEM to a reference genome (Escherichia coli ST131: NCBI GenBank accession no.HG941718, K. pneumoniae ST258: GenBank accession no.CP006923) that had regions of recombination removed [52][53][54][55].SNPs were called with GATK [56].Nucleotide positions filtered out included SNP loci with <10× coverage or <90 % consensus in any one sample, and SNP loci that were not present in all the genomes in the sample set.From NASP, results were output in an SNP matrix from a core genome common to all the isolates in the analysis.Phylogenetic trees were generated from the NASP SNP matrices with mega11 and visualized in MicroReact [57,58].
Sequence reads were submitted to NCBI's Sequence Read Archive (SRA) database under the BioProject PRJNA288601 (File S1).

CDC AST data and presence of β-lactamase genes
Among 352 confirmed CRE with quality sequencing data, 47 were resistant to only one carbapenem (13.4 %), 305 were resistant to two or more carbapenems (86.6 %), and 263 were resistant to all carbapenems tested (74.7 %).Of the 47 isolates resistant to only one carbapenem, 46 were resistant to ertapenem and one was resistant to imipenem.Overall, 282 CRE isolates harboured a carbapenemase gene (80.1 %); 15 were resistant to only one carbapenem (5.3 %), 267 were resistant to two or more carbapenems (94.7 %), and 244 were resistant to all carbapenems tested (86.5 %).There were 276 CRE isolates with bla KPC ; 13 were resistant to only one carbapenem (4.7 %), 263 were resistant to two or more carbapenems (95.3 %), and In addition to the four carbapenems, AST data were generated for 19 additional antimicrobial agents (Fig. 1, File S1).These data were also stratified by carbapenemase producing (CP)-CRE status (CP-CRE, non-CP-CRE or CSE).For some nonbeta-lactam bug-drug combinations, CP-CRE were less frequently susceptible to antimicrobial agents than non-CP-CRE.An example of this pattern is ciprofloxacin and Enterobacter cloacae complex; 11 % of CP-CR Enterobacter cloacae complex were susceptible to ciprofloxacin compared to 58 % of non-CP-CR Enterobacter cloacae complex (P=0.0029).Another example is tobramycin and Escherichia coli; 31 % of CP-CR Escherichia coli were susceptible to tobramycin compared to 73 % of non-CP-CR Escherichia coli (P=0.0191).

Molecular epidemiology of the isolates
Simpson's diversity index (1-D) values for each species and phenotype revealed high diversity with the exception of CR K. pneumoniae complex (1-D=0.38)(Table 3).There were 164 unique STs for the 476 K. pneumoniae complex isolates (Table 4, File S2, Fig. S1A).There was also one isolate for which an ST could not be generated due to an incomplete sequence for a single gene.The most common ST was ST258 (n=207) comprising 43.5 % of all isolates; 98.6 % of ST258 isolates harboured bla KPC (File S2, Fig. S2).There were 39 unique STs for the 87 Escherichia coli isolates (Table 4, File S2, Fig. S1B).ST131 was most common and accounted for 32.2 % of all Escherichia coli (n=28) (Fig. 2).There were 47 unique STs for the 71 Enterobacter cloacae complex isolates (Table 4, File S2, Fig. S1C).
Plamsid markers were identified in 589 of 680 isolates (86.6 %).CRE isolates carried a mean of 3.3 plasmid markers and CSE carried a mean of 1.9 markers.A detailed list of identified plasmid markers can be found in File S1.

DISCUSSION
This study provides a snapshot of the molecular epidemiology of CRE at nine sites in the USA between 2013 and 2016.A unique feature was the inclusion of CSE.We describe the predominance of ST258 amongst the K. pneumoniae complex isolates with bla KPC (86.1%) and the predominance of ST131 amongst both CR (38.7%) and CS (28.6%)Escherichia coli isolates (32.2 % overall).This confirms what others have found over the same period [18].ST131 Escherichia coli isolates did not cluster by site or phenotype (CRE vs. CSE).In contrast, the ST258 K. pneumoniae complex isolates clustered by site.These findings along with the Simpson's diversity index results suggest that CR in the K. pneumoniae complex is driven by clonal transmission.
Inclusion of CSE also allows for monitoring of potential reservoirs of hypervirulent Klebsiella.We identified biomarkers for hypervirulence in six K. pneumoniae complex isolates without carbapenemase genes.These isolates were identified in combinations of serotype and ST known for hypervirulence [59][60][61][62] and all carried the plasmid-mediated rmpA associated with a mucoid phenotype [63].There is the risk that these STs associated with hypervirulence may acquire carbapenemases, which has already been documented [20].
This study has detailed AST data using reference broth microdilution which can be analysed in conjunction with genomic sequencing data.We focused on carbapenemase genes and the CR phenotype and found 80.1 % of the CRE isolates harboured carbapenemase genes, although this varied depending on the site (range: 9.1-96.3%) and species (only 8.3 % of CR K. aerogenes had a carbapenemase while 100 % of CR C. freundii complex had a carbapenemase).Other CDC studies using different catchment areas and periods have found that 35-73 % of CRE are CP-CRE [18,29,64].The percentage was probably higher in this study given the geographical distribution of sites submitting isolates.For example, there were two sites from New York, and the early impact of the KPC outbreak in New York has been well described [65][66][67].Most carbapenemase genes detected in this study were bla KPC variants (97.5%), which is consistent with the molecular epidemiology of CRE at the time the study was conducted [18].One unique finding was a bla IMP-8 isolate (sample name: CS-666) that was submitted as CSE and confirmed as carbapenem-susceptible.This isolate tested as intermediate to all four carbapenems.This could be an instance of a minor AST error (labelling an isolate as intermediate when it is resistant) as broth microdilution MIC testing is only accurate to within ±1 doubling dilution [68].Another possible explanation is that bla IMP-8 is a weak carbapenemase and isolates with this gene frequently test susceptible to carbapenems [69].In addition to the CR phenotypes, many other genotype-phenotype associations can be examined as there are data on aminoglycosides, fluoroquinolones and many other drugs.
This work is subject to several limitations.Most significantly, the isolates were collected in 2013-2016, so the molecular epidemiology at the time this study was conducted may not reflect the current epidemiology.For example, CP-CRE was more frequently due to bla KPC at the time the study was conducted.Also, the sites may not be representative of the USA as a whole.All laboratories submitting isolates for this study were in metropolitan areas and most were affiliated with academic medical centres.Also, the isolates were tested several years ago so newer antimicrobial agents (e.g.ceftazidime-avibactam, meropenemvaborbactam, imipenem-relebactam and cefiderocol) were not included and the modified carbapenem inactivation method test was not performed to evaluate the presence of phenotypic carbapenemase production.Furthermore, only ~80 % of the isolates were successfully sequenced, which is a high failure rate, but we do not think there was a bias in which isolates failed sequencing.We also did not look at porin mutations or other mutations that may have contributed to carbapenem resistance beyond the presence of antimicrobial resistance genes.Finally, the isolates were submitted as a convenience sample rather than a random sample, so the isolates sent to CDC may be biased in some way.For example, a greater proportion of CSE were blood isolates compared to CRE.
This unique dataset of 680 isolates significantly bolsters publicly available data for public health, molecular epidemiology and molecular biology efforts.It comprises high-quality genomic data, not only from verified CRE but also from CSE, from across the USA.We believe this dataset serves as a significant resource for further analyses as the genomic data are curated with each isolate's AST data.In addition, these data may be valuable for improving the understanding of phylogenomics, the spread of carbapenem resistance and molecular epidemiology.Continued efforts not only to generate and share microbiological data but also to expand, fully mine and harness these data are critical, as these scientific findings have and will continue to provide tools for public health action and, in some cases, clinical decision-making.

Fig. 2 .
Fig.2.Maximum-parsimony phylogenetic tree (with 10 bootstrap replicates) based on 13 297 parsimony-informative SNPs in 28 genomes of Escherichia coli ST131 isolates aligned to the EC958 genome (GenBank accession no.HG941718) with two apparent regions of recombination removed (positions 3385000-3420000 and 3425000-3455000).This analysis covered 4.17 Mb (82.6 % of the 4.81 Mb reference genome of EC958).The coverage for each genome in the dataset ranged from 88.8 to 94.6 % of the reference genome.The consistency index is 0.95, indicating a low level of homoplasy in the dataset.The figure can be found at: https://microreact.org/project/kddx7Bby9VYhM1iVAQ3CQJ-st131-cdc.

Table 3 .
Simpson's diversity index (1-D) by phenotype and species

Table 4 .
Molecular epidemiology, carbapenemase genes and sequence types