Comprehensive analysis of horizontal gene transfer among multidrug-resistant bacterial pathogens in a single hospital

Multidrug-resistant bacterial pathogens pose a serious public health threat, especially in hospital settings. Horizontal gene transfer (HGT) of mobile genetic elements (MGEs) contributes to this threat by facilitating the rapid spread of genes conferring antibiotic resistance, enhanced virulence, and environmental persistence between nosocomial pathogens. Despite recent advances in microbial genomics, studies of HGT in hospital settings remain limited in scope. The objective of this study was to identify and track the movement of MGEs within a single hospital system using unbiased methods. We screened the genomes of 2,173 bacterial isolates from healthcare-associated infections collected over an 18-month period to identify nucleotide regions that were identical in the genomes of bacteria belonging to distinct genera. These putative MGEs were found in 196 isolates belonging to 11 different genera; they grouped into 51 clusters of related elements, and they were most often shared between related genera. To resolve the genomic locations of the most prevalent MGEs, we performed long-read sequencing on a subset of representative isolates and generated highly contiguous, hybrid-assembled genomes. Many of these genomes contained plasmids and chromosomal elements encoding one or more of the MGEs we identified, which were often arranged in a mosaic fashion. We then tracked the appearance of ten MGE-bearing plasmids in all 2,173 genomes, and found evidence supporting the transfer of plasmids between patients independently of bacterial transmission. Finally, we identified two instances of likely plasmid transfer across genera within individual patients. In one instance, the plasmid appeared to have subsequently transferred to a second patient. By surveying a large number of bacterial genomes sampled from infections at a single hospital in a systematic and unbiased manner, we were able to track the independent transfer of MGEs over time. This work expands our understanding of HGT in healthcare settings and can inform efforts to limit the spread of drug-resistant pathogens in hospitals.

INTRODUCTION
Horizontal gene transfer (HGT) is a driving force behind the multidrug-resistance and heightened virulence of healthcare-associated bacterial infections [1]. Genes conferring antibiotic resistance, heightened virulence, and environmental persistence are often encoded on mobile genetic elements (MGEs), which can be readily shared between bacterial pathogens via HGT [2]. While rates of HGT are not well quantified in clinical settings, prior studies have shown that MGEs can mediate and/or exacerbate nosocomial outbreaks [3-6]. Recent studies have also

To further investigate the genomic context of the MGEs identified, we selected representative isolates from the largest MGE clusters for long-read sequencing using Oxford Nanopore technology. Hybrid assembly using short Illumina reads and long Nanopore reads generated highly contiguous chromosomal and plasmid sequences, which allowed us to resolve larger elements carrying one or more of the most prevalent MGE clusters (Table 1). We found that several of the smaller and more prevalent MGEs were carried on a variety of different plasmid and chromosomal elements, which we designated as "MGE lineages" (Table 1, Fig. 4A). These smaller MGEs co-occurred in different orders, orientations, and combinations on the larger elements. This kind of "nesting" of MGEs within larger mobile elements has been previously observed [6], and our findings further support the mosaic, mix-and-match nature of the smaller MGEs we identified. We also confirmed that these MGEs were truly mobile, since they appeared to be able to move independently between multiple distinct larger mobile elements. A closer examination of the three largest MGE clusters (C1, C2, C3) showed that C1 sequences did not all share a common "core" nucleotide sequence, but rather could be aligned in a pairwise fashion to generate a contiguous sequence (Fig. 4B).
MGE clusters C2 and C3, on the other hand, did contain "core" sequences that were present in all genomes carrying the MGE (Fig. 4C, 4D).

Plasmids carrying MGE clusters are found in multiple sequence types, species, and genera circulating in the same hospital
More than half (104/196) of the MGE-carrying genomes in our dataset contained one or more of the five most prevalent MGEs we identified (C1-C5, Fig. 1B). All five MGEs were small (usually less than 10kb), and were predicted to be carried on plasmids shared between Enterobacteriaceae. We set out to resolve the genomic context of each of these five MGEs in all isolates containing them. We used an iterative approach involving long-read sequencing and hybrid assembly of representative isolates to generate reference sequences of MGE-containing elements (chromosomal or plasmid), followed by mapping of contigs from Illumina-only assemblies to these reference sequences to assess their coverage in every genome (Methods). This approach allowed us to query the presence of plasmids and chromosomal elements from genomes sequenced with Illumina technology alone, without requiring long-read sequencing of all isolates or relying on external reference sequences. We found that 11 of the 104 isolates (all E. coli) carried cluster C1 and C3 MGEs on their chromosome, while the remaining 93 isolates carried clustered MGEs on 17 distinct plasmids. Seven of these plasmids were present in only one isolate in the dataset, but 10 plasmids appeared to be shared between more than one isolate (

While all of the MGEs we originally identified were present in the genomes of bacteria belonging to different genera, the plasmids that we resolved were variable in how widely they were shared.
For example, some plasmids were only found among isolates belonging to a single species and multilocus sequence type (ST), suggesting that they were likely transmitted between patients along with the bacteria that were carrying them (Fig. 5A). These included a blaKPC-3 carbapenemase-encoding plasmid (pKLP00149_2) found in K. pneumoniae isolates belonging to ST258, a multidrug-resistant and highly virulent hospital-adapted bacterial lineage that has recently undergone clonal expansion in our hospital [18]. We also found a blaOXA-1 extended-spectrum beta-lactamase-encoding plasmid in E. coli isolates belonging to ST131, another multidrug-resistant and hypervirulent bacterial lineage [25]. In addition to plasmids that occurred in bacteria belonging to the same ST, we also identified plasmids that were present in isolates belonging to different STs of the same species, or in different species of the same genus (Fig. 5B). All isolates in this case were K. pneumoniae or K. oxytoca, suggesting widespread sharing of plasmids between distinct Klebsiella species and STs. The plasmids often carried antibiotic resistance genes, and many also carried metal interaction genes (Table 1). Finally, we identified three different plasmids that were shared between different bacterial genera all belonging to the Enterobacteriaceae (Fig. 5C). One small plasmid (pKLP00155_6) carrying the colicin bacterial toxin was found in 26 isolates belonging to 10 different STs and four different genera. Taken together, these results indicate that some plasmids carrying putative MGEs were likely inherited vertically as bacteria were transmitted between patients in the hospital, while others appear to have transferred independently of bacterial transmission.
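
The three sharing patterns described in this section (one ST, several STs or species within one genus, several genera) amount to a simple classification over the host isolates of each plasmid. The following is a minimal sketch of that classification, not the authors' implementation; the function name `sharing_scope`, the category labels, and the `(genus, species, ST)` tuple layout are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# (genus, species, sequence type); this tuple layout is an assumption
Host = Tuple[str, str, str]


def sharing_scope(hosts: Iterable[Host]) -> str:
    """Classify how widely one plasmid is shared among its host isolates."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("at least one host isolate is required")
    genera = {genus for genus, _, _ in hosts}
    lineages = {(genus, species, st) for genus, species, st in hosts}
    if len(genera) > 1:
        return "cross_genus"   # hosts span more than one genus
    if len(lineages) > 1:
        return "within_genus"  # different STs or species of a single genus
    return "single_st"         # a single species and ST
```

Under this sketch, a plasmid whose hosts are all K. pneumoniae ST258 would be labeled `single_st`, which is the pattern the text interprets as consistent with vertical inheritance during bacterial transmission.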

Likely HGT across genera within individual patients
By cross-referencing the isolates containing MGE sequences with de-identified patient data, we identified two instances in which identical MGEs were found in pairs of isolates of different genera that were collected from the same patient, on the same date, and from the same sample source. To resolve the complete MGE profiles of these cases, we performed long-read sequencing and hybrid assembly on all genomes involved (Fig. 6).

In the second case of putative within-patient HGT, a K. pneumoniae ST231 isolate (KLP00187) and a C. braakii ST356 isolate (CB00017) were both collected from the same urine sample of Patient C (Fig. 6B). Both isolates carried nearly identical 196.8kb IncFIB(K)/IncFII(K) plasmids conferring resistance to aminoglycosides, beta-lactams, chloramphenicol, fluoroquinolones, sulfonamides, tetracyclines, and trimethoprim, as well as operons encoding copper and arsenic resistance. In addition, isolates from two subsequent patients (Patient D and Patient E) also carried plasmids belonging to the same lineage as the plasmid shared between KLP00187 and CB00017. Alignment of the sequences of all four plasmids showed that the plasmids isolated from Patient C were nearly identical, while the plasmids from Patients D and E had small differences in their gene content and organization (Fig. 6B). Systematic chart review did not identify any strong epidemiologic links between the three patients, suggesting that this plasmid was not passed directly between these patients and might instead have transferred via additional bacterial isolates or populations that were not sampled.
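
The cross-referencing step that opens this section (same patient, same date, same sample source, different genera, shared MGE) can be illustrated with a short grouping routine. This is a sketch under stated assumptions rather than the pipeline used in the study: the `Isolate` record, its field names, and the use of MGE cluster labels as the shared-element key are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Isolate(NamedTuple):
    # Hypothetical record layout; the study's metadata schema is not given.
    isolate_id: str
    genus: str
    patient: str
    collection_date: str  # any consistently formatted date string
    source: str           # sample source, e.g. "urine"
    mge_clusters: FrozenSet[str]


def within_patient_candidates(
    isolates: Iterable[Isolate],
) -> List[Tuple[str, str, List[str]]]:
    """Return cross-genus isolate pairs from one specimen that share an MGE."""
    by_specimen = defaultdict(list)
    for iso in isolates:
        by_specimen[(iso.patient, iso.collection_date, iso.source)].append(iso)

    hits = []
    for members in by_specimen.values():
        for a, b in combinations(members, 2):
            shared = a.mge_clusters & b.mge_clusters
            if a.genus != b.genus and shared:
                hits.append((a.isolate_id, b.isolate_id, sorted(shared)))
    return hits
```

Pairs flagged this way are only candidates; as in the text, resolving whether the shared element sits on the same plasmid would still require long-read sequencing and hybrid assembly.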

DISCUSSION
In this study, we identified MGEs in a large dataset of whole-genome sequences of clinical bacterial isolates collected over an 18-month period from a single hospital. We identified, clustered, and characterized identical sequences found in multiple distinct genera, and in the process uncovered both expected and unexpected cases of MGE occurrence. We confirmed that some of the most common MGEs identified were fragments of larger mobile elements. We performed long-read sequencing to resolve these larger elements, which were almost always plasmids. When we traced the presence of various plasmid lineages over time, we found some that were likely transmitted vertically along with the bacteria carrying them, and others that appeared to be transferred horizontally between unrelated bacteria.

Our study adds to the body of knowledge of HGT in hospital settings in new and important ways. We analyzed a large dataset of clinical isolates collected from a single health system, and used a systematic and unbiased approach to identify MGEs regardless of their type or gene content. While prior studies have used genomic epidemiology to study how HGT contributes to the transmission, persistence, and virulence of bacterial pathogens [4,5,19,20], the technical challenges of resolving MGEs from whole-genome sequencing data have limited the scope of these findings [16]. Other studies have deliberately tracked HGT in healthcare settings by focusing either on mobile genes of interest, such as those encoding drug resistance [7,9,14], or on specific classes of MGEs [28]. Both of these approaches can generate biased interpretations of the driving forces behind HGT in clinical settings. For this reason we selected a pairwise alignment-based approach, whereby we only looked for identical sequences in the genomes of very distantly related bacteria. In doing so, we did not limit ourselves to only looking for "known" MGEs, and thus obtained a more accurate and comprehensive overview of the dynamics of HGT between bacterial genera in our hospital.

What might cause horizontally-transferred nucleotide sequences to be found at very high identity within phylogenetically distinct bacteria? We predicted two possible causes: either the sequences we identified represent MGEs that recently underwent HGT and have not had time to diverge from one another, or they represent genetic elements that are highly intolerant to mutation. We suspect that our dataset contains both cases. In the two instances of likely within-patient HGT, both plasmids isolated from the same patient were nearly identical to one another, suggesting that they were indeed transferred shortly before the bacteria were isolated. In both cases we also observed similar plasmids in the genomes of isolates from other patients, but we identified a likely route of transfer between patients only in the case where the subsequent plasmid was also nearly identical. This finding further supports the idea that high plasmid identity is evidence of recent transfer. On the other hand, the Tn7 transposon sequence that we found to be identical in bacterial isolates from three different genera was also identical to over two dozen publicly available genome sequences queried through a standard NCBI BLAST search. The insertion of the Tn7 transposon downstream of glmS in all of our isolates suggests TnsD-mediated transposition [29], but the reason why the entire transposon sequence is so highly conserved is unclear.

The vast majority of MGE sequences identified through our approach contained signatures of mobile elements, and our follow-up work indicated that they could very likely move independently and assemble mosaically on larger mobile elements, such as plasmids, integrative conjugative elements, and other genomic islands. Antibiotic resistance genes were present in fewer MGE clusters than we anticipated, given how many resistance genes are known to be MGE-associated. Our follow-up analysis showed, however, that resistance genes were indeed highly prevalent among the larger MGEs that we resolved. This suggests that resistance genes often reside on smaller and more variable elements, which would have been filtered out by the parameters of our initial screen. A recent study of clinical K. pneumoniae genomes showed that while antibiotic resistance genes were largely maintained at the population level, they were variably present on different MGEs that fluctuated in their prevalence over time [24]. Finally, we were somewhat surprised by the large number of metal-interacting genes and operons within the MGEs that we identified. While metal-interacting genes and operons have been hypothesized to confer disinfectant tolerance and increased virulence [30,31], precisely how these elements might increase bacterial survival in the hospital environment and/or contribute to infection requires further study.

Identification of risk factors and common exposures for HGT has previously been proposed [1,14,18,32], but the results of prior efforts have been limited because large genomic datasets from single health systems with corresponding epidemiologic data have not been widely available [33]. The use of routine whole-genome sequencing for outbreak surveillance in our hospital has allowed us to begin to study how the transmission of MGEs might resemble or differ from bacterial transmission.
In addition to finding evidence of vertical transfer of plasmids accompanying bacterial transmission, we also identified several cases in which the same MGE lineage was identified in two or more isolates of different sequence types, species, or genera. In some cases, these isolates were collected within days or weeks of one another. This finding underscores how rapidly MGEs can move between bacterial populations, particularly in hospitalized patients [1,21], and highlights the importance of pairing genome sequencing with epidemiologic data to uncover routes of MGE transmission.

There were several limitations to our study. First, the dataset that we used only contained genomes of isolates from clinical infections from a pre-selected list of species, and did not include environmental samples or isolates from patient colonization. Second, our method of screening for putative MGE sequences by cross-genus alignment relied on somewhat arbitrary cutoffs, and we largely ignored MGEs that only transferred between bacteria within a single genus. Additionally, the cross-genus parameter we employed may have artificially enriched the number of MGEs we identified among Enterobacteriaceae, which are known to readily undergo HGT with one another [7]. Third, we assigned MGE lineages relative to single reference sequences and based our analysis on reference sequence coverage; subsequent MGEs that either gained additional sequence or rearranged their contents would still be assigned to the same lineage, even though they may have diverged substantially from the reference MGE [6]. Finally, this study was based exclusively on comparative genome analyses, and the MGEs we resolved from clinical isolate genomes were not queried for their capacity to undergo HGT in vitro.

In conclusion, we have shown how bacterial whole genome sequence data, which is increasingly being generated in clinical settings, can be leveraged to study the dynamics of HGT between drug-resistant bacterial pathogens within a single hospital. Our future work will include further characterization of the MGEs we resolved, assessment of MGE sharing across closer genetic distances, and incorporation of additional epidemiologic information to identify shared exposures and possible routes for MGE transfer independent from bacterial transmission. Ultimately we aim to develop this analysis into a reliable method that can generate actionable information and enhance traditional approaches to prevent and control multidrug-

and isolates having at least 90% coverage of the reference element were assigned to that element's "lineage." Among isolates having less than 90% coverage, a representative was again selected for long-read sequencing and hybrid assembly, and the process was repeated until all 104 isolates had been assigned to a lineage. Lineages were named based on the MGE-containing element type (c = chromosomal, p = plasmid), the reference isolate, and the hybrid assembly contig number, denoted with an underscore at the end of the name. MGE cluster-containing plasmids resolved through hybrid assembly were also used as reference sequences to query their presence in the entire 2,173 genome data set using the same BLASTn coverage-based analysis as above. When isolate genomes showed high coverage of multiple reference plasmids, the longest plasmid having at least 90% coverage was recorded.
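
The coverage-based assignment described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it assumes that per-reference coverage fractions have already been computed from the BLASTn results, it applies the "longest reference with at least 90% coverage" rule to every assignment, and the names `assign_lineage`, `assign_all`, and the reference labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

COVERAGE_CUTOFF = 0.90  # the 90% threshold stated in the text


def assign_lineage(
    coverage: Dict[str, float],
    ref_length: Dict[str, int],
    cutoff: float = COVERAGE_CUTOFF,
) -> Optional[str]:
    """Return the longest reference element covered at >= cutoff, else None.

    `coverage` maps reference name -> fraction of that reference covered by
    the isolate's contigs (assumed precomputed); `ref_length` gives each
    reference's length in bp.
    """
    passing = [ref for ref, frac in coverage.items() if frac >= cutoff]
    if not passing:
        return None
    return max(passing, key=lambda ref: ref_length[ref])


def assign_all(
    isolate_coverage: Dict[str, Dict[str, float]],
    ref_length: Dict[str, int],
) -> Tuple[Dict[str, str], List[str]]:
    """Split isolates into assigned lineages and those still unassigned."""
    assigned: Dict[str, str] = {}
    unassigned: List[str] = []
    for isolate, coverage in isolate_coverage.items():
        lineage = assign_lineage(coverage, ref_length)
        if lineage is None:
            unassigned.append(isolate)
        else:
            assigned[isolate] = lineage
    return assigned, unassigned
```

In the iterative scheme described in the text, a representative of the unassigned isolates would then be long-read sequenced to produce a new reference, and `assign_all` would be rerun until the unassigned list is empty.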

Systematic chart review to assess epidemiologic links between patients with the same plasmids
Patients whose isolates carried the two plasmids found to putatively transfer within individual patients were reviewed using a systematic approach modified from previously published methodologies examining patient locations and procedures for potential similarities [49,50]. Patients were considered infected/colonized with the recovered plasmid on the day of the patient's culture and all subsequent days. Potential transfer events were considered significant for locations if an uninfected/uncolonized patient was housed on the same unit location or service line location (units with shared staff) at the same time or different time as a patient infected/colonized with the plasmid, using a 60-day window prior to the newly infected/colonized patient's culture date. Additionally, procedures (e.g. operating room procedures, bedside invasive procedures) were evaluated for commonalities among all patients 60 days prior to infection/colonization, as well as potential procedures contaminated by prior infected/colonized patients that could have transferred to newly infected/colonized patients, again using a 60-day window prior to the culture date. Procedures were deemed significant if >1 patient had a similar procedure, or if there was a shared procedure within the 60-day window.

The VanA operon, conferring vancomycin resistance, is marked with an orange bar. Shared drug resistance genes are colored magenta, and mobile element genes are colored blue. Gray shading marks a stretch of DNA sequence that is 100% identical between isolates. (B) Identical portions of an integrated conjugative element (MGE cluster C30) shared between an S. marcescens genome (SER00094) and two P. aeruginosa genomes (PSA00048 and PSA00656). Blue = intS integrase; green = formaldehyde resistance genes; gray = UvrABC system genes. Type IV secretion machinery is marked with an orange bar, and gray shading marks sequences that are 100% identical between isolates. (C) Identical Tn7 transposons shared between A. baumannii, E. coli, and P. mirabilis (MGE cluster C17). The Tn7 sequence of the pR721 plasmid is shown at the top. The tnsABCDE transposon machinery is marked with an orange bar, and the glmS gene, which flanks the Tn7 insertion site, is colored red. Shared drug resistance genes are colored magenta, and an xerH tyrosine recombinase is colored blue. Gray shading marks sequences that are 100% identical between isolates.

Table 1. Shape and color of data points correspond to bacterial species and ST, respectively.

elements (black bars) that encode MGE clusters C1, C2, and C3. Lowercase letters in sequence names indicate element type (c = chromosome, p = plasmid). Homologous cluster sequences are connected to one another with colored links (purple = C1, orange = C2, green = C3, gray = other). Inner circle depicts MGE genes involved in MGE mobilization (blue), antibiotic resistance (red) and metal interaction (gray). (B-D) Alignments of sequences grouped into MGE clusters C1 (B), C2 (C), and C3 (D) from the larger MGEs displayed in (A). ORFs are colored by function (blue = mobilization, red = antibiotic resistance, green = other/hypothetical). Antibiotic resistance genes are labeled above and dark gray blocks connect sequences that are identical over at least 5kb.

IncFIB(pQil)/IncFII(K) carbapenemase-encoding plasmid was resolved from two genomes of different bacterial isolates from the same clinical specimen from Patient A. A nearly identical plasmid was also identified in an isolate from Patient B, who occupied a hospital room adjacent to Patient A. (C) Alignment of a 196.8kb IncFIB(K)/IncFII(K) multidrug-resistance plasmid resolved from two genomes of different bacterial isolates from the same clinical specimen from Patient C. Similar plasmids were also found in isolates from two additional patients (Patient D and Patient E), who had no identifiable epidemiologic links with Patient C. ORFs are colored by function (blue = mobilization, red = antibiotic resistance, gray = metal-interacting, green = other/hypothetical). Antibiotic resistance genes, metal-interacting operons, and Type IV secretion components are labeled. Shading between sequences indicates regions >5kb with >99.9% identity, and pairwise identities across the entire plasmid are noted to the right.