Comparison of Escherichia coli ST131 Pulsotypes, by Epidemiologic Traits, 1967–2009

Certain high-prevalence pulsed-field gel electrophoresis types exhibited distinctive temporal patterns and epidemiologic associations.

The prevalence of resistance to fluoroquinolones and extended-spectrum cephalosporins in Escherichia coli has increased dramatically over the past decade. This increase is largely the result of the widespread emergence of a single disseminated E. coli clonal group, designated sequence type (ST) 131 according to multilocus sequence typing (MLST) (1,2). E. coli ST131 is characterized by serotype O25b:H4 and often produces CTX-M-15 or other extended-spectrum β-lactamases (ESBLs) (3-5). Unlike most other antimicrobial drug-resistant E. coli, ST131 derives from virulence-associated phylogenetic group B2 and typically exhibits multiple virulence factors, including adhesins, siderophores, toxins, and group 2 capsule (1-7). It thereby poses the dual threat of extensive antimicrobial drug resistance plus virulence.
By definition, ST131 is homogeneous with respect to housekeeping gene sequence across the 7 MLST loci; however, within-lineage genetic variation has been noted since ST131 was first described (3-5). Specifically, diversity of pulsed-field gel electrophoresis (PFGE) profiles has provided insights into the ecology of ST131. For example, the presence of ST131 isolates with similar PFGE profiles in widely dispersed locales and of isolates with quite different profiles in the same locale has suggested rapid and ongoing global dissemination of ST131 (3,8). Likewise, recovery of ST131 isolates with similar PFGE profiles from multiple household members (9-12) and from food animals (or retail meats) and humans (13) has suggested host-to-host or foodborne transmission, respectively, as potential mechanisms for dissemination of ST131. However, relevant studies to date have included relatively few isolates, locales, and sources and limited time periods (2,6). In addition, the idiosyncratic nature of PFGE analysis precludes across-study comparisons. Thus, we analyzed 579 ST131 isolates from diverse sources according to a standardized PFGE protocol and then compared PFGE profiles with other characteristics, including geographic origin, time of collection, ecologic source, and antimicrobial drug-resistance traits.

Isolates
The 579 ST131 study isolates, some previously published (3,9-12,14-19), were compiled as a series of convenience samples from collaborators in diverse locales. The isolates came already identified as ST131 or as generic E. coli in need of screening for ST131 status. They derived mostly from collections assembled by investigators or reference laboratories on the basis of specific resistance phenotypes, O antigens, geographic origins, and/or clinical syndromes of interest. Some isolates were from cases or case series involving infected humans or animals with distinctive signs and symptoms and/or predisposing conditions (9-12,14,15).
Isolates were accompanied by data regarding date of isolation (or receipt in the reference laboratory), ecologic source (i.e., host species, food, or water), and locale of origin. For some isolates, data were available regarding resistance-associated characteristics, i.e., fluoroquinolone resistance; ESBL production; and presence of blaCTX-M-15, which encodes the CTX-M-15 ESBL variant. If not provided, this information was newly generated.

PFGE Analysis
PFGE analysis of XbaI-restricted total DNA of isolates was performed according to a standardized protocol (20) by a single observer in 1 laboratory. Profiles were captured and analyzed digitally by using BioNumerics software version 6.6 (Applied Maths, Austin, TX, USA). Marker lanes in each gel (E. coli O157:H7 strain g5244) enabled normalization within and across gels. Band positions were assigned manually, with computer assistance. The band tolerance setting, as derived empirically from analysis of multiple same-isolate profiles, was 1.15%.
Pairwise Dice similarity coefficients were used to define pulsotypes. Isolates exhibiting >94% profile similarity (≈3-band difference) to the index isolate for an established pulsotype, implying genetic similarity (21), were assigned to that pulsotype; others became the index isolate for a new pulsotype. Newly encountered pulsotypes were numbered sequentially. A PFGE profile dendrogram was constructed according to the unweighted pair group method for 87 (15%) of the isolates (selected randomly after inclusion of 2 representatives of each pulsotype with >6 members) plus the earliest isolated (1967) and earliest published (1985) isolates (8).
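The assignment rule described above can be sketched in code. The following is a minimal illustration, not the BioNumerics implementation: it assumes that band positions are expressed as a percentage of normalized lane length, that two bands match when they lie within the 1.15% tolerance, and that an isolate matching more than one index isolate joins the most similar one (the text does not specify a tie rule). The Dice coefficient is computed as 2 × (matched bands) / (bands in profile A + bands in profile B). The function names and the example profiles are hypothetical.

```python
from typing import Dict, Sequence


def dice_similarity(bands_a: Sequence[float], bands_b: Sequence[float],
                    tolerance: float = 1.15) -> float:
    """Dice coefficient (in percent) between two lists of band positions.

    Assumption: positions are percentages of normalized lane length, and two
    bands match when they differ by no more than `tolerance`.
    """
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matched = 0
    # Walk both sorted lists, pairing each band with at most one partner.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    total = len(a) + len(b)
    return 100.0 * 2 * matched / total if total else 100.0


def assign_pulsotypes(profiles: Dict[str, Sequence[float]],
                      threshold: float = 94.0,
                      first_id: int = 1) -> Dict[str, int]:
    """Assign each isolate to a pulsotype by comparison with index isolates.

    An isolate joins an existing pulsotype if its similarity to that type's
    index isolate exceeds `threshold`; otherwise it founds a new pulsotype.
    Assumption: when several index isolates qualify, the most similar wins.
    """
    index_profiles: Dict[int, Sequence[float]] = {}
    assignment: Dict[str, int] = {}
    next_id = first_id
    for isolate, bands in profiles.items():
        best_type, best_score = None, threshold
        for ptype, index_bands in index_profiles.items():
            score = dice_similarity(bands, index_bands)
            if score > best_score:
                best_type, best_score = ptype, score
        if best_type is None:
            # No established pulsotype is similar enough: new index isolate.
            best_type = next_id
            index_profiles[next_id] = bands
            next_id += 1
        assignment[isolate] = best_type
    return assignment


# Hypothetical 17-band profiles, for illustration only.
base = [5.0 * k for k in range(1, 18)]
profiles = {
    "A": base,                        # becomes the index isolate of type 1
    "B": base[:-1] + [95.0],          # 16 shared bands -> about 94.1%
    "C": base[:-2] + [92.0, 97.0],    # 15 shared bands -> about 88.2%
}
result = assign_pulsotypes(profiles)
```

Because each isolate is compared only with index isolates, the outcome of such a scheme depends on the order in which isolates are processed; two members of the same pulsotype can be less than 94% similar to each other.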

Susceptibility Testing
Disk diffusion testing for ciprofloxacin susceptibility and ESBL production was performed on isolates of unknown fluoroquinolone or ESBL phenotype as described (22,23). Fluoroquinolone resistance was defined as nonsusceptibility to ciprofloxacin.

Statistical Analyses
Geographic origin was categorized as United States (with 4 subregions [West, Midwest, South, and Northeast] as defined by the US Census Bureau [www.census.gov/geo/www/us_regdiv.pdf]), Canada, and other international locales combined. Ecologic source was categorized as human, companion animal, food animal, other animal, food, and water. Year of isolation/submission was assessed both continuously and categorically (e.g., pre-1990 vs. later).
Comparisons of proportions were tested by using 2-tailed Fisher exact (unpaired comparisons) and McNemar (paired comparisons) tests. Comparisons involving continuous variables were tested by using the Mann-Whitney U test (2-tailed). Other variables were assessed as independent predictors of selected pulsotype categories by using multivariable logistic regression analysis. The significance criterion was p<0.05.
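For readers unfamiliar with the 2-tailed Fisher exact test used for the unpaired comparisons, the calculation can be sketched as follows. This is a minimal illustration that uses only the Python standard library; it is not the software used in the study, and it assumes the common convention in which the 2-tailed p value is the sum of the probabilities of all tables (with the same margins) that are no more likely than the observed table. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, a pulsotype category (member vs. nonmember)
    and columns a trait (present vs. absent).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table count.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts, for illustration only: 8 of 10 isolates in one group
# and 1 of 6 in another carry a trait.
p_value = fisher_exact_two_tailed(8, 2, 1, 5)  # approximately 0.035
```

With these hypothetical counts the result falls below the p<0.05 criterion; the McNemar test for paired comparisons follows a different calculation that is based only on the discordant pairs.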

Isolate Origins and Characteristics
The 579 ST131 study isolates were derived from humans (

Temporal Patterns
Pulsotypes varied significantly by temporal occurrence. The 65 multiple-isolate pulsotypes and 12 high-prevalence pulsotypes were significantly associated with more recent dates of isolation/submission, relative to the low-prevalence and single-isolate pulsotypes (Table 1). Temporal variation was also evident among the 12 high-prevalence pulsotypes; 4 were significantly associated with later and 1 (type 955) with earlier occurrence (Table 1). Analysis of temporal prevalence trends (Figure) showed that the 12 high-prevalence pulsotypes accounted collectively for only 5% of 20 isolates during the earliest period (1967-1989) but for 58% of isolates during subsequent years (p<0.001). Three of these pulsotypes (968, 800, 812) were the top 1, 2, or 3 most prevalent, overall and within each interval from 1990 forward. These 3 types appeared sequentially by overall pulsotype prevalence (i.e., in 1990-1999 for type 968, in 2000-2002 for type 800, and in 2005 for type 812) and, except for type 800 in 2003, were detected continuously after first appearing. After it appeared, type 968 maintained a consistently high prevalence (>19%), whereas types 800 and 812 exhibited early prevalence spikes followed by sizeable drops. In contrast, the 9 other high-prevalence types appeared intermittently, which, depending on the pulsotype, was mostly in earlier years, later years, or sporadically throughout (Figure).
A temporal trend was also evident in the PFGE dendrogram (online Appendix Figure, wwwnc.cdc.gov/EID/article/18/1/11-1627-FA1.htm), which extended to 67% similarity. The more highly similar PFGE profiles in the upper region of the tree involved mostly recent isolates and higher prevalence pulsotypes, whereas the more basal, dissimilar profiles toward the lower region of the tree involved more older isolates (including isolates from 1967, 1982, 1985, and 1986) and low-prevalence pulsotypes.

Geographic Distribution
The pulsotypes primarily exhibited a broad geographic distribution, yet there was some geographic segregation. Table 2 shows the number of mutually exclusive geographic regions (among 6 total) in which each of the 65 multiple-isolate pulsotypes was found. Only 18 of 65 multiple-isolate pulsotypes were limited to a single geographic region (p<0.001 for occurrence in 1 vs. multiple regions, McNemar test). Moreover, these 18 pulsotypes included only 2 (13 pulsotypes), 3 (4 pulsotypes), or 4 (1 pulsotype) isolates each and represented <50% of pulsotypes within their size category. In contrast, all pulsotypes including >5 isolates were found in multiple geographic regions, and 3 of the 6 pulsotypes comprising >8 isolates spanned all 6 geographic regions (Table 2). Table 3 shows the overlap among regions by the number of shared pulsotypes and by the number of isolates in these pulsotypes. Each region overlapped partially with every other region (Table 3).
Against this background of broad geographic distribution, substantial geographic segregation of pulsotypes was evident. For example, at the isolate level, each region was negatively associated with at least 1 other region; the US West and non-Canadian international sites exhibited the greatest number of such negative associations, suggesting somewhat locale-specific pulsotype populations in these regions (Table 3). The only positive association between nonoverlapping regions involved the US Midwest and Canada. Table 4 provides a pulsotype-level analysis of these geographic associations. For example, the high-prevalence and multiple-isolate pulsotypes were collectively significantly overrepresented in Canada, and the high-prevalence pulsotypes were also significantly underrepresented in the United States, specifically, in the US West. Among individual high-prevalence pulsotypes, 968 was overrepresented in the US Midwest and underrepresented in the US West; 800 was overrepresented in Canada and underrepresented in the United States and the US West; 812 was overrepresented in the US South; 987 was overrepresented internationally (specifically in Australia, data not shown) and underrepresented in the United States; and 1202 was overrepresented in the US West (Table 4).

Source Distribution
In contrast with the generally broad geographic distribution of pulsotypes, the source distribution was more restricted, and source-specific segregation predominated over across-source commonality. For example, only 13 of 65 multiple-isolate pulsotypes spanned multiple sources (p<0.001, McNemar test); most of these included only 2 sources each, and none included >4 (of 6 possible) sources (Table 2).
Likewise, the by-source distribution of pulsotypes (Table 5) showed less overall commonality than did geographic distribution. Still, it showed multiple positive and negative associations at the isolate and the pulsotype level. Specifically, isolates from humans were associated positively with pulsotypes comprising isolates from water and negatively with pulsotypes comprising isolates from companion animals, food animals, or food. Isolates from companion animals were associated positively with pulsotypes containing isolates from other animals, and isolates from food animals were associated positively with pulsotypes containing isolates from food (Table 5). In addition, pulsotypes containing isolates from food animals were associated negatively with pulsotypes containing isolates from humans, but they were associated positively with pulsotypes containing isolates from food (Table 5).
Significant by-source segregation also was evident for individual pulsotypes (Table 6). Collectively, the multiple-isolate and high-prevalence pulsotypes were associated positively with humans and negatively with food animals and environmental sources, and high-prevalence pulsotypes were associated with companion animals. Furthermore, 5 high-prevalence pulsotypes were individually significantly distributed by source: type 968 was associated positively with pets and other animals and negatively with food animals and environmental sources, 800 was associated positively with humans and negatively with food animals, 812 was associated positively with humans, 1202 was associated negatively with humans and positively with food animals, and 955 was associated negatively with humans and positively with pets and environmental sources (Table 6). Five additional pulsotypes, comprising 2 food animal isolates each, were significantly associated with food animals (p = 0.004 for each; data not shown).

Antimicrobial Drug Resistance
Fluoroquinolone resistance, ESBL production, and blaCTX-M-15 also segregated significantly by pulsotype in varied patterns (Table 7). The high-prevalence and multiple-isolate pulsotypes collectively and type 968 were associated positively with fluoroquinolone resistance but indifferently with ESBL production and blaCTX-M-15. In contrast, type 800 was associated positively with fluoroquinolone resistance but negatively with ESBL production and blaCTX-M-15, whereas types 905, 812, and 919 were associated positively with all 3 traits, and type 987 was associated negatively with all 3 traits (Table 7).

Multivariable Analysis
All 3 resistance traits, plus several source groups and geographic regions, exhibited significant associations with year of isolation/submission (Table 1), suggesting possible confounding by temporal correlations among variables. Thus, we used multivariable logistic regression analysis to assess for independent associations of selected predictor variables with pulsotype. Separate models were constructed for the 3 most prevalent pulsotypes, the high-frequency pulsotypes, and the multiple-isolate pulsotypes, by using as candidate predictor variables 1 representative from each epidemiologic or resistance category (year, ecologic source, locale, fluoroquinolone phenotype, and ESBL status). ESBL status was a significant predictor in all 5 resulting models, as was fluoroquinolone resistance in 4 models (fluoroquinolone resistance was excluded from the fifth model because of its 100% prevalence in pulsotype 812), year in 3 models, and human source in 2 models (Table 8). In contrast, US origin (the representative geographic variable) was not a significant predictor in any model.

Indistinguishable PFGE Profile Isolates
We also assessed associations with other variables for the 7 largest clusters of isolates with indistinguishable PFGE profiles; each cluster contained 4-5 isolates. Of the 31 constituent isolates, 28 were recent (2007-2009) and the other 3 were from 2002 or 2004. Of the 7 clusters, 6 included isolates from multiple locales, from multiple continents in 3 instances. In contrast, only 3 clusters came from multiple host species. Whereas each cluster was internally homogeneous for fluoroquinolone phenotype (6 all-resistant clusters, 1 all-susceptible cluster), 4 were internally heterogeneous according to ESBL and/or blaCTX-M-15 status.

Discussion
We used PFGE analysis to define population structure among 579 diverse E. coli ST131 isolates and then assessed temporal, geographic, ecologic, and resistance trait associations for the various pulsotypes, i.e., presumed sub-ST genetic lineages. Our findings support 4 main conclusions. First, although ST131 is highly diverse at the pulsotype level, a small number of high-frequency pulsotypes predominate, and pulsotype 968 accounts for 24% of the population. Second, pulsotypes differ in prevalence over time; high-prevalence pulsotypes tend to occur in more recent years, consistent with recent emergence and expansion, implying greater fitness. Third, whereas broad geographic distribution predominates over locale-specific segregation, implying widespread dispersal rather than localized endemicity, segregation by ecologic source predominates over across-source commonality, implying niche adaptation rather than broad host-range capability and interspecies transmission. Fourth, resistance traits (i.e., fluoroquinolone resistance, ESBL production, and blaCTX-M-15) are highly pulsotype-specific, suggesting predominantly subclonal distribution.
The striking prevalence disparities among pulsotypes suggest that certain pulsotypes, especially the exceptionally successful pulsotype 968, possess fitness advantages over others. In retrospect, pulsotype 968 accounted for all previously reported household clusters, 3 of which involved serious or fatal disease in >1 household members (9-12). A possible founder effect for type 968 is unlikely because the pulsotypes that were detected earliest were mostly low-prevalence types; higher prevalence pulsotypes appeared only later, seemingly outcompeting the low-prevalence types. In regard to possible fitness advantages, ESBL production and blaCTX-M-15 were significantly associated with several high-prevalence pulsotypes and were present in most or all of their members (Table 7). However, they were not significantly associated with (predominant) pulsotype 968 and so are unlikely the main explanation for the recent expansion of ST131, of which pulsotype 968 was the single main component (Figure). In contrast, fluoroquinolone resistance was significantly associated with each of the 4 most prevalent pulsotypes and collectively with the 12 high-prevalence and 65 multiple-isolate pulsotypes. Thus, fluoroquinolone resistance may have made a major contribution to the recent expansion of ST131.
Although the predominant pattern was broad dispersal of pulsotypes, localized segregation also occurred. These trends imply considerable ongoing dissemination and intermixing of ST131 lineages among locales (sufficient to largely preclude establishment of locale-specific populations) but with variable degrees of intermixing versus segregation by locale and pulsotype. For example, the US Midwest and Canada shared pulsotypes more extensively than did other regions. Conversely, non-Canadian international locales and the US West had less pulsotype commonality with other regions (i.e., had more highly locale-specific populations) than did other locales. Several high-prevalence pulsotypes similarly exhibited distinct patterns of distribution and were variably concentrated in specific regions. Similar patterns have been described previously for ST131 but in lesser detail and without statistical analysis (3,6,8,17,24,25). The undefined mechanisms for the ongoing dispersal of ST131, possibly including international travel and commerce, wild bird migration, and foodborne or waterborne transmission, and its limited locale-specific segregation by pulsotype warrant study.
The associations of specific pulsotypes with different ecologic sources are relevant to the dispersal mechanisms of ST131. In contrast to the striking geographic dissemination of pulsotypes, we also found some evidence of niche segregation. Positive associations for niche segregation were found between humans and water, companion animals and other animals, and food animals and food. In contrast, negative associations were found between humans and most other sources, companion animals or other animals and food or food animals, and water and companion animals or food animals. These findings implicate humans as the source for ST131 isolates found in water and implicate food animals as the source for isolates found in food. In contrast, and consistent with findings in most, but not all, previous studies (13,26-28), these findings indicate that food animals and food are not major sources of ST131 for humans. They also suggest no special pet-human commonality of ST131 pulsotypes, notwithstanding some well-documented overlap (29). This argues against pets and the food supply as major vehicles for dissemination of ST131 strains among humans. Indeed, in 1 study, 7% of healthy humans were found to be colonized intestinally with an ST131 strain (30); thus, humans may be the main reservoir for human-associated strains. A larger, more current, and more systematically assembled study population is needed to confirm the findings of the present study.
Until now, the earliest reported isolate of ST131 was from a patient with urosepsis in 1985 (8). Here, we report 3 earlier isolations, from 1967, 1982, and 1983, none of which were from a high-prevalence pulsotype. This finding documents the presence of ST131 decades before its emergence as a disseminated human pathogen and suggests an opportunity to compare early isolates with recent isolates for characteristics that might confer enhanced fitness, possibly contributing to the emergence of ST131.
Our study had limitations. First, the population was a convenience sample with multiple possible sources of bias. Second, despite considerable diversity, the population was not balanced; it predominantly comprised recent isolates from humans in the United States, reducing both generalizability and power for comparisons involving other times, sources, and regions. Third, minimal associated data (especially clinical details) were available for many isolates, limiting the possible epidemiologic analyses. Fourth, PFGE profiles reflect genetic relationships only indirectly and require subjective interpretation. Fifth, the multiple comparisons could have produced spurious associations by chance alone. However, the proportion of comparisons yielding a significant p value was much greater, and the associated p values much smaller, than should occur by chance alone. Last, the 94% PFGE similarity pulsotype criterion was somewhat arbitrary and possibly suboptimal; however, an alternate 100% similarity criterion yielded qualitatively similar conclusions.
Our study also had strengths. The population was the largest reported to date for ST131 (2) and the most extensively distributed by time, source, and region. The PFGE analysis was conducted by 1 experienced observer in 1 laboratory by using software that enabled concurrent comparisons for all isolates. Diverse univariable and multivariable statistical approaches were used, pulsotypes were analyzed collectively and individually, and PFGE profiles were assessed by using 2 similarity thresholds (94% and 100%) and in a dendrogram.
Thus, within a large, diverse collection of E. coli ST131 isolates, we documented extensive PFGE profile diversity and a predominance of certain high-prevalence pulsotypes (particularly pulsotype 968, 24% overall) that exhibited distinctive temporal patterns of emergence. Notwithstanding some geographic localization, pulsotypes were extensively dispersed by region. In contrast, they were more highly source specific; in particular, isolates from humans exhibited almost no commonality with isolates from food animals or foods. Pulsotype 968 was much more closely associated with fluoroquinolone resistance than with ESBL production or blaCTX-M-15, suggesting a greater role for fluoroquinolone resistance than ESBLs in the expansion of this dominant pulsotype and ST131 in general. These findings considerably advance our understanding of the genetic structure, ecology, geographic distribution, and emergence of this widely disseminated antimicrobial drug-resistant pathogen, which represents a growing public health threat.