Fine-Scale Structure Analysis Shows Epidemic Patterns of Clonal Complex 95, a Cosmopolitan Escherichia coli Lineage Responsible for Extraintestinal Infection

Escherichia coli clonal complex 95 represents a cosmopolitan, genetically diverse lineage, and the extensive substructure observed in this lineage is epidemiologically and clinically relevant. The frequency with which CC95 strains are responsible for extraintestinal infection appears to have been stable over the past 15 years. However, the different subgroups identified within this lineage have an epidemic structure depending on the host, sample, continent, and time. Thus, the evolution and spread of strains belonging to CC95 are very different from those of another cosmopolitan human-associated clonal complex, CC131, which has increased significantly in frequency as a cause of extraintestinal infection over the past 15 years due to the evolution and spread of two very closely related, nearly monomorphic lineages.

Although members of CC95 are rare in nonhuman vertebrates, in humans, CC95 isolates are not only frequently observed, but their frequency appears to be temporally stable. Two large collections of E. coli strains, collected from human feces, urine, and blood by the Microbiology Laboratory of the Canberra Hospital, Australia, in 2002 (20) and again in 2014 and 2015 using nearly identical methods, were assigned to a phylogroup and screened for members of CC95 as described in Materials and Methods. Although the frequency of phylogroup B2 strains was highest among urine isolates and lowest among fecal isolates, with blood isolates intermediate, there was no change in the relative abundance of phylogroup B2 strains between 2002 and 2014, irrespective of the isolate's source (feces, urine, or blood) (nominal logistic regression: year, P = 0.98; source, P < 0.001; year-source interaction, P = 0.86) (Fig. 1A). Among the B2 strains, the CC95 strains represented 19% of the isolates, but here again, there was no significant change in the frequency of CC95 isolates between 2002 and 2014 and no difference in the frequency of CC95 isolates with respect to the isolate's source (nominal logistic regression: year, P = 0.07; source, P = 0.26; year-source interaction, P = 0.67) (Fig. 1B).
CC95 pangenome analysis. (i) The pangenome. There were a total of 17,603 genes among 200 CC95 strains recovered from feces and extraintestinal sites at a variety of geographic locations. The core genome (genes present in ≥99% of strains) consisted of 3,134 genes; 534 genes were found in 95% to <99% of strains, 1,196 were found in 30% to <95% of strains, and 11,934 were found in fewer than 15% of the strains. Over 5,400 genes were found in only a single isolate, and 2,210 genes were found in all 200 isolates, with the balance of the genes being present in more than 1 but fewer than 200 isolates, giving rise to a U-shaped distribution of the number of isolates in which a gene is present (Fig. 2).
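The frequency classes above can be made concrete with a short sketch that bins each gene by the number of strains carrying it and tabulates the gene frequency spectrum plotted in Fig. 2. It assumes Roary's default summary boundaries (core ≥99%, soft core 95% to <99%, shell 15% to <95%, cloud <15%). The 15% shell/cloud boundary is therefore an assumption taken from Roary, and the `classify` and `summarize` functions and the toy presence map are hypothetical illustrations, not the study's pipeline.

```python
from collections import Counter


def classify(n_present: int, n_strains: int) -> str:
    """Assign a gene to a pangenome bin from the number of strains carrying it.

    Bin boundaries follow Roary's default summary (an assumption here):
    core >= 99%, soft core 95% to <99%, shell 15% to <95%, cloud <15%.
    Integer arithmetic avoids floating-point edge effects at the boundaries.
    """
    if not 1 <= n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    scaled = 100 * n_present
    if scaled >= 99 * n_strains:
        return "core"
    if scaled >= 95 * n_strains:
        return "soft core"
    if scaled >= 15 * n_strains:
        return "shell"
    return "cloud"


def summarize(presence: dict[str, set[str]], n_strains: int):
    """Summarize a presence map (gene -> set of strain IDs carrying it).

    Returns the bin counts and the gene frequency spectrum, i.e. how many
    genes occur in exactly k strains (the quantity plotted in Fig. 2).
    """
    counts = [len(strains) for strains in presence.values()]
    bins = Counter(classify(c, n_strains) for c in counts)
    spectrum = Counter(counts)
    return bins, spectrum


# Toy example with four hypothetical strains.
presence = {
    "geneA": {"s1", "s2", "s3", "s4"},
    "geneB": {"s1"},
    "geneC": {"s2"},
    "geneD": {"s1", "s2"},
}
bins, spectrum = summarize(presence, n_strains=4)
print(dict(bins))      # {'core': 1, 'shell': 3}
print(dict(spectrum))  # {4: 1, 1: 2, 2: 1}
```

With 200 strains, a gene must be present in at least 198 of them to count as core, and a U-shaped spectrum appears when both the k = 1 and k = 200 entries are large relative to the middle.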
(ii) Core genome variation. The phylogeny inferred for these 200 CC95 isolates revealed, as expected, significant structure within CC95, with the observed clusters correlating largely with a strain's serotype (Fig. 3). Five main clusters were evident. The cluster of strains with an O1:H7 or an O2a:H7 serotype was designated subgroup A. The cluster of strains with an O18:H7 serotype, including UTI89, was designated subgroup B. The cluster of O1:H7 strains, which also included O25b:H4 and O2a:H7 strains, was designated subgroup C. The cluster of strains that included those with O45a:H7, O1:H7, and O2a:H4 serotypes was designated subgroup D. The cluster of strains with an O2a:H4 serotype was designated subgroup E. Additionally, 9 (5%) of the 200 strains examined did not fall into any of these five subgroups; 4 of these, which carried the H5 antigen and fimH allele 15, were distinct from all other CC95 strains (Fig. 3). Within four of the five CC95 subgroups, strains that shared the same serotype also tended to share the same fimH allele (21). The sole exception was subgroup B, whose strains, despite all being O18:H7, exhibited four different fimH alleles (Fig. 3).
Subgroup D comprised strains of diverse serotypes. However, there seemed to be little value in further subdividing subgroup D, as all of the subgroup D strains were separated from the other subgroups by long branches. There is also a practical reason for not subdividing subgroup D: although CC95 strains are relatively common, they are unlikely ever to represent a large fraction of the isolates in any strain collection, so finer subdivisions would yield smaller sample sizes and less statistical power.
(iii) Variable gene content. Analysis of variable genes showed that the substructure revealed by the core genome phylogeny was reflected in the variable gene content of the strains (Fig. 4). Subgroups A, B, and C had variable gene contents that were distinct from those of the other subgroups. Subgroup E was an exception: despite being tightly clustered and distinct according to the core genome, its strains showed some overlap in variable gene content with subgroup D strains.
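The variable-genome comparison summarized in Fig. 4 can be sketched in NumPy. The steps are: remove genes present in all genomes or in a single strain, compute pairwise Jaccard distances, then ordinate them by principal-coordinate analysis. This is an illustrative re-implementation of classical PCoA, not the PAST code used in the study, so PAST's scaling or handling of negative eigenvalues may differ. The function names and the toy presence/absence matrix are hypothetical.

```python
import numpy as np


def variable_genes(pa: np.ndarray) -> np.ndarray:
    """Drop genes present in every strain or in a single strain (rows = strains)."""
    counts = pa.sum(axis=0)
    keep = (counts > 1) & (counts < pa.shape[0])
    return pa[:, keep]


def jaccard_distance(pa: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard distances (1 - |A & B| / |A | B|) between strains."""
    x = pa.astype(int)
    inter = x @ x.T
    sizes = x.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    sim = np.ones_like(inter, dtype=float)  # two empty gene sets count as identical
    np.divide(inter, union, out=sim, where=union > 0)
    return 1.0 - sim


def pcoa(d: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical principal-coordinate analysis of a distance matrix."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gower matrix
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


# Two hypothetical groups of three strains; gene 8 is in every strain and
# gene 9 is a singleton (first strain only), so both are removed first.
group1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
group2 = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
pa = np.array([group1[:9] + [1]] + [group1] * 2 + [group2] * 3)
coords = pcoa(jaccard_distance(variable_genes(pa)))
print(np.round(coords[:, 0], 3))
```

In this toy case the first principal coordinate places the two groups at opposite signs. The sign of each axis is arbitrary, which is also true of any PCoA plot.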
Multiple genes within the variable genome have been identified as potentially enhancing a strain's propensity to cause extraintestinal disease or to colonize a host (22, 23). In silico analysis of 40 such virulence factors among the 200 CC95 isolates showed that afdA, focG, lpfA, and terC are absent from all strains (see Table S1 in the supplemental material). In contrast, all CC95 strains have fimH, fyuA, and ompT. Between these extremes, less than 5% of CC95 strains have sfa, iha, or tsh or harbor the plasmid encoding microcin B17 or colicins B and M, whereas more than 95% have the virulence genes neuC, sitA, usp, and vat. Other virulence genes have intermediate prevalence values. As with serotype, the presence of many virulence genes varied in relation to the subgroup membership of the CC95 strain (Table 1). Overall, subgroup A strains possessed the colibactin gene cluster, the self-adhesion locus antigen 43, and the toxins tcpC and senB. Subgroup B strains also had the colibactin locus but, in addition, possessed the invasion of brain endothelium gene ibeA and were likely to have the invasion determinant tia and the toxin determinants cnf1, hlyD, and cdiA. Strains belonging to subgroups C, D, and E had extraintestinal virulence factor profiles more similar to one another than to those of subgroups A and B, and differed from subgroup A and B strains in having the plasmid-encoded bacteriocins colicin Ia and microcin V. Subgroup C, D, and E strains were also very likely to possess the iron-uptake-related genes ireA, iroN, and iucAC. Subgroup C strains typically hosted a colicin E1 plasmid, while subgroup D and E strains did not. Most subgroup D strains possessed antigen 43, while subgroup C and E strains did not.
FIG 2 Distribution of the genes in the pangenome of E. coli CC95. The number of genes found in 1 to 200 strains, respectively, is shown. All genomes were annotated using Prokka, and the pangenome was determined using ROARY.
FIG 3 Substructure of 200 E. coli CC95 strains. Group B2 strain ED1a was used as an outgroup (not shown). Single nucleotide polymorphisms (SNPs) were detected using the Harvest suite of tools (43), with ED1a as the reference strain. Gubbins (53) was used to infer recombination events, and recombinant sites were removed. A maximum likelihood tree was inferred with a general time-reversible (GTR) model of evolution using MEGA 6.0 (54).
(iv) Antibiotic resistance determinants. None of the 200 CC95 isolates had the chromosomal mutations in gyrA or parC associated with fluoroquinolone resistance. To investigate the extent to which the presence of typically plasmid-associated resistance determinants varied among CC95 subgroups, the analysis was restricted to the 83 Australian strains from humans for which whole-genome sequence (WGS) data were available. Preliminary analysis indicated that the presence of particular resistance determinants varied in relation to both serotype and CC95 subgroup. Consequently, for statistical comparisons, the subgroup A strains were split into serotypes O1:H7 and O2a:H4, and the subgroup C strains were split into serotypes O1:H7 and O25b:H4. Subgroup D and the unassigned strains were excluded due to small sample sizes. Statistical analysis of individual plasmid-borne antibiotic resistance determinants was restricted to those determinants observed 6 or more times (the number of subgroup/serotype categories).
FIG 4 Similarity among E. coli CC95 isolates based on the variable genome. Genes present in all genomes or in a single strain were eliminated. Among-strain similarity was quantified using a Jaccard metric, and the dimensions of the matrix were reduced using principal-coordinate (PCO) analysis as implemented in PAST. Turquoise denotes subgroup A strains, purple B strains, orange C strains, pink D strains, green E strains, and black unassigned strains.
Intrinsic extraintestinal virulence of CC95. Phylogroup B2 strains generally are known to be highly virulent in mouse models of extraintestinal infection (24). The virulence of 58 CC95 strains isolated from Australia, France, and the United States, which included representatives of all five CC95 subgroups, was assayed using the mouse sepsis model. Consistent with their expected high virulence, 82% of strains killed all of the mice challenged with the strain. The strains' subgroup membership explained some of the variation in ability to kill 100% of tested mice (contingency table analysis: likelihood ratio χ2 = 12.27, P = 0.002). Killing of all tested mice was observed for all subgroup D strains (n = 12), 94% of subgroup B strains (n = 18), and 62% of subgroup A strains (n = 24). Strains belonging to CC95 subgroups E and C also killed 100% of mice tested but were not included in the preceding analysis, as only a total of 6 strains from these two subgroups were tested.
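The contingency-table test above is a likelihood-ratio (G) chi-square test of independence. A minimal sketch follows, assuming that the rounded percentages correspond to 17 of 18 subgroup B strains and 15 of 24 subgroup A strains. Under that assumption the counts reproduce the reported statistic and P value. The P value uses the closed-form chi-square upper tail for even degrees of freedom, which avoids any statistics library. The helper names are hypothetical.

```python
import math


def g_test(table: list[list[int]]) -> tuple[float, int]:
    """Likelihood-ratio (G) chi-square statistic and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute 0 * ln(0) = 0
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return 2.0 * g, df


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total


# Rows: subgroups D, B, A; columns: killed all mice, did not.
# 17/18 (94%) and 15/24 (62%) are assumed reconstructions of the percentages.
table = [[12, 0], [17, 1], [15, 9]]
g, df = g_test(table)
p = chi2_sf_even_df(g, df)
print(f"G = {g:.2f}, df = {df}, P = {p:.3f}")  # G = 12.27, df = 2, P = 0.002
```

Subgroup A contributes most of the statistic, because its 9 strains that did not kill all mice far exceed the roughly 4.4 expected under independence.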
Multiplex PCR for CC95 subgroup assignment. The pangenome analysis revealed which genes were present in all members of a subgroup but rare or absent in the other subgroups. This allowed a multiplex PCR screening tool to be developed for subgroup identification (see Materials and Methods for details and Fig. 5 for an example of the assay). Application of the multiplex method to those strains from Australia and France for which whole-genome sequence data were available showed that the method correctly assigned 92% of the strains to the appropriate subgroup (Table 3).
Epidemiology of CC95 subgroups. The availability of simple PCR-based methods for determining the subgroup membership of CC95 strains allowed collections of CC95 isolates from Australia, France, and the United States to be screened for the relative abundance of the CC95 subgroups, which in turn allowed an analysis of the factors influencing that relative abundance.
(i) Distribution of CC95 subgroups in Australia. Screening of the Canberra Hospital collections obtained in 2002 (20) and again in 2014 and 2015 yielded 172 CC95 strains, distributed by subgroup as follows: 48% A, 18% B, 18% C, 3% D, 9% E, and 4% indeterminate. Host age and sex were known for 147 CC95 strains, in addition to the strain's source and year of isolation. Strains were classified as having either a fecal or extraintestinal (blood or urine) source. Strains of subgroups D and E and unassigned strains were uncommon and so were pooled (Fig. 1C). Nominal logistic regression analysis revealed that the relative abundance of CC95 subgroups varied significantly with strain source, year of isolation, and host sex, but not with host age (isolate source, P = 0.04; year of isolation, P < 0.001; host sex, P = 0.01; year of isolation-host sex interaction, P = 0.003). Backwards elimination was used to remove nonsignificant terms, and only statistically significant terms are presented. The underlying prevalence values were inspected to determine the basis for these results. For the 2002 isolates, subgroup B strains were recovered more frequently from extraintestinal sites than from feces and more frequently from males than from females. For the 2015 isolates, subgroup A strains were more frequent among fecal isolates than among extraintestinal isolates, subgroup B strains were confined to females, and subgroup C strains were more common among females than among males. In comparison with the 2002 isolates, the 2015 isolates had a lower frequency of subgroup B and a higher frequency of subgroup C, the latter reflecting an increase in the frequency of both the O1:H7 and O25b:H4 serotypes.
(ii) Distribution of CC95 subgroups in France. Four collections of E. coli isolates from France (two of blood isolates and two of fecal isolates) were screened for CC95 strains, and the subgroup membership of the identified CC95 strains was determined. The blood isolates were collected in 2005 (25) and in 2005 to 2007 (13), while the fecal isolates were collected in 2000 (26) and 2010 (27). The 208 CC95 strains detected in these collections were distributed by subgroup as follows: 4% A, 12% B, 4% C, 69% D, and 9% E, and 1 strain could not be assigned to a subgroup.
Although the two sets of blood isolates were collected at similar times, one set was obtained from adults and the other from children, while the two sets of fecal isolates were collected 10 years apart. Because the collections thus differed in more than one respect, sample sizes were insufficient to permit an analysis that incorporated the effects of isolate source, host age, and year. However, a significant difference in the relative abundance of CC95 subgroups was detected when comparing the CC95 fecal and blood isolates (contingency table analysis: likelihood ratio χ2 = 12.46, P = 0.01). Among the 174 CC95 blood isolates, 4% were subgroup A, 10% B, 3% C, 74% D, and 9% E, and 1 isolate was unassigned. In contrast, among the 33 CC95 fecal isolates, 6% were A, 26% B, 12% C, 47% D, and 9% E. Whereas the CC95 blood strains belonged overwhelmingly to subgroup D, the fecal strains, although also most likely to belong to subgroup D, frequently belonged to subgroup B as well.
In these collections, host age and sex were known for 147 CC95 strains isolated from blood. The relative abundance of CC95 subgroups varied with host sex but not with host age (nominal logistic regression: age, χ2 = 5.06, P = 0.28; sex, χ2 = 17.55, P = 0.0015; age-sex interaction, χ2 = 6.15, P = 0.188). Among the 114 CC95 blood isolates recovered from females, 3% were A, 6% B, 3% C, 77% D, and 11% E. In contrast, among the 33 blood isolates from males, 6% were A, 30% B, 0% C, 55% D, and 9% E. Although subgroup D predominated among blood isolates from both sexes, it was less frequent among isolates from males, among which subgroup B was more common.
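The reported P values follow from the χ2 statistics if each term is assumed to carry 4 degrees of freedom. That is what a five-level response (subgroups A to E) with one parameter per term would give, but the degrees of freedom are an inference and are not stated in the text. For 4 degrees of freedom, the chi-square upper tail has a simple closed form:

```latex
\begin{aligned}
P\left(\chi^2_4 \ge x\right) &= e^{-x/2}\left(1 + \frac{x}{2}\right)\\
\text{age } (x = 5.06):\quad & e^{-2.53}\,(1 + 2.53) \approx 0.281\\
\text{sex } (x = 17.55):\quad & e^{-8.775}\,(1 + 8.775) \approx 0.0015\\
\text{age-sex } (x = 6.15):\quad & e^{-3.075}\,(1 + 3.075) \approx 0.188
\end{aligned}
```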
(iii) Distribution of CC95 subgroups in the United States. Seven collections of E. coli strains from the United States, acquired between 1981 and 2015 from human feces, blood, urine, and other extraintestinal sites in diverse localities (22, 28–32), were screened for CC95, and the subgroup membership of the identified CC95 strains was determined. The 146 detected CC95 strains were distributed by subgroup as follows: 65% A, 27% B, 3% D, and 4% E. None represented subgroup C.
A comparison of CC95 strains from blood and urine identified a significant source effect on the relative abundance of CC95 subgroups (contingency table analysis: likelihood ratio χ2 = 6.46, P = 0.039). The blood isolates belonged mainly to subgroup A (70%), with only a small minority from subgroup B (13%) or other backgrounds (17%). In contrast, although the urine isolates likewise belonged mainly to subgroup A (62%), 33% belonged to subgroup B, and only 5% represented other backgrounds.
The 104 urine isolates were collected over 27 years and exhibited a significant change in relative subgroup abundance over this period (nominal logistic regression: year of isolation, χ2 = 5.06, P = 0.28; sex, χ2 = 17.55, P = 0.0015; age-sex interaction, χ2 = 15.09, P < 0.001). Specifically, subgroup A declined in relative frequency, with a concomitant increase in the other subgroups.
(iv) Among-country subgroup frequency comparisons. The relative abundance of CC95 subgroups varies with the strain's source of isolation (blood, urine, or feces) (Fig. 1C), and the available CC95 data are unevenly distributed among countries with respect to isolate source. Therefore, comparisons of the relative abundance of different subgroups among CC95 strains from Australia, France, and the United States were restricted to strains from the same source.
In these specimen-type-stratified comparisons, among the urine isolates, those from the United States were predominantly from subgroups A and B, while those from Australia were predominantly from subgroups A and C (Table 4). Among the blood isolates, those from France were predominantly from subgroup D, whereas none of those from Australia were from subgroup D. Among the fecal isolates, those from France were mostly from subgroup D, but subgroup B isolates were also quite common, whereas those from Australia were mostly from subgroup A.
Virulence factor data obtained through PCR screening were available for many of the non-Australian CC95 strains as well as the Australian strains. However, given the very nonrandom distribution of CC95 subgroups among countries, there were insufficient sample sizes to compare strains belonging to all subgroups among all countries. Therefore, a comparison of the distributions of virulence factors was restricted to CC95 subgroup B (Table 5). These analyses revealed that the virulence trait profile of subgroup B CC95 isolates varied with the country of origin.
Of the observed geographic differences in virulence trait profiles, the most striking related to the pap operon. Specifically, the pap operon was absent in most isolates from France but present in most isolates from Australia and the United States, whereas among pap-positive isolates, all of those from France and the United States had papG allele III, while those from Australia had papG allele II. As for other virulence factors, the salmochelin locus (iroN) was less frequent among isolates from Australia than those from France or the United States, the aerobactin locus (iut) was common among isolates from France, but absent from most isolates from the United States, and cytotoxic necrotizing factor 1 (cnf1) was uncommon among isolates from France, but present in almost all isolates from the United States.

DISCUSSION
E. coli CC95 has a cosmopolitan distribution, as it has been recovered from every continent, including Antarctica. This clonal complex has a narrow host range, one seemingly restricted to humans and human-associated birds and mammals, such as commercial poultry and companion animals. Although data are limited, the available evidence indicates that CC95 strains are capable of persisting in the human gut for extended periods, with residence times of 2 to 4 years having been observed (33), as is also the case for other B2 strains. Strains of the complex are significant extraintestinal pathogens of humans and globally appear to account for about 17% of infections caused by E. coli. The pangenome of CC95 was estimated to exceed 17,000 genes, a value similar to that found in a collection of 20 strains representative of E. coli as a whole (34). The distribution of these genes among genomes was also very similar to that reported by Touchon et al. (34): about 21% of genes were present in >95% of the isolates, while 68% were present in <15% of isolates.
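The two percentages follow directly from the gene counts given in the Results. This assumes that "present in >95% of the isolates" groups the 3,134 core genes with the 534 genes found in 95% to <99% of strains:

```latex
\frac{3134 + 534}{17603} = \frac{3668}{17603} \approx 0.208,
\qquad
\frac{11934}{17603} \approx 0.678
```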
The existence of genetic substructure in CC95 has long been recognized (35,36), and based on variation in the core genome, it appears that the great majority of ST95 isolates belong to one of five subgroups. The subgroups are nonrandomly distributed geographically, with subgroup D strains being encountered more commonly in Europe than in Australia or North America. Subgroup C strains with an O1:H7 serotype may be endemic to Australia, as they have not been observed in other parts of the world. The distribution of extraintestinal virulence traits also varies among subgroups and likewise in relation to geographic origin. These geographic differences extend to different alleles of a particular gene, as noted for papG allele III, which is far more likely to be observed among CC95 isolates from the United States than those from Australia.
The frequency of CC95 in human populations would be expected to change if the frequency of phylogroup B2 strains changed, as has been the case in France over the past 30 years (27). In the Australian E. coli collections, however, there was no change in the frequency of phylogroup B2 strains between 2002 and 2014, irrespective of a strain's origin (feces, urine, or blood) (Fig. 1A). Similarly, there was no change in the frequency of CC95 strains in the Australian collections with respect to either sample date or a strain's origin (Fig. 1B). Likewise, Day et al. (37), in examining the clonal composition of E. coli strains causing bacteremia in the United Kingdom and Ireland between 2001 and 2010, found that CC95 strains were responsible for 11.3% of bacteremia cases overall, with no real change in frequency over time.
In stark contrast to the apparent temporal stability of the relative abundance of CC95 within E. coli at the clonal complex level is the variability in the relative abundance of the various CC95 subgroups between 2002 and 2014 in Australia (Fig. 1C) and between 2011 and 2015 in the United States. However, the nature of the changes differed between the two continents. In Australia, subgroup B isolates declined in frequency, while subgroup C isolates became more common, whereas in the United States, subgroup A declined in frequency and the frequency of subgroup B was unchanged.
The substructure that exists in CC95 is epidemiologically relevant because, depending on geographic location, strains of some subgroups are more likely to cause septicemia than urinary tract infection or are more likely to be recovered from feces than from extraintestinal sites. Members of some subgroups also appear to be less likely to be recovered from males or more likely to be recovered from extraintestinal sites in females. Additional support for the epidemiological relevance of the subgroups comes from results from the mouse sepsis model, which reflects a strain's extraintestinal virulence. These results indicate that although most CC95 strains are highly virulent, some are not, and that some of this variation in virulence is explained by a strain's subgroup membership.
Collectively, the available evidence indicates that although strains belonging to CC95 may be cosmopolitan, human movement patterns have been insufficient to homogenize the distribution of the CC95 subgroups. Rather, the manner in which CC95 strains evolve appears to vary both spatially and temporally. The observation that the relative frequency of CC95 subgroups at a single locality has changed over time indicates that the relative fitness of the subgroups has changed, although whether such changes are relative to other members of the clonal complex or to other strains of E. coli is unknown. Also unknown is the extent to which these apparent changes in relative fitness are a consequence of direct competition between members of the different subgroups or of differences in how the subgroups respond to changes in the host or external environment. Stochastic effects undoubtedly also play a role, as might be the case for the dominance of papG allele III among subgroup B strains in North America.
The ST131 clonal complex (CC131) is another very common human-associated E. coli lineage. However, few similarities are apparent between the evolution of CC131 and CC95. CC131 is a genetically more diverse clonal complex than CC95, but much of the success of CC131 is due to the evolution and spread of two very closely related, virtually monomorphic, fluoroquinolone-resistant lineages with an O25b:H4 serotype, known as H30R/C1 and H30Rx/C2 (9,10). These lineages were virtually unknown prior to the turn of the century but have now spread worldwide and are currently the E. coli lineages most likely to be responsible for extraintestinal infection (38). Although a number of hypotheses have been proposed to explain the success of these ST131 lineages, such as enhanced virulence and antibiotic resistance, none have strong empirical support.
Collectively, the evidence indicates that although CC95 may be a temporally stable pandemic lineage of E. coli that is often responsible for extraintestinal disease in humans, its evolution bears little resemblance to that of other pandemic pathogenic lineages, such as Salmonella enterica serovar Typhi, Shigella sonnei, or even E. coli CC131. Rather, the evolution of CC95 is shaped by local conditions operating on at least a continental scale and over relatively short time frames. Moreover, the differences among the various CC95 subgroups are epidemiologically relevant. The analytical approaches that have been used so successfully to investigate the evolution and spread of other human pathogens, such as Yersinia pestis and Salmonella enterica serovar Agona (39), are unlikely to be as successful for CC95 strains, given the extent of spatial and temporal variation observed in this complex.
The pangenome analysis revealed that the genetic structure of Escherichia coli is fractal in nature. At the species level, E. coli exhibits considerable genetic structure (phylogroups), as discerned by examining the phylogenetic relationships among strains based on genes of the core genome. This substructure is also reflected in similarity analyses based on the species' variable gene content (40). The distribution of genes is such that a minority of the genes in the pangenome are present in all strains, while most are present in just one or a few strains. Notably, the same patterns are observed among the strains within a single clonal complex. CC95 exhibits substructure in its core genome, and this substructure is reflected in the variable genome of CC95, with most genes present in few isolates and a minority of genes present in all isolates. The results of this analysis of the CC95 pangenome and of the analysis by Touchon et al. (34) of the E. coli pangenome cannot be compared directly due to differences in methodology. However, the size of the CC95 pangenome is certainly close to that of the species as a whole. It is remarkable that the pangenome mirrors the core genome structure both for the species as a whole and for a clonal complex within it. Further studies are required to elucidate the evolutionary processes that might lead to the observed patterns.

MATERIALS AND METHODS
Determining the CC95 pangenome. The nature of the pangenome of CC95 was investigated using WGS data for 200 CC95 strains. These included a collection of strains from Australia (n = 114) and France (n = 17) for which WGS data (Roche 454 GS FLX+ system, Illumina MiSeq, and Illumina HiSeq2000) were available, plus a collection of CC95 strains (n = 66) identified by Deshpande et al. (55) for which WGS data were available from NCBI, as well as three CC95 reference strains (APECO1, S88, and UTI89) (Table S1). The assemblies and annotations for all non-NCBI strains are available in Enterobase (http:// enterobase.warwick.ac.uk/).
The assembled genomes were annotated using Prokka (41), and ROARY (42) was used for the pangenome analysis. The Harvest suite of tools (43) was used to align the strains and visualize the inferred phylogenetic relationships of these 200 CC95 strains. As expected, the phylogeny revealed significant structure within CC95, and the observed clustering largely correlated with a strain's serotype (Fig. 3).
Subgroup detection using PCR-based typing. SCOARY (47) was used to identify genes present in virtually all members of a subgroup and rare or absent in all other subgroups. The resulting gene sets were then examined for genes likely to be suitable for PCR targeting.
A hypothetical protein-coding gene was found to be present in all subgroup A strains and absent in all but three other CC95 strains (B525, E8766, and E9345). A different hypothetical protein-coding gene was found to be present in all subgroup C strains, although this gene was also present in three subgroup D strains. A glycosyltransferase group 2 family protein-coding gene was unique to subgroup B strains. Subgroup D strains were defined by the presence of yjgJ, a putative transcriptional regulator-coding gene. Subgroup E strains were defined by the E. coli restriction-modification enzyme type I M subunit (hsdM) gene. The nucleotide sequences for these genes are presented in Table S2 in the supplemental material.
Primers were designed to specifically target each of these genes, such that fragments of each unique gene could be amplified in a single multiplex PCR (see Table S3 in the supplemental material).
Distribution of CC95 and CC95 subgroups. A number of existing E. coli strain collections, acquired for a variety of reasons and over a number of years from hosts living in Australia, France, and the United States, were screened to determine which strains in the collections belonged to CC95. Additional details of these collections are presented in the Results section. The phylogroup membership of these strains had been determined previously. CC95 identity was determined by either multilocus sequence typing or PCR-based screening using one or more of the published CC95 detection methods (7, 48, 49). The methods used for PCR-based virulence factor screening are described in the references cited for each of the strain collections used in this study.
All CC95 strains identified in the various strain collections were then screened using the CC95 subgroup multiplex PCR, together with PCR-based O typing (50) and H typing for H7 (51) and H4 (52). Any strain yielding an svg product together with a subgroup C and subgroup D PCR product was considered a subgroup D strain (Fig. 5).
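The calling logic of the subgroup multiplex PCR can be sketched as a small decision function, together with the concordance measure behind the 92% agreement with WGS-based assignments. Only the svg + C + D rule comes from the text. The band labels, the assumption that a single subgroup-specific band identifies the subgroup, and the treatment of every other pattern as unassigned are hypothetical choices made for illustration.

```python
SUBGROUP_BANDS = {"A", "B", "C", "D", "E"}


def assign_subgroup(products: set[str]) -> str:
    """Call a CC95 subgroup from the amplicons seen on a gel.

    `products` holds hypothetical band labels: "A" to "E" for the
    subgroup-specific targets and "svg" for the svg amplicon.
    """
    unknown = products - SUBGROUP_BANDS - {"svg"}
    if unknown:
        raise ValueError(f"unexpected band labels: {sorted(unknown)}")
    bands = products & SUBGROUP_BANDS
    # The subgroup C target also occurs in some subgroup D strains, so the
    # svg + C + D pattern is called subgroup D.
    if "svg" in products and bands == {"C", "D"}:
        return "D"
    if len(bands) == 1:
        return next(iter(bands))
    return "unassigned"  # no band, or an ambiguous combination


def concordance(pcr_calls: list[str], wgs_calls: list[str]) -> float:
    """Fraction of strains whose PCR call matches the WGS-based subgroup."""
    if len(pcr_calls) != len(wgs_calls) or not pcr_calls:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == w for p, w in zip(pcr_calls, wgs_calls)) / len(pcr_calls)


calls = [assign_subgroup(s) for s in ({"A"}, {"svg", "C", "D"}, {"A", "B"})]
print(calls)  # ['A', 'D', 'unassigned']
```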
Intrinsic extraintestinal virulence. The virulence of CC95 strains was tested in a mouse model of sepsis following neck subcutaneous inoculation of 2 × 10^8 bacteria as described by Johnson et al. (24). Briefly, 10 outbred female Swiss mice were inoculated per strain, and death was monitored up to 7 days after inoculation. In each experiment, two E. coli control strains were systematically tested: K-12 strain MG1655, which does not kill mice, and strain CFT073, which kills 100% of inoculated mice. Experiments conducted in France (27 strains belonging to all subgroups) followed the European and national regulations for housing and care of laboratory animals after pertinent review and approval by the Bioethics Committee at Santiago de Compostela University and by the French Veterinary Services (certificate no. A 75-18-05). Experiments conducted in Minnesota (31 strains belonging to subgroups A, B, and D) followed federal regulations for ethical care and use of laboratory animals, with approval by the local Institutional Animal Care and Use Committee (protocol 120603).
Statistical analysis. Factors influencing the relative frequency of CC95 subgroups were investigated using either contingency table analysis or nominal logistic regression where appropriate. For analyses involving multiple factors, nonsignificant terms were dropped using backwards elimination, and only significant terms were reported.
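Backwards elimination, as described above, can be sketched generically. Starting from the full model, refit and drop the least significant term until every remaining term is significant. The `fit` callback stands in for a real nominal logistic regression and is hypothetical. The rule that a main effect is kept while any interaction containing it remains (a hierarchical model) is also an assumption, not stated in the text. The P values in the usage example are made up, and a real refit would change them at each step.

```python
from typing import Callable


def contains(term: str, other: str) -> bool:
    """True if interaction term `other` ('a*b') strictly includes `term`."""
    return set(term.split("*")) < set(other.split("*"))


def backward_eliminate(
    terms: list[str],
    fit: Callable[[list[str]], dict[str, float]],
    alpha: float = 0.05,
) -> list[str]:
    """Drop the least significant removable term until all P values <= alpha.

    `fit` refits the model on the current terms and returns each term's P value.
    A term is removable only if no remaining interaction contains it.
    """
    terms = list(terms)
    while terms:
        pvals = fit(terms)
        candidates = [
            t for t in terms
            if pvals[t] > alpha and not any(contains(t, u) for u in terms)
        ]
        if not candidates:
            break
        terms.remove(max(candidates, key=pvals.__getitem__))
    return terms


# Made-up, fixed P values standing in for a real refit at each step.
P = {"source": 0.04, "year": 0.0005, "sex": 0.01, "age": 0.4,
     "year*sex": 0.003, "age*sex": 0.6}
kept = backward_eliminate(list(P), lambda ts: {t: P[t] for t in ts})
print(kept)  # ['source', 'year', 'sex', 'year*sex']
```

Here "age" survives the first pass only because "age*sex" still contains it. Once the interaction is dropped, "age" becomes removable and goes next.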
Accession number(s). The raw sequence read files have been deposited in NCBI and are associated with BioProject PRJNA385370, accession no. SRX2786117 to SRX2786219.

ACKNOWLEDGMENTS
This material is based in part on work supported by Office of Research and Development, Medical Research Service, Department of Veterans Affairs (grant 1 I01 CX000920-01 to J.R.J.). This work was partially supported by a grant from the Fondation Pour la Recherche Médicale (Équipe FRM 2016, DEQ20161136698) to E.D.