O-Antigen Diversification Masks Identification of Highly Pathogenic Shiga Toxin-Producing Escherichia coli O104:H4-Like Strains

ABSTRACT Shiga toxin-producing Escherichia coli (STEC) can give rise to a range of clinical outcomes from diarrhea to the life-threatening systemic condition hemolytic-uremic syndrome (HUS). Although STEC O157:H7 is the serotype most frequently associated with HUS, a major outbreak of HUS occurred in 2011 in Germany and was caused by a rare serotype, STEC O104:H4. Prior to 2011 and since the outbreak, STEC O104:H4 strains have only rarely been associated with human infections. From 2012 to 2020, intensified STEC surveillance was performed in Germany, in which ~8,000 clinical isolates were subtyped by molecular methods, including whole-genome sequencing. A rare STEC serotype, O181:H4, associated with HUS was identified, and like the STEC O104:H4 outbreak strain, this strain belongs to sequence type 678 (ST678). Genomic and virulence comparisons revealed that the two strains are phylogenetically related and differ principally in the gene cluster encoding their respective lipopolysaccharide O-antigens but exhibit similar virulence phenotypes. In addition, five other serotypes belonging to ST678 from human clinical infections, namely OX13:H4, O127:H4, OgN-RKI9:H4, O131:H4, and O69:H4, were identified from diverse locations worldwide.

IMPORTANCE Our data suggest that the high-virulence ensemble of the STEC O104:H4 outbreak strain remains a global threat because genomically similar strains cause disease worldwide but that the horizontal acquisition of O-antigen gene clusters has diversified the O-antigens of strains belonging to ST678. Thus, the identification of these highly pathogenic strains is masked by diverse and rare O-antigens, thereby confounding the interpretation of their potential risk.

from 2012 to 2020. This strain collection included a stool sample isolate (17-07187) from a 6-year-old girl who had bloody diarrhea and HUS in December 2017. She had not traveled outside her home in Northwest Germany before becoming ill. Serotyping and whole-genome sequencing (WGS) revealed that the strain belonged to an unusual serotype, O181:H4, that had not been previously associated with HUS. The strain had stx2a but lacked the LEE pathogenicity island (marker gene eae). Furthermore, the strain carried characteristic EAEC markers, including aatA, aggR, AAF/I genes, and the autotransporter protease genes pic, sigA, and sepA (Fig. 1A; see also Table S1 in the supplemental material).
The genomes of STEC O181:H4 strain 17-07187 and the O104:H4 outbreak strain differ mainly in their O-antigen gene clusters and mobile genetic elements. The chromosomes (without plasmids) of O181:H4 isolate 17-07187 and O104:H4 outbreak strain FWSEC0009 were very similar (~99.7% nucleotide identity in the core genome that is shared between these strains). The most striking difference between them was their respective O-antigen gene clusters (OAGCs) (Fig. 2A and B). Although these two clusters were both situated at the same location in the chromosome, between galF and hisI (Fig. 2B), their gene contents and organizations were very different. Furthermore, their respective GC contents, 36.8% for O181 and 37% for O104, differed from the chromosome GC content (~50.7%), highlighting the likely role of lateral gene transfer in driving OAGC exchange.
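The GC comparison above can be illustrated with a minimal sketch. It assumes plain nucleotide strings as input; the function names and the toy sequences are hypothetical and are not taken from the study's data or pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction (0 to 1) of a DNA sequence.

    Ambiguous bases (e.g., N) and gaps are ignored in both the
    numerator and the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def gc_deviation(region: str, backbone: str) -> float:
    """GC difference, in percentage points, between a region and its backbone.

    A strongly negative value means the region is more AT-rich than the
    backbone, which is one common hint of horizontal acquisition.
    """
    return 100.0 * (gc_content(region) - gc_content(backbone))


# Toy sequences (hypothetical, for illustration only)
toy_cluster = "ATTATAGCATTAAATGCTTA"   # AT-rich, mimicking an acquired cluster
toy_backbone = "ATGCGCGATCGGCTAGCCGA"  # GC-richer, mimicking the chromosome

if __name__ == "__main__":
    print(f"cluster GC:  {100 * gc_content(toy_cluster):.1f}%")
    print(f"backbone GC: {100 * gc_content(toy_backbone):.1f}%")
    print(f"deviation:   {gc_deviation(toy_cluster, toy_backbone):.1f} points")
```

On real data, the region would be the annotated OAGC and the backbone the remaining chromosome; a deviation of roughly 13 to 14 percentage points, as reported here, is far larger than typical local fluctuation.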
Fourteen potential prophage regions are present in the O181:H4 isolate, and 16 are present in the O104:H4 outbreak strain (Fig. 2A and Table S3). Eleven of the fourteen prophages exhibited substantial sequence identity (83 to 99.9%) to their O104:H4 counterparts, and importantly, the stx2-carrying prophages were nearly identical (~99.9% nucleotide identity) (Fig. 2C and Table S4). Both stx phages are inserted into the tryptophan repressor binding protein gene wrbA.
The virulence gene profiles of the 34 non-O104 ST678 strains were generally similar to that of the O104:H4 outbreak strain; however, there were a few differences. Specifically, sepA was found exclusively in AAF/I-positive strains, and EAST1 was present in all AAF/III-positive strains and only three of the AAF/I-positive isolates (Table S8).
The 34 non-O104:H4 ST678 strains were isolated in countries of Europe, Africa, and North and South America. In addition, several were from individuals with a travel history that might link these to East Asia (Fig. 3A and Fig. S2B and C). Together, these observations show that ST678 E. coli strains are found among seven different E. coli serotypes that have been linked to diarrheal disease on several continents.
Phylogenomic analyses of ST678 E. coli strains suggest the occurrence of multiple O-antigen gene exchange events. The OAGCs encoding the six O groups in the 34 non-O104:H4 ST678 strains are found at the same chromosomal location as the 2011 O104:H4 outbreak strain but are composed of largely disparate genes (Fig. 4A). These clusters also have a GC content (36.8 to 42.1%) distinct from that of the backbone genome (50.5 to 50.7%), suggesting that they were acquired by horizontal gene exchange. To explore the phylogenetic relationships among the 34 non-O104:H4 ST678 strains and a set of O104:H4 strains, their shared single nucleotide polymorphisms (SNPs) were analyzed using O104:H4 outbreak strain FWSEC0009 as a reference. A positive correlation (R = 0.81; R² = 0.66) was found between the isolation time and genetic divergence (Fig. S2D), which supports the view that mutations have accumulated in a clocklike fashion without notable outliers. Phylogenetic analysis shows that AAF/I-positive strains within ST678 are closely related to each other, clustering into clade I, whereas AAF/III-positive strains are more diverse and form more basal branches in a rooted maximum likelihood phylogenetic tree (Fig. 4B). This structure of the phylogeny suggests that clade I strains were derived from an AAF/III-positive precursor. Within clade I, subclades Ia and Ib contain some non-O104:H4 strains. Clade Ia is composed of O104:H4 nonoutbreak strains isolated from 2015 to 2021 in the United Kingdom and Kenya and the 2018 OX13:H4 strain that was associated with travel to Ethiopia. Clade Ib contains all 21 O181:H4 strains and 3 O127:H4 strains isolated from diverse continents from 2011 to 2021. Since the majority of the ST678 strains are of serotype O104:H4, including basal phylogenetic branches and most isolates in clade I (Fig. 4B), it is parsimonious to assume that subclade Ib emerged from an O104:H4 predecessor by replacing O104 antigen genes with O181 antigen genes.
Similarly, our phylogenetic analysis suggests that O127:H4 strains within clade Ib arose from an O181:H4 precursor. Together, these observations indicate that OAGC exchange has occurred repeatedly within ST678 strains.

DISCUSSION
The E. coli O104:H4 outbreak in Germany in the early summer of 2011 was a public health emergency; however, this serotype was rarely isolated as a cause of HUS after the epidemic subsided. Nevertheless, our findings suggest that the unusual set of virulence factors that characterized this Shiga toxin 2-producing EAEC strain remains a threat to human health. Serotype conversion has cloaked this highly virulent genotype with several O-antigens, including O181, O127, and OX13, which are present in both stx-positive and stx-negative disease-linked isolates closely related to the O104:H4 outbreak strain. We found non-O104:H4 ST678 strains from a variety of countries in Europe, Africa, and the Americas, and several of the cases were associated with travel to Africa and Asia, suggesting that these virulence-associated strains are globally distributed. Like the O104:H4 outbreak strain, the AAF/I-positive ST678 strains had similar chromosomes, similar pAA-linked virulence genes, as well as similar virulence factors, including the SPATEs (5,7,8,12). The most salient difference in the chromosomes of these strains from that of the outbreak strain was their respective OAGCs and other mobile genetic elements. Thus, the horizontal exchange of these clusters appears to have been a critical step in the evolution of these new pathogenic serotypes, some of which were linked to HUS or bloody diarrhea.
The prime example uncovered here is the likely derivation of O181:H4 pathogens from an O104:H4 precursor via OAG exchange. Among the 21 O181:H4 strains, 6 harbored an stx2a-carrying prophage, including HUS-linked strain 17-07187. In four of these strains, the stx2a prophage was very similar to the stx2a prophage of the 2011 outbreak strain (Fig. 3A), suggesting that the O-antigen exchange was a more recent event in their evolution than the acquisition of the Shiga toxin-encoding prophage. The absence of stx prophages in 15 of the O181:H4 isolates may be due to a lack of stx prophage acquisition or may have resulted from the loss of their stx prophages, which is well documented in STEC isolates of other serotypes (23). Thus, in addition to O-antigen exchange, the ongoing horizontal transmission of mobile genetic elements such as stx phages has contributed to the diversification of diarrheagenic ST678.
OAGCs of Gram-negative bacteria are hot spots for diversifying selection and recombination events (15,19). Serotype conversion by the lateral exchange of OAGCs has played an important role in the evolution of enteric pathogens. For example, Vibrio cholerae serogroup O139, which arose from the exchange of O1 and O139 OAGCs, transiently replaced the dominant O1 group as the cause of cholera from 1992 to 1993 (13,24). Our discovery of seven distinct O groups that share the flagellar H4 antigen and a highly similar virulence-linked genetic makeup provides a compelling example of the role of O-antigen exchange in the diversification of diarrheagenic E. coli. In addition, STEC O157:H7 is thought to have arisen from an enteropathogenic E. coli (EPEC)-like O55:H7 strain that initially acquired stx via phage transduction and subsequently acquired the O157 O group by the exchange of the O55 with the O157 OAGC (25). Also, an O182-O156 switch is thought to have occurred in STEC strains persistently infecting cattle (18). Consequently, different OAGs may be found in highly related genotypes (26,27).
We can only speculate about the conditions driving OAGC exchange among ST678 strains. Shiga toxin-producing O104:H4 EAEC strains, such as the 2011 outbreak strain, are considered human-restricted pathogens and have not been isolated from animals such as cattle (28). However, interestingly, a recent description highlighted the detection of an STEC O104:H4 strain in pork (29). It is possible that OAGC exchange occurred in a human host where the human intestinal microbiome may contain E. coli strains of O groups such as O181 and O127 that could have donated their OAGC to an O104:H4 recipient by some means of horizontal exchange. Also, epidemiological investigations suggest that fenugreek sprouts were the food source that initiated the STEC O104:H4 epidemic; therefore, plants colonized by microbiota may be another possible site for OAGC exchange (4).
Conclusion. Our study highlights how the lateral exchange of OAGCs can lead to the rapid diversification of a globally important pathogen. An important clinical implication of these findings is that serotype surveillance cannot be used as a simple proxy for strain virulence and needs to be complemented by virulence gene or genome analysis. Surveillance to uncover how highly virulent strains may reemerge and spread in new O-antigen outfits is warranted.

MATERIALS AND METHODS
Study strains. In the context of intensified STEC surveillance in Germany, clinical isolates were collected at the National Reference Center for Salmonella and Other Bacterial Enteric Pathogens and analyzed for serotype, stx and subtypes, eaeA, hlyA, and aatA as described previously (30). Strains were grown on nutrient agar (Oxoid GmbH, Germany), Luria-Bertani (LB) broth, or enterohemolysin agar (Sifin GmbH, Germany). Genome sequencing was carried out on a subset of the strains, and further genome sequence data for the strains were gathered from EnteroBase and the NCBI (see Table S1 in the supplemental material).
Bioinformatics analyses. The de novo assembly of the PacBio sequence data (103-fold average coverage) was performed by GATC Biotech by utilizing HGAP3 (Pacific Biosciences, USA). Polishing of the assembled genome and plasmids was performed with Illumina short reads by using Pilon (version 1.22) (31). Quality control and trimming of MiSeq raw reads with the subsequent detection of the serotype and virulence genes were performed as described previously (16). Genomic comparisons were carried out using MAUVE (version 1.1.1) and MAFFT (version 1.3.7) as plug-ins in Geneious (version 11.1.5; Biomatters Ltd.) (32,33). Gaps within the MAFFT alignment were excluded using the Geneious mask alignment function. Ridom SeqSphere+ (version 7.2.0; Ridom GmbH, Germany) was used to determine MLST Warwick sequence types and to create minimum-spanning trees based on 2,513 allele targets (here, genes) from the E. coli cgMLST EnteroBase scheme with pairwise ignoring of missing values from genome assemblies (20). Differences between genomes in the variants of the cgMLST genes are expressed as allelic distances (ADs). Phage prediction was carried out by analyzing the genome sequences using PHASTER (34). RAST was used for coding sequence (CDS) annotation (35).
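The idea of an allele-based distance with pairwise ignoring of missing values can be sketched as follows. This is an illustrative reimplementation under simple assumptions, not the SeqSphere+ algorithm itself; the profile layout (locus name mapped to an allele number, with None for a missing target) and all names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical profile layout: locus name -> allele number, None if missing
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci whose allele numbers differ between two profiles.

    A locus is skipped for this pair if it is absent or set to None in
    either profile ("pairwise ignoring of missing values").
    """
    shared_loci = a.keys() & b.keys()
    return sum(
        1
        for locus in shared_loci
        if a[locus] is not None
        and b[locus] is not None
        and a[locus] != b[locus]
    )


# Toy profiles with four loci (a real scheme here has 2,513 targets)
strain_x: Profile = {"locus1": 1, "locus2": 4, "locus3": None, "locus4": 7}
strain_y: Profile = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 9}

if __name__ == "__main__":
    # locus2 and locus4 differ; locus3 is missing in strain_x and is skipped
    print(allelic_distance(strain_x, strain_y))  # prints 2
```

Skipping missing loci per pair, rather than dropping them for the whole data set, keeps as many comparable targets as possible for each pair of genomes, at the cost that distances for different pairs may rest on slightly different sets of loci.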
SNP-based alignment and maximum likelihood-based phylogenetic tree. The mapping of sequencing reads, the generation of consensus sequences, and alignment calculations were performed using the BatchMap pipeline with the FWSEC0009 genome sequence as a reference (36). Gubbins (version 3.2.1) was used to identify loci containing elevated densities of base substitutions (putative recombinations) within the alignment while concurrently constructing an alignment and phylogeny (RAxML tree) based on the putative point mutations (SNPs) outside these regions (37). The alignment generated by Gubbins was used to create a maximum likelihood-based phylogenetic tree using PhyML 3.3.20180214 (Geneious plug-in, substitution model HKY85, and 100 bootstraps) (38).
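To make the notion of SNP-based comparison concrete, the sketch below extracts variable columns from a small alignment and counts pairwise differences. It is deliberately simplified: it does not detect or mask recombinant regions as Gubbins does, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")


def variable_sites(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns that are informative SNP sites.

    A column qualifies if every sequence carries A, C, G, or T there
    (no gaps or ambiguity codes) and at least two sequences differ.
    """
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have equal length")
    sites = []
    for i in range(length):
        column = {s[i] for s in seqs}
        if column <= VALID_BASES and len(column) > 1:
            sites.append(i)
    return sites


def snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNP counts restricted to the columns from variable_sites()."""
    sites = variable_sites(alignment)
    return {
        (a, b): sum(
            alignment[a][i].upper() != alignment[b][i].upper() for i in sites
        )
        for a, b in combinations(alignment, 2)
    }


# Toy alignment (hypothetical); column 6 has a gap and is therefore excluded
toy_alignment = {
    "ref":  "ACGTACGTAC",
    "iso1": "ACGTACGTAT",
    "iso2": "ACTTAC-TAT",
}

if __name__ == "__main__":
    print(variable_sites(toy_alignment))
    for pair, dist in snp_distances(toy_alignment).items():
        print(pair, dist)
```

In a recombination-aware workflow, columns inside regions flagged as putative recombination would be removed before this counting step, so that only presumed point mutations contribute to the distances and to the tree.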
Temporal signal and "clocklikeness" of molecular phylogenies. TempEst was used to analyze the RAxML tree generated by Gubbins in conjunction with collection year data to validate the molecular-clock assumption (39). The best-fitting root was chosen for linear regression analyses.
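The regression behind such a temporal-signal check can be sketched as an ordinary least-squares fit of root-to-tip distance on collection year. This is an illustrative reimplementation, not TempEst code; the function name and the toy values are hypothetical. For a simple linear regression, R² is the square of the correlation coefficient R, so R = 0.81 corresponds to R² ≈ 0.66.

```python
from math import sqrt
from typing import Sequence, Tuple


def root_to_tip_regression(
    years: Sequence[float], distances: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of root-to-tip distance against sampling year.

    Returns (slope, intercept, r), where the slope estimates the
    substitution rate per site per year and r is the Pearson
    correlation coefficient.
    """
    n = len(years)
    if n != len(distances) or n < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Toy data (hypothetical): distances in substitutions per site
toy_years = [2011, 2013, 2015, 2017]
toy_distances = [0.0010, 0.0014, 0.0018, 0.0022]

if __name__ == "__main__":
    slope, intercept, r = root_to_tip_regression(toy_years, toy_distances)
    print(f"rate = {slope:.2e} per site per year, R = {r:.3f}, R^2 = {r * r:.3f}")
    # The x-intercept estimates the date of the root of the tree
    print(f"estimated root date = {-intercept / slope:.1f}")
```

A clearly positive slope with a reasonably high R² indicates that later isolates sit farther from the root, which is what a molecular clock predicts; isolates far from the fitted line would be candidates for mislabeled dates or unusual evolutionary histories.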
Cytotoxicity, adherence, and infection assays. The viability of Vero cells after inoculation with diluted bacterial culture supernatants (1:200) was examined using 3-(4,5-dimethylthiazole-2-yl)-2,5-diphenyltetrazolium bromide (MTT) (5). Bacteria and Vero cells were prepared as described previously (30). Adherence assays with HEp-2 cells were performed as described previously (36). For infant rabbit infection assays, litters of mixed-gender 2-day-old New Zealand White infant rabbits with the lactating doe were acquired from Charles River Canada (strain code 052). Infant rabbits were orogastrically inoculated on the day of arrival with 10⁹ CFU of streptomycin-resistant O104:H4 strain C227-11 and O181:H4 strain 17-07187 suspended in 500 µL of 2.5% sodium bicarbonate (pH 9) using a size 4 French catheter as described previously, except that no ranitidine was administered (12). No antibiotics were used prior to or during infection. Infant rabbits were monitored for signs of illness and euthanized at 3 days (68 to 72 h) postinfection. Tissue samples taken from the stomach, small intestine, cecum, and colon were homogenized in sterile phosphate-buffered saline (PBS) using a minibeadbeater-16 instrument (Biospec Products, Inc.), and CFU were determined by serial dilution and plating onto LB medium containing 200 µg/mL streptomycin (12).
Ethics statement. Rabbit studies were conducted according to protocols reviewed and approved by the Brigham and Women's Hospital Committee on Animals (IACUC protocol 2016N000334) and Animal Welfare Assurance of Compliance (number A4752-01) in accordance with recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health (40).