Genome-wide association study of hemolytic uremic syndrome causing Shiga toxin-producing Escherichia coli from Sweden, 1994–2018

Shiga toxin-producing Escherichia coli (STEC) infection can cause clinical manifestations ranging from diarrhea to potentially fatal hemolytic uremic syndrome (HUS). This study aimed to identify STEC genetic factors associated with the development of HUS in Sweden. A total of 238 STEC genomes from STEC-infected patients with and without HUS between 1994 and 2018 in Sweden were included in this study. Serotypes, Shiga toxin gene (stx) subtypes, and virulence genes were characterized in correlation to clinical symptoms (HUS and non-HUS), and a pangenome-wide association study was performed. Sixty-five strains belonged to O157:H7, and 173 belonged to non-O157 serotypes. Our study revealed that strains of the O157:H7 serotype, especially clade 8, were most commonly found in patients with HUS in Sweden. stx2a and stx2a + stx2c subtypes were significantly associated with HUS. Other virulence factors associated with HUS mainly included intimin (eae) and its receptor (tir), adhesion factors, toxins, and secretion system proteins. The pangenome-wide association study identified a number of accessory genes significantly overrepresented in HUS-STEC strains, including genes encoding outer membrane proteins, transcriptional regulators, and phage-related proteins, as well as numerous genes encoding hypothetical proteins. Whole-genome phylogeny and multiple correspondence analysis of pangenomes could not differentiate HUS-STEC from non-HUS-STEC strains. In the O157:H7 cluster, strains from HUS patients clustered closely; however, no significant difference in virulence genes was found in O157 strains from patients with and without HUS. These results suggest that STEC strains from different phylogenetic backgrounds may independently acquire genes determining their pathogenicity and confirm that other non-bacterial factors and/or bacteria-host interaction may affect STEC pathogenesis.
Supplementary Information The online version contains supplementary material available at 10.1007/s10096-023-04600-1.


Introduction
Shiga toxin-producing Escherichia coli (STEC) represents a diverse group of E. coli producing one or two different types of Shiga toxin (Stx) [1]. STEC infection causes clinical manifestations ranging from mild, watery diarrhea to bloody diarrhea with severe abdominal pain (hemorrhagic colitis), and potentially fatal hemolytic uremic syndrome (HUS) characterized by the triad of non-immune hemolytic anemia, thrombocytopenia, and acute kidney injury. It has been reported that 5-15% of STEC cases progress to HUS [2,3]. O157:H7 has been considered the most common serotype associated with severe disease such as HUS. In recent years, the emerging clinical importance of non-O157 serotypes has been noted, primarily due to improvements in diagnostic tests [4-6].
The key STEC virulence factor Stx, encoded by stx located on bacteriophages, can damage intestinal, vascular, and renal cells, leading to gastrointestinal and renal diseases [7]. There are two immunologically distinct Stx types, i.e., Stx1 and Stx2, which can be further divided into various subtypes [8]. Different Stx subtypes display dramatic differences in potency [9]. The presence of stx2, especially the stx2a subtype (with or without stx2c), correlates strongly with the development of HUS, whereas other Stx1/Stx2 subtypes are linked to mild symptoms [10]. Stx production is essential but not sufficient for STEC virulence. The majority of pathogenic STEC strains, particularly those of the O157:H7 serotype, possess a pathogenicity island known as the locus of enterocyte effacement (LEE), which encodes genes involved in effacement of intestinal epithelial cell microvilli and in intimate adherence between bacteria and the epithelial cell membrane [11]. The major virulence factors encoded on the LEE are intimin (encoded by eae), translocated intimin receptor (tir), and a type III secretion system [11]. STEC strains harbor additional virulence genes that influence their pathogenic potential, such as astA (enteroaggregative E. coli heat-stable toxin 1), toxB (cytotoxin), ehxA (enterohemolysin), and non-LEE encoded adherence genes [12]. The molecular mechanisms underlying pathogenicity among diverse STEC strains remain to be further elucidated.
Previous epidemiological studies have evaluated the risk of development of STEC-associated HUS in correlation to serotypes, stx subtypes, and other virulence factors in STEC strains from Nordic countries such as Finland, Norway, and Denmark [13-16], with various results. In Sweden, we have previously analyzed a collection of STEC strains from patients with HUS in correlation to clinical outcomes in HUS patients [17] and also strains from STEC-infected patients with/without bloody diarrhea [18]; however, a comparative study between HUS-STEC and non-HUS-STEC strains is lacking in Sweden. Herein, we performed a genome-wide association study on all clinical STEC strains isolated from patients with and without HUS in Sweden between 1994 and 2018, with the aim of identifying genetic factors of STEC that predict the potential to cause HUS.

Materials and methods
Collection of STEC isolates and whole genome data
STEC isolates were collected from STEC-infected patients with and without HUS in three regions in Sweden between 1994 and 2018; clinical characteristics were described previously [17,18]. Metadata of all STEC isolates used in this study are presented in Supplementary Table S1. Genome assemblies of STEC isolates are accessible under the accession numbers presented in Supplementary Table S1.
Fisher's exact test using R software version 4.1.1 (https://www.r-project.org) was used to assess the association between stx subtypes/serotypes/virulence genes and HUS status; the Benjamini-Hochberg method was used to adjust p values in the case of multiple testing. stx subtypes/serotypes/virulence genes with a Benjamini-Hochberg adjusted p value below 0.05 were considered statistically significantly associated with HUS or non-HUS.
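The testing scheme described above can be illustrated with a minimal sketch. The study used R; the Python below is only an illustration that relies on the standard library, and the marker names and 2x2 counts are hypothetical placeholders, not data from this study. The function names `fisher_exact_two_sided` and `benjamini_hochberg` are likewise introduced here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per marker (NOT the study's data):
# (HUS with marker, HUS without, non-HUS with marker, non-HUS without)
tables = {
    "marker_A": (20, 5, 10, 40),
    "marker_B": (12, 13, 25, 25),
    "marker_C": (3, 22, 30, 20),
}
names = list(tables)
raw = [fisher_exact_two_sided(*tables[name]) for name in names]
adjusted = benjamini_hochberg(raw)
for name, p, q in zip(names, raw, adjusted):
    flag = "significant" if q < 0.05 else "not significant"
    print(f"{name}: p = {p:.3g}, BH-adjusted p = {q:.3g} ({flag})")
```

Adjusting after all markers have been tested, rather than per marker, is what keeps the false discovery rate controlled across the whole set of comparisons.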

Pangenome-wide association study (PWAS)
Genome assemblies were annotated using Prokka v1.14.6 [20]; pangenomes of all STEC isolates were then calculated from genome annotations using Roary (https://github.com/sanger-pathogens/Roary) [21] with the command: roary -s -e -mafft *.gff. Pangenomes consist of a complete set of core and accessory genes in all analyzed isolates [22]. In this study, core genes are defined as genes present in ≥ 99% of isolates; the remaining were classified as accessory (noncore) genes. Associations between the presence/absence of accessory genes and HUS vs. non-HUS symptoms were analyzed using Scoary v1.6.16 (run with 1,000 permutation replicates) [23]. Accessory genes were reported as statistically significantly associated with HUS or non-HUS if they attained a Benjamini-Hochberg adjusted p value below 0.05. Multiple correspondence analysis (MCA) of pangenomes was performed using the "gene_presence_absence" table generated from Roary as previously described [18]. The R function MCA from R package FactoMineR was used for the analysis [24].
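The core/accessory definition above can be sketched as follows. This is an illustration of the ≥ 99% rule only, not a reimplementation of Roary; the input structure (a mapping from gene name to the set of isolates carrying it) and the function names are assumptions made for the example. Under this rule, with 238 isolates a gene would need to be present in at least 236 of them to count as core.

```python
def min_core_count(n_isolates, core_percent=99):
    """Smallest number of isolates a gene must occur in to be called core."""
    # Ceiling division on integers avoids floating-point edge cases.
    return -(-core_percent * n_isolates // 100)


def split_core_accessory(presence, n_isolates, core_percent=99):
    """Split genes into core and accessory lists.

    presence maps a gene name to the set of isolate IDs carrying that gene
    (a hypothetical structure chosen for this sketch).
    """
    core, accessory = [], []
    for gene, isolates in presence.items():
        # Integer comparison equivalent to len(isolates) / n_isolates >= 0.99.
        if len(isolates) * 100 >= core_percent * n_isolates:
            core.append(gene)
        else:
            accessory.append(gene)
    return sorted(core), sorted(accessory)


# Toy example with 100 hypothetical isolates.
isolates = [f"iso{i}" for i in range(100)]
presence = {
    "gene_all": set(isolates),        # present in 100/100 isolates
    "gene_99": set(isolates[:99]),    # present in 99/100 isolates
    "gene_98": set(isolates[:98]),    # present in 98/100 isolates
}
core, accessory = split_core_accessory(presence, n_isolates=len(isolates))
print("core:", core)
print("accessory:", accessory)
print("minimum isolates for core status among 238:", min_core_count(238))
```

Only the accessory list would then be passed on to association testing, since genes present in nearly every isolate carry no information about HUS status.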

Whole-genome phylogenetic analysis
Whole-genome multilocus sequence typing (wgMLST) and whole-genome phylogeny analysis were performed to assess the phylogenetic relatedness of STEC isolates from patients with and without HUS. To define wgMLST allelic profiles, Fast-GeP (https://github.com/jizhang-nz/fast-GeP) [25] was run with default settings. The complete genome sequence of O157:H7 strain Sakai (NC_002695.2) was used as a reference. The whole-genome polymorphic sites-based phylogeny was inferred from the concatenated sequences of the coding sequences shared by all genomes. All regions with elevated densities of base substitutions were eliminated, and a final Maximum Likelihood tree was generated by Gubbins (version 2.3.4) [26] with default settings. The phylogenetic tree was annotated using the online tool ChiPlot (https://www.chiplot.online/).
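As a rough illustration of how wgMLST allelic profiles can express relatedness, the sketch below counts the loci at which two profiles carry different allele numbers. It is not part of the pipeline described above: the tree in this study was inferred from polymorphic sites with Gubbins, the profile structure (locus name mapped to an allele number, with `None` for an uncalled locus) is an assumption, and the isolate and locus names are hypothetical.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Number of loci called in both profiles whose allele numbers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def pairwise_distances(profiles):
    """Allelic distance for every unordered pair of isolates."""
    return {
        (a, b): allelic_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical allelic profiles; None marks a locus that was not called.
profiles = {
    "isolate_1": {"locus_1": 1, "locus_2": 4, "locus_3": 2},
    "isolate_2": {"locus_1": 1, "locus_2": 5, "locus_3": 2},
    "isolate_3": {"locus_1": 3, "locus_2": 5, "locus_3": None},
}
distances = pairwise_distances(profiles)
for pair, dist in distances.items():
    print(pair, dist)
```

Skipping uncalled loci keeps missing data from inflating the distance between otherwise identical isolates.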

Molecular characteristics of STEC isolates in correlation to HUS
A total of 238 STEC isolates from patients with HUS (n = 59) and without HUS (n = 179) were included in this study (Supplementary Table S1).

PWAS of STEC strains from patients with and without HUS
A total of 19,059 genes were identified in the pangenomes of 238 STEC strains using Roary. Scoary identified 954 accessory genes that were significantly overrepresented in the HUS-STEC group compared to the non-HUS-STEC group (Benjamini-Hochberg adjusted p < 0.05) (Supplementary Table S3). The majority of these significant genes, including 12 genes unique to the HUS-STEC group, encoded hypothetical proteins (HP) based on annotation using Prokka. The functionally characterized significant genes overrepresented in the HUS-STEC group encoded intimin (eae) and its receptor (tir), adhesin proteins (yfcP, yehD, elfG, sfmA, etc.), and secretion system factors, in line with the virulence gene characterization. In addition, genes encoding outer membrane proteins, transcriptional regulators, phage-related proteins, etc., were significantly more prevalent in the HUS-STEC group (Supplementary Table S3). MCA of pangenomes separated O157:H7 strains from non-O157 strains, while no distinct cluster was observed for the HUS-STEC group (Fig. 1A and 1B). PWAS was further performed on the 65 O157:H7 strains to identify any accessory gene in this serotype that might be associated with HUS. The pangenomes of the 65 O157:H7 strains consisted of 6,608 genes. Scoary identified a number of accessory genes among O157:H7 strains that were significantly overrepresented in strains from HUS patients (Benjamini-Hochberg adjusted p < 0.05) (Supplementary Table S4); however, most of these genes encoded hypothetical proteins whose functions remain to be characterized. MCA of pangenomes showed that O157:H7 strains from HUS patients, mostly belonging to clade 8, clustered closely, while strains from non-HUS patients were discretely distributed (Fig. 1C and 1D).

Phylogenetic relationship of STEC strains from patients with and without HUS
A whole-genome phylogenetic tree was constructed by alignment of 2,341 shared genes in 238 STEC genomes (Fig. 2). Strains with the same serotype clustered together. In line with MCA of pangenomes, O157:H7 strains were phylogenetically separated from non-O157 strains, and O157 strains of clade 8 were grouped closely. Although no separate cluster was observed for the HUS-STEC group, the majority of HUS-STEC strains fell within the O157 cluster, in particular clade 8, and the O121 cluster. Strains of the same serotype carried a similar spectrum of virulence genes independent of their HUS status. Genetically closely related strains were isolated in different years.

Discussion
In this study, we performed a genome-wide association study on a large collection of clinical O157 and non-O157 STEC strains from patients with and without HUS in Sweden between 1994 and 2018. O157:H7 can be classified into nine phylogenetically distinct lineages, as determined by single nucleotide polymorphism genotyping; one lineage (clade 8) was found to be associated with more severe disease such as HUS [27,28]. The majority of clinical human and bovine isolates belonged to the hypervirulent clade 8 in Argentina [29,30]. Our study showed that strains of the O157:H7 serotype, especially those from clade 8, were most commonly found in patients with HUS in Sweden. An earlier study showed that clade 8 strains were overrepresented among isolates from cattle farms associated with human cases in Sweden [31]. Shiga toxin gene subtypes stx2a and stx2a + stx2c were found to be significantly associated with development of HUS, while stx1a was associated with a reduced risk of HUS, in line with studies from other Nordic countries [13,14]. Other virulence markers associated with HUS mainly included genes encoding intimin (eae), adherence factors, toxins, and type III secretion system proteins.
Fig. 2 Whole-genome phylogeny of Shiga toxin-producing Escherichia coli (STEC) isolates. Circular representation of the Gubbins phylogenetic tree generated from the concatenated sequences of the shared loci found in the wgMLST analysis. The Gubbins tree was annotated with relevant metadata using the online tool ChiPlot (https://www.chiplot.online/). The color of the branches indicates the isolation year. Branch length is ignored for better visualization. The rings from inner to outer represent HUS status, serotype (O157:H7 clade 8), and a heatmap of representative virulence genes in each functional category that was significantly overrepresented in HUS-STEC strains compared to non-HUS-STEC strains
It should be noted that the association observed between bacterial factors and clinical outcomes does not indicate any causal link. Further studies are warranted to examine the functions of these identified genetic markers and their potential roles in HUS pathogenesis. It is notable that there is great geographical variation in the genetic characteristics of pathogenic STEC strains that correlate with disease severity. For instance, a recent study from Finland indicated that eae was not statistically overrepresented in HUS-STEC strains from pediatric patients, while the cytolethal distending toxin (CDT) encoding genes cdtA, cdtB, and cdtC were the most discriminative virulence genes overrepresented in the Finnish pediatric HUS-STEC strains [16]. In the present study, we did not find CDT genes overrepresented in Swedish HUS-STEC strains. Moreover, most HUS-associated O157:H7 strains in Finland were non-clade 8, while all HUS-associated O157:H7 strains but one belonged to clade 8 in this study. Another recent study in Argentina demonstrated no relationship between disease severity and serotypes and genotypes of STEC [32]. These data suggest genetic differences among pathogenic STEC strains in different geographical regions and populations (e.g., age and sex). They may also indicate that non-bacterial factors (e.g., human immunity) and/or bacteria-host interaction play a more important role in STEC-associated disease progression. Future large-scale studies with representative strains and clinical data from various geographical regions and populations are essential to gain further insights.
In the present pangenome-wide association study, we identified a large number of accessory genes differentially present in HUS-STEC and non-HUS-STEC strains. Besides the virulence genes mentioned above, other genes that were significantly overrepresented in HUS-STEC strains mainly encode outer membrane proteins, transcriptional regulators, and phage-related proteins. In addition, a number of significant genes encode hypothetical proteins (HP) whose functions are poorly understood; further studies are needed to characterize these HP genes and to evaluate their potential role in STEC pathogenesis. Whole genome phylogeny and MCA of pangenomes could not separate HUS-STEC strains from non-HUS-STEC strains, which was in line with earlier studies from Finland and Norway [16,33]. These results suggest that STEC strains from different phylogenetic lineages may independently acquire genes that determine their pathogenicity. It is noteworthy that O157:H7 strains from HUS patients grouped closely, separated from strains from non-HUS patients. Nevertheless, no significant difference in virulence genes was found between O157 strains from patients with and without HUS, and the majority of significant accessory genes identified at the pangenome level were functionally uncharacterized.
These data support the view that other factors, e.g., the infectious dose of the pathogen and variations in host innate and adaptive immunity, may play an important role in STEC pathogenesis and the development of HUS. Further study is warranted to elucidate the host factors in correlation to HUS pathogenesis.
In conclusion, our study revealed that STEC strains of the O157:H7 serotype, especially clade 8 variants, were most commonly found in patients with HUS in Sweden. Genetic factors identified as molecular predictors of the development of HUS included the stx subtypes stx2a and stx2a + stx2c, and genes encoding intimin, toxins, secretion system proteins, and transcriptional regulators. Further studies are needed to evaluate the functions of these genes and their role in the development of HUS. Whole genome phylogeny and MCA of pangenomes could not differentiate HUS-STEC strains from non-HUS-STEC strains, suggesting that STEC strains of diverse genetic backgrounds may independently acquire genes that determine their pathogenicity, and that other non-bacterial factors may play a crucial role in the development of HUS, which warrants further investigation.
Data availability Genome assemblies of STEC isolates are accessible in GenBank under the accession numbers presented in Supplementary Table S1.

Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.