Phylogenetic Analysis of Enterohemorrhagic Escherichia coli O157, Germany, 1987–2008

Multilocus variable number tandem repeat analysis (MLVA) is a subtyping technique for characterizing human pathogenic bacteria such as enterohemorrhagic Escherichia coli (EHEC) O157. We determined the phylogeny of 202 epidemiologically unrelated EHEC O157:H7/H− clinical isolates, obtained in Germany during 1987–2008, through 8 MLVA loci. Biodiversity in the loci ranged from 0.66 to 0.90. Four of the 8 loci showed null alleles, at frequencies of up to 44.1%. Null alleles were found in 48.5% of all strains. Overall, 141 MLVA profiles were identified. Phylogenetic analysis assigned 67.3% of the strains to 19 MLVA clusters. Specific MLVA profiles with an evolutionary persistence were identified, particularly within sorbitol-fermenting EHEC O157:H−. These pathogens belonged to the same MLVA cluster. Our findings indicate successful persistence of this clone.

Enterohemorrhagic Escherichia coli (EHEC) O157:H7 infections have substantial medical, public health, and economic effects (1,2). Most symptomatically infected patients have painful bloody diarrhea (2,3). Hemolytic uremic syndrome (HUS) develops in ≈15% of infected children ≈1 week after the first loose stool. HUS is a thrombotic microangiopathy and consists of nonimmune hemolytic anemia, thrombocytopenia, and renal failure (1). Currently, HUS is the main cause of acute renal failure in children (4). In Germany, E. coli O157:H7, which is the most frequent EHEC serotype implicated in HUS, is not the only relevant EHEC O157 involved. Sorbitol-fermenting (SF) E. coli O157:H− (nonmotile) strains cause ≈20% of all cases of HUS (5). Unlike E. coli O157:H7, organisms within this clone can ferment sorbitol after overnight incubation on sorbitol MacConkey agar. Although EHEC O157:H7 causes a zoonotic disease mainly associated with cattle, efforts to determine the animal reservoir of SF EHEC O157:H− have been unsuccessful (5).
To identify reservoirs of EHEC O157:H7 infections and of other foodborne pathogens and to elucidate the molecular epidemiology of these pathogens in the United States, PulseNet was established in 1996 (6). This US national molecular subtyping network for foodborne disease surveillance facilitates subtyping of bacterial foodborne pathogens for epidemiologic purposes. This network is based on characterization of whole bacterial genomes by using macrorestriction digestion patterns that are separated by pulsed-field gel electrophoresis (PFGE), a technique that has emerged as a common standard for subtyping EHEC O157 isolates (6). Despite its high discriminatory power, PFGE can be problematic because it requires great efforts to ensure intralaboratory and interlaboratory reproducibility (7–10). Furthermore, its application is labor-intensive and difficult to automate. Thus, this technique can be biased by subjective interpretation of band patterns (7,8). In addition, band patterns can be altered by the presence of mobile genetic elements.
To overcome these drawbacks, other molecular methods were developed, among them multilocus variable number tandem repeat (VNTR) analysis (MLVA). MLVA is based on the characterization of different VNTR regions throughout the bacterial genome. Repeat regions are amplified by using PCRs, and resulting fragments are sized to determine the number of repeats. The combination of numbers of repeats of different VNTR loci results in an allelic profile known as the typing result. First developed in 1995 for Mycobacterium tuberculosis (11), MLVA is now a common typing method for an increasing number of pathogens (12,13). For EHEC O157, different MLVA schemes with some overlaps of VNTR regions have been published and have demonstrated a capability to detect outbreaks and differentiate closely related EHEC O157 isolates not discriminated by PFGE (8,14,15). These findings qualify MLVA as the second-generation subtyping method for PulseNet (8).
In addition to its use in infectious disease surveillance, MLVA also can be used to study phylogeny of pathogens, especially recently evolved clonal pathogens such as M. tuberculosis (16,17) or Bacillus anthracis (18). Certain monomorphic organisms such as these could not be sufficiently differentiated by multilocus sequence typing (MLST), the common technique for phylogenetic studies (19,20), because of limited diversity in their housekeeping genes, which are the genomic targets of MLST (16,18). Similarly, EHEC O157 lacks diversity in its housekeeping genes (21,22), which hampers phylogenetic analysis of EHEC O157 by MLST.
We investigated the phylogeny of EHEC O157:H7 and SF EHEC O157:H− strains isolated during 1987–2008 in Germany by applying the current PulseNet MLVA protocol for E. coli O157 (23). The purpose of our study was to gain a deeper insight into the evolution and spread of this pathogen since 1987, when the first cases of EHEC O157 infections were detected (24,25).

Clinical Isolates
Up to 17 epidemiologically unrelated EHEC O157:H7/H− isolates per year obtained during 1987–2008 were randomly selected from the strain collection of the Institute of Hygiene and the National Consulting Laboratory on HUS, University Hospital Münster, Germany. All 202 O157 strains (61 of which were SF EHEC O157:H−) were isolated from humans, including patients with HUS (145), bloody diarrhea (12), or diarrhea without visible blood (40), and asymptomatic carriers (5) during epidemiologic investigations. Isolates were obtained from areas throughout Germany. Procedures used for detecting and isolating EHEC O157 from stool samples were described (26,27). Isolates were confirmed as E. coli by the API 20 E test (bioMérieux, Marcy l'Etoile, France) and serotyped by using antisera against E. coli O antigens 1–181 and H antigens 1–56 (28). Subtyping of fliC genes in nonmotile isolates by using HhaI restriction fragment length polymorphism of amplicons obtained with primers FSa1 and rFSa1 (29,30) confirmed the presence of fliCH7 in all isolates. EHEC O157:H7 strain EDL933 (31,32) was used as a reference strain in all analyses.

MLVA of EHEC O157
Strains were grown overnight on Columbia blood agar (Heipha; Eppelheim, Germany) at 37°C. A loop of a fresh culture was suspended in 100 μL of Chelex-100 solution (Bio-Rad, Hercules, CA, USA) and vortexed briefly. After boiling and thorough mixing, samples were centrifuged and DNA-containing supernatants were stored at −20°C until use. To calibrate sequencer-specific variation of fragment length, the exact number of repeats of reference strain O157:H7 EDL933 was initially determined in silico on the basis of its genome sequence (reference sequences NC_002655 [chromosome] and NC_007414 [plasmid]; National Center for Biotechnology Information [NCBI], Bethesda, MD, USA) by using Tandem Repeats Finder software (33). Subsequently, the length of the in silico-determined repeats was subtracted from the fragment length of each respective VNTR locus generated in 8 independent capillary electrophoresis runs of strain EDL933 to determine the offset (primer plus VNTR-flanking regions). This locus-specific offset was then used to calculate the correct number of repeats of unknown isolates. Fragments for MLVA typing were generated in 2 multiplex PCRs comprising either VNTR loci 3, 9, 25, and 34 (multiplex 1) or VNTR loci 17, 19, 36, and 37 (multiplex 2) (online Appendix Table, www.cdc.gov/EID/content/16/4/610-appT.htm), according to the current PulseNet MLVA protocol for E. coli O157 (23).
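The offset calibration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PulseNet or GeneMapper implementation: the function names, the averaging of the reference runs, and every numeric value (a 6-bp repeat unit, 10 reference repeats, the fragment lengths) are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def locus_offset(reference_fragment_lengths_bp, repeat_unit_bp, reference_repeats):
    """Offset (primers plus VNTR-flanking regions) for one locus, in bp.

    Assumption: the reference-strain runs are averaged before the in silico
    repeat-region length is subtracted; the text does not say how the
    8 runs were combined.
    """
    repeat_region_bp = repeat_unit_bp * reference_repeats
    return mean(reference_fragment_lengths_bp) - repeat_region_bp


def fractional_repeats(fragment_length_bp, offset_bp, repeat_unit_bp):
    """Possibly fractional repeat number of an unknown isolate at this locus."""
    return (fragment_length_bp - offset_bp) / repeat_unit_bp


# Hypothetical values for a single locus of the reference strain.
reference_runs = [160.2, 160.4, 160.3, 160.1, 160.3, 160.2, 160.4, 160.5]
offset = locus_offset(reference_runs, repeat_unit_bp=6, reference_repeats=10)

# Hypothetical fragment length measured for an unknown isolate.
print(round(fractional_repeats(184.5, offset, repeat_unit_bp=6), 2))
```

Keeping the offset separate for each locus mirrors the locus-specific calibration in the text; a different sequencer would need its own set of offsets.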
PCR amplification was performed in a reaction mixture of 10 μL containing 5 μL of Type-it Multiplex Master Mix (QIAGEN, Hilden, Germany), ≈30 ng of DNA template, and VNTR-specific primers for each of the 4 VNTR loci. Concentration, primer sequences, and respective dyes used are shown in the online Appendix Table. PCRs were performed and prepared for subsequent analysis on sequencers in accordance with the manufacturer's instructions (online Appendix Table). PCR products were diluted 1:10 with water purified by high performance liquid chromatography, and 1.0 μL of diluted DNA was mixed with 13.7 μL of HiDi formamide (Applied Biosystems, Foster City, CA, USA) and 0.3 μL of GeneScan-600 LIZ Size Standard (Applied Biosystems) as internal lane size standard. Before fragment sizing in the ABI Prism 3130xl Genetic Analyzer System (Applied Biosystems), samples were incubated for 5 min at 95°C and immediately frozen at −20°C for >3 min to denature the DNA.
If a VNTR locus was not detected during fragment analysis, the PCR was repeated as a singleplex reaction with minor modifications to amplify the specific locus. In these reactions, the primer concentration was increased to 0.2 μmol/L, the annealing temperature was reduced to 55°C, and the extension time was tripled to amplify larger fragments that might result from insertion sequence element transposition or other genetic events. Subsequently, fragments were characterized by using standard agarose gel electrophoresis. If the fragment was larger than the usual range of fragment sizes of the corresponding VNTR, the PCR product was sequenced.

Data Analysis
After fragment analysis, corresponding peak data were examined by using GeneMapper 4.0 software (Applied Biosystems) to calculate the repeat number for each VNTR locus on the basis of fragment length. Partial repeats were rounded to the closest repeat number in accordance with the current Centers for Disease Control and Prevention (CDC) (Atlanta, GA, USA) MLVA O157 protocol (23). If >1 amplicon for a specific VNTR locus was detected and the size difference matched >1 repeat lengths (so-called stutter peaks), the one with the highest fluorescence level was used to calculate the repeat number. A null allele was assigned if either no amplicon was detected or agarose gel electrophoresis data showed an amplicon of a size that was beyond the usual range of fragment size of the specific VNTR locus. Corresponding alleles were designated as −2. In the hypothetical situation in which an amplicon without the repeat region was detected, it was designated as −1 (8). Null alleles were also included in the overall number of alleles in a specific VNTR locus.
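The allele-calling rules in this paragraph can be illustrated with a short Python sketch. It is a simplified reading of the text, not the GeneMapper or CDC procedure: the function name, the peak representation, and the example values are hypothetical; ties at exactly half a repeat are rounded up as an assumption; a rounded repeat number of 0 or less is treated as the "amplicon without the repeat region" case as an assumption; and the sketch simply takes the peak with the highest fluorescence without checking that the other peaks are spaced by whole repeat lengths.

```python
import math

NULL_ALLELE = -2       # no amplicon, or amplicon beyond the usual size range
NO_REPEAT_ALLELE = -1  # amplicon detected but without the repeat region


def call_allele(peaks, offset_bp, repeat_unit_bp, oversized_on_gel=False):
    """Return the allele designation for one VNTR locus.

    peaks: list of (fragment_length_bp, fluorescence) tuples from fragment
    analysis; an empty list means that no amplicon was detected.
    oversized_on_gel: True if agarose gel electrophoresis showed an amplicon
    beyond the usual size range of the locus.
    """
    if not peaks or oversized_on_gel:
        return NULL_ALLELE
    # Among several (stutter) peaks, use the one with the highest fluorescence.
    length_bp, _ = max(peaks, key=lambda peak: peak[1])
    repeats = (length_bp - offset_bp) / repeat_unit_bp
    # Round partial repeats to the closest whole number (ties round up).
    rounded = math.floor(repeats + 0.5)
    if rounded <= 0:
        return NO_REPEAT_ALLELE
    return rounded


# Hypothetical main peak flanked by two weaker stutter peaks.
example_peaks = [(178.4, 1200), (184.5, 5400), (190.2, 900)]
print(call_allele(example_peaks, offset_bp=100.3, repeat_unit_bp=6))
```

Using negative sentinel values for the two special cases keeps them inside the allelic profile, which matches the statement that null alleles count toward the number of alleles at a locus.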
Index of diversity (ID) (34) and typeability were calculated by using EpiCompare 1.0 software (Ridom GmbH, Würzburg, Germany). A minimum spanning tree (MST) was generated by using SeqSphere software 0.9 β (Ridom GmbH). All MLVA profiles that differed at <2 alleles were grouped as an MLVA cluster. To determine the cluster-defining profile of clusters containing >2 MLVA profiles, the MST priority rule (that the profile with the highest number of single locus variants is chosen) was applied.
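The quantities in this paragraph can be sketched in Python as follows. This is an illustration only and does not reproduce EpiCompare or SeqSphere. It assumes that the index of diversity is Simpson's index in the Hunter-Gaston form commonly used for typing methods, that typeability is the proportion of alleles that are not null alleles, and that clusters are formed by chaining profiles whose pairwise allele difference does not exceed a chosen threshold (single linkage). The function names, the `max_differences` parameter, and the example profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations

NULL_ALLELE = -2


def index_of_diversity(types):
    """Simpson's index of diversity (Hunter-Gaston form) for a list of types."""
    n = len(types)
    if n < 2:
        raise ValueError("at least 2 typed strains are required")
    counts = Counter(types)
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def typeability(alleles):
    """Proportion of alleles at one locus that are not null alleles."""
    return sum(1 for a in alleles if a != NULL_ALLELE) / len(alleles)


def allele_differences(profile_a, profile_b):
    """Number of loci at which two MLVA profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def mlva_clusters(profiles, max_differences):
    """Group distinct profiles into clusters by single linkage (assumption)."""
    distinct = sorted(set(profiles))
    parent = {p: p for p in distinct}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for p, q in combinations(distinct, 2):
        if allele_differences(p, q) <= max_differences:
            parent[find(p)] = find(q)

    groups = {}
    for p in distinct:
        groups.setdefault(find(p), []).append(p)
    # A cluster needs at least 2 profiles; singletons stay unclustered.
    return [members for members in groups.values() if len(members) >= 2]


# Hypothetical 8-locus profiles; the threshold of 1 is chosen for illustration.
profiles = [
    (1, 2, 3, 4, 5, 6, 7, 8),
    (1, 2, 3, 4, 5, 6, 7, 9),
    (1, 2, 3, 4, 5, 6, 8, 9),
    (9, 9, 9, 9, 9, 9, 9, 9),
]
print(mlva_clusters(profiles, max_differences=1))
```

Because null alleles are ordinary values in a profile, they count toward both the index of diversity and the allele differences, in line with the rule stated under allele calling.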
Significance of associations of MLVA profiles or clusters comprising >4 strains with clinical outcome (HUS vs. non-HUS) was calculated by using a χ² test with Yates correction (EpiInfo 6 software; CDC) when appropriate. p values <0.05 were considered significant.
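For readers who want to reproduce this kind of test, a χ² test with Yates correction on a 2 × 2 table can be written with the Python standard library alone. The sketch below is not the EpiInfo implementation; the function name is hypothetical and the counts in the example are invented for illustration and are not data from this study. The p value uses the fact that, for 1 degree of freedom, the upper tail of the χ² distribution equals erfc(√(χ²/2)).

```python
import math


def chi_square_yates(a, b, c, d):
    """Yates-corrected chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Rows: strains inside vs. outside the profile or cluster of interest.
    Columns: HUS vs. non-HUS.
    Returns (chi-square statistic, p value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be > 0")
    # Continuity correction: shrink |ad - bc| by n/2, but not below 0.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 20 HUS and 5 non-HUS strains inside a cluster,
# 30 HUS and 45 non-HUS strains outside it.
chi2, p_value = chi_square_yates(20, 5, 30, 45)
print(chi2, p_value, p_value < 0.05)
```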
To determine whether these null alleles resulted from amplification failure (caused by mutations in primer-binding regions or by complete deletion of the VNTR region) or from insertion of fragments such as mobile genetic elements, which would produce larger fragments undetectable by capillary electrophoresis, the respective fragments were analyzed by using standard gel electrophoresis. In some cases, large fragments (>1.3 kb) were detected. Sequence analysis and an NCBI nucleotide BLAST search (http://blast.ncbi.nlm.nih.gov/Blast.cgi) of randomly selected samples indicated the presence of insertion sequence elements of the IS3 family. The typeability of different VNTR loci ranged from 55.9% to 100% (Table 1).
Table 1 (table body not recovered). Footnotes: … (15). TR loci are from Noller et al. (14). ‡Number is based on current EDL933 genome data. ORF encoding VNTR loci encoded either hypothetical proteins or proteins with unknown function. §Including null alleles. ¶Typeability determines the proportion of all alleles without null alleles. #Located on plasmid pO157 of reference strain EDL933.
All 61 strains in cluster 1 were SF EHEC O157:H−. The remaining 141 strains did not ferment sorbitol (Table 2, Figure). The phylogenetic relationship of the 202 EHEC O157 strains based on the 141 MLVA profiles is shown in an MST in the Figure. The reference strain EDL933, which is also included in the MST, shares its MLVA profile with 4 isolates from Germany obtained during 2007–2008.

Association of MLVA Profiles with HUS
To determine whether there was an association between MLVA profiles or clusters and the ability of these strains to cause HUS, we performed a significance test. The 2 most common MLVA profiles were significantly associated with HUS (p = 0.023). Testing for specific clusters resulted in a significant association of HUS with cluster 1 (p = 0.009) (Figure).

Discussion
Using 8 VNTR loci of the current PulseNet MLVA O157 protocol (23), we analyzed a large collection of 202 EHEC O157:H7/H− strains isolated over >2 decades in Germany to determine their molecular epidemiology. Of the 141 MLVA profiles detected, 81 were clustered into 19 groups of related profiles, as defined by the allele-difference threshold described under Data Analysis. The remaining 60 profiles were not clustered. Our data demonstrate a great diversity of EHEC O157:H7 associated with human diseases in Germany over the past 2 decades. The wide distribution of strains within the MST based on MLVA typing reflects frequent occurrence of genetic events outside the EHEC O157 core genome (Table 2). Clusters 1, 2, 4, 5, and 6 (Table 2) contained strains widespread over periods of 10–20 years. Only cluster 3 is defined by profiles starting from 2001, which indicates a later appearance than clusters 1, 2, 4, 5, and 6 (Table 2).
The 2 most frequently identified MLVA profiles are part of cluster 1, which indicates a consensus profile among SF EHEC O157:H− isolates over time within this cluster. The corresponding strains include strain 493/89, which was isolated during the first documented outbreak caused by SF EHEC O157:H− (25). All other isolates that exhibited the 2 most common MLVA profiles also fermented sorbitol, which identified strain 493/89 as a prototype of these strains. This finding corroborates the assumption of an epidemic bacterial population structure within a background population comprising a network between different genotypes, in which superimposed strains emerge from highly adaptive, ancestral genotypes and may persist for decades (35). Nodes from cluster 1, which represent the 2 most prevalent MLVA profiles, include strains from 1988–2008 and 1995–2008. This finding indicates a persistence of these successful clones, which supports this hypothesis. Moreover, evolutionary success and uniqueness of this SF clone was recently supported by whole genome single nucleotide polymorphism analysis, in which distinct branching of these clones was determined during evolution of the O157 serotype (22). Statistical analysis demonstrated that the 2 most common profiles and the entire cluster 1 are associated with HUS, which indicates that specific MLVA profiles are associated with severe disease. Cluster 1 comprised 61 of the strains and was distributed over more than a decade. Although not statistically significant, 10 of 11 isolates in cluster 2 were also associated with HUS. Despite these similarities, they exhibited different MLVA profiles (Table 2; Figure). Extensive heterogeneity of EHEC O157:H7, in contrast to conservation of SF EHEC O157:H−, could be related to observed differences in the nature of the reservoirs and vehicles for transmission.
In addition, the epidemiology of SF EHEC O157:H− infections differs markedly because these infections occur predominantly during cold (winter) months and in children <3 years of age (5). Moreover, although EHEC O157:H7 infections have zoonotic origins, SF EHEC O157:H− are rarely found in animals (36). Humans are plausibly the main reservoirs, as is the case with classical enteropathogenic E. coli and enteroinvasive E. coli. This relatively stable niche may lead to the conserved genome structure and high pathogenicity for the host (37).
Four strains isolated in 2007 and 2008 exhibited the same MLVA profile as the reference strain EDL933 isolated in 1982 in the United States (38) (Table 2, Figure). Among the 3,200 entries in the CDC MLVA database, the EDL933 MLVA profile was detected only during an outbreak in 1982 (E. Hyytiä-Trees, pers. comm.). There are 2 possible explanations for this phenomenon: either the finding is coincidental because of genetic changes in the O157 genome, or EDL933 shares a common MLVA profile with other strains. The presence of such common profiles is known, especially in foodborne pathogens and other monomorphic species (39), and is frequently seen with other typing techniques, such as PFGE.
Analysis of the number of alleles of different VNTRs produced results similar to those of a previous study (8). … an open reading frame did not influence the ID (Table 1). However, the frequency of null alleles differed markedly. A total of 98 (48.5%) of 202 strains exhibited null alleles in 4 of the 8 VNTR loci. Especially in VNTR-9 and VNTR-36, the frequency of null alleles was high (31.7% and 44.1%). Although null alleles were reported in other MLVA O157 studies (8,40), the high frequency of null alleles determined in our study might indicate a specific feature of EHEC O157 strains from central Europe or Germany. An explanation for the frequent occurrence might be that VNTR-36 and VNTR-9 are located in noncoding or hypothetical protein encoding regions of the EHEC genome (Table 1). Nevertheless, all strains had a high ID regarding the complete MLVA profile.
Our study had some limitations. Because of the limited number of isolates obtained during 1987–1995, clustering might be biased and a more year-specific clustering might be observable. However, cluster 1 represents 30.2% (61/202) of strains and was widespread during 1988–2008, which argues against this possibility and implies a certain genetic stability of such clusters over time. In contrast to phylogenetic studies based on whole genome sequencing data (22), we report a phylogeny based on 8 genetic loci that might be biased by larger recombinational events. However, all VNTR loci are >50 kb from the rfb-gnd segment, which was determined to be the only genomic region in EHEC O157 with a higher mutation rate (22).
The strains that could not be assigned to any MLVA cluster (66/202, 32.7%) are consistent with the assumption of a highly dynamic EHEC O157 genome. This finding likely indicates that genetic changes in E. coli lead to adaptation to a host-specific environment (in this case human), especially during pathogenesis and host-specific immune responses.
Applying MLVA to this highly diverse strain collection resulted in new insights into the phylogeny of EHEC O157 in Germany since these pathogens were first described in 1987. In addition to its already demonstrated ability to differentiate outbreak and sporadic case strains, MLVA of O157 emerged as a major typing tool that can further characterize EHEC O157 subpopulations and associated strains. This tool can be used for studying phylogenetic relationships and identifying successful clones.