Phylogenetic relationships of Shiga toxin-producing Escherichia coli isolated from Peruvian children

Received 30 September 2010
Accepted 28 January 2011

Instituto de Medicina Tropical Alexander von Humboldt, Universidad Peruana Cayetano Heredia, Lima, Peru
University of Texas School of Public Health, Houston, USA
Centre de Recerca en Salut Internacional de Barcelona, Hospital Clinic/Institut d’Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
CIBERESP, Barcelona, Spain
US Food and Drug Administration, Laurel, Maryland, USA
Área de Microbiología Molecular, Centro de Investigación Biomédica La Rioja, Logroño, Spain
Universidad Peruana Cayetano Heredia, Lima, Peru
Instituto de Investigación Nutricional, Lima, Peru
Escuela de Medicina, Universidad Peruana de Ciencias Aplicadas, Lima, Peru
Universidad Nacional Mayor de San Marcos, Lima, Peru
Instituto Nacional de Salud del Niño, Lima, Peru
US Naval Medical Research Center Detachment, Department of Bacteriology, Lima, Peru
E. coli Reference Center, Department of Veterinary and Biomedical Sciences, Pennsylvania State University, Pennsylvania, USA

Although human STEC strains belong to a large number of serotypes, most outbreaks and sporadic cases of haemorrhagic colitis and HUS are caused by serotype O157 : H7. As non-O157 STEC strains are more prevalent in animals and as contaminants in foods, humans are probably exposed more often to these strains. STEC serogroups have been classified into five seropathotypes (A-E) according to incidence and association with HUS and outbreaks (Karmali et al., 2003). STEC can be classified into four phylogenetic groups (B1, A, D and B2) (Clermont et al., 2000; Escobar-Páramo et al., 2004; Girardeau et al., 2005). Based on multilocus sequence typing (MLST), Whittam and co-workers studied the clonal relationships of STEC strains (STEC Reference Center, http://www.shigatox.net/stec/index.html). Two EHEC clonal groups and 11 STEC groups have been identified.
In our experience, HUS is common in Peru. We reviewed a retrospective case series of patients with HUS admitted over the past 10 years to one paediatric hospital in Lima; however, STEC was not adequately sought in Peruvian HUS patients during that period, as only routine stool cultures were performed (unpublished data). There is little information on the prevalence, virulence factors and phylogenetic distribution of STEC strains in Peru. The aims of this study were to: (i) determine the prevalence of STEC in diarrhoea and control samples from Peruvian children; (ii) determine the distribution of critical virulence factors (stx1, stx2, eae, ehxA and astA); and (iii) determine the phylogenetic distribution (by MLST and PFGE) of the isolated STEC strains.

METHODS
Bacterial strains. We determined the prevalence of STEC in 3219 samples from children with diarrhoea and 2695 samples from healthy controls without diarrhoea from four prospective cohort studies conducted previously in 2212 Peruvian children aged <36 months. All studies were in the community setting: three in peri-urban communities of Lima [Villa el Salvador (N. Zavaleta, Instituto de Investigación Nutricional), Chorrillos (Ochoa et al., 2009) and Independencia (E. Chea-Woo, Universidad Peruana Cayetano Heredia)]; and one in the Andean region of the country [Huaraz (C. F. Lanata, Instituto de Investigación Nutricional)] (Table 1). STEC strains were identified by the presence of stx1, stx2 and eae using a previously validated multiplex real-time PCR system (Guion et al., 2008). For all studies, five lactose-positive colonies isolated from MacConkey plates were used for the PCR assay. The STEC strain W147 (stx1+ stx2+ eae+ ehxA+), provided by Dr C. Torres (Universidad de Rioja, Spain), was used as a positive control.
Detection of virulence factors. One stx1- and/or stx2-positive colony per patient was tested to identify the presence of virulence genes. The sequences of the primers and amplicon sizes are described in Table 2. PCR for the other virulence genes (ehxA and astA) was performed in a 25 µl reaction mixture containing 2.5 µl each dNTP (2.5 mM; Bioline), 1.5 µl 50 mM MgCl2, 0.5 µl each primer (10 µM; Isogen Life Science), 2.5 µl 10× NH4 buffer (Bioline), 1.5 U Biotaq DNA Polymerase (Bioline) and 5 µl DNA template. For all amplification reactions, the mixture was heated to 94 °C for 10 min prior to thermocycling (iCycler; Bio-Rad). The mixture was held at 72 °C for 7 min after the final cycle before cooling at −20 °C. Amplified products were analysed by 1.5 % agarose gel electrophoresis and visualized by staining with ethidium bromide.
Serotyping. Serotyping was performed at the E. coli Reference Center (Pennsylvania State University, PA, USA) for O (Orskov et al., 1977) and H (Machado et al., 2000) antigen typing. STEC strains were assigned to one of the five seropathotypes (A-E), as described previously (Karmali et al., 2003).
EHEC haemolysin production. EHEC haemolysin production was detected using blood agar base (Difco) supplemented with 10 mM CaCl2 and 5 % defibrinated sheep blood. Plates were incubated at 37 °C and examined after 24 and 48 h for zones of haemolysis around colonies (Vieira et al., 2001).
Clermont's phylogenetic group determination. STEC strains were assigned to Clermont's phylogenetic groups according to the presence or absence of the genes chuA and yjaA and the DNA fragment TspE4.C2 (Clermont et al., 2000).
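The assignment rule can be written as a short decision tree. The sketch below follows the dichotomous key published by Clermont et al. (2000); the function name and the boolean inputs (PCR positive or negative for each marker) are illustrative assumptions rather than part of the study's workflow.

```python
def clermont_group(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Assign a phylogenetic group (A, B1, B2 or D) from triplex PCR results.

    Decision key of Clermont et al. (2000):
      chuA positive -> yjaA positive: B2, yjaA negative: D
      chuA negative -> TspE4.C2 positive: B1, TspE4.C2 negative: A
    """
    if chuA:
        return "B2" if yjaA else "D"
    return "B1" if tspE4C2 else "A"


# Hypothetical strain that is chuA-negative and TspE4.C2-positive
print(clermont_group(chuA=False, yjaA=False, tspE4C2=True))
```

Note that yjaA is only informative for chuA-positive strains and TspE4.C2 only for chuA-negative strains, which is why each branch ignores the third marker.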
MLST. MLST was performed on seven conserved housekeeping genes (aspC, clpX, fadD, icdA, lysP, mdh and uidA) as described elsewhere (http://www.shigatox.net/mlst). PCR products were purified using a Wizard SV Gel and PCR Clean-Up System (Promega). Sequencing was performed by Macrogen using an automatic DNA 3730xl sequencer (Applied Biosystems), and the sequences were concatenated for phylogenetic analyses. Strains belonging to the same ST were considered to be the same clone; one member of each ST was used in the phylogenetic analyses.
Phylogenetic analyses. The MLST sequences of the strains were combined with those from 33 published E. coli and Shigella species genomes for comparison (see Supplementary Table S1, available in JMM Online). Sequences were aligned by CLUSTAL W using the MEGALIGN module of the Lasergene software (DNASTAR). Neighbour-joining trees were constructed using the Kimura two-parameter model of nucleotide substitution with MEGA4 software (Tamura et al., 2007), and the inferred phylogenies were each tested with 500 bootstrap replications. Phylogenetic network analysis was conducted with the SplitsTree 4 program (Huson & Bryant, 2006) using the neighbour-net algorithm (Bryant & Moulton, 2004) and untransformed distances (p-distances). The Φw recombination test (Bruen et al., 2006) as implemented by SplitsTree 4 was used to distinguish recurrent mutation from recombination in generating genotypic diversity. The numbers of synonymous substitutions per synonymous site (dS) and non-synonymous substitutions per non-synonymous site (dN) were estimated by the modified Nei-Gojobori method using MEGA4. Allelic sequences were fitted to a nucleotide substitution model using the Datamonkey website (http://www.datamonkey.org/), and the single likelihood ancestor counting method was used to fit a codon model to detect selection on individual codons (Pond & Frost, 2005).
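The two distance measures named above differ only in how observed differences are corrected. As a minimal sketch, assuming two pre-aligned sequences of equal length, the following computes the untransformed p-distance and the Kimura two-parameter distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function names are illustrative; this is not the MEGA4 or SplitsTree implementation, and sites with gaps or ambiguity codes are simply skipped here.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
VALID = set("ACGT")


def count_substitutions(seq1, seq2):
    """Return (sites compared, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return compared, transitions, transversions


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ (untransformed distance)."""
    n, ts, tv = count_substitutions(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance; undefined (math error) for saturated pairs."""
    n, ts, tv = count_substitutions(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy 8 nt alignment with a single A-to-G transition
print(p_distance("ACGTACGT", "GCGTACGT"), k2p_distance("ACGTACGT", "GCGTACGT"))
```

For closely related sequences, such as MLST alleles of a single species, the two measures are nearly identical; the correction matters mainly as divergence grows.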
PFGE. Preparation of genomic DNA and PFGE were performed as described previously (Gautom, 1997). Samples were digested with 40 U XbaI (Promega), and DNA fragments were resolved in 1 % agarose gels using a CHEF-DR-II system (Bio-Rad Laboratories). Lambda concatemers (New England Biolabs) with a molecular size range of 50-1000 kb were used as DNA size markers. Evaluation of PFGE profiles for similarity was performed using InfoQuest FP v.5 software (Bio-Rad). A UPGMA tree was constructed using Dice similarity indices, complete linkage, an optimization of 1 % and a position tolerance of 1.3 % (Beutin et al., 2005).
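The Dice index used for comparing banding patterns is twice the number of shared bands divided by the total number of bands in the two lanes. The sketch below is a simplified illustration, not the InfoQuest FP algorithm: bands are given as hypothetical fragment sizes in kb, and the position tolerance is applied here as a fraction of band size, whereas the software applies its tolerance to normalized gel positions.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.013):
    """Dice coefficient between two band patterns (lists of fragment sizes).

    Two bands are counted as shared when their sizes differ by no more than
    `tolerance` (relative to the larger band). Each band is matched at most once.
    """
    a, b = sorted(bands_a), sorted(bands_b)
    if not a and not b:
        raise ValueError("at least one pattern must contain bands")
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # band in pattern A has no partner
        else:
            j += 1  # band in pattern B has no partner
    return 2 * shared / (len(a) + len(b))


# Hypothetical patterns sharing two of three bands
print(dice_similarity([100, 200, 300], [100, 200, 400]))
```

A matrix of such pairwise values is what a UPGMA or complete-linkage clustering step would take as input.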

Prevalence
We analysed 5914 samples in total. The prevalence of STEC was 0.4 % (14/3219) in diarrhoeal samples and 0.6 % (15/2695) in healthy controls (Table 1). The prevalence of STEC was significantly lower than that of other pathogens. The mean prevalence of the other isolated pathogens (using the same PCR methodology) was: enteroaggregative E. coli, 9.9 %; enteropathogenic E. coli, 8.5 %; enterotoxigenic E. coli, 6.9 %; and diffusely adherent E. coli, 4.8 % (T. J. Ochoa, A. Llanos, J. Lee and F. Lopez, unpublished data). To our knowledge, this is the first study of the prevalence of STEC in Peruvian children. This is important because STEC is not routinely looked for in clinical laboratories, even when the child presents with bloody diarrhoea or HUS. The small number of isolated STEC strains was one of the main limitations of this study. The age of the STEC-infected children was 4-36 months (mean 15 months). Among the STEC-positive diarrhoea samples, one was bloody (VES 230-5, isolation date 2 June 2004). Of the 29 STEC strains, 20 were available for further analysis (by MLST and PFGE).
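The prevalence figures can be reproduced directly from the counts given above. The following sketch also attaches a 95 % Wilson score interval to each proportion; the interval is an illustrative addition that is not reported in the study, and the function names are assumptions.

```python
import math


def prevalence(positives, total):
    """Prevalence as a percentage."""
    return 100 * positives / total


def wilson_interval(positives, total, z=1.96):
    """Wilson score interval (as percentages) for a binomial proportion."""
    p = positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return 100 * (centre - half), 100 * (centre + half)


# Counts taken from the text: STEC-positive samples / samples tested
groups = {"diarrhoea": (14, 3219), "control": (15, 2695)}
for label, (pos, n) in groups.items():
    low, high = wilson_interval(pos, n)
    print(f"{label}: {prevalence(pos, n):.1f} % (95 % CI {low:.2f}-{high:.2f} %)")
```

With so few positive samples, the intervals are wide relative to the point estimates, which is consistent with the small number of isolates being noted as a limitation.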

Distribution of virulence genes
Analysis of the frequency of virulence factors and clonal distribution of STEC is pivotal to improve our understanding of epidemiological characteristics of pathogens that pose a risk to public health. Epidemiological studies, together with in vivo and in vitro experiments, have revealed that stx2 (and its variants) is the most important virulence factor associated with severe human disease. STEC producing stx2 is more commonly associated with serious disease than isolates producing stx1 or stx1 plus stx2 (Boerlin et al., 1999; Louise & Obrig, 1995; Paton & Paton, 1998). In the current study, the majority of strains were stx1-producing strains (24/29, 83 %); only 5/29 (17 %) strains carried stx2. This fact presumably explains the mild illness found in these infections. There were too few stx2-positive isolates to assess its relationship to pathogenesis.
Severe diarrhoea (especially haemorrhagic colitis) and HUS are closely associated with STEC types carrying the eae gene for intimin (Boerlin et al., 1999), although a large number of locus of enterocyte effacement-negative STEC have also caused human disease (Bettelheim, 2007). In this study, the eae gene was present in 72 % (21/29) of the STEC strains: 19 strains were stx1+ eae+, five were stx1+ eae−, three were stx2+ eae− and two were stx2+ eae+. The frequency of eae+ STEC was similar in diarrhoea and control samples.
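The genotype counts in this paragraph are consistent with the stx1 and stx2 totals given earlier, as the short tally below shows. The counts are those stated in the text; the tuple labels are only an illustrative encoding.

```python
from collections import Counter

# Genotype counts as stated in the text (toxin gene, eae status)
genotypes = Counter({
    ("stx1", "eae+"): 19,
    ("stx1", "eae-"): 5,
    ("stx2", "eae-"): 3,
    ("stx2", "eae+"): 2,
})

total = sum(genotypes.values())
eae_positive = sum(n for (_, eae), n in genotypes.items() if eae == "eae+")
stx1_total = sum(n for (stx, _), n in genotypes.items() if stx == "stx1")
stx2_total = sum(n for (stx, _), n in genotypes.items() if stx == "stx2")


def percent(count, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * count / whole)


print(f"eae+: {eae_positive}/{total} ({percent(eae_positive, total)} %)")
print(f"stx1: {stx1_total}/{total} ({percent(stx1_total, total)} %)")
print(f"stx2: {stx2_total}/{total} ({percent(stx2_total, total)} %)")
```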

MLST analysis
MLST loci were sequenced in 19 STEC strains. For phylogenetic analyses, the sequenced internal fragments of the seven housekeeping genes were concatenated to yield 3732 nt. MLST analysis resolved a mean of 19.4 variable nucleotide sites per locus, which defined a number of alleles, ranging from five to eight (Table 4). The dS value ranged from 3.57 % for mdh to 6.64 % for fadD, with a mean of 5.03 synonymous substitutions per 100 synonymous sites (Table 4). The dN value per 100 non-synonymous sites was generally an order of magnitude lower than that of dS, ranging from 0.00 for aspC, clpX, fadD, icdA and lysP to 0.41 for uidA. Tests for natural selection operating on the allelic variation at each MLST locus based on the single likelihood ancestor counting method found no individual sites to be under significant negative or positive selection, indicating that the MLST loci are evolving neutrally.
The distinct combinations of alleles across the MLST loci were used to define 13 multilocus genotypes or STs among the 19 strains. The 13 STs differed on average at 1.2 % and 0.2 % of the nucleotide and amino acid sites, respectively. ST106 was the most common multilocus genotype (5/19, 26 % of strains) (Fig. 1). In the phylogenetic tree based on the genetic relationships of the STEC (Fig. 2a), we observed that our strains were closely related to others of the same serotype from other studies. In addition, STEC strains ST106, ST896 and ST898 (EHEC 2) were related in the network, similar to the results observed in the tree based on the PFGE results (Fig. 1). The same results were observed in clonal group STEC 12.
The correlation observed between Whittam's clonal groups, Clermont's phylogenetic groups and some of the serotypes in this study is of interest. EHEC 2 contains serotypes O26 : H11, O111 : H8 (O111 strains are often non-motile or of other H types) and O145 : H11, which are classified as seropathotype B and phylogenetic group B1, as observed by others (Karmali et al., 2003; Ziebell et al., 2008).
The splits network (Fig. 2b) revealed several parallel paths indicative of the presence of phylogenetic incompatibilities in the divergence of clones. Such incompatibilities could arise from recurrent mutation or recombination in MLST loci. To detect recombination, the Φw test, which discriminates between recurrent mutation and recombination (Bruen et al., 2006), was used. When applied to the concatenated sequences of the 13 STs, the Φw test found significant evidence of recombination (Table 4). Evidence for recombination was also detected among the alleles of fadD and icdA (Table 4).

PFGE
PFGE typing of 20 STEC strains resulted in 19 pulsed-field patterns. Comparison of the patterns revealed 11 clusters (I-XI) with a general similarity of 70 % in the UPGMA tree. Each cluster included strains belonging to different serotypes (Fig. 1), with the exception of cluster VI, which exclusively contained seven STEC of clonal group EHEC 2, phylogenetic group B1 and seropathotype B. In addition, the strains of pulsed-field pattern 1 (cluster I) showed the same pattern, belonging to the same clonal group of STEC 2. Most of the strains in this study were from children in separate geographical areas taken on different dates, suggesting that these pathogenic clones may be widespread in Peru.
MLST and PFGE were performed to establish the clonal relationships between representative STEC strains in this study. Both techniques identified strains that shared similar clonal origins (PFGE group VI and EHEC 2; Fig. 1). PFGE was more discriminative than MLST, as each ST was represented by more than one pulsed-field pattern.
Differences between MLST and PFGE may be the result of the type of analysis. While PFGE detects multiple differences in the genome, MLST analyses only small fragments of conserved metabolic genes. Therefore, events such as the recent acquisition of virulence factors cannot be detected by MLST; genome sequencing was not carried out in this study.
In summary, STEC prevalence was low in children with diarrhoea in the community setting in Peru. Strains were phylogenetically diverse and associated with mild infections. There was a good correlation between the seropathotypes, clonal groups, PFGE groups and Clermont's phylogenetic groups. However, additional studies are needed in Peruvian children with bloody diarrhoea and HUS to determine the virulence genes and phylogenetic characteristics of more virulent strains.
Clermont's phylogenetic group distribution
Recent phylogenetic studies have indicated that STEC/EHEC strains fall principally into phylogenetic groups A,

Fig. 2. Phylogenetic relationships among 13 STEC STs. (a) Unrooted phylogenetic tree constructed by a neighbour-joining algorithm based on the Kimura two-parameter model of nucleotide substitution. Bootstrap values greater than 75 % based on 500 replications are given at internal nodes. The serotypes for the published E. coli and Shigella species genome strains are given in parentheses. (b) Phylogenetic (splits) network based on a neighbour-net algorithm using a p-distance matrix. The 13 STEC STs are indicated by filled circles in (a) and (b).

Table 1. STEC prevalence in four cohort studies in Peruvian children
ND, Not determined: control samples were not collected in this study.

DNA sequence analyses. The sequences were reviewed and edited by visual inspection using Chromas Lite v.2.01 software (Technelysium Pty). After editing, the sequences were exported to BioEdit v.7.0.9 (http://www.mbio.ncsu.edu/BioEdit/BioEdit.html) and aligned with the CLUSTAL W module. Sequences differing by a single nucleotide were classified as different alleles. The different alleles of each housekeeping gene were numbered, and allelic profiles or sequence types (STs) were determined based on the seven studied loci. ST designations were assigned in accordance with the numbering system used by the STEC Center at Michigan State University (MI, USA; http://www.shigatox.net/ecmlst/cgi-bin/index).
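The allele and ST assignment described in the DNA sequence analyses paragraph amounts to numbering distinct sequences at each locus and then numbering distinct seven-locus profiles. The sketch below illustrates that logic under stated assumptions: the input layout, function names and the locally generated numbers are hypothetical, and they do not correspond to the official allele or ST designations of the STEC Center database.

```python
LOCI = ("aspC", "clpX", "fadD", "icdA", "lysP", "mdh", "uidA")


def assign_alleles(strains):
    """Map each strain to its allelic profile (one allele number per locus).

    `strains` maps a strain name to a dict of locus -> aligned sequence.
    Any nucleotide difference at a locus defines a new allele, numbered in
    order of first appearance (local numbering, not database numbering).
    """
    allele_ids = {locus: {} for locus in LOCI}
    profiles = {}
    for name, sequences in strains.items():
        profile = []
        for locus in LOCI:
            seq = sequences[locus].upper()
            known = allele_ids[locus]
            if seq not in known:
                known[seq] = len(known) + 1
            profile.append(known[seq])
        profiles[name] = tuple(profile)
    return profiles


def assign_sequence_types(profiles):
    """Give every distinct allelic profile a local ST number."""
    st_ids = {}
    sequence_types = {}
    for name, profile in profiles.items():
        if profile not in st_ids:
            st_ids[profile] = len(st_ids) + 1
        sequence_types[name] = st_ids[profile]
    return sequence_types


# Toy example: strain_3 differs from the others by one nucleotide at mdh
base = {locus: "ACGTACGT" for locus in LOCI}
strains = {
    "strain_1": base,
    "strain_2": dict(base),
    "strain_3": dict(base, mdh="ACGTACGA"),
}
print(assign_sequence_types(assign_alleles(strains)))
```

In practice, the profiles would be looked up against the reference database so that known allele combinations receive their established ST numbers and only new combinations receive new ones.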

Table 2. Sequence of primers used in this study
*F, Forward primer; R, reverse primer.

Table 3. Distribution of serotypes, virulence genes and phylogenetic groups of the STEC strains
ND, Not determined; +, positive for the gene; −, negative for the gene; NT, non-typable.
*H+, Positive reaction: the group is novel and does not match known reference standards; w, weak reaction.
†Proposed by Karmali et al. (2003) (A-E).
‡Proposed by Clermont et al. (2000) (A, B1, B2 and D).

Table 4. Sequence variation in seven MLST loci from 13 STs
*P value from the Φw test for recombination.
†Concatenated sequence data for all 13 STs.