An untypeable enterotoxigenic Escherichia coli represents one of the dominant types causing human disease

Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrhoea in children below 5 years of age in endemic areas, and is a primary cause of diarrhoea in travellers visiting developing countries. Epidemiological analysis of E. coli pathovars is traditionally carried out based on the results of serotyping. However, genomic analysis of a global ETEC collection of 362 isolates taken from patients revealed nine novel O-antigen biosynthesis gene clusters that were previously unrecognized, and have collectively been called unclassified. When put in the context of all isolates sequenced, one of the novel O-genotypes, OgN5, was found to be the second most common ETEC O-genotype causing disease, after O6, in a globally representative ETEC collection. It is also clear that ETEC OgN5 isolates have spread globally. These novel O-genotypes have now been included in our comprehensive O-genotyping scheme, and can be detected using a PCR-based method and an in silico typing method. This will assist in epidemiological studies, as well as in ETEC vaccine development.


INTRODUCTION
The first diarrhoeal illness that infants often experience in endemic areas is caused by enterotoxigenic Escherichia coli (ETEC) [1]. In 2010, annual mortality from illness due to ETEC was estimated at 157 000 deaths (9 % of all deaths attributed to diarrhoea) and approximately 1 % of all deaths in children 28 days to 5 years of age [2]. Additionally, ETEC is a primary cause of diarrhoea in travellers visiting developing countries. In our previous study, we sequenced 362 globally representative ETEC isolates collected between 1980 and 2011 from 20 countries, including isolates from children and adults in endemic areas, as well as from travellers visiting such areas [3]. The majority of the isolates were collected from patients with diarrhoea. Genome-wide analysis showed that, contrary to previous understanding, there are long-term stable associations of ETEC lineages with specific virulence factors, such as plasmid-encoded heat-labile toxin (LT) and/or heat-stable toxin (ST; including two subtypes, STh and STp), and colonization factors (CFs), and that these lineages are globally distributed.
O-serogrouping remains the gold standard for the subtyping of E. coli isolates, especially pathogenic E. coli, for taxonomical and epidemiological studies. Most of what we know about E. coli prevalence, outbreaks and surveillance is described in terms of the O-serogroup. Recent sequence analyses have shown that phenotypic O-serogroup diversification can be correlated with differences in the gene content and genetic diversity of the O-antigen biosynthesis gene cluster (O-AGC) located on the chromosome [4]. In particular, O-antigen processing genes located on the O-AGCs, such as wzx (encoding the O-antigen flippase), wzy (encoding the O-antigen polymerase), and wzm and wzt (encoding components of the ABC transporter pathway), are highly variable in sequence, and can be used as gene markers for the identification of O-serogroups via molecular approaches [4]. By applying an in silico BLAST analysis using a wzx/wzy and wzm/wzt sequence set extracted from O1 to O187 O-AGCs, we subtyped our 362 global ETEC isolates [3] into 48 O-genotypes, of which the top 6 were O6 (n=38), O25 (n=24), O27 (n=18), O114 (n=17), and O115 and O159 (n=16 each). In addition to the ETEC isolates classified into 48 known O-genotypes, 55 isolates carried wzx/wzy genes that showed <50 % or no sequence identity to the previously defined sequences. These ETEC isolates were categorized as O-genotype untypeable (OgUT), but carried novel O-AGCs.
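The in silico assignment described above can be illustrated with a minimal Python sketch. It assumes that BLAST hits have already been parsed into (O-genotype, percent identity) pairs for each isolate. The function names, the example isolates and the 97 % identity cutoff are hypothetical and are not taken from this study, which states only that isolates with <50 % or no identity to known wzx/wzy sequences were categorized as OgUT.

```python
from collections import Counter

UNTYPEABLE = "OgUT"  # label used in the text for O-genotype untypeable isolates


def assign_o_genotype(hits, min_identity=97.0):
    """Return the O-genotype of the best marker hit, or OgUT if none passes.

    hits: iterable of (o_genotype, percent_identity) tuples, for example
    parsed from tabular BLAST output against wzx/wzy or wzm/wzt references.
    min_identity: hypothetical cutoff, chosen here for illustration only.
    """
    passing = [(identity, og) for og, identity in hits if identity >= min_identity]
    if not passing:
        return UNTYPEABLE
    # The highest-identity hit wins; ties are broken by genotype name.
    return max(passing)[1]


def tally(isolate_hits, min_identity=97.0):
    """Count isolates per assigned O-genotype."""
    return Counter(
        assign_o_genotype(hits, min_identity) for hits in isolate_hits.values()
    )


# Made-up example data (not from the study).
example = {
    "isolate_A": [("O6", 99.8), ("O25", 61.0)],
    "isolate_B": [("O25", 99.1)],
    "isolate_C": [("O6", 45.2)],  # weak hit only, so untypeable
    "isolate_D": [],              # no hit at all, so untypeable
    "isolate_E": [("O6", 100.0)],
}
counts = tally(example)
for genotype, n in counts.most_common():
    print(genotype, n)
```

Isolates that end up as OgUT in such a tally are the candidates whose O-AGCs would then be extracted and compared, as was done for the 55 isolates described above.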
In this study, we focused on characterizing the OgUT ETEC isolates detected in our previous study [3]. Such ETEC isolates may not be recognized by public-health surveillance, because they cannot be assigned to any known O-genotype and/or O-serogroup. To determine the relative contribution of OgUT ETEC isolates, we defined the novel O-genotypes of these ETEC isolates and estimated their overall contribution to disease by screening our global collection of ETEC.

Sequence comparisons
Phylogenetic trees of wzx and wzy were constructed with the neighbour-joining algorithm in MEGA5 software [9], following multiple alignment of the nucleotide sequences with the CLUSTAL W program [10].
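For readers unfamiliar with neighbour joining, the following Python sketch shows the core of the algorithm on pre-aligned sequences. It is not the MEGA5 implementation. The p-distance measure, the function names and the toy sequences are assumptions made for illustration, and branch lengths are omitted so that only the tree topology is returned.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are ignored (an assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbour_joining(aligned):
    """Return an unrooted Newick topology (no branch lengths).

    aligned: dict mapping taxon name -> aligned sequence (at least 3 taxa).
    """
    nodes = list(aligned)
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(nodes, 2)
    }
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other nodes.
        r = {i: sum(dist[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = f"({a},{b})"
        for k in nodes:
            if k not in (a, b):
                # Distance from the new internal node to every remaining node.
                dist[frozenset((new, k))] = 0.5 * (
                    dist[frozenset((a, k))]
                    + dist[frozenset((b, k))]
                    - dist[frozenset((a, b))]
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    return "(" + ",".join(nodes) + ");"


# Toy alignment (made up): A/B and C/D each differ at a single site.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TCGAACGGAC",
    "D": "TCGAACGGAA",
}
newick = neighbour_joining(toy)
print(newick)
```

In practice a dedicated package would also estimate branch lengths and bootstrap support, both of which this sketch leaves out.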

IMPACT STATEMENT
We identified nine novel O-genotypes (OgN) in a global collection of enterotoxigenic Escherichia coli (ETEC) isolates. The novel O-genotype OgN5 was found to be the second most common ETEC O-genotype globally, although no prior information was available regarding its contribution to the burden of disease. To gain more information about trends in ETEC OgN epidemiology, further studies of global OgN isolates are needed. The PCR method described in this study and an in silico typing method may aid the surveillance and monitoring of the OgN groups.
were observed. No significant association between lineages and geographical origins was observed (see Table S1).
Six of eight OgN3 strains carried the fliC of H45, and were confirmed positive for STh and two CFs, CFA/I (a fimbria) and CS21 (a type IV pilus) (see Table S1). All OgN3 : H45 isolates were phylogenetically grouped into L6 of phylogroup A, and originated from Central and South America, including Mexico, Guatemala and Argentina (see Fig. S1, Table S1). Six of eight OSB16 strains carried the fliC of H32, and were confirmed positive for LT, LT+STh or LT+STp, and negative for all known CFs (see Table S1). All OSB16 : H32 and OSB16 : H2 isolates were phylogenetically grouped into L13 of phylogroup A, and originated from Guatemala and Argentina (see Fig. S1, Table S1). Additionally, three OgN13, two OgN15 and single OgN2, OgN4, OgN14, OgN16 and OgN17 strains were observed in our ETEC collection (see Table S1).
To aid in the identification of the predominant novel ETEC O-AGCs, we designed specific PCR primers for identifying OgN5, OgN3 and OSB16 targeting unique sequences on wzy genes (Table 1, Fig. 4), and their specificities were confirmed by using positive-control strains and all O-serogroup reference strains (O1-O188).
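A simple way to pre-screen primer pairs of this kind against genome sequences is an in silico PCR check, sketched below in Python. The primer and template sequences are placeholders, not the primers listed in Table 1, and the function names and the maximum product size are assumptions. The sketch requires exact primer matches and searches one strand only, which is stricter than a real PCR.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, max_product=3000):
    """Return the predicted amplicon length, or None if no product is expected.

    The forward primer must match the given strand exactly, and the reverse
    primer must match the opposite strand downstream of the forward primer.
    max_product is a hypothetical size limit.
    """
    template = template.upper()
    forward = forward.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # On the given strand, the reverse primer site appears as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    size = end + len(site) - start
    return size if size <= max_product else None


# Placeholder sequences for illustration only.
template = "TTT" + "ACGTAC" + "G" * 10 + "TTGCAG" + "CCC"
forward_primer = "ACGTAC"
reverse_primer = reverse_complement("TTGCAG")

print(in_silico_pcr(template, forward_primer, reverse_primer))
print(in_silico_pcr("A" * 40, forward_primer, reverse_primer))
```

A check of this kind complements, but does not replace, the wet-lab specificity testing against positive-control and reference strains described above.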

DISCUSSION
The results of the current study encapsulate the problems of using a limited number of phenotypic markers for typing and tracking pathogenic bacteria of importance to human or animal health. They also highlight how, with whole-genome sequencing, there are still major discoveries to be made in identifying the isolates, lineages or types that are responsible for a significant burden of reported disease, yet are untypeable by traditional methods. Since the O-serogroup is still used as a general marker for the subtyping of E. coli isolates, the presence, emergence or spread of O-serogroup untypeable pathogenic E. coli groups can go unnoticed epidemiologically. In this study, we extracted O-AGC sequences from 362 ETEC genomes and, by comparative genomics, revealed nine novel O-genotypes that were previously unrecognized by the standard methods used for serotype detection and surveillance.
In recent surveillance studies, O6 has been shown to be the most common ETEC O-serogroup (mostly the O6 : H16 serotype) associated with diarrhoeal patients in Egypt [11], Bolivia [12], Bangladesh [13], China [14] and Japan [15]. The O6 serogroup is mainly found in isolates with the CF profiles CS1+CS2 and CS1+CS3, which comprise three of the most prevalent CFs identified in clinical isolates [16]. In our ETEC collection, the most prevalent O-serogroup was O6. O25 is also a well-known ETEC O-serogroup frequently isolated from ETEC patients, such as in Bosnia and Herzegovina [17], Bangladesh [13], China [14] and Japan [15]. Interestingly, in our representative ETEC collection, the novel O-genotype OgN5 was the second most common ETEC O-genotype, after O6 and before O25. Moreover, ETEC [...] Novel O-genotype ETEC isolates were found in several distinct phylogenetic lineages, and sequences of wzx/wzy from O-AGCs were highly conserved within each O-genotype, indicating that these O-AGCs have been spread across this species by horizontal gene transfer. Phylogenetic analysis in our previous study [3] [...]
[Figure: number of isolates per O-genotype in the ETEC collection. Values recoverable from the extracted text: not identified (7), O174 (5), O9 (6), OC15 (7), OgSB16 (8), OgN3 (8), O64 (8), O15 (8), O8 (11), O169 (11), O148 (11), O128 (11), O167 (13), O78 (14), O159 (value not recoverable).]
[...] and their associated lineages will continue to be discovered. These novel O-genotypes identified in this study have now been included in our comprehensive O-genotyping scheme and can be detected using the PCR-based method described in this study and by the in silico typing method.

Conflicts of interest
The authors declare that there are no conflicts of interest.