Remarkable genomic diversity among Escherichia isolates recovered from healthy chickens

The genus Escherichia has been extensively studied and it is known to encompass a range of commensal and pathogenic bacteria that primarily inhabit the gastrointestinal tracts of warm-blooded vertebrates. However, the presence of E. coli as a model organism and potential pathogen has diverted attention away from commensal strains and other species in the genus. To investigate the diversity of Escherichia in healthy chickens, we collected fecal samples from antibiotic-free Lohmann Brown layer hens and determined the genome sequences of 100 isolates, 81 of which were indistinguishable at the HC0 level of the Hierarchical Clustering of Core Genome Multi-Locus Sequence Typing scheme. Despite initial selection on CHROMagar Orientation medium, which is considered selective for E. coli, in silico phylotyping and core genome single nucleotide polymorphism analysis revealed the presence of at least one representative of all major clades of Escherichia, except for E. albertii, Shigella, and E. coli phylogroup B2 and cryptic clade I. The most frequent phylogenomic groups were E. coli phylogroups A and B1 and E. ruysiae (clades III and IV). We compiled a collection of reference strains isolated from avian sources (predominantly chicken), representing every Escherichia phylogroup and species, and used it to confirm the phylogeny and diversity of our isolates. Overall, the isolates carried low numbers of the virulence and antibiotic resistance genes typically seen in avian pathogenic E. coli. Notably, the clades not recovered are ones that have been most strongly associated with virulence by other studies.


INTRODUCTION
Members of the genus Escherichia are common inhabitants of the gastrointestinal tracts of warm-blooded animals (Tenaillon et al., 2010;Jang et al., 2017). Although many isolates are believed to be beneficial or harmless components of a healthy microbiome, the genus has received a large share of research attention as some members-notably certain strains of Escherichia coli-cause disease and/or carry transferable antibiotic resistance genes (Kaper, Nataro & Mobley, 2004;Cummins et al., 2019).
Much effort has been expended to develop techniques for identifying and categorizing members of Escherichia and understanding the population structure of the genus. Classical phenotypic techniques identified species by morphology and biochemical tests and divided strains into pathotypes, based on site of infection, e.g., uropathogenic E. coli (UPEC) causing urinary tract infections; and serotypes, based on antibody recognition of variation in lipopolysaccharides, flagella and fimbriae (Fratamico et al., 2016). These methods have gradually given way to genome-based approaches classifying isolates with multi-locus enzyme electrophoresis studies (Selander et al., 1987) and multi-locus sequence typing (Maiden et al., 1998). Whole genome sequencing technologies have recently facilitated the comparison of strains in ever greater detail (Uelze et al., 2020), leading to new classification schemes based on average nucleotide identity (Konstantinidis & Tiedje, 2005) and whole-genome nucleotide polymorphism (Schürch et al., 2018).
The avian pathogenic E. coli (APEC) pathotype causes colibacillosis in chickens, turkeys and other avian species, and is responsible for significant morbidity and mortality in the worldwide poultry industry (Dziva & Stevens, 2008). Certain serotypes and virulence factors are frequently associated with APEC but it remains challenging to distinguish categorically between commensal and virulent strains (Mehat, van Vliet & La Ragione, 2021).
The large number of high-quality Escherichia genomes from diverse geographical and biological sources has forced a re-thinking of classification. Thus, although Shimwellia blattae was initially placed within Escherichia, it is now assigned to a separate genus (Priest & Barker, 2010). By contrast, although named as if a genus, Shigella is now thought merely to represent a series of pathovars of E. coli (Yang et al., 2005;Devanga Ragupathi et al., 2018). Nevertheless, the assignment of strains of E. coli to seven phylogroups (A, B1, B2, C, D, E and F) has proven a robust finding that holds for phylogenetic trees built from core genome alignments of large numbers of isolates. Phylogroups G and H have recently been proposed (Lu et al., 2016;Clermont et al., 2019), while five of what were initially called 'cryptic clades' (Walk et al., 2009) have been assigned to species: Clade I to E. coli (Clermont et al., 2013), Clade II to E. whittamii (Gilroy et al., 2021), Clades III and IV together to E. ruysiae (van der Putten et al., 2021), and Clade V to E. marmotae (Liu et al., 2015).
The accumulation of genomic data for Escherichia has been strongly biased towards clinical isolates from humans and economically important animals (Touchon et al., 2020). This hampers efforts to achieve a complete understanding of Escherichia diversity, ecology, and population genetics. Here, we report phylogenomic analyses on 100 Escherichia isolates cultured from fecal samples from a small flock of healthy adult layer chickens, 81 of which we determined to be unique (Fig. 1).

Sample collection and storage
The project was approved by the University of Surrey's NASPA Ethical Review Assessment Committee with project number NERA-2018-011. Fresh fecal samples from healthy adult Lohmann Brown layer hens on a farm in the south-east of England were collected to isolate presumptive E. coli. The birds (flock size = 24) were purchased from a single flock in April 2017 at 20 weeks old and kept in a large outdoor run with a substrate of stone chippings and small turf enrichment beds during the day and in a coop overnight. They were fed Farmgate Layer pellets and Mash (ForFarmers Ltd, Bury St Edmunds, UK), according to the manufacturer's instructions, and no antibiotics were used. Sampling was carried out between July and November 2018. Once a month, five freshly voided fecal samples were collected from different hens. 1:10 serial dilutions of each sample were plated (100 mL) onto MacConkey agar to obtain single colonies. Several colonies with different morphologies were selected and streaked to fresh MacConkey agar plates to isolate single colonies. Identification as presumptive E. coli was confirmed by a purple colony phenotype on CHROMagar Orientation Medium (CHROMagar, Paris, France) and a positive indole test. Multiple E. coli-positive isolates from each sample were tested by GTG 5 PCR (Mohapatra, Broersma & Mazumder, 2007) to minimize the collection of identical genotypes. Isolates were stored using the Microbank Bacterial and Fungal Preservation system (Pro-Lab Diagnostics, Richmond Hill, Canada) at −80 C.

Antimicrobial resistance screening
Isolates were screened for resistance against a panel of eight antibiotics (gentamicin, ampicillin, trimethoprim, chloramphenicol, nitrofurantoin, cefpodoxime, meropenem and ciprofloxacin) by the disk diffusion method, according to EUCAST guidelines (https://www.eucast.org/ast_of_bacteria). They were additionally subjected to a minimum inhibitory concentration assay for colistin by broth dilution, according to EUCAST guidelines.

DNA extraction and genome sequencing
A total of 100 isolates were submitted for whole genome sequencing, comprised of multiple isolates per fecal sample from the first 3 months and one isolate per sample for the remaining 2 months (Table S1). Genomic DNA was extracted from overnight 1 mL cultures in Lysogeny Broth (LB; Miller's formulation), maintained at 37 C statically in sterile deep-well plates prepared from single colonies taken from overnight Lysogeny Agar plates, according to a previously described 96-well plate lysate method (Foster-Nyarko et al., 2020). Briefly, cultures were pelleted at 3,500 × g then resuspended and pre-treated with lysozyme, proteinase K and RNase A before lysing with 10% (w/v) sodium dodecyl sulphate in Tris-EDTA (pH 8.0). The DNA was retained on AMPure XP solid-phase reversible immobilization beads (Beckman Coulter, High Wycombe, UK), eluted in Tris/HCl, pH 8.0 and quantified using the Qubit high-sensitivity double-stranded DNA assay (Invitrogen, MA, USA).
For sequencing, samples were normalized to 0.5 ng.µL −1 with 10 mM Tris-HCl prior to library preparation, which was carried out with a modified Nextera XT DNA protocol (Foster-Nyarko et al., 2020). The pooled libraries were run at a final concentration of 1.8 pM using the Illumina NextSeq 500 platform (Illumina, CA, USA) following the manufacturer's recommended denaturation and loading procedures, which included a 1% PhiX spike, with 150 bp paired-end reads.

Genome assembly
Raw sequence data were demultiplexed and converted to fastq format by bcl2fastq v.2.20 (Illumina, CA, USA). Reads from multiple lanes were concatenated into single forward and reverse read files then uploaded to Enterobase (www.enterobase.warwick.ac.uk) for automated quality control and assembly via the QAssembly pipeline (Zhou et al., 2020). All assemblies exceeded the minimum quality requirements for Enterobase (genome length between 3.7-6.4 Mbp; N 50 > 20 kb; number of contigs ≤ 800; proportion of Ns < 3%; >70% of contigs assigned correctly by Kraken 2 (Wood, Lu & Langmead, 2019)). The final assemblies were downloaded for further analyses, which were all run on the Cloud Infrastructure for Microbial Bioinformatics (Connor et al., 2016).

Phylogenomic analysis
Initial phylogenomic analysis was carried out in Enterobase for in silico serotyping, fimH allele typing (Roer et al., 2017), multi-locus sequence typing according to the Warwick seven-gene scheme and phylotyping (Beghain et al., 2018), and to construct neighbor-joining trees with the NINJA algorithm in GrapeTree (Zhou et al., 2018), based on the Hierarchical Clustering of Core Genome MLST (HierCC) scheme (Zhou et al., 2020). HierCC was also used to de-replicate the strain collection by selecting a single representative isolate from every identical cluster of genomes at the HC0 level. Representative isolates were determined according to the quality score-based system employed as part of dRep v2.5.4, reliant on genome completion, contamination and N50 metrics (Olm et al., 2017). The assemblies for the 33 strains involved in clustering were used to construct a core genome alignment using Snippy v.4.3.2 (https://github.com/ tseemann/snippy). A matrix of pairwise single nucleotide polymorphisms (SNPs) was then compiled with snp-dists v.0.7.0 (https://github.com/tseemann/snp-dists). E. coli MG1655 (NC_000913.3) was used as the reference.
To confirm the identities and phylogenetic relationships of the final isolate collection, a second core genome alignment was reconstructed, containing all strains. In addition, we collated a panel of reference genomes (Table S2) for every phylogroup of E. coli and all other Escherichia species, including Shigella flexneri, all of which had been isolated from avian hosts (primarily chickens). These reference genomes were downloaded from the NCBI assembly archive and included in the core genome alignment, which was then used to reconstruct a core SNP phylogenetic tree with Salmonella bongori and Salmonella enterica as outgroups. IQ-TREE v.2.0.3 (Nguyen et al., 2015) was used with 1,000 bootstrap replications for maximum-likelihood inference of phylogenetic relationships with the best fitting model (TVM+F+ASC+R4) automatically selected by ModelFinder (Kalyaanamoorthy et al., 2017). The resulting tree was visualized and combined with other data using iTOL v.5 (Letunic & Bork, 2021).

Analysis of gene and plasmid content
To process shared gene content across our selected genome catalogue, we used the pangenomics pipeline (Delmont & Eren, 2018) as implemented in anvi 'o v.7.0 (Eren et al., 2015) with open reading frames predicted using prodigal v.2.6 (Hyatt et al., 2010) and annotation using the NCBI's Clusters of Orthologous Groups database (Tatusov et al., 2003). Using NCBI BLAST, the similarity between gene pairs was quantified and subsequently the Markov Cluster algorithm determined clusters of homologous genes, with a minbit heuristics threshold of 0.5 to eliminate weak matches and an MCL inflation of 10 for closely related genomes. FastANI (Jain et al., 2018) was applied for the calculation of average nucleotide identity between genomes. The program anvi-displaypan provided an interactive visualization of the pangenome, with imported average nucleotide identity values being depicted as part of this visualization.
ABRicate v.1.0.1 (https://github.com/tseemann/abricate) was used to search assemblies for genes related to antibiotic resistance and virulence and for plasmid replicons by comparison to the NCBI AMRFinderPlus (https://www.ncbi.nlm.nih.gov/bioproject/ PRJNA313047), ecoli_VF (https://github.com/phac-nml/ecoli_vf) and PlasmidFinder (Carattoli et al., 2014) databases, respectively. A custom database was prepared for detection of 24 APEC-related virulence genes based on a literature search (File S1). In each case, identification was defined by minimum coverage of 90% and minimum identity of 80% of the respective nucleotide sequences. To detect plasmid-encoded virulenceassociated genes we first identified contigs derived from plasmids using platon v.1.6 in accuracy mode (Schwengers et al., 2020), then searched those contigs against our custom database using ABRicate as described above.

High phylogenomic diversity of isolates recovered
One hundred isolates provisionally identified as E. coli were collected from healthy layer hens over a 5-month period. An indication of the diversity of Escherichia recovered was provided by the isolation of 1-7 isolates from up to six phylogroups per fecal sample (Table 1). When cultured on MacConkey agar, 15 isolates were non-lactose-fermenters but positive for other E. coli-specific attributes (Nicoletti et al., 1988).
Following short read whole genome shotgun sequencing and assembly, the HierCC feature within Enterobase identified 33 out of the 100 isolates as belonging to one of 14 clusters at the HC0 level, meaning that the members of each cluster are indistinguishable from each other at every core genome locus interrogated ( Table 2). The 14 HC0 clusters all contained isolates taken from different fecal samples, spanning the duration of the study, except cluster HC0:148574, which contained three isolates that all came from a single sample taken in September 2018. A core genome alignment of the 33 strains involved in clustering was used to construct a matrix of pairwise SNP distances, confirming the close relationships between clustered strains, with 2-257 SNPs detected ( Table 2, Table 3). Since we were primarily interested in assessing species diversity within our samples, we therefore removed 19 isolates from the analysis by keeping only the best quality assembly from each HC0 cluster as a representative (Table S3).
In silico evaluation of Warwick seven-gene MLSTs for the remaining 81 isolates revealed the presence of 45 different sequence types, of which 30 were represented by a single isolate each. ST48 (nine isolates), ST10 (seven isolates) and ST2456 (five isolates) were the most numerous. HC0 isolate clusters correlated with MLSTs, supporting the close relationship between strains in each cluster (Table S1). Predicted phylogroups also suggested considerable genetic diversity among the isolates, with at least one representative from all seven E. coli phylogroups except B2, every 'cryptic clade' except clade I, and five  (Table S1). Isolates outside of E. coli sensu stricto had increased incidence of 'undetermined' results for O antigen and fimH, suggesting that current databases do not cover the whole diversity of these features for the genus. The diversity of strains did not have a significant temporal component as isolates  collected in different months were evenly distributed throughout a neighbor-joining phylogenetic tree based on the HierCC results (Fig. S1).
A core SNP maximum likelihood tree containing the 81 isolates and reference genomes for every clade of Escherichia was in good agreement with the neighbor-joining tree, and the predicted phylogroups clustered with their respective reference genomes (Fig. 2). This validates the reference genome collection, indicating its value for providing a scaffold phylogeny for future studies. Three isolates that were identified as 'group E or clade I' by in silico phylotyping were assigned to phylogroup D based on the core SNP alignment. Isolate Surrey070 was also reassigned, as it clustered more closely with phylogroup E than its predicted phylogroup of D.
The pangenome of isolates reflects phylogenomic diversity More than 500,000 genes were identified within the 81 study isolates and 33 reference genomes before organization into 14,421 gene clusters, formed by grouping homologous genes according to amino acid similarity, with a minbit heuristics threshold of 0.5 and an MCL inflation of 10. The core genome (genes present in all genomes) contained 2,449 gene clusters and the accessory genome (genes present in >1 but not all genomes) contained 8,331 gene clusters, while 3,641 singleton clusters (present in only one genome) were identified. All accessory and core genes were hierarchically clustered according to distribution pattern (Fig. 3). Functional hits were assigned to gene clusters according to NCBIs Clusters of Orthologous Groups database, with 2,766 unique groups annotated (Table S4). A total of 94% of core gene clusters were annotated compared to the considerably lower proportion annotated for both accessory (50%) and singleton (30%) gene clusters. Functional pathways assigned to annotated gene clusters were primarily associated with translation, ribosomal structure, and biogenesis (9.5% of annotated core gene clusters) with accessory gene clusters associated with prophages and transposons  Figure 2 Core SNP maximum likelihood tree of the final 81 Escherichia isolates, including reference sequences for every Escherichia species and all phylogroups of E. coli. Names of our isolates and their reference strains are coloured according to their species/phylogroup. Names of reference strains for clades not found in our samples are coloured black. Reference strains are named with their species and strain name, with phylogroups given in parentheses. Salmonella bongori and Salmonella. enterica ser. Typhi are included as outgroups.
Full-size  DOI: 10.7717/peerj.12935/ fig-2 Plasmid incompatibility groups are shared throughout Escherichia A total of 25 different plasmid replicons were identified by comparison with the PlasmidFinder database (Table S5). Eight strains did not contain a plasmid replicon, while two strains (Surrey037 and Surre074) contained seven different replicons (Fig. 4).  The overall rate of plasmid carriage was low, with a mode of 1 replicon and median of 2. There were 14 instances of an isolate containing multiple replicons of the same identity, although these were confined to the Col(MG828)_1, Col440I_1 and ColRNAI_1 groups. The most common replicons were ColRNAI_1 (46 replicons in 35 isolates) and p0111_1 (36 replicons in 36 isolates). Of the broad plasmid incompatibility groups, Col plasmids were most frequently found (101 replicons), partly because of their propensity for multiple replicons per isolate, with IncF (58 replicons) and p0111_1 (36 replicons) also highly represented (Fig. 5). There were no clear associations between plasmid incompatibility groups and phylogenomic groups, which is consistent with frequent transfer of genetic material within the genus (Shaw et al., 2021).

Overall low virulence potential amongst isolates
To assess the potential for our isolates to cause disease in chickens we looked for genomes containing ≥4 of the virulence-associated genes iutA, hlyF, iss, iroN and ompT. This panel has been proposed to identify APEC more accurately than classical serotyping methods   Johnson et al., 2008). Within our isolates, Surrey013, Surrey017 and Surrey034 carried all five genes, while Surrey010 and Surrey050 carried all except iutA. However, genes are more strongly correlated with pathogenicity if they are carried on a plasmid rather than the chromosome (Johnson et al., 2008). Both Surrey013 and Surrey034 had plasmid-encoded copies of four of the genes, qualifying them as APEC. Surrey017 and Surrey050 carried four plasmid-encoded virulence-associated genes each, but not all were from the panel of five genes so are not counted as APEC under these criteria (Table S6). A further 16 strains carried one or two virulence-associated genes on plasmids. An alternative diagnostic strategy defines APEC by categorization into any of four associations of virulence (Schouler et al., 2012). By these criteria 5/81 strains are categorized as APEC: Surrey034 (iutA+, P(F11)−, frz orf4 +), Surrey001, Surrey002, Surrey014 and Surrey015 (iutA−, sitA+, aec26+). Surrey013 and Surrey017 might additionally be considered marginal as they were only discounted due to their predicted O8 serotype, rather than O78. Therefore, the two diagnostic measures combined indicate that 6/81 strains are APEC and 3/81 have increased virulence potential but fall short of the cut-off. Overall, only a minority of our isolates are likely to be APEC according to their complement of genetic virulence determinants.
APEC fall into the wider pathotype of Extraintestinal Pathogenic E. coli (ExPEC), which includes named groups causing disease in humans: Neonatal Meningitis E. coli (NMEC), Sepsis-Associated E. coli (SEPEC) and Uropathogenic E. coli (UPEC) (Sarowska et al., 2019). APEC may be closely related to human ExPEC and may act as a reservoir for ExPEC virulence-associated genes (Cummins et al., 2019). Therefore, to assess the wider pathogenicity potential of our strains, we compiled a panel of 24 virulence-associated genes. These included the APEC-associated genes above and other typical ExPEC virulence-associated genes ( Table 4). Sixteen of these genes were present in at least one isolate (Fig. 5). Three isolates (Surrey031, Surrey045 and Surrey062) contained none of the genes and Surrey013 contained the most with 14/24 detected. The modal isolate contained two genes from the panel, and the median contained three genes.
Other commonly identified virulence-associated genes from comparison with the ecoli_VF database (a much larger database covering E. coli in general) included fimA-I, yagV-Z, ompA, entA-F/S, fepA-D/G and fes. Genes associated with the general secretory pathway (gspC-M) (Francetic & Pugsley, 1996) and type III secretion systems 1 and 2 (espL, R, X, Y; eivA, C, E-G, I, J) were also frequently identified (Franzin & Sircili, 2015;Fox et al., 2020). Many of these genes are associated with intestinal E. coli infections. Full predicted virulence-associated gene content of all strains, from comparison with the ecoli_VF database are available in Table S7. Figure 5 (continued) phylogroups. Virulence-associated genes shown here are from the custom panel of APEC-associated genes. Members of the panel not represented here were not present in any isolates. Plasmid replicon and predicted antibiotic resistance gene data have been condensed to show major classes. Therefore, presence indicates ≥1 members of that class were detected. Full gene and plasmid detection data are available in Tables S5-S7. Full-size  DOI: 10.7717/peerj.12935/ fig-5 Low carriage of predicted antimicrobial resistance genes From our comparison with the NCBI AMRFinderPlus database, we predicted low levels of antimicrobial resistance. Among the isolates, 62/81 had no predicted resistance genes (Table S8). However, five isolates carried three predicted resistance genes, one carried two genes, and 13 carried one gene. Fourteen isolates carried a gene for predicted tetracycline resistance, nine carried a predicted aminoglycoside resistance gene, three carried a predicted β-lactamase class bla-TEM gene, two carried a predicted trimethoprim resistance gene, and predicted genes for fosfomycin and sulfonamide resistance were found in a single isolate each (Fig. 5). Every isolate except three carried a single copy of a β-lactamase class blaEC gene (one of blaEC, blaEC 8 , blaEC 13, blaEC 15 , blaEC 18 or blaEC 19 ). These genes are frequently detected in genomes of β-lactam-susceptible isolates and therefore were discounted from the analysis. We did not experimentally verify the predicted resistance phenotypes; however, a preliminary screening following colony isolation also revealed low levels of phenotypic resistance to nine antibiotics (Table S1).  & Nolan, 2011;Schouler et al., 2012) papC Outer membrane usher 0 (Dodson et al., 1993;Schouler et al., 2012) papG P fimbrial tip adhesin 0 (Johnson & Brown, 1996;Schouler et al., 2012) sfa/foc S fimbrial adhesin 0 (Johnson et al., 2003) iss Host evasion Increased serum survival lipoprotein 30 (Johnson et al., 2008) kpsM Polysialic acid transport protein 15 (Stromberg et al., 2017) ompT Outer membrane protease 32 (Johnson et al., 2008) traT Complement resistance protein 19 (Cummins et al., 2019) chuA Iron acquisition TonB-dependent heme receptor 24 (Spurbeck et al., 2012) fyuA Siderophore yersiniabactin receptor 9 (Stromberg et al., 2017) ireA  (Schouler et al., 2012) icmH/ aec26 Secretion type IVB secretion system protein 23 (De Pace et al., 2010;Schouler et al., 2012) frz orf4 Sugar transport PTS fructose transporter subunit IIC 1 (Rouquet et al., 2009;Schouler et al., 2012) astA

DISCUSSION
Our data reveal remarkable diversity within the genus Escherichia from such a commonplace setting and small sample size. After minimal selection during sample collection, we were able to recapitulate almost the entire phylogeny of Escherichia. However, our initial selection for E. coli identity may still have excluded some isolates with phenotypes that differ from typical E. coli. On the other hand, the isolation of so many other Escherichia species raises doubts over the specificity of CHROMagar Orientation Medium for Escherichia coli sensu stricto. This finding may have implications for the use of CHROMagar Orientation Medium in clinical research and diagnostics, and warrants further investigation into the growth and morphology of different members of the Escherichia genus on this medium. We confirmed the diversity and identity of the isolates by several different methods, including PCR, in silico phylotyping and MLST, and core genome analysis. Our final core genome-derived phylogenetic tree was also in agreement with previously published phylogenies for the genus (Zhou et al., 2020;Abram et al., 2021;Gilroy et al., 2021).
The generally low carriage of virulence-associated genes within the isolates suggests that the majority are non-pathogenic and more likely represent commensal members of the gut microbiota (Wigley, 2015). By extension, this also supports the idea that although many strains are capable of opportunistic infection, most frank pathogens in the Escherichia genus belong to relatively well-defined pathotypes (Johnson et al., 2008). In support of this conclusion, three key phylogroups, B2, G and clade I, which have been associated with virulence in poultry (Papouskova et al., 2020;Mehat, van Vliet & La Ragione, 2021), were missing from our collection. This could be due to the relatively small sample size and small flock or might reflect that we only sampled healthy birds. Equally, the most common phylogroup in this study was phylogroup A, which has been linked to commensalism in chickens and other omnivores (Smati et al., 2015). E. albertii and Shigella were also absent from our collection. They are less well studied in birds, but have also both been linked to virulence in poultry (Shi et al., 2014;Gomes et al., 2020). We cannot rule out the possibility that APEC strains were simply lower in abundance and so were overlooked during culturing. As we only sampled healthy birds, it seems likely that this is a true reflection of the prevalence of these phylogroups among our study population. The limitations of our study prevent us from inferring any general conclusions about the distribution of Escherichia strains within chickens in other settings. Nevertheless, if such striking diversity is present in a few birds in a single location, it surely suggests that similar diversity (although perhaps with different population structures) has so far been overlooked elsewhere.
Although we did not track individual birds temporally and so cannot verify stable colonization of isolates within birds, most fecal samples yielded multiple distinct isolates. There were no clear indications of niche adaptation based on the pangenome and plasmid content of the strains, as gene clusters and plasmid replicons were spread fairly evenly among phylogenomic groups, and most phylogroups contained few isolates. The use of only short-read data also limited the completeness of our assemblies and our ability to identify plasmids and plasmid-encoded genes (Arredondo-Alonso et al., 2017). Further work involving more isolates and using both short-and long-read data will be necessary to identify unique, characteristic genetic components for each species and phylogroup, and to establish their spatial and functional niches within the chicken gut.
Recent studies have found similarly diverse populations of Escherichia after sampling of healthy pigs (Ahmed, Olsen & Herrero-Fresno, 2017;Bok et al., 2020), cattle and sheep (Shaw et al., 2021), and non-human primates (Foster-Nyarko et al., 2020), although it is notable that representatives of the 'cryptic clades' are scarce within these data. The reasons for this scarcity are unclear but are likely to include the use of different selection methods and the focus on virulent/AMR strains and E. coli sensu stricto. Escherichia diversity varies among samples from healthy humans (Bok et al., 2018;Foster-Nyarko et al., 2021a). High Escherichia species diversity has been detected within chickens previously (Vounba et al., 2019a;Foster-Nyarko et al., 2021b), although not from such a narrow pool of host birds as in the current study. Members of the cryptic clades have also previously been found in relatively high abundance in wild birds of various species (Clermont et al., 2011) but a definitive link to chickens has not been established. A recent metagenomics study included 60 different fecal samples from the same birds used here. Culturing of 3 of those samples yielded isolates of E. marmotae, E. whittamii and phylogroups A, B1, C, D and E, reflecting the diversity found in this study (Gilroy et al., 2021). However, most studies in chickens have taken samples from diseased birds and/or have focused on antimicrobial resistance and pathogenicity of the isolates recovered (Braga et al., 2016;Cummins et al., 2019;Vounba et al., 2019b;Papouskova et al., 2020;Kubelová et al., 2021).

CONCLUSIONS
The dearth of whole-genome sequencing studies that include all members of Escherichia (and sufficient metadata) skews our understanding of the genus and makes it impossible to discern patterns within or between sampling environments, geographical regions, diets, health status and so on. Consequently, there is much still to learn about the population structure and ecology of Escherichia (Lagerstrom & Hadly, 2021). We were surprised to find that isolates representing almost the entire known phylogeny of Escherichia were recovered from fecal samples from a small flock of healthy layer chickens, including species that have only recently been recognized (previously known as cryptic clades) and for which chicken-associated isolates have not been widely reported. These isolates had low carriage rates of antimicrobial resistance genes and virulence factors, suggesting that similar isolates might often be overlooked in studies that focus on these traits. Our findings highlight the surprising diversity of Escherichia harbored by even an individual chicken and emphasize the need to broaden the focus of research to encapsulate the full variety of species.

Animal Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): The study was approved by the University of Surrey's NASPA Ethical Review Assessment Committee with project number NERA-2018-011.

DNA Deposition
The following information was supplied regarding the deposition of DNA sequences: The raw sequencing reads and assemblies used in this work are available at NCBI SRA: PRJNA757375.
The individual accession numbers of the sequencing reads and assemblies for the isolates are available in Table S1.

Data Availability
The following information was supplied regarding data availability: The raw data is available in the Supplemental Files.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.12935#supplemental-information.