One Health Genomic Surveillance of Escherichia coli Demonstrates Distinct Lineages and Mobile Genetic Elements in Isolates from Humans versus Livestock

The increasing prevalence of E. coli bloodstream infections is a serious public health problem. We used genomic epidemiology in a One Health study conducted in the East of England to examine putative sources of E. coli associated with serious human disease. E. coli from 1,517 patients with bloodstream infections were compared with 431 isolates from livestock farms and meat. Livestock-associated and bloodstream isolates were genetically distinct populations based on core genome and accessory genome analyses. Identical antimicrobial resistance genes were found in livestock and human isolates, but there was limited overlap in the mobile elements carrying these genes. Within the limitations of sampling, our findings do not support the idea that E. coli causing invasive disease or their resistance genes are commonly acquired from livestock in our region.

Chemotherapy from 11 hospitals across England (n = 1,093) between 2001 and 2011 (locations shown in Fig. 1a, and full isolate listing in Table S1) (17,18). A potential limitation of this human isolate collection is that it might overrepresent hospital-acquired isolates, whereas a comparison with E. coli from livestock would require community-acquired bacteria. Two analyses were undertaken to evaluate this possibility. First, we defined where the bloodstream infection was acquired for the 1,303 cases for which we had this information. This demonstrated that 886/1,303 (66%) cases were community associated. We then constructed a maximum likelihood tree of the invasive disease genomes to compare the phylogeny of isolates associated with community- versus health care-associated disease (Fig. S1). This demonstrated that genomes from the two categories were intermixed and distributed across the phylogeny, with no evidence of clustering by origin of infection. We concluded that our invasive collection was likely to include strong representation of E. coli carried by people in the community.
Phylogenies based on single nucleotide polymorphisms (SNPs) in the core (conserved) genomes of isolates representing CC10 (n = 149) and CC117 (n = 64) demonstrated that human isolates were intermixed with livestock isolates in CC10, but were generally distinct from livestock isolates in CC117 (Fig. S2 and S3). Pairwise SNP analysis demonstrated that the most closely related human/livestock isolate pairs were 85 and 96 SNPs different for CC10 and CC117, respectively. The estimated mutation rate for E. coli is one SNP/core genome/year (19,20), and so CC10 and CC117 isolates in humans and livestock were not associated with recent transmission between the two groups. Combining the study CC117 isolates with 7 publicly available ST117 genomes (NCBI SRA accession numbers ERR769196, ERR769195, ERR769183, ERR769169, SRR1314275, SRR3410778, and SRR3438297) in a Bayesian phylogenetic analysis provided further evidence for the lack of recent transmission between human and livestock hosts in our study. The dated phylogeny revealed a UK cluster of 47 CC117 isolates (containing 44 turkey, 1 chicken, and 2 human isolates), for which the estimated time of most recent common ancestor (TMRCA) was 1989 (95% highest posterior density interval [HPD], 1979 to 1996), coinciding with the first global report of bla CTX-M-1 (21). Of the 47 isolates, 36 (77%) carried bla CTX-M-1, which was uncommon in the rest of the bacterial population. All 36 isolates were from turkeys, representing a bla CTX-M-1 poultry-associated lineage, for which the TMRCA was 2011 (95% HPD, 2010 to 2013) (Fig. S4), suggesting acquisition of bla CTX-M-1 by this lineage between 1989 and 2011.
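The link between SNP distance and elapsed time in the paragraph above can be illustrated with a back-of-envelope calculation. This is only a sketch, not the Bayesian dating used in the study: it assumes a strict clock of 1 SNP/core genome/year, that both isolates were sampled at about the same time, and that mutations accumulate independently on both lineages, so the pairwise distance is split across two branches. The function name is hypothetical.

```python
def years_to_common_ancestor(pairwise_snps, snps_per_genome_per_year=1.0):
    """Rough time (years) back to the common ancestor of two isolates.

    Assumes a strict clock, both isolates sampled at about the same time,
    and mutations accumulating independently on each of the two lineages,
    so the pairwise distance is shared between two branches.
    """
    if snps_per_genome_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return pairwise_snps / (2.0 * snps_per_genome_per_year)


# Closest human/livestock pairs reported in the text (SNP differences).
closest_pairs = {"CC10": 85, "CC117": 96}
estimates = {cc: years_to_common_ancestor(d) for cc, d in closest_pairs.items()}
print(estimates)
```

Under these simplifying assumptions, even the closest human/livestock pairs would share an ancestor several decades back, which is the sense in which such distances argue against recent transmission.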
We then compared the genetic relatedness of the 431 livestock/meat E. coli isolates with the 1,517 E. coli isolates associated with human bloodstream infections. A maximum likelihood phylogenetic tree of the 1,948 genomes based on 277,533 core gene SNPs demonstrated high genetic diversity overall, with limited phylogenetic intermixing between isolates from humans and livestock (Fig. 1c). Pairwise SNP analysis between human- and livestock/meat-associated isolates demonstrated a median SNP distance of 41,658 (range, 10 to 47,819; interquartile range [IQR], 34,730 to 42,348), with 5 and 1 human isolates falling within 50 SNPs of livestock and meat, respectively (Fig. S5). Network analysis based on a range of SNP cutoffs captured just 2 (0.1%) human isolates (from hospitals in the South East and North West) that were within 15 SNPs of livestock isolates (2 pig isolates and 1 turkey isolate from three different farms [Fig. 2]). In contrast, we observed highly related isolates (0 to 5 SNPs) from the same animal species on different farms (Fig. 2).

Fig. 2 legend: The results shown are limited to those isolate pairs identified in a pairwise comparison that differed by ≤15 SNPs in the core genome. The places of origin for each isolate pair are connected by lines, and the style of the line reflects the SNP distance. The asterisk indicates one ST69 human isolate from hospital 10 linked to two ST69 turkey isolates from farm 23 that differed by 10 and 12 SNPs, respectively. The number (hash) sign indicates one ST1081 human isolate from hospital 5 linked to one ST1081 pig isolate from farm 4 (differed by 10 SNPs) and 2 ST1081 (probably duplicate) pig isolates from farm 2 that differed by 14 SNPs.
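The pairwise comparison with a SNP cutoff described above can be sketched as follows. This is a minimal illustration, not the in-house script used in the study: it assumes the core-genome sequences are already aligned to equal length, ignores positions containing N or a gap, and uses toy sequences and hypothetical function names.

```python
from itertools import product


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has 'N' or '-' are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def cross_host_pairs(human, livestock, cutoff=15):
    """Return (human_id, livestock_id, distance) for pairs within the cutoff."""
    pairs = []
    for (h_id, h_seq), (l_id, l_seq) in product(human.items(), livestock.items()):
        d = snp_distance(h_seq, l_seq)
        if d <= cutoff:
            pairs.append((h_id, l_id, d))
    return sorted(pairs, key=lambda p: p[2])


# Toy alignment (invented data); a cutoff of 1 is used only because the
# example sequences are 8 bases long.
human = {"H1": "ACGTACGT", "H2": "TTTTACGT"}
livestock = {"L1": "ACGTACGA", "L2": "ACGNACGT"}
close_pairs = cross_host_pairs(human, livestock, cutoff=1)
print(close_pairs)
```

Comparing every human isolate with every livestock isolate scales with the product of the two collection sizes, which remains tractable for a collection of this size.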
The E. coli isolated during this study are likely to be an underrepresentation of the diversity of E. coli in the wider UK livestock population and the meat sold in supermarkets, which could reduce our power to detect a transmission event between livestock and humans or vice versa. To explore this further, we undertook additional analyses using UK livestock isolates in the public domain; specifically, the published livestock genomes held in Enterobase (http://enterobase.warwick.ac.uk), which comprised 51 genomes of isolates cultured between 1999 and 2013. The 24 STs in this collection were compared with the STs assigned to our invasive isolate collection from across the UK. A single ST in this new data set was also present in our bloodstream collection (ST398), but this had already been identified in our livestock collection. This provides further support that sharing of invasive E. coli lineages between humans and livestock in the UK is uncommon.
We evaluated and compared the accessory (non-conserved) genome of the 1,948 study isolates using principal-component analysis (PCA). Principal components 1 (PC1) and 2 (PC2), which accounted for 50.5% and 8.3% of the variation within the data, respectively, separated the collection into two main clusters (referred to as group 1 or group 2, respectively). Group 1 predominantly contained human isolates, and group 2 contained a mixture of human and livestock isolates (Fig. S6a). PCA also showed that isolates from the same STs clustered together and formed distinct subclusters within groups 1 and 2 (Fig. S6b). Table S3 lists the top 100 genes from PC1 and PC2 that were most strongly associated with group 1 or 2.
Genetic analysis of antimicrobial resistance genes and associated mobile genetic elements. Screening of the 1,948 isolates for accessory genes encoding antibiotic resistance revealed that 41 different resistance genes were present in isolates from both humans and livestock (Fig. 3a). The prevalence of resistance genes in the two groups varied considerably, with some predominating in the human or livestock reservoir only, while others were common in both (Fig. 3b). The seven most frequently shared genes (each present in >300 isolates) conferred resistance to beta-lactams (bla TEM-1 = 882), sulfonamides (sul2 = 530, sul1 = 522), aminoglycosides (strA = 509, strB = 478), and tetracyclines (tetA = 423, tetB = 335). The predominant genes conferring resistance to extended-spectrum cephalosporins were bla CTX-M-15 (human = 87, livestock = 32) and bla CTX-M-1 (human = 1, livestock = 82, meat = 13). No carbapenemase or colistin resistance genes were detected.
Further characterization of bla CTX-M plasmids was undertaken using long-read sequencing. Two livestock-human isolate pairs positive for bla CTX-M-1 or bla CTX-M-15 were selected for sequencing using the PacBio RS II instrument. Illumina reads for the entire study collection were then mapped to the complete plasmid assemblies of these four isolates. The single bla CTX-M-1-positive human isolate contained an IncI1 bla CTX-M-1 plasmid that was highly similar (>99% identity and ≥98% coverage) to plasmids in 28 livestock isolates (chicken = 18, chicken meat = 8, pig = 2) belonging to four different STs. In contrast, the bla CTX-M-15 plasmids in the livestock (E01) and human (D01) isolate pair were dissimilar (17% sequence shared at 99% identity [ID]) and had different replicon types (E01 = IncHI2, D01 = IncFIA and FII fusion). The human bla CTX-M-15 plasmid (D01) was not identified in any other isolate (human or livestock), while the livestock bla CTX-M-15 plasmid (E01) was found in other livestock isolates from the same pooled fecal sample from 1 beef farm.
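The similarity criterion used above (>99% identity and ≥98% coverage of the reference plasmid) can be expressed as a simple filter over per-isolate mapping summaries. The sketch below is illustrative only: the tuple layout, the function names, and the example values are assumptions, not output from the study's mapping pipeline.

```python
from collections import Counter


def matches_reference_plasmid(identity_pct, coverage_pct,
                              min_identity=99.0, min_coverage=98.0):
    """True if a mapping summary meets the >99% identity, >=98% coverage rule."""
    return identity_pct > min_identity and coverage_pct >= min_coverage


def tally_matches(mapping_results):
    """Count matching isolates per source.

    mapping_results: iterable of (source, identity_pct, coverage_pct) tuples
    (a hypothetical summary format).
    """
    return Counter(
        source
        for source, identity, coverage in mapping_results
        if matches_reference_plasmid(identity, coverage)
    )


# Invented example values.
toy_results = [
    ("chicken", 99.8, 99.1),
    ("chicken", 99.9, 97.5),   # fails on coverage
    ("pig", 99.5, 98.0),
    ("human", 99.0, 99.9),     # fails: identity must exceed 99%
]
matches = tally_matches(toy_results)
print(matches)
```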
We then investigated whether bla CTX-M-15 could be shared on a smaller transposable element. A 7,926-bp region encoding bla CTX-M-15 that was identical to a Tn3 transposon previously identified from E. coli plasmid GU371928 (22) was detected in 22/32 (69%) livestock isolates (pig = 11, dairy cattle = 11) from 4 farms, and 3/87 (3%) of bla CTX-M-15-positive human isolates from 2 hospitals, one of which was located in the East of England. This 7,926-bp region was flanked by 5-bp direct repeats of TTTTA, indicating its potential for transfer between isolates.

DISCUSSION
We investigated the prevalence and genetic relatedness of E. coli from livestock, meat, and humans in the East of England using a "One Health" approach. ESBL-E. coli was isolated from 55% of livestock farms, with a frequency of ESBL-E. coli in different livestock species that was consistent with previous findings (23). In addition, ESBL-E. coli bacteria were found in 18% of prepackaged fresh meat products. The high prevalence of ESBL-E. coli in chicken meat (16/30 [53%]) is similar to previous studies conducted in the UK and the Netherlands (12, 24, 25). However, E. coli from livestock were not closely related to isolates causing human disease in our region, suggesting that livestock are not a direct source of infecting isolates and that human invasive E. coli are not being shared with livestock. E. coli phylogroup B2 was most frequently associated with human invasive samples (68%) as previously reported (26), but was rarely identified in livestock (1%), providing further evidence for distinct populations associated with invasive human disease and livestock. In contrast, highly related isolates were identified between the same livestock species on different farms. Previous studies in the Netherlands that compared isolates from clinical and livestock sources using MLST indicated that the same ST could be isolated from humans and livestock (12-14, 27). We replicated this finding for CC10 and CC117, but using the more discriminatory sequence-based analysis identified that isolates from the two reservoirs were genetically distinct. A study of cephalosporin-resistant E. coli in the Netherlands (15) reported genetic heterogeneity between human and poultry-associated isolates but closely related isolates from farmers and their pigs. Here, we included ESBL-positive and non-ESBL E. coli, an important feature of the study since the majority of E. coli human infections in the UK are due to non-ESBL E. coli (17).
Screening of E. coli isolates from livestock, meat, and humans with serious infections revealed the frequency of antimicrobial-resistant genes in each reservoir and confirmed the presence of similar antimicrobial resistance genes in both livestock and humans, including bla TEM-1 , sul2, sul1, strA, strB, tetA, tetB, bla CTX-M-15 , and bla CTX-M-1 . These genes confer resistance to four antibiotic classes, all of which are used in both livestock and humans (28). This confirms their ubiquitous distribution but does not provide evidence for recent transfer of genes between the two reservoirs. To address this, we hypothesized that recent sharing would be associated with transmission via the same or highly related mobile genetic elements (MGEs), as previously suggested for ESBL genes (15).
Previous studies have highlighted the challenge in reconstructing plasmids and other mobile elements encoding resistance genes from whole-genome sequencing (29,30), hindering our understanding of the transmission dynamics of resistance genes. We developed an approach to detect and genetically compare mobile elements across our large study collection, with validation of findings for ESBLs using long-read sequencing. The findings from this were consistent with predominantly distinct mobile elements between livestock and humans, with an estimated 69/1,517 (5%) human isolates potentially sharing closely related antimicrobial resistance-associated mobile elements with those found in livestock.
Our study has several limitations. We acknowledge that the E. coli from humans predated the surveys of farms and retail meat but took account of this by identifying relatedness based on a 0 to 15 SNP cutoff given the estimated E. coli mutation rate of 1 SNP/core genome/year (19,20). We did not include all possible sources of E. coli for humans (for example, vegetables, fruits, and pets), although a recent study found no E. coli with bla CTX-M-15 (the dominant human ESBL type) in retail meat, fruit, and vegetables in five UK regions (24). Additional studies are required to understand whether our findings will be reproduced in other geographical areas, to determine other sources of invasive lineages such as wastewater or recreational waters, to better understand within-host diversity in livestock, to differentiate between historical and recent transmission events by collecting data over a longer time period, and to identify whether livestock are a source for other types of infection in humans such as urinary tract infections.
In conclusion, this study has not generated evidence to indicate that E. coli causing severe human infections in our region were derived recently from livestock, with host-specific E. coli lineages identified from hospitals versus farms. We identified limited sharing of antimicrobial resistance genes between livestock and humans based on long-read sequencing and analysis of mobile genetic elements. Further investigations are required to pursue the identification of the source of E. coli and resistance genes in isolates associated with severe human disease.

Sampling of livestock feces and retail meat. A cross-sectional survey was performed between August 2014 and April 2015 to isolate E. coli at 20 livestock farms (10 cattle and 10 pig) in the East of England. A pooled sample of approximately 50 g of freshly passed fecal material was collected from each major area in a given farm (such as different pens) using a sterile scoop (Sterilin X400; Thermo Fisher Scientific, Loughborough, United Kingdom). Each pooled sample was placed into a dry sterile 150-ml container (Sterilin polystyrene containers; Fisher Scientific). A median of 4 samples (range, 1 to 5) were taken from each cattle farm, and a median of 4.5 samples (range, 3 to 9) were taken from each pig farm, resulting in a total of 85 pooled samples (34 cattle and 51 pig). In addition, cecal contents were collected from 2 deceased pigs at the time of necropsy.
Poultry reared at nine farms (4 chicken and 5 turkey) in the East of England were sampled at two abattoirs between February and April 2015. Two sample types were taken for each farm: (i) pooled feces with a total weight of approximately 50 g from 10 to 20 transportation crates immediately after the livestock were removed; (ii) pools of cecal material from up to 10 birds after slaughter. Each sample was taken using a sterile scoop, and a sterile surgical scalpel was used for each cecal dissection. A median of 4 (range, 2 to 4) cecal pools and 4 (range, 3 to 4) fecal pools were collected from livestock from each chicken farm, and a median of 1.5 (range, 1 to 2) cecal pools and 2.5 (range, 2 to 3) fecal pools were collected from livestock from each turkey farm. This resulted in a total of 49 pooled samples (29 chicken and 20 turkey). All samples were immediately refrigerated at 4°C upon return to the laboratory and processed on the same day.
In April 2015, 97 retail meat samples (beef [15], chicken [30], pork [42], turkey [7], venison [1], veal [1], mixed minced pork and beef [1]) were purchased from 11 supermarkets in Cambridge, UK, with 5 to 16 meat products collected from each supermarket that were selected to capture diversity in the products available. The country of origin for each meat product was recorded, and where multiple countries/ regions were stated on the packaging, all names were recorded.
Microbiology. Pooled fecal samples were diluted 1:1 with sterile phosphate-buffered saline and mixed vigorously, and 100-µl aliquots were plated onto Chromocult coliform agar (VWR, Leuven, Belgium) and Brilliance ESBL agar (Oxoid, Basingstoke, UK), which are selective chromogenic agars that support the growth of coliforms and ESBL-producing organisms, respectively. Agar plates were incubated at 37°C for 48 h in air prior to inspection. Enrichment cultures were also used to detect ESBL-producing E. coli by adding 1 ml of fecal preparation to 9 ml of tryptic soy broth containing 20 µg cefpodoxime and incubating for 24 h in a shaking incubator (150 rpm) at 37°C in air, before 100 µl was plated onto Brilliance ESBL agar and incubated for 48 h in air. Numerous E. coli colonies were picked from primary cultures of positive pooled stool samples based on diversity in colonial morphology. Up to 32 colonies of presumptive E. coli based on colony morphology were picked from samples taken on each farm.
Preparation and culture of meat samples followed the European standard ISO 6887-2:2003. All exterior packaging was disinfected with alcohol prior to removal of meat. A 5-g sample of meat was aseptically removed, added to 45 ml peptone broth, and homogenized using a Stomacher paddle blender (Stomacher80 Laboratory System, Seward Ltd., UK) for 2 min. Samples were transferred into 50-ml Falcon tubes and incubated in a shaking incubator for 24 h at 150 rpm at 37°C. After incubation, all samples were plated onto Brilliance ESBL agar and incubated at 37°C for 48 h. In addition, swabs (FLOQSwabs; Copan Italia spa, Brescia, Italy) were obtained from whole chicken carcasses and incubated in 3 ml brain heart infusion (BHI) broth in a shaking incubator for 24 h at 150 rpm at 37°C. Following incubation, 100 µl was plated onto Brilliance ESBL agar and incubated as described before. One colony of presumptive ESBL E. coli was picked from each Brilliance ESBL agar plate for further evaluation, with the exception of two meat samples, where two colonies were picked to represent each of two distinct colony morphologies.
All bacterial colonies suspected to be E. coli were identified to the species level using matrix-assisted laser desorption ionization-time of flight mass spectrometry (Bruker Daltonik, Bremen, Germany). Antimicrobial susceptibility was defined for each E. coli colony using the Vitek2 system (bioMérieux, Marcy l'Etoile, France) with the AST-N206 card and calibrated against EUCAST breakpoints (http://www.eucast.org/clinical_breakpoints/).

DNA sequencing. Bacterial genomic DNA was extracted using the QIAxtractor (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Library preparation was conducted according to the Illumina protocol and sequenced on an Illumina HiSeq2000 (Illumina, San Diego, CA, USA) with 100-cycle paired-end runs. Sequence data were retrieved for a further 1,517 open access E. coli isolates associated with bloodstream infections (17,18). Of these, 424 were isolated between January 2006 and December 2012 at the Cambridge University Hospitals NHS Foundation Trust, and 1,093 were submitted to the British Society for Antimicrobial Chemotherapy Bacteraemia Resistance Surveillance Project by 11 UK hospitals between 2001 and 2011 (for details, see www.bsacsurv.org and Table S1 in the supplemental material) (31). Previous description and analysis of these genomes (17,18) did not include comparisons with isolates from livestock or meat.
Pan-genome analysis. The pan-genome was calculated for all 1,948 isolates using Roary (40), with a 90% ID cutoff and genes classified as "core" if they were present in at least 99% of isolates. A maximum likelihood tree was created using RAxML (41) based on single nucleotide polymorphisms (SNPs) in the core genes. Principal-component analysis was performed across the 1,948 isolates based on the accessory genes from Roary using R. A Spearman rho correlation analysis was performed on the principal components, the gene absence/presence data, and the ST and source of isolation metadata.
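The core/accessory split described above can be illustrated with a short sketch. It is not Roary itself: it assumes a gene presence/absence table has already been produced and is held as a mapping from gene name to the set of isolates carrying it. The data and function name are hypothetical.

```python
def split_core_accessory(presence, n_isolates, core_fraction=0.99):
    """Split genes into core and accessory by the fraction of isolates carrying them.

    presence: dict mapping gene name -> set of isolate IDs carrying the gene.
    A gene is 'core' if it is present in at least core_fraction of isolates.
    """
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    core, accessory = [], []
    for gene, isolates in sorted(presence.items()):
        fraction = len(isolates) / n_isolates
        (core if fraction >= core_fraction else accessory).append(gene)
    return core, accessory


# Toy collection of 200 isolates (invented data).
all_isolates = {f"iso{i}" for i in range(200)}
presence = {
    "geneA": set(all_isolates),                     # 200/200 isolates
    "geneB": all_isolates - {"iso0"},               # 199/200 = 99.5%
    "geneC": {f"iso{i}" for i in range(150)},       # 150/200 = 75%
}
core_genes, accessory_genes = split_core_accessory(presence, len(all_isolates))
print(core_genes, accessory_genes)
```

Only the accessory genes would then feed a presence/absence analysis such as the PCA described in the text, since core genes carry almost no variation in that matrix.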
Phylogeny-based analysis of individual lineages. Lineage-specific analyses were performed by mapping the sequence reads for isolates belonging to clonal complex 10 (CC10) and CC117 to an E. coli reference genome from the same clonal complex using SMALT 0.7.4 (http://www.sanger.ac.uk/resources/software/smalt/). E. coli MG1655 K-12 (ENA accession number U00096.2) was used as the reference genome for CC10, and a de novo assembly of the CC117 study isolate (ENA accession number ERR1204146) with the lowest number of contigs was used as the reference genome for CC117 as no reference genomes were available. To create a "core" genome, mobile genetic elements (MGEs) were identified using gene annotation, PHAST (phast.wishartlab.com), and BLAST (https://blast.ncbi.nlm.nih.gov) and removed, together with contigs less than 500 bp in length. Recombination was removed using Gubbins (42). A maximum likelihood phylogeny was created using RAxML (41) with 100 bootstraps and a midpoint root. Genetic diversity was calculated based on pairwise differences in SNPs in the core genomes using an in-house script. Visualization of phylogenetic trees was performed using iToL (http://itol.embl.de) (43) and FigTree v 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).

Bayesian Evolutionary Analysis Sampling Trees (BEAST). All published genomes for CC117 in the Enterobase online database (https://enterobase.warwick.ac.uk, accessed 10 June 2016) that had been generated on an Illumina instrument and had the country, year, and source of isolation available were identified (ERR769196, ERR769195, ERR769183, ERR769169, SRR1314275, SRR3410778, and SRR3438297). These were mapped to the CC117 reference using SMALT, combined with the CC117 study isolates, and mobile elements and recombination were removed as before. Dating of this lineage was completed using BEAST v.1-8 (44). BEAST v.1-8 was run using the Hasegawa, Kishino, and Yano (HKY) and gamma substitution model. We compared combinations of three population size change models (constant, exponential, and Bayesian skyline plot) and three molecular clock models (strict, exponential, and uncorrelated lognormal). A Bayesian skyline population model and an uncorrelated lognormal molecular clock were selected based on Bayes factors calculated from path sampling and stepping stone sampling (44,45).
Detection of antimicrobial resistance and mobile elements. Acquired genes encoding antibiotic resistance were identified using Antibiotic Resistance Identification By Assembly (ARIBA), comparing the study genomes against an in-house curated version of the Resfinder database (46-48) consisting of 2,015 known resistance gene variants. Genes were classified as present using an identity of 90% nucleotide similarity. Genes reported as fragmented, partial, or interrupted were excluded.
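The presence rule described above (at least 90% nucleotide identity, excluding genes reported as fragmented, partial, or interrupted) can be sketched as a small filter. The record layout below (`gene`, `identity_pct`, `flags`) is a hypothetical simplification chosen for illustration and is not ARIBA's actual report format.

```python
EXCLUDED_FLAGS = {"fragmented", "partial", "interrupted"}


def gene_is_present(call, min_identity=90.0):
    """Apply the presence rule to one hypothetical gene-call record."""
    if call["identity_pct"] < min_identity:
        return False
    # Reject calls carrying any of the excluded quality flags.
    return not (set(call.get("flags", ())) & EXCLUDED_FLAGS)


def resistance_profile(calls):
    """Return the sorted set of genes called present for one isolate."""
    return sorted({c["gene"] for c in calls if gene_is_present(c)})


# Invented example calls for a single isolate.
toy_calls = [
    {"gene": "blaTEM-1", "identity_pct": 100.0, "flags": []},
    {"gene": "sul2", "identity_pct": 99.2, "flags": ["partial"]},
    {"gene": "tetA", "identity_pct": 88.5, "flags": []},
    {"gene": "strA", "identity_pct": 90.0},
]
profile = resistance_profile(toy_calls)
print(profile)
```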
For all isolates positive for the bla CTX-M-1, bla CTX-M-15, bla TEM-1, sul1, strA, strB, sul2, tetA, and tetB genes, whole-genome assemblies were screened to identify the contig carrying the antimicrobial resistance (AMR) gene using the blastn application (49) with the AMR gene sequence as the query sequence. The identified contigs were then aligned against a previously curated database of complete Enterobacteriaceae plasmids (50) in order to filter out sequences representing E. coli chromosome fragments. For each AMR gene, a database containing unique AMR-carrying contigs was created using cd-hit-est (51) based on a 90% identity cutoff. To determine contig carriage, for each of these AMR genes, all isolates positive for that gene were mapped against the respective gene-specific database of contigs using Short Read Sequence Typing (SRST2) with a minimum 90% coverage cutoff. The contig presence/absence data were converted into a distance matrix, and hierarchical clustering was performed using the R function hclust and the ward.D2 method.
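The conversion of contig presence/absence data into a distance matrix can be sketched as follows. The text does not state which distance metric was used, so the Jaccard distance here is an assumption made purely for illustration, as are the function names and toy profiles. The resulting matrix is the kind of input that a hierarchical clustering routine such as R's hclust would take.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of contig IDs (0 = identical sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Build a symmetric distance matrix from isolate -> set of contigs present.

    Returns the sorted isolate IDs and a list-of-lists matrix in that order.
    """
    ids = sorted(profiles)
    matrix = [[jaccard_distance(profiles[i], profiles[j]) for j in ids] for i in ids]
    return ids, matrix


# Invented contig-carriage profiles for four isolates.
toy_profiles = {
    "iso1": {"contig1", "contig2"},
    "iso2": {"contig1", "contig2"},
    "iso3": {"contig3"},
    "iso4": {"contig1"},
}
ids, matrix = distance_matrix(toy_profiles)
for isolate_id, row in zip(ids, matrix):
    print(isolate_id, row)
```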
To examine plasmids carrying the ESBL genes bla CTX-M-15 and bla CTX-M-1, two pairs of livestock and human isolates were selected for sequencing on the PacBio RS II instrument (Pacific Biosciences, Menlo Park, CA, USA) (n = 4), and in silico PCR was used to perform plasmid incompatibility group/replicon typing (52). These two genes were selected, as they were the most prevalent ESBL genes found in the 1,948 isolates. The pair of bla CTX-M-15-positive isolates was selected, as they contained 5 identical genes encoding antimicrobial resistance. The single human bla CTX-M-1-positive isolate in the collection was selected, and a livestock isolate with the most similar resistance gene profile was selected, with both isolates containing 3 identical genes encoding antimicrobial resistance. DNA was extracted using the phenol-chloroform method (53) and sequenced using the PacBio RS II instrument. Sequence reads were assembled de novo with HGAP v3 (54) within the SMRT Analysis version 2.3.0 software (https://www.pacb.com/products-and-services/analytical-software/smrt-analysis/), circularized using Circlator v1.1.3 (55) and Minimus 2 (56), and polished using the PacBio RS_Resequencing protocol and Quiver v1 (54). Fully assembled plasmids were compared using WebACT (http://www.webact.org) and BLASTn (https://blast.ncbi.nlm.nih.gov).
Ethical approval. The study protocol was approved by the Cambridge University Hospitals NHS Foundation Trust Research and Development Department (reference A093285) and the National Research Ethics Service East of England Ethics Committee (reference 12/EE/0439 and 14/EE/1123).
Data availability. Sequence data for all isolates have been submitted to the European Nucleotide Archive (www.ebi.ac.uk/ena) under study accession number PRJEB4681 (all human E. coli), PRJEB8774 (non-ESBL-producing E. coli from livestock), and PRJEB8776 (ESBL-producing E. coli from livestock and meat), with the accession numbers for individual isolates listed in Table S1 in the supplemental material.

ACKNOWLEDGMENTS
We thank the Wellcome Sanger Institute core library construction, sequence and informatics teams, and the Pathogen Informatics team. We thank the staff at farms and abattoirs for assistance in sample collection and Elizabeth Lay for laboratory support during the meat survey. We thank Olivier Restif for statistical advice.