Despite Shared Geography, Campylobacter Isolated from Surface Water Are Genetically Distinct from Campylobacter Isolated from Chickens

ABSTRACT We tested the hypothesis that Campylobacter isolated from chicken ceca and river water in an overlapping geographic area would share genetic information. Isolates of C. jejuni from chicken ceca were collected from a commercial slaughter plant and isolates of C. jejuni were also collected from rivers and creeks in the same watershed. Isolates were subjected to whole-genome sequencing and the data were used for core genome multilocus sequence typing (cgMLST). Cluster analysis showed that there were four distinct subpopulations, two from chickens and two from water. Calculation of fixation statistic (Fst) showed that all four subpopulations were significantly distinct. Greater than 90% of the loci were differentiated by subpopulation. Only two genes showed clear differentiation of both chicken subpopulations from both water subpopulations. Sequence fragments of the CJIE4 bacteriophage family were found frequently in the main chicken subpopulation and the water outgroup subpopulation but were sparsely found in the main water population and not at all in the chicken outgroup. CRISPR spacers that targeted the phage sequences were common in the main water subpopulation, only once in the main chicken subpopulation, and not at all in the chicken or water outgroups. Restriction enzyme genes also showed a biased distribution. These data suggest that there is little transfer of C. jejuni genetic material between chickens and nearby river water. Campylobacter differentiation according to these two sources does not show clear evidence of evolutionary selection; the differentiation is probably due to geospatial isolation, genetic drift, and the action of CRISPRs and restriction enzymes.
IMPORTANCE Campylobacter jejuni causes gastroenteritis in humans, and chickens and environmental water are leading sources of infection. We tested the hypothesis that Campylobacter isolated from chicken ceca and river water in an overlapping geographic area would share genetic information. Isolates of Campylobacter were collected from water and chicken sources in the same watershed and their genomes were sequenced and analyzed. Four distinct subpopulations were found. There was no evidence of sharing genetic material between the subpopulations. Phage profiles, CRISPR profiles and restriction systems differed by subpopulation.

group (6 isolates). One isolate from chicken, C2H8, was amid the main water group and one isolate from water, SFBRC-13, was in a clade with chicken outgroup isolates.
FIG 1 Neighbor joining tree from concatenated sequences and associated metadata derived from cgMLST of chicken and river water C. jejuni isolates. The numbers in the circle just outside the tree (i.e., 106xxx) are the numeric ID codes assigned to each isolate by the PubMLST database. The IDs inside the outer ring are isolate identifications. Strains in blue were recovered from river water and the rest were recovered from chickens. The outermost colored ring is coded as follows: black = chicken main group; green = chicken outgroup; blue = water main group; yellow = water outgroup. The concentric circles, which are numbered on the left side, are marked with dots for the corresponding CRISPR spacers (red dots) and their targets (blue dots) that were found in each isolate.
The complete results of a locus-by-locus fixation statistic (Fst) analysis, with pairings of each subpopulation that was distinguished, are shown in Table S2. An Fst statistic indicates population differentiation when the statistic is significantly higher than zero, as indicated by the P value. The Fst P values are shown in Table S2. A total of 1,343 loci were tested. The number of alleles for each locus ranged from 2 to 40, with an average of 19.2 alleles per locus and a median of 19 alleles per locus. When all four subpopulations were compared together, 98.5% of the loci showed significant differentiation. Of the 20 loci that did not show differentiation in the four-subpopulation comparison, only one had greater than 11 alleles (median of 5 alleles per locus). In main versus main or main versus outgroup subpopulation comparisons, 90.4 to 98.4% of the loci that were analyzed showed significant differentiation, whereas in outgroup versus outgroup subpopulation comparisons, only 62.6% of those loci showed significant differentiation.
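To make the idea of a per-locus Fst and its P value concrete, the following is a minimal Python sketch. It is not the AMOVA procedure implemented in Arlequin that was used in this study; it assumes a simplified Nei-style estimator, Fst = (Ht - Hs) / Ht, computed from allele frequencies at a single locus, with a label-permutation test for significance. The function names and the input layout (a dictionary of allele lists per subpopulation) are illustrative assumptions.

```python
import random
from collections import Counter


def gene_diversity(alleles):
    """Return 1 - sum(p_i^2) for a list of allele IDs at one locus."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def simple_fst(subpops):
    """Simplified Fst = (Ht - Hs) / Ht for one locus.

    subpops: dict mapping subpopulation name -> list of allele IDs
    (hypothetical layout, one allele per haploid isolate).
    """
    total_n = sum(len(a) for a in subpops.values())
    pooled = [allele for a in subpops.values() for allele in a]
    h_total = gene_diversity(pooled)
    # Size-weighted mean of the within-subpopulation diversities
    h_within = sum(len(a) / total_n * gene_diversity(a) for a in subpops.values())
    if h_total == 0:
        return 0.0  # monomorphic locus: no differentiation is measurable
    return (h_total - h_within) / h_total


def permutation_p_value(subpops, n_perm=999, seed=0):
    """Estimate P(Fst >= observed) by shuffling isolates among subpopulations."""
    rng = random.Random(seed)
    names = list(subpops)
    sizes = [len(subpops[name]) for name in names]
    pooled = [allele for name in names for allele in subpops[name]]
    observed = simple_fst(subpops)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for name, size in zip(names, sizes):
            shuffled[name] = pooled[start:start + size]
            start += size
        if simple_fst(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under this sketch, a locus at which two subpopulations carry entirely different alleles gives an Fst of 1, and a locus with identical allele frequencies in both gives an Fst of 0.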
To identify loci that specifically differentiate both chicken subpopulations from both water subpopulations, a series of steps was taken, as illustrated in Table S2. First, loci that differentiated the main chicken and chicken outgroup subpopulations were removed, then loci that differentiated the main water from the water outgroup subpopulations were removed, and then the loci that did not differentiate the main chicken and main water subpopulations were removed from the table. By removing these loci, we were left with only three loci that differentiated chicken and water isolates in all circumstances. These were CAMP0776 (Cj0841c, molybdenum cofactor guanylyltransferase protein B), CAMP1231 (Cj1313, pseH, N-acetyltransferase specific for PseC product, UDP-4-amino-4,6-dideoxy-beta-L-AltNAc), and CAMP1808 (Cj1070, rpsF, 30S ribosomal protein S6).
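The three filtering steps described above can be sketched as follows. This is an illustration only: the input layout (a dictionary recording, for each locus, whether each pairwise Fst comparison was significant) and the subpopulation labels are assumptions made for the example, not the format of Table S2.

```python
def loci_separating_sources(significant):
    """Return loci that separate chicken from water isolates.

    significant: dict mapping locus name -> dict keyed by a pair of
    subpopulation labels, with True where the pairwise Fst comparison
    was significant (hypothetical layout).
    """
    kept = []
    for locus, pairs in significant.items():
        # Step 1: drop loci that differentiate the two chicken subpopulations
        if pairs[("chicken_main", "chicken_out")]:
            continue
        # Step 2: drop loci that differentiate the two water subpopulations
        if pairs[("water_main", "water_out")]:
            continue
        # Step 3: drop loci that fail to differentiate chicken main from water main
        if not pairs[("chicken_main", "water_main")]:
            continue
        kept.append(locus)
    return kept
```

A locus survives only if it is shared within each source and differs between sources, which is the criterion the text applies.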
Only one isolate (C3I6 from the chicken outgroup, rep21_1_rep(pWBG758) GQ900396 [97% identity] and rep 7a_20_Sep203(pSE1228p02) AE015931 [97% identity]) showed evidence of known plasmids by scanning for replicons. Also, one other isolate (a water isolate, SFBRC-66) showed evidence of a plasmid by MGEfinder. MGEfinder found a total of 456 putative MGEs in the total population of analyzed isolates. This study focused only on prophages because of their potential for carrying genes between strains. Seventy-one prophage fragments were found (Table S3); 47 transposon-related fragments were also found (data not shown), but they were not further characterized because none were identified as CRISPR targets (see below).
A total of 84 genomes (77%) in our collection contained a single CRISPR array. The consensus direct repeat sequence was 5′-GTT TTA GTC CCT TTT TAA ATT TCT TTA TGG TAA AAT-3′ (36 nucleotides), and all but four out of 297 spacer sequences were 30 nucleotides in length. Across the entire population, 109 distinct spacer sequences were identified. These spacer profiles (Table S4) track well with the cgMLST phylogeny, with closely related genomes sharing the same CRISPR profiles. The CRISPR spacer sequences determine the target specificity. They were all tested for targeting against a database that included all the MGEs that were found in the test population by MGEfinder. We identified putative targets for 34 of the spacers, 11 of which had complementary targets carried by any member of the population. Four spacers had targets that were known to be plasmids from the Campylobacter genus, but none of the plasmid targets were found in the test population (a complete listing of CRISPR arrays and targets is shown in Table S4). The 11 spacer sequences and the identities of the MGEs that they target are listed in Table 1. All the genes that were targeted by CRISPRs found within the test population were phage components. Spacers with test-population targets were arbitrarily assigned numbers 1 through 11 (corresponding to rings 1 to 11 in Fig. 1). The distribution of spacers in the test population is illustrated in Fig. 1, where the red dots indicate the presence of CRISPR spacers and the blue dots indicate the presence of the putative prophage (not necessarily the whole phage) that is targeted by that spacer. Some isolates, for instance SFBRC-66 (to the left of the top of Fig. 1), had both the spacer and the target for some of the specificities (red dot and blue dot are together in Fig. 1).
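As an illustration of how a spacer can be tested against a set of MGE sequences, the sketch below looks for exact matches of each spacer, or its reverse complement, within each MGE. This is a deliberate simplification: the analysis in this study used CRISPRTarget, which relies on BLASTn and tolerates mismatches, so exact substring matching would miss imperfect targets. The function names and dictionary inputs are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spacer_targets(spacers, mges):
    """Map each spacer ID to the MGE IDs that contain it on either strand.

    spacers: dict spacer ID -> spacer sequence (hypothetical layout)
    mges:    dict MGE ID -> MGE sequence (hypothetical layout)
    Spacers without any exact match are omitted from the result.
    """
    hits = {}
    for spacer_id, spacer in spacers.items():
        forward = spacer.upper()
        reverse = reverse_complement(forward)
        matched = [
            mge_id
            for mge_id, mge_seq in mges.items()
            if forward in mge_seq.upper() or reverse in mge_seq.upper()
        ]
        if matched:
            hits[spacer_id] = matched
    return hits
```

Combining such a spacer-to-target map with the per-isolate MGE list would indicate which isolates carry a spacer, its target, or both, which is the information plotted as red and blue dots in Fig. 1.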
Most of the spacers found were in isolates recovered from river water. Only two chicken isolates, C2H8 and C1A9, had a spacer with a Campylobacter target. C2H8 clustered amid the water isolate clade (Fig. 1) and shared one of its four spacers, which targeted known plasmids from Campylobacter not found in the test population, with four water isolates. C1A9 had a spacer against target no. 7 in the test population, and that isolate is distinct, being at the tip of a long branch by itself on the dendrogram (Fig. 1). Not all water isolates had spacers or targets, but spacers with targets within the study population (red dots in Fig. 1) were significantly (chi-square P < 0.01) more likely in water main group isolates than in chicken main group isolates (Table 2). Interestingly, chicken main group isolates were significantly more likely than water main group isolates to have targets (blue dots in Fig. 1). The water outgroup had no spacers that had any Campylobacter target and had more targets per isolate than the rest of the water isolates. The chicken outgroup was almost barren of spacers or targets. Of note, the water isolate SFBRC-13 had a CRISPR array containing two spacers that was identical to the arrays borne by the eight chicken isolates found in the same clade.
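A chi-square comparison of this kind can be sketched for a 2 x 2 table of counts (for example, isolates with and without a spacer in two subpopulations). The sketch assumes Pearson's chi-square without a continuity correction and one degree of freedom; the software and exact settings behind the reported P value are not stated here, so results from this function may differ slightly. The counts themselves would come from Table 2 and are not reproduced in the code.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two groups; columns are presence and absence of the trait.
    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be greater than zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```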
The results of classifying the restriction enzyme genes carried by each isolate are shown in Table S6 and indicate that each of the four subpopulations is distinct. This was confirmed by Fst analyses, which showed that each subpopulation was significantly differentiated from every other subpopulation (Table 3). Chicken isolates had, on average, more restriction enzyme genes (5.49 for the main chicken group and 6.09 for the chicken outgroup) than the water isolates (4.09 for the main water group and 4.14 for the water outgroup). The recognition sites for most enzymes in the analysis are unknown. See Table S6 for indications of known recognition sites. Note that recognition sites can differ within the homology groups (indicated by the color codes in row 4 of Table S6).

DISCUSSION
Assigning isolates to subpopulations on the basis of a phylogram involves making decisions based on the length of branches and placement of clades while considering how rotating a clade at a node would affect the placement. In this analysis, the decisions were influenced by the desire to compare chicken isolates with water isolates. This may introduce errors that would result in a more conservative conclusion. The difference between the chicken outgroup and the water outgroup was the weakest of the population pair assignments. However, the fixation statistic showed that the differentiation of the two outgroups was significant, and the assignments were supported. Cluster analysis based on cgMLST of the Campylobacter jejuni study population yielded an almost complete differentiation of isolates sourced from chickens versus environmental water. It was surprising to find smaller subpopulations of both groups that were well differentiated from the main groups and less differentiated from each other. Of the loci that shared alleles between chicken and water isolates (i.e., did not differentiate them), there was a bias toward ribosomal protein genes, reflecting the strong conservation of those genes. This is supported by the observation of fewer alleles (lower allelic richness) among loci that did not differentiate the subpopulations. The pattern of allelic distribution is characteristic of subpopulations that have recombination barriers between them.
The design of this experiment used isolates collected in the same general geographical area. It may be claimed that there are reasons that strains of Campylobacter do not cross over between chickens and environmental water. However, of the 106 isolates that were analyzed for this study, there was one isolate from a chicken (C2H8) that was placed in the main water cluster and one water isolate (SFBRC-13) that was placed in the chicken outgroup cluster. Thus, the geographic barrier is at best incomplete, suggesting that there was some potential for crossing the environments and therefore an opportunity for exchange. But we found no evidence that such exchange occurred, i.e., there was no recent sharing of alleles between the subpopulations, suggesting the barrier was not just geographic isolation.
Given the length of branches in the dendrogram that separates chicken subpopulations from water subpopulations, it appears that the separation has been longstanding. So, it is plausible that habitat-specific adaptations would have accumulated in the subpopulations. This was tested by performing the locus-by-locus Fst test for differentiation. It was happenstance that there were two distinct subpopulations for both the chicken and water isolates. We suspect that meaningful differentiation of the chicken and water subpopulations had to occur in both the main group and outgroup subpopulations. There were only three loci that met that criterion for differentiation of both water subpopulations from both chicken subpopulations. One was CAMP1808 (also known as Cj1070, 30S ribosomal protein S6). It is difficult to offer any explanation beyond chance for this result, but it is possible that alleles for the subpopulations allowed for better expression of the gene to give an advantage for one or the other, or both, of the habitats. CAMP0776 (also known as Cj0841c, putative molybdopterin-guanine dinucleotide biosynthesis protein) was another gene that showed differentiation. This is notable since Mourkas et al. (17) listed the putative molybdopterin-guanine dinucleotide biosynthesis protein as a candidate adaptive gene for differentiation of ST-61 complex C. jejuni from the ST-21 clonal complex. C. jejuni CC-61 is a cattle specialist that may have emerged from the host-generalist C. jejuni CC-21. In Escherichia coli, putative molybdopterin-guanine dinucleotide biosynthesis protein expression was found to be regulated by motility-dependent surface sensing (18), and thus may have a role in flagellum-mediated motility.
The third differentiating gene was CAMP1231 (also known as Cj1313, N-acetyltransferase specific for PseC product, UDP-4-amino-4,6-dideoxy-beta-L-AltNAc). The enzyme product is part of the biosynthetic pathway for pseudaminic acid, which is used in posttranslational modification of flagellin and is essential for motility of Helicobacter pylori and C. jejuni (19,20). Thus, CAMP0776 and CAMP1231 have documented association with virulence properties and their differentiation may well reflect selection for increased fitness in chickens.
Having found the dramatic differentiation of the four subpopulations, we sought to determine what barriers may exist. The pattern of allelic diversity seen in these subpopulations is characteristic of allopatric diversification, i.e., lack of sharing of alleles due to geographic barriers such as seen by Pascoe et al. (21) among trans-Atlantic subpopulations of C. jejuni, despite our study subpopulations not being from widely different geographic areas. The outgroups were direct geospatial neighbors of their associated main groups. There are numerous mechanisms that can play a role in preventing exchange of DNA between strains of bacteria. However, there appears to be sharing of DNA within the subpopulations, pointing to specificity of accepting DNA. Because of the paucity of plasmids in these subpopulations (one in chicken isolates and one in water isolates), it was concluded that plasmids did not play a substantial role in the differentiation of the subpopulations from each other or in the subpopulation cohesion, i.e., by sharing of genes within the group. Other important mechanisms for specific barriers to DNA exchange are CRISPR-Cas directed digestion of DNA (10,12) and restriction/modification systems (14,15). There are also physiological barriers to transfer of DNA, such as modification of receptor sites for phage.
Only 11 CRISPR spacer sequences complemented targets in any of the analyzed subpopulations. Five of the spacers targeted genes found in CJIE4 phage. This finding supports earlier reports that CJIE4-like phages are the most common type of phages in C. jejuni (22,23). CJIE2-like prophages were also targeted by five of the CRISPR spacers and the remaining spacer targeted CJIE1-like prophages. Using MGEfinder, 31 unique fragments of prophages were identified (Table S5). Some of these varied only by single nucleotide polymorphisms or by slightly different start and/or end sites. These fragments occurred in the test population in differing mixes (Table S3) resulting in 29 different MGEs. Only one of the phage-like MGEs, c399_g24_s399, had no CRISPRs that targeted it; however, c399_g24_s399 is an unusual fragment that is much shorter than the other MGEs and has pieces of sequence that are homologous to several fragments scattered in the Campylobacter jejuni strain MTVDSCj16 chromosome (GenBank number CP017033). All the CRISPRs identified as targeting phages had at least one isolate among the test population that carried its target. There were only two spacers that were shared between chicken and water isolates. These spacers were in a single array in all the members of the mixed clade containing isolate SFBRC-13 (Fig. 1, right) and their target specificities could not be determined (Table S6).
Targets for restriction enzymes are more difficult to define since the sequences they recognize are small enough to occur randomly and most recognition sequences are unknown. It is well established that restriction enzymes are defense mechanisms that bacteria use against phages (24); restriction enzymes are accompanied by modification systems that usually work by blocking the recognition sites through methylation of key nucleotide bases. Phages with their restriction recognition sequences modified will survive in a resistant host (e.g., as a prophage) and infect a new host that has the same restriction system. Therefore, a phage is much more likely to succeed in colonizing a new host that either has no restriction enzymes or has the same ones that were in the donating host. The current data show that C. jejuni isolates from chickens or river water in general have at least one restriction/modification system. The overall profiles of restriction enzymes differ for each of the analyzed subpopulations. Therefore, it is unlikely that these subpopulations are actively exchanging DNA.
In summary, this study showed that there were four well-differentiated subpopulations of C. jejuni in chickens and river water in the Broad River watershed area of Northeast GA, USA. The alleles for cgMLST were shared predominantly only within the subpopulations suggesting that there were no barriers for sharing alleles within a subpopulation. Phage profiles, CRISPR profiles, and restriction systems differed by subpopulation. These alone have the capacity to create barriers between the subpopulations, though this was not tested here. While we cannot tell which came first, the segregation/isolation of the subpopulations, or the segregation of the phage, CRISPRs, and restriction enzyme genes, once these segregations have started it is likely that they are self-amplifying.

MATERIALS AND METHODS
The 39 surface water isolates of C. jejuni used in this study were collected in 2012 and 2013 from the South Fork of the Broad River in Northeast GA, USA, and have been described by Meinersmann et al. (25). At that time, they were typed by a 7-gene MLST scheme (26). For the purposes of this study, only one example of any given MLST type from each location and sample day was included in the analysis.
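The selection rule for water isolates (one example of any given MLST type per location and sample day) amounts to a simple dereplication step, which can be sketched as below. The record layout and the field names `st`, `location`, and `sample_date` are hypothetical, and keeping the first isolate encountered is an assumption; the source does not state how the representative was chosen.

```python
def dereplicate(isolates):
    """Keep one isolate per (sequence type, location, sample day).

    isolates: list of dicts with the hypothetical keys "st", "location",
    and "sample_date". The first isolate seen for each combination is kept.
    """
    seen = set()
    kept = []
    for isolate in isolates:
        key = (isolate["st"], isolate["location"], isolate["sample_date"])
        if key not in seen:
            seen.add(key)
            kept.append(isolate)
    return kept
```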
The 67 chicken C. jejuni isolates were recovered from chicken ceca at a commercial broiler slaughter plant to which live birds are trucked from farms in the area of the Broad River and northern Oconee River watersheds. Ceca were collected and Campylobacter cultured using the procedure outlined in Berrang et al. (27). On a given day, a single cecal sample was collected at the slaughter plant such that each chicken isolate represents a unique flock, defined as one house of market age broiler chickens, in the period from 2013 to 2015. No records are available as to the exact location of each farm/chicken house within the watershed or whether litter was disposed from these farms by spreading on fields that impacted the studied watershed.
DNA was prepared from each isolate using Qiagen DNeasy kits (Qiagen, Germantown, MD, USA), and libraries were prepared for sequencing using Illumina Nextera XT kits (Illumina, Inc., San Diego, CA, USA) and then shotgun sequenced on an Illumina MiSeq using MiniSeq high output reagent kits (2 × 300 cycles, Illumina, Inc.). Genomic sequences were assembled using SPAdes version 3.13.2 (28,29). Genomic assemblies were deposited into PubMLST (30), where they were analyzed for allelic profiles of the C. jejuni core genome and ribosomal MLST estimation of species (31). Allelic profiles were used for cluster analysis using the BIGSdb Microreact plugin (30,32) implemented on PubMLST. Locus-by-locus Analysis of Molecular Variance (AMOVA) analyses for significance of fixation statistic (Fst) scores in a hierarchical island model (33) were conducted in Arlequin (34).
All genomes were searched for mobile genetic elements (MGEs) with MGEfinder (35). MGEfinder identifies DNA sequences of MGE insertions without the need for a reference database. It works by loading many genome sequences of the same species at once and looking for pieces of DNA sequence that have changed their context, as indicated by changes in the flanking sequences; that is, it identifies sequences that are absent from a reference sequence and/or have different flanking sequences in different isolates. One output file is a list of the putative MGEs each isolate has for all the members of the test population, and a second output file is a FASTA file of every putative MGE for the whole test group of sequences. To annotate sequences, this FASTA file was used to search the National Center for Biotechnology Information (NCBI) by BLASTN. The returns are often a whole or partial prophage, transposon, insertion sequence, or even an antibiotic resistance gene. All genomic sequences were scanned for plasmid replicons using the database from PlasmidFinder (36) using the Classify function in Geneious Prime (Biomatters, Inc.).
The CRISPR content of the Campylobacter subpopulations was defined using the online tool CRISPRDetect (http://crispr.otago.ac.nz/CRISPRDetect/predict_crispr_array.html) (37) and verified using CRISPRViz (38). Targets of the CRISPRs were then identified with CRISPRTarget (http://crispr.otago.ac.nz/CRISPRTarget/crispr_analysis.html) (39). This program uses BLASTn to search for matches to the spacer sequences against reference databases with default settings described at http://crispr.otago.ac.nz/CRISPRTarget/CRISPRTarget_help.html. CRISPRTarget uses a resident database of known MGEs as a reference to search against and can incorporate user-supplied reference data. The output file of MGEfinder was added to provide a database that covers the Campylobacter isolates used in the current study. Thus, we defined for the current subpopulations their cgMLST profile, the MGEs of each isolate, and the identity of the isolates with CRISPRs that target those MGEs.
To identify which restriction enzyme genes were carried by each isolate, we used a subset of the genes in the REBASE restriction enzyme database (40) that is maintained by New England Biolabs (http://rebase.neb.com/rebase/rebase.html). The database was filtered to only include genes that were found in the genus Campylobacter without regard to species. The program CD-HIT (41) was used to identify redundant sequences and one representative of each sequence was chosen to be part of our database. This database was imported into Geneious Prime (Biomatters, Inc.) and used to classify each C. jejuni genomic sequence. When multiple hits occurred for a given contig, the alignments of the hits with the contig were inspected and each hit was classified by choosing the most similar match for each position with a match. REBASE includes both gene and protein sequences for restriction/site-modification enzymes and R/M-associated specificity proteins.