Prevalence and Characterization of CRISPR Locus 2.1 Spacers in Escherichia coli Isolates Obtained from Feces of Animals and Humans

ABSTRACT The clustered regularly interspaced short palindromic repeat (CRISPR) system has been studied as a prokaryotic immune system for survival against bacteriophages. The CRISPR system in prokaryotes records the invasion of bacteriophages or other genetic materials in CRISPR loci. Accordingly, CRISPR loci can reveal a history of infection by bacteriophages and other genetic materials. Therefore, identification of the CRISPR array may help trace the events that bacteria have undergone. In this study, we characterized and identified the spacers of the CRISPR loci in Escherichia coli isolates obtained from the feces of animals and humans. Most CRISPR spacers were found to stem from phages. Although we did not find any patterns in CRISPR spacers according to sources, our results showed that phage-derived spacers mainly originated from the families Inoviridae, Myoviridae, Podoviridae, and Siphoviridae and the order Caudovirales, whereas plasmid-derived CRISPR spacers were mainly from the Enterobacteriaceae family. In addition, it is worth noting that the isolates from each animal and human source harbored source-specific spacers. Considering that some of these taxa are likely found in the gut of mammalian animals, CRISPR spacers identified in these E. coli isolates were likely derived from the bacteriophageome and microbiome in closed gut environments. Although the bacteriophageome database limits the characterization of CRISPR arrays, the present study showed that some spacers were specifically found in both animal and human sources. Thus, this finding may suggest the possible use of E. coli CRISPR spacers as a microbial source tracking tool.
IMPORTANCE We characterized spacers of CRISPR locus 2.1 in E. coli isolates obtained from the feces of various sources. Phage-derived CRISPR spacers are mainly acquired from the order Caudovirales, and plasmid-derived CRISPR spacers are mostly from the Enterobacteriaceae family. This is thought to reflect the microbiome and phageome of the gut environment of the sources. Hence, spacers may help track the encounters of bacterial cells with other bacterial cells, viruses, or other genetic materials. Interestingly, source-specific spacers are also observed. The identification of source-specific spacers is thought to help develop the methodology of microbial source tracking and improve the understanding of the interactions between viruses and bacteria. However, the origins of very few spacers have been traced so far. The accumulation of genome sequences can help identify the hosts of spacers and can be applied for microbial source tracking.

RESULTS
Characterization and identification of spacers of CRISPR locus 2.1. In this study, spacers of CRISPR locus 2.1 in 141 E. coli isolates obtained from humans and animals were characterized to demonstrate the usability of CRISPR spacers for source tracking.
Identification of protospacers from CRISPR systems of E. coli isolates. Identification of spacers of CRISPR locus 2.1 revealed that the majority of these spacers originated from phages (39%) and plasmids (14%), while 32% and 14% originated from unknown and multiple sources, respectively (Fig. 1). Host-identified spacers were most frequently found in ducks (n = 381), followed by pigs (n = 185), humans (n = 81), beef cows (n = 55), dairy cows (n = 32), patients (n = 16), and chickens (n = 11) (Table 2). Except for the "unidentified family," the Myoviridae family was most frequently found (n = 1 to 46 [9.1 to 43.8% of the total]), while spacers from unknown phages ranged from 3 to 69 (0 to 21%).
Occurrence of spacers from animals and humans. The occurrence of spacers in animal and human sources was also investigated (Fig. 2). Some common spacers were grouped separately due to their sequence dissimilarities. This is thought to result from the capture and processing of different sequence fragments of the same spacer from foreign DNA sequences. Spacers from Myoviridae, Siphoviridae, and unidentified families of Caudovirales were found in all of the source animals and humans. Spacers from the Podoviridae family were found in groups F and G. Duck E. coli contained the most diverse plasmid-derived bacterial spacers (group F). In addition, we did not observe the co-occurrence of spacers among pigs, cows, and humans (group H), as well as in pigs and cows (group K). Among the spacers from bacterial plasmids, spacers derived from E. coli strain LD91-1 plasmid pLD91-1-76kb commonly occurred among all sources. Network analysis showed that most plasmid- and phage-originated spacers were highly associated with E. coli from animals and humans (Fig. 3). The isolates from ducks (n = 8) carried spacers from bacterial plasmids (Fig. 3a), followed by pigs (n = 7), humans (n = 7), beef cows (n = 4), dairy cows (n = 3), patients (n = 3), and chickens (n = 3). Among the host phages (Fig. 3b), the most diverse spacers were obtained from ducks (n = 5), followed by pigs (n = 4), humans (n = 4), beef cows (n = 3), dairy cows (n = 3), patients (n = 3), and chickens (n = 2). Isolates from ducks were found to harbor spacers from specific host plasmids and phages, such as Phaeobacter piscinae strain P13 plasmid pP13_a and phage Inoviridae.
To investigate the occurrence patterns of spacers, those from each E. coli isolate were arranged according to animal and human sources (Fig. 4). E. coli isolates from beef cows and ducks seemed to encounter host phage-originated spacers more commonly. The CRISPR arrays of E. coli isolates from all source animals and humans contained variable proportions of spacers derived from phages and plasmids. Phage-derived spacers were relatively abundant in the CRISPR arrays of E. coli isolates from beef cows, dairy cows, humans, patients, and pigs. Similar proportions of plasmid- and phage-originated spacers were distributed in the CRISPR arrays of chicken isolates. Compared with the other sources, fewer spacers from unknown hosts were found in the chicken isolates.
Detection of source animal- and human-specific spacers. The most diverse source-specific spacers were found in ducks (n = 88), followed by human patients (n = 58), dairy cows (n = 53), beef cows (n = 26), pigs (n = 19), and humans (n = 2) in E. coli isolates (Table 3). In beef cow and human sources, the 26 and 2 spacers, respectively, did not overlap. In ducks, the D12 and D13 source-specific spacers were most frequently found (n = 4), followed by D11, D32, D97, D104, and D105 (n = 3), another 14 spacers (n = 2), and 67 other spacers (n = 1). In dairy cows, the MC294 and MC296 source-specific spacers were most frequently found (n = 5), followed by MC278, MC289, and MC295 (n = 4), MC288 (n = 3), another 10 spacers (n = 2), and the other 37 spacers (n = 1). In human patients, the P433 source-specific spacer was most frequently found (n = 3), followed by another 10 spacers (n = 2) and the other 46 spacers (n = 1). Among the pig sources, 10 pig source-specific spacers were the most frequent (n = 3), followed by Pig51, Pig352, Pig353, and Pig354 (n = 2) and the other five spacers (n = 1).

DISCUSSION
The CRISPR-Cas system is known as an immune system in prokaryotes that works through the storage of spacers from foreign DNA sequences (30), which means that the presence of spacers in CRISPR loci indicates a past encounter with invading bacteriophages or other genetic materials. With the storage of spacers, prokaryotes have the potential to defend themselves against subsequent invasions by bacteriophages. Thus, the identification of spacers in CRISPR loci will help to understand the history of bacterial isolates exposed to bacteriophages or other genetic materials (31) such as mobile genetic elements, antibiotic resistance genes, and virulence genes. The documentation of a series of spacers may therefore help develop tools for microbial source tracking, and several studies have reported spacers of CRISPR loci in E. coli isolates from animal and human guts (26, 32–34). In this study, we characterized the spacers of CRISPR locus 2.1 and investigated their prevalence in the feces of animals and humans for the application of spacers in source tracking. The current study showed that 24.8% of the 569 E. coli isolates harbored CRISPR locus 2.1, and the occurrence of the CRISPR system was highly variable by source (beef cows, chickens, ducks, humans, dairy cows, patients, and pigs). A previous study reported that 49% of E. coli strains from animals, humans, and environmental waters harbored CRISPR 2.1 regions (21). Analysis of the NCBI and CRISPRdb databases showed that CRISPR systems are not common among Klebsiella pneumoniae strains (35). Another study showed that ~37% of Klebsiella pneumoniae strains carried CRISPR systems according to complete chromosomal sequences from GenBank (36). Thus, the occurrence rate of CRISPR systems varies among bacterial isolates. The current study likewise showed variable occurrences of CRISPR systems among E. coli isolates from animal and human sources, and further investigations are required to understand the distribution of CRISPR systems in E. coli.
The current study showed that spacers of CRISPR locus 2.1 in E. coli isolates were mainly derived from Myoviridae, Podoviridae, Siphoviridae, and unidentified families of the Caudovirales order in all animal and human sources. Previous studies have reported the interactions between bacteriophages and E. coli isolates in the gut environment. CRISPR systems have been studied for bacteriophage therapy against pathogenic (13, 37) and antibiotic-resistant E. coli strains (38). Those bacteriophages have been isolated from slaughterhouses, poultry sewage, intestines of chicken and beef offal, and wastewater (15, 39–41). In addition, analysis of the fecal bacteriophageome of the human gut showed that most of the bacteriophage contigs identified belonged to families of the order Caudovirales (42). These studies indicate that most gut bacteriophageomes belong to the order Caudovirales, suggesting that the presence of Caudovirales-derived spacer sequences may indicate fecal origin. This is also likely due to the broad host range of Caudovirales in the closed environment of animal guts (43). However, the majority of spacers remain unidentified, because few reads from viral metagenomics of the human gut align with viral genomic references (44).
Plasmid-derived spacers were also observed in this study and were mainly assigned to plasmids of Enterobacteriaceae. We found that the majority of spacer sequences were classified as plasmid pLD91-1-76kb, as previously reported in E. coli LD91-1 (45). Plasmid pLD91-1-76kb of E. coli LD91-1, which carries mcr-1 (the mobilized colistin resistance gene), was isolated from the feces of a Père David's deer in China. In addition, it was reported that plasmid pFORC14 in the foodborne pathogen Vibrio parahaemolyticus FORC014 was isolated from toothfish in South Korea (46). The plasmid of Pantoea sp. strain CCBC3-3-1 was also isolated from a Cotinus coggygria branch in China (47).
All host-identified spacers were identified in the duck E. coli CRISPR loci. We did not observe specific occurrence patterns of spacers, likely because of the lack of bacteriophage genomic data. This study, however, showed that most of the phage-derived spacers are from a few families of the order Caudovirales, and most of the plasmid-derived spacers are from a few genera of the Enterobacteriaceae family, suggesting tight associations with the intestinal environment. Notably, characterization of CRISPR spacers may provide fundamental information to track sources of E. coli; thus, investigation of these CRISPR spacers may offer a novel approach for fecal pollution source tracking. Interestingly, some spacers were specifically stored in the CRISPR arrays of E. coli from each source. The CRISPR profile of Salmonella enterica has already been proposed as an approach for source tracking (48). In addition, the CRISPR system also provides genetic evidence of the spread of antibiotic resistance genes carried by Staphylococcus (49). Analyses of the spacer profile of the CRISPR array of E. coli isolates from animals, humans, and environmental waters also suggested that a combination of methods with CRISPR analyses will prove useful in developing microbial source tracking (MST) tools (21). Accordingly, we suggest that the occurrence of source-specific spacers may help to develop a potential tool for MST.
In conclusion, we investigated the distribution of CRISPR systems and characterized CRISPR spacers within E. coli isolates obtained from animal and human feces. Our study showed that some spacers were specifically found in each source. In particular, we found that some source-specific spacers (Phaeobacter piscinae strain P13 plasmid pP13_a and phage Inoviridae) were bracketed in the CRISPR system of duck isolates. This suggests that more source-specific spacers could be detected by increasing the number of isolates used for CRISPR analysis. Considering the host-identified spacers, we revealed that some spacers from diverse hosts of phages and plasmids were commonly spread in the CRISPR system of E. coli isolates, and a few spacers were specifically associated with the isolates from each source. Thus, we suggest the identification of spacers in the CRISPR array of E. coli isolates as a potential approach for MST. This study could help advance further analysis of the interactions between viruses and bacteria, and MST.

MATERIALS AND METHODS
E. coli isolates and DNA extraction. A total of 569 isolates of E. coli were obtained from the feces of humans and animals (50). Fecal samples from healthy humans (termed "human" in this study) were collected during annual health checkups at a hospital located in Gwangju, South Korea, in 2008. Fecal samples from human patients with diarrhea (termed "patient" in this study) were also collected at the same hospital. Genomic DNA was extracted by boiling in 0.05 N NaOH at 95°C for 15 min (17). After boiling, 1:10 dilutions of the supernatants with sterilized distilled water were immediately used as DNA templates for PCR amplification.
Detection and sequencing of CRISPR locus 2.1. E. coli contains two subtypes of the CRISPR system: I-E and I-F (25). The CRISPR I-E type consists of three cassettes: CRISPR 2.1, CRISPR 2.2, and CRISPR 2.3 (26). Because it occurs most frequently in E. coli CRISPR systems (26), CRISPR 2.1 was selected for amplification and sequencing in this study. CRISPR locus 2.1 of the E. coli isolates was amplified as previously described (51). Amplicons were visualized using a 1% agarose gel at 100 V for 15 min and captured using the Gel Doc system (Bio-Rad, USA). Amplicons of variable sizes were purified using the QIAquick PCR purification kit (Qiagen, USA) and sent to Macrogen (Seoul, South Korea) for sequencing.
Identification of spacers of CRISPR locus 2.1. Sequences of presumptive CRISPR locus 2.1 were analyzed using CRISPRFinder (52), and protospacer and repeat sequences were manually arranged in Microsoft Excel. Sequences of protospacers of CRISPR locus 2.1 were identified and predicted using CRISPRTarget (http://crispr.otago.ac.nz/CRISPRTarget/crispr_analysis.html). A cutoff score of 29 was used as the threshold in CRISPRTarget, and the protospacers with the highest score were chosen for downstream analysis. Source-specific spacers are those present only in CRISPR arrays of one specific source among beef cows, ducks, humans, milk cows, patients, and pigs.
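The selection and source-specificity rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow (which used CRISPRTarget and Microsoft Excel): the record layout, the function names, and the treatment of the cutoff as inclusive (score >= 29) are all assumptions made for the example.

```python
from collections import defaultdict

SCORE_CUTOFF = 29  # threshold stated in the text; inclusive comparison is an assumption


def best_protospacer(hits, cutoff=SCORE_CUTOFF):
    """Return the highest-scoring hit at or above the cutoff, or None.

    `hits` is a hypothetical list of dicts with "target" and "score" keys.
    """
    passing = [h for h in hits if h["score"] >= cutoff]
    if not passing:
        return None
    return max(passing, key=lambda h: h["score"])


def source_specific_spacers(arrays_by_source):
    """Map each source to the spacers found in that source only.

    `arrays_by_source` maps a source name (e.g. "duck") to a list of
    CRISPR arrays, each array being a list of spacer identifiers.
    """
    sources_per_spacer = defaultdict(set)
    for source, arrays in arrays_by_source.items():
        for array in arrays:
            for spacer in array:
                sources_per_spacer[spacer].add(source)

    specific = defaultdict(set)
    for spacer, sources in sources_per_spacer.items():
        if len(sources) == 1:  # seen in exactly one source
            specific[next(iter(sources))].add(spacer)
    return dict(specific)
```

A spacer shared by isolates of two sources is excluded from both, which matches the definition of source-specific spacers given above.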
Data processing. In this study, the terms "source" and "host" indicate where E. coli isolates were obtained and where the protospacers originated, respectively. The protospacers were grouped as phage, plasmid, or unknown in origin. The protospacers identified as having multiple sources were also treated as "unknown" in the analysis of spacer profiles. The network of spacer sources (humans and animals) and hosts was visualized using the Gephi software (53). The spacer profiles were manually visualized according to the host of the protospacers in Microsoft Excel.
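As an illustration of the grouping and network steps, the sketch below classifies each protospacer by host type and writes a weighted edge table that Gephi's spreadsheet importer can read. It is an assumed reconstruction rather than the procedure actually used: the input layout, the function names, and the rule that a spacer with anything other than exactly one matched host counts as "unknown" are hypothetical simplifications.

```python
import csv
import io
from collections import Counter


def host_category(hosts):
    """Classify a protospacer from its matched hosts.

    `hosts` is a hypothetical list of (kind, name) tuples. Spacers with no
    match or with multiple matches are treated as "unknown" (assumption).
    """
    if len(hosts) != 1:
        return "unknown"
    kind, _name = hosts[0]
    return kind if kind in ("phage", "plasmid") else "unknown"


def edge_list(records):
    """Count (isolate source, host name) pairs for host-identified spacers."""
    weights = Counter()
    for source, hosts in records:
        if host_category(hosts) != "unknown":
            weights[(source, hosts[0][1])] += 1
    return weights


def to_gephi_csv(weights):
    """Serialize the edge weights as a Source,Target,Weight CSV table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, host), weight in sorted(weights.items()):
        writer.writerow([source, host, weight])
    return buf.getvalue()
```

Note that the "Source" column here is Gephi's name for the start node of an edge; in this table it happens to hold the isolate source (animal or human), and "Target" holds the protospacer host.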
Data availability. The sequences of CRISPR locus 2.1 in Escherichia coli isolates obtained from feces of animals and humans have been arranged by repeat and spacer sequences and can be found in Data Set S1 in the supplemental material.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.1 MB.