Protospacer-Adjacent Motif Specificity during Clostridioides difficile Type I-B CRISPR-Cas Interference and Adaptation

ABSTRACT CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems provide prokaryotes with efficient protection against foreign nucleic acid invaders. We have recently demonstrated the defensive interference function of a CRISPR-Cas system from Clostridioides (Clostridium) difficile, a major human enteropathogen, and showed that it could be harnessed for efficient genome editing in this bacterium. However, molecular details of CRISPR-Cas adaptation, as well as the sequence requirements for both interference and new spacer acquisition, remain unknown in this pathogen. Despite accumulating knowledge on individual CRISPR-Cas systems in various prokaryotes, no data are available on the adaptation process in bacterial type I-B CRISPR-Cas systems. Here, we report the first experimental evidence that the C. difficile type I-B CRISPR-Cas system acquires new spacers upon overexpression of its adaptation module. The majority of new spacers are derived either from a plasmid expressing the Cas proteins required for adaptation or from regions of the C. difficile genome where free DNA termini are expected to be generated. Results from protospacer-adjacent motif (PAM) library experiments and plasmid conjugation efficiency assays indicate that C. difficile CRISPR-Cas requires the YCN consensus PAM for efficient interference. We revealed a functional link between the adaptation and interference machineries, since newly adapted spacers are derived from sequences associated with a CCN PAM, which fits the interference consensus. The definition of functional PAMs and the establishment of relative activity levels for each of the multiple C. difficile CRISPR arrays in the present study are necessary for further CRISPR-based biotechnological and medical applications involving this organism.

IMPORTANCE CRISPR-Cas systems provide prokaryotes with adaptive immunity for defense against foreign nucleic acid invaders, such as viruses (phages) and plasmids. CRISPR-Cas systems are highly diverse, and detailed studies of individual CRISPR-Cas subtypes are important for understanding various aspects of microbial adaptation strategies and for potential applications. The significance of our work lies in providing the first experimental evidence for type I-B CRISPR-Cas system adaptation in the emerging human enteropathogen Clostridioides difficile. This bacterium needs to survive in phage-rich gut communities, and its active CRISPR-Cas system might provide efficient antiphage defense by acquiring new spacers that constitute a memory for subsequent invader elimination. Our study also reveals a functional link between the adaptation and interference CRISPR machineries. The definition of all possible functional trinucleotide motifs upstream of protospacers within foreign nucleic acid sequences is important for CRISPR-based genome editing in this pathogen and for developing new drugs against C. difficile infections.
has been studied, the specificity of the adaptation and interference machineries for PAM is overlapping but not fully identical (32–35).
CRISPR-Cas systems are highly diverse and are classified according to their cas operon architecture into two classes, which are further subdivided into six types and 33 subtypes (26, 35, 36). Class 1 systems, which include types I, III, and IV, are characterized by effector complexes composed of multiple Cas proteins. Class 2 systems, which include types II, V, and VI, possess a single multidomain effector Cas protein such as Cas9. C. difficile possesses an interference-proficient type I-B CRISPR-Cas system (37–40) characterized by an unusually large set of actively expressed arrays. Genome sequencing and transcriptome sequencing (RNA-seq) analysis of the reference 630 strain and the hypervirulent R20291 strain identified 12 and 9 CRISPR arrays, respectively, from which crRNAs are produced (38). Analysis of 217 C. difficile genomes revealed, on average, 8.5 CRISPR arrays per genome, some of them located in prophages (38–40). Another specific feature is the presence of two or even three (in ribotype 027 strains) type I-B cas operons in most sequenced C. difficile strains (38). We recently showed that the C. difficile 630Δerm CRISPR-Cas system is capable of interference (37, 38). These studies demonstrated that individual crRNAs corresponding to different CRISPR arrays are expressed at very different levels, raising the question of the differential contributions of various CRISPR arrays to defense (37, 38). We also predicted the 5′ trinucleotide motifs CCA and CCT as PAMs and experimentally confirmed them for C. difficile 630Δerm (38). Based on this knowledge, we have recently developed a new method for genome editing in C. difficile using its native CRISPR-Cas system (41). However, a global view of PAM efficiency for CRISPR interference and adaptation is still missing in C. difficile. Despite intensive studies devoted to various CRISPR-Cas systems, many questions remain unanswered regarding the functional features of individual CRISPR-Cas subtypes.
In particular, no data are available on the adaptation process in bacterial type I-B CRISPR-Cas systems. Uncovering the molecular characteristics of the CRISPR-Cas system in C. difficile is of particular importance for better understanding its survival in phage-rich gut communities and for harnessing this efficient system for new antibacterial and genome editing applications. In the present work, we provide the first experimental evidence for spacer acquisition in C. difficile and explore the PAM requirements of the C. difficile CRISPR interference and adaptation machineries.

RESULTS
Experimental determination of YCN as a PAM for C. difficile CRISPR-Cas system interference. To identify the PAM consensus for CRISPR interference in C. difficile 630Δerm and R20291 strains, we performed conjugation depletion assays with plasmid PAM libraries (Fig. 1A). To generate PAM sequence diversity, four randomized nucleotides were inserted upstream of a target protospacer sequence in the pRPF185Δgus plasmid. Plasmids harboring the PAM library were conjugated into C. difficile, and pooled transconjugants were subjected to high-throughput sequencing. Comparing the composition of the input PAM library with the sequences found in transconjugants allows functional PAMs to be identified, since plasmids harboring functional PAM sequences are cleared by CRISPR interference. For construction of plasmid PAM libraries, we selected protospacers corresponding to the first spacers of the CRISPR 3 (identical to CRISPR 16) array of strain 630Δerm and the CRISPR 13 array of strain R20291. These arrays are actively expressed, and crRNAs derived from strain 630Δerm CRISPR 3/16 are capable of interference (38). After transformation of the plasmid libraries into Escherichia coli cells, ~8,000 clones for the 630Δerm plasmid library and ~9,500 clones for the R20291 library were obtained, enough to provide >10-fold coverage of the 4-nucleotide library. Plasmids from pooled transformants were used to prepare input libraries (referred to as "PAM libraries before conjugation" in Fig. 1A). After conjugation, ~4,000 and ~2,000 transconjugants were obtained for the 630Δerm and R20291 strains, respectively. An additional subculturing step in brain heart infusion (BHI) liquid medium supplemented with antibiotics was used to eliminate remaining E. coli cells. Cells from the resulting liquid cultures were collected for DNA extraction and PCR amplification to prepare output libraries (referred to as "PAM libraries after conjugation" in Fig. 1A).
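The >10-fold coverage figure follows from simple arithmetic: four randomized positions give 4⁴ = 256 possible PAM variants. The Python sketch below computes the mean coverage. It also estimates how many variants would be absent from the library, using a Poisson approximation that assumes every variant is equally likely to be cloned. That assumption is an idealization, since real libraries carry synthesis and cloning biases. The function name `library_coverage` is illustrative only.

```python
import math
from typing import Tuple


def library_coverage(n_clones: int, n_random_positions: int = 4) -> Tuple[int, float, float]:
    """Return (library size, mean fold coverage, expected fraction of variants missing).

    Assumes each clone carries one variant drawn uniformly at random, so the
    number of clones per variant is approximately Poisson-distributed.
    """
    library_size = 4 ** n_random_positions      # 4^4 = 256 NNNN variants
    coverage = n_clones / library_size           # mean clones per variant
    missing_fraction = math.exp(-coverage)       # Poisson P(variant has 0 clones)
    return library_size, coverage, missing_fraction


for name, clones in [("630Δerm library", 8_000), ("R20291 library", 9_500)]:
    size, cov, missing = library_coverage(clones)
    print(f"{name}: {size} variants, {cov:.1f}-fold coverage, "
          f"expected missing variants: {missing * size:.1e}")
```

With ~8,000 and ~9,500 clones, this gives roughly 31-fold and 37-fold mean coverage. Under the uniformity assumption, the expected number of missing variants is far below one in both cases.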
For each strain, input and output libraries were subjected to high-throughput sequencing. Recovered sequences were compared using Pearson's chi-square test to reveal PAM sequences significantly depleted after conjugation (P < 10⁻¹²). This analysis suggested that the −4 position of the PAM is not relevant for interference by the C. difficile CRISPR-Cas system (see Fig. S1 in the supplemental material). The YCN PAM consensus agrees with the CCW PAM bioinformatically predicted and experimentally validated in the C. difficile 630 strain (38). To validate the functionality of PAMs identified by the PAM library depletion analysis, we constructed plasmids containing the protospacers used for the plasmid PAM libraries, flanked on the 5′ end by individual YCN PAMs. This set of PAM-protospacer plasmids was conjugated into C. difficile cells, and conjugation efficiency was determined. An empty pRPF185Δgus vector was used as a control (Fig. 2A). Plasmids carrying CCN PAMs gave no transconjugants in either strain (Fig. 2B), confirming that the CCN PAM is functional for interference in C. difficile. In control experiments, plasmids carrying either a GAG trinucleotide or AAT (the sequence at the 3′ end of a CRISPR repeat) in the PAM position conjugated as efficiently as the pRPF185Δgus vector (Fig. 2B). As expected (42), a mutation at the first position of the protospacer reduced interference against a plasmid carrying the CCA PAM in R20291.
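The depletion analysis can be sketched as a per-PAM Pearson chi-square test. The text does not state how the contingency table was built, so the Python sketch below assumes one common choice: for each PAM, a 2×2 table of reads carrying that PAM versus all other reads, before versus after conjugation, tested without continuity correction. For one degree of freedom, the upper-tail P value equals erfc(√(χ²/2)), which avoids a SciPy dependency. The 10⁻¹² threshold is taken from the text. The read counts in the example are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict


def chi_square_depletion(before: Counter, after: Counter, alpha: float = 1e-12) -> Dict[str, float]:
    """Return {PAM: P value} for PAMs significantly depleted after conjugation.

    For each PAM, a 2x2 table (reads with this PAM vs. all other reads,
    before vs. after conjugation) is tested with Pearson's chi-square
    (1 degree of freedom, no continuity correction).
    """
    n_before, n_after = sum(before.values()), sum(after.values())
    total = n_before + n_after
    depleted = {}
    for pam, a in before.items():
        b = after.get(pam, 0)                    # reads with this PAM after conjugation
        c, d = n_before - a, n_after - b         # reads with any other PAM
        observed = (a, b, c, d)
        with_pam, without_pam = a + b, c + d     # row totals
        expected = (with_pam * n_before / total, with_pam * n_after / total,
                    without_pam * n_before / total, without_pam * n_after / total)
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
        p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
        # Keep only PAMs whose share of reads fell (depleted), not those that rose.
        if p_value < alpha and b / n_after < a / n_before:
            depleted[pam] = p_value
    return depleted


# Invented counts: two CCA-containing PAMs collapse after conjugation.
before = Counter({"ACCA": 1000, "GCCA": 1000, "AGAG": 1000, "TAAT": 1000})
after = Counter({"ACCA": 5, "GCCA": 8, "AGAG": 1500, "TAAT": 1490})
print(sorted(chi_square_depletion(before, after)))
```

The direction check matters. A significant chi-square alone would also flag PAMs that become *enriched* after conjugation, as AGAG and TAAT do here.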
In contrast to the results obtained with CCN PAMs, conjugation was detectable with plasmids carrying TCN PAMs. Compared to that of the control, conjugation efficiency decreased 50-fold and 500-fold for the TCA and TCG motifs, respectively. The TCC and TCT PAMs were the least effective (Fig. 2C). These results indicate that TCN sequences are generally less functional than CCN sequences as interference PAMs.
Varied protection efficiencies of different C. difficile 630Δerm CRISPR arrays. A specific feature of the C. difficile CRISPR-Cas system is the presence of multiple actively expressed CRISPR arrays. The functionality of all 12 CRISPR arrays in the C. difficile 630Δerm strain was investigated using plasmids containing protospacers that correspond to selected spacers from each C. difficile 630Δerm CRISPR array, flanked on the 5′ end by a functional CCA PAM. The cotranscribed CRISPR 3-4 and CRISPR 16-15 arrays, located in the homologous prophages phiCD630-1 and phiCD630-2, are identical to each other and are thus indistinguishable by this experimental strategy (37). Plasmids carrying protospacers corresponding to spacers of CRISPR 3/16, CRISPR 4/15, and CRISPR 8 gave no transconjugants (Fig. 3A and Fig. S2A). Strongly reduced conjugation efficiency was observed with plasmids carrying protospacers corresponding to spacers from the CRISPR 9, CRISPR 12, and CRISPR 17 arrays. Surprisingly, the conjugation efficiency for plasmids carrying protospacers corresponding to spacers of CRISPR 6 and CRISPR 7 was close to that of the control vector (Fig. 3A). Thus, not all C. difficile 630Δerm CRISPR arrays are equally functional for interference.
We wondered whether these differences in interference correlate with the relative expression levels of the corresponding crRNAs. The arrays most active for interference, CRISPR 3/16, CRISPR 4/15, CRISPR 8, and CRISPR 9 (Fig. 3A), are also the most highly expressed in the 630Δerm strain (37, 38). Within a given CRISPR array, the most abundant sequence reads mapped to leader-proximal regions, i.e., closer to the promoters from which arrays are transcribed (37, 38). The decreasing gradient in crRNA amounts observed for the first, third, and sixth spacers of the CRISPR 12 array is consistent with the interference levels provided by these crRNAs (Fig. S2). This supports the notion that the abundance of crRNAs produced from different arrays, or from within the same array, affects the level of interference.
We next compared the interference levels provided against protospacer plasmids (Fig. 3A) with the expression levels of each C. difficile 630Δerm CRISPR array, as measured by reverse transcription-quantitative PCR (qRT-PCR) (Fig. 3B). As expected, the arrays most active for interference (CRISPR 3/16 and CRISPR 4/15) were the most highly expressed (Fig. 3A to C). However, the CRISPR 6 and CRISPR 7 arrays, which provided low protection against plasmid conjugation, were expressed at levels similar to those of other active arrays, i.e., CRISPR 8 and CRISPR 9. These results suggest that, besides CRISPR array expression levels, additional mechanisms control the activity of CRISPR arrays in C. difficile, potentially related to the sequences of individual crRNAs.
Active spacer acquisition by the C. difficile CRISPR-Cas I-B system. We next investigated the ability of C. difficile CRISPR-Cas to take up new spacers. All attempts to detect expansion of CRISPR arrays in C. difficile cells grown under laboratory conditions were unsuccessful. We hypothesized that endogenous expression levels of Cas proteins could be insufficient for adaptation. Therefore, we constructed two plasmids carrying cas genes from the adaptation module under the control of the inducible Ptet promoter (see Table S1). The first plasmid (pCas1-2) carried the cas1 and cas2 genes encoding the universal CRISPR adaptation proteins, whereas the second plasmid (pCas1-2-4) carried the cas1, cas2, and cas4 genes, thus encoding the complete adaptation module. We next tested whether C. difficile 630Δerm cells carrying these plasmids could take up new spacers. Cells carrying an empty vector were used as a control. Transconjugants were cultivated in medium supplemented with an inducer of cas gene expression. No growth of the strain carrying pCas1-2 was observed, presumably due to a toxic effect of Cas1 and Cas2 overexpression. In contrast, cells carrying pCas1-2-4 grew well after induction. We tested each of the 12 C. difficile 630Δerm CRISPR arrays for signs of spacer acquisition by PCR, with one primer annealing to the leader and another annealing to a preexisting spacer within the array (Fig. 4A). PCR products corresponding to expanded arrays with one additional spacer were observed only for CRISPR 8 and CRISPR 9 in cells overexpressing the adaptation module, and not in control cells (Fig. 4B).
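Each acquisition event inserts one new spacer together with one new copy of the repeat, so an expanded array gives a PCR product that is longer than the native one by a fixed increment. The band sizes in the Fig. 4 legend (89 bp versus 155 bp, or 155 bp versus 221 bp for CRISPR 10) put that increment at 66 bp. The hypothetical helper below converts an observed band size into an estimated number of acquired units. The ±5-bp tolerance is an arbitrary assumption. Because new spacers vary in length (34 to 40 bp), the 66-bp step should be read as approximate.

```python
from typing import Optional


def acquired_units(amplicon_bp: int, native_bp: int,
                   unit_bp: int = 66, tol_bp: int = 5) -> Optional[int]:
    """Estimate how many spacer-repeat units were added to a CRISPR array.

    unit_bp defaults to the 66-bp step between native and expanded bands in
    the Fig. 4 legend (89 -> 155 bp; 155 -> 221 bp for CRISPR 10).
    Returns None if the band is not within tol_bp of a whole number of units.
    """
    extra = amplicon_bp - native_bp
    n_units = round(extra / unit_bp)
    if n_units < 0 or abs(extra - n_units * unit_bp) > tol_bp:
        return None
    return n_units


print(acquired_units(155, 89))    # expanded band of a typical array: 1
print(acquired_units(221, 155))   # expanded band of CRISPR 10: 1
```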
High-throughput analysis of newly acquired spacers. DNA products corresponding to expanded CRISPR arrays were subjected to high-throughput sequencing to identify the sources of new spacers. Overall, 299,674 unique newly acquired spacers were extracted and mapped to three spacer sources: the chromosome, the pCas1-2-4 plasmid, and the endogenous pCD630 plasmid present in the C. difficile 630Δerm strain (43) (Table S4). We analyzed the lengths of acquired spacers, the distribution of the corresponding protospacers along the chromosome and plasmids, and the associated PAM sequences. Almost all newly acquired spacers, irrespective of their source, were 34 to 40 bp long (36 to 37 bp on average). This size distribution matches that of spacers preexisting in C. difficile 630Δerm arrays (see Fig. S3). Most spacers (98% of uniquely mapped spacers) were derived from the pCas1-2-4 plasmid (Table S4; Fig. S4). The majority of pCas1-2-4-derived spacers (96%) originated from sequences associated with the CCN PAM (Fig. 5). Spacers mapped to pCas1-2-4 nonuniformly (Fig. 6A). The most frequent spacers were acquired from three regions: the traJ gene, encoding a regulator of conjugative gene expression; the oriT region for plasmid transfer; and the ori region for plasmid replication (Fig. 6A). The bias in protospacer distribution along the pCas1-2-4 plasmid could be explained, at least partially, by the nonuniform density of the CC dinucleotide (Fig. 6B), since the number of protospacers selected from different pCas1-2-4 regions correlated with the number of CC dinucleotides in these regions (R = 0.72).
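The reported correlation (R = 0.72) is a Pearson coefficient computed across plasmid regions, pairing protospacer counts with CC dinucleotide counts. The text does not give the window size or the strand handling, so the Python sketch below makes two labeled assumptions:

- Regions are fixed-size consecutive windows.
- CC on the bottom strand (GG on the top strand) is counted along with CC on the top strand, since CCN PAMs can lie on either strand.

Dinucleotides spanning a window boundary are not counted. The values in the usage example are toy numbers, not data from this study.

```python
import math
from typing import List, Sequence


def cc_counts_per_window(seq: str, window: int) -> List[int]:
    """Count CC dinucleotides per consecutive window, on both strands.

    A CC on the bottom strand reads as GG on the top strand, so both are
    counted. Overlapping matches count separately (CCC contains two CCs);
    dinucleotides spanning a window boundary are ignored.
    """
    seq = seq.upper()
    counts = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        counts.append(sum(1 for i in range(len(chunk) - 1)
                          if chunk[i:i + 2] in ("CC", "GG")))
    return counts


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy values: CC counts and protospacer counts for four plasmid windows.
cc = [12, 30, 7, 22]
protospacers = [140, 410, 60, 250]
print(round(pearson_r(cc, protospacers), 2))
```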
Among endogenous DNA sources, 0.07% of uniquely mapped spacers originated from the 7,881-bp endogenous pCD630 plasmid (Table S4; Fig. S4C). The CCA motif was found most frequently upstream of pCD630-derived protospacers. This suggests that, as with pCas1-2-4, interference-proficient spacers can be selected from pCD630, preserved in the cell population, and presumably lead to plasmid loss (Fig. 5). The adaptation enrichment regions of pCD630 were located near the p70 gene, encoding a putative enzyme of the helicase family, and close to the p80 gene, encoding a hypothetical protein.

FIG 4 Adaptation experiment in C. difficile 630Δerm. (A) Experimental workflow of naive adaptation assays in C. difficile 630Δerm using pCas1-2-4, which overexpresses the Cas1, Cas2, and Cas4 proteins. Transconjugants were cultivated in BHI plus Tm plus ATc medium and reseeded twice, and two rounds of PCR amplification ("1st PCR" and "2nd PCR") were performed to detect extended CRISPR arrays. PCR amplification was performed using a primer pair for each CRISPR array of C. difficile 630Δerm. Forward primers annealed to the leader regions of arrays, and reverse primers annealed to the first spacer (or the second spacer, for the CRISPR 10 array) of a native array. Tm, thiamphenicol; ATc, anhydrotetracycline. (B) PCR analysis of naive adaptation in C. difficile 630Δerm. First and second PCR results after the second (II) reseeding step are presented. Numbers below PCR bands denote C. difficile 630Δerm CRISPR arrays (CRISPR 3-4, CRISPR 6, etc.). 89-bp PCR bands correspond to native arrays (*155 bp for the CRISPR 10 array); 155-bp PCR bands correspond to arrays with one acquired spacer (*221 bp for the CRISPR 10 array). Sequences of the CRISPR 3-4 and CRISPR 16-15 arrays are identical; therefore, they are presented in the same lanes. PCR bands corresponding to new spacer acquisition are marked with arrows. Lane m, molecular mass markers.
Only a small fraction of uniquely mapped spacers (1.69%) was acquired from the 4,290,252-bp bacterial genome (Table S4). In contrast to pCas1-2-4 and pCD630 protospacers, no overrepresented trinucleotide motif was found upstream of protospacers derived from the chromosome (less than 10% were associated with CCN or TCN PAMs). The differences in PAM specificity between genome- and plasmid-derived protospacers must be caused by the lethality of spacers derived from genomic protospacers with interference-proficient PAMs (Fig. 5). In other words, the genome-derived spacers that we observed must be aberrant and correspond to rare events of selection of nonfunctional spacers. We compared the distributions of functional and nonfunctional protospacers along the genome (see Fig. S5). This comparison suggests that nonfunctional spacers are selected predominantly from regions that could also serve as a source of functional spacers, but that these functional PAM-associated spacers are depleted because of autoimmunity. Chromosomal regions that were preferentially used as spacer donors for both the CRISPR 8 and CRISPR 9 arrays included the terC replication termination site and loci carrying Tn1549-like transposon genes (Fig. 7). This result is consistent with preferential spacer acquisition from regions prone to the generation of free DNA termini, as also reported for other CRISPR-Cas systems (44–46).
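Assigning a PAM to each new spacer amounts to two steps: locate its protospacer, then read the three nucleotides immediately 5′ of it on the same strand. The Python sketch below makes three simplifying assumptions:

- Only exact matches are considered.
- The source sequence is treated as linear, although real replicons are circular.
- Unmapped spacers are skipped, whereas a production pipeline would also tolerate sequencing errors.

Searching the reverse complement string handles protospacers on the bottom strand. All function names are illustrative, not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pam_for_spacer(source: str, spacer: str, pam_len: int = 3) -> Optional[str]:
    """Return the pam_len bases immediately 5' of the spacer's first exact match.

    The top strand is searched first, then the bottom strand (as its reverse
    complement), so the PAM is always read 5'->3' on the protospacer strand.
    Returns None if there is no match or the match starts too close to the end.
    """
    spacer = spacer.upper()
    for strand in (source.upper(), revcomp(source)):
        idx = strand.find(spacer)
        if idx != -1:
            return strand[idx - pam_len:idx] if idx >= pam_len else None
    return None


def classify_pam(pam: str) -> str:
    """Bin a trinucleotide as CCN, TCN, or other."""
    if pam.startswith("CC"):
        return "CCN"
    if pam.startswith("TC"):
        return "TCN"
    return "other"


def pam_class_fractions(source: str, spacers: Iterable[str]) -> Dict[str, float]:
    """Fraction of mapped spacers whose protospacer carries each PAM class."""
    pams = [pam_for_spacer(source, s) for s in spacers]
    counts = Counter(classify_pam(p) for p in pams if p is not None)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}
```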

DISCUSSION
CRISPR-Cas systems provide prokaryotes with adaptive immunity by recognizing and specifically eliminating invaders, such as viruses and plasmids. CRISPR-Cas systems are highly diverse, and detailed studies of individual CRISPR-Cas subtypes continue to reveal features important for understanding various aspects of microbial physiology, as well as for potential biotechnological and medical applications. In this study, we provide the first experimental evidence for type I-B CRISPR-Cas system adaptation in an important human pathogen, C. difficile. We also reveal a functional link between the adaptation and interference machineries by demonstrating preferential selection of newly acquired spacers from protospacers associated with interference-proficient PAMs.
PAM library experiments allowed us to determine a general PAM consensus sequence (YCN) for the C. difficile CRISPR-Cas system in both the laboratory 630Δerm strain and the hypervirulent R20291 strain. These results are in accordance with the CCW PAM identified in silico by aligning existing spacers with matching protospacers, and with experimental data on plasmid conjugation efficiency in C. difficile 630Δerm (38). Our global approach allowed us to add TCN sequences to the list of functional C. difficile PAMs, although such sequences support lower levels of interference than CCN sequences. Multiple PAM sequences are also recognized by type I-B systems from other organisms (47). For example, in the type I-B CRISPR-Cas system of Haloferax volcanii, seven trinucleotide 5′-PAM motifs (TAA, TAG, TAT, TAC, TTC, ACT, and CAC) were shown to support efficient interference (48–50). The recognition of multiple PAMs has been suggested to be an advantageous strategy for coping with the diversity and mutational evasion of viral invaders.
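The consensus sequences YCN and CCW use standard IUPAC nucleotide codes: Y = C or T, W = A or T, N = any base. Expanding them explicitly, as in the short Python sketch below, shows that YCN covers eight trinucleotides and that both CCW motifs (CCA and CCT) are among them. This is why the two results are consistent. The helper name `expand_consensus` is illustrative.

```python
from itertools import product
from typing import Set

# IUPAC nucleotide codes (standard NC-IUB nomenclature).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_consensus(consensus: str) -> Set[str]:
    """All concrete sequences matched by an IUPAC consensus string."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in consensus.upper()))}


ycn = expand_consensus("YCN")
ccw = expand_consensus("CCW")
print(sorted(ycn))  # ['CCA', 'CCC', 'CCG', 'CCT', 'TCA', 'TCC', 'TCG', 'TCT']
print(ccw <= ycn)   # True: every CCW motif is also a YCN motif
```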
The PAM requirements and the proteins involved in motif recognition differ between interference and adaptation (47). In the H. volcanii type I-B CRISPR-Cas system, only the TAC PAM was associated with spacer acquisition, suggesting a PAM recognition mechanism that is stricter for adaptation than for interference (45). The PAM requirements for spacer acquisition have remained unexplored in bacterial type I-B CRISPR-Cas systems.
In general, the spacer acquisition process has been extensively studied in various CRISPR-Cas systems, including recent reports on naive, self-targeting-induced, and primed adaptation in archaeal type I-B CRISPR-Cas systems (45, 46, 51–53). However, no data on spacer acquisition by bacterial type I-B CRISPR-Cas systems have been reported so far. In the present study, no spacer acquisition in C. difficile was detected at native expression levels of the Cas adaptation proteins. Overexpression of Cas1, Cas2, and Cas4 from a plasmid was necessary to observe CRISPR array expansion. Cas1 and Cas2 are universal adaptation proteins. Cas4 is part of the adaptation complex in the CRISPR-Cas systems that encode it (30). This protein participates in the selection and processing of prespacers, defines the correct PAM, and ensures the correct orientation of new spacers during their integration into the CRISPR array (27–31). Recent studies in Pyrococcus furiosus and H. volcanii showed that overexpression of Cas1, Cas2, and Cas4 elevates basal adaptation levels (45, 46). However, overexpression of the adaptation proteins resulted in PAM-independent acquisition in H. volcanii (45), which is clearly distinct from our observations in C. difficile.
The majority of new spacers in C. difficile 630Δerm are acquired from protospacers containing CCN PAMs, which are the most efficient for interference. In the H. volcanii type I-B CRISPR-Cas system, only the TAC PAM was associated with acquired spacers (45). This suggests that in H. volcanii, as in C. difficile, PAM recognition requirements are stricter during spacer adaptation than during interference. In the type I-B CRISPR-Cas system of the hyperthermophilic archaeon P. furiosus, a CCN motif critical for both invader targeting and spacer DNA uptake was found upstream of the protospacers from which newly acquired spacers were derived (46).
Systematic monitoring of the conjugation efficiency of plasmids targeted by crRNAs from each of the 12 C. difficile 630Δerm CRISPR arrays demonstrated that almost all CRISPR arrays are active for interference. However, defense levels differed among individual CRISPR arrays, with the most protective arrays generally being the most highly expressed (38). New spacer acquisition was observed only for the CRISPR 8 and CRISPR 9 arrays, which are among the most active for interference. The leader sequence has been shown to be important for adaptation (21–23). Alignment of the leader sequences of C. difficile 630Δerm CRISPR arrays revealed two conserved inverted repeat motifs (see Fig. S6 in the supplemental material). These motifs were present upstream of every array, and we did not find any distinguishing features in the CRISPR 8 and CRISPR 9 leaders that would explain their adaptation proficiency (Fig. S6). It remains to be explored whether specific conditions could induce the expression of other CRISPR arrays and their activity in interference and spacer acquisition.
The sources of spacer DNA differ greatly among CRISPR-Cas systems. For the type I-E system in E. coli, new spacers have been acquired mainly from plasmids rather than from chromosomal DNA (44). In contrast, in P. furiosus, the majority of spacers were acquired from chromosomal DNA and not from the introduced plasmids (46). In the present study, in C. difficile, the majority of new spacers were derived from the pCas1-2-4 plasmid used to express the adaptation module genes. The remaining spacers were acquired from the chromosome and from the pCD630 plasmid native to C. difficile 630Δerm. Interestingly, in E. coli, plasmids expressing cas1 and cas2 tend to be preferred substrates for new spacer selection irrespective of plasmid replication origin or copy number, suggesting that some in cis mechanisms may be responsible for the observed bias.
In contrast to plasmid-derived spacers, spacers derived from the C. difficile chromosome were not enriched for functional PAMs. Chromosome-derived spacers were also far fewer than plasmid-derived spacers. Together, these observations suggest that most cells that acquired spacers from chromosomal protospacers with functional PAMs were lost due to autoimmunity. Both the occurrence of PAMs and the generation of free DNA termini have emerged as features important for adaptation. For the E. coli type I-E CRISPR-Cas system, spacer acquisition was shown to occur preferentially around the chromosomal replication terminus and active CRISPR arrays, where the formation of double-strand breaks is expected (44). Similarly, in P. furiosus and H. volcanii, "hot spots" of spacer acquisition were located at sites with transposon or recombination activity, at active CRISPR loci, or in highly transcribed regions (45, 46). The situation appears to be similar in C. difficile, at least based on the distribution of the subset of genome-derived spacers that do not cause autoimmunity.
Our observation of low adaptation efficiency in C. difficile under normal conditions raises the question of which mechanisms control the DNA uptake capacity of its CRISPR-Cas system. We hypothesize that new spacer acquisition is limited to avoid deleterious self-targeting. This is in line with the suggestion that the adaptation machinery should be repressed under standard conditions to prevent accidental or random spacer acquisition from the genome (52). In C. difficile, we have recently shown that the cas operons belong to the general stress response sigma B regulon, suggesting that their expression could be induced under stressful conditions (54, 55). Intriguingly, overexpression of Cas1 and Cas2 alone was highly toxic to C. difficile. A possible reason could be indiscriminate, high-level acquisition of self-targeting spacers. This observation clearly warrants further investigation, which, however, is complicated by the inability to obtain the needed transconjugants.

MATERIALS AND METHODS
Bacterial strains and growth conditions. Bacterial strains used in this study are listed in Table S1 in the supplemental material. C. difficile strains were grown in brain heart infusion (BHI) medium (Difco) at 37°C under anaerobic conditions (5% H2, 5% CO2, and 90% N2) in an anaerobic chamber (Jacomex). When needed, thiamphenicol (Tm) was added to C. difficile cultures at a final concentration of 15 μg/ml. Cefoxitin (Cfx) and D-cycloserine (Cs) were used for counterselection of E. coli donor cells during conjugation into C. difficile. E. coli strains were grown in LB medium (56) supplemented with ampicillin (Amp) (100 μg/ml) and chloramphenicol (Cm) (15 μg/ml) when necessary. The nonantibiotic tetracycline analogue anhydrotetracycline (ATc) was used at 250 ng/ml to induce the Ptet promoter of pRPF185 vector derivatives in C. difficile (57).
Construction of plasmids and conjugation into C. difficile. Plasmids and oligonucleotides used in this work are listed in Table S1 and Table S2, respectively. To construct plasmid PAM libraries, we used the pRPF185Δgus vector. Single-stranded oligonucleotides were synthesized containing three elements: four random nucleotides at the 5′ end; a selected protospacer sequence corresponding to the first spacer of the CRISPR 3 (identical to CRISPR 16) or CRISPR 13 array for the C. difficile 630 and R20291 strains, respectively; and regions overlapping the pRPF185Δgus vector (37). These single-stranded oligonucleotides were then amplified by PCR using short complementary primers to generate double-stranded fragments (Table S2). To generate the PAM libraries, the double-stranded fragments were cloned into the SacI and BamHI sites of pRPF185Δgus using a Gibson assembly reaction (New England BioLabs) (58).
For CRISPR-Cas interference assays, complementary (5′→3′ and 3′→5′) synthetic single-stranded oligonucleotides containing SacI and BamHI restriction sites and different PAM and protospacer sequences were used to construct conjugative plasmid vectors carrying PAM-protospacer sequences. The single-stranded oligonucleotides were annealed to each other, and the resulting double-stranded fragments were ligated into the SacI and BamHI sites of the pRPF185Δgus vector.
To create plasmids overexpressing Cas proteins for naive adaptation assays, the C. difficile 630Δerm cas1-cas2 and cas4-cas1-cas2 gene regions, including ribosome-binding sites, were amplified by PCR. These regions spanned −21 to +1252 relative to the translational start site of the cas2 gene and −37 to +1773 relative to the translational start site of the cas4 gene, respectively. The amplified regions were introduced into the SacI and BamHI sites of pRPF185Δgus under the control of the ATc-inducible Ptet promoter, resulting in the pCas1-2 and pCas1-2-4 plasmids (Table S1).
Conjugation with PAM libraries and high-throughput sequencing. Plasmid PAM libraries were transformed into E. coli NEB 10-beta cells (New England BioLabs). A sufficient number of Cm-resistant colonies (8,000 to 9,000) was collected and used for plasmid DNA extraction. This DNA served as the template for PCR with primers carrying Illumina adapters, giving the control DNA sample for input libraries (named "PAM libraries before the conjugation").
For output library preparation, the plasmid PAM libraries were transformed into E. coli HB101 RP4 cells for subsequent conjugation into C. difficile cells (approximately 4.9 × 10¹⁰ plasmid copies for the 630Δerm library and 2.8 × 10¹⁰ plasmid copies for the R20291 library). A sufficient number of Tm-resistant transconjugants (up to 4,000) was selected. All transconjugants were then transferred to liquid BHI medium supplemented with antibiotics to eliminate remaining E. coli cells. Tm was used to maintain plasmids within C. difficile cells, while Cfx and Cs were used to counterselect E. coli cells, which are sensitive to these antibiotics. Cells from the resulting liquid cultures were collected and used to prepare InstaGene (Bio-Rad) extracts. These extracts served as templates for PCR amplification with primers carrying Illumina adapters, giving the DNA sample for sequencing named "PAM libraries after the conjugation." The "PAM libraries before the conjugation" and "PAM libraries after the conjugation" DNA samples were sequenced on an Illumina NextSeq 500 system with 2-million-read coverage. Sequence reads were aligned to reference sequences using BWA software (59), and all unmapped reads were discarded from the analysis. Randomized PAM regions in the retained reads were extracted using a custom-written Python script (Python version 3.4).
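The extraction script is described only by its language, so the following Python sketch illustrates just one way to pull the randomized 4-nt region out of reads. It is not the authors' script. It assumes that reads are already oriented so the insert reads 5′→3′, and it anchors on flanking sequences. `UPSTREAM_FLANK` and `PROTOSPACER_START` are hypothetical placeholders that would have to be replaced with the actual vector and protospacer sequences. Requiring both flanks to match exactly also discards reads with sequencing errors in the flanks.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical placeholders: replace with the real vector sequence just 5' of
# the NNNN insert and the first bases of the library protospacer.
UPSTREAM_FLANK = "AAGGCTTG"
PROTOSPACER_START = "GATTACAGAT"


def make_pam_pattern(upstream: str, downstream: str, n_random: int = 4) -> re.Pattern:
    """Regex capturing exactly n_random bases between the two flanks."""
    return re.compile(re.escape(upstream) + f"([ACGT]{{{n_random}}})" + re.escape(downstream))


def count_pams(reads: Iterable[str], pattern: Optional[re.Pattern] = None) -> Counter:
    """Tally randomized PAMs; reads without both intact flanks are skipped."""
    if pattern is None:
        pattern = make_pam_pattern(UPSTREAM_FLANK, PROTOSPACER_START)
    counts = Counter()
    for read in reads:
        match = pattern.search(read.upper())
        if match:
            counts[match.group(1)] += 1
    return counts
```

The resulting per-library counters are the input that a before-versus-after depletion test would compare.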
The counts of each PAM were compared between the two libraries (Table S3A and B). Significantly depleted PAM sequences were identified using Pearson's chi-square test. Because P values adjusted with standard multiple-testing corrections still classified all possible PAM variants as depleted, we used a P value threshold of 10^−12 to select the most strongly depleted PAMs. The depleted sequences were assembled into a separate data set, in which the count for each PAM was normalized to that of the lowest depleted PAM. The consensus of the resulting sequence subsets was then visualized using the WebLogo tool (60). For additional PAM sequence visualization, PAM wheels were constructed according to Leenay et al. using KronaExcelTemplate (https://github.com/marbl/Krona/releases) (61). For each individual PAM sequence, a depletion score was calculated as the ratio of the normalized read count in the output PAM library to the normalized read count in the control. Where a PAM was enriched in the "after the conjugation" library, its depletion score was set to zero. The depletion scores were then used as the input for the Krona plot (61).
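The filtering and scoring steps can be sketched as follows. Each PAM is tested with a 2x2 Pearson chi-square table (this PAM versus all other PAMs, before versus after conjugation); this table layout is an assumption, since the exact contingency design is not stated. The depletion score follows the definition given above literally, and the normalization to the lowest depleted PAM used for WebLogo is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p-value) for 1 degree of freedom, using the identity
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


def depleted_pams(before: Counter, after: Counter, alpha: float = 1e-12) -> Dict[str, float]:
    """PAMs under-represented after conjugation with p < alpha.

    For each PAM the table is (this PAM, all other PAMs) x (before, after).
    """
    n_before, n_after = sum(before.values()), sum(after.values())
    hits = {}
    for pam, a in before.items():
        if a == 0:
            continue
        c = after.get(pam, 0)
        _, p = chi2_2x2(a, n_before - a, c, n_after - c)
        # Keep only PAMs whose relative frequency dropped after conjugation.
        if c / n_after < a / n_before and p < alpha:
            hits[pam] = p
    return hits


def depletion_scores(before: Counter, after: Counter) -> Dict[str, float]:
    """Normalized output count / normalized control count, set to 0 for
    PAMs that are enriched after conjugation, as described in the text."""
    n_before, n_after = sum(before.values()), sum(after.values())
    scores = {}
    for pam, a in before.items():
        if a == 0:
            continue
        ratio = (after.get(pam, 0) / n_after) / (a / n_before)
        scores[pam] = 0.0 if ratio > 1 else ratio
    return scores
```

With millions of reads, even small frequency differences give tiny P values, which is consistent with the observation above that corrected P values flagged every PAM and a stringent fixed cutoff was needed.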
Plasmid conjugation efficiency assays. To evaluate conjugation efficiency, conjugative plasmids carrying a PAM-protospacer sequence were transformed into the E. coli HB101 (RP4) strain and transferred to the C. difficile 630Δerm or C. difficile R20291 strain by conjugation. Conjugation efficiency was estimated as the ratio of C. difficile transconjugants to the total number of CFUs: conjugation mixtures were plated on BHI agar supplemented with Tm, Cs, and Cfx to count transconjugants, and these counts were compared with the number of CFUs obtained after plating serial dilutions on BHI agar containing Cfx only.
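As an arithmetic illustration, the sketch below converts plate counts to CFU per milliliter and takes their ratio. The colony numbers, dilution factors, and plated volume are hypothetical, since the exact plating scheme is not given here.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_volume_ml: float) -> float:
    """Convert a colony count to CFU/mL.

    dilution is the fraction of the original mixture on the plate
    (for example 1e-4 for a 10^-4 dilution; 1.0 for undiluted).
    """
    return colonies / (dilution * plated_volume_ml)


def conjugation_efficiency(transconjugants_cfu_ml: float, total_cfu_ml: float) -> float:
    """Transconjugants (Tm + Cs + Cfx plates) per total CFU (Cfx-only plates)."""
    if total_cfu_ml <= 0:
        raise ValueError("total CFU must be positive")
    return transconjugants_cfu_ml / total_cfu_ml


# Hypothetical counts: 20 colonies from 0.1 mL undiluted on selective plates,
# 50 colonies from 0.1 mL of a 10^-4 dilution on Cfx-only plates.
transconjugants = cfu_per_ml(20, 1.0, 0.1)   # about 200 CFU/mL
total = cfu_per_ml(50, 1e-4, 0.1)            # about 5 x 10^6 CFU/mL
print(f"{conjugation_efficiency(transconjugants, total):.1e}")
```

A plasmid whose PAM-protospacer is targeted by CRISPR interference would be expected to give a markedly lower ratio than one that is not.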
CRISPR adaptation assay and high-throughput sequencing of newly acquired spacers. After overnight growth in BHI medium supplemented with Tm and ATc, pCas1-2-4-containing cells were transferred twice into BHI medium supplemented with ATc but without Tm (reseedings I and II) (Fig. 4A). These additional steps were necessary to enrich the culture for cells that had acquired new spacers. After each reseeding, two rounds of PCR were performed to detect spacer acquisition. A specific primer set was used for each array: forward primers annealed to the leader region of the CRISPR array, and reverse primers annealed to the first spacer of the native array (or to the second spacer, in the case of the CRISPR 10 array) (Fig. 4A). Primers are listed in Table S2. PCR products corresponding to expanded CRISPR arrays were gel extracted and used as templates for nested PCR with primers containing Illumina adapters for high-throughput sequencing and bioinformatic analysis. The amplicons were sequenced using the Illumina NextSeq 500 system with 2-million-read coverage. Sequence reads were analyzed in R using the ShortRead and Biostrings packages (62) as described previously (63,64). Results were visualized using the ggplot2 package (65) and the EasyVisio tool, developed by E. Rubtsova.
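Because the forward primer sits in the leader and the reverse primer in the first native spacer, an expanded amplicon reads leader, repeat, new spacer, repeat, native spacer. The new spacer can therefore be recovered as the sequence between the first two repeat copies. The Python sketch below illustrates this logic only; the repeat string is a hypothetical placeholder, exact matching ignores sequencing errors, and the published analysis used R/Bioconductor rather than this code.

```python
from collections import Counter
from typing import Iterable, Optional

# Placeholder repeat sequence for illustration only; substitute the repeat of
# the CRISPR array being analysed.
REPEAT = "GTTTCAATCCCAT"


def new_spacer(read: str, repeat: str = REPEAT,
               min_len: int = 10, max_len: int = 79) -> Optional[str]:
    """Return the leader-proximal spacer from a read of an expanded array.

    The new spacer lies between the first two repeat copies. Reads with a
    single repeat come from unexpanded arrays and yield None, as do spacers
    outside the min_len..max_len window.
    """
    first = read.find(repeat)
    if first == -1:
        return None
    start = first + len(repeat)
    second = read.find(repeat, start)
    if second == -1:
        return None
    spacer = read[start:second]
    return spacer if min_len <= len(spacer) <= max_len else None


def tally_new_spacers(reads: Iterable[str]) -> Counter:
    """Count distinct new spacers across reads."""
    return Counter(s for s in (new_spacer(r) for r in reads) if s is not None)
```

The 10 to 79 bp length window is taken from the mapping criteria used for newly acquired spacers.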
Newly acquired spacers 10 to 79 bp in length were mapped, allowing one mismatch, to the reference sequences of Clostridium difficile 630 (NCBI reference sequence NC_009089.1), plasmid pCD630 (NCBI reference sequence NC_008226.2), and pCas1-2-4. The three nucleotides immediately upstream of the first protospacer position were considered the PAM. Spacers that aligned to multiple positions within the same molecule were removed from the analysis. Spacers that aligned to a single DNA molecule were referred to as "unique," and spacers that aligned to several molecules (but to a single position within each molecule) were referred to as "nonunique" and were analyzed separately (Table S4). "Shifters" and "flippers" were also removed from the analysis (66).
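The mapping rules above (one mismatch allowed, PAM taken as the three nucleotides upstream of the protospacer on the matching strand, and the unique/nonunique/discard classification) can be sketched with a brute-force search. This is an illustration under stated simplifications: molecules are treated as linear, "shifter" and "flipper" filtering is not implemented, and the molecule names in any example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hits(spacer: str, molecule: str, max_mm: int = 1) -> List[Tuple[str, int]]:
    """Return (strand, start) for every window matching the spacer with at most
    max_mm mismatches; start is in 5'->3' coordinates of that strand.
    Brute force, suitable only for small illustrative inputs."""
    n = len(spacer)
    hits = []
    for strand, seq in (("+", molecule), ("-", revcomp(molecule))):
        for i in range(len(seq) - n + 1):
            mismatches = sum(a != b for a, b in zip(spacer, seq[i:i + n]))
            if mismatches <= max_mm:
                hits.append((strand, i))
    return hits


def classify_spacer(spacer: str, molecules: Dict[str, str],
                    pam_len: int = 3) -> Tuple[str, List[Optional[str]]]:
    """Label a spacer 'unique', 'nonunique' or 'discard' and return its PAM(s).

    'discard': no match, or more than one match within any single molecule.
    'unique': exactly one match in exactly one molecule.
    'nonunique': exactly one match in each of several molecules.
    """
    per_molecule = {}
    for name, seq in molecules.items():
        hits = find_hits(spacer, seq)
        if len(hits) > 1:
            return "discard", []
        if hits:
            per_molecule[name] = hits[0]
    if not per_molecule:
        return "discard", []
    pams: List[Optional[str]] = []
    for name, (strand, start) in per_molecule.items():
        seq = molecules[name] if strand == "+" else revcomp(molecules[name])
        # PAM = the pam_len nucleotides immediately upstream of the protospacer.
        pams.append(seq[start - pam_len:start] if start >= pam_len else None)
    label = "unique" if len(per_molecule) == 1 else "nonunique"
    return label, pams
```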
In total, for CRISPR 8 and CRISPR 9, we found 5,077 spacers that mapped to 1,380 individual genomic protospacer positions (Table S4). The top 1% of these positions (14 protospacers) accounted for 27% of the sequenced genomic spacers. The genomic coordinates of these seemingly "hot" protospacers differed between CRISPR 8 and CRISPR 9; therefore, it is unlikely that they represent true "hot" protospacers. We reasoned that they could arise from early acquisition of the corresponding spacers followed by their spread through the population during prolonged cultivation. Alternatively, they could result from uneven amplification during the two successive rounds of PCR. To avoid biases caused by these seemingly "hot" protospacers, we excluded them from subsequent analyses of spacer length, protospacer distribution along the genome, and frequency of associated PAMs.
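Excluding the seemingly "hot" protospacers amounts to dropping the top 1% of positions ranked by spacer count. A minimal Python sketch, assuming per-position spacer counts are already tabulated (positions are arbitrary hashable keys here):

```python
from collections import Counter
from typing import Set, Tuple


def remove_hot_protospacers(position_counts: Counter,
                            top_percent: int = 1) -> Tuple[Counter, Set]:
    """Split positions into the top `top_percent` percent (by spacer count)
    and the rest.

    The number of excluded positions is ceil(n_positions * top_percent / 100),
    computed with integer arithmetic to avoid floating-point rounding; ties at
    the cutoff are broken by Counter.most_common ordering.
    """
    n_top = -(-len(position_counts) * top_percent // 100)  # integer ceiling
    hot = {pos for pos, _ in position_counts.most_common(n_top)}
    kept = Counter({pos: n for pos, n in position_counts.items() if pos not in hot})
    return kept, hot
```

With 1,380 positions, the ceiling rule excludes 14 positions, matching the number reported above; the share of spacers they carry is `sum(position_counts[p] for p in hot) / sum(position_counts.values())`.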
RNA extraction and qRT-PCR. Total RNA was isolated from the C. difficile 630Δerm strain after 4, 6, and 10 h of growth in tryptone-yeast extract (TY) medium, corresponding to the early exponential, late exponential, and stationary phases, respectively, as previously described (67). cDNA synthesis by reverse transcription and quantitative real-time PCR analysis were performed as previously described (68) using a Bio-Rad CFX Connect real-time system. Expression levels of the CRISPR arrays were calculated relative to that of the 16S rRNA gene (69).
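The text does not spell out the calculation behind reference (69). A common approach, assumed here, is the 2^−ΔCt method or its efficiency-corrected form, in which target Ct values are compared with 16S rRNA Ct values from the same cDNA. A minimal Python sketch:

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target: Sequence[float], ct_reference: Sequence[float],
                        eff_target: float = 2.0, eff_reference: float = 2.0) -> float:
    """Expression of a target relative to a reference gene from replicate Ct values.

    Computes E_ref^mean(Ct_ref) / E_target^mean(Ct_target). With both
    amplification factors equal to 2.0 (100% efficiency) this reduces to
    2^-(Ct_target - Ct_reference), i.e. the 2^-dCt method.
    """
    return eff_reference ** mean(ct_reference) / eff_target ** mean(ct_target)
```

Dividing the value obtained for one growth phase by that for another (for example, 4 h versus 10 h) gives the corresponding 2^−ΔΔCt fold change, provided efficiencies are similar across samples.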
Data availability. Raw sequencing data have been deposited in the National Center for Biotechnology Information Sequence Read Archive under BioProject identifier (ID) PRJNA719030.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.

ACKNOWLEDGMENTS
High-throughput sequencing for this project was performed at Waksman Genomics Core Facility, Rutgers University. This work was supported by Agence Nationale de la Recherche ("CloSTARn," ANR-13-JSV3-0005-01 to O.S.), the Institut Universitaire de France (to O.S.), the University Paris-Saclay, the Institute for Integrative Biology of the Cell, the DIM-1HEALTH regional Ile-de-France program (LSP grant no. 173403 to O.S.), CNRS-RFBR PRC 2019 (grant no.