Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems

The CRISPR-Cas-derived RNA-guided Cas9 endonuclease is the key element of an emerging promising technology for genome engineering in a broad range of cells and organisms. The DNA-targeting mechanism of the type II CRISPR-Cas system involves maturation of tracrRNA:crRNA duplex (dual-RNA), which directs Cas9 to cleave invading DNA in a sequence-specific manner, dependent on the presence of a Protospacer Adjacent Motif (PAM) on the target. We show that evolution of dual-RNA and Cas9 in bacteria produced remarkable sequence diversity. We selected eight representatives of phylogenetically defined type II CRISPR-Cas groups to analyze possible coevolution of Cas9 and dual-RNA. We demonstrate that these two components are interchangeable only between closely related type II systems when the PAM sequence is adjusted to the investigated Cas9 protein. Comparison of the taxonomy of bacterial species that harbor type II CRISPR-Cas systems with the Cas9 phylogeny corroborates horizontal transfer of the CRISPR-Cas loci. The reported collection of dual-RNA:Cas9 with associated PAMs expands the possibilities for multiplex genome editing and could provide means to improve the specificity of the RNA-programmable Cas9 tool.


Bacterial strains and culture conditions
Supplementary Table S1 lists bacterial strains used in this study. Streptococcus pyogenes, Streptococcus mutans, Campylobacter jejuni, N. meningitidis, Escherichia coli and Francisella novicida were grown as previously described (15,16). Brain Heart Infusion (BHI, Becton Dickinson) agar and BHI broth medium supplemented with 1% glucose and 1% lactose were used to culture S. thermophilus at 42 C in a 5% CO 2 environment (16). Pasteurella multocida and Staphylococcus aureus were grown at 37 C on BHI agar plates and in BHI broth with shaking. Cell growth was monitored by measuring the optical density of cultures at 620 nm (OD 620 ) using a microplate reader (BioTek PowerWave).

Bacterial transformation
Escherichia coli was transformed with plasmid DNA according to standard protocols (35). Transformation of S. pyogenes was performed as previously described (36) with some modifications. Streptococcus pyogenes precultures were diluted 1:100 in fresh THY medium and grown at 37 C, 5% CO 2 until OD 620 reached 0.3. Glycine was added to the medium to 10% final concentration and growth was maintained for an additional hour. Cells were spun down at 4 C at 2500 Â g and washed three times with electroporation buffer (5 mM KH 2 PO 4 , 0.4 M D-sorbitol, 10% glycerol, pH 4.5), finally suspended in the same buffer and equalized to the same OD 620 . For electroporation, 1 mg of plasmid was incubated with the competent cells on ice for 10 min. The conditions were 25 mF, 600 V and 1.5 V using 1 mm electroporation cuvettes (Biorad). After a regeneration time of 3 h, bacteria were spread on agar medium supplemented with kanamycin (300 mg/ml). Transformation assays were performed at least three times independently with technical triplicates. The efficiencies were calculated as colony-forming units (CFU) per mg of plasmid DNA. Positive and negative control transformations were done with backbone plasmid pEC85 and sterile H 2 O, respectively.

DNA manipulations
DNA manipulations including DNA preparation (QIAprep Spin MiniPrep Kit, Qiagen), polymerase chain reaction (PCR) (Phusion Õ High-Fidelity DNA Polymerase, Finnzyme), DNA digestion (restriction enzymes, Fermentas), DNA ligation (T4 DNA ligase, Fermentas), DNA purification (QIAquick PCR Purification Kit, Qiagen) and agarose gel electrophoresis were performed according to the standard techniques or manufacturers' protocols with some modifications (35). Site-directed mutagenesis was done using QuikChange II XL kit (Stratagene) or PCR-based mutagenesis (37). Synthetic oligonucleotides (Sigma-Aldrich and Biomers) and plasmids used and generated in this study are listed in Supplementary Table S1. The integrity of all constructed plasmids was verified by enzymatic digestion and sequencing at LGC Genomics.

Construction of plasmids for complementation studies in S. pyogenes
The backbone shuttle vector pEC85 was used for complementation study (38,39). The RNase-III encoding genes (rnc genes) of S. pyogenes, S. mutans, S. thermophilus, C. jejuni, N. meningitidis, P. multocida, F. novicida, E. coli and S. aureus, and the genes encoding truncated and inactive RNase III variants (truncated and inactive (D51A) rnc mutants) of S. pyogenes were cloned in pEC483 (pEC85 containing the native promoter of S. pyogenes rnc) using NcoI and EcoRI restriction sites (Supplementary Table S1, Supplementary Figure S6). The orthologous and mutant cas9 genes were cloned in pEC342 (pEC85 containing a sequence encoding tracrRNA-171 nt (16) and the native promoter of the S. pyogenes cas operon) using SalI and SmaI restriction sites (Supplementary Table S1). Note that in a previous study, we observed low abundance of tracrRNA in the cas9 deletion mutant. For this reason, plasmids used in cas9 complementation studies were designed to encode tracrRNA in addition to cas9 (16). The generated rnc and cas9 recombinant plasmids were introduced in S. pyogenes Árnc and Ácas9 deletion strains, respectively (Supplementary Table S1). Plasmid integrity in all complemented strains was checked by plasmid DNA extraction and digestion.

Construction of plasmids for transformation studies in S. pyogenes
Plasmid pEC85 was used as backbone vector for transformation studies. A DNA fragment containing WT speM protospacer sequence was cloned in the PstI site of plasmids containing coding sequences of WT or mutated cas9 from S. pyogenes (Supplementary Table S1).

Construction of plasmids for protein purification
The overexpression vector pET16b (Novagen) was modified by inserting three additional restriction sites (SalI, SacI, NotI) into the NdeI restriction site, generating pEC621. The genes coding for the orthologous Cas9 proteins were PCR amplified from genomic DNA of the corresponding strains using primers containing a SalI and a NotI restriction site (Supplementary Table S1). The S. pyogenes cas9 mutant genes were PCR amplified from the complementation plasmids mentioned above. All orthologous and mutant cas9 genes were cloned into the SalI and NotI sites of pEC621.

Construction of substrate plasmids for in vitro cleavage assays
Plasmid pEC287 that contains the speM protospacer sequence was used as a vector to construct all substrate plasmids. The PAM sequence located in 3 0 just next to the crRNA-targeted sequence of the speM protospacer (GGG on this plasmid) was modified by PCR-mediated sitedirected mutagenesis (37) using one standard oligonucleotide (OLEC3140 or OLEC3194) that either introduced or removed a XbaI restriction site for screening purposes, and a second mutagenic oligonucleotide to exchange the protospacer adjacent sequence (Supplementary Table S1).

RNA preparation
Total RNA from S. pyogenes SF370 WT, deletion mutants and complemented strains was prepared from culture samples collected at the mid-logarithmic phase of growth using TRIzol (Sigma-Aldrich). The total RNA samples were treated with DNase I (Fermentas) according to the manufacturer's instructions. The concentration of RNA in each sample was measured using NanoDrop.

Protein purification
Escherichia coli Rosetta2(DE3) and E. coli NiCo21(DE3) (New England Biolabs) were transformed with overexpression plasmids coding for S. pyogenes WT and mutant or orthologous Cas9, respectively. Cells were grown at 37 C to reach an OD 600 of 0.7-0.8, protein expression was induced by adding IPTG to a final concentration of 0.5 mM and cultures were further grown at 13 C overnight. The cells were harvested by centrifugation and the pellet was resuspended in lysis buffer (20 mM HEPES, pH 7.5, 500 mM KCl [1 M for S. thermophilus* Cas9], 0.1% Triton X-100, 25 mM imidazole) and lysed by sonication. The lysate was cleared by centrifugation (>20 000 Â g) and incubated with Ni-NTA (Qiagen) for 1 h at 4 C. After washing the Ni-NTA with lysis buffer and wash buffer (20 mM HEPES, pH 7.5, 300 mM KCl, 0.1% Triton X-100, 25 mM imidazole), the recombinant protein was eluted with elution buffer (20 mM HEPES, pH 7.5, 150 mM KCl, 0.1 mM DTT, 250 mM imidazole, 1 mM (EDTA)) and the fractions were analyzed by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). In the case of S. pyogenes Cas9 WT and mutants, the protein containing eluates were pooled and further purified via HiTrap SP FF (GE Healthcare) cation-exchange chromatography. Briefly, the protein was loaded on the column equilibrated with buffer A (20 mM HEPES pH 7.5, 100 mM KCl) using an FPLC system (Ä kta, GE Healthcare). Cas9 was eluted with a gradient of buffer B (20 mM HEPES pH 7.5, 1 M KCl) over 12 ml. 1 ml fractions were collected and analyzed by (SDS-PAGE). The protein containing fractions were pooled and dialyzed overnight (20 mM HEPES, pH 7.5, 150 mM KCl, 50% glycerol). For Cas9 orthologs, the eluates from Ni-NTA purification were checked for purity by SDS-PAGE. In case of contaminants, a second purification over chitin beads was performed as described in the manual for NiCo21(DE3) cells from New England Biolabs. Briefly, 1 ml chitin beads (New England Biolabs) equilibrated with buffer A was incubated with the Ni 2+ -IMAC eluates for 1 h at 4 C. Afterwards, the beads were added onto a column and the Cas9 containing flowthroughs were collected and again checked for purity by SDS-PAGE (Supplementary Figure S1). The purified proteins were dialyzed overnight. The protein concentration was calculated by measuring the OD 280 using the extinction coefficient. The detailed characteristics of purified proteins are summarized in Supplementary Figure S1A.
In vitro transcription RNA for in vitro DNA cleavage assays was generated by in vitro transcription using the AmpliScribe TM T7-Flash TM Transcription Kit (Epicentre) according to the manufacturer's instructions. PCR products or synthetic oligonucleotides used as templates are listed in Supplementary Table S1. The synthesized tracrRNA and repeat region of crRNA from each bacterial species correspond to the mature forms of RNAs as determined by deep RNA sequencing (15) or bioinformatics predictions. The spacer region of all crRNAs used in this study targets the speM protospacer (encoding superantigen; targeted by spacer 2 of S. pyogenes SF370 CRISPR array, Spyo1h_002 (16)). Transcribed RNAs were precipitated and further purified from 10% polyacrylamide 8 M urea denaturing gel. The RNA concentration was determined by measuring the OD 260 and the molarity was calculated. Equimolar amounts of crRNA and tracrRNA were mixed in 5X RNA annealing buffer (1 M NaCl, 100 mM HEPES, pH 7.5), heated up to 95 C for 5 min and slowly cooled to room temperature before use.

In vitro DNA cleavage assays
For the cleavage assays using Cas9 mutant proteins, 25 nM of Cas9 were incubated with equimolar amounts of prehybridized S. pyogenes dual-RNA in cleavage buffer (20 mM HEPES, pH 7.5, 150 mM KCl, 10 mM MgCl 2 , 0.5 mM DTT, 0.1 mM EDTA) for 15 min at 37 C. Plasmid DNA (5 nM) containing speM (NGG PAM) was added and further incubated for 1 h at 37 C. The reaction was stopped by addition of 5X loading buffer (250 mM EDTA, 30% glycerol, 1.2% SDS, 0.1% (w/v) bromophenol blue) and analyzed by 1% agarose gel electrophoresis in 1X TAE. Cleavage products were visualized by ethidium bromide staining. All other cleavage assays were carried out using the same conditions with the following modifications: KGB (43) (100 mM potassium glutamate, 25 mM Tris/acetate, pH 7.5, 10 mM Mg-acetate, 0.5 mM 2-mercaptoethanol, 10 mg/ml bovine serum albumin) was used as cleavage buffer and different concentrations of dual-RNA:Cas9 complex were analyzed. The concentration of plasmid DNA was kept constant in all experiments, i.e. 5 nM.

Search for PAM motifs
Spacer sequences of the selected bacterial species were extracted from the CRISPRdatabase (http://crispr.u-psud. fr/crispr/) and used to find cognate protospacer candidates using megaBLAST (http://blast.ncbi.nih.gov/Blast). Protospacer candidates were defined as containing a sequence with !90% similarity to the crRNA spacer sequence and originating from phage, plasmid or genomic DNA related to the bacterial species of the targeting CRISPR-Cas. For the investigated CRISPR-Cas loci, the orientation of transcription was determined previously by RNA sequencing or northern blot analysis (15,16). It was also shown before that in type II CRISPR-Cas, the PAM sequence is located in 3 0 of the protospacer, juxtaposed to the sequence targeted by cognate crRNA on the non-target strand (14,18,23,44). To identify possible PAMs in each bacterial species, 10 nt sequences on the non-target strand directly downstream of each protospacer sequence were aligned. A logo plot (http://weblogo.berkeley.edu/) showing the most abundant nucleotides was created and PAM sequences were predicted. In the cases of CRISPR-Cas loci for which no suitable protospacer sequences could be identified (S. mutans UA159, C. jejuni NCTC 11168, P. multocida Pm70, F. novicida U112), closely related strains of the same species were selected (Supplementary Table S2). The spacer contents of the type II CRISPR arrays in selected strains were analyzed (http://crispr.upsud.fr/Server/). The spacer sequences were then used to select cognate protospacer sequences as described above.

Protein sequence analysis
Position-Specific Iterated (PSI)-BLAST program (45) was used to retrieve orthologs of the Cas9 family in the NCBI nr database. Sequences shorter than 800 amino acids were discarded. The BLASTClust program (46) set up with a length coverage cutoff of 0.8 and a score coverage threshold (bit score divided by alignment length) of 0.8 was used to cluster the remaining sequences (Supplementary Table  S2). This procedure produced 82 clusters. In the case of sequences reported in this study, one or several representatives from each cluster were selected and aligned using the MUSCLE program (47) with default parameters, followed by a manual correction on the basis of local alignments obtained using PSI-BLAST (45) and HHpred programs (48). The confidently aligned blocks (Supplementary Figure S2) with 285 informative positions were used for maximum likelihood tree reconstruction using the FastTree program (49) with the default parameters: JTT evolutionary model, discrete gamma model with 20 rate categories. The same program was used to calculate the bootstrap values. Cas1 sequences were selected from the corresponding cas operons (Supplementary Table S2). A few incomplete sequences were substituted by other Cas1 sequences from the same Cas9 cluster (Supplementary Table S2). Several Cas1 proteins from subtypes I-A, B, C and E were included as an outgroup. Cas1 sequences were aligned using the same approach described above and 252 informative positions (Supplementary Figure S3) were used for maximum likelihood tree reconstruction using the FastTree program. RNase III multiple sequence alignment was prepared using the MUSCLE program.

Diversity of Cas9 orthologs
To investigate the evolution and diversity of dual-RNA:Cas9 systems, we subjected publicly available genomes to multiple rounds of BLAST search using previously retrieved Cas9 sequences as queries (15). Cas9 orthologs were identified in 653 bacterial strains representing 347 species (Supplementary Table S2). After removing incomplete or highly similar sequences, we selected 83 diverse, representative Cas9 orthologs for multiple sequence alignment and phylogenetic tree reconstruction ( Figure 1A, Supplementary  Supplementary Figures S3 and S4) and fully supports the previously described classification of type II CRISPR-Cas into three subtypes, II-A (specified by csn2), II-B (characterized by long and most diverged cas9 variants (formerly csx12) and cas4) and II-C (three-cas gene operon) (15).
Analysis of the composition of cas genes, transcription direction of the CRISPR arrays with respect to that of the cas operon, and location and orientation of tracrRNAs resulted in the division of subtypes into groups with distinct locus characteristics, especially within the subtype II-A (Figure 1, clusters marked with different colors) (15). We selected Cas9 enzymes representative of the major type II groups. Cas9 orthologs of S. pyogenes, S. thermophilus* (CRISPR3) and S. mutans were chosen for type II-A systems associated with shorter, $220 amino acid Csn2 variants (Csn2a). Cas9 of S. thermophilus** (CRISPR1) represents a distinct group of type II-A sequences associated with longer, $350 amino acid version of Csn2 orthologs (Csn2b). Cas9 of F. novicida was selected for type II-B. The closely related Cas9 orthologs of P. multocida and N. meningitidis and the distinct, short Cas9 of C. jejuni were chosen for type II-C ( Figure 1B). Expression of associated tracrRNAs and crRNAs in S. pyogenes, S. mutans, F. novicida, N. meningitidis and C. jejuni was already validated by deep RNA sequencing (15,16). The RNAs in S. thermophilus and P. multocida were predicted bioinformatically based on the sequences from related species within the same type II group. Figure 1B shows the organization of the eight selected type II CRISPR-Cas loci and highlights our previous findings demonstrating that the type II loci architectures are highly variable among subtypes, yet conserved within each group (15). These variations are in good agreement with the clustering derived from the Cas9 and Cas1 phylogenetic trees ( Figure 1A, Supplementary Figure S4).

Bacterial RNases III are interchangeable in dual-RNA maturation
As described in S. pyogenes and S. thermophilus, RNase III plays an essential role in the biogenesis of dual-RNA:Cas9 systems by coprocessing tracrRNA and pre-crRNA at the level of antirepeat:repeat duplexes (16,17). We analyzed the interchangeability of S. pyogenes RNase III with RNases III from selected bacterial species in the coprocessing of S. pyogenes tracrRNA:pre-crRNA, including strains that lack type II CRISPR-Cas (S. aureus COL, E. coli TOP10). Northern blot analysis shows that all RNases III studied here can coprocess the RNA duplex (Figure 2, Supplementary Figure S5), indicating that there is no species-specificity for tracrRNA:pre-crRNA cleavage by RNase III. Multiple sequence alignment of RNase III orthologs demonstrates conservation of the catalytic aspartate residue and the  Figure  S6) that are both required for RNA coprocessing (Figure 2, Supplementary Figure S5). These data imply that the conservation of tracrRNA:pre-crRNA coprocessing by bacterial RNase III provides a degree of flexibility allowing the functionality of dual-RNA:Cas9 systems in multiple species upon horizontal transfer.
Cas9 HNH and split RuvC domains are the catalytic moieties for DNA interference Comparison of Cas9 sequences revealed high diversity in amino acid composition and length (984 amino acid for C. jejuni to 1648 amino acids for F. novicida), especially in the linker sequence between the highly conserved N-terminal RuvC and central RuvC-HNH-RuvC regions and in the C-terminal extension (Supplementary Figure  S2). Several studies demonstrated the importance of the nuclease motifs for dsDNA cleavage activity by mutating one aspartate in the N-terminal motif of the RuvC domain and one or several residues in the predicted catalytic motif of the HNH domain of the Cas9 enzyme (14,22,23). To investigate the relevance of all catalytic motifs for tracrRNA:pre-crRNA processing and/or DNA interference, alanine substitutions of selected residues were created ( Figure 3A). In addition to the already published catalytic amino acids, we created Cas9 point mutants of conserved amino acid residues in the central RuvC motifs (14) (Figure 3A, Supplementary Figure S2). Northern blot analysis of S. pyogenes cas9 deletion mutant complemented with each of the cas9 point mutants revealed the presence of mature tracrRNA and crRNA forms, demonstrating that none of the catalytic motifs is involved in dual-RNA maturation by RNase III. This is in agreement with previous data showing that RNase III is the enzyme that specifically cleaves tracrRNA:pre-crRNA duplex (16). Cas9 seems to have a stabilizing function on dual-RNA. We show that the catalytic motifs are not involved in RNA duplex stabilization ( Figure 3B, Supplementary Figure S7).
To investigate the involvement of the conserved motifs of Cas9 in DNA interference in vivo, we used a previously described plasmid-based read-out system that mimics infection with invading protospacer-containing DNA elements (16). Transformation assays were done in S. pyogenes WT or a cas9 deletion mutant using plasmids containing the speM protospacer gene (complementary to the second spacer of S. pyogenes SF370 type II CRISPR array (16)) and WT or mutant cas9 ( Figure 3C). In this assay, Cas9 expressed following plasmid delivery in bacterial cells catalyzes its own vector cleavage, when active. Control experiments showed that the speM protospacer-containing plasmid was not tolerated in WT S. pyogenes, demonstrating activity of WT CRISPR-Cas. Similarly, a plasmid containing the speM protospacer and encoding WT Cas9 could not be maintained in the cas9 deletion mutant, demonstrating that Cas9 is able to cleave the plasmid from which it is expressed. Except for Cas9 N854A, all plasmids encoding Cas9 mutants were tolerated in the cas9 deletion strain, indicating abrogation of Cas9 interference activity for these variants.
The in vivo DNA targeting data were confirmed with in vitro DNA cleavage assays. Purified WT and mutant Cas9 proteins were incubated with tracrRNA:crRNA targeting speM and subjected to cleavage of plasmid DNA containing the speM protospacer. WT and N854A Cas9 show dsDNA cleavage activity, whereas the other Cas9 mutants cleave only one strand of the dsDNA substrate, yielding nicked open circular plasmid DNA ( Figure 3D). This corroborates the results obtained in vivo showing the importance of the conserved nuclease motifs for DNA interference by Cas9. In addition to the previously published data demonstrating the importance of the N-terminal RuvC motif and the catalytic motif of HNH, we thus defined new catalytic residues in the central RuvC motifs.
Only Cas9 from closely related CRISPR-Cas systems can substitute for S. pyogenes Cas9 in tracrRNA-directed pre-crRNA maturation by RNase III Beside the conservation of the HNH and split RuvC domains involved in DNA cleavage (14,15), the length of Cas9 orthologs and the amino acid sequences of Cas9 are highly variable among the different groups of type II CRISPR-Cas systems ( Figure 4A, Supplementary Figure  S2). Hence, we investigated whether this variability plays a role in the specificity of Cas9 with regard to tracrRNA:pre-crRNA duplex and mature crRNA stabilization. We complemented S. pyogenes cas9 deletion mutant with Cas9 from selected bacterial species representative of the various type II groups and analyzed  (15). The CRISPR-Cas locus architecture of P. multocida was predicted based on its close similarity to that of N. meningitidis and further confirmed by bioinformatics prediction of tracrRNA based on a strongly predicted promoter and a transcriptional terminator as described in (15). Type II CRISPR-Cas loci can differ in the cas gene composition, mostly with cas9, cas1 and cas2 being the minimal set of genes (type II-C, blue), sometimes accompanied with a fourth gene csn2a/b (type II-A, yellow and orange) or cas4 (type II-B, green). The CRISPR array can be transcribed in the same (type II-A, yellow and orange) or in the opposite (types II-B and C, blue and green) direction of the cas operon. The location of tracrRNA and the direction of its transcription differ within the groups (compare type II-A of S. thermophilus** with type II-A from the other species indicated here (yellow) and compare type II-C of C. jejuni with type II-C of N. meningitidis and P. multocida (blue)).
tracrRNA:pre-crRNA processing by northern blot. Cas9 proteins from S. mutans and S. thermophilus* can substitute for the stabilizing role of S. pyogenes Cas9 in RNA processing by RNase III (Figure 4B, Supplementary   Figure S8). In contrast, Cas9 from S. thermophilus**, C. jejuni, N. meningitidis, P. multocida and F. novicida could not complement the lack of RNA processing in the cas9 mutant of S. pyogenes. In these strains, the  75-nt processed form of tracrRNA is observed as a very weak signal of background level of dual-RNA processed by RNase III in the absence of Cas9. Overall, only Cas9 from closely related systems of S. pyogenes in the type II-A cluster can substitute endogenous Cas9 role in dual-RNA stabilization and subsequent maturation by RNase III.
Cas9 orthologs require their specific PAM sequence for DNA cleavage activity In S. pyogenes and S. thermophilus* types II-A, PAMs were identified as NGG and NGGNG, respectively. In these two species, mutating the PAM abrogates DNA interference by dual-RNA:Cas9 (14,22,23). To identify the functional PAMs for Cas9 from bacterial species other than S. pyogenes and S. thermophilus, we searched for potential protospacers matching spacer sequences in the selected CRISPR arrays using BLAST. For S. mutans UA159, C. jejuni NCTC 11168, P. multocida Pm70 and F. novicida U112, we were unable to identify potential protospacers. Therefore, we searched for strains that harbor a closely related variant of Cas9 (Supplementary Table S2) and analyzed their spacer sequences following the same approach (Supplementary  Table S3). We aligned the identified 10 nt sequences located directly downstream of the protospacer sequence and delineated the most common nucleotides that could represent PAM sequences. Based on the data visualized as a logo plot ( Figure 5A), we designed plasmid DNA substrates containing the speM protospacer followed by different adjacent sequences either comprising the predicted PAM or not ( Figure 5B). The Cas9 orthologous proteins were purified (Supplementary Figure S1) and dual-RNA orthologs were designed based on deep RNA sequencing data (15), with the spacer sequence of crRNA targeting  (B) DNA substrates designed for specific PAM verification. Based on the logo plot for each species, plasmid DNA substrates were designed to contain the speM protospacer and the indicated sequence downstream, either comprising (PAM+) or not (PAMÀ) the proposed PAM. The predicted PAMs were verified by cleavage assays narrowing down the necessary nucleotides for activity (data not shown); therefore the sequence used differs slightly from the logoplot shown in (A). The high abundance of other nucleotides not being part of the PAM can be explained by redundancy of the coding sequences containing the protospacers, and by the limited number of found protospacer targets. The last column shows the PAM sequence for each species, which was already published (no symbol) or derived from this work ( # ). (C) In vitro plasmid cleavage assays by dual-RNA:Cas9 orthologs on plasmid DNA with the 10-bp protospacer adjacent sequence (summarized in (B)). Each Cas9 ortholog in complex with its cognate dual-RNA cleaves plasmids containing the corresponding species-specific PAM (PAM+). No cleavage is observed with plasmids that did not contain the specific PAM (PAMÀ). li: linear cleavage product, sc: supercoiled plasmid DNA.
speM. To determine the protospacer-adjacent sequences critical for efficient DNA targeting, the purified Cas9 orthologs and their cognate dual-RNAs were used in DNA cleavage assays with different plasmid substrates ( Figure 5C, Supplementary Figure S9). The previously published PAMs for Cas9 from S. pyogenes (NGG), S. mutans (NGG), S. thermophilus* (NGGNG) and N. meningitidis (NNNNGATT) (27,28,53,54) were confirmed by multiple sequence alignments and in vitro cleavage assay, validating our approach. However, dual-RNA guided Cas9 from S. thermophilus* could efficiently cleave target DNA in the presence of only NGG instead of NGGNG (Supplementary Figure S9). This is in contrast to data obtained in vivo, where mutation of the third G abrogates interference by Cas9 of S. thermophilus* (23). For S. thermophilus**, the PAM was published as NNAGAAW (27), which differs by one base from the sequence that we derived (NNAAAAW). In vitro cleavage assays with these two sequences demonstrate that the DNA substrate with the 'NNAAAAW' PAM is cleaved more efficiently by Cas9 of S. thermophilus** compared to the 'NNAGAAW' PAM (Supplementary Figure S9). Using the same approach, we also validated the PAM activity of the most common protospacerdownstream sequences for C. jejuni, F. novicida and P. multocida by in vitro cleavage assays, resulting in the most probable PAM sequences being NNNNACA (C. jejuni), GNNNCNNA (P. multocida) and NG (F. novicida) ( Figure 5C, Supplementary Figure S9). Analysis of the protospacer-adjacent sequence from C. jejuni shows the same frequency of C and A ('NNNNCCA' or 'NNNNACA') at position 5 downstream of the protospacer (Supplementary Table S3). Hence, we tested both substrates for cleavage activity by C. jejuni dual-RNA:Cas9. Only the DNA target containing A at this position was cleaved efficiently (Supplementary Figure S9). This result could be explained by the origin of the protospacer, with the 'NNNNCCA' PAM being mostly found in genomic DNA or prophages of Campylobacter strains. In this case, the mutated PAM sequence on the chromosomally located protospacer prevents self-targeting. The P. multocida PAM requires further verification given that the multiple sequence alignment was derived from only two protospacer sequences. Thus, we identified a series of specific PAMs that enable dsDNA cleavage by dual-RNA:Cas9 complexes from different bacterial species in vitro. For gene editing purposes, it would be advisable to test a range of potential motifs to select those PAMs that would allow efficient targeting with limited off-site effect.
Phylogenetic clustering of Cas9 defines dual-RNA:Cas9 exchangeability As described above, clustering of Cas9 orthologs correlates with the ability to substitute for the RNA-stabilizing role of S. pyogenes Cas9 in tracrRNA:pre-crRNA processing by RNase III in vivo ( Figure 4B). We investigated the exchangeability between Cas9 and dual-RNA in closely related CRISPR-Cas systems at the level of DNA interference. Plasmid cleavage assays were performed using S. pyogenes Cas9 complexed with dual-RNAs from selected CRISPR-Cas systems representative of the clustering of the type II CRISPR-Cas systems. As shown in Figure 6A (upper panel), S. pyogenes Cas9 can cleave target DNA in the presence of dual-RNAs from S. mutans and S. thermophilus* (type II-A, yellow subcluster), but not from any other tested species. The same result was observed when the dual-RNA from S. pyogenes was incubated with Cas9 orthologs from different bacteria ( Figure 6A, lower panel). Cleavage assays were also performed with all Cas9 orthologs incubated with cognate and non-cognate dual-RNAs on their PAM-specific plasmid DNA. Only the combinations of Cas9 and dual-RNA within the same type II subcluster conferred dsDNA cleavage activity ( Figure 6B, Supplementary Figure S10). More striking was the gradient of activity dependent on how closely related the species are in the corresponding type II group. This effect can be observed for C. jejuni Cas9 that is able to cleave DNA in the presence of dual-RNA from P. multocida and N. meningitidis, but not as efficient as with its own RNA (type II-C, blue subcluster). This finding is in good agreement with the phylogenetic tree of Cas9 ( Figure 1A) showing that all three Cas9 orthologs belong to type II-C but C. jejuni Cas9 clusters more distantly from P. multocida and N. meningitidis Cas9. This effect was even greater for S. thermophilus** Cas9, which belongs to type II-A together with S. pyogenes, S. mutans and S. thermophilus*. However, none of the dual-RNAs from the three latter loci could direct DNA cleavage by S. thermophilus** Cas9. This result supports the recent findings demonstrating the lack of exchangeability between Cas9 from CRISPR1 and CRISPR3 of S. thermophilus DGCC7710 with regard to dual-RNA binding (17). We conclude that Cas9 and tracrRNA:crRNA interchangeabilty directly results from Cas9 coevolution with dual-RNA and follows the Cas9 phylogeny that may differ from the phylogeny of the respective bacterial species due to horizontal transfer.

DISCUSSION
In this work, we identified and characterized dual-RNA and PAM requirements for eight Cas9 orthologous enzymes representative of the Cas9 phylogenetic grouping. To evaluate dual-RNA:Cas9 diversity, we performed bioinformatics analysis of type II CRISPR-Cas systems from available genomes and identified Cas9 orthologs in a plethora of bacterial species that belong to 12 phyla and were isolated from diverse environments (Supplementary Tables S2 and S4). Most of the strains that harbor type II CRISPR-Cas systems (and accordingly Cas9) are pathogens and commensals of vertebrates. A majority of these strains were isolated from gastrointestinal tracts and feces of mammals, fish and birds, but also from wounds, abscesses and spinocereberal fluid of septicaemia patients. We also identified strains isolated from invertebrates and environmental samples, including fresh and sea water, plant material, soil and food, the latter comprising species used in fermentation processes. Cas9 is also present in species from extreme environments such as deep sea sediments, hot springs and Antarctic ice, further demonstrating the wide spread of type II CRISPR-Cas systems in bacteria. A comparison of the taxonomy and habitats of representative strains with the phylogenetic clustering of Cas9 sequences shows little correlation (Supplementary Figure S11). In particular, we identified clusters of Cas9 genes from taxonomically distant bacteria that were isolated from similar habitats. Examples include diverse Firmicutes, Molicutes, Spirochaete and Fusobacteria, that were all isolated from gastrointestinal tracts of mammals, and members of different Proteobacteria, Firmicutes and Fusobacteria families mostly found in environmental samples (Supplementary Figure S11, clusters 1 and 3). A few exceptions involve grouping of Cas9 genes from closely related species isolated from diverse habitats such as Actinobacteria isolated from human and dog specimens but also from hot springs (Supplementary Figure S11, clusters 2, 4 and 5). This complex distribution of Cas9 across bacterial genomes indicates that evolution of dual-RNA:Cas9 systems in bacteria occurs both vertically and horizontally (55).
To investigate the basis for the horizontal dissemination of CRISPR-Cas modules among bacteria, we assessed the specificity of RNase III utilized by type II CRISPR-Cas for dual-RNA maturation. Complementation analysis shows that RNase III from a variety of species, including bacteria that lack type II CRISPR-Cas, can process S. pyogenes tracrRNA:pre-crRNA, suggesting that type II In vitro plasmid cleavage assays using S. pyogenes Cas9 in complex with orthologous dual-RNA (upper panel) and orthologous Cas9 enzymes in complex with S. pyogenes dual-RNA (lower panel). Plasmid DNA containing protospacer speM and S. pyogenes PAM (NGG) was incubated with different dual-RNAs in complex with S. pyogenes Cas9. tracrRNA and crRNA-repeat sequences of the dual-RNAs are from the indicated bacterial species, with crRNA spacer targeting speM. In the lower panel, plasmid DNA containing speM protospacer and the specific PAM was incubated with Cas9 orthologs in complex with S. pyogenes dual-RNA. S. pyogenes Cas9 can cleave plasmid DNA only in the presence of dual-RNA from S. pyogenes, S. mutans and S. thermophilus* (yellow). Dual-RNA from S. pyogenes can mediate DNA cleavage only with Cas9 from S. pyogenes, S. mutans and S. thermophilus* (yellow). li: linear cleavage product; sc: supercoiled plasmid DNA. (B) Summary of Cas9 and dual-RNA orthologs exchangeability. Specific PAM sequences were used according to Figure 5. The color code reflects the type II CRISPR-Cas subgroups ( Figure 1). + + +: 100-75% cleavage activity; + +: 75-50% cleavage activity; +: 50-25% cleavage activity; À: 25-0% cleavage activity observed under the conditions tested. Cas9 and dual-RNA duplexes from the same type II group can be interchanged and still mediate plasmid cleavage providing that the PAM sequence is specific for Cas9. See also Supplementary Figure S10.