Novel molecular aspects of the CRISPR backbone protein ‘Cas7’ from cyanobacteria

The cyanobacterium Anabaena PCC 7120 harbours a Type I-D CRISPR system that can potentially confer adaptive immunity. The Cas7 protein (Alr1562), which forms the backbone of the Type I-D surveillance complex, was characterized from Anabaena. Alr1562 showed the presence of a non-canonical RNA recognition motif and two intrinsically disordered regions (IDRs). When overexpressed in E. coli, Alr1562 was soluble and could be purified by affinity chromatography; however, deletion of the IDRs rendered Alr1562 completely insoluble. The purified Alr1562 was present in a dimeric or an RNA-associated higher oligomeric form, the latter appearing as spiral structures under the electron microscope. With RNaseA and NaCl treatment, the higher oligomeric form converted to the lower oligomeric form, indicating that oligomerization occurred due to the association of Alr1562 with RNA. The secondary structures of both forms were largely similar, resembling that of a partially folded protein. The dimeric Alr1562 was more prone to temperature-dependent aggregation than the higher oligomeric form. In vitro, Alr1562 bound more specifically to a minimal CRISPR unit than to non-specific RNA. Residues required for binding of Alr1562 to RNA, identified by protein modeling-based approaches, were mutated for functional validation. Interestingly, these mutant proteins, which showed a reduced ability to bind RNA, were predominantly present in the dimeric form. Alr1562 was detected with specific antiserum in Anabaena, suggesting that the Type I-D system is expressed and may be functional in vivo. This is the first report that describes the characterization of a Cas protein from any photosynthetic organism.


Introduction
Clustered regularly interspaced short palindromic repeats (CRISPR), together with CRISPR-associated (Cas) proteins, provide adaptive immunity against invading nucleic acids and are found in many bacterial and archaeal species [1,2]. The adaptive feature comes from the CRISPR system's ability to record a memory of previous infections by integrating short pieces of invading DNA into the genome as spacers between the repeats. This genetic memory is used to recognize and neutralize subsequent infections by the same invader. Immunity is executed in three steps: adaptation or spacer acquisition, expression of Cas proteins and CRISPR RNAs (crRNAs), and finally, interference with the the best model with a low C-score was selected for further analysis [19]. Identification of residues involved in binding to RNA was based on conservation among homologues, I-TASSER predictions and structural superposition of the Alr1562 model (generated with I-TASSER) onto the known E. coli Cascade complex. Multiple sequence alignments to assess conservation were generated with Clustal Omega and MUSCLE [20], whereas the Alr1562 model was superposed on the E. coli Cascade structure using Chimera (1.13.1) [21].
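The Python sketch below illustrates the conservation criterion in its simplest form. It scores each column of a multiple sequence alignment, such as one produced by Clustal Omega or MUSCLE, by the fraction of sequences that carry the most common residue, then maps highly conserved columns back to residue numbers in the query. The toy alignment, the 0.8 threshold and the function names are hypothetical. This is not the scoring used in the study, which also drew on I-TASSER predictions and structural superposition.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return (most common residue, fraction of sequences carrying it).

    Gaps ('-') never count as the conserved residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            profile.append(("-", 0.0))
            continue
        residue, n = counts.most_common(1)[0]
        profile.append((residue, n / len(alignment)))
    return profile


def conserved_positions(alignment, query_index=0, threshold=0.8):
    """Label conserved columns with 1-based residue numbers of the query sequence."""
    query = alignment[query_index]
    hits, residue_number = [], 0
    for col, (residue, fraction) in enumerate(column_conservation(alignment)):
        if query[col] == "-":
            continue  # no query residue in this column, so numbering does not advance
        residue_number += 1
        if query[col] == residue and fraction >= threshold:
            hits.append(f"{residue}{residue_number}")
    return hits


toy_alignment = [  # hypothetical sequences, not real Cas7 homologues
    "MRK-FRAY",
    "MRKAFRSY",
    "LRK-FRAY",
]
print(conserved_positions(toy_alignment))  # ['R2', 'K3', 'F4', 'R5', 'Y7']
```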

Overproduction and purification of Alr1562 proteins, and Western blotting
The alr1562 ORF was PCR-amplified from Anabaena PCC 7120 genomic DNA with the Alr1562_Fwd and Alr1562_Rev primers (Supplementary Table 1). The amplified product was digested with BamHI and SalI and ligated into similarly digested pRSF-1b vector. The resulting pRSF-alr1562 was transformed into E. coli NovaBlue cells. DNA fragments encoding the truncated Alr1562 proteins were generated by PCR with suitable primers and Anabaena DNA, and were likewise ligated into pRSF-1b. Point mutants of Alr1562 were generated with the KOD-Plus mutagenesis kit (TOYOBO Co. Ltd.), using wild-type pRSF-alr1562 as the template. Six site-directed mutants (R29A, F37A, R38A, R72A, K73A and Y119A) were generated by converting each of the respective residues to alanine. The nucleotide sequences of all plasmids were confirmed by sequencing. All respective plasmids (Supplementary Table 2) were then transformed into E. coli BL21 (DE3) pLysS for overproduction of the His-tagged Alr1562 proteins. E. coli cells were initially grown in LB medium at 37°C (with shaking, 150 rpm) until the culture reached an OD600 of ~1.0.
Subsequently, the culture was shifted to 20°C, and after 30 min IPTG (0.5 mM) was added.
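The mutants are named in the standard notation: wild-type residue, position, substituted residue (e.g. R29A). The hypothetical helper below applies such a substitution to a protein sequence. It first checks that the expected wild-type residue is actually present at that position, which is a useful sanity check when designing mutagenesis primers. The sequence in the example is a toy fragment, not Alr1562.

```python
import re

# One-letter codes for the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return `sequence` with a substitution such as 'R29A' applied (1-based position)."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, found {sequence[position - 1]}")
    return sequence[: position - 1] + mutant + sequence[position:]


def alanine_scan(sequence: str, mutations: list[str]) -> dict[str, str]:
    """Build one single-mutant sequence per mutation, each applied to the wild type."""
    return {m: apply_point_mutation(sequence, m) for m in mutations}


toy_sequence = "MRKFRAY"  # hypothetical toy sequence
print(alanine_scan(toy_sequence, ["R2A", "K3A", "Y7A"]))
```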
Purification of Alr1562 from the soluble fraction was performed with Ni-NTA agarose (Sigma) in a batch process. After the resin was thoroughly washed with buffer A, the bound protein was eluted by increasing the imidazole concentration in buffer A to 50-250 mM. Alr1562 eluted between 50 and 250 mM imidazole, with maximal elution at 100 mM imidazole.
RNaseA (2 units/10 ml homogenate) and NaCl (1.5 M) treatment for removal of RNA was performed during homogenization of E. coli cells prior to incubation with the Ni-NTA beads.
The purified Alr1562 or its variants were resolved on sodium dodecyl sulphate-polyacrylamide gels (12% or 15%) or on native polyacrylamide gels (8% acrylamide) and stained with Coomassie Brilliant Blue G-250 or ethidium bromide. Native gels were run in standard 0.5X TBE buffer at 4°C. Alr1562 protein was also used to raise specific antiserum in rabbits at a commercial facility (Genei, India). Total cellular proteins from Anabaena cultures were extracted with Laemmli's buffer [22], separated on a denaturing polyacrylamide gel (15%), blotted onto a nitrocellulose membrane (Sigma) and probed with the Alr1562 antiserum (1:5000 dilution).
ALP-conjugated anti-rabbit IgG was used as the secondary antibody, and blots were developed with the NBT/BCIP substrate (Roche). All Western blots were repeated at least three times with consistent results.

RNA isolation
RNA was isolated with TRIzol™ LS reagent (Invitrogen) following the manufacturer's protocol. RNA associated with Alr1562 in the different imidazole elution fractions (50 mM, 100 mM or 250 mM imidazole) was resolved on a 1.4% denaturing formaldehyde gel.
RNA was subjected to RNA-seq analysis at a commercial facility (Agrigenome, India).

Electrophoretic mobility shift assay (EMSA)
Binding of Alr1562 to different RNA substrates was evaluated with γ-32P-labelled ssRNA oligos. Three custom-made RNA substrates (Sigma) were used: the CRISPR repeat-spacer-repeat (RSR) motif, the CRISPR repeat alone, and a non-specific 20-mer. These substrates were end-labelled with [γ-32P]ATP using polynucleotide kinase. Each labelling reaction contained 10 pmol RNA/DNA oligo, 6.6 pmol [γ-32P]ATP and 40 units of polynucleotide kinase (NEB) in the buffer supplied with the enzyme. This 10 µl reaction was incubated at 37°C for 45 min. Then 40 µl of RNase-free water was added and the reaction was heat-denatured at 65°C for 20 min. Unincorporated [γ-32P]ATP was then removed from the labelled oligos with MicroSpin™ G-50 columns (Merck). Binding was carried out in reaction buffer (10 mM HEPES, 15 mM MgCl2, 200 mM KCl, 10 mM DTT, 25% glycerol, pH 7.0) with 200 nmol of substrate per reaction. Protein and RNA substrates were incubated for 30 min at room temperature, mixed with tracking dye and separated on an 8% polyacrylamide TBE gel (80 V for 1 h). The gel was then vacuum-dried and exposed to X-ray film for 2 h to overnight. The films were developed with developer and fixer solutions in a dark room. For competition experiments, a 25-, 75- or 200-fold molar excess of the different unlabelled RNA oligos was added.
Downloaded from https://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20200026/867717/bcj-2020-0026.pdf by guest on 14 February 2020
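Simple arithmetic can check the labelling reaction. Polynucleotide kinase transfers one γ-phosphate per 5′-OH, so the amount of labelled oligo cannot exceed the limiting reagent (6.6 pmol ATP against 10 pmol oligo). Diluting the 10 µl reaction with 40 µl water then sets the nominal probe concentration. The sketch below assumes complete ATP incorporation and ignores volume changes and losses on the G-50 column. Its outputs (at most ~66% labelled oligo at a nominal 200 nM) are therefore upper bounds, not measured quantities.

```python
def labelling_summary(oligo_pmol=10.0, atp_pmol=6.6, reaction_ul=10.0, added_water_ul=40.0):
    """Upper-bound estimates for a 5'-end kinase labelling reaction."""
    # At most one phosphate per oligo 5' end, so the limiting reagent caps labelling.
    max_labelled_pmol = min(oligo_pmol, atp_pmol)
    max_labelled_fraction = max_labelled_pmol / oligo_pmol
    final_volume_ul = reaction_ul + added_water_ul
    # pmol per µl equals µM; multiply by 1000 to express in nM.
    oligo_nM = oligo_pmol / final_volume_ul * 1000
    return {
        "max_labelled_fraction": max_labelled_fraction,
        "final_volume_ul": final_volume_ul,
        "oligo_nM": oligo_nM,
    }


print(labelling_summary())
```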

Circular Dichroism (CD) spectroscopy and Dynamic Light Scattering
For CD analysis, protein at approximately 0.1 mg/ml was used as described earlier [23]. The purified protein was diluted in buffer containing 20 mM Tris (pH 7.6), and spectra were recorded on a CD spectrometer (Biologic MOS500). CD spectra of the different forms of the protein were recorded from 30°C to 80°C to determine the melting temperature (Tm).
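The text does not state how Tm was extracted from the variable-temperature CD data. The hedged sketch below shows one simple approach. It normalises the signal (for example, ellipticity at 222 nm) between its first and last readings and linearly interpolates the temperature at which half of the change has occurred. It assumes a single, monotonic two-state transition that reaches both plateaus within 30-80°C. The readings in the example are hypothetical, and fitting a sigmoidal two-state model would be the more rigorous choice.

```python
def melting_temperature(temps_c, signal):
    """Midpoint temperature of a monotonic two-state thermal transition.

    The first and last readings are taken as the folded and unfolded plateaus.
    """
    if len(temps_c) != len(signal) or len(temps_c) < 2:
        raise ValueError("need matching temperature and signal lists")
    folded, unfolded = signal[0], signal[-1]
    if folded == unfolded:
        raise ValueError("no transition in the signal")
    # Fraction unfolded: 0 at the first reading, 1 at the last.
    fraction = [(s - folded) / (unfolded - folded) for s in signal]
    for i in range(1, len(fraction)):
        f0, f1 = fraction[i - 1], fraction[i]
        if f0 <= 0.5 <= f1:
            t0, t1 = temps_c[i - 1], temps_c[i]
            if f1 == f0:
                return t0
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("normalised signal never crosses 0.5")


temps = [30, 40, 50, 60, 70, 80]
theta_222 = [-20.0, -19.0, -15.0, -8.0, -2.0, -1.0]  # hypothetical mdeg readings
print(round(melting_temperature(temps, theta_222), 1))  # 56.4
```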
The hydrodynamic size of Alr1562 was determined by dynamic light scattering (DLS) on a Malvern Zetasizer Nano series instrument. Individual protein fractions (in 20 mM Tris, pH 8) were analyzed in the 0.2 ml or 1 ml cuvettes supplied with the instrument.
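DLS instruments derive hydrodynamic size from the measured translational diffusion coefficient D through the Stokes-Einstein relation, R_h = k_B T / (6 π η D). The sketch below implements that relation in both directions. Two inputs are assumptions: a temperature of 25°C and water's viscosity (~0.89 mPa·s) for the 20 mM Tris buffer. The instrument software performs this conversion itself. The code only illustrates why a particle with a 5-fold larger radius diffuses 5-fold more slowly.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value


def hydrodynamic_radius_m(diffusion_m2_per_s, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein: radius of a sphere with the given diffusion coefficient."""
    return BOLTZMANN_J_PER_K * temp_k / (6 * math.pi * viscosity_pa_s * diffusion_m2_per_s)


def diffusion_coefficient_m2_per_s(radius_m, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Inverse relation: expected diffusion coefficient of a sphere of the given radius."""
    return BOLTZMANN_J_PER_K * temp_k / (6 * math.pi * viscosity_pa_s * radius_m)


d_small = diffusion_coefficient_m2_per_s(6e-9)   # 6 nm, the radius reported for PII
d_large = diffusion_coefficient_m2_per_s(30e-9)  # 30 nm, the radius reported for PI
print(f"{d_small:.2e} m^2/s, {d_large:.2e} m^2/s, ratio {d_small / d_large:.1f}")
```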

Negative stain transmission electron microscopy (TEM)
The protein (in 20 mM Tris, pH 7.6) was diluted 1:10 with 20 mM Tris, and 5 µl of sample was spotted on UV-activated, carbon-coated copper grids (200 mesh). After 5 min, grids were stained with 2.0% aqueous uranyl acetate for 2 min. Excess stain was removed with a filter paper wick and the grid was allowed to dry. Samples were viewed on a Carl Zeiss Libra 120 TEM at an accelerating voltage of 120 kV.

Bioinformatic analysis of alr1562/Alr1562 from Anabaena PCC 7120
A comprehensive in silico search of the Anabaena PCC 7120 genome (NC_003272) with CRISPRdisco (http://github.com/crisprlab/CRISPRdisco) revealed 8 CRISPR loci. Of these, CRISPR locus 3 contained a complete putative Type I-D system (Figure 1A). The CRISPR RNA from this locus (crRNA-3) formed the typical stem-loop structure characteristic of functional crRNAs (Supplementary Figure 1A). The CRISPR type I-D system is mostly associated with the cyano sub-type of CRISPR/Cas loci, which is found in cyanobacteria and some archaeal species [4]. The alr1562 ORF (1035 bp, encoding a protein of 344 amino acids) is the second ORF of the type I-D system. It is flanked by alr1561 (2694 bp, encoding a Cas10d protein of 897 amino acids) and alr1563 (759 bp, encoding a Cas5 protein of 252 amino acids) (Figure 1A). In the sequence database (http://genome.microbedb.jp/cyanobase/GCA_000009705.1/genes/alr1562), Alr1562 is annotated as a protein of unknown function. A BLAST search showed Alr1562 to be highly similar (93-94% identity) to the putative Csc2/Cas7 proteins of filamentous cyanobacteria such as Nostoc punctiforme and Nostoc sp. PCC 7107. In the Conserved Domain Database (CDD), Alr1562 is designated Csc2 (CRISPR/Cas Subtype Cyano protein 2), a repeat-associated mysterious protein (RAMP). Although it closely resembles its cyanobacterial counterparts, Alr1562 showed very poor sequence identity to the other Cas7 proteins that have been functionally characterized (Supplementary Figure 1B).
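The ORF and protein lengths quoted above are mutually consistent. An ORF of n bp that includes its stop codon encodes n/3 - 1 amino acids, and the short check below applies this rule to the three genes. It assumes the quoted lengths include the stop codon, which is what the numbers imply.

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_bp` base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError(f"{orf_bp} bp is not a whole number of codons")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# (ORF length in bp, protein length in aa) as given in the text
genes = {
    "alr1561 (Cas10d)": (2694, 897),
    "alr1562 (Cas7)": (1035, 344),
    "alr1563 (Cas5)": (759, 252),
}
for name, (bp, aa) in genes.items():
    print(name, protein_length(bp), protein_length(bp) == aa)
```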
Disordered regions in Alr1562 were predicted with D2P2 (http://d2p2.pro/search), which combines predictions from several online servers into a consensus. Alr1562 showed a higher degree of disorder at its N-terminal (amino acids 1-100) and C-terminal (amino acids 300-344) ends (Figure 1B).
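D2P2 displays the outputs of several disorder predictors side by side. One simple way to turn such per-residue calls into candidate IDR boundaries, for example when designing truncated constructs, is a majority vote followed by merging of contiguous disordered residues. The sketch below assumes each predictor's output has been reduced to a string of 'D' (disordered) and 'O' (ordered) calls. The toy predictions and the 50% agreement threshold are hypothetical, and the sketch does not reproduce D2P2's own consensus.

```python
def consensus_flags(predictions, min_agreement=0.5):
    """Per-residue True where at least `min_agreement` of predictors call disorder."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    return [
        sum(p[i] == "D" for p in predictions) / len(predictions) >= min_agreement
        for i in range(length)
    ]


def disordered_segments(flags, min_length=1):
    """Merge consecutive disordered residues into 1-based (start, end) segments."""
    segments, start = [], None
    for position, flag in enumerate(flags, start=1):
        if flag and start is None:
            start = position
        elif not flag and start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    if start is not None and len(flags) - start + 1 >= min_length:
        segments.append((start, len(flags)))
    return segments


toy_predictions = [
    "DDDDOOOOOODD",  # hypothetical predictor 1
    "DDDOOOOOODDD",  # hypothetical predictor 2
    "DDDDDOOOOOOD",  # hypothetical predictor 3
]
print(disordered_segments(consensus_flags(toy_predictions)))  # [(1, 4), (11, 12)]
```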

Heterologous over-expression, purification and in vivo detection of Alr1562 in Anabaena
To characterize Alr1562, the full-length protein and its variants were overexpressed in, and purified from, E. coli. For expression with an N-terminal hexa-histidine tag, pRSF-1b received either the complete alr1562 ORF or one of three truncated versions encoding Alr1562 proteins that lacked the disordered regions (Figure 1B). All the respective proteins were overproduced in E. coli, consistent with their expected molecular weights (Supplementary Figure 2A, 2B, 2C, 2D). Interestingly, only full-length Alr1562 was found in the soluble fraction (even at 37°C) (Figure 1C), whereas all three truncated variants were exclusively present in the inclusion body fraction of E. coli. Lowering the post-induction incubation temperature to 20°C did not help, and these overproduced proteins remained insoluble. The purified Alr1562 protein was used to raise specific antiserum in rabbits, which was then used to detect Alr1562 in Anabaena (Figure 1D). In Anabaena cell-free protein extracts, Western blots showed a signal of the expected size for Alr1562. The slightly lower band in the nitrogen-supplemented lane could be due to processing of Alr1562 under these conditions. These results indicate that Alr1562 is indeed expressed in Anabaena.

The purified Alr1562 protein exists in dimeric as well as higher oligomeric forms
Full-length Alr1562 could be purified to near homogeneity by affinity chromatography on Ni-NTA (Figure 1C). On native PAGE, the 100/250 mM imidazole fractions showed higher oligomeric forms (seen close to the well) as well as a lower oligomeric form (Figure 2A). In contrast, the 50 mM imidazole fraction contained mostly the lower oligomeric form of Alr1562 (Figure 2A). Gel filtration of Alr1562 (250 mM imidazole fraction) showed two distinct peaks (Figure 2B): peak I (PI) at a retention volume of around 8 ml and peak II (PII) at around 15 ml. The lower oligomeric form of Alr1562 in the PII peak had a molecular weight of ~90 kDa, whereas the PI peak was >600 kDa (Supplementary Figure 3A). As the Alr1562 monomer migrates at ~40 kDa on SDS-PAGE, the lower oligomeric form is very likely to be the dimeric form of Alr1562.
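Molecular weights from analytical gel filtration are usually read off a calibration curve in which log10(MW) of standard proteins is approximately linear in elution volume. The sketch below fits that line by least squares. It then converts a native mass into an approximate oligomeric state using the ~40 kDa SDS-PAGE monomer size. The calibration standards are hypothetical and are not those of Supplementary Figure 3A. RNA-bound or elongated assemblies also do not behave like globular standards, so such estimates are approximate.

```python
import math


def fit_calibration(standards):
    """Least-squares fit of log10(MW in kDa) = intercept + slope * elution volume (ml)."""
    xs = [ve for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_kda(elution_ml, slope, intercept):
    """Read a molecular weight off the fitted calibration line."""
    return 10 ** (intercept + slope * elution_ml)


def oligomeric_state(native_mw_kda, monomer_mw_kda):
    """Number of monomers implied by the native mass."""
    return native_mw_kda / monomer_mw_kda


standards = [(10.0, 316.0), (12.0, 200.0), (14.0, 126.0), (16.0, 79.0)]  # hypothetical (ml, kDa)
slope, intercept = fit_calibration(standards)
print(round(estimate_mw_kda(13.0, slope, intercept)))
print(round(oligomeric_state(90.0, 40.0)))  # ~90 kDa native / ~40 kDa monomer -> 2
```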
Both PI and PII fractions showed a band of the same size (~40 kDa) on SDS-PAGE. However, their native PAGE profiles differed distinctly (Figure 2C): the PI fraction contained the higher oligomeric form of Alr1562, whereas only the lower oligomeric form was present in the PII fraction. When the 50 mM elution fraction was subjected to gel filtration, the PII peak was dominant and only a small PI peak was observed (Supplementary Figure 3B), indicating that the lower oligomeric form prevails in this fraction. DLS of the Alr1562 PI and PII fractions showed hydrodynamic radii of 30 nm and 6 nm, respectively (Figure 2D). PI gave a broader peak, indicating that the higher oligomeric form is heterogeneous. The circular dichroism profiles of the PI and PII forms were similar (Figure 2E, 2F), showing 43% distorted α-helix, 1.36% β-sheet and 46% random coil.

The higher oligomeric form of Alr1562 is associated with RNA
On native acrylamide gels, the 250 mM imidazole elution fraction of Alr1562 showed distinct ethidium bromide fluorescence at the top of the well, indicating that nucleic acids (from E. coli) had co-purified with the protein (Figure 3A). This fluorescence did not disappear on DNase I treatment but was distinctly reduced on RNaseA treatment, indicating that the nucleic acid associated with Alr1562 was RNA (Figure 3B). Moreover, the A260/A280 ratios of PI and PII (Figure 2B) were ~1.3 and ~0.6, respectively, further corroborating the association of RNA with the PI fraction.
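A simple two-component model helps interpret the A260/A280 ratios. Let pure protein have ratio r_p, pure RNA ratio r_r, and let f be the fraction of A280 contributed by RNA. Then A260/A280 = r_p + f (r_r - r_p), so f = (R - r_p)/(r_r - r_p). The reference values used below (r_p ≈ 0.57, r_r ≈ 2.0) are typical literature approximations, assumed rather than measured here. RNA absorbs far more strongly per unit mass than protein, so f is a fraction of absorbance, not of mass.

```python
def rna_share_of_a280(ratio, protein_ratio=0.57, rna_ratio=2.0):
    """Fraction of A280 attributable to RNA under a two-component mixing model."""
    share = (ratio - protein_ratio) / (rna_ratio - protein_ratio)
    return min(max(share, 0.0), 1.0)  # clamp to the physically meaningful range


for label, ratio in [("PI", 1.3), ("PII", 0.6)]:
    print(label, round(rna_share_of_a280(ratio), 2))  # PI 0.51, PII 0.02
```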
RNA was isolated from the 50 mM, 100 mM and 250 mM imidazole elution fractions obtained from the Ni-NTA matrix during purification of Alr1562 from E. coli. RNA could be obtained only from the 100 mM and 250 mM fractions, which contained both the higher oligomeric and the lower oligomeric forms of Alr1562 (Figure 3C). In contrast, the 50 mM imidazole elution fraction, which mostly contained the lower oligomeric form, did not yield any significant amount of RNA. These results indicate that the higher oligomeric form of Alr1562 is generally associated with RNA. RNA-seq analysis showed the RNA associated with Alr1562 (250 mM imidazole elution) to be abundantly expressed E. coli RNAs and no sequence  Table. 3).

NaCl & RNaseA
The 250 mM imidazole elution fraction showed a preponderance of higher oligomeric forms over the lower oligomeric, i.e. dimeric, form (Figure 3D). The higher oligomeric forms of Alr1562 appeared as a smear starting right from the well of the stacking gel (Figure 3D). On treatment with 1.5 M NaCl and RNaseA, the overall content of the higher oligomeric form decreased, as evident from the reduced smearing in the stacking gel. At the same time, the dimeric form increased distinctly in the lower portion of the gel. In gel filtration, the NaCl & RNaseA-treated preparation showed less of the higher oligomeric form (PI) than the untreated preparation, with a concomitant increase in the lower oligomeric form (PII). The treatment also broadened the PI peak (Figure 3E). These results indicate that NaCl-RNaseA treatment converts the higher oligomeric form into lower oligomeric forms. Thus, Alr1562 associates with endogenous E. coli RNA to form the higher oligomers. Conditions that disrupt RNA-protein interactions (high salt/RNaseA) release Alr1562 from this RNA, consequently increasing the content of the dimeric form.
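Comparing the areas under the two gel-filtration peaks can quantify the shift from PI to PII after NaCl and RNaseA treatment. The sketch below integrates an A280 trace with the trapezoidal rule inside chosen elution windows around the ~8 ml (PI) and ~15 ml (PII) peaks. The window limits and chromatogram values are hypothetical. Overlapping peaks would require deconvolution rather than simple windowing.

```python
def window_area(volumes_ml, absorbance, start_ml, end_ml):
    """Trapezoidal area of the trace between start_ml and end_ml (whole intervals only)."""
    area = 0.0
    for i in range(1, len(volumes_ml)):
        v0, v1 = volumes_ml[i - 1], volumes_ml[i]
        if v0 >= start_ml and v1 <= end_ml:
            area += (v1 - v0) * (absorbance[i - 1] + absorbance[i]) / 2
    return area


def pi_fraction(volumes_ml, absorbance, pi_window=(6.0, 10.0), pii_window=(13.0, 17.0)):
    """Share of the combined PI + PII area that elutes in the PI window."""
    pi = window_area(volumes_ml, absorbance, *pi_window)
    pii = window_area(volumes_ml, absorbance, *pii_window)
    return pi / (pi + pii)


volumes = [6, 7, 8, 9, 10, 13, 14, 15, 16, 17]
a280 = [0, 5, 10, 5, 0, 0, 20, 40, 20, 0]  # hypothetical trace
print(pi_fraction(volumes, a280))  # 0.2
```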

Alr1562 aggregates on heat treatment, releasing the bound RNA
The 50 mM imidazole elution fraction, which mostly contains the dimeric form of Alr1562, was heated (40-80°C) and then resolved on native gels. Compared with the protein incubated at 40°C, the dimeric Alr1562 band was considerably reduced at 50°C and protein appeared at the top of the lane, indicating aggregation. The aggregates grew larger as the temperature was raised further, and above 70°C Alr1562 was observed only in the stacking gel (Figure 4A). The CD profile of Alr1562 in this fraction showed a temperature-dependent loss of secondary structure, and above 70°C the protein was completely denatured (Figure 4B). The higher oligomeric form (PI peak, Figure 2B) did not aggregate up to 60°C (Figure 4C), and its CD profile likewise showed no apparent loss of secondary structure up to 60°C (Figure 4D). In samples treated at 40-60°C, RNA was observed in the upper half of the gel, associated with Alr1562. However, at 70°C and above, most of the protein was found in the stacking gel, indicating aggregation, and CD analysis showed that it had completely lost its secondary structure. Interestingly, in contrast to the samples heated at 40-60°C, samples incubated at 70°C and 80°C showed most of the RNA in the lower half of the gel. This indicates that aggregated Alr1562 is unable to bind RNA (Figure 4E).

The Alr1562 protein binds to ssRNA in vitro
The CRISPR RNA (crRNA) consists of two repeats separated by a spacer. These repeats form specific secondary structures (Supplementary Figure 1A), which facilitate binding of the Cas7 protein. Electrophoretic mobility shift assays (EMSA) assessed the ability of Alr1562 to bind the Anabaena CRISPR repeat-spacer-repeat (RSR) RNA motif. For competition, in addition to unlabelled RSR, two other RNA substrates were used: the Anabaena CRISPR repeat (R2) and a 20-nucleotide RNA unrelated to the Anabaena CRISPR (20-mer) (Figure 5A). The assays used the dimeric form, i.e. the PII peak obtained during gel filtration, which is not associated with RNA.
Alr1562 bound to 32P-RSR RNA in a protein concentration-dependent manner (Figure 5B). In competition assays, cold RSR RNA inhibited binding of Alr1562 to 32P-RSR RNA very efficiently. Cold R2 RNA also reduced binding, but less efficiently than RSR RNA. Interestingly, cold 20-mer RNA could not compete effectively with 32P-RSR, and considerable amounts of 32P-RSR-Alr1562 complex remained even with a 75-fold molar excess of cold 20-mer RNA. These EMSA results indicate that Alr1562 interacts most efficiently with RSR RNA, followed by R2 RNA and then the 20-mer RNA (Figure 5C).
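The competition results were judged from the autoradiographs. If the bands were quantified by densitometry, binding could be expressed as the fraction of probe in the complex. Competitor efficiency would then be that fraction relative to the no-competitor lane. The sketch below shows this calculation. The band intensities are hypothetical, and it assumes background-subtracted signals within the linear range of the film.

```python
def fraction_bound(complex_signal, free_signal):
    """Fraction of the labelled probe in the protein-RNA complex."""
    total = complex_signal + free_signal
    if total <= 0:
        raise ValueError("no signal in lane")
    return complex_signal / total


def relative_binding(lane, reference_lane):
    """Fraction bound in `lane` relative to the no-competitor reference lane."""
    return fraction_bound(*lane) / fraction_bound(*reference_lane)


no_competitor = (800.0, 200.0)  # hypothetical (complex, free) intensities
lanes = {
    "75X cold RSR": (100.0, 900.0),
    "75X cold 20-mer": (600.0, 400.0),
}
for name, lane in lanes.items():
    print(name, round(relative_binding(lane, no_competitor), 3))
```

A lower relative value means the competitor displaced more of the labelled RSR.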
Electron microscopy of Alr1562 (250 mM imidazole fraction) showed several spiral structures ranging from 30 to 120 nm (Figure 5D), along with many indistinct structures in the background. In contrast, the dimeric (i.e. RNA-free) form of Alr1562 showed only the indistinct structures under the electron microscope (Figure 5E). As

Structural modelling of Alr1562 to identify residues involved in binding to RNA
The closest homolog of Alr1562 with a known structure was Cas7 from Thermofilum pendens (PDB ID: 4TXD), which showed 30% similarity to Alr1562 with a query coverage of 0.71. However, the model obtained with Swiss-Model (https://swissmodel.expasy.org/) using 4TXD as the template was of low quality, and hence a fold-recognition approach for modeling of were not only position-wise conserved in E. coli Cas7, but they also interacted with the crRNA.
Most of the other residues (T39, S42, R72, K73, V75 and A76) were spatially very close to the crRNA, suggesting their involvement in binding to the RNA (Figure 6B).
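Residues described as "spatially very close" to the crRNA in a superposed model are typically selected with an interatomic distance cutoff. The sketch below assumes that coordinates have already been extracted from the superposed structures, for example exported from Chimera. It reports residues that have any atom within the cutoff of any RNA atom. The 4 Å cutoff, the residue labels and the coordinates are hypothetical.

```python
import math


def residues_near_rna(protein_atoms, rna_atoms, cutoff_angstrom=4.0):
    """Return (residue, minimum distance) for residues within the cutoff of the RNA, closest first."""
    hits = []
    for residue, atoms in protein_atoms.items():
        closest = min(math.dist(a, r) for a in atoms for r in rna_atoms)
        if closest <= cutoff_angstrom:
            hits.append((residue, round(closest, 2)))
    return sorted(hits, key=lambda hit: hit[1])


rna = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # hypothetical RNA atom coordinates (Å)
protein = {  # hypothetical residues and atom coordinates (Å)
    "res_a": [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],
    "res_b": [(5.0, 0.0, 3.5)],
    "res_c": [(20.0, 0.0, 0.0)],
}
print(residues_near_rna(protein, rna))  # [('res_a', 3.0), ('res_b', 3.5)]
```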

Alr1562 point mutants are prevalently present in lower oligomeric forms and display reduced capacity to bind RNA
To experimentally verify the importance of these amino acids for RNA binding, six of them (R29, F37, R38, R72, K73 and Y119) were converted to alanine by site-directed mutagenesis. The corresponding proteins were then purified after overexpression in E. coli.
Interestingly, wild-type Alr1562 co-purified with a large amount of E. coli RNA, whereas all the mutant proteins carried considerably less associated RNA. OD 260/280 ratio of these  Gel filtration analysis showed that all the mutant proteins had a substantially reduced PI peak and a considerably elevated PII peak, the reverse of the profile observed with wild-type Alr1562 (Figure 6C). On native acrylamide gels too, the lower oligomeric forms of the mutant proteins predominated. The CD spectra of the Alr1562 point mutants were largely similar to that of the wild-type protein (Supplementary Figure 4), suggesting that the mutations did not grossly alter folding. EMSA showed that the mutant proteins had a substantially decreased ability to bind the RSR substrate. Only 26-52 µM of wild-type Alr1562 was enough to saturate the RSR, whereas even 176 µM of any mutant protein could not completely shift the RNA in the reaction (Figure 6D).

Discussion:
The CRISPR systems are adaptive immune systems of prokaryotes that memorize earlier assaults of foreign genetic material and use this stored information to fend off attacks from similar elements in the future [24]. This process is brought about by the coordinated action of several CRISPR-associated i.e. 'Cas' proteins. In the CRISPR-mediated interference process, the leader sequence activates transcription of several repeat-spacer-repeat (RSR) units i.e. the pre-crRNA, which is subsequently processed by the Cas6 endonuclease to yield the mature crRNA [25].
Different Cas proteins recognize the mature crRNA and assemble the interference complex around it. The main backbone protein of this complex is Cas7, which forms a helical assembly along the crRNA. Cas7 associates with Cas8 and Cas5, which cover the 5′ end of the crRNA. Small subunits coat this helical backbone to form the Cascade complex, which then enables the crRNA to bind the invading DNA [26]. Ultimately, the Cas3 nuclease/helicase destroys the target [2]. Thus, Cas7 plays a key role in this process and is vital for interference. CRISPR system [16]. Moreover, the crRNAs associated with the type I-D system were expressed in vivo and processed into mature crRNAs in Anabaena [17]. Also, the specific antiserum detected Alr1562, the Cas7 protein of this type I-D system, in Anabaena extracts (Figure 1D), indicating that this CRISPR system may indeed be functional in vivo.
Unlike completely structured proteins, many proteins have regions of intrinsic disorder that are important for their function [27]. Intrinsically disordered regions (IDRs) may affect a protein's stability, solubility or ability to crystallize [28]. Experiments with E. coli or insect cell-free extracts have shown that higher intrinsic disorder is associated with increased solubility of the expressed protein [29,30]. However, another study recommended minimising low-complexity IDRs to aid the solubility of proteins in E. coli [31]. The full-length receptor protein Cqm1, which has IDRs at both ends (akin to Alr1562), was completely insoluble when expressed in E. coli, but deletion of these IDRs rendered the truncated Cqm1 soluble in this expression system [32]. In the case of Alr1562, deletion of either the N-terminal or the C-terminal IDR made the protein insoluble (Figure 1A), indicating that these regions are required for expression of soluble Alr1562 in E. coli.
Many CRISPR-associated RNA-binding proteins, including Cas7, belong to the newly identified class of repeat-associated mysterious proteins (RAMPs). RAMPs are characterized by the presence of two RRMs (RNA recognition motifs) of the β-α-β type. Like other RAMPs, Alr1562 showed a β-α-β-β-α-β topology, which is usually associated with binding to nucleic acids. In Alr1562, this typical β1-α1-β2-β3-α2-β4 RRM topology was interrupted by hairpins and loops. Notably, the central β-sheet-containing domain was similar in all known Cas7 homologues (Supplementary Figure 5), even when they shared very little sequence similarity. Although known to bind RNA using conventional modes of protein-RNA interaction, RAMPs are capable of binding a wide variety of RNA motifs [33].
Alr1562 bound to non-specific (NS) RNA but bound RSR (repeat-spacer-repeat) RNA more efficiently, revealing increased specificity towards this substrate (Figure 5C). When purified from E. coli, Alr1562 was found to be associated with endogenous E. coli RNA. These RNA molecules turned out to be the most abundantly expressed RNAs in E. coli, again pointing to the ability of RAMPs such as Alr1562 to bind non-specific RNAs.
The Alr1562 protein oligomerized helically along RNA (Figure 5D). Similarly, Cas7 proteins from E. coli and S. solfataricus [6] form helical arrangements on RNA, an organization that appears to help other proteins associate with Cas7 during assembly of the Cascade complex. The S. solfataricus and M. kandleri Cas7 proteins were shown to be monomeric in solution and required other Cas proteins and crRNA for helical assembly [7]. In contrast, Alr1562 was dimeric and formed higher oligomers/helical assemblies only in the presence of RNA. The Alr1562 mutants best exemplified this: with substantially reduced RNA binding, they remained largely dimeric. This also suggests that the ability of Alr1562 to dimerize is independent of its capability to bind RNA. Cas7 led to complete loss of binding activity [7]. Mutation of the corresponding Alr1562 residue, R29, also reduced binding to RNA, indicating the importance of this residue for Cas7 function. This study identified residues F37, R38, R72 and K73 as important for binding of Alr1562 to RNA. As these four residues have not been reported previously, it would be interesting to see whether the corresponding residues in other Cas7 homologs are similarly involved in RNA binding.
In conclusion, the Alr1562 protein showed the ability to exist in dimeric or higher oligomeric form when purified from E. coli. Detailed analysis showed that oligomerization occurred due to the association of Alr1562 dimers with endogenous RNA of E. coli. Interestingly, Alr1562 bound more specifically to repeat-spacer-repeat (RSR) RNA than to the non-specific RNA.
A combination of bioinformatic approaches identified residues required for binding of Alr1562 to RNA, which were then mutated by site-directed mutagenesis. These mutant proteins were predominantly dimeric and also displayed considerably reduced binding to RNA, demonstrating that dimer formation is independent of RNA binding. Moreover, Alr1562 was expressed in Anabaena, suggesting that the CRISPR type I-D system may indeed be functional in vivo. To our knowledge, this is the first report that describes the characterization of a Cas protein from any photosynthetic organism.

(B) Binding of the dimeric Alr1562 (PII fraction) to the radiolabeled RSR RNA. After electrophoresis, the gel was dried, exposed to X-ray film and subsequently developed. The free γ-32P-RSR is indicated by an arrow, while the protein-RNA complex is indicated by *.

After electrophoresis, the gels were dried, exposed to X-ray film and subsequently developed. The free RNA is indicated by an arrow, while the position of the protein-RNA complex is indicated by *.