Analysis of protein–RNA interactions in CRISPR proteins and effector complexes by UV-induced cross-linking and mass spectrometry
Introduction
In a cell, RNA molecules almost invariably function in association with proteins. Since RNA molecules can have enzymatic activity and are structurally more versatile than double-stranded DNA, the variety and number of proteins that bind to RNA are significantly greater than those of proteins associated with classical double-stranded DNA. Accordingly, a multitude of RNA-binding proteins (RBPs) have been described in prokaryotes and eukaryotes [1], [2]. RNA binding by these proteins is versatile and is mediated by many different RNA-binding domains (RBDs), which can occur in various combinations within one RBP. In contrast, DNA-binding proteins such as transcription factors show only moderate variation in their DNA-binding motifs.
Proteins that bind to RNA can modulate or stabilize RNA structures, thereby making RNA catalytically active, and can also mediate interactions between RNA and other macromolecules [3]. Conversely, RNA molecules can guide catalytically active proteins to their destinations. Furthermore, like the vast majority of proteins in higher eukaryotes, which are organized in protein complexes, RBPs with their cognate RNAs also serve as assembly platforms for proteins, while also being able to prevent proteins from interacting with the RNA. Thus RBPs are often, if not always, organized in ribonucleoprotein complexes (RNPs) [1]. These play essential roles in the major cellular steps of gene expression and its regulation. Hence, there is major interest in the molecular characterization of RNA-binding proteins, with a clear emphasis on identifying putative RNA-binding sites, as these regions are often essential for a functional RNP.
The “gold standard” for characterizing molecular interactions of RBDs with their cognate RNA molecules by structure determination is co-crystallization [4], [5]; other methods include NMR of the complex [6] and high-resolution EM of entire RNPs, as performed for the ribosome [7]. Although the number of co-structures of RBPs has been steadily increasing, with more than 200 co-structures of protein–RNA complexes available in the PDB, most RBPs are still crystallized without RNA. Consequently, the molecular characterization of the RBD requires mutation studies combined with definition of the surface charge of the protein to allow localization of the RBD. Similarly, perturbations in the chemical shift of amino acid residues in NMR that are caused by interaction with RNA can allow the localization of the RBDs [8].
In recent years, chemical protein–protein cross-linking and UV-induced protein–nucleic acid cross-linking, in combination with mass spectrometry, have emerged as complementary methods for obtaining information about the spatial arrangement of proteins in complexes and in RNPs [9], [10]. In the case of UV-induced protein–RNA cross-linking, MS has been applied to identify the cross-linked proteins by standard quantitative MS-based proteomic approaches [11], [12], [13]. Subsequent database searching has led to the identification of conserved structural motifs in these proteins [2], such as RNA-recognition motifs (RRMs) [14], K homology (KH) domains [15], zinc-finger domains [16], tudor domains [17], double-stranded RNA-binding domains (dsRBDs) [18], G-patch domains [19] and Sm motifs [20]. However, such proteomic approaches yield little or no information about (i) whether the protein cross-links to the RNA through its canonical RBD or through other domains within the protein; (ii) which RBD is involved in the interaction with RNA when the protein contains several potential RBDs; and (iii) how proteins that do not harbor any known RBD (as identified by sequence) interact with RNA.
The latter situation occurs very often when prokaryotic RNA-binding proteins are investigated. These do not show primary RNA-binding sequence motifs that resemble those of eukaryotic proteins. Nonetheless, three-dimensional structures of bacterial RBPs are similar to structures of eukaryotic RBDs, for example, the bacterial Hfq protein with the characteristic Sm fold [21], [22] and the prokaryotic Cas7 protein family with their RRM motifs [23], [24].
We have now developed a straightforward approach that utilizes UV-induced cross-linking and mass spectrometry, not only to identify proteins that cross-link to RNA but also to identify unambiguously the cross-linked amino acid and the cross-linked nucleotide(s) [25]. The approach is easily applicable to single (e.g., recombinant) proteins that interact with RNA but whose structure cannot be determined in complex with RNA. In contrast to other approaches, it can also be applied to assembled RNPs of any complexity, obtained either by reconstitution or by purification from extracts. Importantly, it can even be applied at the level of entire UV-cross-linked cells.
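To make the mass spectrometric readout concrete, the following Python sketch estimates the expected mass of a peptide that carries a cross-linked nucleotide. It is an illustration only and not the authors' data analysis workflow. It assumes that the cross-link behaves as a simple addition of one uridine monophosphate (UMP, C9H13N2O9P) to the intact peptide and ignores neutral losses and other adduct forms. The function names and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # mass of a proton, used for m/z values

# Monoisotopic element masses needed for the nucleotide adduct.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620}

# Assumption: the adduct is one intact uridine monophosphate.
UMP_FORMULA = {"C": 9, "H": 13, "N": 2, "O": 9, "P": 1}


def formula_mass(formula):
    """Monoisotopic mass of an elemental composition."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def crosslink_mass(sequence, n_ump=1):
    """Peptide mass plus n_ump UMP adducts (simple-addition assumption)."""
    return peptide_mass(sequence) + n_ump * formula_mass(UMP_FORMULA)


def mz(mass, charge):
    """m/z of a positively charged ion with the given number of protons."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    peptide = "GAVK"  # hypothetical tryptic peptide
    conjugate = crosslink_mass(peptide)
    print(f"peptide: {peptide_mass(peptide):.4f} Da")
    print(f"peptide + UMP: {conjugate:.4f} Da, 2+ ion m/z {mz(conjugate, 2):.4f}")
```

In such a scheme, a precursor whose mass exceeds that of a predicted peptide by the adduct mass would be a candidate peptide–RNA heteroconjugate; the fragment spectrum would then be needed to localize the cross-linked residue.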
Here we describe in detail the method for applying this approach to single recombinant proteins bound to RNA. The proteins described here belong to the recently discovered prokaryotic adaptive immune defense system CRISPR-Cas [26]. In this system Cas proteins are guided by a CRISPR RNA (crRNA) to target and degrade complementary foreign nucleic acids in a manner that is functionally reminiscent of the eukaryotic RNA interference mechanism [27]. CRISPR-Cas systems are classified into Types I, II and III based on their signature cas genes (cas3, cas9 and cas10, respectively) and are further divided into subtypes based on the presence of other cas genes [28]. Type I systems and subtypes III-A and III-B form multiprotein RNPs that contain different Cas proteins in addition to Cas3 or Cas10. Type II contains mainly one Cas protein, Cas9, and generates an RNP with two different RNA molecules (crRNA and tracrRNA). Some Cas proteins comprise nuclease domains, distinct helicase domains and also RRM domains that are typical for RNA-binding proteins [29]. The Cas7 family proteins, which form the backbone of the surveillance and effector complexes in Type I and Type III systems, consist of RRMs and belong to the RAMP (repeat-associated mysterious proteins) superfamily [28]. Interestingly, most Cas proteins lack conserved amino acid residues that account for RNA interaction. The diverse peripheral domains of the Cas protein family thus mediate RNA binding.
The Cas proteins that we use to demonstrate our approach are: Type I-A Cas7 from Thermoproteus tenax; Type I-D Cas7 from Thermofilum pendens; and Type III-A Cas7 (Csm3) from Thermus thermophilus. These homologs belonging to the Cas7 protein family were not co-crystallized with their cognate crRNAs. The investigations shown here in detail for Csm3 from T. thermophilus derive from a recent study of the fully assembled CRISPR-Cas Type III-A Csm complex in which we mapped protein–RNA cross-linking sites on all the proteins within this complex [30].
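The type assignment by signature gene described above can be written as a small lookup. The Python sketch below is illustrative only: the mapping of cas3, cas9 and cas10 to Types I, II and III comes from the text, whereas the helper name and the example gene list are hypothetical, and subtype assignment is not attempted.

```python
# Signature cas genes of the three CRISPR-Cas types named in the text.
SIGNATURE_GENES = {"I": "cas3", "II": "cas9", "III": "cas10"}


def crispr_types(cas_genes):
    """Return the types whose signature gene occurs in cas_genes.

    Hypothetical helper: it only checks for the signature gene and
    does not resolve subtypes such as I-A, III-A or III-B.
    """
    present = {gene.lower() for gene in cas_genes}
    return [ctype for ctype, gene in SIGNATURE_GENES.items() if gene in present]


if __name__ == "__main__":
    # Hypothetical locus containing cas3 alongside other cas genes.
    print(crispr_types(["cas7", "cas5", "cas3"]))
```

A locus without any of the three signature genes yields an empty list, which in this sketch simply means that no type could be assigned.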
Experimental procedures
Below we give a detailed protocol for the investigation of the molecular interaction of recombinant RNA-binding proteins with their (cognate) RNA oligonucleotides and of endogenous protein–RNA complexes isolated from prokaryotic cells using UV-induced cross-linking. The protocol allows the mapping of UV cross-linking sites between proteins and RNA at single amino acid and nucleotide resolution. The principle of this approach is that after UV-induced cross-linking of amino acid side chains
Mapping the RNA binding interface in Cas7 proteins
We applied the biochemical, mass spectrometric and computational workflow to map the RNA-binding sites within homologous Cas7 family proteins – T. tenax Cas7, T. pendens Cas7 and T. thermophilus Csm3 – bound to polyU and to crRNA. In vivo, several copies of Cas7 proteins are wrapped around crRNA in a sequence-unspecific helical fashion [5], [30], [50], [51]. Crystal structures from single and complex-bound Cas7 proteins show two composite RNA-binding surfaces: a central cleft and a structurally
Conclusions
We have established a general workflow of UV-induced cross-linking and mass spectrometry for the identification of proteins with their respective peptides and amino acids in contact with RNA. The workflow outlined here proves especially useful when crystal structures or structural models of RNA-binding proteins are available without their cognate RNA. In this case, the cross-linking sites help map the RNA onto the structure of its binding proteins. The given examples of the Cas7 protein
Author contributions
K.S. carried out the protein–RNA cross-linking experiments and data analysis in the lab of H.U. A.H. performed the expression and purification of T. pendens and T. tenax Cas7 proteins in the lab of E.C., using the plasmid constructs provided by A.M. and L.R., respectively. A.H. performed the modeling and superposition for Fig. 3. R.S. purified the endogenous T. thermophilus Type III-A Csm complex in the lab of J.v.d.O. K.K., T.S. and O.K. established the data analysis workflow and provided useful
Acknowledgements
The authors thank M. Raabe and U. Pleßmann for technical assistance, and all members of the Urlaub laboratory and of Forschergruppe 1680 for helpful discussions. This work was supported by the Deutsche Forschungsgemeinschaft [DFG, FOR 1680].
References (60)
- et al., Cell (2014)
- et al., Cell (2012)
- et al., Mol. Cell (2012)
- Trends Biochem. Sci. (1997)
- et al., Trends Biochem. Sci. (1999)
- et al., Trends Biochem. Sci. (2009)
- et al., Mol. Cell (2014)
- et al., Mol. Cell (2008)
- et al., J. Biol. Chem. (2000)
- et al., Int. J. Mass Spectrom. (2011)
Mol. Cell. Proteomics: MCP
Mol. Cell
Mol. Cell
J. Biol. Chem.
Cell
Nat. Rev. Mol. Cell Biol.
Nat. Rev. Genet.
Science
Science
Nature
Nature
Nucleic Acids Res.
Nucleic Acids Res.
Methods Mol. Biol.
Nucleic Acids Res.
FEBS J.
FEBS J.
Metall. Integr. Biomet. Sci.
Proc. Natl. Acad. Sci. U.S.A.
EMBO J.