Affinity-enhanced RNA-binding domains as tools to understand RNA recognition

Summary Understanding how the RNA-binding domains of a protein regulator are used to recognize its RNA targets is a key problem in RNA biology, but RNA-binding domains with very low affinity do not perform well in the methods currently available to characterize protein-RNA interactions. Here, we propose to use conservative mutations that enhance the affinity of RNA-binding domains to overcome this limitation. As a proof of principle, we have designed and validated an affinity-enhanced K-homology (KH) domain mutant of the fragile X syndrome protein FMRP, a key regulator of neuronal development, and used this mutant to determine the domain’s sequence preference and to explain FMRP recognition of specific RNA motifs in the cell. Our results validate our concept and our nuclear magnetic resonance (NMR)-based workflow. While effective mutant design requires an understanding of the underlying principles of RNA recognition by the relevant domain type, we expect the method will be used effectively in many RNA-binding domains.


MOTIVATION
RNA-binding domains represent the building blocks of RNA regulation, and their sophisticated nucleic acid recognition builds into the selectivity and functional variation of complex post-transcriptional regulatory networks. The RNA recognition properties and the function of some domains are well characterized. However, as current methods to dissect RNA recognition require the interaction to be relatively stable, the low affinity of many other domains prevents the characterization of their RNA recognition properties and the understanding of the role they play in protein function. To overcome this limitation, we propose to use protein design to enhance a domain's affinity of RNA binding without altering nucleobase recognition and specificity. We use these affinity-enhanced domains to reveal their RNA sequence specificity and their role in target selection in the cell and protein function.

INTRODUCTION
Post-transcriptional RNA regulation expands genomic diversity and is key to cellular differentiation and organismal development. Understanding how RNA-binding proteins recognize their RNA targets is a key step to rationalize the selectivity of the RNA regulatory networks. 1,2 In the last decade, the extensive use of methods that map the interaction of proteins with cellular RNAs has provided an overview of the protein-RNA-binding landscape. [3][4][5] However, in many cases, our molecular understanding of protein-RNA interactions is far from complete. This is, at least in part, because we lack the molecular models of recognition required to interpret the interactions in the cell. In particular, we require information on the sequence specificity of the low-affinity RNA-binding protein domains that are common within the multi-domain regulators. Understanding how these regulators select the RNA targets requires an insight into the sequence specificity and affinity of all of the domains. 1,2 The RNA-binding specificity of a protein can be examined using a range of in vitro methodologies (reviewed in Dasti et al. 6 ). However, the analysis can be a challenge for domains that bind RNA with low affinity, as most current methodologies are optimized for stable interactions. A number of these domains are outside the range of current methods, even those aimed at low-affinity interactions, such as scaffold-independent analysis (SIA). 7 As a consequence, low-affinity RNA-binding domains are often reported to be non-specific and assumed to provide a limited contribution to target selection. In order to define the specificity of the domain and test the contribution to RNA target selection, we propose an orthogonal approach, i.e., to increase the affinity of weakly interacting RNA-binding domains and bring them within the useful range of existing methods.
In many well-studied systems, increasing the affinity of a macromolecular interaction without significantly changing the binding mode can be challenging. However, in low-affinity RNA-binding domains, including many K-homology (KH), RNA recognition motif (RRM), and zinc-finger (ZnF) domains, protein-RNA contacts are often not optimized to the same extent, and there is scope to enhance the strength of the interaction. Here, we use the KH1 domain of the fragile X syndrome protein FMRP as a paradigm for a class of low-affinity RNA-binding domains whose contribution to RNA recognition is unclear. We have designed a structure-based, localized mutation that increases KH1 RNA-binding affinity without affecting its RNA-binding mode. Then, we have shown how this mutation can be used to define the domain's sequence specificity and to rationalize FMRP in vivo recognition of its target sequences. This work both provides a proof of principle for the use of an RNA affinity-enhancing (AE) mutational strategy in the molecular investigation of target selectivity by RNA-binding proteins and defines a nuclear magnetic resonance (NMR) toolbox to execute this strategy (Figure 1A). It also provides an important molecular insight into a key neuronal regulator and a mutation/tool that could be used by the FMRP community in a range of in vivo studies to probe the role of KH1.
FMRP is a multi-domain RNA-binding protein essential for the correct development and function of the brain. FMRP misexpression, or a dysfunctional mutant protein, is the causal factor of fragile X syndrome, the most common inherited form of cognitive impairment. 9 Because of its pivotal role in the development of the nervous system and its medical relevance, FMRP has been extensively studied at the molecular, cellular, and systemic level in mammals, in model systems, and in vitro. These studies  have linked FMRP function to RNA regulation at the molecular level, but our understanding of the process of RNA target selection is incomplete. FMRP contains multiple RNA-binding domains and recognizes both structured motifs, such as G-quartets, and single-stranded (ss) RNA elements. [9][10][11] However, how the protein would recognize the ssRNA targets is not understood. 9 FMRP displays, among others, two KH domains. KH domains typically recognize ssRNA, and the FMRP KH1 has been reported to bind homopolymeric ssRNA in vitro. 12 In addition, it has been shown that a mutation destabilizing the flanking KH2 domain, which makes contact with KH1, affects the recognition of ssRNA sequences in the cell, 13 and a construct comprising the KH1 and KH2 domains, plus additional flanking regions, binds selectively to an ssRNA sequence in vitro. 14 On the contrary, a recent biochemical study has reported that the KH1 and KH2 domains do not interact with ssRNA, 15 and structural studies have suggested they may instead bind to the ribosome. 16 The role of the KH domains in ssRNA recognition is unclear.
Notably, FMRP KH1 has been reported to bind ssRNA with low affinity. 12 The domain can be expressed as part of the KH1-2 structural unit, 17,18 and we reasoned that this offered us the opportunity to test our concept by providing physiologically relevant information on sequence specificity. It also exemplifies how the concept can be explored in those more "complex" cases where domains cannot be expressed individually. We first asked whether we could design a KH1 mutant with increased affinity while maintaining the domain's structure, stability, and RNA-binding mode. Then, we explored whether this mutant can be used to define the sequence specificity of KH1. Finally, we asked whether this specificity can be used to understand better the cellular protein-RNA interaction. Our results validate the concept and methods used in this paper and indicate how we can obtain important information on a key regulator of neuronal development. We finally discuss how to apply the strategy to different RNA-binding domains.

RESULTS AND DISCUSSION
In order to obtain a model system that allows examining the interaction between FMRP KH1 and RNA, we expressed KH1 within a KH1-2 di-domain, where KH2 has been mutated to eliminate the interaction with RNA, using a well-established RNA-knockout (KO) mutant 8 (Figure 1B). We refer to this construct as FMRP RNA wild type (RNA-WT), as the emphasis of this study is on the RNA-binding properties of KH1. Then, we used our general understanding of the KH-RNA interaction 19 to design an RNA-AE mutant. We mutated the KH1 GxxG loop, which interacts with the RNA backbone in KH domains and is flexible in the free protein, 19 from GTHG to GKKG, obtaining an RNA-AE/RNA-KO KH1-2 double mutant, which we refer to as FMRP RNA-AE (Figures 1B and 1C). Next, we used NMR as a multi-purpose tool to (1) test the structural conservation and the preservation of the RNA-binding mode in the mutant, (2) validate the increase in affinity, and (3) extract the specificity. The comparison of fingerprint NMR spectra of the KH1 WT and RNA-AE mutant constructs confirmed that the structure of the domain is maintained, with only very local changes being observed (Figure 1C). In addition, the circular dichroism (CD)-monitored unfolding curves of the KH1 WT and RNA-AE constructs indicate that the mutation has no significant effect on protein stability (Figure S1). We therefore proceeded to test whether the KK mutation increases the affinity of KH1 for ssRNA.
In the absence of direct information on KH1 RNA sequence specificity, we titrated the fully randomized NNNNN RNA into either WT or mutated RNA-AE protein and recorded 15N-correlation NMR experiments (Figures 1D and S2). The direction and size of the chemical shift changes across the protein spectrum indicated that, as originally proposed, 12 KH1 interacts with ssRNA. Further, the comparison of the changes in the protein spectra indicates that the affected peaks and the direction of the shift are the same, which indicates that WT and mutant have the same RNA-binding mode and, we expect, sequence preference. Importantly, the molar fraction of the bound protein (which we read as the distance traveled by the peak) is increased, indicating that the mutant has a higher affinity than the wild type (Figure 1D).
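The read-out described above can be made explicit with a small sketch. For a peak in fast exchange, the observed position is the population-weighted average of the free and bound positions, so the fraction bound can be estimated as the distance traveled by the peak divided by the distance traveled at saturation. The Python below is an illustration only: the 15N scaling factor of 0.14 is a common convention and is not taken from this paper, and the peak positions and function names are hypothetical.

```python
import math

# Assumed weighting of 15N relative to 1H shifts (a common convention,
# not stated in the paper).
N_SCALE = 0.14

def combined_shift(start, end, n_scale=N_SCALE):
    """Weighted distance (ppm) between two peaks given as (1H ppm, 15N ppm)."""
    d_h = end[0] - start[0]
    d_n = (end[1] - start[1]) * n_scale
    return math.hypot(d_h, d_n)

def fraction_bound(free, observed, saturated):
    """Fraction bound, read as distance traveled / distance at saturation.

    Assumes fast exchange and a peak that moves along a straight line.
    """
    total = combined_shift(free, saturated)
    if total == 0:
        raise ValueError("free and saturated peak positions coincide")
    return combined_shift(free, observed) / total

# Hypothetical peak positions (1H ppm, 15N ppm) at the same RNA:protein ratio.
free_peak = (8.20, 120.0)
saturated_peak = (8.40, 121.0)
wt_peak = (8.22, 120.1)   # small displacement: low fraction bound
ae_peak = (8.32, 120.6)   # larger displacement: higher fraction bound

wt_fraction = fraction_bound(free_peak, wt_peak, saturated_peak)
ae_fraction = fraction_bound(free_peak, ae_peak, saturated_peak)
```

In this toy case the mutant peak travels six times further than the WT peak at the same ratio, which is the qualitative pattern used in the text to infer higher affinity.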
Then, we examined whether the mutation could help characterize the specificity of the domain using SIA as a step in our NMR workflow. SIA is an NMR-based method that allows defining the nucleobase preference of a domain in each of the positions of the bound RNA 7 (Figure 2A). The method is designed for protein-RNA interactions in the weak-intermediate range, which still excludes many RNA-binding domains of important regulators, which have KD values in the sub-millimolar range. Briefly, NMR spectra are recorded on the free domain and on the domain bound to quasi-degenerate RNA oligos in which all but one position are randomized. Comparison of the changes in the protein spectra when in complex with different oligos with either A, C, G, or U in a given sequence position reports on the nucleobase preference of the domain in that position. In practice, changes are measured as the chemical shift perturbations (CSPs) of backbone amide peaks that are in fast exchange on a chemical shift timescale. Normalization and averaging are then performed to obtain the final SIA scores 7 (Figures 2B and S3). We recorded SIA data for both the WT and RNA-AE KH1 constructs to assess whether the tighter binding of the mutant results in a meaningful improvement of the data. Although good-quality spectra were recorded for both protein constructs, the sizes of the chemical shift changes in the assays with the WT KH1 construct are too small to be measured accurately. Instead, the higher affinity of the mutant protein resulted in much larger chemical shift changes, and that allowed us to obtain reliable SIA scores for the four bound nucleobases examined (four nucleobases are recognized specifically in KH-RNA interactions 19 ) (Figure 2B).
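The normalization and averaging step can be illustrated with a short sketch. The Python below assumes that, for one RNA position, each selected peak is normalized to the largest CSP observed among the four nucleobases, and that the normalized values are then averaged over peaks. This is one plausible reading of the procedure, not the published SIA implementation, and the CSP values and function name are hypothetical.

```python
from statistics import mean

BASES = ("A", "C", "G", "U")

def sia_scores(csp_by_base):
    """Compute per-base scores for one RNA position.

    csp_by_base maps each base to a list of CSPs (ppm), one per selected
    peak, with peaks listed in the same order for every base.
    """
    n_peaks = len(csp_by_base[BASES[0]])
    if any(len(csp_by_base[b]) != n_peaks for b in BASES):
        raise ValueError("every base needs one CSP per selected peak")
    normalized = {b: [] for b in BASES}
    for i in range(n_peaks):
        # Normalize each peak to its largest shift so that every peak
        # contributes equally, whatever its absolute shift size.
        top = max(csp_by_base[b][i] for b in BASES)
        if top <= 0:
            raise ValueError("peak %d shows no measurable shift" % i)
        for b in BASES:
            normalized[b].append(csp_by_base[b][i] / top)
    # Average the normalized values over the set of peaks.
    return {b: mean(normalized[b]) for b in BASES}

# Hypothetical CSPs for two peaks at one RNA position.
example = {
    "A": [0.02, 0.03],
    "C": [0.01, 0.03],
    "G": [0.04, 0.06],
    "U": [0.01, 0.015],
}
scores = sia_scores(example)
preferred = max(scores, key=scores.get)
```

Because every peak is scaled to its own maximum, very small absolute shifts (as with the WT construct) turn measurement error into large changes in the score, which is why the larger shifts of the AE mutant matter.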
Our SIA data indicate that KH1 prefers a G and a C in positions 2 and 4, respectively. In positions 1 and 3, instead, G and A and A and C have similar scores, yielding a G/A-G-C/A-C sequence preference. Notably, SIA scores are semi-quantitative and comparative, and small differences depend on the choice of peaks and on small experimental variations. 7 In order to more precisely define the KH's sequence specificity, we tested the nucleobase preference of the domain in positions 1 and 3 by directly comparing the binding of the oligos/nucleobases with similar scores in our SIA. We titrated the domain with the GAGCC and the GAGAC RNAs and measured the affinity of the two interactions by fitting the chemical shift changes in 2D 15N-correlation NMR spectra against the protein/RNA ratio (Figures 3A and S4). This showed that, in position 3, the protein prefers an A over a C with around 3-fold selectivity. We then examined whether the protein prefers an A or a G in position 1 by comparing the affinity of the UGGAC and UAGAC RNAs (Figures 3A and S4). Notably, while here we propose to use the mutant in SIA assays, depending on the affinity increase, it may be possible to test the specificity of the (mutated) domain using methods normally employed for higher affinities.
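As an illustration of how an affinity can be extracted from such a titration, the sketch below fits chemical shift changes to a 1:1 binding isotherm. It is a minimal Python example under stated assumptions: a single-site model, a fixed protein concentration, RNA:protein molar ratios as the titration axis, and a simple grid search in place of a dedicated fitting package. The function names and the synthetic data are hypothetical and do not reproduce the values reported in the paper.

```python
import math

def bound_fraction_1to1(p_total, l_total, kd):
    """Fraction of protein bound for a 1:1 complex (concentrations in M)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)

def fit_kd(p_total, ratios, shifts, kd_min=1e-6, kd_max=1e-1, n_grid=2000):
    """Least-squares estimate of (Kd, maximum shift) by a log-spaced grid search.

    ratios are RNA:protein molar ratios; shifts are the observed CSPs (ppm).
    For each trial Kd the best maximum shift follows from linear regression.
    """
    if not any(r > 0 for r in ratios):
        raise ValueError("at least one titration point must contain RNA")
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))
        fb = [bound_fraction_1to1(p_total, r * p_total, kd) for r in ratios]
        dmax = sum(f * d for f, d in zip(fb, shifts)) / sum(f * f for f in fb)
        sse = sum((d - dmax * f) ** 2 for f, d in zip(fb, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, dmax)
    return best[1], best[2]

def fold_preference(kd_weaker, kd_tighter):
    """Selectivity expressed as the ratio of two dissociation constants."""
    return kd_weaker / kd_tighter

# Synthetic, noise-free titration: 50 uM protein, Kd = 200 uM, 0.1 ppm max shift.
protein = 50e-6
ratios = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
shifts = [0.1 * bound_fraction_1to1(protein, r * protein, 200e-6) for r in ratios]
kd_fit, dmax_fit = fit_kd(protein, ratios, shifts)
```

Fitting two RNAs in this way and passing the two dissociation constants to `fold_preference` gives the kind of fold selectivity quoted in the text; with real data, several peaks would normally be fitted and the uncertainty of each fit reported.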
The result of our assays indicates that the domain recognizes a G with a 2.5-fold binding preference. Taken together, our SIA and follow-up assays define the domain's sequence preference as nGGAC. The recognition of specific ssRNA sequences is a significant but poorly understood element in FMRP selection of the cellular targets, and how FMRP recognizes these target sequences represents a key question in the biology of this protein. The most consistently found motif in the transcriptome-wide analysis of FMRP targets is GGA or WGGA (W = A/U), which was first identified in a PAR-CLIP (PhotoActivatable Ribonucleoside-enhanced CrossLinking and ImmunoPrecipitation) analysis 13 and later reported by others in cell analysis as UGGA. 9,[20][21][22] To what extent this motif could be organized in G-quartet structures is debated, with some studies reporting an enrichment in patterns compatible with a G-quartet organization and others reporting distribution of motifs that do not reflect a G-quartet organization. Therefore, one important underlying question is which domain would recognize the GGA motif and whether the motif would be recognized by FMRP in a single-stranded background. Our strategy indicates that the KH1 domain recognizes the GGA sequence in an ssRNA setting (Figure 3B). Further, our work extends the recognized sequence to a GGAC tetranucleotide, which, in fact, includes the previously reported in vitro-recognized GAC sequence 14 (Figure 3B).
[Figure caption, partial: 15N HSQC spectra that were recorded for the free and bound proteins (FMRP WT and FMRP RNA-AE); a single peak is shown to highlight the differences in the shifts' magnitudes. The chemical shift changes were normalized with respect to the highest shift value so that each peak contributes equally to the output. Normalized values are averaged over the set of residues to give the final set of SIA scores.]
This could imply that the multiple copies of the motif present in the target RNA sequence could mediate a multimerization of the protein on the RNA, the requirement for a high density of sites to increase the affinity, or both. Regardless and importantly, the higher affinity KH1 RNA-AE mutation, which increases binding affinity 40-fold (Figure S4), represents a tool to directly test the role of this domain in the binding to different RNA targets in the cell, and we expect that it will be useful to the broader community working on the role of FMRP in health and disease.
Our understanding of RNA recognition by common RNA-binding domains has increased thanks to high-resolution structures of protein-RNA complexes and to bioinformatic studies, and we now have detailed information on how domains such as KH, RRM, and many other common RNA-binding domains interact with the cognate RNAs. This information helps the design of AE mutants, extending the concept and NMR workflow we discuss here to RNA-binding domains from a range of RNA-binding proteins, and we expect that this will help us understand how these domains contribute to target recognition in the cell. For example, KH domains have a common RNA-binding mode, and in the free protein, the GxxG loop is exposed in solution. 19 The GKKG mutation used here can therefore be tested in KH domains with no positively charged residues in the GxxG loop (examples in Figure 4A) and can help define their specificity and contribution to target selection. Notably, AE mutations can be designed, in principle, in other common RNA-binding folds. For example, the interaction of the RRM domain, the most common RNA-binding domain, with the RNA targets has been studied in depth. 23 While RRMs can interact with RNA using different surfaces, the most common RNA-binding mode involves the binding of three hallmark aromatics in the domain β-sheet to the RNA bases. 23 In addition, a charged residue in position 1 is often found to make contact with the RNA backbone, and its mutation leads to a strong decrease in RNA affinity. 24 Notably, in a number of RRMs where the sequence specificity is unclear, this position is not occupied by a positively charged amino acid (examples in Figure 4B), and mutating the relevant amino acid to a lysine or arginine in this position would represent an effective AE strategy. Importantly, the method is not meant to be high throughput but rather is meant to be applied to the specific domains of interest to obtain important information on the domain RNA recognition and function.
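The KH design rule described above (test GKKG where the GxxG loop has no positively charged residue) can be sketched as a simple sequence scan. The Python below is an illustration under stated assumptions: only Lys and Arg are counted as positively charged, the toy sequence is hypothetical (it contains the GTHG loop discussed in the text but is not the FMRP sequence), and a G-x-x-G match in a real sequence still needs to be confirmed as the KH loop from an alignment or structure.

```python
import re

# Residues counted as positively charged (assumption: His is excluded).
POSITIVE = set("KR")
# Lookahead so that overlapping G-x-x-G matches are all reported.
GXXG = re.compile(r"(?=(G[A-Z]{2}G))")

def gxxg_candidates(seq):
    """List (1-based start, motif, is_candidate) for each G-x-x-G in seq.

    is_candidate is True when neither loop residue is Lys or Arg.
    """
    hits = []
    for match in GXXG.finditer(seq.upper()):
        motif = match.group(1)
        has_positive = any(res in POSITIVE for res in motif[1:3])
        hits.append((match.start() + 1, motif, not has_positive))
    return hits

def to_gkkg(seq, start):
    """Return seq with the two residues enclosed by the G-x-x-G at `start` set to KK."""
    i = start - 1
    if seq[i] != "G" or seq[i + 3] != "G":
        raise ValueError("no G-x-x-G motif at position %d" % start)
    return seq[: i + 1] + "KK" + seq[i + 3 :]

# Hypothetical fragment containing a GTHG loop.
toy = "MELVIGTHGANIQ"
hits = gxxg_candidates(toy)
mutant = to_gkkg(toy, hits[0][0])
```

A scan of this kind only shortlists domains; as discussed in the text, the structure, stability, and binding mode of each mutant still have to be checked experimentally.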
The potential of the approach is demonstrated by the insight we have gained from the analysis of the fragile X protein, a system important but challenging to study. Our results answer an important question about that system, namely how the protein FMRP recognizes ssRNA sequences, and provide a working model and a tool (a mutant) to explore this concept further in functional studies.
[Figure 4 caption, partial: The two green arrows indicate the two residues enclosed by the GxxG loop, which we propose could be mutated to lysine to enhance affinity. (B) Left: an example set of RRM domains. We choose domains that are expected to bind single-stranded RNA in a canonical binding mode, as they contain three hallmark conserved hydrophobic residues (black arrows and right part of this panel). The green arrow indicates the position where a lysine or an arginine is found in high RNA-binding affinity RRM domains. We propose that, in the domains whose sequences are displayed, this residue could be mutated to Lys or Arg in order to obtain an affinity enhancement. Right: canonical RRM-RNA binding mode; the relevant contacts are represented.]
Cell Reports Methods 3, 100508, June 26, 2023. Report. Open access.

Limitations of the study
In order to design AE mutation(s), an understanding of the RNA-binding mode of each domain is required, and it is possible that a number of mutants need to be tested, depending on the domain to be examined. For KH domains, we expect that it will be generally possible to use the GxxG to GKKG mutation in the investigation as (1) KH domains have a common RNA-binding mode and (2) the GxxG loop is surface exposed and can be mutated without affecting the domain's fold or stability. For other domains, where one or more structures in complex with RNA are available, we expect it will be possible to identify amino acids that are positioned in proximity of nucleic acid backbone phosphates that could be mutated to positively charged residues in homologs to optimize affinity without affecting nucleobase recognition. We use the well-studied RRM domain to exemplify this design. However, if no structural information is available, for example for newly identified RNA-binding domains, then designing AE mutants is likely to be quite difficult. It is also important to point out that while the mutants are valuable tools to understand target recognition and protein function, mutant design and testing is time demanding. Therefore, rather than high throughput, the concept we propose is to focus the experimental workflow on a specific protein of interest, which can be then studied in vivo.
From a more technical stance, the SIA experiments require 15N labeling of the protein domain. While straightforward with proteins expressed recombinantly in Escherichia coli, this is more complex in other expression systems. Additionally, there is a requirement for the protein to be stable at micromolar concentration for a period of days. This is typically not a problem for proteins that have been previously studied with biochemical or biophysical methods, and a range of buffer and experimental parameters can be used to help with this. Finally, as NMR is a size-sensitive method, experiments will typically provide better data for smaller domains or proteins. While in a majority of cases RNA-binding domains are of a size amenable to NMR studies (8-15 kDa), in isolation or as di- or even tri-domains, size should be considered.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Andres Ramos (a.ramos@ucl.ac.uk).

Materials availability
Plasmids generated in this study have been deposited to Addgene, pET-47b, 71461-3.
Data and code availability
• FMRP KH12 nuclear magnetic resonance assignment data have been deposited at the Biological Magnetic Resonance DataBank (BMRB) and are publicly available as of the date of publication. Accession number is listed in the key resources table.
• This paper does not report original code.
• Any additional information required to reanalyse the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
In vitro studies
Commercial Escherichia coli BL21(DE3) cells were used as the source organism to obtain the recombinant proteins studied in this work. The cells were stored at −80 °C and freshly transformed with the plasmids containing FMRP KH12, WT and RNA-AE coding sequences. Cell cultures and protein induction are detailed in the method details section.

METHOD DETAILS
Cloning and mutagenesis
A gene encoding FMRP with the Δ331-396 deletion, 25 codon-optimized for E. coli expression, was purchased from Eurofins. Primers were designed using the Crystallization Construct Designer on-line tool (https://ccd.rhpc.nki.nl) 31 and used to amplify the DNA region encoding KH1 and KH2 domains, incorporating 5′ and 3′ extensions complementary to sections of the vector to produce the inserts. The vector used was pET-47b, which contains an N-terminal hexahistidine tag cleavable by Human Rhinovirus 3C protease and a resistance marker to kanamycin. The vector was digested with KpnI/SacI, and then both inserts and linearized vector were treated with T4 DNA polymerase (NEB) in the presence of dATP and dTTP, respectively, to produce complementary single-stranded overhangs. The plasmid was transformed into BL21-Gold(DE3) (Agilent) using a standard heat shock protocol. In a second round of mutagenesis, the mutations T236D/H237D and T236K/H237K in KH1, and K299D/N300D in KH2, were introduced into the constructs by amplification of the plasmid using overlapping complementary primers with the mutation of interest inserted at the center of the oligonucleotides. Following PCR amplification, parent DNA was removed by DpnI digestion. The primers used for cloning and mutagenesis are reported in the STAR protocol.

Protein expression and purification
The plasmid containing the FMRP KH12, WT and RNA-AE coding sequences was transformed in BL21(DE3) E. coli cells, which were used to inoculate 1000 mL of M9 minimal media containing 15