Exon‐independent recruitment of SRSF1 is mediated by U1 snRNP stem‐loop 3

Abstract
SRSF1 protein and U1 snRNPs are closely connected splicing factors. They both stimulate exon inclusion, SRSF1 by binding to exonic splicing enhancer sequences (ESEs) and U1 snRNPs by binding to the downstream 5′ splice site (SS), and both factors affect 5′ SS selection. The binding of U1 snRNPs initiates spliceosome assembly, but SR proteins such as SRSF1 can in some cases substitute for it. The mechanistic basis of this relationship is poorly understood. We show here by single‐molecule methods that a single molecule of SRSF1 can be recruited by a U1 snRNP. This reaction is independent of exon sequences and separate from the U1‐independent process of binding to an ESE. Structural analysis and cross‐linking data show that SRSF1 contacts U1 snRNA stem‐loop 3, which is required for splicing. We suggest that the recruitment of SRSF1 to a U1 snRNP at a 5′ SS is the basis for exon definition by U1 snRNP and might be one of the principal functions of U1 snRNPs in the core reactions of splicing in mammals.

(1) Compared to the established binding mode in which SRSF1 binds exonic sequences independently of the U1 snRNP, how strongly does SRSF1 bind stem loop 3 of U1? The single molecule stoichiometry experiments indirectly suggest that the binding affinities of these two distinct substrates could be quite similar; however, quantitative measurements of binding affinity are missing from this manuscript. The authors should perform in vitro binding assays to quantify the relative binding affinity of SRSF1ΔRS to the U1 snRNP (or just stem loop 3) compared to a typical exonic binding site for this protein. No matter the outcome, this result would provide insight into the relative prevalence of these binding modes. Furthermore, in vitro binding assays should also be performed with the stem loop mutant used in Figure 4C to show that this mutation really does prevent SRSF1 binding.
(2) In Figure 4C the authors argue that efficient splicing requires SRSF1 in complex with stem loop 3 of the U1 snRNP. However, the authors should address (1) the impact of this mutation on the binding of other proteins, namely FUS and PTBP1, which appear to target stem loop 3 as well, and (2) the impact of this mutation on U1 structure and its implications for spliceosome assembly. As the manuscript stands, Figure 4C does not convincingly measure the contribution of SRSF1 to the overall reduction in splicing observed in condition 5. Therefore, this manuscript would benefit from a more convincing link between SRSF1/U1 heterodimerization and splicing efficiency.
(3) The section titled, "The association of U1 and SRSF1 does not require pre-mRNA" requires further clarification. In this experiment, why is it not possible that U1 and SRSF1 are in complex with unlabeled endogenous mRNA from the nuclear extract?
Minor comments: (1) Figure 1A: While the authors performed a rigorous analysis of the number of photobleaching steps for SRSF1-mEGFP and U1-mCherry, they also need to perform this same analysis for Cy5 mRNA molecules. I believe the authors only assume that each Cy5 spot corresponds to a single mRNA; however, this may not be the case and could significantly change the interpretation of the data. Furthermore, the authors illustrate their photobleaching analysis workflow with two representative traces. It is preferable to include a figure that illustrates the overall quality of all their traces. This can be done with a histogram that shows the distribution of intensities for all analyzed spots (Cy5, mEGFP, and mCherry). One would expect to see multiple peaks corresponding to different binding stoichiometries. Another potential option is to use transition density plots (https://github.com/ebfret/TransitionDensityPlots). (4) A number of the figures (2C, 3B) appear to be from screenshots. Remove the artifacts. Figure 3B (top right) also has some formatting issues.

Responses to reviewers
To begin, we would like to thank the reviewers for the work involved in their analyses of this manuscript and for their helpful and constructive comments. These have, we hope, resulted in very significant improvements to the manuscript. The reviewers' comments are in blue, and our responses are in black.

Reviewer 1
In this manuscript, the authors probe the binding targets of the splicing enhancer SRSF1. Using quantitative single-molecule imaging, the authors uncover a new mode of SRSF1 binding in which the protein targets stem loop 3 on the U1 snRNP. Potentially, this observation is an impactful conceptual advancement in our understanding of spliceosome assembly. A new issue is raised however: if U1 is capable of independent recruitment of SRSF1, how does one achieve exon definition? A few comments need to be addressed before the manuscript can be published.
Major comments: (1) Compared to the established binding mode in which SRSF1 binds exonic sequences independently of the U1 snRNP, how strongly does SRSF1 bind stem loop 3 of U1? The single molecule stoichiometry experiments indirectly suggest that the binding affinities of these two distinct substrates could be quite similar; however, quantitative measurements of binding affinity are missing from this manuscript. The authors should perform in vitro binding assays to quantify the relative binding affinity of SRSF1ΔRS to the U1 snRNP (or just stem loop 3) compared to a typical exonic binding site for this protein. No matter the outcome, this result would provide insight into the relative prevalence of these binding modes. Furthermore, in vitro binding assays should also be performed with the stem loop mutant used in Figure 4C to show that this mutation really does prevent SRSF1 binding.
To assess quantitatively the strength of the binding of SRSF1ΔRS to stem loop 3, we performed isothermal titration calorimetry (ITC). The protein was gradually added to a dilute solution of SL3 at 27°C. We determined a dissociation constant of 10.9 ± 2.8 µM (see Figure 1 below). The binding appears to be roughly 200 times weaker than for an optimal ssRNA motif containing a GGA motif preceded by a CA motif (Clery A et al., Nat. Comm. 2021). In the case of SL3, the CA motif is located in the loop, in a ssRNA region. The solution structure of SRSF1 RRM2 bound to RNA clearly shows that the pseudoRRM is a specific ssRNA binder (Clery A et al., PNAS 2013). However, the GGA motif is involved in base-pairing at the apical part of the stem, so the binding of SRSF1ΔRS to stem loop 3 will compete with the formation of the secondary structure. This explains the reduced affinity of SRSF1ΔRS for stem loop 3 when compared to ssRNA. As suggested by the reviewer, we also performed the same experiment with the SL3 mutant in place of the wild-type SL3. Using ITC, we observed only residual binding, and the strength of the binding could not be determined quantitatively. Removing the CA and GGA motifs of SL3 impaired the binding of SRSF1ΔRS to U1 snRNA stem loop 3 (Figure 1 below). These data have now been added as Appendix Fig. S9. Note that we have previously seen in the case of the splicing factor FUS that micromolar affinities for SL3 have physiological roles. However, we cannot rule out that other proteins or enzymatic activities (such as helicases) would favour the binding of SRSF1 to U1 SL3 in vivo or in nuclear extracts, or that the RS domain will play a role in this protein-RNA interaction.
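As a rough check on what these numbers imply, the following Python sketch converts the reported dissociation constant into a standard binding free energy and derives the Kd implied for the optimal ssRNA motif by the "roughly 200 times" comparison. The function names are illustrative, and the 1 M standard state and the simple 1:1 binding model are assumptions. The ssRNA Kd is computed here from the stated ratio and is not a value taken from the cited paper.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar: float, temp_celsius: float) -> float:
    """Standard binding free energy in kJ/mol: dG = RT ln(Kd / 1 M)."""
    temp_kelvin = temp_celsius + 273.15
    return R * temp_kelvin * math.log(kd_molar)


def fraction_bound(free_protein_molar: float, kd_molar: float) -> float:
    """Fraction of RNA occupied for a single 1:1 site (assumed model)."""
    return free_protein_molar / (free_protein_molar + kd_molar)


kd_sl3 = 10.9e-6                    # reported Kd for SL3, in M
fold_weaker = 200                   # "roughly 200 times weaker" (from the text)
kd_ssrna = kd_sl3 / fold_weaker     # implied Kd for the optimal ssRNA motif

dg_sl3 = binding_free_energy(kd_sl3, 27.0)
dg_ssrna = binding_free_energy(kd_ssrna, 27.0)
ddg = dg_sl3 - dg_ssrna             # penalty for binding SL3, equals RT ln(200)

print(f"dG(SL3)   = {dg_sl3:.1f} kJ/mol")
print(f"dG(ssRNA) = {dg_ssrna:.1f} kJ/mol")
print(f"ddG       = {ddg:.1f} kJ/mol")
```

Under these assumptions the 200-fold difference corresponds to a free energy penalty of RT ln(200), which is in line with the authors' argument that binding must compete with base-pairing in the apical part of the stem.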

1st Revision - Editorial Decision 9th Sep 2021
Dear Prof. Eperon, Thank you for submitting your revised manuscript. Please also excuse the delay in communicating this decision to you, which was due to delayed referee responses over the summer, as well as further discussions regarding the issues raised by one of the referees. Please find the comments of the three original referees below.
As you will see, referee #2 and referee #3 still express a number of concerns. I have consulted with all referees on referee #3's issues regarding the availability of experimental data to support main experiments and their analysis (ref #3-point 1, 2, 3), in particular the co-localization. We recognize that you have provided a large amount of source data, which includes imaging series. However, we would ask you to further address this issue: 1) To make the uploaded datasets easier to navigate for the reader, please add a "read-me" document to the respective folder detailing what the files represent (i.e. n stacks of x images acquired by y and analyzed by z). 2) Please also carefully review the Materials and Methods section on data acquisition and data analysis, and ensure that all necessary information is provided. In particular, it is important to make the following points clear, also to non-experts: a) What was the measure to distinguish between spots that were co-localized or not? b) Which controls were used to ensure mapping between images collected at different wavelengths is correct? c) How was co-localization defined? (less than or equal to 2 pixels?) 3) Please also consider revising the figure (also with respect to referee #2's comments) and adding example images for co-localization or no co-localization.
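To make point 2c concrete, the sketch below shows one way a distance-based co-localization criterion could be stated unambiguously. It is an illustration only: the 2-pixel threshold is taken from the question above and is not confirmed as the authors' criterion, the nearest-neighbour matching rule is an assumption, and the function name and spot coordinates are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) spot centre in pixels


def colocalize(reference: List[Point],
               candidates: List[Point],
               max_dist_px: float = 2.0) -> List[Optional[int]]:
    """For each reference spot, return the index of the nearest candidate
    spot lying within max_dist_px, or None if there is no such spot."""
    matches: List[Optional[int]] = []
    for rx, ry in reference:
        best_idx: Optional[int] = None
        best_dist = max_dist_px
        for i, (cx, cy) in enumerate(candidates):
            dist = math.hypot(cx - rx, cy - ry)
            if dist <= best_dist:
                best_idx, best_dist = i, dist
        matches.append(best_idx)
    return matches


# Hypothetical spot centres, already mapped into a common coordinate frame.
cy5_spots = [(10.0, 10.0), (50.0, 50.0), (80.0, 20.0)]
gfp_spots = [(11.0, 11.0), (53.0, 50.0), (200.0, 200.0)]

matches = colocalize(cy5_spots, gfp_spots)
fraction = sum(m is not None for m in matches) / len(matches)
print(matches, fraction)
```

A criterion of this kind only makes sense after the channel-mapping control raised in point 2b, since any registration error between wavelengths adds directly to the measured distances.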
In addition to these issues, please also address referee #2's points regarding the clarity of the respective figures and revise the figure legends or if needed the figure. These changes will also overall improve the accessibility of your findings to readers that are not directly working in this field. When submitting the revised version, please also include a point-by-point response to all of the referees' comments.
As mentioned in the previous decision letter, it is normally EMBO Journal's policy to allow only one round of major revision, such that it is now crucial that you address the remaining referee concerns fully in the next revised version. If you have any questions regarding this revision or would like to discuss any points in more detail, please contact me.
Kind regards,

Stefanie Boehm Editor The EMBO Journal
Referee #1:
Overall, the authors made a serious attempt to answer my questions.
1) I appreciate how they measured SRSF1's affinity for stem loop 3 and the stem loop 3 mutant even though the result is rather disappointing and, in my opinion, puts the overall significance of this finding into question. It also seems like they're comparing their measured KD to a previously published value for SRSF1 binding to its exonic site (Clery A et al., Nat. Comm. 2021) instead of doing the measurement themselves with the same buffer conditions and same assay. Nevertheless, I am satisfied that they performed these binding assays.
2) I also appreciate how they performed additional NMR experiments to probe whether the SL3 mutant impacts FUS binding (none observed). While it would've been nice to test if this mutant impacts PTBP1 binding, I understand that this could be difficult and perhaps beyond the scope of what is reasonable.
3) To address our concerns about endogenous RNAs present in their photobleaching assay, they directed us to the supplement where they added a nuclease and still measured colocalization between SRSF1 and the U1 snRNP. They also added a sentence to the main text. As they point out, this isn't the right assay to do here as there is the potential to degrade the U1 snRNA. Because of this issue, my opinion is that the following claim on line 228 is not well supported with the data presented at this point in the manuscript: "Our results show that a significant proportion of each protein is present in a heterodimer in functional splicing conditions." I suggest moving this section to later in the manuscript, after the NMR and splicing assays.
4) I'm fine with all of their comments to our minor issues.
Referee #2: In this revision, the authors have thoroughly and thoughtfully responded to the issues raised by the reviewers. The scientific conclusions are sound and caveats/issues for future consideration are well described. Moreover, the experiments have now been better placed in a broader context for the field. Overall, this is an excellent paper with significant impact for the field and I fully support publication in EMBO J.
That being said, an important remaining issue is the clarity of the figures and their presentation. This really detracts from the science and conclusions. For example, Figure 1A is now a hodgepodge of video images, cartoons, raw and fitted fluorescence intensity traces, and histograms. Even after reading the legend it is unclear what is what and what the logical flow of the experiment is. I assume the green circle in the cartoon in 1A is mEGFP SRSF1, but why is that not labeled? How will color blind readers be able to interpret this or other figures? I think the authors should make Figure 1A its own figure where the raw data, integrated fluorescence traces and fits, and histograms can all be logically and orderly presented and with their own figure labels (1A, 1B, 1C, etc.). Similarly, many figures have multiple parts that are not uniquely labeled. Figure 1B has 4 sections: 2 histograms and 2 cartoons. In Figure 1C, the label on U1 is white instead of black, presumably because U1 was unlabeled in this experiment, but where is this explained? Why does only the first histogram in 1C get a cartoon and not the others? Do the cartoons always represent the interpretation of the experimental results or are they meant to represent the assay? In 1B, why are the green circles not associated with the RNAs? Are they not bound? Why is U1 no longer labeled in Figure 2? What does grey vs. orange U1 represent? Why are some cartoons to the left of histograms, some to the right, and others in the middle to be shared by two histograms? This is really a major issue with Figures 1-3. While I don't think any changes are absolutely required for acceptance given the strength of the data and the significance of the science, the confusion these figures will generate will greatly reduce the readability and impact of the manuscript.

Referee #3:
In the revised manuscript, the authors responded to the points raised, and provided some additional experimental support, especially for the in vitro results. In many cases however, the authors merely restate what was written in the original version of the manuscript, with some further explanations, but without providing additional experimental support, or showing raw data, i.e. concerning the colocalization results. In my view some of these points should still be improved.
-The authors merely provide bar plots with the colocalization frequency but no raw data are shown to document the experimental evidence. The authors argue with p-values to support statistical significance, but these state just this, i.e. statistical relevance, and do not necessarily confirm the validity of the conclusions.
-Concerning the lysate used to observe colocalization with U2AF proteins, the authors state that they used exactly the same lysate as in the published work from 2018, but experiments were done with the three substrates, thus the lysate has to be a different one in each case, due to the expression of labelled U2AF proteins. The response is not very convincing and there remains some question whether the conclusions made based on the colocalization analyses are justified, and well supported by the experimental data.
-One strong point of the paper is the binding of SRSF1 to U1 snRNP in vitro to support the colocalization data. But now it turns out that this interaction is 200 times weaker than for the ESE sequences and no other evidence of helicases or SR region involvement is shown (this is just speculated about), which could rationalize this. All the data related to the binding are a bit unclear, with different temperatures used, saturation at 1:1 from the protein perspective that is not mirrored in saturation at 1:1 when looking at the SL3 RNA data.
-To provide some support for the proposed melting of SL3 to enable recognition of the GGA motif the authors analyze NMR spectra arguing that the line-broadening observed is consistent with a conformational change, i.e. melting, of the upper part of the stem-loop. This is a reasonable explanation (but line broadening could also be a result of the binding and the increased molecular weight of the complex). It still leaves the result that the overall affinity to SL3 is much reduced compared to the ssRNA and in fact rather low. Will this be relevant in a cellular context to explain the biological effects?
-The authors were asked to document the effect of adding a consensus 5' splice site at the end of GloC and BGSMN2. Data are shown now in Appendix Fig. S6 for BGSMN2, while data for GloC are not shown. These data show only a marginal improvement in splicing (one of two cases) and the text has been altered to acknowledge this. This is fine but does not really strengthen the overall conclusions of the manuscript.
-Nomenclature should be changed to U2AF2/1, with the old nomenclature pointed to only when U2AF is first introduced.

2nd Revision - Editorial Decision 29th Sep 2021
Re: EMBOJ-2021-107640R1 Exon-independent recruitment of SRSF1 is mediated by U1 snRNP stem-loop 3
Dear Prof. Eperon, Thank you for submitting your revised manuscript and addressing the remaining issues, as well as including the additional explanations with the available source data. I am pleased to say that we will now proceed with publication and would therefore ask you to address a small number of editorial and formatting issues that are listed in detail below. Once these remaining issues are resolved, we will be happy to formally accept the manuscript for publication in The EMBO Journal.
Thank you again for giving us the chance to consider your manuscript for The EMBO Journal. Please feel free to contact me if you have further questions regarding the revision or any of the specific points listed below.
Kind regards,

Stefanie
The authors have made all requested editorial changes.

B-Statistics and general methods
The number of SM spots analysed was determined by the requirement to distinguish between binding of one or two molecules of SRSF1 or U1A. Splicing assays were performed using at least three biological replicates and NMR/ITC experiments were repeated at least one time.
The selection of Cy5 spots for analysis is based on the fit of their intensities to Gaussian parameters and the absence of overlapping spots, as described in the Statistics section of the Methods.
SM frequencies were evaluated as discrete data using a chi-square test.
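For readers unfamiliar with this kind of test, the following Python sketch shows how a Pearson chi-square test could be applied to discrete spot counts from two conditions. It is a minimal illustration, not the authors' analysis: the 2x2 layout, the absence of a continuity correction, the function name and the counts are all assumptions made here.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for a 2x2 table of counts, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 dof.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are two conditions, columns are
# spots with one bleaching step vs. spots with two bleaching steps.
counts = [[60, 40],
          [30, 70]]

stat, p_value = chi_square_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

Treating each spot as one count keeps the data discrete, which is why a test on the contingency table is used rather than a test on mean intensities.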
