Kinetic competition during the transcription cycle results in stochastic RNA processing

Synthesis of mRNA in eukaryotes involves the coordinated action of many enzymatic processes, including initiation, elongation, splicing, and cleavage. Kinetic competition between these processes has been proposed to determine RNA fate, yet such coupling has never been observed in vivo on single transcripts. In this study, we use dual-color single-molecule RNA imaging in living human cells to construct a complete kinetic profile of transcription and splicing of the β-globin gene. We find that kinetic competition results in multiple competing pathways for pre-mRNA splicing. Splicing of the terminal intron occurs stochastically both before and after transcript release, indicating there is not a strict quality control checkpoint. The majority of pre-mRNAs are spliced after release, while diffusing away from the site of transcription. A single missense point mutation (S34F) in the essential splicing factor U2AF1 which occurs in human cancers perturbs this kinetic balance and defers splicing to occur entirely post-release. DOI: http://dx.doi.org/10.7554/eLife.03939.001


Single-transcript kinetics dominates at short delays
For each time trace obtained from TS tracking, autocorrelation and crosscorrelation functions are defined as with δa(t) = a(t) − a and δb(t) = b(t) − b (2) where · denotes time average (i.e. f (t) = lim T →∞

2T
T −T f (t)dt) and where a(t) and b(t) can be any combination of the red and green time traces r(t) and g(t) to yield all 4 correlation functions. Let a 0 (t) and b 0 (t) be the fluorescence signals from a single transcript. We first consider all the transcripts to follow the same kinetics. Let G 0 ab (τ ) be the correlation function between a 0 (t) and b 0 (t), i.e. for a single transcription event. In this context, where the signals to be correlated are non-null only over a finite time-domain (i.e. square-integrable), the correlation function is defined as i.e. temporal averages · in eq. (1) are replaced here by temporal sums R · dt (see Appendix 1). If initiation is considered a homogenous Poisson process (i.e. at each instant, the gene has a constant probability to fire, given by the initiation rate κ) then [1,2] (Appendix 1) This simply reflects the fact that the fluorescence emitted by one transcript in a(t) and b(t) is statistically uncorrelated with that of all the other transcripts. Intuitively, since transcripts can initiate any time (with uniform statistics) and independently to each other, their pairwise correlation of occurrence do not show any time dependence, resulting in a flat null baseline ( Figure 3-figure supplement 1 and Video 4).
In the case where initiation events do not arrive with a constant rate (e.g. reflecting a bursting process or cell cycle effects), this adds an extra term to eq. (4) that is the autocorrelation of the time-dependent initiation rate κ(t), convolved with G 0 ab (τ ). Unless the processes causing these fluctuations are very rapid and occur at a similar timescale as the transcription/splicing process (i.e. faster than ∼300 sec), the extra term in eq. (4) only consists in a slow decay that can be approximated as a constant baseline offset in our region of interest (i.e. typically 0 ≤ τ <∼300 sec). This is a reasonable assumption in our case since (i) the slow decaying component in the correlation functions is often well separated from the fast transcription/splicing kinetics (Figure 2-figure supplement 1B-F), (ii) bursting and cell cycle fluctuations (which can be observed in long acquisitions with a slow frame rate) occur at a significantly slower timescale than the duration of our time traces, and (iii) only the parts of the time traces where a non-null signal is visible and trackable are retained for analysis, effectively making the measurement to describe mostly the intra-burst kinetics rather than the bursting kinetics itself. Hence, we used eq. (4) in which the correlation functions of the whole fluorescence time traces simply boils down to the correlation functions for individual transcripts, scaled by the initiation rate κ.

Geometry of the transcription/splicing correlation functions
In this section, we describe how the shape of the correlation functions is mathematically linked to the timing of the transcription and splicing process. We do not assume mechanisms such as the interdependence between transcription and splicing or the kinetics of elongation, pausing and termination/3' end processing. Such assumptions about the underlying mechanisms are made in 'Mechanistic models for RNA synthesis and processing' in the 'Materials and methods' section of the main article. Here, we simply describe how the geometry of the fluorescence time profiles translate into the geometry of the correlation functions.

Simplified construct
The following aims at giving an intuition about the information content of the correlation functions and how their shape relates to the underlying timing of transcription and splicing. We make here two approximations that we will drop in the subsequent sections. Namely, we assume that (i) the kinetics of all transcripts is identical, and (ii) the MS2 and PP7 cassettes are small enough to consider that the fluorescence profiles can be described using step functions (Figure 2-figure supplement 2A).
The time course of the fluorescence emitted by a single transcript displays 4 specific events: the rise and fall of either the red or the green signal. We show here that the correlation functions reflect the time delay between any two of these 4 time points.
We consider the construct depicted on Figure 2-figure supplement 2A. We note t r , t g and t rg respectively the time delays between rise and fall in the red signal, rise and fall in the green signal, and rises in red and green. Because the red signal cannot fall after the green, we have t r ≤ t rg + t g .
The fluorescence time profiles in both colors for a single transcript can be written where t = 0 is considered to be the rise in the red signal, H(·) is the Heaviside function (with the convention H(0) = 1 2 ) and the bar denotes time reversion, i.e. the H(t) = H(−t). For convenience in the following, everything is written in terms of H(·) instead of H(·). Time reversed versions of these functions are The general formulation of a correlation function using the convolution product * is Substituting eqs. (5) to (8) into eqs. (9) and (4) and dropping all the terms that are null for We used the notation H 2 (t) = H(t) * H(t), i.e. a time-reversed ramp function and the property that . We should note that, even though these equations are defined for t ≥ 0, eq. (12) is actually valid on R since no term was found null in its calculation and hence omitted. Indeed, the first term of eq. (12) corresponds to the time-reversed version of eq. (13) and all three other terms are null on t ≤ 0 by construction. Hence, eq. (12) is the two-sided expression of G rg (t).
These correlation functions ( Figure 2-figure supplement 2A) are piece-wise linear with angles revealing specific time delays in the underlying process (i.e. each H 2 (·) term is an angle). Namely, these are all the delays from the 6 possible pairs obtained by combining the rising and falling edges of the red and green signals. The 2 pairs between edges of the same color are in the autocorrelations and the 4 pairs mixing both colors are in the crosscorrelation.
A few geometrical conclusions are worth pointing out: • In this context where transcripts are identical ( Figure 2-figure supplement 2C), we observe: • a null G rg (0) if splicing is co-transcriptional and occurs before the MS2 cassette is transcribed (t r < t rg ). • a non-null G rg (0) with a positive slope at 0 + if splicing occurs between transcription of the MS2 cassette and release of the transcript (t rg < t r < t rg + t g ). In that case, it equals the slope at 0 − (obtained from G gr (t)). • a non-null G rg (0) with a null slope at 0 + if splicing is post-release (t r = t rg + t g ).
None of these scenarios are observed in our experimental data (Fig. 2B, inset) since G rg (0) is always non-null and the slope of G rg (0 + ) is strictly positive but not aligned with that of G rg (0 − ).
• The delay where the red-to-green crosscorrelation G rg (t) starts decreasing measures the time to elongate between both cassettes, independently of any upstream (e.g. initiation, proximal pausing) and downstream effects (e.g. termination). From Fig. 2B (inset) and given the distance between the centers of both cassettes, we estimate graphically that the elongation rate is around ∼2.5kb/min.
• The duration of the green signal (from rise to fall) appears in the decay of both autocorrelation G gg (t) and the last segment of crosscorrelation G rg (t). Conversely, the duration of the red signal appears in both autocorrelation G rr (t) and the rising part of the two-sided crosscorrelation made of G gr (t) and G rg (t). In addition, the fall of the green signal (release of the RNA) imposes the red signal to fall with it (the RNA is unspliced) if it has not already. As a consequence, if some process delays the fall in green (e.g. a slow termination kinetics) without making it substantially longer than the splicing time, then the tails of all four correlation curves are expected to get longer and show similar decay. This is what is observed in our experimental data (Fig. 2B).

Ramped construct
Here, we drop the simplification that the cassettes are short enough so that the rises of fluorescence can be approximated by step functions. Instead, we consider ramps that span the entire width of the cassettes, describing the progressive increases of fluorescence as the polymerase progresses through these regions ( Figure 2-figure supplement 2B). We show that the sharp angles in the correlation functions described previously section turn into smooth angles since they originate from the convolution between fluorescence transitions that are no longer instantaneous. We note t r1 and t g1 the time to elongate through the red and the green cassettes, t r2 and t g2 the time between the end of the ramps and the fall of the signal, and t rg the time between the beginning of both ramps. By construction, we have t r1 < t rg , and because the red signal cannot fall after the green, we have t r1 + t r2 ≤ t rg + t g1 + t g2 .
Although eqs. (18) to (21) are defined for t ≥ 0, eq. (20) is actually valid on R since no term was found null in its calculation and hence omitted. Indeed, the first line of eq. (20) is to the time-reversed version of eq. (21), making eq. (20) the two-sided expression of G rg (t).
It is instructive to compare these new correlation functions with those obtained in the previous section with simplified fluorescence profiles (Figure 2-figure supplement 2A and B). We realize that the sharp angles previously identifying the well-defined delays between rapid fluorescence transitions (terms in H 2 (·) in eqs. (10) to (13) originating from the convolution of pairs of step functions) are now replaced 1 , for most of them, by smooth angles reflecting yields a curve that is rounded on a ≤ t ≤ b but straight the less-defined delays between a ramp and a step (terms in H 3 (·) in eqs. (18) to (21)) or between two ramps (terms in H 4 (·)). Only the angle reflecting the well-defined delay between the rapid falls of both fluorescence signals remains sharp, i.e. the one revealing the occurrence of splicing relatively to transcript release. Like with the simplified construct, the slopes of G rg (t) around t = 0 also indicate the post-or co-release nature of the splicing: First, only the two first lines of eq. (20) can contribute to the slopes of G rg (t) around 0. Indeed, by construction, the red ramp always finishes before the green ramp starts (t r1 < t rg ) and before the green signal falls. Thus, the two rightmost angles in G rg (t) (two last lines of eq. (20)) have no influence on the slopes at 0, i.e. they both equal a term in H 2 (·) with opposite sign, yielding a null contribution to the slope.
Hence, for t in the neighborhood of 0, we have ( 2 3 ) i.e., the slopes of G rg (t) at 0 − and 0 + are proportional to the value of the green signal respectively right before and right after the fall of the red signal. Hence, if splicing is post-release (i.e. both red and green signals fall together), there is a break of slope at 0 since rg κ d dt G rg (t) equals 1 at 0 − and is null at 0 + . Otherwise, splicing occurs before release and both slopes are equal and positive.
Another way of reaching this conclusion is by noting that the sharp angle due to the term in H 2 (·) in eq. (20) is at t = 0 if splicing is post-release and at t > 0 otherwise, making the two slopes to differ only in the former case.

Stochastic transcript kinetics
The explicit form of eq. (4) we used so far is This equation assumes that all transcripts have identical kinetics, described by signals a 0 (t) and b 0 (t). When each transcript is different, this equation generalizes to (see Appendix 1) and equal to H 2 (t − a − b/2) anywhere else.
Similarly, a term of the form where a i (t) and b i (t) are the contributions of transcript i to the total fluorescence signals a(t) and b(t), and · i denotes the average over all transcripts. In other words, up to some normalization constants, the correlation function resulting from a heterogeneous population of transcripts is simply the average of the correlation functions of each transcript.
To get rid of the normalization constants, one can formulate this equation in terms of covariance (i.e. unnormalized central moments M ab (τ ) = δa(t) δb(t + τ )) as follows Fraction of pre-release splicing: Change of slope at the origin Due to the stochastic kinetics of transcription and splicing, only a fraction of transcripts may be spliced before release while the rest is spliced after release. In that case, the slopes of G rg (t) around 0 will reflect this heterogeneity ( indicating that the slopes at 0 − and 0 + are proportional to the average values of the green signal respectively right before and right after the fall of the red signal (noted t red↓ i ). Since, these two values differ by 1 for all transcripts spliced post-release, and by 0 for all transcripts spliced pre-release, then the difference measures exactly the fraction of transcripts that are spliced post-release. However, the prefactor in this expression cannot be obtained exactly from the correlation curves. The slope at 0 − provides an approximation This approximation is exact only if splicing always happens after the ramp in the green signal (i.e. after elongating through the MS2 cassette), and is an underestimation otherwise. Combining eqs. (28) and (29), the fraction of pre-release splicing can be estimated by constituting an approximation that can only underestimate the actual value.
Hence, the orientation of G rg (0 + ) between horizontal and aligned with G rg (0 − ) provides a measure of the pre-release fraction (Figure 2-figure supplement 2D).
Using our experimental correlation curves (Fig. 2B and Figure 2-figure supplement 3A), we computed the two slopes by fitting the time derivatives of the crosscorrelation functions to a constant value on either side of τ = 0, using the points highlighted in fig Figure 2-figure supplement 3B. SEM of the derivatives were obtained from bootstrapping and propagated to yield SEM on the slope values. We concluded that splicing occurs pre-release for at least 13%(±5%) of the transcripts. In addition, we performed two two-sided z-tests to compute a p-value of less than 0.003 indicating that splicing is neither all pre-release nor all post-release. These tested respectively if the slope of G rg (τ ) is (i) different from 0 and (ii) different from the slope of G gr (τ ). Both p-values p 1 and p 2 were then combined into a single p-value 1 − (1 − p 1 )(1 − p 2 ) < 0.003 indicating that both outcome are occurring. Figure 2-figure supplement 3C illustrate with simulated data how accurately this method is able to estimate the actual pre-release fraction. Stochastic simulations were performed with two of the models presented in the next section, using parameters obtained from fitting the experimental data (apart from the splicing time which is varied). Each point on the main panel of Figure 2-figure supplement 3C corresponds to the same amount of data as for the experimental correlation functions (i.e. 23 traces of 245 frames on average). These figures indicate that the ratio of crosscorrelation slopes is an accurate estimate of the actual pre-release fraction, even though it has relatively large errors and often slightly underestimate the actual value. The inset panel uses more simulated data, yielding more precise estimates.

Appendix 1
The transcriptional time traces a(t) and b(t) can be viewed as the sums where a i (t) and b i (t) are the contribution of transcript i to the signals. From the definition of the correlation functions (1), it follows that Each of the above terms corresponding to a temporal average of a sum over all the transcripts (e.g. i a i (t) ) can be turned 3 into an average over all the transcripts of a temporal sum (e.g. κ R a i (t)dt i ), yielding If transcription initiation has homogeneous Poisson statistics, the occurrence of transcripts i and j is uncorrelated. In that context, a i (t) and j =i b j (t + τ ) are independent stochastic signals, so that the latter can be replaced in the equation above by its temporal mean which simplifies into κ R b j (t)dt j in the limit of many transcripts. This yields If all transcripts are identical, eq. (37) simplifies into is the average over all the transcripts. = κ R a i (t)dt i Note that eq. (37) expresses the correlation between square-integrable signals a i (t) and b i (t) (i.e. although defined on R, their non-null part is of finite duration), whence the use of temporal sums. On the contrary, eq. (32) uses non-square-integrable signals a(t) and b(t) (i.e. infinite signals), whence the use of temporal averages and mean-subtracted signals.

Appendix 2
Appendix 2A -Scheme I We show that, using the distributions of scheme I (Figure 2-figure supplement 4), the correlation functions are µ X and h X represent the arithmetic and harmonic means of a distribution X.
To simplify the calculations, we first derive the expressions of the covariance functions M ab (τ ) = δa(t) δb(t + τ ) before normalizing them to obtain the correlation functions G ab (τ ) = M ab (τ ) ab . We note M det ab (τ ) the covariance functions derived in the deterministic context of eqs. (18) to (21). In the stochastic context of scheme I, eq. (26) rewrites Because the expression between the brackets is independent of t A , the term R A(t A )dt A (which equals 1) can be factorized out. The same is true for t B , yielding The same also holds for t E with respect to the term in H 4 . Moreover, for any function f (t), is the definition of the convolution product. Thus, Then, similarly, are also convolution products. And H 3 (t) t D dt D = H3(t) h D where h D denotes the harmonic mean of D(t). Hence, we obtain where δ(t) is the Dirac function (e.g. H 3 (t) * δ(t) = H 3 (t)), used here simply to factorize the expression. Finally, we simplify the notation by omitting all the dependencies on t, yielding The red autocovariance M rr (t) is obtained by applying the same derivation, simply replacing distribution D by A, and distribution E by B * D * E, yielding to eq. (40). The red to green crosscovariance function M rg (t) is derived as follows: Of the terms between brackets, only second and third lines are non-null over t ≥ 0. Applying similar steps as from eqs. (46) to (49), and then factorizing, we obtain eq. (42). Finally, the green to red crosscovariance function M gr (t) is derived as follows: Applying again similar steps as from eqs. (46) to (49), we obtain eq. (43).
Similarly as for scheme I, eq. (26) rewrites in the context of scheme II as follows: The derivation of the green autocorrelation is identical to the one we performed for scheme I, yielding the exact same expression for eq. (53).
The same derivation also applies for the red autocorrelation so that, as for scheme I, it can be obtained by substituting distributions in eq. (49). However, here, distribution D is replaced by A, and distribution E by the distribution of the minimum between two random numbers drawn from distributions B * F and B * C * D * E (reflecting that the red signal falls when either splicing occurs or when the transcript is released). The complementary cumulative distribution function (CCDF) of the minimum between two distributions is simply the product of their CCDF 4 . Then, to obtain M rr (t) from eq. (49), H * E is replaced by B * (H * F )(H * C * D * E), with the convention that the regular product precedes over the convolution product. We obtain The red to green crosscovariance function is derived as follows: Noting that, over t ≥ 0, a term of the form H n (t + min(a, b)) simplifies to H n (t + b) if a ≥ 0, and applying similar steps as from eq. (46) to (49), then factorizing, we find Finally, the green to red crosscovariance function is derived as follows: 4 The CCDF of any distribution f simply reads −∞ t f (t )dt = R H(t − t )f (t )dt = H * f .