Accurate inference of stochastic gene expression from nascent transcript heterogeneity

Transcriptional rates are often estimated by fitting the distribution of mature mRNA numbers measured using smFISH (single molecule fluorescence in situ hybridization) with the distribution predicted by the telegraph model of gene expression, which defines two promoter states of activity and inactivity. However, fluctuations in mature mRNA numbers are strongly affected by processes downstream of transcription. In addition, the telegraph model assumes one gene copy, but in experiments cells may have two gene copies as cells replicate their genome during the cell cycle. It is thus unclear how accurately the inferred parameters reflect transcription. To address these issues, here we measure both mature and nascent mRNA distributions of GAL10 in yeast cells using smFISH and classify each cell according to its cell cycle stage. We infer transcriptional parameters from mature and nascent mRNA distributions, with and without accounting for cell cycle stage, and compare the results to live-cell transcription measurements of the same gene. We conclude that: (i) not accounting for cell cycle dynamics in nascent mRNA data overestimates the magnitude of promoter switching rates and the initiation rate, and underestimates the fraction of time spent in the active state and the burst size; (ii) use of mature mRNA data, instead of nascent data, significantly increases the errors in parameter estimation and can mistakenly classify a gene as non-bursting. Furthermore, we show how to correctly adjust for measurement noise in smFISH at low nascent transcript numbers. Simulations with parameters estimated from nascent smFISH data corrected for cell cycle phases and measurement noise lead to autocorrelation functions that agree with those obtained from live-cell imaging. Therefore, our novel data curation method yields a quantitatively accurate picture of gene expression.


Introduction
Transcription in single cells occurs in stochastic bursts [1,2]. Although the first observation of bursting occurred more than 40 years ago [3], the precise mechanisms behind this phenomenon are still under active investigation [4,5]. The direct measurement of the dynamic properties of bursting employs live-cell imaging approaches, which allow visualization of bursts as they occur in living cells [6]. However, in practice such live-cell measurements are challenging because they are low-throughput and require genome editing [7,8]. To circumvent this, one can exploit the fact that bursting creates heterogeneity in a population. In this case, it is relatively straightforward to obtain steady-state distributions of the number of mRNAs per cell from smFISH or single-cell sequencing experiments. These distributions have been used to infer dynamics by comparison to theoretical models. The simplest mathematical model describing bursting is the telegraph (or two-state) model [9,10]. In this model, promoters switch between an active and an inactive state, and initiation occurs only during the active state. The model makes the further simplifying assumptions that the gene copy number is one and that all reactions are effectively first-order. The mRNA in this model can be interpreted as cellular (mature) mRNA, since its removal via various decay pathways in the cytoplasm is known to follow single-exponential (first-order) decay kinetics in eukaryotic cells [11,12]. The steady-state solution of the telegraph model for the distribution of mRNA numbers has been fitted to experimental mature mRNA number distributions to estimate the transcriptional parameters [1,2,10,13].
However, the reliability of the estimates of transcriptional parameters from mRNA distributions is questionable because the noise in mature mRNA (and consequently the shape of the mRNA distribution) is affected by a wide variety of factors. Recent extensions of the telegraph model have carefully investigated how mRNA fluctuations are influenced by the number of promoter states [14,15], polymerase dynamics [16], cell-to-cell variability in the rate parameter values [17,18], replication and binomial partitioning due to cell division [19], nuclear export [20] and cell cycle duration variability [21]. One way to avoid noise from various post-transcriptional sources is to measure distributions of nascent mRNA rather than mature mRNA, and then fit these to the distributions predicted by an appropriate mathematical model. Nascent mRNA [22,23] is mRNA that is being actively transcribed, i.e. it is still tethered to an RNA polymerase II (Pol II) moving along a gene during transcriptional elongation. Fluctuations in nascent mRNA numbers thus directly reflect the process of transcription. Because nascent mRNA removal is not first-order, an extension of the telegraph model has been developed (the delay telegraph model) [24]. Fitting the distribution predicted by the delay telegraph model to the experimental distribution is therefore expected to improve the estimates of transcriptional parameters [25][26][27][28][29].
However, nascent mRNA data still suffer from extrinsic sources of noise due to cell-to-cell variability. For example, in an asynchronous population of dividing cells, cells can have either one or two gene copies. In the absence of a molecular mechanism that compensates for the increase in gene copy number upon replication, cells with two gene copies that cannot be spatially resolved will have a different distribution of nascent mRNA numbers (one with a higher mean) than cells with one gene copy. The importance of the cell cycle is illustrated by the finding [30] that noisy transcription from the synthetic TetO promoter in S. cerevisiae is dominated by its dependence on the cell cycle. Although it is possible to simultaneously measure mature and nascent mRNA as a function of cell cycle position [26], cell cycle stage is generally not taken into account when fitting smFISH distributions. Since estimation of all transcriptional parameters (switching rates and initiation rate) from nascent data as a function of the cell cycle phase has not been reported, it is unclear how the cell cycle affects transcriptional parameter inference. Additionally, no studies have compared transcriptional parameters estimated from nascent mRNA data with those estimated from cellular (mature) mRNA data, and hence the reliability of the latter (the standard and most common procedure) remains an open question.
In this paper, we seek to understand the impact of post-transcriptional noise and cell-to-cell variability on the accuracy of transcriptional parameters inferred from mature mRNA data. The fitting algorithms (for mature and nascent mRNA data) are first tested on simulated data, which uncovers limitations of the algorithms in accurately estimating a subset of the transcriptional parameters in certain regions of parameter space. The algorithms are then applied to four independent experimental data sets, each measuring GAL10 mature and nascent mRNA by smFISH in thousands of galactose-induced budding yeast cells, conditional on the stage of the cell cycle (G1 or G2). Comparison of the transcriptional parameter estimates allows us to separate the influence of ignoring cell cycle variability from that of post-transcriptional noise (mature vs nascent mRNA data). We find that only fitting of nascent cell-cycle data, corrected for measurement noise, provides

Steps of the algorithm to estimate parameters from mature mRNA data
The procedure consists of the following steps: (i) select a set of random transcriptional parameters; (ii) use the solution of the telegraph model to calculate the probability of observing the measured number of mature mRNA from each cell; (iii) evaluate the likelihood function for the observed data; (iv) iterate the procedure until the negative log-likelihood is minimized; (v) the set of parameters that accomplishes the latter provides the best point-estimate of the parameters of the model used to generate the synthetic data.
For step (i), we restrict the search for optimal parameters to the following region of parameter space:
$$(\sigma_{\mathrm{off}}, \sigma_{\mathrm{on}}, \rho) \in [\mathrm{Uniform}(0, 150), \mathrm{Uniform}(0, 50), \mathrm{Uniform}(0, 300)]\ (\mathrm{min}^{-1}) =: \Theta. \qquad (2.2)$$
Step (ii) can be carried out either by computing the distribution from the analytical solution [9] or by using the finite state projection (FSP) method [33]. Here, for the sake of computational efficiency, we use the FSP method to compute the probability distribution of mature mRNA numbers.
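The FSP step can be illustrated with the following minimal Python sketch. It is our own illustration, not the authors' Julia implementation, and the function names are hypothetical. It assumes a truncation n_max of the mRNA copy number, drops transitions that would leave the truncated state space (so the truncated generator stays conservative), and obtains the stationary vector with a small hand-written Gaussian elimination instead of a numerical library.

```python
def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def telegraph_fsp(sigma_off, sigma_on, rho, d, n_max):
    """Stationary P(n), n = 0..n_max, of the telegraph model via FSP.

    States are (g, n): g = 0 (OFF) or 1 (ON), n mature mRNAs.
    Production out of n_max is dropped (truncation of the state space).
    """
    size = 2 * (n_max + 1)

    def idx(g, n):
        return g * (n_max + 1) + n

    # Generator Q[i][j] = rate of jumping from state i to state j.
    Q = [[0.0] * size for _ in range(size)]
    for n in range(n_max + 1):
        Q[idx(0, n)][idx(1, n)] += sigma_on          # OFF -> ON
        Q[idx(1, n)][idx(0, n)] += sigma_off         # ON -> OFF
        if n < n_max:
            Q[idx(1, n)][idx(1, n + 1)] += rho       # transcription (ON only)
        if n > 0:
            Q[idx(0, n)][idx(0, n - 1)] += d * n     # first-order decay
            Q[idx(1, n)][idx(1, n - 1)] += d * n
    for i in range(size):
        Q[i][i] = -sum(Q[i])                         # rows sum to zero

    # Solve pi Q = 0 (i.e. Q^T pi = 0), replacing one equation by sum(pi) = 1.
    A = [[Q[j][i] for j in range(size)] for i in range(size)]
    A[-1] = [1.0] * size
    b = [0.0] * (size - 1) + [1.0]
    pi = _solve(A, b)
    # Marginalize over the promoter state; clip tiny negative round-off.
    return [max(pi[idx(0, n)] + pi[idx(1, n)], 0.0) for n in range(n_max + 1)]
```

With d = 1 min⁻¹ as in the text, `telegraph_fsp(sigma_off, sigma_on, rho, 1.0, n_max)` would supply P(n; θ) for step (ii); n_max should be chosen well above the largest observed count so that the truncated mass is negligible.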
For step (iii), we calculate the likelihood of observing the data given a chosen parameter set θ:
$$L(\theta) = \prod_{j=1}^{N_{\mathrm{cell}}} P(n_j; \theta), \qquad (2.3)$$
where P(n_j; θ) is the probability distribution of mature mRNA numbers obtained from step (ii) given a parameter set θ, n_j is the total number of mature mRNA in cell j and N_cell is the total number of cells.
Steps (i) and (iv) involve an optimization problem. Specifically, we use a gradient-free optimization algorithm, namely the adaptive differential evolution (ADE) optimizer in BlackBoxOptim.jl within the Julia programming language, to find the optimal parameters
$$\theta^* = \arg\min_{\theta \in \Theta} \left( -\sum_{j=1}^{N_{\mathrm{cell}}} \log P(n_j; \theta) \right). \qquad (2.4)$$
Minimizing the negative log-likelihood is equivalent to maximizing the likelihood. The optimization algorithm is terminated when the number of iterations exceeds $10^4$; this number is chosen because we have found that, invariably, after this number of iterations the likelihood has converged to a maximal value. The inference algorithm is computationally inexpensive, with the optimal parameter values estimated in at most a few minutes.
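Steps (i) to (iv) can be sketched as follows. The optimizer here is plain uniform random search over the box Θ, a deliberately simple stand-in for the ADE optimizer of BlackBoxOptim.jl (not a reimplementation of it); the function names and the probability floor used to keep log P finite are our assumptions. Grouping identical observations with `Counter` evaluates log P(n; θ) once per distinct count, which keeps the likelihood cheap even for thousands of cells.

```python
import math
from collections import Counter
import random


def negative_log_likelihood(pmf, data, floor=1e-300):
    """-sum_j log P(n_j; theta), cf. Eq. (2.4).

    pmf[n] = P(n; theta); counts outside the pmf (or with zero
    probability) are assigned `floor` so the logarithm stays finite.
    """
    total = 0.0
    for n, count in Counter(data).items():
        p = pmf[n] if 0 <= n < len(pmf) else 0.0
        total -= count * math.log(max(p, floor))
    return total


def random_search(objective, bounds, n_iter=1000, seed=0):
    """Stand-in optimizer: sample theta uniformly in the box, keep the best.

    bounds is a list of (low, high) pairs, e.g. [(0, 150), (0, 50), (0, 300)]
    for (sigma_off, sigma_on, rho) as in Eq. (2.2).
    """
    rng = random.Random(seed)
    best_theta, best_val = None, math.inf
    for _ in range(n_iter):
        theta = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        val = objective(theta)
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val
```

In use, the objective would be something like `lambda theta: negative_log_likelihood(pmf_for(theta), data)`, where `pmf_for` is a hypothetical function returning the telegraph distribution for θ (for example from an FSP solver).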
Once the best parameter set θ* is found, we calculate the mean relative error, defined as
$$\mathrm{MRE} = \frac{1}{M} \sum_{i=1}^{M} \frac{|\theta^*_i - \theta_{\mathrm{true},i}|}{\theta_{\mathrm{true},i}}, \qquad (2.5)$$
where θ*_i and θ_true,i represent the i-th estimated and true parameters respectively, and M denotes the number of estimated parameters. The mean relative error thus measures the deviation of the estimated parameters from the true parameters.
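A direct transcription of the mean relative error is shown below as a sketch; it assumes the absolute-value form of the definition and strictly nonzero true parameters, and the function name is hypothetical.

```python
def mean_relative_error(estimated, true):
    """(1/M) * sum_i |theta*_i - theta_true,i| / theta_true,i, cf. Eq. (2.5)."""
    if len(estimated) != len(true) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - t) / abs(t) for e, t in zip(estimated, true)) / len(true)
```

For example, estimating (σ_off, σ_on, ρ) = (1.1, 2.0, 3.0) when the truth is (1.0, 2.0, 3.0) gives a mean relative error of about 0.033.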

Nascent mRNA inference

Mathematical model
The steady-state solution of the delay telegraph model [24] gives the distribution of the number of bound Pol II.In SI Section 1, we present an alternative approach to derive the steady-state solution.The reaction steps are illustrated in Fig. 2a.
The position of a Pol II molecule on the gene determines the fluorescence intensity of the mRNA attached to it. In particular, for fluorescence data acquired from smFISH of PP7-GAL10, the fluorescence intensity of a single mRNA at the DNA locus has the shape of a trapezoidal pulse (see Fig. 2b for an illustration). This presents a problem: the delay telegraph model predicts the distribution of the number of bound Pol II, but it provides no information on their spatial distribution along the gene. However, since the delay telegraph model implicitly assumes that Pol II molecules move at a fixed velocity and do not interact with each other (no volume exclusion), it is reasonable to assume that in steady state the bound Pol II molecules are uniformly distributed along the gene. This hypothesis is confirmed by stochastic simulations of the delay telegraph model in which the position of a Pol II molecule is calculated as the product of the constant Pol II velocity and the time since its production.
By the uniform distribution assumption and the measured trapezoidal fluorescence intensity profile, it follows that the signal intensity s of each bound Pol II has the density function
$$g(s) = \frac{L_1}{L}\,\mathbf{1}_{[0,1]}(s) + \frac{L_2}{L}\,\delta_1(s),$$
where L_1 = 862 bp (base pairs), L_2 = 2200 bp and L = L_1 + L_2, as defined in Fig. 2b. The indicator function $\mathbf{1}_{[0,1]}(s)$ equals 1 if and only if s ∈ [0, 1] (and 0 otherwise), and δ_1(s) is the Dirac delta function at 1. A signal s between 0 and 1 arises from the rising part of the trapezoid and is therefore weighted by L_1/L, the probability that a uniformly distributed Pol II lies in this region. Similarly, a signal equal to 1 arises from the L_2 (plateau) part of the trapezoid and has probability L_2/L under the uniform distribution assumption. Note that the signal s from each Pol II is at most 1 because, in practice, the signal intensity from the transcription site is normalized by the median intensity of single cytoplasmic mRNAs [22].
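As a worked example (our own arithmetic, following directly from the single-Pol II density, which is L_1/L on [0, 1] plus a point mass L_2/L at s = 1, with the stated lengths), the mean signal contributed by one bound Pol II is

$$\mathbb{E}[s] = \frac{L_1}{L}\cdot\frac{1}{2} + \frac{L_2}{L}\cdot 1 = \frac{431 + 2200}{3062} \approx 0.859,$$

so k bound Pol II molecules contribute on average about 0.86 k in units of the median single cytoplasmic mRNA intensity.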
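The total signal is the sum of the signals from each bound Pol II. Hence, the density function of the sum is given by the convolution of the signal densities of the individual bound Pol II. Defining p(s|k) as the density function of the signal given that there are k bound Pol II molecules, p(s|k) is the k-th convolution power of g, i.e.
$$p(s \mid k) = \underbrace{(g * g * \cdots * g)}_{k\ \text{times}}(s) \quad (k \ge 1), \qquad p(s \mid 0) = \delta_0(s),$$
where δ_0(s) is the Dirac delta function at 0. Finally, denoting by P(k; θ) the steady-state distribution of the number of bound Pol II predicted by the delay telegraph model, we can write the total fluorescent signal density function as
$$p(s; \theta) = \sum_{k=0}^{\infty} P(k; \theta)\, p(s \mid k). \qquad (2.7)$$
Next we describe the generation of synthetic nascent mRNA data and the algorithm used to infer parameters from these data.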

Generation of synthetic nascent mRNA data
We generate synthetic smFISH signal data by simulating the delay telegraph model with the stochastic simulation algorithm (SSA) modified to include delay; specifically, we use Algorithm 2 described in [34]. One realization of the algorithm simulates the fluctuating number of bound Pol II molecules in a single cell.
The total fluorescence intensity (mimicking smFISH) is obtained as follows.When a particular bound Pol II is produced by a firing of the transcription reaction G → G + N, we record this production time; since the elongation rate is assumed to be constant, given the production time we can calculate the position of the Pol II molecule on the gene at any later time and hence using Fig. 2b we can deduce the fluorescent signal due to this Pol II molecule.
Specifically, we normalize each transcribing Pol II's position to [0, 1] and map the position to its normalized signal by
$$f(x) = \begin{cases} x\,L/L_1, & 0 \le x \le L_1/L, \\ 1, & L_1/L < x \le 1, \end{cases}$$
where x is the normalized position on the gene. Thus, at a given time, the total fluorescent signal from the n-th cell (the n-th realization of the SSA) equals
$$S_n = \sum_{j=1}^{J_n} f(x_j),$$
where J_n is the number of bound Pol II molecules in the n-th cell, and {x_j}, j = 1, ..., J_n, is the vector of all Pol II positions on the gene. The total signal from each cell is a real number, but it is discretized into an integer.
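A minimal Python sketch of this synthetic-data step is given below. It is not Algorithm 2 of [34]: because Pol II release after the fixed elongation time τ does not feed back on the promoter, it simulates only promoter switching and initiation with Gillespie's direct method, then keeps the Pol II initiated within the last τ minutes and maps their normalized positions through the trapezoid f(x). The starting promoter state (OFF), the use of the ceiling for discretization, and all function names are our assumptions.

```python
import math
import random

L1, L2 = 862.0, 2200.0  # bp, rising and plateau parts of the trapezoid (Fig. 2b)


def trapezoid_signal(x):
    """Normalized signal f(x) of one Pol II at normalized position x in [0, 1]."""
    frac = L1 / (L1 + L2)
    return x / frac if x <= frac else 1.0


def simulate_cell_signal(sigma_off, sigma_on, rho, tau, t_obs, rng):
    """Return (real, discretized) total nascent signal of one cell at t_obs.

    The promoter starts OFF at t = 0; t_obs should be long compared with
    tau and the switching times so that the cell is in steady state.
    """
    t, on, starts = 0.0, False, []
    while True:
        switch_rate = sigma_off if on else sigma_on
        init_rate = rho if on else 0.0
        total = switch_rate + init_rate
        if total == 0.0:
            break                       # absorbing state: nothing else happens
        t += rng.expovariate(total)
        if t > t_obs:
            break
        if rng.random() * total < switch_rate:
            on = not on                 # promoter switching
        else:
            starts.append(t)            # initiation: G -> G + N
    # Pol II initiated less than tau ago are still bound; position = age / tau.
    signal = sum((trapezoid_signal((t_obs - t0) / tau)
                  for t0 in starts if t_obs - t0 < tau), 0.0)
    return signal, math.ceil(signal)    # ceil: assumed discretization rule


if __name__ == "__main__":
    rng = random.Random(1)
    cells = [simulate_cell_signal(2.0, 1.0, 5.0, 1.0, 20.0, rng) for _ in range(5)]
    print(cells)
```

Each call corresponds to one realization (one cell); repeating it over many cells gives a synthetic histogram of discretized signals.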

Steps of the algorithm to estimate parameters from nascent mRNA data
The inference procedure is essentially the same as steps (i)-(v) described in mature mRNA inference except for the following points.
In step (ii), the probability of observing a total signal of intensity i from a single cell is obtained by integrating p(s; θ) in Eq. (2.7) over the interval [i − 1, i] for i ∈ N, which, in our numerical scheme, means
$$P_{\mathrm{sig}}(i; \theta) = \int_{i-1}^{i} p(s; \theta)\, ds = \sum_{k=0}^{K-1} P(k; \theta) \int_{i-1}^{i} p(s \mid k)\, ds.$$
Note that the integration over an interval of length 1 matches the discretization of the synthetic data, and θ ∈ Θ. In practice, one can always choose a positive integer K such that P(k) is negligible for all k ≥ K. The solution of the delay telegraph model P(k) can be computed either from the analytical solution (evaluated using high precision) or with the finite state projection algorithm (FSP) [33]. In SI Fig. 1 and SI Table 1, we show that the two methods yield comparable accuracy and CPU time.
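For the trapezoidal density g, the integrals of p(s|k) over unit intervals can be evaluated exactly rather than by numerical convolution; the sketch below is our own derivation, not necessarily the authors' scheme. If j of the k Pol II lie on the rising part, the signal is (k − j) plus a sum of j independent Uniform(0, 1) variables, and the probability that such a sum lies in [m, m + 1) is the Eulerian number A(j, m) divided by j!. We assume half-open intervals (i − 1, i] for i ≥ 1 and treat i = 0 as a signal of exactly 0; the function names are hypothetical.

```python
from math import comb, factorial

L1, L2 = 862.0, 2200.0  # bp, rising and plateau parts of the trapezoid (Fig. 2b)


def eulerian_numbers(n_max):
    """A[n][m] for 0 <= m < n <= n_max, with A[0][0] = 1.

    A[n][m] / n! is the probability that a sum of n independent
    Uniform(0, 1) variables lies in [m, m + 1).
    """
    A = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    A[0][0] = 1
    for n in range(1, n_max + 1):
        for m in range(n):
            left = A[n - 1][m - 1] if m >= 1 else 0
            A[n][m] = (n - m) * left + (m + 1) * A[n - 1][m]
    return A


def signal_pmf(P_k):
    """Discretized signal distribution from the bound-Pol II distribution P_k.

    Entry i >= 1 is Prob(total signal in (i - 1, i]); entry 0 is
    Prob(signal = 0). P_k[k] is assumed negligible for k >= len(P_k).
    """
    K = len(P_k)
    p1 = L1 / (L1 + L2)   # Pol II on the rising part: signal ~ Uniform(0, 1)
    p2 = 1.0 - p1         # Pol II on the plateau: signal exactly 1
    A = eulerian_numbers(K)
    out = [0.0] * K       # k Pol II give a signal of at most k <= K - 1
    for k, pk in enumerate(P_k):
        for j in range(k + 1):          # j of the k Pol II on the rising part
            w = pk * comb(k, j) * p1 ** j * p2 ** (k - j)
            if j == 0:
                out[k] += w             # signal is exactly the integer k
                continue
            for m in range(j):          # floor of the uniform sum equals m
                out[k - j + m + 1] += w * A[j][m] / factorial(j)
    return out
```

As an example input, in the constitutive limit of the delay telegraph model the number of bound Pol II is Poisson distributed with mean ρτ, so `signal_pmf` can be fed a truncated Poisson vector.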
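For step (iii), we calculate the likelihood of observing the data given a chosen parameter set θ,
$$L(\theta) = \prod_{j=1}^{N_{\mathrm{cell}}} P_{\mathrm{sig}}(q_j; \theta),$$
where P_sig(q_j; θ) is the probability of the discretized signal q_j obtained in step (ii), q_j is the discretized total signal intensity from cell j and N_cell is the total number of cells. In the optimization, we aim to find
$$\theta^* = \arg\min_{\theta \in \Theta} \left( -\sum_{j=1}^{N_{\mathrm{cell}}} \log P_{\mathrm{sig}}(q_j; \theta) \right).$$
The whole procedure is summarized by a flow chart in Fig. 2c.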

Experimental data acquisition and processing
Yeast cultures were grown to early mid-log phase, fixed with paraformaldehyde (PFA), permeabilized with lyticase and hybridized with 7.5 pmol each of four Cy3-labeled PP7 probes (Integrated DNA Technologies) as described in Trcek et al. [35] and Lenstra et al. [36,37]. The PP7 probe sequences are: atatcgtctgctcctttcta, atatgctctgctggtttcta, gcaattaggtaccttaggat, aatgaacccgggaatactgc. Coverslips were mounted on microscope slides using mounting medium with DAPI (ProLong Gold, Life Technologies).
The coverslips were imaged on a Zeiss AxioObserver (Zeiss, USA) widefield microscope. Light sources for imaging DAPI and Cy3 were 440/20 nm and 550/15 nm, respectively, from a Spectra X light engine (Lumencor, USA). The signal was detected on a Hamamatsu ORCA-Flash4.0 V3 Digital CMOS camera (Hamamatsu Photonics, Japan). Imaging was performed using MicroManager (UCSF). Fields of view were selected based on the DAPI channel, and stacks of 13 images with a z-step of 0.5 µm were acquired in two colors.
Spots were classified as nuclear or cytoplasmic, and the brightest nuclear spot in each cell was classified as the transcription site. The intensity of the brightest nuclear spot in a cell was normalized by the median intensity of all the cytoplasmic spots in the same cell. This normalization is justified because most cytoplasmic mRNAs are isolated single molecules, so the median fluorescence signal of cytoplasmic spots can be taken as the intensity of one mRNA. The distribution of the normalized intensity of the brightest nuclear spot, calculated over the cell population, is the experimental equivalent of the total fluorescent signal density function given by the solution of the modified delay telegraph model, Eq. (2.7).
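The per-cell normalization can be sketched as below, assuming spot detection and the nuclear/cytoplasmic assignment have already been done. The function name is hypothetical, and returning `None` for cells lacking either nuclear or cytoplasmic spots is our assumption, since the text does not say how such cells are handled.

```python
from statistics import median


def normalized_ts_intensity(nuclear_spot_intensities, cytoplasmic_spot_intensities):
    """Brightest nuclear spot intensity divided by the median cytoplasmic spot intensity.

    Returns None when either list is empty (handling of such cells is assumed).
    """
    if not nuclear_spot_intensities or not cytoplasmic_spot_intensities:
        return None
    return max(nuclear_spot_intensities) / median(cytoplasmic_spot_intensities)
```

The result is in units of single mRNAs, which is why the per-Pol II signal in the model is capped at 1.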
The number of mature mRNA in each cell is given by counting the number of spots in the entire cell, i.e. nuclear plus cytoplasmic.The transcription site is counted as 1 mRNA, regardless of its intensity, but this has negligible influence since the mean number of mature mRNA is much greater than 1.The distribution of the number of spots is the experimental equivalent of the solution of the telegraph model, i.e. the marginal distribution of mature mRNA numbers in steady-state conditions.
The integrated nuclear intensity of each cell was calculated by summing the DNA content intensity (DAPI) of all the pixels within the nucleus mask. The distribution of these intensities was fitted with a bimodal Gaussian distribution. Cells whose intensity was within one standard deviation of the mean of the first (second) Gaussian peak were classified as G1 (G2) (see SI Fig. 2). This gave similar results to a different cell cycle classification method using the Fried/Baisch model [43], which was recently employed in [26]. See SI Fig. 3 for a comparison of the two methods.
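The text does not say how the bimodal Gaussian was fitted. The sketch below assumes an expectation-maximization (EM) fit of a two-component Gaussian mixture, followed by the one-standard-deviation classification rule described above; cells within one standard deviation of both peaks (or of neither) are labeled undetermined. All names are hypothetical.

```python
import math
import random


def _normpdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(x, n_iter=100):
    """EM fit of a 1D two-component Gaussian mixture to DAPI intensities.

    Returns (weights, means, sds) ordered so that component 0 is the lower
    (G1-like) peak and component 1 the upper (G2-like) peak.
    x must be a list with at least two values.
    """
    xs = sorted(x)
    half = len(xs) // 2
    groups = (xs[:half], xs[half:])  # initialise from lower/upper halves
    mu = [sum(g) / len(g) for g in groups]
    sd = [max(math.sqrt(sum((v - m) ** 2 for v in g) / len(g)), 1e-9)
          for g, m in zip(groups, mu)]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell.
        resp = []
        for v in x:
            p0 = w[0] * _normpdf(v, mu[0], sd[0])
            p1 = w[1] * _normpdf(v, mu[1], sd[1])
            tot = (p0 + p1) or 1e-300
            resp.append((p0 / tot, p1 / tot))
        # M-step: re-estimate weights, means and standard deviations.
        for c in (0, 1):
            nc = sum(r[c] for r in resp) or 1e-300
            w[c] = nc / len(x)
            mu[c] = sum(r[c] * v for r, v in zip(resp, x)) / nc
            var = sum(r[c] * (v - mu[c]) ** 2 for r, v in zip(resp, x)) / nc
            sd[c] = max(math.sqrt(var), 1e-9)
    order = sorted((0, 1), key=lambda c: mu[c])
    return ([w[c] for c in order], [mu[c] for c in order],
            [sd[c] for c in order])


def classify_cell_cycle(intensity, means, sds):
    """'G1' or 'G2' if within one SD of exactly one peak, else 'undetermined'."""
    in_g1 = abs(intensity - means[0]) <= sds[0]
    in_g2 = abs(intensity - means[1]) <= sds[1]
    if in_g1 != in_g2:
        return "G1" if in_g1 else "G2"
    return "undetermined"


if __name__ == "__main__":
    rng = random.Random(0)
    dapi = ([rng.gauss(100.0, 8.0) for _ in range(300)]
            + [rng.gauss(200.0, 12.0) for _ in range(200)])
    weights, means, sds = fit_two_gaussians(dapi)
    print(means, sds, classify_cell_cycle(150.0, means, sds))
```

With a one-standard-deviation window, a sizeable share of cells between and around the peaks is left unclassified by construction, which is consistent with the undetermined counts reported for the experiments.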
We performed four independent experiments with a total number of cells equal to 2510, 6411, 4592 and 3181, respectively. After classification, the numbers of G1 cells are 766, 2111, 1495 and 904, and the numbers of G2 cells are 683, 1657, 1209 and 1143; the remaining cells were classified as undetermined.

Inference from mature mRNA data: testing inference accuracy using synthetic data
Transcriptional bursting can be mathematically described with a two-state model, where we define the rate of switching from the ON state (active state) to the OFF state (inactive state) as σ off , the rate of switching from the OFF state to the ON state as σ on and the production rate of mRNAs in the ON state as ρ.The first-order decay rate of mature mRNA in the telegraph model is given by d (Fig. 1a).
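For reference, the reactions of the telegraph model and two standard consequences obtained from its first two moment equations (our addition), the steady-state mean and Fano factor of mature mRNA numbers, are

$$G_{\mathrm{off}} \xrightarrow{\sigma_{\mathrm{on}}} G_{\mathrm{on}}, \qquad G_{\mathrm{on}} \xrightarrow{\sigma_{\mathrm{off}}} G_{\mathrm{off}}, \qquad G_{\mathrm{on}} \xrightarrow{\rho} G_{\mathrm{on}} + M, \qquad M \xrightarrow{d} \varnothing,$$

$$\langle n \rangle = \frac{\rho}{d}\,\frac{\sigma_{\mathrm{on}}}{\sigma_{\mathrm{on}} + \sigma_{\mathrm{off}}}, \qquad \mathrm{FF} = \frac{\mathrm{Var}(n)}{\langle n \rangle} = 1 + \frac{\rho\,\sigma_{\mathrm{off}}}{(\sigma_{\mathrm{on}} + \sigma_{\mathrm{off}})(\sigma_{\mathrm{on}} + \sigma_{\mathrm{off}} + d)}.$$

Both quantities depend on the rates only through ρ/d, σ_off/d and σ_on/d, and FF → 1 as σ_off → 0 (constitutive expression).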
To understand the accuracy of the algorithm in various regions of the parameter space, we used stochastic simulations of the telegraph model to generate synthetic data and then used an inference algorithm to infer transcriptional parameters by matching the distribution of synthetic mature mRNA data to the analytical distribution of the conventional telegraph model.
We note that the steady-state solution of the telegraph model is a function of the non-dimensional parameter ratios ρ/d, σ off /d and σ on /d [10].Hence without a direct experimental measurement of the degradation rate d, only these three ratios can be inferred [2].In these simulations, we fix the degradation rate d = 1 min −1 and aim to estimate ρ, σ off and σ on from synthetic data.
The generation of synthetic data is described in Methods Section 2.1.2, and the inference algorithm is described in detail in Methods Section 2.1.3. It is based on maximizing the likelihood of observing the single-cell mature mRNA numbers measured in a population of cells. The likelihood of observing n mature mRNA molecules in a given cell is given by the telegraph model's steady-state probability distribution of mature mRNA numbers evaluated at n.
To test the accuracy of the inference algorithm, we generated 50 independent sets of synthetic mature mRNA data (each with $10^4$ cells and a unique set of parameters) and, for each, calculated the mean relative error in the parameters (for its definition see Methods, Eq. (2.5)). Fig. 1b shows the mean relative error as a function of the fraction of ON time, defined as f_ON = σ_on/(σ_off + σ_on). We find that the error reaches a minimum when the promoter is ON half of the time, which occurs when σ_off equals σ_on. Fig. 1c shows the best-fit distributions for 6 different parameter sets, and Fig. 1d compares the corresponding estimated parameters with the true ones. In agreement with Fig. 1b, only for parameter sets 3 and 4 (where the switching rates are similar and the fraction of ON time is not far from 0.5) are the estimates of the three transcriptional rates (σ_off, σ_on, ρ) and of the burst size (the ratio ρ/σ_off) accurate. For parameter sets 1 and 2 (where the promoter spends a large fraction of time in the OFF state), only the burst size and the burst frequency σ_on are reliable. This finding agrees with the theoretical proof that, with long OFF periods, the steady-state solution of the master equation of the telegraph model is well approximated by a negative binomial distribution with only two free parameters (the burst size and the burst frequency) [19]. For parameter sets 5 and 6 (where the promoter spends a large fraction of time in the ON state), errors are large in all parameters; the error is smallest for the production rate in the ON state, but it is still sizeable. We note that for all data sets the effective rate of transcription, ρ_eff = ρ f_ON, is reliably inferred. This is because the inference algorithm matches the mean mature mRNA number, which equals ρ_eff/d, and since d is fixed, ρ_eff is well inferred.
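For reference, in the bursty regime (σ_off ≫ σ_on and σ_off ≫ d, with the burst size b = ρ/σ_off held fixed) the telegraph distribution reduces to the negative binomial form referred to above, with normalized burst frequency a = σ_on/d:

$$P(n) = \frac{\Gamma(n + a)}{\Gamma(a)\, n!} \left(\frac{b}{1 + b}\right)^{n} \left(\frac{1}{1 + b}\right)^{a}, \qquad \langle n \rangle = a\,b, \qquad \mathrm{FF} = 1 + b.$$

Only the combinations a and b appear, which is why σ_on and the burst size ρ/σ_off are identifiable in this regime while ρ and σ_off separately are not.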

Inference from nascent mRNA data: testing inference accuracy using synthetic data
While it is common to infer transcriptional parameters using mature mRNA data, the downstream processes of splicing, export and nuclear mRNA degradation add extrinsic noise and change the distributions of transcript copy numbers [44]. One way to avoid this type of noise is to infer the parameters directly from distributions of nascent mRNA numbers, since these are free from post-transcriptional noise sources.
While the telegraph model could in principle be used to describe nascent mRNA, it is not appropriate because nascent mRNA does not follow a first-order decay.Once a Pol II molecule is bound, it travels along the gene with an approximately constant velocity for a fixed time after which it unbinds and the nascent mRNA tail dissociates to form a mature mRNA.Hence the time between a nascent mRNA production event (the binding of a Pol II to the gene) and its removal (the dissociation of the Pol II molecule from the gene) is not exponentially distributed and cannot be simply modelled by an effective first-order reaction as for mature mRNA.To model nascent mRNA dynamics, it is more appropriate to use a delay telegraph model, where the first-order degradation reaction is replaced by a reaction that removes a nascent transcript after a fixed elongation time τ (Fig. 2a).This non-Markovian model was studied by Xu et al. [24] who found an exact steady-state solution for the distribution of nascent mRNAs; the first two moments for this distribution were also reported in [45].
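A quick consistency check on the delay telegraph model (our addition, via Little's law: in a stationary system the mean occupancy equals the arrival rate times the residence time): Pol II initiate at mean rate ρ f_ON and each remains bound for exactly τ, so

$$\langle k \rangle = \rho\,\tau\,\frac{\sigma_{\mathrm{on}}}{\sigma_{\mathrm{on}} + \sigma_{\mathrm{off}}}.$$

In the constitutive limit (σ_off → 0) initiations form a Poisson process, and the number of bound Pol II, i.e. initiations within the last τ minutes, is Poisson distributed with mean ρτ.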
The delay telegraph model predicts the distribution of the number of bound Pol II molecules. However, the latter does not directly translate into the number of nascent RNA molecules. In smFISH [46][47][48], a method that is commonly used for mRNA detection, a fluorescent signal is emitted by oligonucleotide probes bound to the nascent mRNA. Since the nascent mRNA tail grows as a bound Pol II travels along the gene, we expect the fluorescent signal intensity to increase as well.
Specifically, in this manuscript we use fluorescence data acquired from smFISH of PP7-GAL10 in budding yeast, where probes were hybridized to the PP7 sequences. In this case, the fluorescence intensity of a single mRNA at the DNA locus has the shape of a trapezoidal pulse (see Fig. 2b for an illustration). As the Pol II molecule travels through the 14 repeats of the PP7 loops, the intensity of the mRNA increases as fluorescent probes bind to the mRNA (the linear part of the trapezoidal pulse). However, once all 14 loops on the mRNA are bound, the intensity plateaus: the mRNA is as bright as it can be but still needs to elongate through the GAL10 gene body before it is released (the flat part of the trapezoidal pulse). The total fluorescent signal density function is hence given by
$$p(s; \theta) = \sum_{k=0}^{\infty} P(k; \theta)\, p(s \mid k), \qquad (3.1)$$
where P(k; θ) is the steady-state distribution of the number of bound Pol II predicted by the delay telegraph model and p(s|k) is the k-th convolution power of the single-Pol II signal density (Methods Section 2.2.1).

[Fig. 2c: flow chart of the inference procedure: select θ in the parameter space; compute the RNA distribution P(k); compute the signal intensity; maximum likelihood estimate.]

We use this extension to generate synthetic data that mimic the smFISH signal (Methods Section 2.2.2). Specifically, we use the SSA to generate histograms of the total fluorescence intensity for randomly selected kinetic parameters. An inference algorithm, based on maximizing the likelihood of observing the single-cell total fluorescence intensities measured in a population of cells, is then constructed to infer the parameters (Methods Section 2.2.3). Note that the likelihood of observing a certain fluorescence signal intensity from a cell is given by Eq. (3.1). This algorithm is used to infer the promoter switching and initiation rate parameters. The elongation time is not estimated but assumed to be known, since it can often be measured experimentally.
The mean relative error (see Eq. (2.5) for the definition) obtained from 20 independent numerical experiments, averaged over all parameter sets, is about 0.1, which indicates that the inference is accurate. The algorithm fits the distributions of signal intensity very accurately, as shown in Fig. 2d for 6 parameter sets. A direct comparison of the estimated and true parameters for these parameter sets is shown in Fig. 2e. Note that these cases describe moderately to highly frequent gene expression, i.e. f_ON ≳ 0.5.
A more extensive analysis using sets drawn over a wider range of parameter space and for which f ON spans the whole range from 0 to 1 is reported in SI Section 3, SI Fig. 4, and SI Tables 2 and 3. Therein we show that the mean relative error versus the fraction of ON time displays the same trend as Fig. 1b with the errors being largest when the fraction of ON time is close to 0 or 1.
Additionally, if one uses the conventional telegraph model to fit the nascent data generated by the delay SSA, it is possible to obtain a distribution fit as good as that of the delay telegraph model but with low-fidelity parameter estimates (SI Section 4, SI Fig. 5 and SI Table 4). Analytically, the telegraph model is an accurate approximation of the delay telegraph model only when: (i) the promoter switching timescales are much longer than the time spent by Pol II on the gene, or (ii) the off-switching rate is so small that gene expression is practically constitutive.
Overall, from these synthetic data we learn that parameter inference is more accurate from nascent than from mature mRNA distributions, and that its reliability is highest when the fraction of ON time is roughly 0.5.

Applications to experimental yeast mRNA data
Now that we have introduced the inference algorithms and tested them thoroughly using synthetic data, we apply them to experimental data (see Methods Section 2.3 for details of data acquisition).

Inference from mature mRNA data: merged versus cell-cycle specific
We perform the inference in two different ways: (i) using the merged data from all cells, irrespective of their position in the cell cycle; (ii) using cell-cycle specific data.
The inference of transcriptional parameters using the merged data is done using the algorithm described in Methods Section 2.1.3, but with the experimental mature mRNA data replacing the synthetic data. For the cell-cycle specific mature mRNA data of cells in G1, the inference protocol remains the same. However, for cells in G2 the protocol needs to change, because these cells have two gene copies whereas the solution of the telegraph model assumes one gene copy. Assuming that the transcriptional activities of the two gene copies are independent, the distribution of the total mature mRNA number is the convolution of the single-copy distribution (obtained from the telegraph model) with itself. This convolved distribution is then used in steps (ii) and (iii) of the inference algorithm.
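The two-copy distribution used for G2 cells can be sketched as a discrete self-convolution, under the independence assumption stated above; the function names are hypothetical.

```python
def convolve_pmfs(p, q):
    """Distribution of X + Y for independent integer-valued X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def two_copy_pmf(single_copy_pmf):
    """Total mRNA distribution of two independent, identical gene copies."""
    return convolve_pmfs(single_copy_pmf, single_copy_pmf)
```

The convolved distribution keeps unit total probability and has twice the single-copy mean; it would replace P(n; θ) in steps (ii) and (iii) for G2 cells.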
Note that the independence of transcription from the two gene copies has been verified for genes in some eukaryotic cells [26] where the two copies can be easily resolved. For yeast data, as analyzed in this paper, it is not generally possible to distinguish the two copies of the allele in G2 because they are within the diffraction limit: the yeast nucleus is much smaller than the mammalian nucleus and the two GAL10 genes tend to be close together, so many cells do not show two clearly separate nuclear spots. Hence, in the absence of such data, independence is the simplest reasonable assumption that we can make.
We start with inference from the merged mature mRNA data, for which we have four independent data sets. In the absence of an experimental measurement of the degradation rate d, we can only estimate the three transcriptional parameters normalised by d. The best distribution fits and the inferred transcriptional parameters for the four experimental data sets are shown in Fig. 3a (top) and Fig. 3b, respectively. Since σ_off is much larger than σ_on, only the normalised burst frequency σ_on/d and the burst size are reliable, as shown for synthetic data. The small estimated values of the burst size and of the fraction of ON time suggest infrequent, small bursts. Next, we performed inference on cell-cycle specific mature mRNA data; the results are shown in Fig. 3a (bottom) and Fig. 3c. As expected, the mean number of mRNAs in G2 cells is larger than that in G1 cells. However, inference of the transcriptional parameters for these cell-cycle stages was very unreliable: very different parameter sets led to excellent fits of the data.
Note that the difficulty of reliably estimating any of the parameters is in line with the findings from synthetic data when the promoter spends most of its time in the ON state (see Section 3.1 and Fig. 1b). A frequently used measure of the burstiness of a gene is the Fano factor, defined as the variance of the molecule number divided by its mean. A Fano factor of around 1 indicates non-bursting, constitutive initiation, whereas larger Fano factors indicate larger burst sizes. For cell-cycle specific mature data, the Fano factor is close to 1 (Fig. 3c), which implies almost constitutive expression. As we showed using synthetic data, the only effective parameter that can then be reliably estimated is the normalised effective rate of transcription ρ/d. Since, for 3 out of 4 data sets, the effective normalised production rate per gene copy is lower in G2 than in G1, a mild form of gene dosage compensation is likely at play; that is, the transcriptional parameters are altered upon replication such that each copy of the gene has reduced expression [26].
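The Fano factor criterion can be made concrete with a short sketch. The closed-form expression used below, F = 1 + ρσ_off/((σ_on + σ_off)(σ_on + σ_off + d)), is the standard steady-state Fano factor of the one-copy telegraph model; the helper names are hypothetical. The final lines use made-up Poisson subpopulations to show how pooling cells with one and two gene copies can push the Fano factor above 1 even when neither subpopulation is bursty.

```python
import numpy as np


def empirical_fano(counts):
    """Fano factor (variance / mean) of per-cell molecule counts.

    Uses the population variance (ddof=0).
    """
    x = np.asarray(counts, dtype=float)
    return x.var() / x.mean()


def telegraph_fano(sigma_on, sigma_off, rho, d):
    """Steady-state Fano factor of the one-copy telegraph model.

    Equals 1 (Poisson, constitutive) when sigma_off = 0 and grows
    linearly with the initiation rate rho.
    """
    k = sigma_on + sigma_off
    return 1.0 + rho * sigma_off / (k * (k + d))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical non-bursting subpopulations: one gene copy (mean 2) and
    # two gene copies (mean 4), each Poisson with a Fano factor of 1.
    g1 = rng.poisson(2.0, 50_000)
    g2 = rng.poisson(4.0, 50_000)
    print(empirical_fano(g1), empirical_fano(g2))      # both close to 1
    print(empirical_fano(np.concatenate([g1, g2])))    # close to 4/3
```

For an equal mixture of Poisson(2) and Poisson(4) the pooled variance is 3 + 1 = 4 and the pooled mean is 3, so the merged Fano factor is about 4/3 although no bursting is involved.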
Hence, the cell-cycle specific fits suggest that expression in G1 and G2 is not bursty. This apparent lack of bursting disagrees with live-cell transcription measurements, which clearly show transcriptional bursts for GAL10 [6,36].
Particularly surprising in our analysis is the difference between the inference results from merged and cell-cycle specific data: the former suggest a higher degree of bursty expression than the latter (compare the Fano factors in Fig. 3b and Fig. 3c) [49]. Heterogeneity in the merged data, due to the mixture of cells with one or two gene copies, can thus produce fluctuations in mature mRNA abundance that may be mistakenly interpreted as arising from bursty expression. Conversely, a gene that displays bursting in other measurements can be mistakenly classified as non-bursting, as is the case for GAL10 [6,36]. This analysis shows that estimates of transcriptional parameters from mature mRNA distributions should be interpreted with caution.

Inference from nascent mRNA data: cell cycle effects, experimental artifacts and comparison with mature mRNA inference

Cell-cycle specific versus merged data
Inference of transcriptional parameters from the merged data uses the algorithm described in Methods Section 2.2.3, with the experimental nascent mRNA data replacing the synthetic data.
As above, to account for two gene copies in G2 cells, we assume that the transcriptional activities of the two gene copies are independent. The distribution of the total fluorescent signal from both gene copies is then the convolution of the single-copy signal distribution (obtained from the extended delay telegraph model, i.e. Eq. (3.1)) with itself (for an illustration see SI Fig. 2). This convolved distribution is used in steps (ii) and (iii) of the inference algorithm.
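A discretised version of the signal model and its two-copy convolution is sketched below. It evaluates the mixture P(s; θ) = Σ_k p(s|k) P(k; θ) on a uniform intensity grid and convolves the result with itself. The Gaussian p(s|k) and the P(k; θ) values are hypothetical stand-ins: in this work p(s|k) is derived from a trapezoidal fluorescence pulse (Methods Section 2.2.1) and P(k; θ) from the delay telegraph model.

```python
import numpy as np


def gaussian_signal_given_k(s, k, noise_sd=0.25):
    """HYPOTHETICAL p(s|k): k transcripts of unit intensity plus Gaussian noise."""
    return np.exp(-0.5 * ((s - k) / noise_sd) ** 2) / (noise_sd * np.sqrt(2.0 * np.pi))


def signal_density(p_k, s_grid, signal_given_k=gaussian_signal_given_k):
    """Mixture P(s) = sum_k p(s|k) P(k) evaluated on a grid."""
    return sum(pk * signal_given_k(s_grid, k) for k, pk in enumerate(p_k))


def two_copy_density(s_grid, density):
    """Density of the summed signal of two independent, identical copies.

    Discretises the convolution integral on a uniform grid; the returned
    grid starts at twice the lower edge of the input grid.
    """
    ds = s_grid[1] - s_grid[0]
    conv = np.convolve(density, density) * ds
    s2_grid = 2.0 * s_grid[0] + ds * np.arange(conv.size)
    return s2_grid, conv


s_grid = np.linspace(-2.0, 12.0, 1401)       # intensity in units of one transcript
p_k = np.array([0.3, 0.3, 0.2, 0.2])         # hypothetical P(k; theta), mean k = 1.3
g1 = signal_density(p_k, s_grid)             # single-copy signal density
s2, g2 = two_copy_density(s_grid, g1)        # G2 (two-copy) signal density
```

Both densities integrate to 1 on their grids, and the mean signal doubles from 1.3 to 2.6, which is a quick sanity check for the convolution step.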
Inference of transcriptional parameters from nascent RNA data used a fixed elongation time, derived from the elongation rate of 65 bp/s measured previously at a related galactose-responsive gene (GAL3) [6]. Since the total transcript length is 3062 bp (see Fig. 2b), the elongation time (τ in our model) is ≈ 47.11 s ≈ 0.785 min. Fixing the elongation time enables us to infer the absolute values of the three transcriptional parameters σ_off, σ_on and ρ.
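As a quick arithmetic check of the elongation time quoted above, using only the transcript length and elongation rate given in the text:

```python
# Values quoted in the text: transcript length and elongation rate.
transcript_length_bp = 3062
elongation_rate_bp_per_s = 65

tau_s = transcript_length_bp / elongation_rate_bp_per_s   # elongation time in seconds
tau_min = tau_s / 60.0                                     # elongation time in minutes
print(f"tau = {tau_s:.2f} s = {tau_min:.3f} min")          # tau = 47.11 s = 0.785 min
```

Because τ is fixed in absolute time units, the remaining rates σ_off, σ_on and ρ come out in min⁻¹ rather than as ratios to another rate.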
In Fig. 4a and Table 1, we show the estimates of the transcriptional parameters from both merged and cell-cycle specific data. In contrast to the mature estimates, for the nascent estimates the switching rates σ_off and σ_on are comparable; we are therefore in a parameter regime where the absolute values of all three transcriptional parameters can be reliably estimated, and the distributions are reasonably well fit (Fig. 4b). However, merged and cell-cycle specific data produce different parameter estimates. To determine which estimates are correct, we compare them to previous live-cell transcription measurements of the same gene [6]. Because live-cell traces and traces simulated with the estimated transcriptional parameters are difficult to compare directly, we instead compare their normalised autocorrelation functions (ACFs). Specifically, we feed the parameter estimates to the SSA to generate synthetic live-cell data and then calculate the corresponding ACF (SI Section 6). The estimates from cell-cycle specific data produce ACFs that match the live-cell data more closely than those from merged data (Fig. 4c, left and middle). This is also clear from the sum of squared residuals, which for each data set is smaller for the ACF computed from the cell-cycle specific estimates than for that from the merged estimates (Fig. 4c, right).
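The ACF comparison can be sketched as follows, under simplifying assumptions that are ours rather than those of SI Section 6: a single gene copy, deterministic elongation lasting exactly τ, a signal equal to the number of nascent transcripts on the gene (ignoring the position-dependent PP7 fluorescence), and a normalised ACF defined as the autocovariance divided by the variance. The rate values in the example are arbitrary.

```python
import numpy as np


def simulate_initiation_times(sigma_on, sigma_off, rho, t_end, rng):
    """Gillespie simulation of promoter switching and initiation for one gene copy.

    Returns the Pol II initiation times in [0, t_end). Elongation is handled
    separately: each transcript is assumed to stay on the gene for exactly tau.
    """
    t = 0.0
    on = rng.random() < sigma_on / (sigma_on + sigma_off)   # initial state drawn at steady state
    times = []
    while True:
        total = sigma_off + rho if on else sigma_on           # total propensity in current state
        t += rng.exponential(1.0 / total)
        if t >= t_end:
            break
        if on and rng.random() < rho / total:
            times.append(t)                                   # initiation event
        else:
            on = not on                                       # promoter switches state
    return np.array(times)


def nascent_trace(init_times, tau, t_grid):
    """Number of transcripts initiated in (t - tau, t], i.e. still on the gene at t."""
    upto_t = np.searchsorted(init_times, t_grid, side="right")
    upto_t_minus_tau = np.searchsorted(init_times, t_grid - tau, side="right")
    return upto_t - upto_t_minus_tau


def normalized_acf(x, max_lag):
    """Autocovariance at lags 0..max_lag (in grid steps) divided by the variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / x.size
    return np.array([np.dot(x[: x.size - lag], x[lag:]) / (x.size - lag) / var
                     for lag in range(max_lag + 1)])


def sum_squared_residuals(acf_model, acf_measured):
    """Sum over all lags of squared differences between two normalised ACFs."""
    diff = np.asarray(acf_model, dtype=float) - np.asarray(acf_measured, dtype=float)
    return float(np.sum(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tau, dt = 0.785, 0.1                                             # minutes
    times = simulate_initiation_times(1.0, 1.0, 10.0, 2000.0, rng)   # arbitrary rates (min^-1)
    t_grid = np.arange(tau, 2000.0, dt)                              # start at tau so windows are full
    acf = normalized_acf(nascent_trace(times, tau, t_grid), max_lag=50)
    print(acf[:5])
```

The same `sum_squared_residuals` comparison would be applied to a measured ACF sampled on the same lag grid.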
Ignoring the cell cycle increases the heterogeneity of the data, which artificially inflates the Fano factor and the apparent burstiness of gene expression. It also leads to underestimation of the fraction of time spent in the ON state (f_ON) and of the burst size. This comparison indicates that inference from merged data yields incorrect parameter estimates.
Comparing the results of inference from cell-cycle specific mature mRNA data (Fig. 3c) with those from cell-cycle specific nascent mRNA data (Table 1), we find that mature data do not allow the estimation of any of the transcriptional parameters (or their normalised values), whereas with cell-cycle specific nascent data we could estimate all parameters reliably. Notably, the use of mature mRNA data drastically underestimates the burstiness of gene expression: the Fano factors are 1.27 in G1 and 1.43 in G2 from mature mRNA inference, versus 4.12 in G1 and 4.61 in G2 from nascent mRNA inference.
The transcriptional estimates for the G1 and G2 populations show that the burst frequency (σ_on) is considerably lower in G2 than in G1 (a 41% reduction on average; Table 1), whereas the other two parameters, σ_off and ρ, differ less between the two cell-cycle phases (reductions of 27% and 8% on average, respectively). A decrease of the burst frequency σ_on after replication has also been reported for some genes in mammalian cells [26,31], suggesting that this could be a general mechanism of gene dosage compensation. Our results are consistent with a ChIP-Seq study [50] showing that the increase in DNA dosage after replication does not increase gene expression in budding yeast.

Figure 4: b. Best-fit distributions for G1 and G2 (top and middle rows, respectively) and merged data (bottom row) for data sets 1-4 (from left to right). c. Normalised ACF plots of merged and cell-cycle specific data (middle and left) and their residuals (right). The ACF plots are generated by stochastic simulations using parameters estimated from merged and cell-cycle specific nascent mRNA data for each of the four data sets; these are compared with the ACF measured directly from live-cell data in [6] (green line). We also compare the sum of squared ACF residuals of merged and cell-cycle specific data for each data set (the sum of squared deviations between the measured and estimated normalised ACF, calculated over all time points).

Correcting for experimental artefacts
Although inference on cell-cycle separated data outperformed inference on merged data, we noticed that the corresponding best-fit distributions did not match the experimental signal distributions well in the lowest bins (Fig. 4b). In all cases, the experimental distributions show elevated probabilities in bins 1, 2 and 3, which is likely an artifact of the experimental data acquisition. Because we define the transcription site as the brightest spot, in cells that have no actual transcription site a mature transcript is mistaken for a nascent transcript. We therefore investigated two methods to correct for this: the "fusion" method and the "rejection" method.
The rejection method removes all data in the first k bins of the experimentally obtained histogram of fluorescent intensities and renormalises the remainder. We find that the parameter estimates vary strongly with the number of rejected bins k (Fig. 5a). Although the distributions fit the experimental histograms well (Fig. 5b), comparison with the live-cell normalised ACF indicates that the estimates actually become worse than the non-curated estimates, with a higher sum of squared residuals (Fig. 5c). The rejection method therefore does not produce reliable estimates.
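A minimal sketch of the rejection step on a histogram, assuming intensity bins indexed from 0; the function name is hypothetical. When fitting, the model's predicted bin probabilities would presumably be truncated and renormalised in the same way (an assumption, not stated above).

```python
import numpy as np


def reject_first_bins(hist, k):
    """Drop bins 0..k-1 of a histogram and renormalise the remaining bins."""
    h = np.asarray(hist, dtype=float)[k:]
    return h / h.sum()


print(reject_first_bins([4, 3, 2, 1], 2))   # [0.66666667 0.33333333]
```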
Figure 5 (caption excerpt; axis labels σ_off (min⁻¹), σ_on (min⁻¹)). c. Normalised autocorrelation function (ACF) predicted by stochastic simulations using the estimated parameters (for k = 4) for each of the four data sets versus that measured directly using live-cell data (green line). Inset shows the sum of squared residuals of the ACF plots.
Next, we considered another data curation method, which we call the fusion method. It sets to zero all fluorescent intensities in a cell population that are below a certain threshold. In other words, we fuse the first k bins of the experimentally obtained histogram of fluorescent intensities, thereby taking into account that the true intensity of bin 0 is artificially distributed over some of the first bins. Fig. 6a shows that the fusion method leads to estimates that vary little with k, which increases our confidence in them (note that k = 1 is the same as the uncurated data). The peak at the zero bin for both G1 and G2 is better captured with the fusion method than with non-curated data (compare Fig. 4b and Fig. 6b). Comparison with the autocorrelation function of the live-cell data shows that correction with the fusion method also improves the transcriptional estimates, as indicated by a reduction in the sum of squared residuals for all four data sets (Fig. 6c).
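The fusion step can be sketched at both the intensity level and the histogram level. The sketch assumes unit-width bins [j, j + 1) in units of single-transcript intensity, so that fusing bins 0 to k − 1 is equivalent to setting intensities below k to zero; this correspondence and the function names are assumptions for illustration.

```python
import numpy as np


def fuse_intensities(intensities, threshold):
    """Set every transcription-site intensity below the threshold to zero."""
    s = np.asarray(intensities, dtype=float)
    return np.where(s < threshold, 0.0, s)


def fuse_first_bins(hist, k):
    """Merge bins 0..k-1 into bin 0 and renormalise (k = 1 leaves the data unchanged)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    h = np.asarray(hist, dtype=float)
    fused = h.copy()
    fused[0] = h[:k].sum()     # all mass from the first k bins goes to bin 0
    fused[1:k] = 0.0
    return fused / fused.sum()


print(fuse_first_bins([1, 2, 3, 4, 5], 3))   # mass of bins 0-2 moved into bin 0
print(fuse_intensities([0.5, 2.5, 4.0], 3.0))
```

Unlike rejection, fusion keeps every cell in the data set; it only reassigns low-intensity spots to the zero bin, consistent with the interpretation that those spots are mature transcripts rather than transcription sites.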

Figure 6 (caption excerpt; panel labels G1, G2; axis label σ_on (min⁻¹)). c. Normalised autocorrelation function (ACF) predicted by stochastic simulations using the estimated parameters (for k = 4) for each of the four data sets versus that measured directly using live-cell data (green line). Inset shows the sum of squared residuals of the ACF plots. d. Estimated parameters from cell-cycle specific and merged nascent mRNA data with the fusion method for k = 4 (fusing bins 0-3). These correspond to the fitted distributions in b. The elongation time τ is fixed to 0.785 min. Inferred parameters for all other values of k are given in SI Table 6.
Overall, we conclude that the optimal method for inferring parameters from the smFISH data is to use nascent cell-cycle specific data corrected with the fusion method. The optimally inferred parameters for the four data sets in our study are given in Fig. 6d. In SI Section 7, we use the profile likelihood method to obtain 95% confidence intervals for each of the estimated parameters.
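The profile likelihood idea can be sketched generically: for each fixed value of one parameter, the negative log-likelihood is re-minimised over the remaining parameters, and the 95% interval collects the values whose profile lies within χ²₁(0.95)/2 ≈ 1.92 of the minimum. This is a textbook construction demonstrated on a toy Gaussian model, not the implementation in SI Section 7; the grid search, the Nelder-Mead optimiser and the function names are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2


def profile_likelihood_ci(nll, theta_hat, index, grid, level=0.95):
    """Profile-likelihood confidence interval for parameter `index`.

    nll       -- negative log-likelihood of the full parameter vector
    theta_hat -- maximum-likelihood estimate (minimiser of nll)
    grid      -- candidate values of the profiled parameter (should bracket the CI)
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    nll_min = nll(theta_hat)
    free = [i for i in range(theta_hat.size) if i != index]
    cutoff = 0.5 * chi2.ppf(level, df=1)          # about 1.92 for a 95% interval
    inside = []
    for value in grid:
        def nll_fixed(free_params, value=value):
            theta = theta_hat.copy()
            theta[index] = value
            theta[free] = free_params
            return nll(theta)
        # Re-optimise the nuisance parameters with the profiled one held fixed.
        res = minimize(nll_fixed, theta_hat[free], method="Nelder-Mead")
        if res.fun - nll_min <= cutoff:
            inside.append(value)
    if not inside:
        raise ValueError("no grid value lies inside the confidence region")
    return min(inside), max(inside)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(5.0, 2.0, size=200)            # toy data, not smFISH data

    def gaussian_nll(theta):
        mu, log_sigma = theta
        return x.size * log_sigma + np.sum((x - mu) ** 2) / (2.0 * np.exp(2.0 * log_sigma))

    theta_hat = np.array([x.mean(), np.log(x.std())])
    grid = np.linspace(x.mean() - 1.0, x.mean() + 1.0, 401)
    print(profile_likelihood_ci(gaussian_nll, theta_hat, 0, grid))
```

For the parameters of the delay telegraph model the same loop would profile one rate at a time while re-fitting the other two, which is considerably more expensive than this toy case.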

Discussion
In this study, we compare the reliability of transcriptional parameter inference from mature and nascent mRNA distributions, with and without taking into account the cell-cycle stage. Although these distributions come from the same experiment, the different fits produce very different parameter estimates, ranging from non-bursting to small bursts to very large bursts. Comparison to live-cell data reveals that the optimal inference method is to use nascent mRNA data separated by cell-cycle stage.
Our findings illustrate the risk of inferring transcriptional parameters by fitting mRNA distributions. The most common method of parameter inference in the literature is fitting mature mRNA distributions that are not separated by cell cycle [2,10,22]. Such distributions are straightforward to obtain with methods such as smFISH, where one can directly count the number of mRNAs per cell. Additionally, with advances in single-cell mRNA sequencing technologies, it is possible to obtain mRNA distributions for many genes simultaneously, and it is tempting to use these to estimate bursting behaviour across the genome [2,13]. However, our comparisons on the same data set show that the values obtained from mature mRNA fits can differ significantly from the real values, underestimating burst sizes by more than 25-fold and the active fraction by more than 4-fold. Such large inaccuracies indicate that parameter inference from mature mRNA data should be treated with caution.
Mature distributions are fitted more often than nascent distributions because nascent distributions are technically more challenging to obtain. As nascent single-cell sequencing methods are still at an early stage [51], the only method available so far for nascent measurements is smFISH [37]. In such smFISH experiments, intronic probes can be used to specifically label nascent RNA, although splicing kinetics may have some effect on the distribution [52]. If introns are not present, as for most yeast genes, one can use exonic probes instead [22]. Since exonic probes label both nascent and mature mRNA transcripts, it may be challenging to identify the nascent transcription site unambiguously, especially at low transcription levels. We show in this manuscript that the fusion method can correct for this bias by combining bins below k RNAs, which improves the parameter estimates.
Our analysis also emphasizes the importance of analyzing G1 and G2 cells separately [26]. Importantly, cell-cycle specific analysis requires neither experimental adjustments nor cell-cycle synchronized cultures. Although asynchronous cultures consist of a mix of G1, S and G2 cells, the integrated DNA intensity of the nucleus of each cell, for example from a DAPI signal, can be used to separate cells by cell-cycle stage in silico [26,53]. As most smFISH experiments already include a DNA-labelled channel, this extra analysis step should be straightforward to incorporate into future smFISH fitting procedures.
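The in silico cell-cycle assignment can be sketched as a simple gating of integrated DNA intensity. Everything here is an assumption for illustration: the G1 (1C) level is taken as the tallest histogram peak, cells are gated by hypothetical bands around 1× and 2× that level, and cells in between are left unassigned as S/other. Published pipelines [26,53] may gate differently.

```python
import numpy as np


def classify_by_dna_content(dna_intensity, g1_band=(0.8, 1.2), g2_band=(1.8, 2.2), bins=100):
    """Label cells 'G1', 'G2' or 'S/other' from integrated nuclear DNA (e.g. DAPI) intensity.

    The G1 level is the centre of the tallest histogram bin, which assumes G1
    cells form the most populated peak; bands are in units of that level.
    """
    x = np.asarray(dna_intensity, dtype=float)
    counts, edges = np.histogram(x, bins=bins)
    peak = np.argmax(counts)
    g1_level = 0.5 * (edges[peak] + edges[peak + 1])
    ratio = x / g1_level                                 # DNA content relative to G1
    labels = np.full(x.shape, "S/other", dtype=object)
    labels[(ratio >= g1_band[0]) & (ratio <= g1_band[1])] = "G1"
    labels[(ratio >= g2_band[0]) & (ratio <= g2_band[1])] = "G2"
    return labels
```

Cells labelled S/other would simply be excluded from the cell-cycle specific fits, mirroring the exclusion of S-phase cells from the smFISH analysis.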
Even with our optimal fitting strategy, there is a residual difference between the simulated ACF and the ACF measured in live cells. This difference may result from the different experimental biases of the two measurements. For example, live-cell measurements have a detection threshold below which RNAs may not be detected. In addition, live-cell measurements include cells in S phase, which are excluded in smFISH. There could also be differences in the exact percentage of G1 and G2 cells, or other extrinsic noise sources, between live-cell and smFISH experiments. Alternatively, the fit may be imperfect because there might be parameter sets, other than those found by our inference algorithm, that provide an accurate fit of the nascent mRNA distribution and perhaps an even better fit to the ACF. We cannot exclude this possibility because we estimated f_ON to be 0.7–0.8, whereas with synthetic data we showed that the inference algorithm performs best when f_ON is about 0.5 and that its accuracy deteriorates as f_ON approaches 0 or 1. Another factor that could explain the residual difference between the simulated and measured ACFs is that the two-state model may be too simplistic to capture the true promoter states in living cells and may therefore be unable to describe the true in vivo kinetics. Nevertheless, given that there is no explicit time component in smFISH data, the closeness of the simulated ACF to the measured

$$P(s;\theta) = \sum_{k=0}^{\infty} p(s \mid k)\, P(k;\theta) \qquad (2.7)$$
where P(k; θ) is the steady-state solution of the delay telegraph model giving the probability of observing k bound Pol II molecules for the parameter set θ. Hence Eq. (2.7) represents the extension of the delay telegraph model to predict the smFISH fluorescent signal of the transcription site.

Figure 1: Inference of transcriptional parameters using as input synthetic mature mRNA data generated by SSA simulations. a. A schematic illustration of the telegraph model. b. Plot of the log mean relative error (R.E.) against the fraction of ON time (f_ON) for 50 numerical experiments, each consisting of 10⁴ SSA samples generated for a unique (random) set of parameters. Note that Log(Mean R.E.) is base 10. c. Distributions of mature mRNA from synthetic data (red dots) fit using the inference algorithm with the telegraph model (blue) for six different parameter sets. d. Estimates using the inference algorithm with the telegraph model for the six parameter sets in c. For both the ground truth and the estimated parameters, we fix the degradation rate d = 1 min⁻¹.

$$P(s;\theta) = \sum_{k=0}^{\infty} p(s \mid k)\, P(k;\theta) \qquad (3.1)$$
where p(s|k) is the density function of the signal given there are k bound Pol II molecules and P(k; θ) is the steady-state solution of the delay telegraph model giving the probability of observing k bound Pol II molecules for the parameter set θ. In Methods Section 2.2.1 we show how p(s|k) can be approximately calculated for the trapezoidal pulse. Hence Eq. (3.1) represents the extension of the delay telegraph model to predict the smFISH fluorescent signal of the transcription site.

Figure 2: Inference of transcriptional parameters using as input synthetic fluorescent signal data generated by SSA simulations of transcription and fluorescent tagging for 10⁴ cells. a. Illustration of the delay telegraph model. The double horizontal line for nascent mRNA removal indicates a delayed reaction. b. Illustration showing promoter switching between two states, Pol II binding to the promoter in the ON state and subsequently undergoing productive elongation. Note that the nascent mRNA tail lengthens as Pol II approaches the stop (termination) sequence in the gene. As Pol II travels through the 14 repeats of the PP7 loops, the intensity of the nascent mRNA increases due to fluorescent protein binding; the intensity saturates as Pol II enters the GAL10 region. c. Illustration of the inference algorithm for mature and nascent mRNA data. The orange boxes apply only to inference from the fluorescence signal intensity of nascent mRNAs. A large maximum number of iterations N_max (≥ 10⁴) is chosen as the termination condition for the optimizer. d. Distributions of total fluorescence intensity from synthetic data (red dots) fit using the inference algorithm with the delay telegraph model (blue) for six different parameter sets. e. Estimates using the inference algorithm with the delay telegraph model for the six parameter sets in d. σ_off, σ_on and ρ are in units of min⁻¹. Here the effective transcription rate is defined as ρ f_ON, and the delay τ = 0.5 min.
[Figure panel: Inferred parameters using merged mature RNA data (Merged; $\sigma_{\mathrm{off}}$, $\sigma_{\mathrm{on}}$)]

Figure 3: Inference results using four mature mRNA data sets with sample sizes of 2333, 6366, 4550 and 3163 cells, respectively. a. Best-fit distributions of mature mRNA data. b. Inferred transcriptional parameters (merged mature RNA data); the burst size is computed as $\rho/\sigma_{\mathrm{off}}$. Note that the transcriptional parameters in the first three columns are normalised by the degradation rate. c. Inferred effective (normalised) production rate per gene copy $\rho$ and Fano factors (variance of molecule numbers divided by the mean of molecule numbers) for the G1 and G2 cell cycle phases.
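For the standard telegraph model of mature mRNA with degradation rate $d$, the steady-state mean is $\rho f_{\mathrm{ON}}/d$ and the Fano factor is $1 + \rho\sigma_{\mathrm{off}}/[(\sigma_{\mathrm{on}}+\sigma_{\mathrm{off}})(\sigma_{\mathrm{on}}+\sigma_{\mathrm{off}}+d)]$, which approaches one plus the burst size $\rho/\sigma_{\mathrm{off}}$ when $\sigma_{\mathrm{off}} \gg \sigma_{\mathrm{on}}, d$. The Python sketch below evaluates these standard results alongside the empirical Fano factor of panel c; setting $d = 1$ mirrors the normalisation by the degradation rate in panel b. The function names and parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np


def telegraph_summary(sigma_on, sigma_off, rho, d=1.0):
    """Steady-state summary statistics of the telegraph model for mature mRNA.

    With d = 1 all rates are expressed in units of the degradation rate.
    Requires sigma_off > 0 for the burst size to be defined.
    """
    s = sigma_on + sigma_off
    f_on = sigma_on / s                            # fraction of time ON
    mean = rho * f_on / d                          # mean mRNA number
    fano = 1.0 + rho * sigma_off / (s * (s + d))   # variance / mean
    burst_size = rho / sigma_off                   # mean transcripts per ON period
    return {"f_on": f_on, "mean": mean, "fano": fano, "burst_size": burst_size}


def empirical_fano(counts):
    """Fano factor (variance divided by mean) of measured molecule numbers."""
    counts = np.asarray(counts, dtype=float)
    return counts.var() / counts.mean()


# Bursty regime (hypothetical values, normalised by d): Fano is near 1 + burst size.
print(telegraph_summary(sigma_on=0.5, sigma_off=100.0, rho=1000.0))
```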

Figure 4: Inference from the distribution of the normalised intensity of the brightest nuclear spot (nascent mRNA data), constructed either by merging all data or separately for the G1 and G2 cell cycle phases. a. Comparison of the three inferred parameter values using merged and cell-cycle-specific data. The bar graph shows the mean and standard deviation of the estimates calculated over the four data sets. b. Best-fit distributions for G1 and G2 (top and middle rows, respectively) and merged data (bottom row) for data sets 1-4 (from left to right). c. Normalised autocorrelation function (ACF) plots for merged and cell-cycle-specific data (middle and left) and their residuals (right). The ACF plots are generated by stochastic simulations using parameters estimated from merged and cell-cycle-specific nascent mRNA data for each of the four data sets; these are compared with the ACF measured directly from live-cell data in [6] (green line). We also compare the sum of squared ACF residuals of merged and cell-cycle-specific data for each data set (the sum of squared deviations between the measured and estimated normalised ACF, calculated over all time points).
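Panel c ranks parameter sets by the sum of squared residuals between simulated and measured normalised ACFs over all time points. A minimal Python sketch of that comparison, assuming an evenly sampled intensity trace and the common normalisation that sets the ACF to 1 at zero lag (the exact estimator and normalisation used for the live-cell data in [6] may differ); the helper names are hypothetical, and an AR(1) test signal with known ACF $0.8^{\text{lag}}$ replaces real traces:

```python
import numpy as np


def normalised_acf(x, max_lag):
    """Autocorrelation of an evenly sampled trace, normalised to 1 at lag 0.

    Each lag averages over the N - lag overlapping pairs of the mean-subtracted trace.
    """
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    var = np.mean(dx * dx)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        acf[lag] = np.mean(dx[: dx.size - lag] * dx[lag:]) / var
    return acf


def sum_squared_residuals(acf_model, acf_measured):
    """Sum over all lags of squared deviations between two ACFs."""
    diff = np.asarray(acf_model, dtype=float) - np.asarray(acf_measured, dtype=float)
    return float(np.sum(diff**2))


# Test signal: an AR(1) process with coefficient 0.8 has ACF 0.8**lag.
rng = np.random.default_rng(0)
x = np.zeros(100_000)
for i in range(1, x.size):
    x[i] = 0.8 * x[i - 1] + rng.normal()
acf = normalised_acf(x, max_lag=5)
print(sum_squared_residuals(acf, 0.8 ** np.arange(6)))  # small for a good match
```

In practice one would average ACFs over many simulated cells before comparing with the population-averaged live-cell ACF.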


Figure 5: Inference results using the rejection method. a. Estimated parameters (mean values with standard deviation error bars) obtained by rejecting the first $k$ bins, for $k = 1, 2, 3, 4$, where the fraction of ON time is $f_{\mathrm{ON}} = \sigma_{\mathrm{on}}/(\sigma_{\mathrm{off}} + \sigma_{\mathrm{on}})$. b. Corresponding distributions for G1 (top row) and G2 (bottom row) with the rejection method (only the distributions for $k = 4$ are shown). The estimated parameters are listed in SI Table 5. c. Normalised autocorrelation function (ACF) predicted by stochastic simulations using the estimated parameters (for $k = 4$) for each of the four data sets, versus that measured directly from live-cell data (green line). The inset shows the sum of squared residuals of the ACF plots.
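One way to implement the rejection method is to drop the first $k$ bins of the intensity histogram and compare the remaining counts with the model distribution renormalised over the retained bins, i.e. a truncated likelihood. The Python sketch below assumes this truncated-multinomial form and that bin $n$ of the histogram and of the model probability vector cover the same intensity range; it illustrates the idea and is not necessarily the exact objective behind Figure 5. A Poisson distribution stands in for the delay telegraph model, and all function names are hypothetical.

```python
import math

import numpy as np


def rejection_loglik(hist, model_pmf, k):
    """Log-likelihood of histogram counts after rejecting bins 0..k-1.

    The model probabilities are renormalised over the retained bins.
    """
    hist = np.asarray(hist, dtype=float)[k:]
    p = np.asarray(model_pmf, dtype=float)[k:]
    p = p / p.sum()
    mask = hist > 0  # empty bins contribute nothing
    return float(np.sum(hist[mask] * np.log(p[mask])))


def poisson_pmf(lam, n_max):
    # Stand-in model distribution over bins 0..n_max.
    return np.array([math.exp(-lam) * lam**n / math.factorial(n) for n in range(n_max + 1)])


# Synthetic histogram with an exact Poisson(3) shape for 10^4 cells.
hist = np.round(1e4 * poisson_pmf(3.0, 30))
grid = np.linspace(2.0, 4.0, 21)
scores = [rejection_loglik(hist, poisson_pmf(lam, 30), k=4) for lam in grid]
print(grid[int(np.argmax(scores))])  # best-fitting rate on the grid
```

Because the retained counts follow the truncated shape of the true distribution, the grid search recovers the generating rate even though bins 0-3 are discarded.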

Figure 6: Inference results using the fusion method. a. Estimated parameters (mean values with standard deviation error bars) obtained by combining the first $k$ bins, for $k = 1, 2, 3, 4$, where the fraction of ON time is $f_{\mathrm{ON}} = \sigma_{\mathrm{on}}/(\sigma_{\mathrm{off}} + \sigma_{\mathrm{on}})$. b. Corresponding distributions for G1 (top row) and G2 (bottom row) with the fusion method (only the distributions for $k = 4$ are shown). The red bar represents the combined bin 0-3 when $k = 4$. c. Normalised autocorrelation function (ACF) predicted by stochastic simulations using the estimated parameters (for $k = 4$) for each of the four data sets, versus that measured directly from live-cell data (green line). The inset shows the sum of squared residuals of the ACF plots. d. Estimated parameters from cell-cycle-specific and merged nascent mRNA data using the fusion method with $k = 4$ (fusing bins 0-3). These correspond to the fitted distributions in b. The elongation time $\tau$ is fixed to 0.785 min. The inferred parameters for all other values of $k$ are listed in SI Table 6.
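The fusion method keeps all cells but merges the first $k$ bins into a single bin (bins 0-3 for $k = 4$, the red bar in panel b), applying the same merge to the model distribution before computing the likelihood. A minimal Python sketch, assuming a multinomial likelihood over histogram bins aligned with the model probability vector and using a Poisson distribution as a stand-in for the delay telegraph model; the function names are hypothetical and this is not the authors' implementation.

```python
import math

import numpy as np


def fuse_first_bins(values, k):
    """Merge bins 0..k-1 into one bin; k = 1 leaves the vector unchanged."""
    if k < 1:
        raise ValueError("k must be at least 1")
    values = np.asarray(values, dtype=float)
    return np.concatenate(([values[:k].sum()], values[k:]))


def fusion_loglik(hist, model_pmf, k):
    """Multinomial log-likelihood after fusing the first k bins of data and model."""
    h = fuse_first_bins(hist, k)
    p = fuse_first_bins(model_pmf, k)
    mask = h > 0  # empty bins contribute nothing
    return float(np.sum(h[mask] * np.log(p[mask])))


def poisson_pmf(lam, n_max):
    # Stand-in model distribution over bins 0..n_max.
    return np.array([math.exp(-lam) * lam**n / math.factorial(n) for n in range(n_max + 1)])


print(fuse_first_bins([1, 2, 3, 4, 5], k=4))  # fused vector: 10, 5
hist = np.round(1e4 * poisson_pmf(3.0, 30))
grid = np.linspace(2.0, 4.0, 21)
scores = [fusion_loglik(hist, poisson_pmf(lam, 30), k=4) for lam in grid]
print(grid[int(np.argmax(scores))])  # best-fitting rate on the grid
```

Unlike rejection, fusion retains the total probability mass of the low-intensity bins, so it still constrains how often the gene has few or no nascent transcripts while ignoring how that mass is split among bins 0-3.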

Table: Inferred parameters using synthetic mature RNA data

Table 1: Estimated parameters from the distribution of the normalised intensity of the brightest nuclear spot (nascent mRNA data), constructed either by merging all data or separately for the G1 and G2 cell cycle phases. The elongation time $\tau$ is estimated to be 0.785 min, based on measurements of the elongation speed.

Table: Inferred parameters using nascent RNA data with the fusion method