Pathway dynamics can delineate the sources of transcriptional noise in gene expression

Single-cell expression profiling opens up new vistas on cellular processes. Extensive cell-to-cell variability at the transcriptomic and proteomic level has been one of the stand-out observations. Because most experimental analyses are destructive we only have access to snapshot data of cellular states. This loss of temporal information presents significant challenges for inferring dynamics, as well as causes of cell-to-cell variability. In particular, we typically cannot separate dynamic variability from within cells (‘intrinsic noise’) from variability across the population (‘extrinsic noise’). Here, we make this non-identifiability mathematically precise, allowing us to identify new experimental set-ups that can assist in resolving this non-identifiability. We show that multiple generic reporters from the same biochemical pathways (e.g. mRNA and protein) can infer magnitudes of intrinsic and extrinsic transcriptional noise, identifying sources of heterogeneity. Stochastic simulations support our theory, and demonstrate that ‘pathway-reporters’ compare favourably to the well-known, but often difficult to implement, dual-reporter method.


Introduction
Noise is a fundamental aspect of every cellular process Shahrezaei and Swain, 2008b. Frequently it is even of functional importance, for example in driving cell-fate transitions. Sometimes it can afford evolutionary advantages, for example, in the context of bet-hedging strategies. Sometimes, it can be a nuisance, for example, when it makes cellular signal processing more difficult. But noise is nearly ubiquitous at the molecular scale, and its presence has profoundly shaped cellular life. Analysing and understanding the sources of noise, how it is propagated, amplified or attenuated, and how it can be controlled, has therefore become a cornerstone of modern molecular cell biology.
Noise arising in gene expression has arguably attracted most of the attention so far (but see e.g. Filippi et al., 2016 andJetka et al., 2018 for the analysis of noise at the signalling level). Generally speaking, gene expression noise is separable into two sources of variability, as pioneered by Swain et al., 2002. Intrinsic noise is generated by the dynamics of the gene expression process itself. The process, however, is often influenced by other external factors, such as the availability of promoters and of RNA polymerase, the influence of long noncoding RNA as a transcriptional regulator Goodrich and Kugel, 2006, as well as differences in the cellular environment. Such sources of variability contribute extrinsic noise, and reflect the variation in gene expression and transcription activity across the cell population. As such, understanding extrinsic noise lies at the heart of understanding cell-population heterogeneity.
So far, identifying the sources of gene expression noise from transcriptomic measurements alone has proven difficult Paulsson, 2004;Pedraza and Paulsson, 2008. The fundamental hindrance lies in the fact that single-cell RNA sequencing, which provides most of the available data, is destructive, so that datasets reflect samples from across a population, rather than samples taken repeatedly from the same cell. As temporal information is lost in such measurements Komorowski et al., 2011, it may be impossible to distinguish temporal variability within individual cells (e.g. burstiness), from ensemble variability across the population (i.e. extrinsic noise). A number of numerical and experimental studies have suggested this confounding effect Jones and Elf, 2018;Jones et al., 2014;Zopf et al., 2013, showing that systems with intrinsic noise alone exhibit behaviour that is indistinguishable from systems with both extrinsic and intrinsic noise. This is examined more formally in Ham et al., 2020a, where we show that the moment scaling behaviour and transcript distribution may be indistinguishable from situations with purely intrinsic noise. The limitations in inferring dynamics from population data are becoming increasingly evident, and a number of studies that seek to address some of these problems have emerged Skinner et al., 2016;Gorin and Pachter, 2020a.
Here we provide adetailed analysis of the extent to which sources of variability are identifiable from population single-cell omics data. We are able to prove rigorously that it is in general impossible to identify the sources of variability, and consequently, the underlying transcription dynamics, from observed transcript abundance distributions alone. Systems with intrinsic noise alone can always present identically to similar systems with extrinsic noise. For practical purposes, the effect does not appear to depend on the precise choice of distribution, but holds more generally. Moreover, we demonstrate that extrinsic noise invariably distorts the apparent degree of burstiness of the underlying system: data which seems 'bursty' is not necessarily generated by a bursty process, if there is cell-to-cell variability across the population. This is a stronger non-identifiability result than has previously been obtained Ingram et al., 2008;Shahrezaei and Swain, 2008a;Khammash, 2009;Munsky et al., 2009;Komorowski et al., 2011, and has important ramifications for our analysis of experimental data: we cannot assess causes of transcriptional variability, if we do not simultaneously assess cell-to-cell variability in the transcriptional machinery. Our results highlight (in fact prove eLife digest In biology, seemingly random variation within or between cells can have significant effects on a number of cellular processes, like how cells divide and develop. For example, how often a gene is switched on, or 'expressed', can randomly fluctuate over time. This 'noise' may lead to a cell having slightly more of a particular molecule, causing it to behave differently to other cells in the population. However, it is currently unclear how this random variation is created and controlled in cells, and what effect this has on biological systems as a whole.
When a gene is expressed, its sequence typically gets copied in to a molecule called mRNA, which is then processed and used to build the protein encoded by the gene. By measuring the levels of mRNA molecules in individual cells, researchers have been able to investigate how gene expression varies within populations. These experiments are carried out on dead cells at a single point in time, and mathematical models are then applied to detect noise in the molecular data.
This approach, however, precludes how noise changes over time, making it difficult to determine the source of cell-to-cell variability. In particular, whether the variation detected is the result of genuine random molecular changes (intrinsic noise), or external factors -such as temperature and pH -fluctuating in the cells environment (extrinsic noise).
Here, Ham et al. have built on previous mathematical models to identify a new approach for investigating the source of molecular noise. They found that for any given gene it is impossible to understand what causes its activity levels to vary just from data on its mRNA levels. Instead, information on other molecules that are affected by expression of the gene (termed 'pathway reporters') can provide a clearer picture of whether molecular variability is the result of intrinsic or extrinsic noise.
The mathematical models developed by Ham et al. reveal what can and cannot be learned about noise from gene expression data. Furthermore, pathway-reporters are easier to measure experimentally than other reporters that are typically used to study the origins and effects of cell-tocell variability. These findings could help researchers design single cell experiments that are better for studying noise, leading to a deeper understanding of how different types of variation impact cell biology. mathematically) the requirement for additional information, beyond the observed copy number distribution, in order to constrain the space of possible dynamics that could give rise to the same distribution.
This seemingly intractable problem can at least partially be resolved with a brilliantly simple approach: the dual-reporter method Swain et al., 2002. In this approach, noise can be separated into extrinsic and intrinsic components, by observing correlations between the expression of two independent, but identically distributed fluorescent reporter genes. Dual-reporter assays have been employed experimentally to study the noise contributed by both global and gene-specific effects Elowitz et al., 2002;Raser and O'Shea, 2005;Raj et al., 2006. A particular challenge, however, is that dual reporters are rarely identically regulated Raj et al., 2006;Quarton et al., 2020, and are not straightforward to set up in every system. More recently, it has been shown that the independence of dual reporters cannot be guaranteed in some systems Naik et al., 2021. As a result, there have been efforts in developing alternative methods for decomposing noise Quarton et al., 2020;Singh, 2014;Lin and Amir, 2021. Here we develop a widely applicable generalisation (and simplification) of the original dual-reporter approach Swain et al., 2002. We demonstrate that non-identical and not-necessarily independent reporters can provide an analogous noise decomposition. Our result shows that measurements taken from the same biochemical pathway (e.g. mRNA and protein) can replace dual reporters, enabling the noise decomposition to be obtained from a single gene. This completely circumvents the requirement of conditionally independent and identically regulated reporter genes. The results obtained from our 'pathway-reporter' method are also borne out by stochastic simulations, and compare favourably with the dual-reporter method. In the case of constitutive expression, the results obtained from our decomposition are identical to those obtained from dual reporters. For bursty systems, we show that our approach provides a satisfactorily close approximation, except in extreme cases. Our methodology is verified mathematically for the most common models of gene transcription, but holds independently of the specific nature of the gene expression dynamics, as we verify in silico across a range of more detailed models.

Materials and methods
A simple model for stochastic mRNA dynamics is the Telegraph model: a two-state model describing promoter switching, transcription, and mRNA degradation. In this model, all parameters are fixed, and gene expression variability arises due to the inherent stochasticity of the transcription process. As discussed above, this process will often be influenced by extrinsic sources of variability, and so modifications to account for this additional source of variability are required.

The telegraph model
The Telegraph model was first introduced in Ko, 1991, and has since then been widely employed in the literature to model bursty gene expression in eukaryotic cells Bahar Halpern et al., 2015;So et al., 2011;Suter et al., 2011;Larsson et al., 2019. In this model, the gene switches probabilistically between an active state and an inactive state, at rates l (on-rate) and m (off-rate), respectively. In the active state, mRNAs are synthesised according to a Poisson process with rate K, while in the inactive state, transcription does either not occur, or possibly occurs at some lower Poisson rate, K 0 ( K. Degradation of mRNA molecules occurs independently of the gene promoter state at rate d. Figure 1A shows a schematic of the Telegraph model. Throughout the discussion here, we will rescale all parameters of the Telegraph model by the mRNA degradation rate, so that d ¼ 1. The steady-state distribution for the mRNA copy number can be explicitly calculated as Peccoud and Ycart, 1995,p T ðn; Þ ¼ K n l ðnÞ n!ð þ lÞ ðnÞ 1 F 1 ðl þ n; l þ þ n; ÀKÞ: (1) Here, q denotes the parameter vector ð; l; K; dÞ, the function 1 F 1 is the confluent hypergeometric function Abramowitz and Stegun, 1965, and, for real number x and positive integer n, the notation x ðnÞ abbreviates the rising factorial of x (also known as the Pochhammer function). Throughout, we refer to the probability mass functionp T ðn; Þ as the Telegraph distribution with parameters q.
Constitutive gene expression is a limiting case of the Telegraph model, which arises when the offrate m is 0, so that the gene remains permanently in the active state. In this case, the Telegraph distribution simplifies to a Poisson distribution with rate K; the distribution PoisðKÞ.
At the opposite extreme is instantaneously bursty gene expression in which mRNA are produced in very short bursts with potentially prolonged periods of inactivity in between. This mode of gene expression has been frequently reported experimentally, particularly in mammalian genes Raj et al., 2006;Bahar Halpern et al., 2015;Suter et al., 2011;Larsson et al., 2019. Transcriptional bursting may be treated as a limit of the Telegraph model, where the off-rate, m, tends to infinity, while the on-rate l remains fixed. In this limit, it can be shown Jones and Elf, 2018;Ham et al., 2020a that the Telegraph distribution converges to the negative binomial distribution NegBinðl; K þK Þ.

The compound distribution
We can account for random cell-to-cell variation across a population by way of a compound distribution Ham et al., 2020bq wherepðn; Þ is the stationary probability distribution of a system with fixed parameters q and f ð; hÞ denotes the multivariate distribution for q with hyperparameters h. Often we will takepðn; Þ to be the stationary probability distribution of the Telegraph model ((1)), and refer to (2) as the compound Telegraph distribution. Sometimespðn; Þ will be the Poisson distribution or the negative binomial distribution, depending on the underlying mode of gene activity. Figure 1B gives a pictorial representation of the compound distribution.
The compound distribution is valid in the case of ensemble heterogeneity, that is, when parameter values differ between individual cells according to the distribution f ð; hÞ, but remain constant over time Filippi et al., 2016. This model is also a valid approximation for individual cells with dynamic parameters, provided these change sufficiently slower than the transcriptional dynamics Lenive et al., 2016. In general, the compound Telegraph distributionqðn; hÞ will be more dispersed representing the state of the gene. Transitions between the states A and I occur stochastically at rates m and l, respectively. The parameter K is the mRNA transcription rate, and d is the degradation rate. (B) The compound model incorporates extrinsic noise by assuming that parameters q of the Telegraph model vary across an ensemble of cells, according to some probability distribution f ð; hÞ. (C) Variation in the parameters across the cell population leads to greater variability in the mRNA copy number distribution. than a Telegraph distribution to account for the uncertainty in the parameters; see Figure 1C. Such dispersion is widely observed experimentally, and as demonstrated in Ham et al., 2020a, reflects the presence of extrinsic noise.
In the context of gene expression, it has been shown experimentally that some of the primary sources of extrinsic noise have an autocorrelation time comparable to the cell cycle Rosenfeld et al., 2005. It is these slow changes in variability that justify the assumptions of the compound model. Typical sources of extrinsic variability for each parameter in the Telegraph model are summarised in Table 1 of Ham et al., 2020a. A further significant source of heterogeneity arises from the differences in cell-cycle phases across the population Skinner et al., 2016. Such effects have been shown to obscure the precise underlying transcriptional dynamics Zopf et al., 2013;Huh and Paulsson, 2011, and impede the inference of transcriptional parameters from experimental data Beentjes et al., 2020. The compound model we consider here is able to capture this variability, provided that the parameter change within each cell phase is relatively slow, and any dynamic parameter changes during the transition between phases can be considered as ephemeral. A more explicit treatment of the mechanisms and changes during the cell cycle is challenging to study analytically, and theoretical modelling is only in its infancy Beentjes et al., 2020;Perez-Carrasco et al., 2020. Later, we verify our proposed noise decomposition on detailed models of gene transcription, incorporating salient features of the cell-cycle, such as gene replication, dosage compensation, binomial partitioning of products due to cell division, and cell-cycle length variability.

Identifiability considerations
Decoupling the effects of extrinsic noise from experimental measurements has been notoriously challenging. In the context of (2), the distribution f ð; hÞ reflects the population heterogeneity, but experimental data provides samples only ofqðn; hÞ. How much can we deduce of the underlying dynamics (that is,pðn; ÞÞ, and the population heterogeneity (f ð; hÞ), from measurements of transcripts from across the cell population (qðn; hÞ)?
Of course, even though we may be able to infer the parameter h from experimental data, the expressionpðn; Þ is really a family of distributions, parameterised by q. This presents two fundamental challenges. The first is the possibility that there are different families of distributionspðn; Þ that can yield the same compound distribution,qðn; hÞ, but which are generated by different mechanisms, that is noise distributions, f ð; hÞ. The second, perhaps more subtle, challenge is that, even for a fixed family of distributionspðn; Þ it may be possible that different choices of the noise distribution f ð; hÞ could still yield the same compound distributionqðn; hÞ. In these situations, we cannot hope to infer a unique solution for the noise distribution. This belongs to the important class of identifiability problems Villaverde, Villaverde and Banga, 2017; it has important ramifications for the Table 1. Summary of the non-identifiability results. in lines 1, 3, and 5 are our contributions, while the remaining representations (lines 2 and 4) are known and can be obtained as special cases of our results. Note that here we use Teleðl; ; KÞ to denote a Telegraph distribution with parameters l; ; K. In lines 3 and 4, the parameter b>0 can be chosen freely and determines the mean burst intensity in the resulting compound system. In line 5, the parameters b; >0 are again mean burst intensities, and b can be chosen freely in the determination of the distribution of q.
interpretability of parameter estimates obtained from experimental data Ingram et al., 2008. Indeed, if two or more model parameterisations are observationally equivalent (in this case, in the form of the transcript abundance distributionqðn; hÞ), then not only does this cast doubt upon the suitability of the model to represent (and subsequently predict) the system, but it also obstructs our ability to infer mechanistic insight from experimental data. An example of the first identifiability problem arises from a well-known example of a compound distribution, (2): when f ð; hÞ is a gamma distribution andpðn; Þ is a Poisson distribution, corresponding to constitutive gene expression, the resulting compound distributionqðn; hÞ is a negative binomial distribution Greenwood and Yule, 1920. But this is the same distribution as that arising from instantaneously bursty gene expression Ham et al., 2020a. Such identifiability instances may be circumvented if there is confidence in the basic mode of gene activity, that is, if there is reason to believe that a gene is not constitutively active, for example. We find, however, that there are numerous instances that can present insurmountable identifiability problems.

Bursty gene expression
We first observe that any Telegraph distribution with fixed parameters can be identically obtained from a Telegraph distribution with parameter variation. As shown in the supplementary material (Appendix Pathway dynamics can delineate the sources of transcriptional noise in gene expression), any Telegraph distributionp T ðn; l; 0 ; K 0 Þ can be written as, where < 0 and f K 0 ðt; l þ ; 0 À Þ is the probability density function for a scaled beta distribution Beta K ðl þ ; 0 À Þ with support ½0; K 0 . Thus, any Telegraph distribution can be obtained by varying the transcription rate parameter on a narrower Telegraph distribution (i.e. with a smaller off-rate) according to a scaled beta distribution. Figure 2A

Instantaneously bursty gene expression
The previous result extends to instantaneously bursty systems. The copy number distribution of an instantaneously bursty system can be obtained from both bursty and instantaneously bursty dynamics, provided that there is appropriate parameter variation. The supplementary material contains the relevant derivations; refer to Appendix Pathway dynamics can delineate the sources of transcriptional noise in gene expression. In the following, we letp NB ðn; r; bÞ denote the probability mass function of a NegBinðr; b bþ1 Þ distribution, where b>0. Then for any negative binomial distribution of the form NegBinðl; b bþ1 Þ we have,p NB ðn; l; bÞ ¼ where f ðt; l þ ; bÞ is the probability density function of a Gammaðl þ ; bÞ distribution. This result generalises the aforementioned well-known representation of the negative binomial distribution Greenwood and Yule, 1920, which corresponds to the particular case of ¼ 0. In Figure 2A (middle panel), we compare the representation obtained in (4) with the corresponding fixed-parameter negative binomial distribution for two different sets of parameters.
We also obtain the following representation for a negative binomial distribution in terms of a scaled beta prime distribution, where f b ðb À 1; l À l 0 ; l 0 Þ is the probability mass function of a scaled beta prime BetaPrime b ðl À l 0 ; lÞ distribution, where b>0 and l>l 0 . This equivalently corresponds to scaled beta noise Beta b ðl À l 0 ; l 0 Þ on the inverse of the expected burst intensity. Thus, the distribution of any instantaneously bursty system with mean burst intensity b can be obtained from one with greater burst frequency, by varying the mean burst intensity q according to a shifted beta prime distribution. Figure 2A (bottom panel), compares the representation obtained in (5) with the associated fixed-parameter negative binomial distribution for two different sets of parameters.

An exception: constitutive expression
It has long been known Feller, 1943 that a compound Poisson distribution uniquely determines the compounding distribution. In the context of (2), this means the full extrinsic noise distribution f ð; hÞ is identifiable fromqðn; hÞ. As we will demonstrate in related work (Ham et al., in preparation) , it is therefore possible to extract the extrinsic noise distribution, f ð; hÞ, from transcript copy number measurements.

Implications for parameter inference
Estimates of kinetic parameters from experimental data suggest that gene expression is often either bursty or instantaneously bursty (i.e., ) l). In turn, the assumption that gene-inactivation events occur far more frequently than gene-activation events is often used to derive other models of stochastic mRNA dynamics Jia and Grima, 2020;Beentjes et al., 2020. The representations given in (3 -5), however, show that both estimating parameters and the underlying dynamics from the form of the copy number distribution alone can be misleading. Noise on the transcription rate will invariably produce copy number data that is suggestive of a more bursty model. To illustrate this, consider an example in which the underlying process is a (mildly) bursty Telegraph system with distributionp T ðn; 2; 3; KÞ. Now assume that noise on the transcription rate parameter K follows the scaled Beta distribution on the interval ½0; 100 with a ¼ l þ ¼ 2 þ 3 ¼ 5 and b ¼ 0 À ¼ 12 À 3 ¼ 9. The shape of this noise distribution closely resembles a slightly skewed Gaussian distribution, with the majority of transcription rates between around 10 and 60. This noise on the transcription rate K within the Telegraph systemp T ðn; 2; 3; KÞ will present identically to the significantly burstier systemp T ðn; 2; 12; 100Þ. It is of practical importance to recognise that, while the non-identifiability results (summarised in Table 1) are dependent on specific noise distributions, for practical purposes any similar distribution will produce a similar effect. To demonstrate this, we replace the various noise distributions required for the representations in (3 -5), with suitable normal distributions truncated at 0. In each case, we sample 1000 data points from the corresponding compound distribution, and compared this with the associated fixed-parameter copy number distribution. The results are shown in Figure 2B. The truncated normal distribution is not chosen on the basis of biological relevance, but rather to demonstrate that even a symmetric noise distribution (except for truncation at 0) produces qualitatively similar results to the distributions used in the precise non-identifiability results. In every case, the effect of varying the transcription rate or burst intensity parameter according to a unimodal noise distribution is to produce copy number distributions that are generally consistent with systems that appear burstier.
Finally, we note that our results explain previous empirical observations that static measurements of mRNA are not always sufficient to infer the underlying dynamics of gene activity. Skinner et al., 2016 address some of these limitations by quantifying both nascent and mature mRNA in individual cells, as well incorporating cell-cycle effects into their analysis of two mammalian genes. A more developed treatment of model identifiability is given in Gorin and Pachter, 2020a, where it shown how a stochastic model incorporating the downstream processing of mRNA can be used to distinguish a particular instance of non-identifiability. More specifically, the authors consider the non-identifiability problem noted in Ham et al., 2020a, arising from the Gamma-Poisson compound representation of the negative binomial distribution; a particular case of (4) above. Despite identical distributions at the nascent level, the marginal distributions of the processed (mature) mRNA are found to be substantially different. It is likely that a similar analysis will be valuable in the context of the other identifiability problems we have given in Table 1. In the next section, we take a more general approach to resolving non-identifibiality and exploit the properties of complex gene expression dynamics to determine not only the presence of extrinsic noise, but also estimate its magnitude.

Resolving non-identifiability
The results of the previous section show that additional information, beyond the observed copy number distribution, is required to constrain the space of possible dynamics that could give rise to the same distribution. One way to narrow this space of possibilities, is to determine the intrinsic and extrinsic contributions to the total variation in the system.

The dual-reporter method
The total gene expression noise, as measured by the squared coefficient of variation h 2 , can be decomposed exactly into a sum of intrinsic and extrinsic noise contributions Swain et al., 2002. The decomposition applies to dynamic noise Hilfinger and Paulsson, 2011, and generalises to higher moments in Hilfinger et al., 2012. Sets of dual reporters at multiple levels of the transcriptional pathway has been shown to achieve a finer breakdown of noise into subcategories Bowsher and Swain, 2012. As shown in Hilfinger and Paulsson, 2011, the noise decomposition is equivalent to the normalised Law of Total Variance (Ross, 2014). Indeed, if X is the random variable denoting the number of molecules of a certain species (e.g. mRNA or protein) in a given cell, then we can decompose the total noise by conditioning X on the state Z ¼ ðZ 1 ; . . . ; Z n Þ of the environmental variables Z 1 ,. . .,Zn, It has been shown Swain et al., 2002;Hilfinger and Paulsson, 2011 that if X 1 and X 2 are random variables denoting the expression levels of independent (conditional on Z) and identically distributed gene reporters, then the extrinsic noise contribution h 2 ext in (6) can be identified by the normalised covariance between X 1 and X 2 , Decomposing noise with non-identical reporters The dual-reporter method requires distinguishable measurements of transcripts or proteins from two conditionally independent and identically distributed reporter genes integrated into the same cell. In practice, however, dual reporters rarely have identical dynamics, which is widely considered to be a significant challenge to interpreting experimental results Quarton et al., 2020. We show that, under certain conditions, the decomposition in (6) can alternatively be obtained from non-identically distributed and not-necessarily independent reporters. Our result relies on the observation that the covariance of any two variables can be decomposed into the expectation of a conditional covariance and the covariance of two conditional expectations (the Law of Total Covariance Ross, 2014). If X and Y denote, for example, the numbers of molecules of two chemical species (eg. mRNA and protein) in a given cell, then the covariance of X and Y can be decomposed by conditioning on the state Z ¼ ðZ 1 ; . . . ; Z n Þ of the environmental variables Z 1 ,. . ., Z n , We will see that in many cases of interest the random variable EðX; ZÞ (as a function of Z) splits across common variables with EðY; ZÞ. By this we mean that EðX; ZÞ ¼ f ðZ X Þh X ðZ 0 Þ and EðY; ZÞ ¼ gðZ Y Þh Y ðZ 0 Þ, where Z X are the variables of Z that appear in EðX; ZÞ but not in EðY; ZÞ, and dually, Z Y are those in EðY; ZÞ that are not in EðX; ZÞ. The variables Z 0 are those variables from Z not in Z X [ Z Y . In these cases, the component of CovðX; YÞ that is contributed by the variation in Z (the extrinsic component) may be written as the covariance of the functions h X ðZ 0 Þ and h Y ðZ 0 Þ. Conveniently, in the cases of interest here, the two functions h X and h Y coincide, and this is the form we use in the following decomposition principle. The supplementary material (Appendix A) contains the proof of this result.

The noise decomposition principle (NDP)
Assume that there are measurable functions f , g, and h such that EðX; ZÞ and EðY; ZÞ split across common variables by way of EðX; ZÞ ¼ f ðZ X ÞhðZ 0 Þ and EðY; ZÞ ¼ gðZ Y ÞhðZ 0 Þ. Then, provided that the variables Z 1 ; . . . ; Z m are mutually independent, the normalised covariance of EðX; ZÞ and EðY; ZÞ will identify the total noise on hðZ 0 Þ (i.e. h 2 hðZ 0 Þ ). As we show in the next section, there are many situations where the random variable EðX; ZÞ is precisely the common part of EðY; ZÞ and EðX; ZÞ (i.e., hðZ 0 Þ ¼ EðX; ZÞ), and the normalised intrinsic contribution to the covariance is either zero or negligible. In these cases, the normalised covariance of X and Y will identify precisely the extrinsic noise contribution h 2 ext to the total noise h 2 X . To see this, consider the situation where EðY; ZÞ ¼ f ðZ Y ÞEðX; ZÞ. Then provided f ðZ Y Þ and ðX; ZÞ are independent random variables, the extrinsic contribution to the covariance of X and Y is given by, If the normalised intrinsic contribution to the covariance is either zero or is negligible, it follows from (8) that CovðX; YÞ EðXÞEðYÞ » VarðEðX; ZÞÞ Thus, under certain conditions, measuring the covariance between two non-identically distributed and not-necessarily independent reporters can replace dual reporters.

The pathway-reporter method
We show that for some reporters X and Y belonging to the same biochemical pathway, the covariance of X and Y continues to identify the extrinsic, and subsequently intrinsic, noise contributions to the total noise. The basis of the pathway-reporter method depends on the emergent covariances between the various species (e.g. nascent/mature mRNA and protein) in the gene expression pathway. Qualitatively, this effect can be seen in Figure 3, which compares simulated sample distributions of a simple four-stage model of gene transcription (refer to the model M 4 below) in the case of moderate extrinsic noise to the case with no extrinsic noise. The plots are the bivariate distributions for nascent-mature, nascent-protein, and mature-protein levels, respectively. This will be made more precise below, where we find that it is possible to extract quantitative information about the extrinsic noise distribution itself from this data. Here, nascent (unspliced) mRNA are shown in red/blue wavy lines; the blue segments represent introns and the red segments represent the exons. Nascent mRNA are synthesised at the rate K N , and spliced into mature mRNA (blue wavy lines) at rate K M . Degradation of the mature mRNA occurs at rate d M . The model M 2 is the wellknown two-stage model of gene expression. The model M 3 is the extension of the two-stage model to include promoter switching. The nodes A (active) and I (inactive) represent the state of the gene, with transitions between states occurring at rates l and m. The remaining parameters are the same as those in the model M 2 . The model M 4 extends the model M 3 by incorporating mRNA maturation. Here, K N is the transcription rate parameter, and K M is the maturation rate. All other parameters are the same as in M 3 . (B) Time series simulation of the copy number and activity state of a gene modelled by M 4 . For ease of visualisation, the parameters were artificially chosen as l ¼ 2, ¼ 2:5, K N ¼ 40, K M ¼ 4, K p ¼ 4 and d p ¼ 1, with all parameters scaled relative to d m ¼ 1. (C) As l approaches 0, we see a higher correlation in the copy numbers of nascent mRNA, mature mRNA and protein. Again, the parameters are artificially chosen to be l ¼ 0:1, ¼ 2:5, K N ¼ 80, K M ¼ 4, K p ¼ 4 and d p ¼ 1, with all parameters scaled relative to d m ¼ 1.
Throughout this section, we assume that extrinsic noise sources act independently i.e., the environmental variables Z 1 ; . . . ; Z n of Z are mutually independent. Additionally, our modelling focuses only on a single gene copy, although the same analysis applies to multiple but indistinguishable gene copies; we refer to the supplementary material (Appendix B) for more details.

Measuring noise from a constitutive gene
We consider first the simplest case where the underlying process is constitutive. We begin with a stochastic model of mRNA maturation, which we denote by M 1 ; Figure 4A (top left) gives a schematic of the constitutive model. In this model, the gene continuously produces nascent mRNA according to a Poisson process at constant rate K N , which are subsequently spliced into mature mRNA according at rate K M . Degradation of mature mRNA occurs as a first-order Poisson process with rate d M . The model M 1 , together with its extensions, has been considered in a number of recent studies Gorin and Pachter, 2020a;La Manno et al., 2018;Gorin and Pachter, 2020b; Bergen2020, and has a known solution for the stationary joint probability distribution Jahnke and Huisinga, 2007 given by, where n is the number of nascent mRNA, m is the number of mature mRNA, and the parameter ¼ ðK N ; K M ; d M Þ. We use X N to denote the number of nascent mRNA, X M the number of mature mRNA produced from the same constitutive gene, and Z ¼ ðK N ; K M ; d M Þ. To simplify notation, we abbreviate the variables in Z XN as Z N , and similarly for Z XM . It follows immediately from (11) that X N and X M are independent conditional on Z, and so the intrinsic contribution to the covariance of X N and X M (the first term of (8)) is 0. It is also easy to see from (11) that ðX N ; Since the extrinsic noise sources are assumed to act independently, it follows that the Noise Decomposition Principle (NDP) of the previous section holds. We then have that CovðX N ; X M Þ ¼ h 2 KN , where h 2 KN is the total noise on the transcription rate parameter K N . Thus, measuring CovðX N ; X M Þ can replace dual reporters to decompose the gene expression noise at the transcriptional level.
To support our mathematical results, we simulate the model M 1 subject to parameter variation using the stochastic simulation algorithm (SSA). Table 2 compares the extrinsic noise contributions found from various simulations with the corresponding theoretical values. In each simulation, the degradation rate d m is fixed at 1, with the other parameters scales accordingly. The maturation rate K M is sampled according to a Gammað8; 0:0125Þ distribution, which has coefficient of variation 0.125. We consider different noise distributions on K N , producing a range of noise strengths. Our theory predicts that pathway-reporters will identify the total noise on K N . Overall, we observe an excellent agreement between the results obtained by the pathway-reporter method, the dual-reporter method and the theoretical noise. There is consistently slightly more variation in the pathwayreporter results compared with the dual-reporter results.
To explore the pathway-reporter method more widely, we consider 60 different parameter combinations to produce a range of mean copy numbers consistent with those reported experimentally. We also consider different noise distributions taken from the scaled Beta distribution family in order to produce a range of noise strengths; refer to Supplementary file 1. The pathway-reporter method performs favourably compared to the dual-reporter method calculated from mature mRNA, and consistently outperforms the dual-reporter method on nascent mRNA.
Next, we consider the simplest stochastic model of gene expression that includes both mRNA and protein dynamics: the well-known 'two-stage model' of gene expression, which, together with its three-stage extension to include promoter switching has been widely studied Raser and O'Shea, 2005;Raj et al., 2006;Thattai and van Oudenaarden, 2001;Friedman et al., 2006;Singh and Hespanha, 2007;Shahrezaei and Swain, 2008a;Bokes et al., 2012;Molina et al., 2013. We denote this model by M 2 ; see Figure 4A (top right) for a schematic of this model. In this model, mRNA are synthesised according a Poisson process at rate K m , which are then later translated into protein at rate K p . Degradation of mRNA and protein occur as first-order Poisson processes with rates d m and d p , respectively. If X m denotes the number of mRNA, X p the number of proteins produced from the same constitutive gene, and if Z ¼ ðK m ; K p ; d m ; d p Þ, then the stationary means and covariance are given by Thattai and van Oudenaarden, 2001;Singh and Hespanha, 2007: It is easily verified that EðX p ; ZÞ ¼ f ðZ p ÞEðX m ; ZÞ, where f ðZ p Þ ¼ Kp dp . Thus, it follows from the NDP that the normalised contribution of CovðX m ; X p Þ contributed by Z will identify the extrinsic noise contribution to the total noise on X m . Now, if we assume that d m is fixed across the cell-population, and all parameters are scaled so that d m ¼ 1, we have the following expression for the intrinsic contribution to the covariance of X m and X p ; refer to the supplementary material (Appendix B) for details.
Since mRNA tends to be less stable than protein, we have d p <1, and often d p ( 1Bernstein et al. Schwanhäusser et al., 2011. So, we can expect a ( 1. Further, for many genes we can expect the number of mRNA per cell (K m ) to be in the order of tens, so 1=EðK m Þ<1. It follows that EðCovðX m ; X p ; ZÞÞ (~1, so that CovðX m ; X p Þ will closely approximate the extrinsic noise at the transcriptional level. We test our theory using stochastic simulations of the model M 2 subject to parameter variation. Table 3 gives a comparison of the results of the mRNA-protein reporters and dual reporters. In each case, we varied K p according to a Gammað5; 0:4Þ distribution and d p according to a Gammað8; 0:125Þ distribution; the corresponding noise strengths are 0.20 and 0.125, respectively. We consider different noise distributions on K m , which produce a range of noise strengths, and the noise distribution parameters are selected to produce a mean mRNA of approximately 50 and a mean number of approximately 1000 proteins in each simulation. As our theory predicts, the mRNA-protein reporters identify the extrinsic noise contribution to the total noise on X m . Again, we can see from Table 3 that there is excellent agreement between the results of the pathway reporters and the dual reporters, with slightly more variation in the pathway-reporter results. A larger exploration of the parameter space reveals similar results; these are provided in Supplementary file 1. Thus, despite mRNA and protein numbers not being strictly independent, they can, for practical purposes, replace dual reporters to decompose the noise at the transcriptional level. Here, PR (NM) gives the results of the nascent and mature pathway reporters, while DR (Mat) gives the results of dual reporters calculated from the mature mRNA. We considered noise on both the transcription rate (K N ) and the maturation rate (K M ). The decay rate is fixed at one, with the other parameters scaled accordingly. In each case, the maturation rate K M is varied according to a Gammað8; 1:25Þ distribution, which has coefficient of variation 0.125. The values given are the average of 100 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. Our theory predicts that pathway-reporters will identify the noise on the nascent transcription rate K N (h 2 ext ). The noise distribution parameters are chosen to produce an average nascent mRNA copy number of approximately five and an average mature mRNA copy number of approximately 50. The online version of this article includes the following source data for Table 2: Source data 1. This is an Excel spreadsheet containing the data used to produce the final values in Table 2.
We note that both the pathway-reporter (nascent-mature or mature-protein) and dual-reporter methods show slower convergence when copy numbers are low. Pathway reporters usually show fractionally slower convergence and fractionally more variation than a dual reporter, as suggested by the standard deviations in Tables 2 and 3. A more detailed exploration of convergence is given in the supplementary material (Appendix C).

Measuring noise from a facultative (bursty) gene
The most common mode of gene expression that is reported experimentally is burstiness Jones and Elf, 2018;Raj et al., 2006;Bahar Halpern et al., 2015;Suter et al., 2011;Larsson et al., 2019;Golding et al., 2005, in which mRNA are produced in short bursts with periods of inactivity in between. One example is gene regulation via repression, which naturally leads to periods of gene inactivity. Here, we consider a four-stage model of bursty gene expression, which incorporates both promoter switching and mRNA maturation; we denote this model by M 4 ; see Figure 4A (bottom left). This model has recently been considered in Cao and Grima, 2020, where the marginal distributions are solved in some limiting cases. In this model, the gene switches probabilistically between an active state (A) and an inactive state (I), at rates l (on-rate) and m (off-rate), respectively. In the active state, nascent mRNA is synthesised according to a Poisson process at rate K N , while in the inactive state transcription does not occur. Nascent mRNA is spliced into mature mRNA at rate K M ; this in turn is later translated into protein, with rate K P . Degradation of mRNA and protein occur independently of the promoter state at rates d M and d P , respectively.
For this model, we have three natural candidates for pathway reporters: (a) nascent and mature mRNA (b) mature mRNA and protein, and (c) nascent mRNA and protein reporters. Below we show that nascent mRNA-protein reporters yield consistently good estimates of the extrinsic noise contribution h 2 ext to the total gene expression noise, while nascent-mature and mature RNA-protein reporters are reliable in some restricted cases. We begin by showing that each of the reporter pairs (a), (b), and (c) satisfy the Noise Decomposition Principle. We then demonstrate computationally, that despite a lack of independence between these reporter pairs, the pathway-reporter method can still be used to decompose the total gene expression noise at the transcriptional level. Throughout, we let X N denote the number of nascent mRNA, we let X M denote the number of mature mRNA, and let X P denote the number of proteins produced from the same gene. We also let Z ¼ fl; ; K N ; K M ; K P ; d M ; d P g. Table 3. A comparison of the pathway-reporter method and the dual-reporter method for constitutive expression under the model M 2 . Here PR (MP) gives the results of the mRNA-protien pathway reporters, while DR (Mat) gives the results of dual reporters calculated from the mature mRNA. We considered noise on the transcription rate (K m ), the protein synthesis rate (K p ), and the protein decay rate (d p ). The mRNA decay rate is fixed at one. In each case, we varied K p according to a Gammað5; 0:4Þ distribution and d p according to a Gammað8; 0:125Þ distribution; the corresponding noise strengths are 0.20 and 0.125, respectively. We considered different noise distributions on K m , which produce a range of noise strengths. The noise distribution parameters are selected to produce a mean mRNA of approximately 50 and a mean number of approximately 1000 proteins in each simulation. The values given are the average of 100 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. As our theory predicts, the mRNA-protein reporters identify the noise on the transcription rate parameter K m (h 2 ext ). The online version of this article includes the following source data for Table 3: Source data 1. This is an Excel spreadsheet containing the data used to produce the final values in Table 3.
Following Raj et al., 2006, we assume that the transcription rate K N is large relative to the other parameters. We further assume that the maturation rate K M is large (i.e. K M >d M ), which is supported by experiments Cao and Grima, 2020. Then, using the results of Raj et al., 2006 and arguments similar to those in Cao and Grima, 2020, it can be shown that the stationary averages for the nascent mRNA, mature mRNA and protein levels are given by, respectively. The supplementary material (Appendix B) provides more detail on how these expressions can be obtained. We consider first the nascent-mature pathway reporters, case (a). From (14), it is easily seen that  (14) that EðX M ; ZÞ ¼ f ðZ P ÞEðX; ZÞ, where f ðZ P Þ ¼ KP dP . Thus, the NDP holds, and so the normalised covariance of EðX M ; ZÞ and EðX P ; ZÞ will identify the total noise on EðX M ; ZÞ (the extrinsic noise on X M ). For the nascent-protein reporters, case (c), it is easy to see that , where gðZ P Þ ¼ KP dM dP . Thus, again the NDP holds, and the normalised covariance of EðX N ; ZÞ and EðX P ; ZÞ will identify the noise on the transcriptional component K N l ðlþÞ . In order for the pathway-reporter method to provide a close approximation to the extrinsic noise in cases (a), (b), and (c), we require that the normalised intrinsic contribution to the covariance is either zero or negligible. This condition will hold provided there is sufficiently small correlation between the reporter pairs. In the case of (prokaryotic) mRNA and protein, this lack of correlation has been been verified experimentally in E. coli (Taniguchi et al., 2010). More generally, it is possible to provide an intuition about the conditions under which the lack of correlation might hold. The time series of copy numbers for each of nascent mRNA, mature mRNA and protein broadly follow each other, each with delay from its predecessor ( Figure 4B). Parameter values that reduce this delay will tend to increase correlation, and thereby increase the normalised intrinsic contribution to the covariance. The primary example of this effect is seen when d p approaches, or even exceeds d m (or for nascent-mature reporters, when d M approaches the maturation rate). A further contributor to high correlation between mRNA and protein, is when the system undergoes long timescale changes. In this situation, the copy numbers tend to drop to very low values for extended periods. The primary parameter influencing this type of behaviour is the active-rate l, specifically, when l tends to 0 ( Figure 4C). An illustrative example of this can be seen by considering a Telegraph system in the limit of slow switching, which produces a copy number distribution that converges to a scaled Poisson-Bernoulli compound distribution: even without any extrinsic noise, the pathway reporter method will identify the h 2 value of the corresponding scaled Bernoulli distribution.
An extensive computational exploration of the parameter space (Supplementary file 2) supports our intuition, though the strength of the effect varies across the three different reporter pairs. This is further corroborated by the heatmaps shown in Figure 5, which for three fixed values of m and a broad spectrum of d p and l values, give the intrinsic term h 2 int in (8), for fixed Z. Thus, the heatmaps provide an estimate for the overshoot error in the pathway-reporter approach. Note that blue pixels represent an overshoot estimate of less than 0.05.
For nascent-protein reporters, the normalised intrinsic contribution to the covariance is satisfactorily small (less than 0.05) except for (a) high values of d p in unison with (b) low values of l (less than 1, although lower values are acceptable if d p is small). The assumption (a) that d P <d M is known to be true for a large number of genes, and is justified by the difference in the mRNA and protein lifetimes. While there is of course variation across genes and organisms, values of d P 0:5d M and even d P 0:2d M seem reasonable for the majority of genes. In E. coli Taniguchi et al., 2010 andyeast Belle et al., 2006, for example, mRNA are typically degraded within a few minutes, while most proteins have lifetimes at the level of 10 s of minutes to hours. For mammalian genes Schwanhäusser et al., 2011, it has been reported that the median mRNA decay rate d M is (approximately) five times larger than the median protein decay rate d P , determined from 4200 genes. Assumption (b) requires that the gene is sufficiently active. In a recent paper by Larsson et al., 2019, the promoter-switching rates l, m, and the transcription rate K of the Telegraph model are estimated from single-cell data for over 7000 genes in mouse and human fibroblasts. Of those genes with mean mRNA levels greater than 5, we found that over 90% have a value for l of at least 0.5, and over 65% have a value for l greater than 1. In Cao and Grima, 2020, a comprehensive list of genes (ranging from yeast cells through to human cells) with experimentally inferred parameter values are sourced from across the literature (see Table 1 in . After scaling the parameter values of the 26 genes reported there, we find that around 88% have a value for l of at least 0.5, and approximately 58% have a value for l greater than 1. Thus, the heatmaps given in Figure 5 (top panel) suggest that nascent-protein reporters will provide a satisfactory estimate of the extrinsic noise level for a substantial fraction of genes.
The mature nRNA-protein reporters are less reliable, with the requirement of slow protein decay and higher on-rate being more pronounced than for the nascent mRNA-protein reporters; this is evident from Figure 5 (second panel). The performance of the nascent-mature reporter is of course independent of d p , but is only viable in the case of a large on-rate (see Figure 5-figure supplement 1). Figure 5. Heatmaps for the intrinsic contribution to the covariance. These heatmaps estimate the level of overshoot in the pathway-reporter approach for the nascent-protein and mature-protein reporters; blue regions show an overshoot of less than ' 0:05.
Here, the intrinsic contribution is calculated using stochastic simulations of the model M 4 . For the mature-protein and nascent-protein reporters, we consider three different values of the parameter m, specifically ¼ 2, ¼ 10, and ¼ 20. In all cases, the parameter d p and the on-rate l are varied between 0.01 and 0.5, and 0.5 and 5, respectively. The parameters of the model M 4 are scaled so that d M ¼ 1. The maturation rate is fixed at 20, with the parameters K N and K P chosen to produce a mean protein level of 1000, a mean nascent mRNA level of 5 and a mean mature mRNA level of 50. Each individual pixel is generated from a sample of size 3000, although there is still some instability in the convergence for the nascent-protein reporter, particularly as the overshoot estimation starts to increase, and particularly as m is larger. To produce more accurate values, the case of ¼ 2 was averaged over two full experiments while ¼ 20 was averaged over three. This was also done for the mature-protein reporter, however for these images there was almost no visible difference between the various runs of the experiment and their averages. Each of the three m values takes approximately 7-10 hr of computation, depending on lead in time before sampling within a simulation.  We test our approach for each of the pathway reporter pairs (a), (b), and (c) against dual reporters using stochastic simulations. Table 4 shows the results from six simulations across a spectrum of behaviours from moderately slow switching, to fast switching as well as a range of levels of burstiness. For each of the parameters ; l; K P ; d P , we selected a scaled Betað5; 6Þ distribution, with squared coefficient of variation h 2 ¼ 0:1; the scaling is chosen in each case to achieve a mean value equal to the parameter value given in Table 4. It is routinely verified that scaling these distributions does not change the value of h 2 . The parameter K N is given the noise distribution Betað3; 6Þ, which has a slightly higher coefficient of variation h 2 ¼ 0:2. In order to achieve direct benchmarking against the dual reporters, the parameter K M is fixed; we select K M ¼ 10. This is because the nascent-protein pathway reporter estimates noise on the value of K N l lþ , while the mature mRNA dual-reporter measures noise on KN KM l lþ , and these coincide only when K M is fixed. The mean values of K N and K P are chosen to achieve approximate average nascent mRNA levels, mature mRNA levels and protein levels at 5, 50, and 1000 respectively, given the chosen values of l; ; d P .
The results for the nascent mRNA-protein reporters, case (c), given in Table 4 show comparable performance to dual reporters, with only modest overshoot; even in the worst performing case of l ¼ 0:5, ¼ 1 the result of the pathway reporters is within one standard deviation, in a very tight distribution. The error heatmaps of Figure 5 provide an accurate estimate of the overshoot in the nascent-protein results in Table 4. As an example, the first row is most closely matched by the heatmap at top left of Figure 5, which at l ¼ 0:5 and d P ¼ 0:1 is suggestive of an error around the boundary between blue and red (around 0.06). The same accuracy is obtained for the other rows. As predicted, the mature-protein reporters show significantly more overshoot, especially with the less active genes. Improved accuracy can again be obtained by subtracting the estimated overshoot given in the error heatmaps from the obtained value. Thus for example, the error heatmap for ¼ 2 ( Figure 5 lower left) gives an error approximately 0.07 for l ¼ 1; d P ¼ 0:1, which agrees very closely to the actual overshoot of 0.07 shown in the corresponding row of Table 4. An overshoot of approximately 0.06 is suggested by the heatmap for ¼ 2, when l ¼ 2; d P ¼ 0:3, which leads to a correction from 0.35 in Table 4 to a value of 0.29. This is quite consistent with the dual reporter benchmark of 0.27. As expected (based on Figure 5-figure supplement 1), nascent-mature reporters do not perform well on bursty systems except for high l and so the values are not included in Table 4; only in Table 4. A comparison of the pathway-reporter method and dual-reporter method for bursty expression.
Here PR (NP) gives the results of the nascent and protein pathway reporters, PR (MP) gives the results of the mRNA and protein reporters, while DR (Mat) gives the results of the dual reporters calculated from the mature mRNA. We consider noise on all of the parameters except for d M and K M ; see discussion in main text. The values given are the average of 100 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. Our theory predicts that pathwayreporters will identify the noise at both the promoter level (l; ) and transcriptional level (K N ); the total extrinsic noise in each case is given by h 2 ext . As before, the noise distribution parameters are chosen to produce an average nascent mRNA copy number of 5 and an average mature mRNA copy number of 50, and an average number of 1000 proteins. The online version of this article includes the following source data for Table 4: Source data 1. This is an Excel spreadsheet containing the data used to produce the final values in Table 4. the case of l ¼ ¼ 10 does the result begin to approach the dual reporter value, returning 0:32 AE 0:03.

Generality of the pathway reporter method
To test the robustness of our pathway reporter approach, we validate our theoretical results on various other gene expression dynamics. (1): We begin by considering a more detailed model of the mRNA maturation process, where the nascent mRNA maturate after a fixed amount of time.
The assumption of a fixed maturation time is sometimes taken to approximate the combined effect of intermediate maturation processes such as initiation, elongation, and release Xu et al., 2016. More specifically, we consider the model M 4 above ( Figure 4D) in the case of constitutive expression (l ¼ 1, ¼ 0), and replacing the first-order reaction K M by a fixed interval of time. Additionally, we explore maturation times sampled from Erlang distributions, to account for the fact that maturation can involve several shorter stochastic processes. We find that the extrinsic noise contribution obtained using the nascent and mature mRNA reporters match closely to the true (dual reporter) values across a range of maturation times; refer to the supplementary material for details.
(2): Next we consider an extension of a model of transcriptional bursting given in Bartman et al., 2019;. The model we consider is the same as in Bartman et al., 2019;, however, is extended to include protein synthesis and degradation. This model captures the transcriptional process at a finer level of detail, and is argued in  to be the model most closely matching experimental observations. In this more nuanced 'multiscale' model, the gene stochastically switches between three states: two active states S 10 and S 11 , and one inactive state S 0 . The activation of the gene occurs in two steps, initially by the binding of transcriptional factors (transition from S 0 to S 10 at rate l 1 , and reversible at rate 1 ), and then as a secondary step, by the binding and pause of the mRNA polymerase (transition from S 10 to S 11 at rate l 2 ). Transitions from S 11 to S 0 also occur at rate 1 , due to detachment of both the transcriptional factors and polymerase. Transcription of nascent mRNA (at rate K N ) occurs only in state S 11 and results in immediate transition to state S 10 . Nascent mRNA maturate at rate K M , and are subsequently translated into protein at rate K p . Degradation of mRNA and protein occur with rates d m and d p , respectively. All reactions are considered to be first-order with exponentially distributed waiting times between successive reactions. A schematic of the model is given in Figure 6 (inner rectangle). In this case, we are again able to prove that the Noise Decomposition Principle holds for all reporter pairs taken from this pathway using existing formulae derived in Cao et al., 2020 for the marginal means. For details refer to the supplementary material (Appendix Convergence of Pathway and Dual Reporters).
(4): The cell cycle is a major contributing factor to extrinsic noise, introducing population heterogeneity (as cells are at different stages of the cell cycle), as well as internal dynamics to parameter values. Here we incorporate the salient features of the cell cycle into model (3), which is measurable as extrinsic noise by our methodology. Specifically, we model the effects of (i) gene replication, (ii) dosage compensation, (iii) binomial partitioning of products due to cell division, as well as (iv) cellcycle length variability. Refer to Figure 6 for a schematic. This model is an extension of that solved in Cao and Grima, 2020. Our model further incorporates the multi-scale dynamics of model (3) and the Erlang-distributed maturation times of model (1). As far as we are aware, this model has not been explored even by stochastic simulations before. A detailed description of how the above cellcycle effects are captured in our model is given as follows. (i) Replication results in a doubling of the gene from one to two at the replication time, t r . This replication occurs over a period which is much shorter than the length of the cell cycle, and we follow the assumptions in Cao and Grima, 2020 by considering it to occur instantaneously. (ii) Dosage compensation changes the rate at which the gene switches from inactive to active (l 1 ) upon replication at time t r . Again following Cao and Grima, 2020, the activation rate after replication is 70% of the rate prior to replication. (iii) Binomial partitioning of nascent mRNA, mature mRNA and protein at cell division. We assume that nascent mRNA, mature mRNA, and protein segregate into the two daughter cells, with independent probability 1=2. We follow just one of the daughter cells, chosen at random with equal probability. (iv) Cell-cycle length variability. The time between successive cell divisions is stochastic, and is assumed to be sampled from an Erlang distribution. This has been shown to be consistent with experimental data . The time to replication, and subsequently to cell division, are both chosen from an Erlang distribution with shape parameter 12, which produces a total cell cycle length distributed according to Erlangð24; lÞ; this matches the Erlangð24; lÞ cell cycle length in  where replication is at the exact halfway point. We similarly choose a rate parameter l ¼ l 1 chosen at a commensurate proportion to our mRNA decay rate of d ¼ 1.
In each of the cases (1 -4) above, we find that the correlations between reporter pairs is negligible, and the predicted contribution of extrinsic noise matches those obtained from the dual reporter method across a range of parameter combinations. Details of the simulation methods and results can be found in the supplementary material (Appendix Convergence of Pathway and Dual Reporters). In summary, the results show that our pathway reporter approach is remarkably insensitive to the specific dynamics of mRNA and protein synthesis. In particular, the correlations between reporter pairs do not strongly depend on the details of the gene expression model used. Figure 6. Multiscale model of transcriptional bursting with additional features of the cell cycle. In this model, the gene stochastically switches between three states: two active states, S 10 and S 11 , and one inactive state S 0 . Gene activation occurs in two steps, initially by the binding of transcription factors (at rate l 1 , reversible at rate 1 ), and then as a secondary step by the binding and pause of the mRNA polymerase (at rate l 2 ). Transitions from S 11 to S 0 also occur at rate 1 , due to detachment of both the transcriptional factors and polymerase. Transcription of nascent mRNA (at rate K N ) occurs only in state S 11 and results in immediate transition to state S 10 . Nascent mRNA mature at rate K M , and are subsequently translated into protein at rate K p . Degradation of mRNA and protein occur with rates d m and d p , respectively. We verify our pathway reporter method on three variations of the multiscale model. First, we assume all reactions are first-order Poisson processes (Case (2) in the main text). We then incorporate further details of the mRNA maturation process, where nascent mRNA occurs after a fixed amount of time (Case (3)). Finally, we incorporate features of the cell-cycle such as gene replication, dosage compensation, cell division, and cell-cycle length variability, as well as incorporating more realistic Erlang distributed maturation times (Case (4)).

Discussion
Despite the proliferation of experimental methods for single-cell profiling, the ability to extract transcriptional dynamics from measured distributions of mRNA copy numbers is limited. In particular, the multiple factors that contribute to mRNA heterogeneity can confound the measured distribution, which hinders analysis. Theoretical innovations that allow us to quantify and help in identifying the causes of these observable effects are therefore of great importance. In this work, we have demonstrated, through a series of mathematical results, that it is impossible to delineate the relative sources of heterogeneity from the measured transcript abundance distribution alone: multiple possible dynamics can give rise to the same distribution. Our approach involves establishing integral representations for distributions that are commonly encountered in single-cell data analysis, such as the negative binomial distribution and the stationary probability distribution of the Telegraph model. We show that a number of well-known representations can be obtained from our results. A particular feature of our non-identifiability results is that population heterogeneity inflates the apparent burstiness of the system. It is therefore necessary to collect further information, beyond measurements of the transcripts alone, in order to constrain the number of possible theoretical models of gene activity that could represent the system. In particular, additional work may be required to determine the true level of burstiness of the underlying system.
We have developed a theoretical framework for estimating levels of extrinsic noise, which can assist in resolving non-identifiability problems. The dual reporter method of Swain et al., 2002 already provides one such approach; but it is experimentally challenging to set up in many systems, and requires strictly identical and conditionally independent pairs of gene reporters. Our Noise Decomposition Principle directly generalises the theoretical underpinnings of the dual reporter method and related approaches Bowsher and Swain, 2012;Jetka et al., 2018; here we have used it to identify a practical approach-the pathway-reporter method-for obtaining an effective and experimentally more easily implementable noise decomposition. Our approach allows us to use measurements of two different species from the transcriptional pathway of a single gene copy instead of having to set up a more cumbersome dual reporter assay. The accuracy of the pathwayreporter method is provably identical for constitutive gene expression, and in the case of nascentmature mRNA reporters, the measurements are readily obtainable from current single-cell data Shah et al., 2018;La Manno et al., 2018;Skinner et al., 2016. For bursty systems, the method in general provides only an approximation of the extrinsic noise. We are, however, able to demonstrate computationally, that one of the proposed pathway reporters provides a satisfactory estimate of the extrinsic noise for most genes. The other pathway reporters also provide viable estimates of the extrinsic noise in some cases. We further validate our theoretical framework on synthetic data for genes with various underlying gene expression dynamics. Our results show that the pathway reporter method is independent of the specific dynamics of mRNA and protein synthesis, and therefore should be applicable to a broad range of experimental scenarios.
Despite the generality of our theoretical contribution, our pathway-reporter approach has some caveats. In particular, the approach relies on the assumption that extrinsic noise sources act independently. Experimentally, however, these may be correlated. For example, it has been suggested Hilfinger et al., 2016;Cole and Luthey-Schulten, 2017 that the transcription and translation rates in E. coli anticorrelate. Additional work is required to determine degree to which the independence of noise sources is a reasonable assumption.
Recent developments in single-cell profiling now allow simultaneous measurements of transcripts and proteins in thousands of single cells Stoeckius et al., 2017;Peterson et al., 2017;Reimegård et al., 2019. As discussed in Gorin and Pachter, 2020a, experimental improvements that would additionally allow measurements of nascent transcripts, coupled with theoretical developments such as those presented here, will allow for identification of noise sources on a genome-wide scale. Our work reveals that extrinsic noise distorts the multivariate copy number distribution of the different species in the gene expression pathway. We have exploited this to yield reliable estimates of noise strength, which we are confident will assist in setting better practices for model fitting and inference in the analysis of single-cell data. A more nuanced analysis of this multivariate distribution may offer even further insight into model and noise identifiability. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Determination of the marginal means for model M 4
In order to establish the Noise Decomposition principle in the case of bursty gene expression, we rely on expressions for the marginal stationary means of the model M 4 ; see Figure 4A (bottom left) and the associated caption for more details. To derive the marginal means for nascent and mature mRNA and for protein (Equation (14) of the main text), we first observe that the nascent mRNA population may treated identically to that of mRNA in general (that is, no distinction between nascent and mature), as in Peccoud and Ycart, 1995, except that mRNA decay is replaced by the sum of decay and maturation. As in the work of Cao and Grima, 2020, the assumption of fast maturation allows us to ignore decay completely in the nascent phase, so that the marginal distribution is identical to that of Peccoud and Ycart, 1995, except with decay replaced by maturation. This leads to a marginal nascent mRNA mean of The marginal means for mature mRNA and protein are derived in Raj et al., 2006 under the assumption that the transcription rate parameter K N is large relative to the other parameters. The expressions are given by Formally, the marginal means in Raj et al., 2006 are for the three-stage model M 3 , which ignores the downstream processing of mRNA, such as splicing. The assumption of fast maturation however, justifies the treatment of the nascent phase of mRNA as an ephemeral step within the Poissonian modelling of mRNA transcription.
There are a number of possibly compounding assumptions on the parameters here, but simulations show that there is a lot of tolerance, with even only moderate maturation and transcription still returning sample means consistent with the formulas.

Measuring noise from an instantaneously bursty gene
Here we use the simple stochastic model of Singh and Bokes, 2012 that includes both instantaneous transcriptional bursting and mRNA maturation. This model can be obtained as a limiting case of the model M 4 ( Figure 4A (top left) in the main text), where the off-rate has tended toward infinity, while the on-rate l remains fixed. Under this condition, transcription is rare enough to be considered instantaneous, leading to 'bursts' of transcriptional activity. The intensity of the bursts M is known to be geometrically distributed with mean burst size B ¼ KN . If we let X N and X M denote the number of nascent and mature mRNA, respectively, then according to Singh and Bokes, 2012, the steady-state marginal means and covariance are given by It is easily seen from (32) that EðX N ; ZÞ ¼ f ðZ N ÞBl and EðX M ; ZÞ ¼ gðZ M ÞBl, where f ðZ N Þ ¼ 1 KM and gðZ M Þ ¼ 1 dM . Thus, the Noise Decomposition Principle holds. Using arguments similar to those given in the above section, it is straightforward to derive the following expression for the error in the pathway reporter approach: We can expect that a » 1, so that unless the burst frequency is substantial, that is l ) 1, the overshoot in the the nascent-mature pathway-reporter approach is significant.

Convergence of pathway and dual reporters
Appendix 4-figure 1. Comparison of convergence of h 2 estimates for low and high mRNA levels by way of nascent-mature reporters and mature-mature reporters. Low level corresponds to mean nascent mRNA level of 0.5, and mean mature mRNA level of 5. High level corresponds to mean nascent mRNA level of 5, and mean mature mRNA level of 50. In both cases, the simulated genes are constitutive and noise is on all parameters except for d M ¼ 1. The green line gives the squared coefficient of variation for K N , set to 0.2, which is the value the various reporters are expected to estimate. (A) Convergence of the h 2 estimate over the first 2000 samples in the low-and highoutput genes. (B) Convergence of the h 2 estimate over 100; 000 samples in the low-output gene only.
Convergence of both pathway reporters and dual reporters in low output genes is slower than for high output genes, and this affect is most pronounced for reporters taken from nascent mRNA. Fig. Convergence of Pathway and Dual Reporters compares convergence in some low output genes and high output genes for nascent-mature pathway reporters and the mature-mature dual reporter, in the case of constitutive expression. From Equation (11) of the main text and the NDP, these should measure precisely the extrinsic noise on the transcription rate parameter K N . In the low output gene, both the dual reporter and pathway reporters are yet to show accurate measurement of the overall extrinsic noise after 1000 samples (Figure Convergence of Pathway and Dual ReportersA), although they are providing rough estimates after as little as a few hundred samples. Pathway reporters for nascent-mature typically exhibit slightly slower convergence than mature-mature dual reporter; the difference is marginal, but can be seen for both the high output gene (compare the values after 500 samples) and the low output gene (compare the values after 1000 samples). In this figure, all parameters (except the reference parameter d M ) were given scaled Beta distribution noise. The noise on K N is a Betað3; 6Þ distribution scaled to achieve EðK N Þ ¼ 5 in the low output case and EðK N Þ ¼ 50 in the high output case. The squared coefficient of variation is 0.2, and is shown in green in the figure.
Appendix 4-figure 2. Comparison of convergence for low and high mRNA levels by way of mature-protein and mature-mature reporters. Low level corresponds to mean nascent mRNA level of 0.5, and mean mature mRNA level of 5. High level corresponds to mean nascent mRNA level of 5, and mean mature mRNA level of 50. In both cases the simulated genes are constitutive and noise is on all parameters except for d M ¼ 1. The noise on K N has squared coefficient of variation equal to 0.2, which is shown as the red horizontal line. Our theory shows that mature-protein reporters will return an overshoot that is negligible in the high-output gene (the blue horizontal line), but larger in the low output gene ( Convergence for the mature-protein reporters is considered in Figure Convergence of Pathway and Dual Reporters. Here, Equation (13) of the main text shows that we should expect an overshoot in comparison to the mature-mature dual reporter. We have again compared a low output gene with a high output gene, and with all parameters (except d M ) experiencing noise. The transcription is again given scaled Betað3; 6Þ distribution, having squared coefficient of variation 0.2, with scaling to achieve EðK N Þ ¼ 5 in low output and EðK N Þ ¼ 50 in the high output. The noise distribution for d P is a Betað8; 6Þ distribution, scaled to achieve a mean value of 0.2. Computational sampling from 10 7 samples finds the value of  Figure Convergence of Pathway and Dual ReportersB shows the same over 20,000 samples; this time two low-output two high-output genes are considered, so that the variation around the expected long term value can be seen. It is evident that reasonable estimates are given after a relatively modest number of samples, but there is a very long delay to a highly accurate convergence for the pathway reporters in the low output gene.
given gene copy is exactly as in Case (1) above, but within the overall system described in Case (3). To capture the cell cycle, the SSA is now further modified to handle the two key sporadic changes: gene replication and cell division. After gene replication, we perform two independent modified SSA simulations, with common parameter values. The final copy numbers are obtained as a sum of those from each gene copy. At cell division, we follow just one daughter cell, selecting the inherited copy numbers by way of binomial partitioning from the mother cell values at the point of division (including the maturation register). The length of these cell-cycle phases is sampled from the chosen Erlang distribution, on commencement of the simulation of the phase. The results of pathway reporters for each of the Cases (1 -4) above are given in Appendix 5tables 1-8 below. In all cases, the values given are the average of 20 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. For each model, we consider two parameter sets, each without extrinsic noise (Subcase A) and with extrinsic noise (Subcase B). For Case (1), the model parameters are chosen to produce an average nascent mRNA copy number of 10, and an average number of 2000 proteins; the average mature mRNA copy number varies according to the maturation time. We find that all of the pathway reporters accurately predict the true extrinsic noise levels (as given by dual reporters) across a range of maturation times; refer to Appendix 5-table 1, 2. For Cases (2 -4), the model parameters are chosen to produce an average nascent mRNA copy number of 5 and an average mature mRNA copy number of 100, and an average number of 2000 proteins. We verify computationally that the mature-protein and nascentprotein reporter pairs provide accurate predictions of extrinsic noise contributions across a range of noise levels. For these results refer to Appendix 5-tables 3-8. We mention that the results of Case (4)B (Appendix 5-tables 8) may suggest a slight undershoot in the pathway reporter values in comparison to dual reporters. The difference is, however, within one standard deviation, and are calculated from a relatively small sample size due to the complexity of the simulation and corresponding run time. Investigating the presence and possible causes of this marginal effect for this model and other complicated models may be of interest in further work.
Appendix 5-table 1. A comparison of the pathway-reporter method and dual-reporter method for constitutive gene expression with Erlang-distributed maturation times (Case (1)A).
Here, PR (NP) gives the results of the nascent and protein pathway reporters, PR (MP) gives the results of the mRNA and protein reporters, while DR (Mat) gives the results of the dual reporters calculated from the mature mRNA. The maturation time T mat is chosen to be Erlang distributed with mean length 1=30; 0:05, and 0.1, respectively. We consider the rate parameters for the remaining exponentially distributed times to be constant, so that there is no extrinsic noise. The pathway-reporters correctly identify the zero extrinsic noise contribution. For each of the parameters K P ; d P we selected a scaled Betað5; 6Þ distribution, with squared coefficient of variation h 2 ¼ 0:1; the scaling is chosen in each case to achieve a mean value equal to the parameter value. The parameter K N is given the noise distribution Betað3; 6Þ, which has a slightly higher coefficient of variation h 2 ¼ 0:2. In order to benchmark against dual reporters, the maturation time was fixed in each case. The extrinsic noise contribution predicted by the pathway-reporters matches well with the dual reporter values. Continued on next page