Abstract
Gene expression is stochastic and displays variation (“noise”) both within and between cells. Intracellular (intrinsic) variance can be distinguished from extracellular (extrinsic) variance by applying the law of total variance to data from two-reporter assays that probe expression of identically regulated gene pairs in single cells. We examine established formulas [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): “Stochastic gene expression in a single cell,” Science, 297, 1183–1186.] for the estimation of intrinsic and extrinsic noise and provide interpretations of them in terms of a hierarchical model. This allows us to derive alternative estimators that minimize bias or mean squared error. We provide a geometric interpretation of these results that clarifies the interpretation in [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): “Stochastic gene expression in a single cell,” Science, 297, 1183–1186.]. We also demonstrate through simulation and re-analysis of published data that the distribution assumptions underlying the hierarchical model have to be satisfied for the estimators to produce sensible results, which highlights the importance of normalization.
1 Introduction
A gene can have different expression levels in living cells that have the same genetic material and are subject to the same environment (Stegle et al., 2015). During early development of an organism, distinct expression profiles eventually lead to formation of different tissues. Moreover, complex tissues such as the brain have many different subtypes of cells with different gene expression profiles. However, variation in expression between cells is reflective not only of distinct biological state, but also of stochasticity underlying many of the processes fundamental to the molecular biology of the cell.
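In a classic paper on the stochasticity of gene expression in single cells, Elowitz et al. (2002) introduced a clever two-reporter expression assay designed to tease apart “intrinsic” and “extrinsic” variation (also called “noise”) from the overall variability in gene expression: the intrinsic noise is the variation in the expression of the same gene in an identical environment, whereas the extrinsic noise is the variation in gene expression due to the cellular environment, which impacts all the genes at once. The idea is as follows: two identically regulated reporter genes (cyan fluorescent protein and yellow fluorescent protein) are inserted into individual E. coli cells, allowing for comparable expression measurements within and between cells. If n cells are assayed, this leads to expression measurements c1, … cn and y1, … yn, where the pair (ci, yi) represents the expression measurements for the cyan and yellow reporters in the ith cell. The goal of the experiment is to measure the variance in gene expression from the pairs (ci, yi) (denoted by $\eta_{tot}^2$) and to decompose it into an intrinsic part ($\eta_{int}^2$) and an extrinsic part ($\eta_{ext}^2$). Elowitz et al. (2002) proposed the following estimators (hereafter the ELSS estimators):
$$\eta_{int}^2 = \frac{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}(c_i - y_i)^2}{\bar{c}\,\bar{y}}, \tag{1}$$
$$\eta_{ext}^2 = \frac{\frac{1}{n}\sum_{i=1}^{n} c_i y_i - \bar{c}\,\bar{y}}{\bar{c}\,\bar{y}}, \tag{2}$$
$$\eta_{tot}^2 = \frac{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}(c_i^2 + y_i^2) - \bar{c}\,\bar{y}}{\bar{c}\,\bar{y}}, \tag{3}$$
where $\bar{c} = \frac{1}{n}\sum_{i=1}^{n} c_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means of the two reporters.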
Hilfinger and Paulsson (2011) later interpreted these estimates in terms of the “law of total variance” (explained in the next section), which sheds light on the statistical basis of the ELSS estimators but does not address questions about their statistical properties. In this paper, we derive the bias and mean squared error of the ELSS estimators and examine their optimality. We also examine the geometric and biological interpretation of the estimators.
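To make the ELSS formulas concrete, the following minimal sketch computes them from paired measurements. It assumes the commonly quoted form of the estimators (averages of (c − y)²/2, cy and (c² + y²)/2, each scaled by the product of the sample means); the function name `elss_noise` and the toy data are hypothetical and are not taken from the noise package or from any experiment.

```python
def elss_noise(c, y):
    """Return (intrinsic, extrinsic, total) noise in the ELSS form.

    Assumed form: each numerator is a sample average, and the common
    denominator is the product of the two sample means.
    """
    n = len(c)
    if n == 0 or n != len(y):
        raise ValueError("c and y must be non-empty and of equal length")
    c_bar = sum(c) / n
    y_bar = sum(y) / n
    scale = c_bar * y_bar  # common denominator of all three formulas
    intrinsic = sum(0.5 * (ci - yi) ** 2 for ci, yi in zip(c, y)) / n / scale
    extrinsic = (sum(ci * yi for ci, yi in zip(c, y)) / n - scale) / scale
    total = (sum(0.5 * (ci ** 2 + yi ** 2) for ci, yi in zip(c, y)) / n - scale) / scale
    return intrinsic, extrinsic, total


# Toy data, made up purely for illustration.
c = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 1.5, 3.5, 3.5]
eta_int, eta_ext, eta_tot = elss_noise(c, y)
```

Because (c − y)²/2 + cy = (c² + y²)/2 for every cell, the intrinsic and extrinsic values returned by this sketch always add up to the total.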
The processes that lead to the expression of the reporters (or genes in general) are much more complex than described here, e.g. the models described in the paper ignore the effects of translation. Many studies (e.g. Rausenberger and Kollmann 2008 and Komorowski et al. 2013) have developed detailed mathematical models for these processes. While some of our results may generalize and be relevant in more general settings, we restrict our analysis to the intrinsic and extrinsic noise as examined by Elowitz et al. (2002) and accessible via static reporter expression experiments. Analyses are implemented in the R package noise available on CRAN.
2 A hierarchical model
We begin by introducing a hierarchical model that provides a formal model for the experiments of Elowitz et al. (2002) and that provides insight into the numerators of (1,2,3). They are the key components of the Elowitz et al. (2002) formulas and can be viewed as estimators of true variances. We note that lower case letters such as ci and yi denote observations not only in the ELSS formulas but throughout our paper; we reserve uppercase letters for random variables.
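A hierarchical model for expression of the two reporters in a cell emerges naturally from the assumption that reporter expression, conditioned on the same cellular environment, is represented by independent and identically distributed random variables. To allow each cell to be different from the others, we introduce independent identically distributed random variables Zi, for i = 1, …, n that represent the environments of cells [as in Hilfinger and Paulsson (2011)]. Consistent with Elowitz et al. (2002), we posit that the cellular conditional random variables associated to the two reporters have the same distribution F with mean Mi and variance $\Sigma_i^2$, both determined by Zi:
$$C_i \mid Z_i \sim F(M_i, \Sigma_i^2) \tag{4}$$
and
$$Y_i \mid Z_i \sim F(M_i, \Sigma_i^2). \tag{5}$$
Thinking of a two reporter experiment as “random,” in the sense that the states of cells Z1, … Zn are random, across cells we have
$$M_i \sim G(\mu, \sigma_\mu^2)$$
and
$$\Sigma_i^2 \sim H(\sigma^2, \epsilon),$$
where G is the distribution of all the Mis, with mean μ and variance $\sigma_\mu^2$, and H is the distribution of all the $\Sigma_i^2$, with mean σ2 and variance ϵ.
For any i, the mean of Ci or Yi is μ, according to the following calculation:
$$E[C_i] = E_{Z_i}\big[E_{C_i \mid Z_i}[C_i \mid Z_i]\big] = E[M_i] = \mu.$$
The total variance in Ci (or Yi) can be calculated using the “law of total variance”:
$$\mathrm{Var}[C_i] = E_{Z_i}\big[\mathrm{Var}_{C_i \mid Z_i}[C_i \mid Z_i]\big] + \mathrm{Var}_{Z_i}\big[E_{C_i \mid Z_i}[C_i \mid Z_i]\big]. \tag{7}$$
Using the notation of the hierarchical model described above, and dropping the subscripts for expectation because they are clear by context, we have, for any i,
$$E[\mathrm{Var}[C_i \mid Z_i]] = E[\Sigma_i^2] = \sigma^2, \qquad \mathrm{Var}[E[C_i \mid Z_i]] = \mathrm{Var}[M_i] = \sigma_\mu^2.$$
With this notation equation (7) becomes
$$\mathrm{Var}[C_i] = \sigma^2 + \sigma_\mu^2. \tag{9}$$
This means that the marginal (unconditional) distributions of Ci and Yi are identical:
$$C_i \sim F'(\mu, \sigma^2 + \sigma_\mu^2), \qquad Y_i \sim F'(\mu, \sigma^2 + \sigma_\mu^2), \tag{10}$$
where the marginal distribution F′ may or may not be the same as the conditional distribution F.
In the next sections, we will derive the estimators for extrinsic and intrinsic noise, and examine the bias and MSE of each estimator. Specifically, for any estimator S, the MSE of S with respect to the true parameter τ is calculated as follows:
$$\mathrm{MSE}(S) = E[(S - \tau)^2] = \mathrm{Var}[S] + (E[S] - \tau)^2,$$
where E[S] − τ is the bias of S.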
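The hierarchical model can be illustrated with a short simulation. The sketch below assumes, for illustration only, that both G and F are normal and that the within-cell variance is constant across cells (ϵ = 0); the parameter names `var_ext` and `var_int` are hypothetical labels for the extrinsic and intrinsic noise. Under these assumptions the sample variance of one reporter should be close to the sum of the two noise components, and the sample covariance between reporters close to the extrinsic component.

```python
import random


def simulate_cells(n, mu, var_ext, var_int, seed=0):
    """Draw n cells: a cell-specific mean from G, then two reporters from F."""
    rng = random.Random(seed)
    c, y = [], []
    for _ in range(n):
        m = rng.gauss(mu, var_ext ** 0.5)       # cell-specific mean M_i (from G)
        c.append(rng.gauss(m, var_int ** 0.5))  # C_i given the cell state
        y.append(rng.gauss(m, var_int ** 0.5))  # Y_i given the cell state
    return c, y


def sample_cov(a, b):
    """Unbiased sample covariance (sample variance when a is b)."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


c, y = simulate_cells(n=20000, mu=1.0, var_ext=0.8, var_int=0.7)
total_var_c = sample_cov(c, c)  # expected near 0.7 + 0.8
cov_cy = sample_cov(c, y)       # expected near 0.8
```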
3 Extrinsic noise
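To examine estimators for extrinsic noise, we start with the law of total variance, noting that the between-cell variability Var[E[Ci|Zi]], i.e. the extrinsic noise $\sigma_\mu^2$, can be written as:
$$\mathrm{Var}[E[C_i \mid Z_i]] = \mathrm{Cov}[C_i, Y_i]. \tag{11}$$
This holds because Ci and Yi are conditionally independent given Zi and share the conditional mean Mi, so that Cov[Ci, Yi] = Cov[E[Ci|Zi], E[Yi|Zi]] = Var[Mi]. This connection between the extrinsic noise, the law of total variance and the covariance of Ci and Yi was noted in Hilfinger and Paulsson (2011).
Formula (11) leads to the following unbiased estimator for the extrinsic noise, as it is an unbiased estimator for the covariance:
$$\frac{1}{n-1}\left(\sum_{i=1}^{n} c_i y_i - n\,\bar{c}\,\bar{y}\right).$$
We note that the ELSS estimator (2) uses the scalar 1/n, which, unlike the case of the intrinsic noise estimator (1), leads to a biased estimator in this case.
In order to find the estimator that minimizes the MSE, we consider the following general estimator:
$$S_{ext} = \frac{1}{a}\left(\sum_{i=1}^{n} C_i Y_i - n\,\bar{C}\,\bar{Y}\right).$$
We assume that Mi is normal and that μ = 0 and ϵ = 0. The MSE of Sext is
$$\mathrm{MSE}(S_{ext}) = \frac{n-1}{a^2}\left[(\sigma^2 + \sigma_\mu^2)^2 + \sigma_\mu^4\right] + \left(\frac{n-1}{a} - 1\right)^2 \sigma_\mu^4,$$
which is minimized when
$$a = n + \frac{(\sigma^2 + \sigma_\mu^2)^2}{\sigma_\mu^4} = n + \frac{1}{\rho^2}. \tag{12}$$
The last step in (12) is due to Equations (9), (10) and (11):
$$\rho = \frac{\mathrm{Cov}[C_i, Y_i]}{\sqrt{\mathrm{Var}[C_i]\,\mathrm{Var}[Y_i]}} = \frac{\sigma_\mu^2}{\sigma^2 + \sigma_\mu^2}. \tag{13}$$
It is interesting to note that (12) comprises two parts: the first, n, is the sample size, and the second, 1/ρ2, is the inverse squared correlation between the two reporters.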
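The three scalings of the extrinsic noise estimator can be compared side by side. This is a minimal sketch, not the interface of the noise package: the function name `extrinsic_estimators` is hypothetical, and the min-MSE scaling a = n + 1/ρ² is our reading of (12) under the normality assumptions stated above. When no true correlation is supplied, the sketch plugs in the sample correlation, which is an additional assumption.

```python
def extrinsic_estimators(c, y, rho=None):
    """Unbiased, min-MSE and ELSS-scaled estimates of the extrinsic noise.

    rho: correlation between reporters; if None, the sample correlation
    is plugged in (requires n >= 2 and a nonzero correlation).
    """
    n = len(c)
    c_bar, y_bar = sum(c) / n, sum(y) / n
    # Numerator shared by all three estimators.
    cross = sum(ci * yi for ci, yi in zip(c, y)) - n * c_bar * y_bar
    if rho is None:
        ss_c = sum((ci - c_bar) ** 2 for ci in c)
        ss_y = sum((yi - y_bar) ** 2 for yi in y)
        rho = cross / (ss_c * ss_y) ** 0.5
    a = n + 1.0 / rho ** 2  # assumed min-MSE scaling
    return {
        "unbiased": cross / (n - 1),
        "min_mse": cross / a,
        "elss": cross / n,  # numerator of the ELSS formula, scaled by 1/n
    }


# Toy data, made up purely for illustration.
c = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 1.5, 3.5, 3.5]
est = extrinsic_estimators(c, y)
```

Since a is always larger than n, the min-MSE value is the most strongly shrunk of the three.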
4 Intrinsic noise
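Also starting with the law of total variance, the within-cell variability E[Var[Ci|Zi]] can be written as
$$E[\mathrm{Var}[C_i \mid Z_i]] = \frac{1}{2} E[(C_i - Y_i)^2]. \tag{14}$$
This leads to the following unbiased estimator for the intrinsic noise:
$$\frac{1}{2(n-1)}\left[\sum_{i=1}^{n} (c_i - y_i)^2 - n(\bar{c} - \bar{y})^2\right].$$
To find the estimator that minimizes the MSE, we consider estimators of the following general form
$$S_{int} = \frac{1}{2a}\left[\sum_{i=1}^{n} (C_i - Y_i)^2 - n(\bar{C} - \bar{Y})^2\right].$$
Assuming normality of the distribution G (i.e. cell-specific means Mi follow a normal distribution), as well as μ = 0 and ϵ = 0, the MSE is given by
$$\mathrm{MSE}(S_{int}) = \sigma^4\left[\frac{2(n-1)}{a^2} + \left(\frac{n-1}{a} - 1\right)^2\right].$$
The value of a that minimizes this expression is
$$a = n + 1.$$
See Appendices A and C for the complete derivation.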
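The analysis above can be simplified with an additional assumption, namely that $\bar{c} = \bar{y}$. The general estimator then reduces to
$$\tilde{S}_{int} = \frac{1}{2a}\sum_{i=1}^{n} (C_i - Y_i)^2.$$
The unbiased estimator with this form is easily derived by observing that
$$E\left[\sum_{i=1}^{n} (C_i - Y_i)^2\right] = 2n\sigma^2.$$
Thus, in order for $\tilde{S}_{int}$ to be unbiased, we need a = n, which recovers the numerator of the ELSS estimator (1).
In order to study the mean squared error and derive an estimator that minimizes it, we again assume normality of G. Assuming again that μ = 0 and ϵ = 0, the MSE of $\tilde{S}_{int}$ simplifies to
$$\mathrm{MSE}(\tilde{S}_{int}) = \sigma^4\left[\frac{2n}{a^2} + \left(\frac{n}{a} - 1\right)^2\right],$$
which is minimized when a = n + 2 (see Appendices A and D for the complete derivation).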
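The intrinsic noise estimators differ only in whether the difference of sample means is subtracted and in the divisor. The sketch below is illustrative: the function name `intrinsic_estimators` is hypothetical, the equal-mean min-MSE divisor n + 2 is the one stated above, and the divisor n + 1 for the general min-MSE form is an assumption based on the analogous calculation under normality.

```python
def intrinsic_estimators(c, y):
    """Return (general, equal_mean) dictionaries of intrinsic noise estimates."""
    n = len(c)
    d = [ci - yi for ci, yi in zip(c, y)]  # within-cell differences
    d_bar = sum(d) / n                     # equals c_bar - y_bar
    ss = sum(di ** 2 for di in d)
    centered = ss - n * d_bar ** 2         # removes the difference of means
    general = {
        "unbiased": centered / (2 * (n - 1)),
        "min_mse": centered / (2 * (n + 1)),  # assumed divisor n + 1
        "asymptotic": centered / (2 * n),
    }
    equal_mean = {
        "unbiased_elss": ss / (2 * n),        # numerator of the ELSS formula
        "min_mse": ss / (2 * (n + 2)),
    }
    return general, equal_mean


# Toy data, made up purely for illustration.
c = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 1.5, 3.5, 3.5]
general, equal_mean = intrinsic_estimators(c, y)
```

When the two sample means coincide, as in this toy data, the general and equal-mean numerators are identical and only the divisors differ.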
5 Geometric interpretation
Figure 3A of Elowitz et al. (2002) shows a scatterplot of data (ci, yi) for an experiment and suggests thinking of intrinsic and extrinsic noise geometrically in terms of projection of the points onto a pair of orthogonal lines. While this geometric interpretation of noise agrees exactly with the ELSS intrinsic noise formula, the interpretation of extrinsic noise is more subtle. Here we complete the picture.
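To understand the intuition behind Figure 3A in Elowitz et al. (2002), we have redrawn it in a format that highlights the math (Figure 1). The projection of a point (ci, yi) onto the line y = c is the point $\left(\frac{c_i + y_i}{2}, \frac{c_i + y_i}{2}\right)$, so the squared distance from the point to the line is $\frac{1}{2}(c_i - y_i)^2$, which is the summand in the numerator of the ELSS intrinsic noise formula (1).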
The ELSS estimate for the extrinsic noise is the sample covariance. Intuitively, it indicates how the measurements of one reporter track those of the other across cells. The geometric meaning of the sample covariance in Figure 1 is based on an alternative formulation of sample covariance (Hayes, 2011):
$$\frac{1}{n}\sum_{i=1}^{n} c_i y_i - \bar{c}\,\bar{y} = \frac{1}{n^2}\sum_{i<j} (c_i - c_j)(y_i - y_j).$$
This formulation of the sample covariance has the interpretation of being an average of the signed area of triangles associated to pairs of points. Figure 1 illustrates these signed triangles using a randomly selected point (the blue point). This formulation is very different from what might be considered at first glance an appropriate analogy to intrinsic noise, namely the sample variance along the line y = c.
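The pairwise formulation can be checked numerically. The sketch below compares the usual form of the sample covariance with the sum over pairs of points, both with the 1/n scaling used by the ELSS estimate (the unbiased version differs only by the factor n/(n − 1)); the function names and the toy data are hypothetical.

```python
from itertools import combinations


def cov_elss_scaling(c, y):
    """Sample covariance with the 1/n scaling: mean(c*y) - mean(c)*mean(y)."""
    n = len(c)
    return sum(ci * yi for ci, yi in zip(c, y)) / n - (sum(c) / n) * (sum(y) / n)


def cov_pairwise(c, y):
    """Same quantity written as a sum over unordered pairs of points.

    Each term (ci - cj) * (yi - yj) is twice the signed area of the right
    triangle whose hypotenuse joins the two points.
    """
    n = len(c)
    total = sum((ci - cj) * (yi - yj)
                for (ci, yi), (cj, yj) in combinations(zip(c, y), 2))
    return total / n ** 2


# Toy data, made up purely for illustration.
c = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 1.5, 3.5, 3.5]
direct = cov_elss_scaling(c, y)
pairwise = cov_pairwise(c, y)
```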
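An alternative estimate for the extrinsic noise based on the sample variance of the projected points along the line y = c (using the projected centroid as the mean, which is shown as the green point in Figure 1) turns out to be biased by an amount equal to the total noise. This sample variance averages the squared distances of the data points from the centroid (green point) after projection onto the line y = c; see the distance between the red and green points in Figure 1. Since the coordinate of a projected point along the line y = c is $(c_i + y_i)/\sqrt{2}$ and, writing σ2 for the intrinsic noise and $\sigma_\mu^2$ for the extrinsic noise,
$$\mathrm{Var}\left[\frac{C_i + Y_i}{\sqrt{2}}\right] = \frac{1}{2}\left(\mathrm{Var}[C_i] + \mathrm{Var}[Y_i]\right) + \mathrm{Cov}[C_i, Y_i] = \sigma^2 + 2\sigma_\mu^2,$$
the bias is
$$(\sigma^2 + 2\sigma_\mu^2) - \sigma_\mu^2 = \sigma^2 + \sigma_\mu^2,$$
which is the true total noise.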
The above calculation also shows that if the intrinsic and extrinsic noise are both estimated as variances along the projections to the lines y = −c and y = c respectively, then the total noise will be overestimated by a factor of two.
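The factor of two can be verified in one line. The following is a sketch in our own notation: σ² and σ_μ² stand for the true intrinsic and extrinsic noise, and the coordinates along the lines y = c and y = −c are taken to be (C_i + Y_i)/√2 and (C_i − Y_i)/√2, respectively.

```latex
\begin{align*}
\mathrm{Var}\!\left[\frac{C_i + Y_i}{\sqrt{2}}\right]
  + \mathrm{Var}\!\left[\frac{C_i - Y_i}{\sqrt{2}}\right]
&= \tfrac{1}{2}\bigl(\mathrm{Var}[C_i] + \mathrm{Var}[Y_i] + 2\,\mathrm{Cov}[C_i, Y_i]\bigr)
 + \tfrac{1}{2}\bigl(\mathrm{Var}[C_i] + \mathrm{Var}[Y_i] - 2\,\mathrm{Cov}[C_i, Y_i]\bigr) \\
&= \mathrm{Var}[C_i] + \mathrm{Var}[Y_i]
 = 2\,(\sigma^2 + \sigma_\mu^2).
\end{align*}
```

The covariance terms cancel, so the two projected variances always add up to twice the total noise of a single reporter.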
In summary, the caption to Figure 3A in Elowitz et al. (2002) is completely accurate in stating that “Spread of points perpendicular to the diagonal line on which CFP and YFP intensities are equal corresponds to intrinsic noise, whereas spread parallel to this line is increased by extrinsic noise.” However the geometric interpretation of covariance makes it precise how an increase in extrinsic noise relates to the spread of points in the direction of the line y = c.
6 Practical considerations
6.1 Optimal estimators for intrinsic and extrinsic noise
We have derived the estimators that are optimal for minimizing bias or the MSE (summarized in Table 1). The ELSS estimator in (1) is in fact a special case of the general estimator under the assumption that $\bar{c} = \bar{y}$.
Similar to the estimators for the intrinsic noise, we derived two estimators for extrinsic noise, optimized for bias and for MSE respectively (Table 1).
Exact estimator for small n | Large n | ||
---|---|---|---|
Minimizing bias (Unbiased) | Minimizing MSE | ||
Intrinsic noise | |||
General | |||
Assuming | (ELSS estimator) | (ELSS estimator) | |
Extrinsic noise | |||
General | where | (ELSS estimator) |
The sample size n is the leading term in the denominator of all the optimal (in either the bias or MSE sense) intrinsic and extrinsic noise estimators. As a result, the unbiased estimator has the same form as the min-MSE estimator for large n (Table 1). For extrinsic noise, the general estimators converge to the ELSS estimate (Table 1). The mean and variance of the estimators are summarized in Table 6 in Appendix E. For intrinsic noise, assuming $\bar{c} = \bar{y}$, the ELSS estimator is both the unbiased estimator and the large-n estimator (Table 1).
As a general rule we recommend computing the inverse squared correlation between the ci and yi values and applying the min-MSE estimators when the sample size is small (e.g. much less than 50).
It is worth pointing out that the correction factor 1/a in the min-MSE estimators tends to be smaller than that in the unbiased estimators (1/(n − 1)) and the asymptotic estimators (1/n; Table 1). This smaller correction 1/a makes the min-MSE estimators “shrinkage” estimators, such that they achieve better MSE despite being biased, just like the James–Stein estimator (James and Stein, 1961). Our simulation results confirm this point (Table 2). However, using the sample correlation, instead of the true one, in our min-MSE estimators leads to increased MSE, although the estimates with the sample correlation do not differ much on average from those with the true correlation.
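The shrinkage effect can be reproduced with a small simulation in the spirit of Table 2. This is a sketch under stated assumptions rather than a reproduction of the published simulation: both G and F are taken to be normal, the min-MSE scaling is taken to be a = n + 1/ρ² with the true correlation, and 4000 data sets are used instead of 500 so that the comparison is stable for a fixed seed. The function name `simulate_mse` is hypothetical.

```python
import random


def simulate_mse(n=50, n_sets=4000, mu=1.0, var_ext=0.8, var_int=0.7, seed=1):
    """Monte Carlo MSE of the unbiased and min-MSE extrinsic noise estimators."""
    rng = random.Random(seed)
    rho = var_ext / (var_ext + var_int)  # true correlation between reporters
    a = n + 1.0 / rho ** 2               # assumed min-MSE scaling
    sq_err = {"unbiased": 0.0, "min_mse": 0.0}
    for _ in range(n_sets):
        c, y = [], []
        for _ in range(n):
            m = rng.gauss(mu, var_ext ** 0.5)       # cell-specific mean
            c.append(rng.gauss(m, var_int ** 0.5))  # first reporter
            y.append(rng.gauss(m, var_int ** 0.5))  # second reporter
        c_bar, y_bar = sum(c) / n, sum(y) / n
        cross = sum(ci * yi for ci, yi in zip(c, y)) - n * c_bar * y_bar
        sq_err["unbiased"] += (cross / (n - 1) - var_ext) ** 2
        sq_err["min_mse"] += (cross / a - var_ext) ** 2
    return {name: total / n_sets for name, total in sq_err.items()}


mse = simulate_mse()
```

With these settings the min-MSE estimator should show the smaller simulated MSE, at the cost of a downward bias.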
Simulation parameters | |
---|---|
Sample size (n) | 50 |
Intrinsic noise (σ²) | 0.7 |
Extrinsic noise (σ_μ²) | 0.8 |
Distribution of means (G) | N(1, 0.8) |
Distribution of vars (H) | Constant: σ² = 0.7 |
Distribution of Ci|Zi | N(Mi, 0.7) |
Distribution of Yi|Zi | N(Mi, 0.7) |
No. of data sets | 500 |
Extrinsic noise estimate | |
Unbiased | 0.80 (0.25; 0.0604) |
minMSE (true corr) | 0.73 (0.23; 0.0552) |
minMSE (sample corr) | 0.73 (0.24; 0.0634) |
Asymptotic/ELSS | 0.78 (0.06; 0.0582) |
6.2 Data normalization
Our hierarchical model, as well as the ANOVA interpretation, is consistent with the model in Elowitz et al. (2002); both models assume that within each cell there are two distributions for the expression of the two reporter genes and that they have the same true mean and true variance. With the normality assumption, this means that the two reporters have identical distributions. Elowitz et al. measured the single-color distributions of strains that contained lac-repressible promoter pairs, which verified that this was a reasonable assumption in the case of cyan fluorescent protein (CFP) and yellow fluorescent protein (YFP) in their experiment. We also performed simulations under the hierarchical model, with and without identical distribution for the two reporters, and summarized the results in Table 3. Estimates of intrinsic and extrinsic noise are the same as the truth when the identical distribution assumption applies. When this assumption is not satisfied, the theory breaks down and it is unclear what the estimates mean.
Quantity | Identical distribution | Different distributions |
---|---|---|
Simulation parameters | | |
Sample size (n) | 1000 | 1000 |
Intrinsic noise (σ²) | 0.7 | 0.7 |
Extrinsic noise (σ_μ²) | 0.8 | 0.8 |
Distribution of means (G) | N(1, 0.8) | N(1, 0.8) |
Distribution of vars (H) | Constant: σ² = 0.7 | Constant: σ² = 0.7 |
Distribution of Ci|Zi | N(Mi, 0.7) | N(Mi, 0.7) |
Distribution of Yi|Zi | N(Mi, 0.7) | N(2Mi, 1.5 × 0.7) |
No. of data sets | 500 | 500 |
Sample correlation | 0.53 (0.02) | 0.60 (0.02) |
Intrinsic noise estimates | | |
General: unbiased | 0.70 (0.03) | 1.54 (0.07) |
General: minMSE | 0.70 (0.03) | 1.54 (0.07) |
General: asymptotic | 0.70 (0.03) | 1.54 (0.07) |
Equal mean: unbiased/ELSS | 0.70 (0.03) | 2.04 (0.08) |
Equal mean: minMSE | 0.70 (0.03) | 2.04 (0.08) |
Equal mean: asymptotic/ELSS | 0.70 (0.03) | 2.04 (0.08) |
Extrinsic noise estimates | | |
Unbiased | 0.80 (0.06) | 1.60 (0.10) |
minMSE | 0.80 (0.06) | 1.59 (0.10) |
Asymptotic/ELSS | 0.80 (0.06) | 1.60 (0.10) |
Extrinsic / (intrinsic + extrinsic) | | |
General | 0.53 | 0.51 |
Equal mean | 0.53 | 0.44 |
Other studies have adapted this system and used other reporter combinations that may have markedly different distributions. For example, Yang et al. (2014) used CFP and mCherry with vastly different ranges of intensity values: whereas CFP varied from 0 to 6000 (arbitrary units; i.e. a.u.), mCherry could vary from 0 to 9000 (a.u.); see Figure 3A from their paper. In contrast, Schmiedel et al. (2015) normalized the two reporters used in their experiment (ZsGreen and mCherry) to have the same mean. However, the variances, or more generally, the two distributions, also need to be the same. Since the decomposition of the total noise depends on the assumption that both reporters in the same cellular environment have similar variance (see equations 4 and 5), we recommend that in general a quantile normalization which normalizes the reporter measurements to identical distributions be performed before the calculations of noise components. Such a normalization procedure is standard in many settings requiring similar assumptions.
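One simple way to quantile normalize two reporters against each other is sketched below. It is an illustration under our own assumptions, not the procedure of any particular package: the reference distribution is taken to be the average of the two sorted samples, ties are broken by original order, and the function name `quantile_normalize_pair` is hypothetical.

```python
def quantile_normalize_pair(c, y):
    """Map both reporters onto a common reference distribution.

    The reference is the average of the two sorted samples; each value is
    replaced by the reference value that has the same rank.
    """
    n = len(c)
    if n != len(y):
        raise ValueError("c and y must have the same length")
    reference = [(a + b) / 2 for a, b in zip(sorted(c), sorted(y))]

    def remap(values):
        # Indices of the values in increasing order (ties keep original order).
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        return out

    return remap(c), remap(y)


# Toy intensities on very different scales, made up for illustration.
c = [5.0, 1.0, 3.0]
y = [20.0, 40.0, 60.0]
c_norm, y_norm = quantile_normalize_pair(c, y)
```

After this step the two reporters have exactly the same empirical distribution, while the within-reporter ranking of cells is preserved.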
6.3 Assessing the ratio of extrinsic to intrinsic noise from sample correlation
We have seen from (13) that the proportion of the between-cell variability to total variability is the correlation ρ(C, Y). This leads to a simple approach for estimating the relative magnitude of the two types of noise: one can compute the sample correlation of the expression of the two reporters, ρ(c,y), and the ratio of extrinsic to intrinsic noise is then estimated by ρ(c, y)/[1 − ρ(c,y)].
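The conversion from correlation to noise ratio is a one-line calculation. In the sketch below the function names are hypothetical, and the check uses the simulation settings quoted elsewhere in the paper (intrinsic noise 0.7, extrinsic noise 0.8), for which the true correlation is 0.8/1.5.

```python
def sample_correlation(c, y):
    """Pearson correlation between the two reporters."""
    n = len(c)
    c_bar, y_bar = sum(c) / n, sum(y) / n
    s_cy = sum((ci - c_bar) * (yi - y_bar) for ci, yi in zip(c, y))
    s_cc = sum((ci - c_bar) ** 2 for ci in c)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_cy / (s_cc * s_yy) ** 0.5


def ratio_from_correlation(rho):
    """Extrinsic-to-intrinsic noise ratio implied by a correlation rho."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("the ratio is only defined for 0 <= rho < 1")
    return rho / (1.0 - rho)


# With intrinsic noise 0.7 and extrinsic noise 0.8, rho = 0.8 / 1.5.
ratio_true = ratio_from_correlation(0.8 / 1.5)

# Toy data, made up purely for illustration.
c = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 1.5, 3.5, 3.5]
ratio_toy = ratio_from_correlation(sample_correlation(c, y))
```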
7 Re-analysis of published two-reporter experiment data
Michael Elowitz and Peter Swain have kindly shared with us their data published in Elowitz et al. (2002). Here we focus on the data in Figure 3A of their paper, which contain the unnormalized fluorescence intensities of CFP and YFP in the E. coli strain D22 and in strain M22. We normalized the data as follows such that the resulting scatterplots are close to Figure 3A:
where
Quantity | Elowitz et al. data: D22 | Elowitz et al. data: M22 | Yang et al. data: Figure 3A | Yang et al. data: normalized on log2 |
---|---|---|---|---|
Sample mean, first reporter | CFP: 1 | CFP: 1 | CFP: 2660 | CFP: 11 |
Sample mean, second reporter | YFP: 1 | YFP: 1 | mCherry: 3986 | mCherry: 11 |
Sample correlation | 0.50 | 0.49 | 0.86 | 0.86 |
Intrinsic noise | | | | |
General: unbiased | 0.79 | 0.36 | 5.44 | 0.11 |
General: minMSE | 0.78 | 0.35 | 5.44 | 0.11 |
General: asymptotic | 0.78 | 0.35 | 5.44 | 0.11 |
Equal mean: unbiased/ELSS | 0.78 | 0.35 | 13.72 | 0.11 |
Equal mean: minMSE | 0.78 | 0.35 | 13.72 | 0.11 |
Equal mean: asymptotic/ELSS | 0.78 | 0.35 | 13.72 | 0.11 |
Extrinsic noise | | | | |
Unbiased | 0.78 | 0.34 | 30.29 | 0.68 |
minMSE | 0.76 | 0.33 | 30.29 | 0.68 |
Asymptotic/ELSS | 0.77 | 0.34 | 30.29 | 0.68 |
Extrinsic / (intrinsic + extrinsic) | | | | |
General | 0.50 | 0.49 | 0.85 | 0.86 |
Equal mean | 0.50 | 0.49 | 0.69 | 0.86 |
Nam Ki Lee and Sora Yang have also kindly shared with us their data published in Yang et al. (2014). Here we analyze the data in Figure 3A of their paper, which are the expression levels (intensities) of two reporters, CFP and mCherry (also see Sec. 6.2). The shared, unnormalized intensities have very different sample means (Table 4). Application of the estimators in Table 1 to these data gives two different estimates of the intrinsic noise, with the ELSS estimate, which relies on the equal mean assumption, being roughly 2.5 times the general estimate (13.72 versus 5.44; Table 4). To normalize the data, we removed the few negative values, log2 transformed the data, and quantile normalized between the two reporters (see summary statistics in Table 4). Applying our estimators to the normalized data, all estimates are consistent with one another. This analysis illustrates the importance of the equal mean assumption: when this assumption is not satisfied, the ELSS estimator leads to overestimation of the intrinsic noise.
Additionally, we subsampled from these data sets and assessed the performance of the estimators as the sample size decreased. At each sample size, we repeated the subsampling 1000 times and computed the mean and standard deviation of the noise estimates (Table 5). Whereas the means of the estimates do not differ from those obtained using the entire data sets, the variation (measured by the standard deviation) increases quickly with decreasing sample sizes. For the Elowitz et al. data, the standard deviation in the estimates roughly doubles for both types of noise as the sample size halves. Comparing the standard deviation to the mean suggests that 200 is indeed a reasonable sample size for estimates with small variation (compare with their actual sample sizes of 284 and 250 for the two strains). For the Yang et al. data, the increase in the standard deviation is much less drastic, and 200 also appears a decent sample size for reasonably small variation in the estimates.
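The subsampling procedure can be sketched as follows. The sketch assumes that cells are drawn without replacement and that the same cells are used for both reporters; the function names, the estimator passed in and the toy data are hypothetical and serve only to illustrate the mechanics.

```python
import random
import statistics


def unbiased_extrinsic(c, y):
    """Unbiased sample covariance between the two reporters."""
    n = len(c)
    c_bar, y_bar = sum(c) / n, sum(y) / n
    return sum((ci - c_bar) * (yi - y_bar) for ci, yi in zip(c, y)) / (n - 1)


def subsample_summary(c, y, size, estimator, repeats=1000, seed=0):
    """Mean and standard deviation of an estimator over random subsamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        # Draw cells without replacement; keep the (c, y) pairs together.
        chosen = rng.sample(range(len(c)), size)
        estimates.append(estimator([c[i] for i in chosen],
                                   [y[i] for i in chosen]))
    return statistics.mean(estimates), statistics.stdev(estimates)


# Toy data, made up purely for illustration.
c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
mean_est, sd_est = subsample_summary(c, y, size=4,
                                     estimator=unbiased_extrinsic, repeats=200)
```

Any of the estimators in Table 1 could be passed in place of `unbiased_extrinsic`, as long as it takes the two lists of measurements and returns a single number.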
Estimator | Elowitz et al. data: D22 | Elowitz et al. data: M22 | Yang et al. data: normalized on log2 |
---|---|---|---|
Original sample size | 284 | 250 | 40658 |
n = 200 | | | |
Intrinsic noise, general: unbiased | 0.79 (0.06) | 0.36 (0.02) | 0.11 (0.02) |
Intrinsic noise, general: minMSE | 0.78 (0.06) | 0.35 (0.02) | 0.11 (0.02) |
Intrinsic noise, general: asymptotic | 0.78 (0.06) | 0.35 (0.02) | 0.11 (0.02) |
Intrinsic noise, equal mean: unbiased/ELSS | 0.78 (0.06) | 0.35 (0.02) | 0.11 (0.02) |
Intrinsic noise, equal mean: minMSE | 0.78 (0.06) | 0.35 (0.02) | 0.11 (0.02) |
Intrinsic noise, equal mean: asymptotic/ELSS | 0.78 (0.06) | 0.35 (0.02) | 0.11 (0.02) |
Extrinsic noise: unbiased | 0.78 (0.07) | 0.34 (0.02) | 0.68 (0.09) |
Extrinsic noise: minMSE | 0.76 (0.07) | 0.33 (0.02) | 0.67 (0.08) |
Extrinsic noise: asymptotic/ELSS | 0.78 (0.07) | 0.34 (0.02) | 0.68 (0.08) |
n = 100 | | | |
Intrinsic noise, general: unbiased | 0.79 (0.13) | 0.36 (0.04) | 0.11 (0.03) |
Intrinsic noise, general: minMSE | 0.77 (0.12) | 0.35 (0.04) | 0.11 (0.03) |
Intrinsic noise, general: asymptotic | 0.78 (0.12) | 0.35 (0.04) | 0.11 (0.03) |
Intrinsic noise, equal mean: unbiased/ELSS | 0.78 (0.12) | 0.35 (0.04) | 0.11 (0.03) |
Intrinsic noise, equal mean: minMSE | 0.77 (0.12) | 0.35 (0.04) | 0.11 (0.03) |
Intrinsic noise, equal mean: asymptotic/ELSS | 0.78 (0.12) | 0.35 (0.04) | 0.11 (0.03) |
Extrinsic noise: unbiased | 0.77 (0.14) | 0.34 (0.05) | 0.69 (0.12) |
Extrinsic noise: minMSE | 0.73 (0.14) | 0.32 (0.05) | 0.67 (0.12) |
Extrinsic noise: asymptotic/ELSS | 0.76 (0.14) | 0.34 (0.05) | 0.68 (0.12) |
n = 50 | | | |
Intrinsic noise, general: unbiased | 0.78 (0.21) | 0.36 (0.07) | 0.11 (0.04) |
Intrinsic noise, general: minMSE | 0.75 (0.20) | 0.35 (0.07) | 0.11 (0.04) |
Intrinsic noise, general: asymptotic | 0.77 (0.20) | 0.35 (0.07) | 0.11 (0.04) |
Intrinsic noise, equal mean: unbiased/ELSS | 0.78 (0.21) | 0.36 (0.07) | 0.11 (0.04) |
Intrinsic noise, equal mean: minMSE | 0.75 (0.20) | 0.34 (0.07) | 0.11 (0.04) |
Intrinsic noise, equal mean: asymptotic/ELSS | 0.78 (0.21) | 0.36 (0.07) | 0.11 (0.04) |
Extrinsic noise: unbiased | 0.78 (0.24) | 0.34 (0.09) | 0.68 (0.16) |
Extrinsic noise: minMSE | 0.70 (0.24) | 0.30 (0.09) | 0.65 (0.15) |
Extrinsic noise: asymptotic/ELSS | 0.76 (0.23) | 0.33 (0.09) | 0.66 (0.16) |
8 Conclusions and discussion
Our hierarchical model for Elowitz et al. (2002) provides statistically interpretable parameters representing intrinsic and extrinsic noise, and allows for the derivation of estimators with optimality guarantees. Furthermore, the model highlights experimental assumptions that need to be satisfied for the estimators to be valid, specifically that the two reporters need to have the same distribution (within a cell) and hence normalization may be necessary. Whereas similar hierarchical models have been proposed before to study heterogeneity among single cells (see, e.g. Finkenstädt et al., 2013, and Koeppl et al., 2012), our hierarchical model explicitly parameterizes the two types of noise and reveals their equivalence to other quantities, as indicated by (11) and (14), which enables the derivation of closed-form estimators of these parameters (summarized in Table 1). We use bias and MSE to explicitly evaluate the performance of different estimators, and recognize the asymptotic equivalence of multiple estimators.
Other experiments have been set up to explore and assess intrinsic and extrinsic noise, and some of our results may be useful in those settings. For example, Volfson et al. (2006) used a single reporter but two Saccharomyces cerevisiae strains, with one strain containing only one copy of the reporter, and the other strain two copies. Assuming no strain effect, which may be thought of as batch effect, the authors applied the following estimators for (unscaled) intrinsic and extrinsic noise (consistent with their notation, and without the denominator of
where V1 and V2 are the variance in the 1-copy and 2-copy strains, respectively, and Vi and Ve are intrinsic and extrinsic noise, respectively. These estimators are in fact consistent with (11) and (14) under our hierarchical model:
Together, (19) and (20) give rise to (17) and (18). Note that (19) and (20) imply that the extrinsic noise is also the covariance here, except that the covariance is between the 1-copy and 2-copy strains with the same reporter; this is also pointed out by Sherman et al. (2015). Additionally, the total (marginal) noise of the reporter is the sum of intrinsic and extrinsic noise (19). However, consistent with our analysis of the assumptions of the hierarchical model, these estimators hold only when the variance for each single copy in the 2-copy strain is identical to that in the 1-copy strain. This is equivalent to assuming no strain (batch) effect, which can be a rather strong assumption.
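The step from the strain variances to the two noise components can be written out explicitly. The following is a sketch of what the hierarchical model implies, under our reading that the 2-copy strain reports the sum of two conditionally independent copies, each distributed as the single copy in the 1-copy strain; it uses V_i and V_e for the intrinsic and extrinsic noise and is not a transcription of the original display equations.

```latex
\begin{align*}
V_1 &= \mathrm{Var}[C] = V_i + V_e, \\
V_2 &= \mathrm{Var}[C + Y] = \mathrm{Var}[C] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[C, Y]
     = 2\,V_i + 4\,V_e, \\
\intertext{and solving these two equations for the noise components gives}
V_e &= \tfrac{1}{2}\,V_2 - V_1, \\
V_i &= 2\,V_1 - \tfrac{1}{2}\,V_2 .
\end{align*}
```

The second line uses Cov[C, Y] = V_e, which is the covariance identity behind the extrinsic noise estimator.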
We note that during the preparation of this manuscript, Erik van Nimwegen independently examined the Elowitz et al. (2002) paper from a Bayesian point of view (van Nimwegen, 2016).
Acknowledgement
This project began as a result of discussion during a journal club meeting of Prof. Jonathan Pritchard’s group that A.F. was attending. We thank Michael Elowitz, Peter Swain, Nam Ki Lee and Sora Yang for sharing their data from Elowitz et al. (2002) and from Yang et al. (2014), respectively. We are also grateful for the helpful comments we have received since posting the manuscript online. In particular, we thank Arjun Raj for bringing up the 1- vs 2-copy experiment, and Erik van Nimwegen for helpful discussions. We also thank Editor in Chief Prof. Michael Stumpf and two anonymous reviewers for insightful comments that led to a significantly enriched version. A.F. was partially supported by K99 HG007368 and R00 HG007368 (NIH/NHGRI). L.P. was partially supported by NIH grants R01 HG006129 and R01 DK094699.
A Moments of Mi and Ci under normality
Assuming that
We can compute the third and fourth moments of Mi as follows:
which gives
which gives
For the random variable Ci, since
we have
Further assuming that μ = 0, i.e. the means are all 0, and that ϵ = 0, which means that the variability is the same across cells, we have
and
B Calculating $\mathrm{Var}[S_{ext}]$
B.1 Calculating $\mathrm{Var}\left[\sum_{i=1}^{n} C_i Y_i\right]$
where
and
Therefore,
B.2 Calculating $\mathrm{Var}[n\,\bar{C}\,\bar{Y}]$
Assuming normality on Mi and assuming that μ = 0 and ϵ = 0 (constant variance across cells), we have
Also,
Under the assumptions made above, we have
If i = k,
Similarly, we can derive that the covariance is 0 for other cases where j = l or where i ≠ k and j ≠ l. Hence,
Additionally, under the normality assumption and with μ = 0 and ϵ = 0,
Therefore,
B.3 Calculating $\mathrm{Cov}\left[\sum_{i=1}^{n} C_i Y_i,\; n\,\bar{C}\,\bar{Y}\right]$
Putting the terms above together, we have
C MSE of the general intrinsic noise estimator
The general form of the estimator for intrinsic noise is
C.1 Calculating Var[S]
Thus
Below we will assume normality, as well as μ = 0 and ϵ = 0, to facilitate the derivation. Note that
C.1.1 Calculating $\mathrm{Var}[(\bar{C} - \bar{Y})^2]$
First, we note that
This is because
Additionally, from Appendix B, we have
For
and
we have
For
and
we have
Additionally,
Therefore,
Furthermore,
In the expression above,
Then we have
Putting the terms together, we have
C.1.2 Calculating $\mathrm{Cov}\left[\sum (C_i - Y_i)^2,\; (\bar{C} - \bar{Y})^2\right]$
Next, we note that
where
and
Additionally,
Therefore,
So we have
The variance of the estimator is then
C.2 Calculating E[S]
The expectation of the estimator is
where
and
Hence,
C.3 Calculating the MSE
The MSE of the estimator is then
The value of a that minimizes this MSE is
D Calculating $\mathrm{Var}[\tilde{S}_{int}]$
The individual terms can be computed as follows:
Assuming normality, we have
Assuming additionally that μ = 0 and ϵ = 0, we have
Since Ci and Yi are symmetrically defined, we have
Next, from Appendix B,
Assuming normality, we have
Assuming additionally that μ = 0 and ϵ = 0, we have
The covariance terms are computed as follows:
Assuming normality, we have
Assuming additionally that μ = 0 and ϵ = 0, we have
Finally, since Ci and Yi are symmetrically defined, we have
where
Assuming normality, we have
and therefore,
Assuming additionally that μ = 0 and ϵ = 0, we have
Putting the terms together, we derive the variance as follows, assuming that Mi follows a normal distribution,
Assuming additionally that μ = 0 and ϵ = 0, we have
E Summary of mean and variance of the estimators
We summarize the mean and variance of the estimators in Table 6.
Estimator | Mean | Variance |
---|---|---|
Intrinsic noise | ||
General | ||
Equal mean | ||
Extrinsic noise | ||
References
Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): “Stochastic gene expression in a single cell,” Science, 297, 1183–1186. doi:10.1126/science.1070919
Finkenstädt, B., D. J. Woodcock, M. Komorowski, C. V. Harper, J. R. Davis, M. R. White and D. A. Rand (2013): “Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: an application to single cell data,” Ann. Appl. Stat., 7, 1960–1982. doi:10.1214/13-AOAS669
Hayes, K. (2011): “A geometrical interpretation of an alternative formula for the sample covariance,” Am. Stat., 65, 110–112. doi:10.1198/tast.2011.09067
Hilfinger, A. and J. Paulsson (2011): “Separating intrinsic from extrinsic fluctuations in dynamic biological systems,” Proc. Natl. Acad. Sci. USA, 108, 12167–12172. doi:10.1073/pnas.1018832108
James, W. and C. Stein (1961): “Estimation with quadratic loss,” Proc. Fourth Berkeley Symp. Math. Stat. Prob., 1, 361–379. doi:10.1007/978-1-4612-0919-5_30
Koeppl, H., C. Zechner, A. Ganguly, S. Pelet and M. Peter (2012): “Accounting for extrinsic variability in the estimation of stochastic rate constants,” Int. J. Robust Nonlin., 22, 1103–1119. doi:10.1002/rnc.2804
Komorowski, M., J. Miękisz and M. P. Stumpf (2013): “Decomposing noise in biochemical signaling systems highlights the role of protein degradation,” Biophys. J., 104, 1783–1793. doi:10.1016/j.bpj.2013.02.027
Rausenberger, J. and M. Kollmann (2008): “Quantifying origins of cell-to-cell variations in gene expression,” Biophys. J., 95, 4523–4528. doi:10.1529/biophysj.107.127035
Schmiedel, J. M., S. L. Klemm, Y. Zheng, A. Sahay, N. Blüthgen, D. S. Marks and A. van Oudenaarden (2015): “MicroRNA control of protein expression noise,” Science, 348, 128–132. doi:10.1126/science.aaa1738
Sherman, M. S., K. Lorenz, M. H. Lanier and B. A. Cohen (2015): “Cell-to-cell variability in the propensity to transcribe explains correlated fluctuations in gene expression,” Cell Syst., 1, 315–325. doi:10.1016/j.cels.2015.10.011
Stegle, O., S. A. Teichmann and J. C. Marioni (2015): “Computational and analytical challenges in single-cell transcriptomics,” Nat. Rev. Genet., 16, 133–145. doi:10.1038/nrg3833
van Nimwegen, E. (2016): “Inferring intrinsic and extrinsic noise from a dual fluorescent reporter,” bioRxiv 049486; doi: http://dx.doi.org/10.1101/049486.
Volfson, D., J. Marciniak, W. J. Blake, N. Ostroff, L. S. Tsimring and J. Hasty (2006): “Origins of extrinsic variability in eukaryotic gene expression,” Nature, 439, 861–864. doi:10.1038/nature04281
Yang, S., S. Kim, Y. R. Lim, C. Kim, H. J. An, J.-H. Kim, J. Sung and N. K. Lee (2014): “Contribution of RNA polymerase concentration variation to protein expression noise,” Nat. Commun., 5, 4761. doi:10.1038/ncomms5761
©2016 Walter de Gruyter GmbH, Berlin/Boston