Steady-state fluctuations of a genetic feedback loop with fluctuating rate parameters using the unified colored noise approximation

A common model of stochastic auto-regulatory gene expression describes promoter switching via cooperative protein binding, effective protein production in the active state and dilution of proteins. Here we consider an extension of this model whereby colored noise with a short correlation time is added to the reaction rate parameters -- we show that when the size and timescale of the noise is appropriately chosen it accounts for fast reactions that are not explicitly modelled, e.g., in models with no mRNA description, fluctuations in the protein production rate can account for rapid multiple stages of nuclear mRNA processing which precede translation in eukaryotes. We show how the unified colored noise approximation can be used to derive expressions for the protein number distribution that is in good agreement with stochastic simulations. We find that even when the noise in the rate parameters is small, the protein distributions predicted by our model can be significantly different than models assuming constant reaction rates.


Introduction
Proteins perform a large range of cellular functions and hence it is of great interest to understand how the genes that produce them operate. Autoregulation is a mechanism to regulate gene expression whereby proteins expressed by a certain gene can subsequently bind to the same gene and cause an increase or a decrease in its expression (positive and negative feedback, respectively) [1]. Autoregulation is common; for example in E. coli it is estimated that 40% of all transcription factors are self-regulated [2,3].
For almost two decades, it has been known that gene expression is inherently stochastic [4], and as such the modelling of gene regulatory networks must account for this stochasticity. Noise in gene expression can be split into two contributions: (i) Fluctuations in the number of messenger RNA (mRNA) and proteins due to inherent randomness in the time at which the processes of transcription and translation occur. This is often described as intrinsic noise. For example, if a gene is always active and produces mRNA with rate parameter k then the time between successive transcription events is an exponentially distributed random variable with mean 1/k. (ii) Fluctuations in the number of mRNA and proteins due to fluctuations in the rate parameters themselves. For example the transcription rate k mentioned previously, while often taken to be a constant, it is generally a fluctuating quantity itself, because of fluctuating numbers of polymerases and transcription factors. Another example would be fluctuations in the protein numbers due to fluctuations in the translation rate stemming from fluctuating numbers of ribosomes in the cell [4,5]. This noise is often termed extrinsic noise.
The division of noise into these two categories is of course artificial but it is useful from a conceptual and modelling point of view. The simulation of stochastic biochemical processes is most commonly done using the stochastic simulation algorithm (SSA) [6] which assumes that the rate parameter of a reaction will not change in the interval between two successive reaction events. Hence the prevalent means of stochastic simulation assumes that noise is either principally intrinsic or that if there is extrinsic noise it is operating on long timescales such that it is slowly varying. While this may be the case sometimes, it is not generally true. This is because whenever we have an effective reaction that lumps together a large number of intermediate reactions (a multi-stage reaction process), we are making the inherent assumption that these intermediate reactions occur very fast and hence naturally the effective rate parameter is fluctuating on a fast timescale.
Taking into account these fluctuations is however not a simple feat. The chemical master equation (CME, [7,8]) describing the Markov process simulated by the SSA has been solved exactly or approximately to obtain the protein number distribution in steady-state for a wide variety of models of autoregulation [9][10][11][12][13][14][15][16][17][18][19][20][21], provided the rate parameters are assumed to be constant, i.e., ignoring extrinsic noise. There are however a number of studies that have analyzed stochastic models with fluctuating rate parameters, which we summarize next. Modifications of the linear noise approximation (a type of Fokker-Planck approximation of the CME) incorporating extrinsic noise have proved popular to approximate moments for systems subject to small magnitudes of extrinsic noise with certain properties: (i) for time-independent Gaussian colored noise [22,23] and (ii) more realistic lognormally distributed noise [24]. Wentzel-Kramers-Brillouin (WKB) methods have also been utilised for cases where the correlation time of the colored noise is tending either to zero or to infinity [25]. These methods provide probability distributions for systems where the noise on the rate parameters is drawn from a negative binomial distribution, however their analysis does not easily translate to finding good approximations for steady states probability distributions where the correlation time of colored noise is neither small or large.
The focus of the present article is threefold: (i) to provide a general method by which one can obtain analytical expressions for the steady-state protein distributions of auto-regulatory gene circuits with fluctuating rate parameters, through the use of the unified colored noise approximation (UCNA) [26], (ii) to use this method to investigate the effects that extrinsic noise of different magnitude and timescales has on auto-regulatory gene expression and (iii) to show how the colored noise formalism can be used to describe complex models of autoregulation that involve multi-stage protein production and multi-stage protein degradation. We note that the UCNA was previously utilised in a gene expression context [27] for linear reaction networks that are deterministically monostable and in which there is no feedback mechanism. Our analysis goes further, exploring the addition of colored noise to a nonlinear reaction network which expresses deterministic bistability, whilst also incorporating intrinsic fluctuations from the core gene expression processes.
The structure of our paper is as follows. In Section 2 we introduce the cooperative auto-regulatory reaction scheme that we will study in this article. We also show that for non-fluctuating rate parameters, the analytical protein distribution given by the chemical Fokker-Planck equation provides an excellent approximation of the protein distribution solution of the CME, in the limit of fast gene switching. In Section 3 we add colored noise to each reaction in the auto-regulatory reaction scheme (assuming fast gene switching) and use the UCNA to derive the protein number distribution solution of the chemical Fokker-Planck equation. The solution is shown to be in good agreement with a stochastic simulation algorithm modified to account for extrinsic noise on the rate parameters. We also use the solution to investigate the effect that extrinsic noise has on the number of modes of the protein distribution and clarify the limits of the UCNA derivation, including the three main conditions which cause it to breakdown. In Section 4 we extend the analysis to the limit of slow gene switching by introducing a conditional version of the UCNA. In Section 5 we show two examples of how one can successfully model complex auto-regulatory systems by means of simpler ones with colored noise on the reaction rate parameters, here done for multi-stage protein production and multi-stage degradation.
We conclude in Section 6 with a discussion of our results and further problems to be addressed on this topic.

Approximate solution for autoregulation with non-fluctuating rate parameters
We consider the reaction scheme for a genetic non-bursty cooperative feedback loop, where for simplicity we neglect the presence of mRNA: The reactions G ru −→ G + P and G * r b − → G * + P model the production of protein in each gene state, G * models the binding and unbinding of the gene to the proteins (with cooperativity 2), and P d − → ∅ models the dilution/degradation of proteins inside the cell. It is assumed there is only one gene present in the system and hence we are either in state G or G * at any one time. Before considering the addition of colored noise to the reaction rate parameters above, we first consider the solution with constant rate parameters to provide a reference point for approximations made in Section 3, and to clarify the approximation of a CME by a one variable chemical Fokker-Planck equation (FPE).
The CME for the reaction scheme in Eq. (1) does not have a known exact solution, even at steadystate for constant reaction rate parameters, and so approximations are necessary. Note that in what follows, we will use the terminology "reaction rate parameters" and "rates" interchangeably. We first consider the limit of fast gene switching -i.e., the frequency of gene activation and inactivation events is much larger than the frequency of any other reaction in the system. Later in Section 4 we will discuss approximations for the slow switching limit. Where [g * ] and [g] are the deterministic mean number of bound and unbound gene respectively and [n] is the mean protein number, the rate equations for the reaction scheme in Eq. (1) are: where L = s u /s b . Note that the reaction scheme here described exhibits deterministic bistability over some regions of the parameter space. This equation is consistent with a birth-death process where proteins are produced via a zeroth-order reaction (which is dependent on the number of proteins) and are destroyed by a first-order reaction [28]. The CME for this reduced process is given by: dP a (n, t) dt = T + (n − 1)P a (n − 1, t) + T − (n + 1)P a (n + 1, t) − (T + (n) + T − (n))P a (n, t), where P a (n, t) is the probability that at a time t there are n proteins in the system; T + (n) and T − (n) are the propensities of protein production and degradation respectively. The subscript a denotes that this is the probability for the reduced system, an approximate solution to the master equation of the full system. T + (n)dt is the probability, given n proteins are in the system, that a protein production reaction occurs, increasing the protein number of the system by 1, in the time interval [t, t + dt). Similarly, T − (n)dt is the probability, given n proteins are in the system, that a protein degradation event occurs, decreasing the protein number by 1, in the time interval [t, t + dt). These propensities are given by: T − (n) = d n.
These propensities are deduced directly from the form of the effective rate equation in Eq. (4). Essentially, the probability for a particular reaction per unit time is taken to be the same as the reaction rate in the effective deterministic rate equation with [n] replaced by n. We emphasise that while this appears to be a heuristic rule with no apparent fundamental microscopic basis, it has been shown that the reduced master equation based on it provides an accurate approximation to the SSA of the full reaction system in fast gene switching conditions provided the low protein number states are rarely visited [14,28] . The exact steady state solution of the one variable master equation given by Eq. (5) can be found using standard methods [29]: where P a (0) is the steady state probability evaluated at n = 0 (acting effectively here as a normalisation constant). We can further approximate the reduced master equation in Eq. (5) by the chemical FPE (via the Kramers-Moyal expansion) [29,30]: where a 1 (n) and a 2 (n) are the first two jump moments, given by a 1 (n) = T + (n) − T − (n) and a 2 (n) = T + (n) + T − (n) respectively, and P (n) denotes the FPE solution (a notation used throughout the paper). The purpose of this further approximation by means of a FPE will be made clear in Section 3.1. Eq. (9) has a steady state solution of the form [8]: where N is a normalisation constant. Although the integral in the exponent of Eq. (10) can be solved exactly with propensities of the form of Eq. (6) and Eq. (7) since it is the integral of the ratio of two cubic polynomials, the solution is too complicated to be detailed here. The approximations made by the FPE approximation are that (i) fluctuations in the protein number are small and (ii) we are in the fast switching regime between the gene states. Fig. 1 compares the FPE solution Eq. (10) with the solution of the heuristic CME in Eq. (8) and the solution of the full CME of the reaction scheme in Eq. (1) using the finite space projection method (FSP) [31]. Note that provided the state space is truncated large enough, the FPE solution matches the solution of the heuristic CME almost exactly. Clearly, when gene switching is fast (bottom plot of Fig. 1) all three solutions agree with each other. However, when gene switching is not fast (top and middle plots on Fig. 1) both the reduced CME and FPE solutions are a poor approximation of the true distribution from FSP.  3 Accounting for fluctuating rates using the Unified Colored Noise Approximation Fluctuating rate parameters can be used to include a description of processes not explicitly taken into account in the formulation of a model. In Fig. 2 we illustrate this idea. In this section, we add fluctuations to the rate parameters of the FPE description derived earlier and use the UCNA to obtain a new effective FPE that is valid when the timescale of the noise on the rates is either very small or very large.

Fluctuating degradation rate
We begin by considering the case of a fluctuating degradation rate. These fluctuations could for example stem from details of the degradation machinery that are not explicitly described in the model, e.g multi-step degradation mediated by enzymatic reactions. propensities from Eqs. (6) and (7) is given by [8,30]: where Γ(t) is Gaussian white noise with zero mean and correlator Γ(t)Γ(t ) = δ(t − t ). Now we introduce a fluctuating degradation rate by setting d = d 0 (1 + η(t)), where η(t) is Gaussian colored noise with a mean of zero and correlator η(t)η(t ) = (D/τ ) exp(−|t − t |/τ ) [26,32]. Here, τ is the correlation time of the colored noise, D/τ is the noise strength (the variance of fluctuations) and d 0 is the mean degradation rate. Since D/τ is the noise strength, i.e., D scales the noise strength at constant τ , we occasionally refer to D itself as the noise strength (where τ is a fixed parameter). In the limit of τ → 0 colored noise becomes white noise since lim τ →0 η(t)η(t ) = Dδ(t − t ). Note that |η(t)| 1 such that d is a positive quantity (and hence admits physical interpretation as a rate parameter). The inclusion of colored noise can be shown to be equivalent to the following two component system [26]: where θ(t) is Gaussian white noise with zero mean and correlator θ(t)θ(t ) = 2Dδ(t − t ), and the time dependence on the protein number n(t) and noise η(t) is suppressed for notational convenience. Note that in the argument of the square root above we have replaced η(t) by its mean of zero; this constitutes a mean-field type of approximation, and is necessary such that one can solve Eqs. (12)- (13) analytically -however, where the noise is small this is generally a very good approximation. Note that we also use this mean field assumption in Sections 3.2 and 3.3. For transparency, we rewrite Eqs. (12)-(13) as: with In order to approximately solve Eqs. (14)-(15) we next utilize the UCNA to obtain reduced Langevin equations when the noise η is either very fast or very slow. For completeness, we present a nonrigorous but intuitive proof of the UCNA along the lines found in [26] which essentially consists of a direct adiabatic elimination on the stochastic differential equations (SDEs) in Eqs. (14)- (15). For a more rigorous derivation of a UCNA-like FPE we advise reader to read the seminal work of Fox, who introduced a functional calculus approach to the study of colored noise SDEs [33][34][35][36]. A review of the differing UCNA-like derivations can be found in [37]. It has been discussed in [26,37] that the adiabatic elimination we employ below is exact for τ → 0 (white noise) or τ → ∞ (highly correlated noise) but that it should give a useful approximation for intermediate values of τ . We note that the theory provided by Roberts et al. [25] does not provide such a result as they consider separately the cases of τ → 0 and τ → ∞. For us, the limit of τ → ∞ is not of biological interest, and we will later focus on the limit of τ small, although the derivation shown here holds for large τ too. First, where we use overdots to represent derivatives with respect to time t, one should proceed in rearranging Eq. (14) for η: In what follows we will utilise a mean-field approximation (denoted by the subscript mf) to approximately calculate the time derivative of η(n,ṅ). We start by defining the mean-field approximation of η(n,ṅ) as: Taking the time derivative with respect to non-dimensional timet = t/τ (denoted by the overdot) we obtain: where the prime on each function of n mf denotes the derivative with respect to n mf . In the limit of τ → 0, the second term on the right hand side of Eq. (21) goes to infinity and hence the only way to keep the time derivative finite is to impose the condition: This then implies that in this limit we have: Note that taking the limit of τ → ∞ gives the same result and hence the approximation Eq. (23) is valid in both the limit of small and large τ . This can be shown to be self-consistently true; taking the time-derivative of Eq. (14) alongside a mean-field approximation we get, Assuming Eqs. (20) and (23) to be true one then recovers In Eq. (15) we can now substitute η from (19) andη mf forη from Eq. (23) giving us the UCNA for the system with colored noise on the degradation rate, which is exact in the limits τ → 0 or τ → ∞: where Note that we have dropped off the mf subscript for clarity. Finally, in order to get a simplified Langevin equation, we modify Eq. (26) such that we only have one effective Gaussian white noise term. We begin by proposing: whereΓ(t) is Gaussian white noise with mean zero and correlator Γ (t)Γ(t ) = 2δ(t − t ), and then use relations between the correlators to find our unknown g(n). Note that we assume zero correlation between Γ(t) and θ(t), i.e. Γ(t)θ(t ) = Γ(t )θ(t) = 0. Explicitly, utilising the correlators, we find: which gives us Hence, our final reduced Langevin equation is given by: which corresponds to the result in [32]. Note that Eqs. (26) and (31) are identical. Here we pause to make a couple of comments on C(n, τ ), which can be interpreted as a renormalisation of the Langevin equation in Eq. (14) to account for the addition of colored noise to the rate parameters. In fact, when τ = 0, Eq. (31) recovers the correct Langevin equation for a process with white noise on the rate parameters. One should also note the independence of C(n, τ ) from the strength of the noise D; the renormalisation with respect to the addition of colored noise on the degradation rate is not specific to the size of the noise, it simply accounts for the finite correlation time.
The FPE corresponding to this SDE should be chosen in the Stratonovich form, following from [35,38,39], as this is the physical implementation of an SDE with colored noise having a non-zero correlation time τ . This FPE is: whereh(n) = h(n)/C(n, τ ) andg(n) = g(n)/C(n, τ ). Following Eqs. (9)-(10) in Section 2 and [8], the steady state solution to this equation is then given by: where N is the normalisation constant, chosen over the domain n ∈ [0, ∞).
To test the accuracy of the distributions for colored noise provided by the UCNA in Eq. (33), we compare the UCNA solution to a distribution produced from a modified SSA that explicitly accounts for the colored noise on the degradation rate. This modification is given in full detail in Appendix A. Essentially, the dilution/degradation reaction P → ∅ is replaced by three new reactions alongside the introduction of a ghost species Y , these being (i) ∅ − − Y and (ii) P + Y → Y . The rates of these new reactions are then chosen to ensure the magnitude of effective external noise on the degradation reaction, due to fluctuations in molecule numbers of the ghost species, match the colored noise SDE given in Eq. (13). Fig. 3 shows steady state probability distributions produced by the UCNA for various values of D for a deterministically bistable set of parameters. The UCNA correctly captures the shift of the probability mass from the equilibrium point of higher molecule number (referred to as the upper mode) to the lower equilibrium point (referred to as the lower mode) as D is increased. Importantly, this shows that when gene switching is assumed to be fast, colored noise can induce bimodality -one should keep this in mind for when we look at slow gene switching in Section 4. Readers should also note that the parameter choices have been selected such that the Fokker-Planck approximation is good, notably that the system size is large, i.e., Ω 1, and the mean number of proteins in the system is also large. In all cases D/τ < 1 so that the degradation rate remains positive. The behaviour seen as D increases in Fig. 3 can be explained as follows. When D is small (Fig. 3A) the colored noise η in Eq. (15) is also small compared to the mean number of molecules in the system, and the noise cannot force the system out of the upper mode. As D gets larger (Figs. 3B and 3C) the fluctuations η at the upper mode also become larger, allowing the system to explore the lower mode. When the system is found in the lower mode the pre-factor of the coloured noise in Eq. (14), g 1 (n) ∝ n, is lesser in magnitude, and the fluctuations in η are much smaller than when the system inhabits the upper mode hence the increased probability mass at the lower mode. That the system is less noisy at the lower mode means that the system struggles more to get a fluctuation large enough to propel it into the upper mode. These properties of the system as D increases can be further seen through (i) the increase in probability mass found at the lower mode as D increases thoroughout all of Fig 3(A-D), and (ii) the increased probability mass found in the tail of the distribution for large n (Fig. 3D); while the tail is very slowly decaying it is still exponential and hence the distribution is not heavy-tailed (see the inset of Fig. 3D). This ability to induce bimodality through a more detailed description of the details of the degradation process is important in the context of cellular decision-making. It is hence possible for regions of the reaction rate parameter space previously thought unable to induce multiple phenotypic states to do so with an increasing influence of more complex degradation mechanisms. Note that for the majority of cases in Fig. 3, the UCNA provides a much better approximation than the white noise approximation, hence one cannot simply assume that since the correlation time τ is relatively small that it can be approximated as zero. Fig. 4 shows how the UCNA responds to increasing correlation time τ while the noise strength, D/τ , remains fixed. For all cases where τ is small, the UCNA performs very well. As τ increases however the UCNA starts to predict ever increasing negative probabilities for some values of n. Notably though, are chosen to be large compared to other system parameters such that the frequency of gene activation and inactivation events is much larger than the frequency of other reaction events, i.e. the fast gene switching assumption. Note that for this choice of rate parameters, the rate equations are bistable with equilibrium points at n = 47.4, 360.4. The criterion D/τ < 1 is required to ensure positivity of the degradation rate. As the extrinsic noise is increased, the mass of the distribution shifts from the mode at 360.4 to the mode at 47.4. The inset of D shows the same distribution but with the y-axis on a log scale, emphasising the exponential tail of the distribution for large n. SSA data in each case comes from a single steady state trajectory of 9 × 10 6 s. Fig. 4B shows that even where significant negative probability is predicted at large τ , the UCNA still manages to capture the rest of the distribution. This negativity of C(n, τ ) is commented on in both [26] and [35]. The former deals with this negativity by taking the absolute magnitude of the pre-factor of the exponential in Eq. (33), while the latter comments that the proof of their UCNA-like FPE is only formally valid where C(n, τ ) > 0, ∀ n. Here we choose not to take the magnitude of the pre-factor in Eq. (33), since although this leads to a positive probability for all n it is nonetheless a poor approximation; but we take careful note of the comment made by Fox in [35], as this indicates where the UCNA will perform well. The intuition behind the argument of Fox can be stated as: if for some n, C(n, τ ) < 0 there must be a transitory value of n for which C(n, τ ) = 0, at this point the Eq. (31) becomes physically ill-defined and our solution is invalid.
Finally, we observe that the parameter values chosen for both plots in Fig. 4 correspond to deterministically monostable systems. The bimodality that is observed in Fig. 4 is hence noise induced bimodality. The mode that appears for small τ corresponds to the deterministic equilibrium point, whereas the noise induced mode does not correspond to an equilibrium point of the deterministic system. We notice that the ability to exhibit a noise induced mode as τ becomes large is especially true for monostable parameter sets which are in close proximity to bistable parameter sets in the parameter space. This can be explained by occasional jumps between the monostable and bistable regimes due to sufficiently large fluctuations in the degradation rate. Hence a measure of the distance here is the difference in the magnitude of d 0 needed such that the system is deterministically bistable divided by is increased at constant noise size D/τ . Note that the y-axis shows P (n)/pmax, where P (n) is defined in Eq. (33) and pmax is equal to the maximum value of P (n). (A) Shows the performance of increasing τ for a system with parameters ru = 20, r b = 250, d 0 = 1, su = 3 × 10 2 , s b = 10 3 and Ω = 100. Deterministically this system is monostable with an equilibrium point at n = 194.7, however as τ is increased a shift towards a lower mode is observed. When τ is sufficiently large, the UCNA predicts a negative probability. (B) Shows similar to (A) but with parameters ru = 25, r b = 480, d 0 = 1, su = 8 × 10 2 , s b = 10 3 and Ω = 200. This too is a deterministically monostable system with equilibrium point n = 406.0. As τ increases, the breakdown of the UCNA is more apparent than for (A) with the prediction of negative probability for small n more drastic. Both (A) and (B) show that unless τ is large, while D/τ is small, the UCNA provides a very good approximation, even where the colored noise induces bimodality in deterministically monomodal systems. SSA data in each case comes from a single steady state trajectory of 9 × 10 5 s.
the noise strength, defined as ∆d 0 = |d 0 − d c |/(D/τ ), where d c is the closest value of the mean degradation rate to d 0 expressing bistability. For example, the parameter set chosen in Fig. 4A, although monostable, is very close to a parameter set that exhibits deterministic bistability (∆d 0 = 2.12). On the other hand, the parameter set of Fig. 4B is far from the bistable parameter regime (∆d 0 = 57.5)and hence the bimodality shown is very limited as τ becomes large. The reason for this noise induced bimodality then can be seen by the ability of a system, through fluctuations in the rate parameters, to access parameter regimes which in fact do exhibit deterministic bistability. Importantly, even when it seems bimodality is not induced (e.g., Figs. 3A or 4A), using the extremal equation of P (n) from [32], i.e.,h(n) =g(n)g (n), one can show that the UCNA still predicts the presence of two modes. This explanation of the induced bimodality in cooperative autoregulation is further supported by the lack of noise induced bimodality when colored noise is included on the degradation rate of the FPE describing non-cooperative autoregulation; here the UCNA's extremal equation only ever predicts the existence of one mode for the probability distribution.

Fluctuating effective protein production rates
We now extend the analysis from Section 3.1 to the effective protein production rates. Colored noise on the effective production rates can be used to implicitly model multi-step protein production, including multiple stages of mRNA processing before translation (see Fig. 2). We add colored noise onto the effective protein production rates via, r u = r (0) , which upon substituting in the Langevin equation describing the feedback loop Eq. (11) we obtain the following set of SDEs: where θ 1 (t) and θ 2 (t) are Gaussian white noise terms with zero mean and correlators respectively. Note that here we have used a mean field approximation for the terms under the square root, as was done in Section 3.1. In a similar style to Eq. (28) we now propose a new noise termη(t), which couples η 1 (t) and η 2 (t), satisfying: where is colored noise with zero mean and correlator η(t)η(t ) = e −|t−t |/τ /τ , satisfying the following equation: where where we have assumed that the colored noise on both production rates has the same correlation time but a differing magnitude of noise strength. Using the properties of the correlators of η 1 , η 2 andη we then find: Sharing the notation adopted in Section 3.1, we define the following: This gives us the following SDE which is coupled to Eq. (38): Then, following the same UCNA procedure as in Eqs. (19)- (26), we obtain the following approximate where In this case it is interesting to note that unlike the case of a fluctuating degradation rate, here C(n, τ ) does depend on both the correlation time τ and the strength of the colored noise D 1 , D 2 (unless D 1 = D 2 in which case there is only dependence on τ ). To simplify Eq. (43) further, we again propose: whereΓ(t)Γ(t ) = 2δ(t − t ), and find using the correlators that g(n) = F (n) 2 + g 2 (n) 2 /2. This leads to the final approximate SDE:ṅ which is identical in notation to Eq. (31) but where h(n), C(n, τ ) and g(n) are all defined in this section. The equivalent FPE for this SDE is then: Again our solution for the probability distribution will then be: withh(n) = h(n)/C(n, τ ) andg(n) = g(n)/C(n, τ ). We now describe the modified SSA that takes into account extrinsic noise on the effective protein production rates. This modification replaces the protein production reaction in each gene state, i.e., G k → G k +P where G k represents either G or G * , by three new reactions alongside the introduction of a ghost species Y k for each gene state. These new reactions are ∅ Utilising the LNA (assuming Y k to be abundant), as was done for colored noise on the degradation rate in Appendix A, one finds these rates to be r 1 = 1/(D k Ω), r 2 = 1/τ and r 3 = r in G and G * respectively. Figure 5 shows a good performance of the UCNA when compared to the modified SSA described above. This performance is shown for each differing qualitative behaviour expressed by cooperative bimodality, i.e., (i) monostable positive feedback, (ii) bistable positive feedback, and (iii) monostable negative feedback. In all three plots shown the UCNA matches the modified SSA well, and clearly performs better than if one were to approximate the colored noise with white noise (i.e., τ = 0). We find the same qualitative behaviour of the creation and eventual destruction of bimodality (see Fig. 6A(i-iii)) as the noise strengths, D 1 and D 2 , become large for the colored noise on the protein production rates as was found in Fig. 3 for colored noise on the degradation rate. Note that for the chosen parameter set in Fig. 6A that the white noise approximation performs generally very well compared to the UCNA. For τ ≤ 1, the white extrinsic noise approximation can typically perform quite well compared to the modified SSA, but note that this is not always the case (e.g., see Fig. 3).

Fluctuating binding/unbinding rates
Finally, we apply the UCNA to the case of colored noise added to the binding and unbinding rates of the protein to the gene. This could be utilised to implicitly model the effect of multiple gene states in the transition of G to G * , as has been experimentally and theoretically investigated [40][41][42], accounting for DNA looping via distal enhancers or chromatin conformational states. For convenience we define Substituting Eq. (49) in the Langevin equation describing the feedback loop Eq. (11) (and making a mean-field approximation for the terms under the square root) we obtain the following set of SDEs: where θ 1 (t) and θ 2 (t) are Gaussian white noise terms with zero mean and correlators θ 1 (t)θ 1 (t ) = 2D 1 δ(t − t ) and θ 2 (t)θ 2 (t ) = 2D 2 δ(t − t ) respectively. In order to proceed using the UCNA we must linearise the drift term in Eq. (50) with respect to η 1 and η 2 through the small noise approximation η 1 ,η 2 1: For convenience we now define:  Fig. 3 which express deterministic bistability). SSA data in each case comes from a single steady state trajectory of 9 × 10 5 s.
In terms of these new functions Eq. (53) becomes, Following Section 3.2 we then arrive at the UCNA for colored noise on the binding rates where η 1 ,η 2 1: Then, using the properties of the correlators of θ(t) and Γ(t) we arrive at: where, andΓ(t) is Gaussian white noise with mean zero and correlator Γ (t)Γ(t ) = 2δ(t − t ). Here, as for the UCNA applied to the degradation rate, C(n, τ ) is again independent of the strengths of the colored noise terms. This UCNA, as we shall see, should be a good approximation where both D 1 and D 2 are small -by 'small' we explicitly mean that D 1 and D 2 should be smaller than noise strengths used on the UCNA for protein production rates or the degradation rate. The solution to Eq. (61) is given by: withh(n) = h(n)/C(n, τ ) andg(n) = g(n)/C(n, τ ). Now we evaluate the performance of the UCNA on the binding and unbinding rates, and compare it with the modified SSA. In this case the modified SSA replaces the binding and unbinding reactions, where Y 1 and Y 2 are ghost species. The rates of these reactions are determined via the LNA (assuming the ghost species to be numerous) and are r 1 = 1/(D 1 Ω), r 2 = 1/τ , r 3 = s (0) b D 1 Ω/τ , r 4 = 1/(D 2 Ω), r 5 = 1/τ and r 6 = s (0) u D 2 Ω/τ . In Fig. 6B we test the UCNA on the binding and unbinding rates compared to the modified SSA described above. Clearly, the same qualitative behaviour of the creation and destruction of bimodality, as noise strength is increased, is observed, as was also observed for colored noise on the degradation rate (Fig. 3) and protein production rates (Fig. 6A). The resultant expression of bimodality however, is clearly different than for these cases. Notably, this UCNA does ascribe to an additional limitation compared to the UCNA of degradation or production rates; a limitation due to the further small noise approximation made in Eq. (53). This limitation is seen in Fig. 6B(iii), showing that the UCNA applied to the binding and unbinding rates is much more sensitive to increased noise strength than the other UCNA applications. One also observes that the white noise approximation in Fig. 6B performs almost as well as the UCNA (Figs. 6B(i-ii)) or in some cases better than the UCNA (Fig. 6B(iii)); hence, the white noise approximation may be a safer approximation than the UCNA for colored noise applied to the binding and unbinding rates.

Breakdown conditions of the UCNA
Having now applied the UCNA to approximate distributions for colored noise on the (i) degradation rate, (ii) protein production rates and (iii) the binding/unbinding rates, we now assess the conditions which cause the UCNA to breakdown. The application of the UCNA to colored noise on the protein production rates presents a somewhat more complex problem than the application of the UCNA to colored noise on the degradation rate or the binding/unbinding rates; hence, we more easily see that there are three main conditions for the breakdown of the UCNA -conditions beside the large system size or large molecule number requirement needed to approximate the discrete master equation by a one variable FPE. Below we detail these three conditions, in each case explaining why the disagreement occurs. Note that although the analysis of breakdown conditions below is done for the UCNA on the protein production rates, the same arguments hold for the other applications of the UCNA previously presented.

Condition 1
The first of these conditions concerns the positivity condition required on C(n, τ ), that is C(n, τ ) > 0 ∀ n. We refer to this as condition 1. Since we have already discussed this condition in a previous section we will not repeat the discussion here, and refer the reader to Section 3.1. In Fig. 7A(i) we see a disagreement between the UCNA and the modified SSA for a parameter set that exhibits bimodality, and in Fig. 7A(ii) it is verified that this since C(n, τ ) < 0 where n ≈ 100. Note however, that if C(n, τ ) becomes negative outside of the region containing most of the probability mass that the UCNA can still provide a good approximation to the true modified SSA solution.

Condition 2
The second condition observed for the breakdown of the UCNA concerns the violation of the characteristic 'length' scale (the length here being a distance measure in the n space), which we now discuss. In Appendix B we show in more detail why the arguments we present below hold. Based on the noise intensity of the noise term arising from the colored noise in Eq. (43), we can introduce the noting that the requirement of condition 1 means that this length is always positive. Our approximate one variable FPE in Eq. (47) will then be valid under the condition that the drift term varies slowly with respect to L (following Appendix B), meaning that one needs to satisfy L ∂ n h (n) +g(n)g (n) h (n) +g(n)g (n) in order for the UCNA to hold. More succinctly, this condition is: where we henceforth define the function κ(n, τ ) = F (n) ∂ n h (n) +g(n)g (n) h (n) +g(n)g (n) .
We refer to Eq. (66) as condition 2. In Fig. 7B we explore this breakdown for a parameter set that breaks condition 2 over a large region of the parameter space, between 0 < n < 650. Clearly the UCNA is provides a poor approximation in this regime; note however that, similar to condition 1, if condition 2 is violated (i) outside of the domain where most of the probability mass is contained, or (ii) over a small region of the domain containing most of the probability mass, then the UCNA can still provide a good approximation.

Condition 3
The final condition resulting in the breakdown of the UCNA concerns the underestimation of noise. We refer to this as unaccounted peak noise, and this forms our final breakdown condition, condition 3. The explanation behind condition 3 is that the UCNA in general will always underestimate the Poisson noise for a particular value of n, arising from the necessary neglection of Poisson noise terms in the derivation of the UCNA: (i) neglection of the noise terms under the square root of the Poisson noise pre-factor in Eqs. (34) (a form of mean-field approximation), and (ii) neglection of Poisson noise term g 2 (n)Γ(t) and its time derivative from theη term in Eqs. (19)(20)(21) via the use of another mean-field approximation. However, the error on the UCNA caused by condition 3 will be small when colored noise dominates the Poisson noise. To investigate the degree to which colored noise is dominant, identifying F (n)/C(n, τ ) from Eq. (42) and g(n)/C|(n, τ ) from Eq. (46), we define where, for some n, if γ(n) ≈ 1 then Poisson noise dominates, else if γ(n) ≈ 0 then colored noise dominates. Intermediate values of γ(n) mean that both Poisson and colored noise is apparent in the system. To investigate whether noise is underestimated generally over the region containing most of the probability, defined as D = [n min , n max ], we further define Here, if ρ ≈ 1 then Poisson noise dominates over the entire region D, else if ρ ≈ 0 then colored noise dominates over the entire region D.

Large τ UCNA distributions
Having successfully identified the three main conditions causing the breakdown of the UCNA, we are now able to determine where the UCNA will perform well, even in the large τ limit. In Figure 8 we explore an example of the UCNA performing exceptionally well for τ = 10 2 (see Fig. 8(i)). Clearly the UCNA does not violate any of the three conditions here: (1) C(n, τ ) is not negative in D (see Fig.  8(ii)); (2) C(n, τ ) κ(n, τ ) in D (again, see Fig. 8(ii)); (3) γ(n) is small for all n in D (see Fig. 8(iii)) as evidenced by the small value of ρ = 0.001. As expected, the prediction of white noise on the protein production rates is very poor in the regime where τ 1.

Slow gene switching: the conditional UCNA
In the previous sections we have focused on fast gene switching, whereby a Hill function can then be used to approximate the production of proteins from two different gene states, shown in the reaction  1). We now consider the case where the switching rates s u and s b are very small; small enough that the system has two dominant modes of behaviour, one pertaining to each gene state. The approach followed here is very similar to the conditional linear noise approximation (cLNA) studied in [43], but instead of approximating the distribution conditional on each gene state as a Gaussian we instead utilise the UCNA in each gene state. We shall refer to this method as the conditional UCNA (cUCNA). We begin by stating Bayes' theorem for the marginal distribution of proteins that we are interested in approximating: Here G is the set of possible gene state (in our case G = {G, G * }), P (G i , t) is the marginal distribution of being in gene state G i at a time t and P (n|G i , t) is the conditional probability of having n proteins at a time t given that the system is in state G i . Our task now is to find suitable approximations for P (G i , t) and P (n|G i , t) that allow us then to construct an approximation of the full steady state distribution in Eq. (70). In our case we have two different gene states, G and G * , and hence we can construct the reaction schemes conditional on each gene state. The reaction scheme conditional on gene state G is (i) G ru −→ G + P, P d − → ∅, and the reaction scheme conditional on gene state G * is (ii) This then allows us to approximately find the steady state mean number of proteins conditional on each gene state when s u and s b are very small (where the subscript a denotes approximate): n|G a = r u /d and n|G * a = r b /d. We can use these conditional means to find the marginal probabilities of being in a specific gene state at steady state. Note that in this calculation we will ignore the influence of noise on the rate parameters; the inherent assumption is that extrinsic noise does not much influence the probability of being in each gene state. First we write an approximate master equation for the transitions between differing gene states: We can then solve the above equation at steady state (denoted by the subscript s) by utilising conservation of probability, P s (G) = 1 − P s (G * ), giving: Since now we have the P s (G i ) needed for Eq. (70) we need to find the P s (n|G i ) terms. Here we show how to calculate these terms for noise on the degradation rate, although this can be easily extended to the case where we have noise on the protein production rates. In each gene state, the system we are concerned to study is G i ri − → G i + P, P d − → ∅, where G i and r i represent either gene state G or G * and the corresponding production rate r u or r b respectively. Adding colored noise to the degradation rate, d = d 0 (1 + η), we then have the following set of SDEs in each gene state (here we have applied the mean field approximation to the terms in the square root): where Γ(t) and θ(t) are Gaussian white noise terms, each with zero mean and correlators Γ(t)Γ(t ) = δ(t − t ) and θ(t)θ(t ) = 2Dδ(t − t ) respectively. Processing the usual steps of the UCNA method, detailed explicitly in Section 3, we find the approximate steady state probability for each gene state: where N is a normalisation constant and we have defined, We note that since each gene state is individually considered, each can have it's own associated noise strength D and correlation time τ on the degradation rate. Hence, using Eqs. (72)- (73) and (76) we can now approximate Eq. (70) as: P (n) ≈ P s (G)P s (n|G) + P s (G * )P s (n|G * ).
(78) Figure 9 compares the cUCNA with the modified SSA -which is the same as the modified SSA found in Section 3.1. Fig. 9(i) shows that for small switching rates, the cUCNA can correctly capture the bimodality exhibited where the colored noise on the degradation rate is small. As the noise on the degradation rate gets larger the cUCNA still provides a decent approximation to the true distributions; it is also clear that the bimodality of the protein distribution is destroyed as the size of this noise increases. One can contrast this to the cases observed in Figs. 3 and 6 which showed that where the gene switching rates are fast, increased colored noise strength can in fact induce bimodality. In summary, we find that extrinsic noise on the degradation rate of a slow switching auto-regulatory system generally destroys bimodality, but for fast switching it is common to observe the opposite phenomenon.

Applications
In this section we explicitly show, by means of two examples, how one can use the colored noise formulation that was introduced earlier to describe intricate molecular details of cooperative autoregulation. We first show this for multi-stage protein production with fast gene switching, and then for multi-stage protein degradation with slow gene switching.

Multi-stage protein production
The first example of using colored noise as a form of model reduction is that of mapping multistage protein production onto a simpler system, where colored noise accounts for processes not explicitly considered in the simpler model. Consider multi-stage protein production on the cooperative autoregulatory feedback loop: where it is assumed the system contains only one gene, either in state G or in state G * . The simpler model that we will then map this system onto the cooperative auto-regulatory feedback loop: , and assigning the properties of extrinsic noises η 1 (t) and η 2 (t) such that Eq. (79) can be mapped onto Eq. (80) is the task we have assigned ourselves. One can think of the different M i for i < N as the various stages of nascent mRNA, before it is eventually fully transcribed in stage M N (mature mRNA) where it can then begin translation [44]. Utilising the slow scale linear noise approximation [18] one can show that if Λ i max{Λ N , ρ u , ρ b } for i ∈ [1, N − 1] then the nascent mRNA M 1 , ..., M N −1 are fast species, and the reaction system in Eq.
(79) is consistent with the following reaction scheme describing fluctuations in the slow species G, G * , M N and P : We now apply the van Kampen ansatz to the number of mature mRNA, M N . In gene state G this gives us n 1 (t) = Ωφ 1 + Ω 1/2 1 (t), and in gene state G * this gives us n 2 (t) = Ωφ 2 + Ω 1/2 2 (t), where φ 1 = ρ u /(Λ N Ω) and φ 2 = ρ b /(Λ N Ω) are the steady state solutions to the rate equation describing the mature mRNA in the gene states G and G * respectively, and 1 (t) and 2 (t) describe small fluctuations about these means. Note the occurrence of 1/Ω in φ 1 and φ 2 follows since the concentration of a single gene in a volume Ω is 1/Ω. Using these ansatzes allows us to construct the effective protein production rates in gene states G and G * respectively: One can then see that r and that the noise terms have the form: In order to fully specify η 1 (t) and η 2 (t) we need to find the correlators η 1 (t)η 1 (t ) and η 2 (t)η 2 (t ) , which can be done by application of the linear noise approximation (LNA) [8]. Applying the LNA to n 1 (t) and n 2 (t), whose fluctuations are fully specified by the reactions G gives us the two following one variable FPEs: where Π( i , t) is the probability of having a fluctuation of size i at a time t. These FPEs, combined with Eq. (84) and (85), admit equivalent Langevin equations for η 1 (t) and η 2 (t), given by: where β 1 (t) and β 2 (t) are independent Gaussian white noises with zero mean and correlator β 1 (t)β 1 (t ) = β 2 (t)β 2 (t ) = δ(t − t ). From here one can find the correlators of η 1 (t) and η 2 (t): Comparing to the results of Section 3.2 it is clear that η 1 (t) and η 2 (t) satisfy the definition of colored noise, with noise strengths D 1 = 1/ρ u , D 2 = 1/ρ b and shared correlation time τ = 1/Λ N . This completes the mapping between the full complex system in Eq. (79) and our reduced process in Eq. (80). We can hence utilise our solution for the probability distribution with colored noise on the effective protein production rates in Eq. (48).
In Fig. 10A we show how effective the UCNA can be in approximating the protein distribution from the full system described in Eq. (79), where we have for simplicity assumed that there are three mRNA states: M 1 , M 2 and M 3 (i.e., N = 3). Fig. 10A(i) shows the approximation for a parameter set exhibiting bimodality: the red points represent the standard SSA of the full system in Eq. (79); the black line represents the distribution predicted from the UCNA (i.e., using Eq. (48) with D 1 = 1/ρ u , D 2 = 1/ρ b and τ = 1/Λ N ); the blue dotted line represents the distribution if one put white noise of the same magnitude on the protein production rates (i.e., the UCNA at τ = 0); and the orange line with circles shows the distribution if one was to neglect noise on the reaction rates entirely (i.e., r u = r (0) u and r b = r (0) b ). Clearly, in Fig. 10A(i) the UCNA is the only distribution that fits the SSA prediction, showing both the effectiveness of our model reduction as well as the need to properly account for the correlation time of colored noise in model reduction. This makes sense, since one would expect processes occurring in the full system to be correlated over short times, i.e., that noise events in close temporal proximity are not independent, and one cannot simply neglect these effects. Fig. 10A(ii) instead shows the various approximations for a monomodal parameter set. In this case, white noise is a poor approximation, and it is clear that one cannot neglect the finite correlation time. However, it is interesting to note that properly accounting for the correlation time using the UCNA returns the same distribution as if one had not added noise to the production rates at all -this is due to the small magnitudes of D 1 /τ and D 2 /τ respectively. These two examples shows that where correlation time is finite, it is imperative that one models it correctly.

Multi-stage protein degradation
Proteins in cells are often degraded via multi-step processes. For example, a major degradation pathway in eukaryotic cells is the ubiquitin-proteasome degradation pathway [45], and more recent experiments have shown that a subset of proteins in the mammalian proteome have age-dependent degradation rates [46,47]. From Fig. 2 in [46] we consider a system with two different stages of protein with differing degradation rates combined with the cooperative auto-regulatory feedback loop: SSA data for A(i) and A(ii) come from a single steady state trajectories of length 10 8 s and 9 × 10 5 s respectively. Note that A(i) presents a very long relaxation to the steady state due to the systems inertia in staying in one of the two modes of the distribution. SSA data for B(i) and B(ii) come from a single steady state trajectory of 9 × 10 6 s.
One finds that the effective degradation rate of the sum of protein number P 1 and P 2 is: In the following analysis we consider gene switching to be slow, which allows us to apply the cUCNA from Section 4. We first consider the probability of being in each gene state at steady state P s (G k ), where G k represents either gene state G or G * . Note that we assume both protein stages P 1 and P 2 can bind and unbind to the gene at the same respective rates, and note that n|G = n 1 |G + n 2 |G .
Following the analysis from Section 4 in Eqs. (72-73) we find: We can now proceed to find the probability distribution conditional on each gene state P s (n 1 , n 2 ). In gene state G k the conditional reaction system is: where the protein is always produced in stage P 1 , and r k ≡ r u in gene state G, and r k ≡ r b in gene state G * . Now we employ the van Kampen ansatz [8] on n 1 and n 2 in gene state G k , i.e. n + Ω 1/2 (k) 1 (t) and n (k) To proceed in finding η k (t) in Eq. (100), we first need to find the steady state concentrations φ * 1 and φ * 2 from the deterministic rate equations. These are, where again the 1/Ω in Eq. (104) follows since the concentration of a single gene in a volume Ω is 1/Ω. Enforcing the steady state condition allows us to find φ * 1 and φ * 2 , .
Note that the linear dependence of φ * 1 and φ * 2 on r k means that the effective degradation rate d 0 from Eq. (99) is independent of the gene state. Assuming that both P 1 and P 2 are numerous, we now proceed to the LNA [8,48] of the system in Eq. (97), which will allow us to find the correlators 1 (0) 1 (t) , 2 (0) 2 (t) , 1 (0) 2 (t) and 2 (0) 1 (t) . Where S is the stoichiometric matrix, φ = (φ 1 , φ 2 ) and f (φ) is the macroscopic rate vector one can computationally find the required matrices needed to perform the LNA: (i) the steady state Jacobian matrix A ij = d(S · f (φ)) j /dφ i | φ=φ * , and (ii) the steady state diffusion matrix (B · B T ) ij = S · Diag(f (φ)) · S T | φ=φ * . The Jacobian matrix then allows us to find the time evolution of both 1 (t) and 2 (t) since ∂ t = A · , where = ( 1 (t) , 2 (t)) . Solving these coupled first order ODEs gives us: where −d 2 and −(d 1 + κ) are eigenvalues of A. Clearly, in the limit t → ∞ the fluctuations about the steady state concentrations φ * 1 and φ * 2 , 1 (t) and 2 (t) , tend to zero as required. The final step of the LNA then requires us to find the covariance matrix C at steady state, which has the steady state variances 2 1 and 2 2 as diagonal components and covariance 1 2 = 2 1 in the off-diagonal components. C is then given by the Lyapunov equation [48]: whose solution is given by: From van Kampen [8] p. 259 we assert that for some fluctuation i , i (0) j (t) = i (0) j (t) , and that at t = 0 we have φ = φ * so that i (0) j (0) = i j . For example, for 1 (0) 1 (t) we have, using 1 (t) from Eq. (106) and 2 1 from Eq. (109), 1 (0) 1 (t) = 1 (0) 1 (t) = 2 1 e −(d1+κ)t . Explicitly, one can then calculate all the correlators, which are given by: Now that these correlators have been determined, we can substitute them into Eq. (103) giving us the following for the correlator of η k (t): noting the only dependence on the gene state G k comes from the pre-factor 1/r k . Comparing this equation to the colored noise seen in Eq. (12) in Section 3.1 we see however that we have two exponentials in the correlator. This sum of exponentials in Eq. (114) can be approximated by a single exponential through a small t expansion. This gives us: where D (k) a and τ a are the approximate noise strength and correlation time given by: Clearly, the small t expansion allows us to roughly interpret the noise η k (t), present in gene state G k , as colored noise with strength D a /τ a and correlation time τ a . Note that even when both exponentials equally contribute to the correlator in Eq. (114), this is generally a very good approximation for few protein stages. Knowing D (k) a and τ a for η k (t) we can now substitute them into Eqs. (76-77) in Section 4, then using Eqs. (95-96) we find P (n) ≈ P s (G)P s (n|G) + P s (G * )P s (n|G * ).
(118) Fig. 10B shows two different cases of the cUCNA for predicting distributions for multi-stage degradation and slow gene switching: in B(i) for the case of d 1 > d 2 (true for around 80% of proteins in [46]); in B(ii) for the case of d 2 > d 1 (true for around 20% of proteins in [46]). On the main plots red dots show the standard SSA prediction of the full reaction scheme in Eq. (92), and the black lines show the cUCNA from Eq. (118), which in both cases is almost indistinguishable from the white noise (cUCNA with τ = 0) and no extrinsic noise prediction (discussed further in the following paragraph). The insets show the correlators in gene state G * , where the red dashed line represents η G * (0)η G * (t) a and the black line shows η G * (0)η G * (t) . Note that the correlators for gene state G are not shown because they show very similar to what is seen for state G. Even given the complex model reduction from two protein species to one effective protein species the cUCNA performs very well in predicting distributions from the standard SSA of the full system in Eq. (92). Note that as one considers more protein stages with differing degradation rates it becomes more different to fit the correlator to a single exponential, which presents a limitation of this method for more protein stages.
However, we find that since our analysis is restricted to the large protein number regime, and the noise strength D a is inversely proportional to the production rate r k which is typically large, that D a is typically very small in both gene states. This means that the cUCNA probability distribution is almost identical to probability distributions that assume white noise (cUCNA with τ = 0) or even no extrinsic noise. However, what the analysis in this section provides is the quantitative reasons why one could necessarily neglect the contribution of extrinsic noise in model reduction from the full system in Eq. (92) to the simpler system in Eq. (93).

Conclusion
In this paper we have explored the addition of colored noise onto the reaction rates for a cooperative auto-regulatory circuit. Starting from a reduced chemical Fokker-Planck description, we used the UCNA to derive approximate expressions for the probability distribution of protein numbers in the limits of fast and slow promoter switching. The approximation is valid provided the colored noise on the reaction rates is small and the correlation time is short. By means of stochastic simulations, we verified the accuracy of the approximate distributions; we also verified the predictions of the UCNA, namely that under fast promoter switching conditions the addition of colored noise can induce bimodality whereas under slow promoter switching conditions, noise can destroy bimodality.
We also have shown how complex models of gene expression can be mapped onto simpler models with noisy rates. In particular we have shown that: (i) An auto-regulatory feedback loop with multistage protein production, including different stages of mRNA processing, can be mapped onto an auto-regulatory feedback loop with a single protein production reaction step having colored noised added to its reaction rate. (ii) A feedback loop with multi-stage protein degradation can be mapped onto a feedback loop with a single protein degradation reaction with a fluctuating rate. We have also verified that in many instances, one cannot simply approximate colored noise with white noise, or else neglect it entirely, since this does not match behaviour seen from the full underlying models of multi-stage protein production or degradation.
The UCNA provides an easily extendable analysis for other gene regulatory networks so long as only one species effectively describes the system. Our analysis is the first to our knowledge, to analytically find steady state probability distributions where colored noise is added to a non-linear reaction (the protein-gene binding reaction) in a gene regulatory context; a previous study applied the UCNA to study the effects of extrinsic noise in genetic circuits composed of purely linear reactions [27]. Given that our calculations show that the protein distributions for auto-regulatory circuits with extrinsic noise on reaction parameters can be dramatically different than models assuming constant reaction rates, an interesting future research direction would be to develop UCNA based methods that can directly infer the properties of colored noise on reaction rates from protein expression data.

Acknowledgments
This work was supported by a BBSRC EASTBIO PhD studentship for J.H., a Darwin Trust scholarship for A.G. and a Leverhulme Trust grant (RPG-2018-423) for R.G.

A Stochastic simulations of autoregulation with extrinsic noise
In this paper extrinsic noise is accounted for in the SSA through the introduction of a new ghost species Y and some new ghost reactions. For example, consider the case where we want to model a fluctuating degradation rate d = d 0 (1 + η(t)), where η(t) = 0 and η(t)η(t ) = (D/τ ) exp (−|t − t |/τ ). We will then replace the degradation reaction, P d − → ∅, by the following set of reactions: We now show why this equivalence exists. The propensity for the degradation reaction in Eq. (119) is (n P n Y d 0 DΩ/τ )/Ω, meaning that the effective degradation rate is d = (n Y d 0 DΩ/τ )/Ω. Assuming there are large numbers of Y , we apply the van Kampen ansatz that fluctuations in Y occur around its deterministic steady state mean [8]: Then, employing the system size expansion, and enforcing the linear noise approximation (LNA), we obtain a linear FPE for the probability of having a fluctuation of size (t) at a time t, denoted Π( , t) [8,48]: This FPE then admits an equivalent Langevin equation given by: where β(t) is Gaussian white noise with zero mean and correlator β(t)β(t ) = δ(t − t ). Hence, from Eq. (120) it follows that d goes as: where η(t) = Ω 1/2 D (t)/τ . Eqs. (123) and (124) are consistent with the definition of colored noise described at the beginning of this section. This modified SSA requires that where τ and D are both individually large, that τ D such that slow switching is not enforced between differing numbers of the ghost species.

B Detailed explanation of condition 2
In order to explain the origin of condition 2 -a condition on the length scale of colored noise fluctuation compared to the rate of variation of the drift term in Eq. (47) -we will first consider a more intuitive example. Consider a Brownian particle subject to a force F (x), whose state is specified by both its position x, as well as its velocity v. The set of SDEs governing the state of this particle is then [8]: where m is the mass of the particle, γ is the damping coefficient of the frictional force surrounding the particle (frictional force is −γmv), k b T is the thermal energy of the particle, and Γ(t) is Gaussian white noise with zero mean and correlator Γ(t)Γ(t ) = δ(t − t ). The equivalent multivariate FPE for this set of SDEs is [29]: Now following van Kampen p. 216-218 [8], one can utilise singular perturbation theory assuming that the damping coefficient γ is small (although the same procedure could be done for γ large) in order to reduce the above FPE in two variables to a FPE in the position variable x alone. The result after having done this procedure is: Aside from the requirement that γ must be small, there is another condition required of Eq. (128) such that it reasonably approximates Eq. (127). This condition arises physically since we realise that if we are to approximate Eq. (127) by Eq. (128), then the drift term F (x)/mγ must be approximately constant over the distance that the velocity is damped. One finds that the associated 'length scale' L over which the velocity is damped is simply the pre-factor of diffusion term in Eq. (128), i.e., L = k b T mγ [49]. Enforcing the requirement that F (x) is slowly varying over this length scale we find the inequality L|F (x)| |F (x)|, explicitly: which must be satisfied for our one variable FPE to be a good approximation. We now return to our colored noise problem and recall the set of SDEs that define our system: dn dt = h(n) + F (n)η + g 2 (n)Γ(t), where all functions of n and t are defined in Section 3.2. This set of SDEs has an equivalent bi-variate (Stratonovich) FPE given by: ∂P (n, η; t) ∂t = − ∂ ∂n h(n) + F (n)η − 1 2 g 2 (n)g 2 (n) P + 1 τ ∂ ∂η (ηP ) (132) We now recall Eq. (47), i.e., our UCNA approximated one variable FPE, where η was adiabatically eliminated: ∂P (n; t) ∂t = − ∂ ∂n h (n) +g(n)g (n) P (n, t) + ∂ 2 ∂n 2 g(n) 2 P (n, t) .
Analogously to the case of the Brownian particle, this one variable FPE can only be approximately correct where the variation of the drift term with respect to the length scale of colored noise fluctuations is small. From Eq. (43), we identify our length scale as the pre-factor of the noise term whose origin is the adiabatic elimination of η, i.e., L = F (n) C(n, τ ) .
Hence, C(n, τ ) must satisfy the following length scale condition C(n, τ ) F (n) ∂ n h (n) +g(n)g (n) h (n) +g(n)g (n) , if Eq. (133) is to be a good approximation of Eq. (132), as seen in Eq. (66) from the main text. Note that one can also make the argument that the diffusion termg(n) 2 should also slowly vary with respect to L. However, we generally find that this is satisfied if Eq. (135) is satisfied, and hence we do not include this as an additional condition on the validity of the UCNA.