Time Dependent Stochastic mRNA and Protein Synthesis in Piecewise-deterministic Models of Gene Networks

We discuss piecewise-deterministic approximations of gene networks dynamics. These approximations capture in a simple way the stochasticity of gene expression and the propagation of expression noise in networks and circuits. By using partial omega expansions, piecewise deterministic approximations can be formally derived from the more commonly used Markov pure jump processes (chemical master equation). We are interested in time dependent multivariate distributions that describe the stochastic dynamics of the gene networks. This problem is difficult even in the simplified framework of piecewise-determinisitic processes. We consider three methods to compute these distributions: the direct Monte-Carlo, the numerical integration of the Liouville-master equation and the push-forward method. This approach is applied to multivariate fluctuations of gene expression, generated by gene circuits. We find that stochastic fluctuations of the proteome and much less those of the transcriptome can discriminate between various circuit topologies.

The van Kampen Ω (system size) expansion [51] or equivalently, the central limit theorem [28,16]  The reactions R D act on X D (the corresponding γ i have non-zero coordinates on discrete species, 145 γ D i = 0) and have propensities depending on X D only. The reactions R C act on X C (the corresponding 146 γ i have non-zero coordinates on continuous species, γ C i = 0) and have propensities depending on X C only.

147
The reactions R DC , R CD act on X C and X D , respectively, and their propensities depend on both X D 148 and X C .

149
In this paper we consider gene network models. For each gene, we model the transitions between 150 promoter states, as well as other processes such as transcription, translation, protein folding, and protein 151 8 and mRNA degradation. We will consider that the mRNA molecules and proteins are in sufficiently 152 large copy numbers to justify continuous approximations. The only discrete variables are in this case 153 the promoter states. The set R D contains transitions between discrete promoter states whose rates do 154 not depend on regulatory proteins. The set R CD contains transitions between promoter states whose 155 rates depend on concentrations of regulatory proteins. The set R C contains translation, maturation 156 (folding), degradation reactions. The set R DC contains transcription initiation reactions that depend on 157 the promoter state. 158 We further consider that the copy numbers of continuous species X C and the propensities of reactions 159 in the sets R C and R DC are "extensive", in other words, scale with the system size Ω, X C = Ωx c , with size, unless they are proportional to copy numbers of activator or repressor proteins. For a more 168 complete discussion of these scaling relations we refer to [8].
Using the first order Taylor series

xc,t) ∂xc
we obtain the Liouville-master equation 3) The (3.1) The mRNA and protein variables follow ODE dynamics For the sake of illustration let us consider the simple model of a single constitutive gene controlled 205 by a two state (ON/OFF) promoter. We denote the states of the promoter by 1 and 0 respectively. The 206 transition rate from 0 to 1 is f and from 1 to 0 is h. The protein and mRNA concentrations are x and 207 y, respectively. The transcription initiation rate in the state 1 is k 1 and in the state 0 is k 0 << k 1 .

208
The translation rate is b. The mRNA and protein degradation rates are ρ and a, respectively. The 3) The protein and mRNA concentrations follow the ODEs The probability distribution of the promoter state s results from the dynamics of the two state Markov 212 chain 213 dp 0 dt = −f p 0 (t) + h(1 − p 0 (t)), where p 0 = P[s = 0] = p(0, x, y) dxdy, p 1 = P[s = 1] = p(1, x, y) dxdy. 214 We also define the asymptotic occupancy probabilities p 0 = h h+f and p 1 = f h+f , representing the 215 probabilities, at steady state, that the promoter state is OFF and ON, respectively.

216
The single constitutive gene model and the advection fluxes of the Liouville-master equation are 217 illustrated in Figure 3. More complex, two gene circuits models are represented in Figure 4 and their Liouville-master equations are given in the Appendix 1. The PDP Monte-Carlo method is based on the direct simulation of the PDP process. A simple algorithm 227 has been proposed in [8]. For the sake of completeness we recall here the main steps of this algorithm.

230
(3) Integrate the system of differential equations obtained by adding to (3.2) the equation for the 231 survival function F of the waiting time to the next Markov chain transition   Fokker-Planck equation [39].

265
In this paper we have used a finite-difference predictor-corrector scheme [47] to compute the solution of

298
Let us introduce our model beginning with the MEs describing the dynamics of the first switch, where, P (t) = (P 0 (t), P 1 (t)) T is the probability occupancy vector whose entries are the probabilities to 300 find the first switch in the OFF state (P 0 (t)) or in the ON state (P 1 (t)). The infinitesimal stochastic 301 18 matrix H 1 is given by: This is a basic telegraph process where the rates f 1 where y 1 is a random variable representing the copy number of mRNA in the cell coming from the first gene.

306
The random variable s 1 (t) follows the switch statistics, meaning that with probability P 1 (t), s 1 (t) = 1 at 307 time t and s 1 (t) = 0 with probability P 0 (t), again at time t. The production rate of mRNA is a function 308 of the random variable s 1 (t) following where K 1 is the highest level of mRNA production and K 0 is the basal one. The third equation describing 310 the activity of the first gene is for the random variable representing the protein density associated to it; where α is protein degradation rate and β is the translation rate. The last equation for the coupled gene 312 model is the one governing the probability occupancy of the second gene, where Q(t) = (Q 0 (t), Q 1 (t)) T encodes, in its entries, the information about the probability to find the 314 second gene ON (Q 1 (t)) or OFF (Q 0 (t)). The matrix H 2 (t) is given by (4.10) In the model at hand, the main source of stochasticity is the switching ON and OFF of the gene.

316
This noise is transmitted to mRNA synthesis process through the rate k(s 1 (t)) which is a function of a 317 random variable (s 1 (t)) and, so, a random variable itself. The first step of the push-forward method is 318 to compute the time dependent distribution probability of mRNA molecules y 1 (t) (which is perturbed by 319 the random variable s 1 (t)) once the probability distribution of the perturbation is known. To do so, we 320 begin by presenting the solutions of Eqs (4.5), where P (t 0 ) encodes the initial configuration (given at t = t 0 ) of the switch, and the matrices are (4.12) Explicitly, the solutions are given as are the asymptotic occupancy probabilities to find the gene OFF 324 20 and ON, respectively. Going on, we present the formal solution of the RDE governing mRNA dynamics, (4.14) where we have rescaled time t by the mRNA degradation rate and introduced the new time parameter, 326 τ = tρ, and also the dimensionless parameters k 0 = K 0 /ρ and k 1 = K 1 /ρ. Note that the integral (4.14) is 327 a basic Riemann integral, such that, if we consider a sufficiently fine partition [τ j , τ j+1 ] and the integral in (4.14) is approximated by   solution for protein density is: applied to obtain the time dependent distribution probability for protein density, in an analogous way as 350 for mRNA. The integral that must be partitioned is that over τ , in the interval where, we have used the definition k(s 1 (τ )) = k 0 (1 − s 1 (τ )) + k 1 s 1 (τ ) to simplify the notation. As before,

352
we have illustrated our method by calculating the protein density for the same two regimes of switch 353 flexibility.

354
To analyze the influence of the first gene on the second one we have assumed that the action of the first 355 gene is to activate the second (see (4.10)). To do so, instead of solving the RDE describing the activity 356 of the second gene (it is an RDE because the perturbation x 1 (τ ) is a random variable) we have analyzed 357 the mean value of the occupancy probability of the second gene whose dynamics is given by and Q 1 (τ ) = 1 − Q 0 (τ ) . The general solution for Q 0 (τ ) is given by In Appendix 3 we show, in detail, how to obtain the exact functional shape of x 1 (τ ) . Nevertheless, its 360 structure is x 1 (τ ) = r 0 + r 1 e −τ + r 2 e −aτ + r 3 e − τ and, because of this, the integral in (4.21) cannot be 361 evaluated analytically and a numerical evaluation must be performed. This will also be the case for the 362

23
conditional probabilities that will be expressed as where we have set Q i (τ j−1 ) = s 2 (τ j−1 ) (with i = 0 or 1) expressing the fact that at the instant of time  comparison is shown in Figure 10 for the two gene circuits G 1 → G 2 and G 1 G 2 that differ by the sign 412 of the interaction; one can notice that the protein fluctuation based distance is significant, whereas the 413 mRNA fluctuation distance is not, both for slow/slow and fast/fast genes. 414 We have also tested the significance of the correlation computed from bivariate mRNA or protein probability p < 5%) protein/protein correlation is obtained for moderate cell populations (N c > 100 for 420 p < 5%, see Figure 10). In order to obtain significant mRNA/mRNA correlation one has to use very large 421 numbers of cells (N c > 1000 for p < 5%, see Figure 10). This is possible for single cell sequencing and however, as seen in Section 2, our methods work also for promoters with more than two states.

442
As application of our numerical methods we tested the capacity of mRNA and protein copy numbers →, stand for activation and repression, respectively.

464
The corresponding Liouville-master equations are the following:

465
For the circuit G 1 → G 2 .
For the circuit G 1 G 2 .
For the circuit G 1 G 2 G 1 .
For gene circuits (4.2) read where c i (s), i ∈ [0, n g ] are positive constants depending on the discrete Markov chain state s.

474
The solution of this system is straightforward For the single constitutive gene model c The constants c i , i ∈ {0, 1, 2} for the two gene models are given in the Table 2. For a general gene network, 478 the constants table is precalculated symbolically and used to generate automatically the simulation code.

479
In order to obtain an expression for x 1 (τ ) we must solve a set of ordinary differential equations: where k(s 1 (τ )) = k 0 P 0 (τ ) + k 1 P 1 (τ ). The solution for y 1 (τ ) is: The general solution for protein mean value has the structure, where the coefficients are given by : Model Table 2: Constants c i for computing the next step waiting time in the two gene model.   defined by (4.1). terms are chosen such that then mean mRNA and protein are the same in regulated and constitutive genes. The probability to color map relation is logarithmic. Figure 9: Steady state bivariate histograms of protein copy numbers from two interacting genes in circuits of different types and for four switching regimes of the promoters, obtained with the Monte-Carlo method.
The individual gene parameters are those used in Figure 5; f and h constants in f x i or hx i terms are chosen such that then mean mRNA and protein are the same in regulated and constitutive genes. The probability to color map relation is logarithmic. The parameters of the simulations are those used in Figure 9. 48