Noise reduction facilitated by dosage compensation in gene networks

Genetic noise together with genome duplication and volume changes during cell cycle are significant contributors to cell-to-cell heterogeneity. How can cells buffer the effects of these unavoidable epigenetic and genetic variations on phenotypes that are sensitive to such variations? Here we show that a simple network motif that is essential for network-dosage compensation can reduce the effects of extrinsic noise on the network output. Using natural and synthetic gene networks with and without the network motif, we measure gene network activity in single yeast cells and find that the activity of the compensated network is significantly lower in noise compared with the non-compensated network. A mathematical analysis provides intuitive insights into these results and a novel stochastic model tracking cell-volume and cell-cycle predicts the experimental results. Our work implies that noise is a selectable trait tunable by evolution.


WP256
MATα, leu2::LEU2-P TET   Parameter Value Unit mean and SD of r 1 0.24 ± 0.06 fL min -1 mean and SD of r 2 0.505 ± 0.108 fL min -1 mean and SD of r 2m 0.007 ± 0.024 fL min -1 mean and SD of T1' 11.7 ± 6.271 min mean and SD of T2 14.4 ± 4.493 min mean and SD of T3 50.175 ± 5.820 min mean and SD of  In the above equations, ! represents the average total concentration of the protein Galnp, ! represents the maximal expression rate, ! represents the basal expression level, and ! represents the nonlinearity of TF-promoter interaction for the gene GAL . ! represents the combined protein degradation/dilution rate constant of the protein Gal p. ! , ! , !" , is the function representing the overall GAL network activity level and is defined as where and are nonlinearity coefficients for Gal1/3p-Gal80p interaction and Gal80p-Gal4p interaction, respectively, and ! is the scale of action of the corresponding protein. Based on previous experimental observations 5 , we know that Gal3p:Gal80p and Gal1p:Gal80p interacts with 1:1 stoichiometry, so we fixed = 1. When ! ! + ! ! ≫ 1, the form of simplifies to This form has the important property We now introduce a perturbation factor > 0 representing a global perturbation such as extrinsic noise or a network dosage change, so that the equations become We now show that at steady state for equation set (5), the value of ! , ! , !" , is equal to the value of at steady state for equation set (1).
Let the steady state values of ! , ! , and !" for equation set (1) be ! , ! , and !" respectively, therefore we have Multiplying both sides of each equation by yields Comparing the two sets of equations (6) and (7), we can immediately see that ! , ! , !" are the steady state values of ! , ! , and !" for the system described by equation set (5), and that network activity level is equal to ! , ! , !" , = ! , ! , !" , , which is the network activity level of the system described by equation set (1) at steady state.
In other words, in this system, the network activity level does not change in response to a global perturbation.

B. Activity of the synthetic network is sensitive to global perturbations
We describe the time evolution of the concentrations of proteins expressed from the synthetic network using a similar set of differential equations: In the above equations, !"! , !"! , !"! are the maximum expression level, basal expression level, and TF-promoter interaction nonlinearity of the TET promoter, respectively. !"#$ and !"#$ are the average total concentration and combined degradation/dilution rate constant of the rtTA protein. ℎ !"#$ , is a function representing the activity of the TET promoter as a function of rtTA level, We claim that for any two distinct values ! ≠ ! , the value of at steady state for the system described by equation set (8) are different, and we prove that by contradiction.
From the simplified form of (3) and given the assumption ! , ! , !" , = ( ! , ! , !" , ),, It's easy to see that this equation holds only when !" = !" . Now consider the last equation in (6). We know that, since and are steady state values, Substituting !" = !" , !"#$ = !"#$ , and ! = ! yields Which leads to (assuming !"! ≠ 1 -i.e., the TET promoter is inducible) ≠ 1, this requires h, or the activity of the TET promoter, to be independent of the level of rtTA, which cannot be true for any sensible form of h, and is certainly false for the form we used.
Hence, the assumption ! , ! , !" , = ( ! , ! , !" , ) must be incorrect, and for any two distinct values ! ≠ ! , the value of at steady state for the system described by equation set (8) must be different. Thus, unlike the wild-type GAL network, the activity of the synthetic network is sensitive to global perturbations.

A. The synthetic construct is not toxic under the conditions used
As there have been reports of rtTA-induced toxicity effects 9,10 , we measured the doubling times of all strains by using 0.5% galactose, the highest concentration used in this study, which would be expected to produce the highest levels of rtTA in the synthetic strains. For the synthetic strains, doubling times with and without doxycycline were measured. All strains/conditions displayed the expected doubling time (~90min) and we observed no toxicity effects under the conditions used ( Supplementary Fig. 1).

B. Increasing the sample size for flow cytometry did not substantially affect the expression distributions
To test the effects of increasing the sample size, we induced strains carrying the wild-type (WP128) and synthetic (WP196) topology at 0.15% galactose for 22 hours, increased the sample size tenfold so as to result in about ~20,000 cells on average after gating, and compared the distributions obtained with those obtained using the sample size of ~2,000 cells.
No substantial difference was observed between the two distributions ( Supplementary Fig. 2).

C. Stability of the expression levels at 22 hours of induction
To show that 22 hours of induction is sufficient for the gene expression profiles to stabilize, we induced strains carrying the wild-type (WP128) and synthetic (WP196) topology with a linearregion galactose concentration (0.15% galactose) for either 22

Quantification of the level of network dosage compensation.
We quantified the level of dosage compensation by averaging the absolute difference in induction level between strains with one and two copies of the gene network across all inducer levels ( Fig. 1D, Fig. 2D). Under this metric, lower scores indicate stronger dosage compensation effects. As seen in Supplementary Fig. 4, the compensation score for the wild-type network ( Fig.   1D) is only 5%, as compared to the 25% for the synthetic network (Fig. 2D). When we excluded the basal (0% galactose) and the saturation (0.35% and 0.5% galactose) regions, the score for the synthetic network was even higher at ~35% ( Supplementary Fig. 4B).

SUPPLEMENTARY NOTE 4
The β-estradiol inducible genetic system The strains carrying the β-estradiol inducible system (WP256 and WP258) were grown in synthetic dropout media with the appropriate amino-acid supplements. During the overnight growth period (22 hours in 30°C shaker), 2% glucose was used as the carbon source. The overnight growth period was followed by the induction period ( Using a haploid strain without YFP, we measured cellular autofluorescence, subtracted it from all expression levels, and plotted the results ( Supplementary Fig. 5C). We observed 8-9 fold induction in both WP256 and WP258 ( Supplementary Fig. 5D). For stringency of the comparison, it is important to make sure that the fold-induction range of the estradiol system used here is equal to or more than the fold-induction range of the P GAL80 promoter. Previous work 4 has shown that the P GAL80 promoter is inducible ~5-fold at full induction, validating our choice of the estradiol and doxycycline concentrations for a reliable noise comparison at ranges comparable to the range of rtTA-TET induction under the control of the P GAL80 promoter.
To find out the growth rates of the two strains in the absence and presence of β-estradiol, we measured cellular doubling times after growing the two strains (as described above) overnight followed by induction in different conditions ( Supplementary Fig. 5E). At the end of the 22 hours induction period, frequent OD 600 measurements were taken from exponentially growing cultures. Time dynamic growth curves were constructed and fitted to an exponential growth function, leading to the extraction of the doubling times of the strains in specific growth conditions ( Supplementary Fig. 5E). In the absence of estradiol, the two strains grew at similar rates (with ~105 min doubling time due to fusion-protein expression). In the presence of 300nM

I. Determining the doxycycline concentrations used for the synthetic strains
To determine the doxycycline concentration that would cause the activity of synthetic network to match that of the wild-type GAL network, we used strains WP196 and WP115, and measured

II. Building a volume-and cell-cycle-aware stochastic model to predict noise in network activity
The volume-and cell-cycle-aware stochastic model we built contains two interacting modules.
The first module is for cell volume growth and division, while the second module is for a multicomponent gene network.

A. The volume module
The volume module models cell volume growth and division during cell cycle. Based on previous experimental characterization 11 on the budding yeast cell cycle, the cell grows linearly in the G1 and S/G2/M stages but at different rates. The G1 phase is further divided into two time blocks (T1 and T2), corresponding respectively to the time from the beginning to G1 to start and the time from start to end of G1 ( Supplementary Fig. 8). The previous work has shown that the volume at start is linearly related to the G1 phase growth rate such that the duration of T1 obeys the following equations: At each cell division, a daughter cell was assumed to partially inherit its parent's parameter values. The exact level of inheritance is described by an additional model parameter, , such that for a given parameter , where !"#$% is a value sampled from the distribution of .

B. The gene network module
We define a gene network as a group of genes whose promoters are all under the control of the same master transcription factor. Thus, in wild-type cells, GAL80, GAL3, and GAL1 are all in the same gene network, while in the cells carrying the synthetic construct, P GAL80 -rtTA, GAL3, and GAL1 belong to one gene network, while P TET -GAL80 is in its own network.
The activity of each network is described using a functional form that relates the concentration of activators, inhibitors, and inducer to the current activity level. The activity of the rtTA-TET network in synthetic strains is described using a Hill function 12 : where ! is a scaling factor, is the inducer (doxycycline) concentration, and a nonlinearity coefficient.
The GAL network activity is described using the following functional form, which is derived (see section I, below) from the interactions among the network components: where ! , ! , and !" are scaling factors, is the inducer (galactose) concentration, and and nonlinearity coefficients.
Supplementary Figure 9. The seven stochastic reactions for each gene. The promoter switches between OFF and ON states. mRNA is produced at the max rate from the ON state only, but the off state also produces some mRNA. mRNA is then translated into protein, and both mRNA and protein can be degraded.
For each gene in the network, we construct a set of seven stochastic reactions as illustrated in Supplementary Fig. 9. Each gene's promoter is assumed to switch between two states, OFF and ON. Using to denote the number of mRNA copies, the number of protein copies, !"" ( !" ) the number of OFF (ON) promoter copies, the cell volume, we describe the reactions as: Promoter activation………………………. !" !"" Promoter inactivation…………………...… !"" !" mRNA synthesis from OFF promoter…… ! !"" where !"# is a constant scaling factor equal to the average cell volume in the population, introduced so that the values of the rate parameters more closely match experimental measurements. These seven stochastic reactions are governed by the parameters !"" , !" , ! , , ! , ! , and ! . The rate parameter !" is determined by the following equation: where is a model parameter representing the maximum activation rate of the promoter of the gene at issue, is the current value of the functional form of the gene network containing the gene at issue, and a nonlinearity coefficient characterizing the interaction between the gene network's master transcription factor and the gene's promoter.
We chose to parameterize promoter transitions using the fraction of time a promoter spends in the active state when fully induced, denoted by , rather than the promoter inactivation rate !"" . We define as Further, to facilitate using parameter values obtained from existing experimental measurements, we use the observed mRNA synthesis rate ! ! instead of the actual mRNA synthesis rate ! , the difference being that the observed rate reflects the fact that even in fully induced cells, a gene is only being actively transcribed a fraction ( ) of the time. The two are related by the equation Similarly, we use the observed basal expression level ′, instead of the actual ratio between OFF-state and ON-state transcription rates . The two are related by the equation To summarize, for each gene (including the reporter), there are seven stochastic reactions described by eight parameters: , , , ! ! , ′, ! , ! , and ! .
To simulate global transcriptional noise, for each individual cell a random perturbation is applied to each rate parameter. Rate parameters for the same process (e.g., translation) are perturbed by the same fraction for all genes in the cell.

C. Interaction between the two modules
Cell volume growth is implemented by adding an additional stochastic reaction into the set of stochastic reactions specified by the gene network module; this reaction fires at a rate determined by the cell volume module, and each time the reaction fires, the cell volume is incremented by a small amount. The change in cell volume, in return, will cause the rates of all the other stochastic reactions to change.
To model DNA replication during the cell cycle, upon entry into the S/G2/M phase we double the count of all promoters, on the assumption that an ON promoter would replicate into two ON ones, while an OFF promoter would replicate into two OFF ones.
For cell division, all mRNA and protein contents are distributed between mother and daughter cells following a binomial distribution based on the volume ratio between the mother and daughter compartments.

D. Determining model parameter values from literature
For the volume module, all model parameters except c were adapted from a previous study 11  For the gene network module, we fixed rates of RNA transcription and decay 1 , protein translation 6,7,13 and degradation 13 , as well as basal transcription levels for the GAL network promoters 4 from ranges described in the published literature (Tables S3, S4). As our model applies a fitted scaling parameter to each protein concentration, we do not expect inaccuracies in the fixed parameter values to have significant effects.
We set = 4 for the GAL1 gene and the P GAL1 -YFP reporter, and = 1 for GAL3, GAL80 and P GAL80 -rtTA, based on the number of Gal4p binding sites on the respective promoters 14,15 .
We set = 1 in the GAL network's functional form based on previous work 16 . For P TET -GAL80, we set = 2, as rtTA dimerizes, and = 2 as the promoter used was TET O2 , with two binding sites for rtTA. Finally, we set the global noise level at 10% -that is, the perturbation applied to each rate parameter is sampled from a normal distribution with mean zero and standard deviation equal to 10% of the parameter's value.

E. Stochastic simulations using the model
The model was implemented in custom-written C++ code. Simulations were performed using a modified version 17,18 of the well-established Gillespie algorithm 19 .
Cell populations were simulated for 22 hours, corresponding to the amount of time the cells were grown for experimental measurements. In each simulation, the initial population contains 25,000 cells, and the inducers are introduced at = 0. Inspired by a previous work 20 , we performed random samplings every 40 minutes to maintain the population at the same size.
The age of the initial cells at = 0 is sampled from an exponential distribution whose mean is equal to the average doubling time (90 minutes). The initial state of each cell is set using the steady state values computed from the parameter values.
The simulation is repeated for various inducer concentrations. The reporter protein count in each cell is converted to simulated fluorescence measurements using a fitting procedure described in section H, below.

F. Fitting other model parameters
Model parameters not fixed from literature in section D are determined via fitting, in two steps.
The GAL-network related parameters were determined by fitting the wild-type GAL network model to measurements performed using a haploid WT strain with a P GAL1 -YFP reporter Given that we have fixed the value of at 1 (see section D), the functional form of the GAL network simplifies to: As a result, changing the values of !" , ! and ! proportionally doesn't affect the value of !"# .
We therefore arbitrarily fixed !" = 30000 for the first fitting procedure, and fitted only ! and ! .
The other fitted parameters are (nonlinearity coefficient for Gal80p-Gal4p interaction), (maximum promoter active fraction), and (maximum promoter activation rate), for the GAL80, GAL3, and GAL1 promoters, for a total of nine parameters.
As the rtTA-TET network produced Gal80p from a different promoter (P TET ), to account for inaccuracies, we included !" in the second fit, along with , , and ′ (observed basal expression fraction) for the TET promoter, and ! (rtTA's scale of action), for a total of five parameters.
For each fitting, we first tried a wide range of possible parameter value combinations to select the initial parameter values that yield fluorescence distributions similar to those experimentally observed. Then, fitting is performed using the well-known Nelder-Mead algorithm [21][22][23] implemented in the NLopt library (http://ab-initio.mit.edu/nlopt). The simulation is repeated a number of times (denoted ! ) for each set of parameter values. Using the fluorescence fitting procedure described in section H, the result of each simulation run is scored for how well it matches the experimental results. The score of the parameter values is the mean of the scores of the individual simulations. The fitting algorithm was first run with a smaller ! (in the 8~20 range), and then with a larger ! (100 or higher), in both stages until significant improvement in the score can no longer be observed.

G. Predicting the activity of the natural and synthetic networks in diploid cells
We used the same doubling time for haploid cells and diploid cells, and it is known the diploid cells have twice the volume on average as haploid cells. Thus, the volume of diploid cells should grow twice as fast compared to a haploid cell. Accordingly, in the diploid cell simulation, all volume module parameters that represent growth rates were doubled, along with the initial volume and the parameter . In addition, with the exception of the reporter, diploid cells start with two copies of each gene (doubled to four after S phase), rather than one. The reporter copy number is kept at one -two after S phase. All other parameters, whether fixed from literature or obtained from fitting in haploid cells, were kept the same.

H. Fitting procedure for fluorescence
The goal of this fitting procedure is to convert a set of reporter protein counts in cells = Using the Nelder-Mead algorithm [21][22][23] , we attempt to find the value that minimizes the value of -log ( ) (and so maximizes the value of ). The resulting maximized value of is the likelihood, and the corresponding value is the optimal reporter-to-fluorescence conversion factor. During fitting, the above fluorescence fitting procedure is repeated multiple times for each run of the simulation, and the mean of the minimized values of -log ( ) is used as the score of the run.

I. Derivation of the GAL network's functional form
Consider a gene whose promoter transitions between the OFF and ON states, where the rate at which the OFF→ON transition takes place is affected by the concentrations of the relevant regulatory proteins: The parameter ℎ above represents the timescale at which transitions take place, while ! , ! , ! , !" is a function that relates the concentrations of the GAL proteins ! , ! , ! , !" to the OFF→ON transition rate. We use a functional form to express the activity of the GAL network as a fraction of the maximum activity, which is derived from the interactions of the network components as follows.
Gal4p is the master transcriptional activator of the network, and expressed constitutively.
We represent its interaction with a GAL network promoter as Since GAL4 is a constitutively expressed gene, we take Gal4p's total concentration (denoted by ! ) to be constant. The maximum possible value of is Since the amount of free Gal4p ( ! * ) should be a decreasing function of Gal80p concentration and an increasing function of total Gal4p, we modeled the Gal4p-Gal80p interaction with the following equation: where !" * is the concentration of Gal80p proteins that are not bound by active Gal3p or Gal1p, !" is the scaling parameter, and is the degree of nonlinearity of the Gal4p-Gal80p interaction.
Despite that Gal1p has a ~40-fold lower inducer activity 24  Unlike the other parameters, the value of may vary from gene to gene, as it depends on the promoter. We therefore handle it in the stochastic model directly. Removing the exponent produces the functional form we aimed to derive: