Shadow enhancers mediate trade-offs between transcriptional noise and fidelity

Enhancers are stretches of regulatory DNA that bind transcription factors (TFs) and regulate the expression of a target gene. Shadow enhancers are two or more enhancers that regulate the same target gene in space and time and are associated with most animal developmental genes. These multi-enhancer systems can drive more consistent transcription than single enhancer systems. Nevertheless, it remains unclear why shadow enhancer TF binding sites are distributed across multiple enhancers rather than within a single large enhancer. Here, we use a computational approach to study systems with varying numbers of TF binding sites and enhancers. We employ chemical reaction networks with stochastic dynamics to determine the trends in transcriptional noise and fidelity, two key performance objectives of enhancers. This reveals that while additive shadow enhancers do not differ in noise and fidelity from their single enhancer counterparts, sub- and superadditive shadow enhancers have noise and fidelity trade-offs not available to single enhancers. We also use our computational approach to compare the duplication and splitting of a single enhancer as mechanisms for the generation of shadow enhancers and find that the duplication of enhancers can decrease noise and increase fidelity, although at the metabolic cost of increased RNA production. A saturation mechanism for enhancer interactions similarly improves on both of these metrics. Taken together, this work highlights that shadow enhancer systems may exist for several reasons: genetic drift or the tuning of key functions of enhancers, including transcription fidelity, noise and output.

Partly for this reason, I am skeptical of the authors' definition and calculation of fidelity. That the moments are estimated from ODEs implies that they behave deterministically with time once the network "exists" (i.e., E[R] is not a RV; aside, it seems to be assumed but never stated that the expected values of the molecule counts are taken once the network reaches the stationary distribution). In order for E[R] to be a RV, one possible assumption is that some of the kinetic parameters can take random values that may change with each instantiation of the network (each simulation) but do not change in time once the network "exists" (making E[R] a deterministic function of a RV beta1; let's call this bar{R}(beta1)). Indeed, this seems to be implied by the authors' statements in lines 573-575, in which they explain that they construct R and T1 by sampling values of production rate beta1 and then, for each fixed sample value, estimating the moments of the stochastic reaction network. Yet the authors do not provide any distribution for beta1, as I would expect to be necessary for deriving a distribution for bar{R}(beta1) (or bar{T1}(beta1) or the joint distribution thereof). From this perspective, I do not understand why the authors generated histograms (binning multiple values of beta1 together (!)) to estimate the joint distribution. Moreover, defining one metric (CV) in terms of a single beta1 and another metric across multiple beta1 makes me unsure of how to actually assess the tradeoff in these two system properties, though this appears to form a large part of the authors' concluding arguments. (This could be partially addressed by specifying that the CV is taken at some "representative" tilde{beta1}, justified in relation to the distribution of beta1.) There are many examples in the literature where mutual information has been applied to analyze signaling in genetic networks; see, e.g., Tkacik andWalczak (2011) J Phys Condens Matter andRazo-Mejia et al. andPhillips (2020) Phys Rev E. The authors might want to look into channel capacity as a way to assess how well the mRNA levels can distinguish between different values of input, where in this case the input could be transcription factor concentration (say, T1 alone) or beta1 itself. If the authors would like to depart from this more traditional perspective, then I am open to convincing, but would need much more explanation/justification/clarification than what is present in the manuscript.
We agree that the metric originally presented is not a standard use of the concept of mutual information. In the end, we decided to use the correlation between the TF concentration and the mRNA output as a simpler measure of fidelity. This avoids the use of histograms, different values of beta1, and vector notation for the mRNA. We have re-calculated most of the figures in this paper to include this metric instead of the mutual information from our original submission. Thankfully, most of the broad conclusions remain valid, and little re-interpretation of the results was necessary.
My third point of concern is that the authors have essentially only tested one parameter set (Table S.1), a point they acknowledge in the discussion. It's good that this set was derived from experimental results, so the values should be biophysically relevant. Nevertheless, I would be more convinced at the generalizability of the conclusions if the authors had tested a few different sets of parameters to see which have the most substantial effects on noise and fidelity. It seems important to me to know, for example, how CV varies with beta1. Though a massive computational scan would be out of scope, there are also at least a few other candidate parameters whose relative values might intuitively be expected to have effects. To illustrate, consider the conclusion that changing the number of enhancers does not appear to change noise properties (lines 230-234). This would be fundamentally unsurprising if r1 were about equal to r2 (as in the authors' simulations), since nothing else in the model exists to distinguish a TF bound to one enhancer from a TF bound to another. It is not inconceivable that a substantial difference in r1 and r2 could produce a different outcome.
We have now included a basic parameter variation analysis in a new figure of the Supplementary Material, Figure S6. We randomly selected five parameter sets after varying parameters over multiple orders of magnitude. For each of the additive, subadditive and superadditive cases, we plotted the fidelity and noise of each of the dozens of the modeled system, and we showed that the broad trends were preserved. In particular the values of r1 and r2 vary independently from 10 to 1000 and are significantly different in these parameter sets. The only parameters shown to significantly influence the behavior of this system are the k_on, as the simulations often did not compile if it was changed too much. This parameter was still varied but over a more narrow range.
Overall, the work is tackling an intriguing set of questions and the basic set of experiments is appropriate to address them. If the mathematical notation can be disambiguated and the metrics convincingly justified, then my confidence in the conclusions would be strengthened. Expanding the parameter space under consideration would then serve primarily to bolster the impact of the work by broadening the circumstances under which the conclusions are relevant.
We thank the reviewer for very insightful comments and for suggestions that have significantly strengthened our manuscript.
Minor comments: -In the main text, the authors show only one mathematical implementation each for superadditivity and subadditivity relying upon a linear relationship between kinetic rate and enhancer number (line 281, Figure 3). It should be acknowledged that with these definitions taken strictly, n above a certain value will cause kinetic rates to go negative.
We included this caveat in the discussion -the reviewer is correct and we kept this linear approach for simplicity. Another approach could use exponentially increasing or decreasing values for the k_on and k_off.
Also, I think the authors could consider putting Fig. S.4 in the main text, as it seems important that the noise characteristics can vary so much depending on the mathematical choice of model for implementing super-/subadditivity.
We have now included a new section in the paper before the discussion, with this figure along with a short description.
-Several paragraphs in the methods section read like tutorials. Given that many of these modeling/simulation paradigms are well established and not really the focus of the present work, I feel the level of explanation is more appropriate for the appendix than for the methods.
We have eliminated about half of the subsections of the methods section, and left the other half in place. For instance, the introductions to chemical reactions, the Gillespie method or stochastic modeling have been eliminated from the manuscript.
-The writing/figures could generally use polishing for clarity. For example: => lines 101-102: The way this is written could be interpreted to mean the enhancer is capable of producing different expression levels in response to different TF concentrations OR that it is capable of detecting changes in TF concentration. I assume the former, but it would be nice to have this absolutely clear.
We have clarified this -it is indeed the former.
=> "fidelity" is not defined at first usage (line 133) Thank you for pointing out, we now define it around like 101 where the concept is first described.
We have edited out this text after eliminating the definition of mutual information => line 187: could state explicitly in text that beta1 is T1 production and gamma1 is T2 production This text has been eliminated along with the definition of mutual information. The parameters are defined in Figure 1.
=> We agree, and we have now included this explicitly in that part of the text => line 275: "by decreasing k_on and increasing k_off"-is it really "and" or "or"?
We always did both together, so that it is "and". One could conceivably do only one at a time, but this would have been out of the scope of this manuscript. We considered doing this, and previous versions of the manuscript had different colors. We zeroed in on a single color in order to highlight those features that we found most important, such as the monotonically increasing nature of the graphs => Fig. 3B and 4B (right): I am not convinced a bar plot is the appropriate visualization for fidelity and noise since their quantities are not directly comparable.
We agree that the fidelity and noise numbers are not meant to be compared to each other. Per reviewer suggestion, we have changed the format to a small table on these subfigures, while indicating the two relevant models in cartoon form.
=> Related to the point about mathematical notation, T_n is used to denote at various times the transcription factor, its molecule count, and the number of binding sites present for it on the enhancer. I would recommend replacing it with text in the plot axes and introducing more symbols to disambiguate elsewhere in the text as necessary.
Thank you for this point -we have changed the notation so that the total number of binding sites of the i-th transcription factor is \tau_i. The notation T_i stands for both the i-th TF as well as its molecular count. This is a slight abuse of notation, but one that is common in this literature.
=> Fig. 4B: What are the two datasets being plotted together (both in black)? Figure 4B has changed since we used the new correlation metric for fidelity. The original subfigure had many lines, but they all lied on top of each other and looked like only two graphs. Using correlation rather than mutual information, the fidelity is more spread out although clear monotonicity trends can be seen, as described in the paper. => Fig. 4: It's good to provide the values of d_1 and d_2, but they are hard to interpret when numerical values have not been provided for any of the other parameters at this point in the text.
Per the reviewer's previous comment, we have referenced Table~1 at an earlier location, so that readers know the values of all other parameters.

=> line 521 repeats part of line 520
This text was eliminated from the manuscript per a previous reviewer comment Reviewer #2: The paper by Fletcher et al provides a study of how shadow enhancers may mediate trade-offs between noise and fidelity by means of stochastic simulations and approximate results using moment-closure. This is a study with well defined questions; it is very clearly written and the results are interesting to the transcriptional noise community.

We thank the reviewer for their interest in this work
My main reservations about the paper are the accuracy of the moment-closure approximations used, the lack of investigation of the Fano factor of the system which is a common measure of transcriptional burstiness, some strong assumptions on how TF production is modelled, the lack of relevant literature cited, and the rather lengthy (though nicely written) Methods which can be much condensed by reference to review articles and books. My more detailed comments are as follows: (1) The results of the paper (for the most part) rest upon the use of the zero cumulant momentclosure approximation which involves setting equal to 0 all the cumulants with an order greater a certain value. In the literature this has also been called "Normal moment-closure" or "cumulant neglect moment-closure". While this has been used quite a bit, various studies have also shown its limitations and those of other moment-closure approximations. See in particular Grima, R. "A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. It is important that these and similar studies are discussed since presently by reading the paper one gets the idea that moment-closure does very well but this is generally not the case and once has to be careful with the results obtained. As well, what worries me is that the authors do not show a study of the accuracy of the used moment-closure method over parameter space for this model and hence its difficult to assess its accuracy (as far as I can tell they only refer to Fig 1c to assess accuracy but this is far from an exhaustive study across param space). I suggest a more comprehensive study of the method's accuracy is shown.
Per this comment by the reviewer, we have carried out additional simulations in order to compare the second order zero cumulant method with other moment closure methods, see Figure S5a). In particular we compare this method with a higher order zero cumulant method, as well as with other methods using the CERENA package. This comparison shows some relatively small differences between methods, and in our opinion they together validate the closure method used in the paper. We have also included the references suggested by the reviewer.
(2) The authors use a moment-closure method because the system has second-order reactions and hence moments cannot be computed exactly. I note that there is an alternative method of proceeding which is based on mean-field approximation and which even gives the distributions of mRNA, not simply the moments. I am referring to the Linear Mapping Approximation (LMA) which is specifically designed for reaction systems where the 2nd order reactions involve a protein binding a gene. See the paper Cao et al. "Linear mapping approximation of gene regulatory networks with stochastic dynamics." Nature communications 9.1 (2018): 1-15. Applying this to the present system would be advantageous in that more info can be obtained without using the SSA, the amount of effort is similar to that of moment-closure but its typically more accurate since there are no implicit distribution approximations. In any case, even if they decide not to use this approach, it would be useful to discuss alternative approaches such as the LMA which can give information about the transcript number distribution which is routinely measured.
We have included a short paragraph in the Discussion about LMA and the reference above.. Implementing it in this manuscript was found to be out of the scope of our work.
(3) The mutual information is calculated between R and T1. Why not between R and T2? The CV and the mutual information are studied. However the Fano factor (FF = variance / mean) was not. This is an important measure of transcriptional bursting since FF = 1 implies constitutive expression and deviations from 1 are a common measure of the extent of bursting; see for e.g. Sanchez et al. "Genetic determinants and cellular constraints in noisy gene expression." Science 342.6163 (2013): 1188-1193. Since the FF gives different information than the CV and since it s easy to compute from moment-closure, I strongly suggest the authors to study it. See Fig S2 for the correlation between R and T2. Regarding the Fano factor, we have created a new subfigure including this information, see Figure S5b). The subfigure recreates noise simulations for the additive, subadditive and superadditive cases, using the Fano factor instead of the CV.
(4) I found the Methods far too long and written for a non-expert audience. This is appropriate for a PhD thesis but given the vast amount of information present in books and published papers, I think it would be better to condense it. Reference can be made to recent review articles.
We have eliminated about half the material in the methods section, including introductions to chemical reaction networks, Gillespie simulation, and stochastic models.
(5) The TF's are produced in bursts of fixed size n1 and n2. While I understand they may have appeared in previous publications, it is generally not the accepted way to proceed. TFs being proteins suffer from translational bursting which leads to burst size distributions that are geometric, i.e. n1 and n2 should be random numbers sampled from a geometric distribution with a certain mean burst size. There is extensive literature on this; one of the founding papers on the topic is the following Shahrezaei et al. "Analytical distributions for stochastic gene expression." Proceedings of the National Academy of Sciences 105.45 (2008): 17256-17261. A rigorous derivation is also shown here Jia. "Simplification of Markov chains with infinite state space and the mathematical theory of random gene expression bursts." Physical Review E 96.3 (2017): 032402. Implementing random bursts is also not difficult to do using Cerena. For e.g. instead of 0 -> n1 T1, we would now have a set of reactions 0 -> T1 with rate k_1, 0 -> 2 T1 with rate k_2, ..., 0 -> N T1 with rate k_N where N is some suitably large number (chosen much larger than the mean burst size) and k_i is equal to beta_1 multiplied by P(i) = probability that the burst is of size i using the geometric distribution. I anticipate that the results for the noise and fidelity will change as a result of using a geometric burst size distribution.
We thank the reviewer for this comment, and we appreciate that the suggested change would make the system more realistic. As a guiding principle throughout the modeling of this paper, we have consistently worked to keep the model as simple as possible. Adding N reactions, where N is much larger than the mean burst size, would significantly increase the size of the model, and therefore we have decided that this change is beyond the scope of this work.
(6) I finally note that since TFs are proteins they are also typically stable and their degradation is not principally via an active mechanism but rather due to dilution when cell division occurs -while I am not suggesting that they implement this for their study (since its computationally challenging) it would be useful if they can add a brief discussion of how the explicit modelling of cell division leads to larger noise than using a model where there is no explicit cell division and nffmodel). For reference see for e.g. Beentjes, et al. "Exact solution of stochastic gene expression models with bursting, cell cycle and replication dynamics." Physical Review E 101.3 (2020): 032403.
This particular system of Drosophila embryo development does not particularly feature growth of biomass through cell division. Rather, nuclei are contained in a continuum of cytoplasm inside the egg. TF proteins likely need to be in the vicinity of the promoter to activate transcription, and so in this case the spatial localization, rather than degradation over time, appear to be more relevant in this system. This effect is also found for the mRNA which quickly decreases in concentration due to diffusing away from the site at which it is observed.