Competing constraints shape the nonequilibrium limits of cellular decision-making

Significance It is well established that the eukaryotic transcriptional cycle contains molecular reactions that consume biochemical energy in the form of ATP, yet the impact of these energy-consuming processes on how gene loci sense and respond to cognate transcription factor concentrations remains poorly understood. In this paper, we derive simple models of gene regulation to investigate how energy consumption impacts the rate at which genes can transmit information and drive cellular decision-making. We find that energy can accelerate information transmission by orders of magnitude and predict that gene loci will harness energy in different ways depending on the level of interference from the binding of noncognate transcription factors at the gene locus.


Supplementary Figures
For parameter sweep results in A and B, transition rate and interaction term magnitudes, k and η, were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.)

Fig. S2. Tradeoffs between sharpness and precision persist for more complex gene regulatory architectures. (A)
Non-equilibrium gains in sharpness and precision for gene circuits with different numbers of activator binding sites (NB) and one activation step. Shaded regions indicate achievable regimes for each system, as determined by no fewer than 10,000 unique simulated gene circuits. Circles indicate location of information-maximizing gene circuits for each mode (top 99.9th percentile). (B) Non-equilibrium gains in sharpness and precision for gene circuits with different numbers of activation steps (NA) and one activator binding site. Dashed lines indicate Hopfield barriers that dictate the limits of equilibrium performance. These limits are the same, irrespective of NA. (C) Scatter plot indicating sharpness levels for equilibrium gene circuits as a function of the number of binding sites. Bounding line is for a function of the form S = NB. (For parameter sweep results in A-C, transition rate and interaction term magnitudes, k and η, were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.)

Fig. S3. Supplemental analyses for the dependence of IR with non-cognate transcription factor interference. (A)
Extrapolation of minimum decision times for equilibrium gene circuits as a function of number of activator binding sites based on numerical results for circuits with 1-5 binding sites. Analysis indicates that at least 17 sites would be required to achieve plausible decision ties in the context of the mouse system. (B) Parameter sweep results showing the range of achievable information rates as a function of w/c for equilibrium gene circuits with 1-5 activator binding sites and one molecular activation step. (C) Sweep results for non-equilibrium gene circuits with 1-4 activation steps and a single activator binding site. (For parameter sweep results in B and C, transition rate and interaction term magnitudes, k and η, were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) (Transition rate and interaction term magnitudes, k and η, were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) Predicted induction curves for 50 near-optimal non-equilibrium gene circuits when w = 10 3 c * (red lines), as well as the actual induction curves for the sharpest achievable equilibrium curve (solid blue line) and the (incorrect) limit when that would be predicted if w was not accounted for (dashed line). Note that the sharpness of the red curves falls above the true equilibrium limit but below the naive limit. (B) Predicted shift in the production rate resulting from a binding site perturbation that doubles the unbinding rate (k mut u /ku = 2)-equivalent to a energetic difference of 0.7 kBT-for equilibrium gene circuits (blue squares) and IR-maximizing non-equilibrium circuits (red circles). Note that non-equilibrium circuits are far more sensitive than equilibrium circuits when w/c < 10 4 . (C) Predicted sharpness shift upon perturbing the activator binding site. The non-equilibrium shift becomes markedly larger than equilibrium limit when w/c > 10 3 . (For parameter sweep results in B and C, transition rate and interaction term magnitudes, k and η, were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) rates. For instance, kmn-the element in the mth row and nth column of K-gives the transition rate going from state n to 23 state m. The diagonal elements of K are negative, and are scaled such that each column of K sums to 0. The activity vector 24 a is a binary vector of length N that contains a "1" for each state that is transcriptionally active, and a "0" for inactive states. 25 We assume that both K and a are fixed in time.

26
A.2. A generic expression for the rate of energy dissipation. Equation 1 gives an expression for the rate of energy dissipation (also 27 termed entropy production), Φ, in the context of the four-state model shown in Figure 1C. This is a special case of a more 28 general formula for Φ that applies to arbitrary molecular architectures. From (1, 2), we have [S1] 30 We use Equation S1 to calculate all energy dissipation rates given throughout the main text. In the case of the simple four-state 31 system shown in Figure 1B, we have from (1) that Equation S1 simplifies to 32 Φ = J ln k b η ab kaηuakuki k b kaη ba kuη ib ki , [S2] 33 which further simplifies to 34 Φ = J ln η ab ηua η ba η ib .
[S15] 83 We can now combine Equations S13 and S15 to solve for the first passage time from state i to state j: [S16]  Figure 1D). This is trivial in the case of a simple two state system with a single OFF and ON state and rates kon and k off 88 ( Figure S17). In this case, the burst cycle time is simply [S17] 90 The calculation becomes less trivial for systems with larger numbers of states, however. Fortunately, the concepts outlined 91 above provide us with the tools necessary to derive a generic expression for τ b that applies to systems of arbitrary complexity.

92
The essence of the procedure lies in calculating effective off and on rates (k * off and k * on ) from K using first passage times. We the expected first passage time from OFF state i back into any of the ON states. Specifically, we have that each element, i, is 109 given by πm+1 .
[S20] for the amount of time required for the system to turn back ON following a transition into an OFF state? It's tempting here to 114 use the stead-state probabilities of each OFF state given by π, but this is actually not correct. 115 Instead, the key is to recognize that each OFF state should be weighted by the rate at which ON states switch into it. In 116 other words, we weight OFF states by the probability that they are the initial state the system reaches upon switching out of 117 the ON conformation; the gateway into the OFF states. Mathematically, we encode these weights using the flux vector f OFF , 118 which has M elements, each defined as where aj is the jth element of the activity vector a (1 for ON states and 0 otherwise), kij is the transition rate from state j to 121 state i, and πj is the steady-state probability of state j.

122
Finally, we combine this expression with Equation S20 to obtain an expression for the average reactivation time as a 123 flux-weighted average of the first passage times out of each OFF state: [S22]

125
As noted above, the calculations for k * off follow precisely the same logic, with the roles of the OFF and ON states switched.
simulations. We used these simulations to track the distribution of the apparent average transcription rate for each model 154 realization as function of accumulation time. Figure S6A shows the apparent mean rate across 100 simulations for a single 155 illustrative gene circuit realization. Inset histograms indicate distribution of apparent transcription rates at different time 156 points. As expected, we see that the apparent rates are initially highly dispersed; however, even after 25 burst cycles, we see 157 that p(r) has become a much narrower, roughly symmetrical distribution that appears approximately Gaussian.

158
To systematically assess the rate of convergence to normality, we utilized the simple One-sample Kolmogrov-Smirnov test 159 ("kstest", (8)), which tests the null hypothesis that a vector of transcription outputs from realization i at time t, ri(t), is drawn 160 from a normal distribution. The test returns a p value corresponding to the probability of observing ri(t) if the transcriptional 161 output were truly Gaussian. In standard implementations p ≲ 0.05 is taken to constitute strong evidence that the output is 162 not Gaussian. Thus, to assess convergence to normality, we tracked this p value over time for each of the 500 gene circuit 163 realizations.
164 Figure S6B shows the average kstest p-values across 10 different sets of gene circuits, grouped by their average rate of 165 transcription. In all cases, we see that noise profiles rapidly converge towards normality, such that all systems cross the 166 (relatively conservative) threshold of p = 0.1 within 5 burst cycles (dashed line in Figure S6B). Gene circuits near the tail 167 ends of the induction curve (r ≤ 0.1 and r ≥ 0.9) take the longest to converge, which is likely because it takes longer for 168 distributions near the boundaries to become symmetric about their mean; yet even these converge rapidly.

169
The fastest decisions discussed in the main text ( Figure 4C and D), and most decision times considered are significantly longer 170 than the time for Gaussian convergence revealed by Figure S6B). Thus, we conclude that the Gaussian noise approximation 171 invoked throughout this work is justified. Kolmograv-Smirnov test. Different colors indicate average trends for systems with different average transcription rates. We see that systems near the low and high ends of the induction curve converge to Guassian form most slowly, but even these cross the p = 0.1 line within a handful of burst cycles. Error bars indicate bootstrap estimates of standard error calculated for each group. (For stochastic simulations shown in A and B, transition rate and interaction term magnitudes, k and η, were constrained such that 10 −2 ≤ kτ b ≤ 10 2 and 10 −2 ≤ η ≤ 10 2 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) transmission as the time derivative of the expected Kullback-Leibler (KL) divergence between the two hypotheses (C = c0 and where P(c0|m) and P(c1|m) indicate (respectively) the conditional likelihood that the true value of C is c0 and c1 given the 177 observed output m, and where the angled brackets indicate that we are dealing with the expected value of D KL across many 178 replicates. We refer readers to information theory reference materials for a formal definition of D KL (see, e.g., (10)); however, 179 at an intuitive level it can be regarded as measuring how different two probability distributions are from one another. Thus, 180 with Equation S25, we define the rate of information production as the rate at which the two possibilities (c1 or c0?) become 181 distinguishable from one another given the observed "evidence" (m). 182 We can write out the expected KL divergence from Equation S25 more explicitly as the weighted sum of log probability 183 ratios: where ⟨...⟩i indicates the expectation taken assuming the true value of C to be ci and where p0 and p1 indicate the priors on 186 the true value of C, taken to be equal moving forward (p1 = p0 = 1/2). This formulation provides intuition for the sense in 187 which IR is the information rate: as the conditional probabilities of the observed output given the true (numerators) and false 188 (denominators) hypotheses about C diverge in favor of the true hypothesis, the log ratio terms will become large and positive.

189
Thus a positive derivative corresponds to positive information production.

190
However, here we must recall that our focus here is to understand how the molecular architecture of gene loci impacts the 191 transcriptional response and, ultimately, IR. Thus we wish to work in terms of p(m|c)-the conditional distribution of observed 192 mRNA outputs given some input-rather than p(c|m). To do this, we make use of Bayes' Theorem. We have: This expression becomes an equality if we assumed equal prior probabilities for our two hypotheses (p(c0) = p(c1)): Appendix B), it becomes a straightforward exercise to solve for the expected log ratios in Equation S29. We will solve for the 201 case when C = c1 in full. The c0 case proceeds in precisely the same fashion. To start, we have Recall that m = rt is Gaussian with probability density function: . [S31]

205
Plugging Equation S31 in for ln p(m|c1) yields Where we've recognized that the first integral will simply yield the standard expression for the entropy of a Gaussian random 208 variable. Pulling constant factors out of the second integral leads to Simplifying and recognizing that ⟨m 2 ⟩1 = m 2 (c1) + σ 2 m (c1) leads to: Finally, we recall that m = rt and σ 2 m = σ 2 t, obtaining Performing the same procedure for the case where c = c0 yields: Next, if we assume that the difference between c0 and c1 is small (as stipulated in the main text), then σ(c0) ≈ σ(c1) ≈ σ 2 (c * ) 219 and r(c1) − r(c0) ≈ δcdr/dc, leading to 220 IR = 1 2 δc dr dc Finally, we invoke the definitions of sharpness and precision given in Figure 1B, which leads to Equation 2 from the main text: which relates the normalized precision, P, to the bursting noise, σ, and the fraction of time a gene circuit spends in 228 transcriptionally active states, πa. From Figure 3A, we see that P ≤ 1 for the four-state gene circuit shown in Figure 1C when 229 the system is out of equilibrium, which, from Equation S40, implies that for the 4 state system.

232
Thus, Equation S42 gives a lower bound for the intrinsic variance in gene expression that arises due to transcriptional burst 233 fluctuations at the gene locus. To see how to relate this to noise from mRNA synthesis, we need to take two more steps. First, 234 we must recall that we are working in units of the burst cycle time, τ b . Second, we must further recall that we set the actual 235 rate of mRNA synthesis, r0, equal to 1 throughout the main text. We must do away with these simplifications in order to 236 relate σ 2 to synthesis noise. Accounting for these simplifications, the full expression for the noise floor, in "real" time units and 237 accounting for the true rate of mRNA synthesis is Now, if we assume mRNA synthesis to be a Poisson process (following, e.g., (11)), we have that this component of the 240 variance is simply equal to The key thing to notice about Equation S43 is that mRNA synthesis noise is independent of the bursting timescale τ b . Thus, as 243 τ b increases, σ 2 burst will increase in magnitude relative to σ 2 Poisson . Figure S7A and B illustrate this fact, showing predicted 244 bursting and mRNA synthesis variance components, respectively, as a function of the bursting time scale τ b and the activity 245 level (πa). All calculations assume an mRNA synthesis rate of 20 mRNA per minute, a rate based off of estimates from the 246 fruit fly (12) and that is consistent with measurements from other systems (13). From Figure S7A, we see that σ 2 burst peaks at 247 πa = 0.5 and increases dramatically as we move rightward along the x-axis and the burst cycle time increases. We emphasize 248 that this represents a lower bound for maximally precise non-equilibrium gene circuits; most systems (including IR-optimized 249 systems) will lie above this bound. In contrast Figure S7B shows that noise from mRNA synthesis scales linearly with πa, and 250 is constant in τ b .
[S44] 253 We can use this expression to calculate a lower bound on the relative contribution of mRNA synthesis noise to the overall 254 intrinsic variance in gene expression. Figure S7C shows the results of this calculation. We see that, with the exception of 255 rapidly bursting systems near the saturation (πa ≈ 1), the contribution from Poisson noise due to mRNA synthesis is negligible.

256
Thus, we conclude that noise from transcriptional bursting constitutes the dominant source of gene expression noise for the 257 vast majority of the parameter regimes relevant for the investigations in this paper, and that our decision to neglect Poisson 258 noise from mRNA synthesis is reasonable. data is accruing (14). Shortly thereafter, it was established that SPRT represents the optimal approach to sequential decision 262 problems involving binary decisions (15), meaning that it requires the fewest observations to achieve a desired level of accuracy.

263
In this framework, a downstream receiver (in our case, downstream genes or other cellular processes) tracks the accrual of some 264 signal (mRNA, and eventually protein) over time and compares how likely this accrued signal is under the two hypotheses to 265 be distinguished (e.g., high or low activator concentration). In this work, we use the optimal nature of SPRT to set lower 266 bounds on decision times that could be achieved given the transcriptional output of model gene loci. The essence of the test 267 lies in tracking the relative likelihoods of our two hypotheses (C = c1 and C = c0) over time as more and more transcriptional 268 output, m, accrues: 270 Figure S8A shows a stochastic simulation of how this ratio evolves over time for the output of a single model gene circuit.

271
Although the true concentration in this case is c0, we see that the two hypotheses are essentially indistinguishable early on.

272
This is because the range of possible outputs given high and low activator concentrations overlap significantly early on (leftmost 273 panel of Figure S8B). However, as more and more time passes, the expected outputs (m) given the two possible inputs (c1 274 and c0) start to separate. We see that the ratio in their likelihoods diverges more and more in favor of c0 (P0/P1 >> 1), 275 corresponding to a higher and higher degree of certainty that c0 is the correct choice.

276
This divergence, however, is non-monotonic and noisy, which reflects the stochastic nature of protein production at a single 277 gene locus. It has been shown that the noisy divergence of the log of the probability ratio (which we will call L ) can modeled 278 as a 1-D diffusive process with average drift IR (9) given by In this framework, a "decision" is made when L crosses a so-called "decision boundary" (horizontal dashed lines in Figure S8A).

281
Siggia et al showed that the Gaussian diffusion approximation could be used to obtain an analytic expression for the expected 282 time needed to make a decision. From Equation 15 in the supplement of (9), we have that: where V is the same as IR from above (and in the main text), D encodes the diffusivity of decision process (essentially, how 285 large the fluctuations are about its mean drift trajectory), and K is related to the log of the error tolerance parameter ε, such [S48] 288 We note that Equation S47 assumes equal priors regarding the likelihood of c1 and c0, and also assumes equal error tolerances 289 for choosing incorrectly in either case (16).

290
If we take the accumulated transcriptional output of our gene circuit, m = rt, to be approximately Gaussian (see Appendix 291 B), then it can be shown that D has the form: where mi and σi give the mean and variance in the accumulated transcriptional output, given that C = ci. From Equation S37 294 in Appendix C, we also have that In a previous work, Desponds and colleagues (16) demonstrated that D ≈ V when the difference between hypothesis-δc/c * in 297 our case-is small. Indeed, we see from Equations S49 and S50, that when c1 and c0 are sufficiently close, σ1 and σ0 will be 298 approximately equal, such that: As demonstrated by (16), when D ≈ V , Equation S47 simplifies dramatically, yielding  F. Steady state approximation. Throughout this work, we assume that microscopic reactions within our model gene circuits 306 have reached steady state, such that we can employ the closed form solutions for steady-state system behaviors detailed 307 in Appendix A. In this Appendix, we test this assumption by using stochastic simulations to test whether key gene circuit 308 behaviors converge to their steady state behaviors rapidly relative to the cellular decision timescales considered in this work 309 (e.g., in Figure 4C and D). In particular, we test the convergence of the average production rate (r from Equation S8) and 310 the variance (σ 2 from Equation S10), since these two quantities form the basis for the information rate and decision time 311 calculations presented throughout this work.

312
As in Appendix B, we used stochastic simulations to test how rapidly the average production rate and variance of the 313 accumulated transcriptional output approaches our steady-state predictions. We tracked the output of 500 random realizations 314 of the four-state system shown in Figure 1C for 5,000 burst cycles. Each realization had a unique set of transition rates and, 315 correspondingly, a unique average rate of transcription, r, and variance, σ 2 . For each model realization, we ran 100 stochastic 316 simulations. We used these simulations to calculate the average transcription rate and variance for each model realization as 317 function of mRNA accumulation time (in burst cycles).
circuits converge to their predicted steady state values, we calculated the percent error for each of metric as a function of 320 accumulation time. Specifically, for each of our 500 model realizations, we calculated the percent error in the mean at time t as where ri is the predicted steady state mRNA production rate for the ith gene circuit realization, and whereri(t) is the empirical 323 estimate derived from averaging across the 100 simulations of gene circuit i up to time t. In analogous fashion, we calculate the 324 percent error in the variance as As a final step in constructing our error metrics, we must account for errors in our simulation-based estimates that are due 327 to statistical noise, rather than to a lack of steady-state convergence. This statistical noise results from the fact that we use 328 a finite number of simulations (100) to estimateri andσ 2 i . This leads to noise that has nothing to do with the question of 329 steady state convergence. To account for this, we calculate asymptotic values for ϵr and ϵσ (taken as their values after 5000 330 burst cycles) and subtract them from our percent error estimates. The idea behind this is that, because the systems will have 331 definitively reached steady state by 5000 burst cycles, any residual error will reflect statistical uncertainty in the Monte Carlo 332 estimator, which is irrelevant for assessing steady-state convergence. This gives for the mean production rate and for the variance. shown in Figure S9A have δr < 5% by just 10 burst cycles into the mRNA accumulation process.

343
To gain better intuition for how to interpret these error levels, it is informative to directly compare predicted and simulated 344 production rate values. The inset panel of Figure S9A shows a scatter plot of simulated vs. steady state values for all 500 gene 345 circuits after 32.5 (green circles) and 5000 (gray diamonds) burst cycles. These plots indicate excellent agreement in each 346 case, confirming our steady state predictions accurately reflect the mean production rate after 32.5 burst cycles of mRNA 347 accumulation.
348 Figure S9B shows the average adjusted percent error in the variance in mRNA production rate (Equation S56) for the same 349 10 different sets of gene circuits. Here again, we observe a rapid convergence to the predicted steady state values predicted by 350 Equation S10, with δσ < 10% for all 10 groups after just 10 burst cycles, and with an overall adjusted percent error of just 1% 351 after 32.5 burst cycles. The inset panel of Figure S9B shows a scatter plot of simulated vs. steady state variance values for all 352 500 gene circuits after 32.5 (green circles) and 5000 (gray diamonds) burst cycles. As with the production rate, these plots 353 indicate good agreement between steady state predictions and our simulation results.

354
G. Implementation of parameter sweep algorithm. In this section, we describe the parameter sweep algorithm employed 355 throughout this work to enumerate the performance bounds of gene circuit models. We note that this approach is based on an  Figure S10A illustrates the key steps in this numerical procedure. First, an initial set of gene circuit realizations 358 (typically comprised of 1,000 variants) is generated by sampling random values for each transition rate in the system. We then 359 calculate the performance metrics of interest (S and P for the example in Figure S10A) for each gene circuit realization. This 360 defines an initial set of points ( Figure S10A, Panel i) that collectively span some region in 2D parameter space with area a1.

361
Next (Panel ii), we subdivide parameter space into N different bins along the X and Y axes, with N dictated by the total 362 number of points (10 ≤ N ≤ 50). We subsequently calculate the maximum and minimum point in each X and Y slice (Panel 363 iii). Finally, we randomly select candidate gene circuit models from these boundary points and apply small perturbations to 364 each transition rate to generate a new set of random variants (iv). In general, these variants will lie close to the original model 365 in 2D parameter space and, thus, close to the current outer boundary of parameter space. The key to the algorithm's success is 366 that some of these variants will lie beyond the current boundary (blue points in Figure S10A, Panel iv). This has the effect of 367 extending the boundary outward, leading to an increase in the surface area spanned by our sample points (panel iv). As a 368 result, cycling through steps ii-iv amounts to a stochastic edge-finding algorithm that will iteratively expand the boundary 369 spanned by sample points outward in 2D parameter space until some analytic boundary is reached.

370
The panels in Figure S10B show snapshots of the sweep algorithm's progress exploring sharpness vs. precision parameter 371 space for non-equilibrium realizations of the four-state gene circuit ( Figure 1C). Figure S10C shows the total area spanned by 372 the sample points for this run as a function of sweep iteration. By eye it appears that most of salient parameter space has been 373 explored by step 10 of the algorithm, but we are quite strict with our convergence criteria. We will only terminate a sweep at A and B, transition rate and interaction term magnitudes, k and η, were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) In this case, this convergence criterion is met following step 375 25, leading to the final set of sample points shown in Figure S10D. In general, we run all sweeps until the above criterion is met 376 or some pre-specified maximum number of iterations (usually 50) is reached.  Figure S10A is predicated upon the ability to rapidly 378 calculate performance metric quantities (e.g., S and IR) given a set of transition rate magnitudes. Wherever possible, we use 379 symbolic expressions to perform these calculations; however, this is only feasible for the simple four and six state systems 380 depicted in Figure 1C and Figure 4A. For more complex models, it is infeasible to perform the symbolic operations required to 381 obtain closed-form symbolic expressions. As a results, we use numerical calculations to arrive at performance metrics for all 382 higher-order models. 383 G.2. Enforcing equilibrium constraints. In this work, we make frequent use of comparisons between equilibrium and non-equilibrium 384 gene circuits in order to elucidate how energy expenditure alters gene-regulatory performance. A key step in performing 385 parameter sweeps for equilibrium gene circuits is ensuring that transition rates adhere to the constraints imposed by detailed 386 balance. For the simple four state model shown in Figure 1C, this process boils down to ensuring that the product of the four 387 transition rates moving in a clockwise direction about the square is equal to the product of the four counterclockwise rates. As 388 shown in Appendix A.2, this amounts to enforcing the constraint that where the η factors on the top and bottom of the left-hand-side expression correspond to regulatory interaction terms that 391 modify transition rates in the clockwise and counterclockwise directions, respectively, and where λ is the flux factor that 392 captures the relative magnitudes of clockwise and counterclockwise transitions.

393
To enforce this constraint during the course of a parameter sweep, we add a step to the process outlined above. New gene 394 circuit realizations are generated as before, but now, following its generation, we calculate the initial flux factor, λ0 for each 395 new realization using Equation S57. In general this quantity will not equal one for the new realizations (λ * ̸ = 1). To fix this, 396 we then multiply η ba and η ib each by a factor of λ 1 2 , which leads to a modified system that adheres to the constraint laid out in 397 Equation S57. Next, we check the modified terms to ensure that they adhere to magnitude constraints (typically 10 −5 /τ b ≥ ηi 398 10 −5 /τ b ) and pass all qualifying rates along to the next step in the sweep iteration (step ii in Figure S10A). Finally, we note 399 that, although we have focused on the simple four state system, our assumption that all binding and activation reactions are 400 identical ensures that the exact same approach holds for all higher-order models (N A > 1 or N B > 1) considered in this work. H. Testing the convergence characteristics of the parameter sweep algorithm. Here, we discuss results from a series of tests 403 designed to assess the convergence of our sweep algorithm for key scenarios examined in the main text. This task is the 404 most straight-forward when the algorithm is employed for "two-boundary" sweeps, such as S vs. P ( Figure 3A) and S0 vs f 405 ( Figure 5B), where both parameters examined adhere to finite performance bounds and, thus, where the 2D region of accessible 406 parameter space has a finite area. In this case, our general approach will be to assess whether independent runs of the algorithm 407 (i) converge prior to the 50 run limit and (ii) reach a consistent final estimate for the area of 2D space that is attainable for 408 different model architectures. The task becomes more complicated for "one-boundary" sweeps, such as IR vs. Φ ( limited only by bounds imposed externally as a part of sweep specification. We will begin by assessing convergence for the 411 simpler two-sided case, and will turn thereafter to examining one-sided cases.  Figure 3A and Figure S2A and B show results for parameter sweeps examining tradeoffs 413 between normalized sharpness (S) and normalized precision (P) for systems with 1-5 binding sites and 1-4 activation steps. We 414 note that Figure 3B and Figure S2C also derive from these parameter sweep results. Across the board, we find that nearly all 415 independent runs of the sweep algorithm converge according to the definition laid out above ( Figure S11A and B). Moreover, 416 for simpler architectures, we find that all independent sweep runs converge to essentially the same total area. For instance, 417 Figure S11B shows normalized area as a function of sweep step for 500 non-equilibrium realizations of the baseline four-state 418 model, indicating that all runs terminate near the global maximum found across all runs (dashed line). We take this as strong 419 evidence that the algorithm is consistently exploring the full extend of 2D parameter space.

420
As might be expected, the task of exhaustively exploring parameter space becomes more difficult as models become more 421 complex. Note the larger spread in outcomes for the non-equilibrium five binding site (N B = 5) and 4 activation steps (N A = 4) 422 models in Figure S11C and D, respectively. Nonetheless, we find that a significant number of sweeps converge to a consistent 423 maximum area, even for the most complex models considered. Figure S11E and F give the total fraction of sweeps having a 424 final area within 95% of the global maximum as a function of binding site number and activation step number, respectively.

425
First, we see that 100% of sweeps for equilibrium models uniformly meet this standard for all model architectures considered 426 (squares in Figure S11E and F). Second, our analysis indicates that, even for the extrema (N B = 5 and N A = 5), 13% in Figure 2A, C, and D. Because Φ has no natural barrier in parameter space, the convergence metrics considered above do not 431 provide reliable indicators of model convergence. Instead, we make use of the fact that the IR vs. Φ and the S vs. P parameter 432 sweeps should function (either directly or indirectly) to uncover the maximum achievable non-equilibrium information rate for 433 each model architecture. Thus, as a basic test of sweep performance, we checked for the consistency between IR estimates 434 derived from these different sweep modalities. As illustrated in Figure S12A, we find excellent agreement between the maximum 435 IR values derived from the S vs. P (x-axis) and IR vs. Φ (y-axis) parameter sweeps for all model architectures considered.

436
This provides one indication the IR vs. Φ sweeps are fully exploring the relevant parameter space.

437
As a second check, we compared the IR vs. Φ bounds derived for two separate rounds of parameter sweeps ("round a"  Figure 2D). Figure S12B and C show the  Figure S13B). In this case, however, even the IR vs. r parameter sweeps do not converge reliably, with only 3-4% of sweeps 470 reaching 95% of the global maximum. Thus, we are unable to assess the full extent to which the IR vs w/c is sub-optimal in 471 this case. Once again, though, this claims in the main text rely only on the IR bound when w/c is large (w/c ≳ 10 3 ); a regime 472 in which we find that the sweeps perform reliably (hollow gray square in Figure S13B). Thus, we conclude that the IR vs. w/c 473 sweeps provide a viable basis for the investigations undertaken in this study.

474
H.4. Specificity vs. N A results. We claim in the main text that, out of equilibrium, the specificity is bounded by the number 475 of activation steps, such that f ≤ α N A +1 . Here α is the affinity factor (set to 100) that reflects intrinsic differences in the 476 binding kinetics between cognate and non-cognate factors (α = k w u /ku). Figure S4B shows parameter sweep results in support 477 of this claim. These results are derived from 2D f vs. r sweeps. Figure S14 shows convergence statistics for these runs for 478 non-equilibrium systems with 1 to 4 activation steps and 1 binding site. We find that all 50 sweep runs met their convergence 479 criteria for each run ( Figure S14A) and, further, that no fewer than 76% of runs converged to a 2D area that was within 95% magnitudes (k and η) were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.)  Figure 2C and D from two independent rounds of parameter sweeps comprising 200 and 500 separate runs, respectively. Points reflect IR maxima for Φ values ranging from 0.1kBT to 5000kBT. (Transition rate and interaction term magnitudes (k and η) were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.)  Figure 4C). All results in (C) correspond to non-equilibrium gene circuits (in keeping with Figure 4D). (Transition rate and interaction term magnitudes (k and η) were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) where t indicates temporal distance from the present and τ mRNA is the exponential time constant, given by τ mRNA = t 1 2 / ln(2).

498
Integrating Equation S58, we find that τ mRNA time steps are effectively present in steady-state mRNA levels. Taking 30 499 minutes as the lower bound for bursting timescales, this yields an upper bound of (1800/ log 2)/30 = 86.6 burst cycles. We note 500 that this estimate is not materially different from the 60 cycle estimate that would be obtained by simply dividing 1,800 by 30. and η ib to impact locus activation dynamics. In all cases, we assume a single molecular activation step for multi-binding site 515 models (N A = 1). Finally, we also allow for cooperative interactions between bound activator molecules, which are captured by 516 the interaction term η ub . 517 Figure S15A illustrates what this looks like for a model gene circuit with two activator binding sites. The model has eight 518 total states, with four inactive states (top) and four active states (bottom). There are several features to note. First, the 519 transitions between states 2 and 6, which feature two bound activator molecules, are weighted by squared interaction terms, 520 η 2 ab and η 2 ib , to reflect the regulatory influence of two activators on locus activation dynamics. More generally, if n activators 521 are bound, these weights are raised to the nth power (i.e. η n ab and η n ib ). Second, note that unbinding reactions leading out 522 of states 2 and 6 are multiplied by the additional η ub mentioned above. This reflects interactions between bound molecules.

523
For simplicity, we assume that η ub is the same for both cognate and non-cognate activator species, as well as for interactions 524 between cognate and non-cognate activators. In general, unbinding reactions out of states with n activators bound will be 525 weighted by η n−1 ub to reflect pairwise interactions from the remaining bound factors.

526
Lastly, a key simplifying assumption that we make in this work is that each activator binding site is identical with respect 527 to its regulatory influence on the gene locus. As a result, it does not matter which binding sites are bound, only how many are 528 bound. In the context of Figure S15A, this means that states 1 and 5 are functionally identical to states 3 and 7, respectively.

529
Thus, these states can be combined into single coarse-grained states, which leads to an effective model with 6 states, rather 530 than 8. This ability to coarse grain is invaluable for more complex architectures, since it means that the total number of unique 531 molecular states scales as N A N B , rather than N N B A . to the absence of the nucleosome (26). The terms could also capture conformational shifts in key macromolecules such as mediator (27), or in the topology of the gene locus itself. 542 We assume that each component is required for transcription, such that, in a model with n molecular components only 543 molecular states with all n components engaged are transcriptionally active, and N A = n activation steps are required to achieve 544 locus activation. Furthermore, while in reality each molecular component is likely characterized by heterogeneous dynamics (see, 545 e.g., (5)) we again make the simplifying assumption that each molecular step is identical. As a result, it does not matter which 546 molecular components are engaged, only how many. Figure S15B shows how this logic plays out for the case where N A = 2. As 547 with the N B = 2 case, the model gene circuit has 8 states; however, in this case, only two states (5 and 6)-the ones in which 548 both components are engaged-are transcriptionally active. Note that the binding and unbinding reactions connecting these 549 states are weighted by factors of η 2 ba and η 2 ua , respectively, to reflect the influence of each molecular factor. In general, if n 550 components are engaged, these factors are raised to the nth power. In addition, we allow for cooperative interactions between 551 molecular components (curved arrow in states 1,2 and 4-7), captured by the ηaa and ηia terms in Figure S15B. In general these 552 terms are raised to the power of n − 1, where n is the number of engaged components at the initial molecular states. guarantees that ϕ = 0 (see Equation 1). In general, enforcing equilibrium constraints on higher-order systems that contain 557 multiple loops entails a significantly more complicated set of constraints. However, our assumptions (discussed in detail above) 558 that (i) all binding reactions are identical and (ii) that all activation reactions are identical ensure that, in our case, the exact 559 same approach holds for all higher-order models (N A > 1 or N B > 1) considered in this work.

560
For instance, consider the cycle containing states 3,2,6, and 7 in Figure S15A, which depicts a gene circuit with N B = 2 and 561 N A = 1. From the rates shown in the figure, we see that the ratio of forward to backward rates in this cycle is given by Upon inspection, it becomes clear that this expression can be simplified dramatically by cancelling like terms in the numerator 564 and denominator. This procedure leads to which is exactly equal to the expression for the four state system (Equation S57). Similar simplifications occur for all 567 "multimodal" cycles within higher-order gene circuits, where by multimodal we mean cycles that encompass two distinct classes 568 of molecular reactions. For instance, the cycle we considered above contains both a binding reaction and an activation step.

569
In addition to these multimodal cycles, higher order gene circuits like those shown in Figure S15A and B contain unimodal backward rate products is given by [S61] 576 We find that all terms on the top and bottom cancel, such that 577 λ2 = 1. [S62]

578
The same reasoning applies to cycles that contain multiple activation steps, rather than multiple binding events. Thus, our 579 modeling assumptions ensure that all unimodal cycles are constrained to operate at equilibrium, irrespective of the values 580 chosen for transition rates within the gene circuit. As a result, these cycles cannot contribute to the overall energy dissipation 581 of the gene circuit, and do not need to be considered when applying equilibrium constraints. Thus, despite their increased 582 molecular complexity, we find that the equilibrium constraint originally derived in the context of the simple four state case 583 applies equally well to higher-order models. represents an interesting future direction. We note also that such models should be tractable without need for additional 591 development if limited to operate at equilibrium.

592
In addition, we wish to emphasize the potential importance of allowing for heterogeneity, both in the properties of different 593 binding sites along the enhancer and between different molecular components within the activation pathway. This question seems especially interesting in the context of the molecular activation steps. Our simple model with identical steps likely 595 represents the floor of system performance. How much is to be gained when each reaction can adhere to its own kinetics, and 596 exert a distinct kind of regulatory influence over the gene locus? K. Deriving normalized sharpness and precision metrics. In Figure 1B, the transcriptional sharpness, s, is defined as the first 598 derivative of the transcriptional input-output function multiplied by the activator concentration, c * , such that The transcriptional precision, p, is defined as the inverse of the intrinsic noise in the transcriptional input-output function: where σ is as defined in Equation S10 in the main text. Under these definitions, a key challenge in comparing sharpness and 603 precision levels across different gene circuits is that the upper bounds on both s and p depend on the fraction of time the system 604 spends in the transcriptionally active state, πa (defined in Equation S8). Figure S16A and B illustrate this πa-dependence for 605 equilibrium and non-equilibrium realizations of the four-state system defined in Figure 1C. As an example: the equilibrium 606 bound on s is 0.25 when πa = 0.5, but only 0.09 when πa = 0.1 ( Figure S16A). Since we allow gene circuits to take on different 607 transcription rates (r = πa r0 ) at c = c * , this πa-dependence thus confounds our efforts to understand how the molecular 608 architecture of gene circuits-the number of binding sites, number of molecular steps, and presence or absence of energy 609 dissipation-dictates transcriptional performance.

610
To overcome this issue, we need to normalize s and p such that they are independent of πa. Focusing first on sharpness, we 611 were inspired by previous works (18, 28) to leverage Hill Function as a flexible conceptual tool for extracting generic sharpness 612 measures. The Hill function is defined as: where c is the activator concentration, S is the Hill coefficient, and K d is a constant that dictates the location of the function's 615 half-max point. In general, the input-output functions generated by our model gene circuits will have more complex functional 616 forms, but nonetheless, Equation S65 indicates that we can relate these more complex functions to the Hill function via the 617 shared parameters πa and c.

618
The sharpness of the Hill function has the form: [S66]

620
To better relate this to our input-output function, we need to re-express K d in terms of c and πa. Solving Equation S65 for K d Plugging this in to Equation S66 we obtain, after simplification: This expression tells us that the sharpness (s) of a Hill function with activity level πa at c = c * is equal to the Hill coefficient, 626 S, multiplied by the term πa(1 − πa). By rearranging, we can obtain the Hill coefficient as a function of s and πa 627 S = s πa (1 − πa) .
Thus, for a generic gene circuit input-output function with sharpness s and expression level πa at c = c * we can invoke 629 Equation S69 to calculate the Hill coefficient for the equivalently sharp Hill function ( Figure S16C). This provides us with a 630 generic measure of transcriptional sharpness that is independent of πa and thus can facilitate comparisons across gene circuits 631 that drive differing activity levels at c = c * ( Figure S16D). We refer to this independent sharpness metric as the "normalized Happily, Equation S71 exhibits consistent upper bounds for all πa values, and thus satisfies our second constraint ( Figure S16E). normalized sharpness (S) and precision (P) for a simple 2 state gene circuit ( Figure S17A) with one ON state and one OFF 643 state and two transition rates, k off and kon. We assume that activator binding dictates fluctuations into and out of the ON 644 state, such that kon is proportional to c (kon = ck 0 on ). For this simple system, the rate of transcription is given by Differentiating this expression with respect to c and setting r0 = 1 (as in main text), we find that [S73] 648 Finally, dividing through by b = πa(1 − πa) yields the normalized sharpness, which is simply given by Thus, we see that the two state model is constrained to a normalized sharpness level that represents the upper performance 651 limit for the four-state model operating at equilibrium (blue circles in Figure 3A).

652
Next, we turn to precision. From Equation S10, we find that [S75]

654
Inverting and multiplying by b 2 gives 655 P 2 = ck 0 on + k off 2(ck 0 on + k off ) . [S76] 656 Finally, multiplying through by τ b (Equation S17) and taking the square root gives which, again, is equivalent to the upper limit of the four-state gene circuit at equilibrium ( Figure 3A).

M. Sharp and precise non-equilibrium networks exhibit distinct and incompatible microscopic topologies.
In this appendix 660 section, we expand upon the claim, made in Main Text Section D, that the tradeoff between sharpness and precision 661 maximization indicated in Figure 3A reflects the fact that sharp and precise non-equilibrium gene circuits exhibit distinct 662 and incompatible molecular architectures. One simple way to probe these differences in molecular architecture quantitatively 663 is to measure the degree of heterogeneity (or dispersion) in (a) transition rates and (b) state probabilities. We developed 664 entropy-based dispersion metrics ranging from 0 to 1 to quantify how uniform (0) or heterogeneous (1) transition rates and 665 state probabilities were for different realizations of the four-state network shown in Figure 1C. While crude, these measures can 666 provide useful microscopic insights. For instance, in gene circuits with a state probability score of 0 each microscopic state 667 must be equiprobable (π1 = π2 = π3 = π4 = 1/4), while those with a 1 are maximally heterogeneous. In general, maximal 668 heterogeneity corresponds to the case when one and only one state has a nonzero probability; however, since, for simplicity, we 669 have elected here to focus on gene circuits where r = 0.5, the maximum instead corresponds to a case when two molecular 670 states (one OFF and one ON) have probability πi = 0.5. Similar considerations hold for the transition rate axis. We conducted 671 parameter sweeps to explore the space of achievable dispersion values for 10,000 non-equilibrium gene circuits (gray circles in 672 Figure S18A).

673
From Figure S18A, we can see immediately that precise and sharp gene circuits occupy opposite extremes of dispersion 674 space. Specifically, precise systems exhibit highly uniform state probability and transition rate values, while sharp networks 675 are highly heterogeneous, both with respect to the fraction of time spent in each state and the relative magnitudes of their 676 transition rates. These stark differences, as well as the tight clustering of each motif, suggest that sharpness and precision arise 677 from distinct and non-overlapping microscopic architectures.
678 Figure S18B illustrates the sharp and precise gene circuit motifs revealed by our analysis. In maximally precise gene circuits, 679 we find that all states are (nearly) equiprobable, all clockwise rates are nearly identical in magnitude, and all counter-clockwise 680 rates are negligible. This leads to a clock-like system with four equal steps per cycle, a design which renders the microscopic 681 transitions as deterministic as possible and, as a result, minimizes noise from microscopic fluctuations. In contrast, maximally 682 sharp non-equilibrium gene circuits exhibit an effective two state architecture in which non-equilibrium driving permits activator 683 binding to regulate both the locus activation step and the locus inactivation step; thereby doubling transcriptional sharpness 684 (see Appendix N for details). Thus, maximally sharp and maximally precise non-equilibrium gene circuits require distinct 685 molecular architectures that cannot be realized simultaneously. interaction term magnitudes (k and η) were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) To further examine the contrasting architectures of maximally precise and sharp non-equilibrium gene circuits, we used 687 our parameter sweep algorithm to generate over 90,000 realizations of the four state system in Figure 1C, and examined the 688 constraints that governed the steady-state probabilities of each molecular state and the magnitude of each transition rate for 689 maximally sharp and maximally precise non-equilibrium gene circuits. We quantified the degree to which each gene circuit was 690 optimized for precision using the following metric: where Pi is the normalized precision of the ith gene circuit and Pmax is the maximum achievable non-equilibrium precision 693 (equal to 1 for this system). Note that we squared each quantity to reflect the fact that IR depends on the precision 694 squared (Equation 2). To examine how the steady state probabilities (πi) of each of the four molecular states change as gene 695 circuits become increasingly precision-optimized, we arranged gene circuits into groups according to their pscore, ranging from 696 pscore = 0.5 (the equilibrium maximum) to pscore = 1 (the non-equilibrium maximum). For each group, we calculated the 697 distribution of πi values. Figure S19A depicts the results of these calculations. We see that the distribution of πi values is 698 relatively diffuse for sub-optimal gene circuits towards the left-hand side of panels 1-4, which indicates that no single motif 699 dominates. However, as we move to the right toward more and more precision-optimized gene circuits, we see that each of the 700 four molecular states converge to the same πi value of 1/4. This reflects the fact that, in maximally precise non-equilibrium 701 gene circuits, each of the molecular states has a uniform probability. 702 We next undertook an analogous exercise for each of the eight transition rates that make up the four state gene circuits from 703 Figure 1C. For this exercise, it proves to be more insightful to look at the overall magnitudes of the edges connecting states, 704 rather than of the individual parameters laid out in Figure 1C. For example, the magnitude of the edge from state 0 to state 1 705 is just equal to the binding rate, [c]k b , but the rate going from 1 to 2 is equal to the product of two parameters: η ab ka. To enable comparison, we normalize the transition rates according to the maximum rate in each gene circuit, such that all values 707 should fall between 0 and 1. Figure S19B summarizes our results. We observe a tight convergence toward the maximum value 708 of 1 for each of the four clockwise rates (panels 1-4 in Figure S19B). In contrast, we find that all counterclockwise rates (panels 709 5-8) are at least two orders of magnitude smaller, and follow relatively diffuse distributions, even for maximally precise gene 710 circuits. This makes sense: as long as they are small enough relative to the clockwise rates, their precise value does not matter.

711
Taken together, the results shown in Figure S19A and B indicate that precise non-equilibrium gene circuits exhibit highly 712 uniform molecular architectures wherein each microscopic state is equiprobable, all clockwise transition rates are uniform, 713 and all counterclockwise rates are negligible. This results in a "clock-like" system that maximizes the regularity of molecular 714 transitions ( Figure S18B).
715 Figure S19C and D show the results for a similar analysis examining the molecular architectures of sharp non-equilibrium 716 gene circuits. Figure S19C indicates that the steady state probabilities of states 0 and 2 converge to 1/2 for maximally sharp gene 717 circuits, while the probabilities for states 1 and 3 approach 0. This means that maximally sharp non-equilibrium gene circuits 718 behave as effective 2 state systems. What about transition rates? The results illustrated in Figure S19D indicate that a strict of 719 hierarchy of transition rates spanning multiple orders of magnitude is required in order to maximize transcriptional sharpness.

720
Appendix N (below) provides further details about how the relative magnitudes of transition rates dictate non-equilibrium 721 sharpness. Overall, we believe that the results laid out in Figure S19 confirm that maximally sharp and precise gene circuits these heatmaps, transition rate and interaction term magnitudes (k and η) were constrained such that 10 −5 ≤ kτ b ≤ 10 5 and 10 −5 ≤ η ≤ 10 5 , where τ b is the burst cycle time. η ab and η ib were further constrained such that η ab ≥ 1 and η ib ≤ 1, consistent with our assumption that the transcription factor activates the gene locus.) dissipation opens up a broad spectrum of S and P values that are not attainable at equilibrium. It is difficult to formulate 725 general statements that apply to all gene circuit models inhabiting these spaces beyond the upper equilibrium limit; however, 726 we can learn much by examining the architecture of gene circuits lying at the outer limits of non-equilibrium performance, since 727 these systems tend to distil the logic underpinning non-equilibrium performance gains into relatively simple regulatory motifs.

728
Such is the case for the IR-optimized non-equilibrium four-state systems depicted as gray circles in Figure 3A. In Main Text 729 Section D, we found that the driver of this IR is a twofold increase in sharpness relative to the upper equilibrium limit. To 730 realize this twofold sharpness gain, we find that non-equilibrium driving is harnessed to facilitate effective one-way transitions 731 between the active and inactive conformations-specifically, from states 1 to 2 and 3 to 0 in Figure 1C-ensuring that the system will have a strong tendency to complete transcriptional cycles in the clockwise direction (J > 0).

733
In addition to this non-equilibrium driving, sharpness maximization places strict constraints on the relative magnitudes of 734 microscopic transition rates within the network. To understand these constraints, it is instructive to consider a coarse-grained 735 representation of our network with a single ON state (2) and a single OFF state (0). We can obtain expressions for the two 736 effective transition rates in the network by recognizing that they are equal to the inverse of the mean first passage times 737 between states 2 and 0, which we can calculate using Equation S16 from Appendix A.

738
If we neglect the energetically disfavored transitions from 2 to 1, the effective ON rate (k * on in Figure 3D)  from Appendix L for the above effective two state system will yield an S value of 2 and a P value of 1, in agreement with our 757 numerical results from Figure 3A. We propose that this doubled concentration dependence can be conceptualized as a kind of 758 "on rate-mediated" proofreading. In contrast to classical kinetic proofreading, which works by amplifying intrinsic differences in 759 ligand off rates (29, 30), sharp networks amplify the concentration-dependence carried by binding rates, effectively "checking" the optimal strategy for more complex architectures featuring multiple activator binding sites, we employed parameters sweeps 766 to examine the space of achievable S and P values for gene circuits with 1-5 activator binding sites (and N B fixed at 1).
767 Figure S2A shows the results of this analysis. For ease of comparison across different models, we plot the relative gains in S 768 and P for each model with respect to their maximum equilibrium values. For instance, the maximum equilibrium S value for 769 the N B = 2 model is 2, so a non-equilibrium gene circuit model with two binding sites that exhibits an S value of 2.5 will be 770 calculated to have a sharpness gain of 2.5/2 = 1.25.
771 Figure S2A reveals that the sharpness-precision tradeoff observed for the one-binding site model persists and, indeed, 772 becomes more severe for systems with additional activator binding sites. We see that the non-equilibrium gain in S is fixed at 773 approximately 2. And while the non-equilibrium gain in P increases from √ 2 for N B = 1 to approximately 2.25 for N B = 5, 774 these P maxima (peaks in the upper left quadrant of Figure S2A) occur at lower and lower values of S, which renders them 775 more and more disadvantageous from an IR perspective. As a result, when we plot IR-optimal gene circuits for each value of 776 N B (colored circles in Figure S2A), we find that they are invariably located in regions where S/Seq ≈ 2 and P/Peq ≈ 1. These 777 Figure S2B shows the range of achievable 780 non-equilibrium gains in S and P for systems with 1-4 activation steps (and N B = 1). Once again we observe a strong tradeoff 781 between sharpness and precision, which suggests that this incompatibility is a general feature of transcriptional systems. And, 782 once again, we find that IR-maximizing gene circuits (colored circles) lie at or near the right-most edge of achievable parameter 783 space, indicating that dissipating energy to enhance transcriptional sharpness (rather than precision) remains the best strategy 784 for maximizing the IR.

785
Yet unlike the systems examined in Figure S2A, Figure S2B reveals that the non-equilibrium gain in transcriptional sharpness 786 (S) is not fixed but, rather, increases with the number of molecular steps from a factor of two when N A = 1 to a factor of 787 five when N A = 4. This indicates that increasing the number of dissipative molecular steps in the activation pathway raises  Figure S20A illustrates the second 796 "Tf-centric" scenario for the case of a simple two state network with a single binding site and no possibility of a conformation 797 change at the locus; however the same idea applies equally well for the 4 state network we considered above, as well as more 798 complicated architectures. Here transcriptional specificity is defined as the ratio of the average steady state transcription rates 799 at on-and off-target gene loci: where f TF is the specificity under the TF-centric framing of the problem, and rr and rw indicate the transcription rates at the 802 cognate (right) and non-cognate (wrong) loci, respectively. In (11), the authors show that specificity for the two state system 803 shown in Figure S20A is given by: From Equation S83, we see that the activator specificity is bounded from above by α. Moreover, this upper performance limit 806 is achieved only in an off rate-dominated regime where ku >> [c]k b , which the authors in (11) note leads to a runaway increase 807 in transcriptional noise with increasing specificity under the constraint that the mean transcription rate must remain constant.

808
As a result, the authors conclude that non-equilibrium network architectures are necessary in order to improve specificity and 809 minimize transcriptional noise (11).

810
In analogy to the parallel case outlined above, we employ a "gene-centric" definition ( Figure S20B), which takes specificity 811 as the ratio of the average number of cognate and non-cognate factors bound while the locus is in a transcriptionally productive 812 state, normalized by concentration: In the case of the two state model shown in Figure S20B, this is simply given by the ratio of fractional occupancies of states 1 815 and 1 * : Since in steady state this is necessarily at equilibrium (note the absence of cycles), we can express this ratio as a function of 818 the difference between the energies of cognate and non-cognate factor binding, εc and εw, which leads to [S86] 820 Next, we note that the energies can be expressed as ratios of binding and unbinding rates, such that activators: whenever the cognate activator (green square in Figure S20B) is bound, non-cognate factors cannot bind.

833
A key limitation of this approach is that it neglects the presence of non-specific stretches of regulatory DNA, even at cognate 834 gene enhancers. Thus, to more accurately reflect the specificity challenges faced by real gene loci, a synthesis of the two 835 approaches summarized above will be necessary, which considers competition between cognate and non-cognate factors to bind 836 and activate a gene locus that features both specific binding sites (which favor the cognate activator) and neutral sites (to 837 which all activator species bind non-specifically). One expectation for such a scenario is that the simple equality stated in 838 Equation S90 will no longer hold, and tradeoffs similar to those observed in (11) will again emerge; although, this time, the 839 severity of these tradeoffs will depend on w/c. in Figure 4A is fixed at f eq = α, irrespective of molecular details. For this system, the specificity is simply equal to the 843 concentration-normalized ratio of the occupancies of states 2 and 4: Since we're assuming equilibrium conditions, we can re-express this as a difference between state energies, such that [S92]

847
In each case, we can express the state energies as the sum of the energy due to cognate or non-cognate factor binding with The key is to note the the first terms on the right-hand side of the above expressions are identical to Equations S87 and S88.

854
Since the remaining energy terms are identical, they will cancel out, such that we once again have where we now see that the numerator and denominator are identical in the right-most ratio. Thus, we find that 880 f eq = α. [S100] [S101] 888 where we assume that α ≤ f ≤ α 2 .

889
To arrive at this bound, we make use of insights gained in Appendix N, where we used first passage times to examine the key 890 microscopic conditions for the twofold gain in sharpness away from equilibrium observed in Figure 3A. Even for the simple six 891 state system illustrated in Figure 4A, our system has eight degrees of freedom when operating away from equilibrium. As such, 892 a key part of our approach will be to first reduce this complexity as much as possible while preserving the salient behaviors, 893 namely the possibility for non-equilibrium gains in sharpness and specificity. After this, we identify a tuning parameter, β, that 894 can be used to interpolate between maximally sharp to maximally specific non-equilibrium gene circuit architectures. Since 895 the expressions for non-equilibrium gene circuits are, in general, quite complex, we sketch the key steps here and direct the 896 reader to the Mathematica notebook entitled "sharpness_specificity_bound_derivation.nb" on the project git repository for 897 additional details: https://github.com/nlammers371/noneq-gene-regulation.git. Note that we work in units of c throughout, 898 such that c = 1.

899
To begin, we strip unnecessary dimensions from our system. We set η ib ki and ka to the same generic rate, k1. Next, we set 900 η ab ka, ηuaku, and k b to a second rate parameter, k2. Finally, we set ki equal to βη bs k b , where β is our interpolation parameter.

908
These limits lead to a further simplified system that can be used to investigate fundamental tradeoffs between intrinsic 909 sharpness and specificity. For this stripped-down system, we find that the expression for specificity, f , is quite simple: where we see that all dependence on microscopic transition rates has dropped out, with the exception of our interpolation 912 parameter, β. Furthermore, tuning β causes Equation S103 to shift from equilibrium levels (f = α when β = 0) to the 913 non-equilibrium limits revealed by Figure 5B (f = α 2 when β ≫ α, w).

920
Here again, as with Equation S103, we see that all dependence on the on rate parameters drops away. Further, it is easy to see 921 that this expression goes to 2 when β = 0 and 1 when β ≫ α, w. Thus, when β is small, our system exhibits equilibrium levels 922 of specificity and non-equilibrium levels of intrinsic sharpness and, when β is large, it exhibits non-equilibrium specificity and 923 equilibrium sharpness levels. Thus, we have succeeded in our initial aim to establish a simplified model that can capture the 924 tradeoffs between sharpness and specificity revealed by our numerical parameter sweeps ( Figure 5B).

925
As a final step, we can solve Equation S103 to obtain an expression for β in terms of f : [S107]

927
Plugging this expression into Equation S105 and simplifying yields an expression for S0 as a function of f : where we assume that α ≤ f ≤ α 2 . Thus, we have obtained the final S0 expression depicted in Equation S101. Observe that 930 S0 ≈ 2 when f = α and S0 ≈ 1 when f = α 2 . Equation S108 gives the dashed black curve bounding f vs. S0 sweep results 931 shown in Figure 5B, confirming that it represents the limiting behavior of intrinsic sharpness and specificity for non-equilibrium 932 realizations of the six state model from Figure 4B.