Nearly maximal information gain due to time integration in central dogma reactions

SUMMARY
Living cells process information about their environment through the central dogma processes of transcription and translation, which drive the cellular response to stimuli. Here, we study the transfer of information from environmental input to the transcript and protein expression levels. Evaluation of both experimental and analogous simulation data reveals that transcription and translation are not two simple information channels connected in series. Instead, we demonstrate that the central dogma reactions often create a time-integrating information channel, where the translation channel receives and integrates multiple outputs from the transcription channel. This information channel model of the central dogma provides new information-theoretic selection criteria for the central dogma rate constants. Using the data for four well-studied species we show that their central dogma rate constants achieve information gain because of time integration while also keeping the loss because of stochasticity in translation relatively low (<0.5 bits).


INTRODUCTION
Francis Crick described the central dogma of molecular biology as the unidirectional and sequential flow of information from DNA to RNA to protein through transcription and translation, 1-3 which prompts the question: Can we rigorously quantify information transfer in cells from environmental stimuli through transcription and translation? In the past two decades, experimental and computational progress has demonstrated that information transfer can be quantified in biological systems of varying complexity. 4-8 However, a quantitative assessment of Crick's statement requires examining how transcription and translation modulate the information about the environment available in cells.
Quantification of information transfer in biology has been enabled by single-cell measurements. 9-12 Previous work has examined information transfer from the environment to either the transcript or the protein expression. 6,13-15 However, those studies have mainly focused on biological networks, 16 cellular decision making, 17 or intracellular distribution of information. 18 There has been a renewed interest in probing the central dogma in recent years. 19,20 However, a comprehensive information-theoretic treatment encompassing both transcription and translation, which may explain the naturally occurring central dogma rate constants and inform the design of engineered cellular sensing systems, is still lacking.
Here, we use single-cell measurements and information theory to demonstrate that biology achieves nearly maximal information transfer from the environmental input through translation to the protein expression. We find that the information transfer from the environmental input to the protein expression is higher than the information transfer from the same input to the transcript expression. This contradicts an elementary result from information theory that information should be lost through a simple serial connection of information channels. 21 To explain this unexpected observation, we develop an information channel model whose properties are functions of the central dogma rate constants. The channel model highlights two distinct properties that affect the information gain during translation: (1) Time integration of the transcript expression, where the amount of signal integration is set by the ratio between the transcript and the protein decay rate constants, and (2) the translation power, i.e., the ratio between the translation rate constant and the protein decay rate constant, or the steady-state mean protein expression per transcript copy, which determines the mean protein expression level. We estimate the translation loss as the difference between the maximum possible information gain and the true protein-level information gain. By computing the information gain for multiple species, we demonstrate that the naturally occurring central dogma rate constants

Quantification of information transfer
Figure 1 (panels B to D). (B) Sequential channel model of the central dogma process, which receives an input, $X$, and produces transcripts, $m$, and then proteins, $g$, as sequential outputs. (C) Experimental result for the transcript-level mutual information ($I(X; m)$, left) and the protein-level mutual information ($I(X; g)$, right) using data from ref. 22. (D) $I(X; m)$ (left) and $I(X; g)$ (right) from simulated expression data using a lac operon-based reaction network. 27 In (C) and (D), the green dots in the top panels are the average expression values. The shaded region bounds the 5%-95% percentiles; the 2D heat maps in the bottom panels show the mutual information values over the space of probability distributions of the input, $P(X)$. The white dots in the heatmap indicate the maximum mutual information. In (C) the transcript value $m$ is in RNA counts/cell and the protein value $g$ is in molecules of equivalent fluorescein. 63 In (D) the transcript and protein values are molecules per cell from Gillespie simulations. The maximum mutual information, or the channel capacity, is associated with an optimal input distribution. 21,24,27 The mutual information is higher near the (mean($X$), std($X$)) coordinates for the optimal distribution and decreases for input distributions that are away from the optimal one.

To quantify the transfer of information in the experimental system, we consider transcription and translation as information channels, and we determine the mutual information between the environmental input and the transcript expression or the protein expression (method details). Mutual information associated with an information channel depends on both the transition matrix of the channel ($P(\mathrm{output}|\mathrm{input})$) and the probability distribution of input signals, 21,24 and the maximum mutual information over all input distributions is the channel capacity. 12,21 In this work, both from the experiments and simulations we obtain samples of the output, either the transcript or the protein expression level, for a set of values of the input. We estimate the conditional distributions for each input, $P(\mathrm{output}|\mathrm{input})$, by binning the output samples. Mutual information is obtained for a given input distribution, $P(\mathrm{input})$, by computing the function 24

$$I(\mathrm{input}; \mathrm{output}) = \sum_{\mathrm{input}} P(\mathrm{input}) \sum_{\mathrm{output}} P(\mathrm{output}|\mathrm{input}) \log_2 \frac{P(\mathrm{output}|\mathrm{input})}{P(\mathrm{output})}.$$

To compute the channel capacity from the conditional distributions we used the Blahut-Arimoto algorithm, 25 which is an alternating optimization algorithm that has been proven to converge to the maximum mutual information 26 (STAR Methods). The estimated channel capacity can be biased because of the number of output samples and the number of bins used to construct the conditional distributions. We used existing bootstrapping methods to compute the unbiased estimate of the channel capacity 4,12,27,28 (STAR Methods). The estimated channel capacity provides the maximum information transfer rate when the input is fluctuating sufficiently slowly for the output to reach the stationary state. The information transfer rate is lower than capacity for fast fluctuations of the input.
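The binning and mutual information steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `conditional_histograms` and `mutual_information` are hypothetical, equal-width bins shared across all inputs are an assumption, and no finite-sampling bias correction is applied.

```python
import math

def conditional_histograms(samples_by_input, n_bins):
    """Estimate P(output | input) by binning output samples (one row per input value).

    Assumption: equal-width bins spanning the pooled range of all samples.
    """
    lo = min(min(s) for s in samples_by_input)
    hi = max(max(s) for s in samples_by_input)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all samples are identical
    rows = []
    for samples in samples_by_input:
        counts = [0] * n_bins
        for y in samples:
            # Samples equal to the upper edge are placed in the last bin.
            idx = min(int((y - lo) / width), n_bins - 1)
            counts[idx] += 1
        rows.append([c / len(samples) for c in counts])
    return rows

def mutual_information(p_input, p_out_given_in):
    """I(input; output) in bits for an input distribution and a transition matrix."""
    n_in, n_out = len(p_input), len(p_out_given_in[0])
    # Output marginal: P(output) = sum over inputs of P(input) P(output | input).
    p_out = [sum(p_input[i] * p_out_given_in[i][j] for i in range(n_in))
             for j in range(n_out)]
    info = 0.0
    for i in range(n_in):
        for j in range(n_out):
            q = p_out_given_in[i][j]
            if p_input[i] > 0 and q > 0:  # terms with zero probability contribute nothing
                info += p_input[i] * q * math.log2(q / p_out[j])
    return info

# Toy check: a noiseless two-state channel carries exactly 1 bit for a uniform input.
Q_toy = conditional_histograms([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], n_bins=2)
print(mutual_information([0.5, 0.5], Q_toy))  # 1.0
```

With real data, each inner list would hold the single-cell transcript or protein measurements recorded at one input level, and the result would still need the bias correction discussed in the text.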
If translation acts as a simple information channel that only degrades the information received from the transcription channel (Figure 1B), then we expect the protein-level channel capacity to be lower than the transcript-level channel capacity. Surprisingly, we found the opposite: the experimentally observed protein-level channel capacity ($c(X; g) \approx 1.5$ bits) is higher than the transcript-level channel capacity ($c(X; m) \approx 1.0$ bits, Figure 1C). Hence, there exists a gain in information about the input in the translation channel. Moreover, after evaluating both the transcript-level ($I(X; m)$) and protein-level mutual information ($I(X; g)$) for a large set of input distributions, we found that the protein expression always contains more information about the input than the transcript expression (Figures 1C, S1 and S2).
The observed information gain in $I(X; g)$ could be because of two artifacts: (1) the transcript expression measurement could be noisier than the protein expression measurement, and (2) there could be unknown biochemical pathways that transfer information directly from the input to the protein expression, bypassing translation. To show that $I(X; g) \geq I(X; m)$ is a characteristic property of the central dogma without requiring the above artifacts, we performed Gillespie (kinetic Monte Carlo) simulations of a biochemical reaction network that represents our experimental gene expression system 27 (method details). The simulated biochemical reaction network contains no unknown reaction pathways and directly provides transcript and protein counts, excluding measurement noise as a factor. Each Gillespie simulation was performed for a fixed value of the input and the resulting protein expression level is an ergodic process. 29,30 The mutual information values from the Gillespie simulation data were consistent with the experimental results: $I(X; g) \geq I(X; m)$ for all input distributions considered (Figures 1D and S2). We will demonstrate that the gain in $I(X; g)$ is because of time integration of the transcript expression, and also depends on the translation power.
We used additional Gillespie simulations to explore the impact of the central dogma rate constants on the information transfer. We observed that $c(X; m)$ increases with increasing transcription power (the ratio of the transcription rate constant to the transcript decay rate constant, or the steady-state mean transcript expression, Figure 2A), and $c(X; g)$ increases with increasing translation power (Figure 2B). These trends are similar to the property of simple information channels, e.g., Gaussian or Poisson channels, where channel capacity increases with channel power. 21,31,32 When the translation power is 1, $c(X; g) \approx c(X; m)$, and higher values of translation power appear to increase $c(X; g)$ toward an asymptotic value (Figure 2B). We will demonstrate that this asymptotic value depends on the ratio of the transcript decay rate constant to the protein decay rate constant. Of interest, at fixed transcription and translation powers (i.e., fixed mean protein expression level), $c(X; g)$ decreases with increasing protein decay rate constant (Figure 2C). So, the increase in $c(X; g)$ (i.e., the information gained during translation) depends on the transcript and protein decay rate constants and the translation power.

Channel model
iScience 26, 106767, June 16, 2023
To develop a channel model for the information gain during translation, we start by considering a fundamental result in information theory: information about the input can only be degraded as it transfers through each information channel. In other words, if two information channels are connected in series, then the channel capacity of the combined channel cannot exceed the channel capacity of the first channel. 21,33 However, this result is only true for "delayless processing", in which the second channel only receives one symbol at a time from the first channel to produce a response (i.e., there is no accumulation of the first channel's output by the second channel 34,35). In the context of the central dogma, the transfer of information from transcription to translation is only delayless if the response times for transcription and translation are equal. In general, however, those response times can be different.
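The result invoked above is the data processing inequality. As a brief reminder, written with the symbols used in this work and assuming that $X$, $m$, and $g$ form a Markov chain in which each output depends only on the single symbol it receives (delayless processing):

```latex
X \rightarrow m \rightarrow g
\quad \Longrightarrow \quad
I(X; g) \leq I(X; m) \ \text{ for every } P(X),
\qquad \text{and hence} \qquad
c(X; g) = \max_{P(X)} I(X; g) \leq \max_{P(X)} I(X; m) = c(X; m).
```

An observation of $c(X; g) > c(X; m)$ therefore signals that the Markov-chain assumption, not the inequality itself, fails for the measured quantities.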
To examine how the difference in the two response times causes a gain in $c(X; g)$, we used a generic but sufficient model for transcription and translation that includes the four central dogma rate constants: transcription, transcript decay, translation, and protein decay (Figure 1A and method details), where $g$ is the number of proteins. From the deterministic ODEs that are obtained by ensemble-averaging the master Equations 1 and 2 (method details), the response times for transcription and translation are $1/k_{d,m}$ and $1/k_{d,g}$, respectively. Hence, delayless processing in the central dogma requires $k_{d,m} = k_{d,g}$. However, in general $k_{d,m} \geq k_{d,g}$, typically by a factor of 10. 20,36-42 Consequently, the translation response time is longer than the transcription response time, and the translation channel effectively receives and integrates multiple outputs from the transcription channel. There are existing studies on the emergence of time integration in biochemical reaction networks, 43,44 but these earlier studies mainly used signal-to-noise ratio instead of quantifying the information transferred from the environmental input to the biological output. Information transfer quantifies the biochemical work that can be done because of signal transduction. 45 Therefore, it is important to go beyond a signal-to-noise ratio analysis of central dogma systems and assess the information gain because of time integration.
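To make the two response times concrete, the following is a minimal sketch of the ensemble-averaged rate equations. It assumes a constant transcription rate constant $k_m$ at a fixed input and the initial condition $\langle m \rangle(0) = 0$; the operator-state dynamics of the full model are omitted, so this is an illustration and not a reproduction of Equations 1 and 2.

```latex
\begin{aligned}
\frac{d\langle m \rangle}{dt} &= k_m - k_{d,m} \langle m \rangle
  &&\Rightarrow\quad \langle m \rangle(t) = \frac{k_m}{k_{d,m}} \left(1 - e^{-k_{d,m} t}\right), \\
\frac{d\langle g \rangle}{dt} &= k_g \langle m \rangle - k_{d,g} \langle g \rangle
  &&\Rightarrow\quad \langle g \rangle_{\mathrm{ss}} = \frac{k_g}{k_{d,g}} \, \langle m \rangle_{\mathrm{ss}} .
\end{aligned}
```

The transcript mean relaxes with time constant $1/k_{d,m}$, and the protein equation, being linear with loss rate $k_{d,g}$, relaxes with time constant $1/k_{d,g}$. The steady-state ratio $k_g/k_{d,g}$ is the translation power defined in the introduction.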

Maximum possible information gain during translation
To determine the amount of information lost because of stochasticity in translation, we first calculated the maximum possible information gain because of time integration during translation. In the deterministic model of translation, the protein expression $g(t)$ is the convolution of the transcript trajectory, $m(t)$, with a time integration kernel, $f(t) = e^{-k_{d,g} t}$, multiplied by the translation rate constant $k_g$ (Figure S4 and method details). In this deterministic model for time integration, $k_g$ only scales the convolution output, without increasing the dispersion in the protein expression level, and therefore does not affect the protein-level channel capacity. So, the result of an ideal, noise-free time integration during translation is the hypothetical protein expression level: $g_{\mathrm{ideal}}(t) \equiv (f * m)(t)$. We define the ideal channel capacity, $c_{\mathrm{ideal}}(T) \equiv c(X; g_{\mathrm{ideal}})$, as a function of the dimensionless integration time, $T \equiv k_{d,m}/k_{d,g}$. Because the analytical solution to the transcript expression distribution, $P(m|X)$, is known, 6,46 we can construct an analytical approximation of $g_{\mathrm{ideal}}$. The number of uncorrelated outputs of the transcription channel received by the translation channel within the latter's response time is $T$. Therefore, we approximate $g_{\mathrm{ideal}} \approx \sum_{i=1}^{T} m^{(i)}$, where the $m^{(i)}$ are independent and identically distributed with the distribution $P(m|X)$. Because the transcript expression level has a negative binomial distribution, $\mathrm{NB}(r, p)$, 46 the ideal integration output $g_{\mathrm{ideal}}$ has the distribution $\mathrm{NB}(rT, p)$. The ideal channel capacity using $g_{\mathrm{ideal}} \sim \mathrm{NB}(rT, p)$ matches the value from numerical convolution of the transcript expression trajectory (Figure S5). Information gain because of time integration has been previously studied for transcriptional cascades, 47,48 but these studies used a Gaussian noise model. We know that transcript expression has a negative binomial distribution, 46 so our estimate of the maximum possible information gain is likely to be more accurate. But the mechanism of information gain, i.e., time integration, is the same in refs. 47,48 and in this work.
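The closure property used above, that a sum of $T$ independent NB($r$, $p$) variables is NB($rT$, $p$), can be checked numerically. The sketch below is illustrative only: it assumes the parameterization in which $p$ is the success probability and the count is the number of failures, it handles integer $T$ only, and the values of `r`, `p`, `T`, and `k_max` are hypothetical.

```python
import math

def nb_pmf(k, r, p):
    """Negative binomial pmf for count k, shape r, and success probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def convolve(a, b):
    """pmf of the sum of two independent count variables, kept on the support of a."""
    out = [0.0] * len(a)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            if i + j < len(out):
                out[i + j] += a_i * b_j
    return out

def summed_pmf(r, p, T, k_max):
    """pmf of the sum of T i.i.d. NB(r, p) variables, built by repeated convolution."""
    single = [nb_pmf(k, r, p) for k in range(k_max + 1)]
    total = single
    for _ in range(T - 1):
        total = convolve(total, single)
    return total

# Hypothetical parameter values, chosen only for illustration.
r, p, T, k_max = 2.0, 0.5, 4, 60
by_convolution = summed_pmf(r, p, T, k_max)
closed_form = [nb_pmf(k, r * T, p) for k in range(k_max + 1)]
max_abs_diff = max(abs(x - y) for x, y in zip(by_convolution, closed_form))
print(max_abs_diff)  # difference between the two constructions (floating-point level)
```

Truncating the support at `k_max` does not affect the comparison, because the probability of a sum equal to $k$ only involves terms with counts no larger than $k$.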
We identified the combined effect of $k_m$, $k_{d,m}$, and $k_{d,g}$ on $c_{\mathrm{ideal}}(T)$ using multiple simulated datasets (Figure 3A and method details). First, we determined the transcript expression distribution as a function of $k_m$ and $k_{d,m}$, and then we determined $P(g_{\mathrm{ideal}}|X)$ using the analytical approximation to compute $c_{\mathrm{ideal}}(T)$ (Figures 3A and S6 and method details). At $T = 1$, $c_{\mathrm{ideal}}(T) = c(X; m)$, and at longer integration times, $c_{\mathrm{ideal}}(T) > c(X; m)$. Stochasticity in translation reduces the protein-level channel capacity, but this reduction is relative to $c_{\mathrm{ideal}}(T)$ and not to $c(X; m)$. We only explore the ideal channel capacity for $T \geq 1$ in Figure 3A, because most of biology exists in this region. 36-42 However, we can estimate the effect of $T < 1$ using the analytical result for the ideal integration output. Because $g_{\mathrm{ideal}} \sim \mathrm{NB}(rT, p)$, the relative standard deviation is proportional to $T^{-0.5}$. So, for $T < 1$, dispersion will increase after translation and subsequently reduce the ideal channel capacity.
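The $T^{-0.5}$ scaling of the relative standard deviation follows from the moments of the negative binomial distribution. A short derivation, assuming the parameterization in which $p$ is the success probability and the count is the number of failures (the source may use a different but equivalent convention):

```latex
\begin{aligned}
\mathbb{E}[g_{\mathrm{ideal}}] &= \frac{rT(1-p)}{p}, \qquad
\mathrm{Var}[g_{\mathrm{ideal}}] = \frac{rT(1-p)}{p^{2}}, \\
\frac{\sqrt{\mathrm{Var}[g_{\mathrm{ideal}}]}}{\mathbb{E}[g_{\mathrm{ideal}}]}
  &= \frac{1}{\sqrt{rT(1-p)}} \;\propto\; T^{-1/2}.
\end{aligned}
```

For fixed $r$ and $p$, doubling $T$ lowers the relative dispersion by a factor of $\sqrt{2}$, whereas $T < 1$ raises it above the transcript-level value.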

Information lost because of stochasticity in translation
To determine the loss in information after time integration, we used Gillespie simulations of the operator state transition, transcription, transcript decay, translation, and protein decay reactions together (method details) to obtain $P(g|X)$ and then compute $c(X; g)$ for a range of integration times. Both Equations 1 and 2 are birth-death processes and the protein expression level from simulations is an ergodic process. 30,49,50 For each input level we estimated $P(g|X)$ from the stationary state trajectory data from Gillespie simulations. We computed the protein-level information gain curves, $c(X; g)$ vs. $T$, for five values of the translation power, $k_g/k_{d,g}$ (method details). At low translation power, there is relatively more noise in the translation output, and $c(X; g)$ is noticeably lower than $c_{\mathrm{ideal}}(T)$ (Figure 3B). As the translation power increases, the protein-level channel capacity asymptotically approaches the ideal channel capacity.
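A stripped-down version of such a simulation is sketched below. It is an assumption-laden illustration rather than the model used in this work: the operator-state transitions are omitted, transcription occurs at a constant rate `k_m`, and the function name, rate values, and seed are hypothetical. It returns time-weighted means, which for an ergodic process approximate the stationary means.

```python
import random

def gillespie_two_stage(k_m, k_dm, k_g, k_dg, t_end, seed=0):
    """Gillespie simulation of transcription, transcript decay, translation, protein decay.

    Returns the time-weighted mean transcript count and mean protein count.
    """
    rng = random.Random(seed)
    t, m, g = 0.0, 0, 0
    m_area, g_area = 0.0, 0.0
    while True:
        # Propensities of the four reactions in the current state.
        rates = [k_m, k_dm * m, k_g * m, k_dg * g]
        total = sum(rates)
        dt = rng.expovariate(total)  # waiting time to the next reaction
        if t + dt > t_end:
            # Accumulate the final partial interval and stop.
            m_area += m * (t_end - t)
            g_area += g * (t_end - t)
            break
        m_area += m * dt
        g_area += g * dt
        t += dt
        # Pick which reaction fires, with probability proportional to its propensity.
        u = rng.random() * total
        if u < rates[0]:
            m += 1
        elif u < rates[0] + rates[1]:
            m -= 1
        elif u < rates[0] + rates[1] + rates[2]:
            g += 1
        elif g > 0:
            g -= 1
    return m_area / t_end, g_area / t_end

# Hypothetical rate constants (arbitrary time units): T = k_dm / k_dg = 2,
# translation power k_g / k_dg = 10. Theory gives means of 2 transcripts and 20 proteins.
m_mean, g_mean = gillespie_two_stage(k_m=2.0, k_dm=1.0, k_g=5.0, k_dg=0.5, t_end=2000.0)
print(m_mean, g_mean)
```

To estimate $P(g|X)$ as in the text, one would record the protein count along the stationary part of the trajectory at each input level instead of only its mean.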
We observed three features in the information gain curves. First, the protein-level channel capacity increases monotonically with integration time but has a plateau at longer integration times. This plateau is most prominent for translation powers $k_g/k_{d,g} \leq 100$ (Figure 3B). Increasing the translation power shifts the plateau region to longer integration times. Second, the translation loss, $c_{\mathrm{ideal}}(T) - c(X; g)$, is generally small at low integration times (prominently for $k_g/k_{d,g} \geq 100$ when $T \leq 100$), but in the plateau region the translation loss increases significantly with $T$. Third, for a fixed integration time the increase in protein-level channel capacity diminishes with increasing translation power, as evident from the small difference in the curves for translation powers $10^3$ and $10^4$ in Figure 3B. This feature agrees with the previously established result of diminishing gain in the signal-to-noise ratio with increasing translation rate constant. 43

Information gain in naturally evolved systems
To estimate the information gain and translation loss for naturally evolved systems, we performed stochastic simulations of the central dogma system, Equations 1 and 2, using typical rate constants for four species from published data: E. coli, Saccharomyces cerevisiae, Mus musculus, and Homo sapiens (Table S1 and method details). 20,36-42 We used the decay rate constants to determine the distribution of the dimensionless integration time for each species (Figure 3C). The median $T$ for E. coli is 20 (5%-95% percentile range: 5 to 44). For eukaryotic species, the median $T$ is lower: 6 (1-53) for S. cerevisiae, 5 (1-24) for M. musculus, and 6 (2-21) for H. sapiens (Figure 3C and method details). Bacillus subtilis appears to have a $T$ value similar to that of E. coli. 51,52 So, prokaryotes may generally have longer integration times than eukaryotes. However, the dimensionless integration time is relative to the transcription response time, $1/k_{d,m}$. Because $k_{d,m}$ is larger for prokaryotes, the duration of integration in the units of time is still shorter for prokaryotes compared to eukaryotes. Based on the gene ontology enrichment analysis for the M. musculus data, 40 genes with relatively high integration times are associated with dephosphorylation and RNA processing, and genes with relatively low integration times are associated with defense response, homeostasis, and proteolysis, which are processes that may require faster response times.
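The distinction between the dimensionless integration time and the duration of integration in units of time can be illustrated with a short calculation. The rate constants below are hypothetical round numbers chosen only to show the arithmetic; they are not taken from Table S1.

```python
def integration_time(k_dm, k_dg):
    """Dimensionless integration time T = k_dm / k_dg."""
    return k_dm / k_dg

def translation_response_time(k_dg):
    """Translation response time 1 / k_dg, in the inverse units of k_dg."""
    return 1.0 / k_dg

# Hypothetical decay rate constants in 1/min (illustrative only).
systems = {
    "prokaryote-like": {"k_dm": 0.2, "k_dg": 0.01},
    "eukaryote-like": {"k_dm": 0.005, "k_dg": 0.001},
}

for name, k in systems.items():
    T = integration_time(k["k_dm"], k["k_dg"])
    tau = translation_response_time(k["k_dg"])
    print(f"{name}: T = {T:.1f}, translation response time = {tau:.0f} min")
```

In this toy comparison the prokaryote-like system has the larger dimensionless $T$ but the shorter absolute response time, which is the pattern described in the paragraph above.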
We computed $c_{\mathrm{ideal}}(T)$ and $c(X; g)$ for each species to determine the translation loss (Figure 3D, Tables S2 and S3). Within the typical range of integration times (the 5%-95% percentile range of $T$), the translation loss is less than 0.5 bits (Figure 3D); that is, the translation power is nearly adequate to transfer the maximum amount of information. Our simulation results demonstrate (Figure 3B) that it is possible to have central dogma rate constants with high translation loss (> 1 bit), but we do not observe such rate constant combinations in the literature data. So, low translation loss could be an evolutionary selection criterion for the central dogma rate constants. As a corollary to the observation of low translation loss, the naturally occurring central dogma rate constants do not achieve the plateauing information gain possible for the given translation power. So, maximizing the information gain for a fixed mean protein expression level, by increasing the integration time, is probably not an unconstrained selection criterion. The high translation loss associated with the plateauing protein-level channel capacity can be a constraint against large integration times. Moreover, the observed integration times do not span the full range of low translation loss and stop well below the onset of the plateau region (Figure 3D). Hence, although low translation loss might be an evolutionary selection criterion, it does not appear to be the only criterion.

DISCUSSION
To understand why naturally evolved systems do not have integration times near and beyond 100, we considered the fluctuation time period of the environmental input. High integration times correspond to longer translation response times, and central dogma systems only operate at channel capacity if the environmental fluctuations are slower than the translation response time (Figures S9 and S10, method details). When the input fluctuation period is less than or comparable to the translation response time, then the translation channel output is correlated with the previous outputs. The correlated outputs decrease the effective channel capacity in a way that is analogous to the reduced capacity of slow-fading information channels. 53,54 Using simulated $P(g_{\mathrm{ideal}}|X)$ data from a fluctuating input protocol, we found that environmental fluctuations have to occur roughly 10 times slower than the integration time for the ideal mutual information $I(X; g_{\mathrm{ideal}})$ to be close to capacity $c(X; g_{\mathrm{ideal}})$ (Figure S10). Thus, we speculate that naturally evolved central dogma systems use time integration for information gain but remain within the relatively fast translation response times. Earlier work has shown that increasing the number of integration channels in a linear network will keep on increasing the mutual information between the input and the terminal output. 47 But a linear accumulation of time integration channels also increases the response time of the terminal output, and information transfer is low for fluctuations that are faster than the response time of the output.
The information-theoretic criteria of translation loss and response time emerge as additional constraints for optimal protein expression, which has been previously explored using bioenergetics 55,56 and resource allocation. 57 Of interest, the speed of response has also been identified as a selection criterion using energy and fitness-based analysis. 58,59 Because of the connection between thermodynamics and information, 60 and between information and fitness, 24,61 it is likely that these constraints on the optimal protein expression are dependent on each other. Our findings encourage new studies on controlling the fluctuation time period of the input in evolution experiments and observing the subsequent change in the integration time. Evolution experiments in slowly changing environments can reveal if high integration time is a selection criterion and if the high translation loss region is indeed forbidden. Similar alternating environment experiments have been performed to study the impact on fitness. 62 Even under a matched response and fluctuation time period, we can start with a central dogma system at high translation loss and observe if the mean protein expression increases to reduce the loss. We expect that the energetic and resource constraints will affect the magnitude of reduction in the translation loss.
Information gain because of time integration is directly applicable to gene regulatory networks, where one gene (input) controls the expression of another gene (output). However, we need additional studies to determine the effect of feedback on ideal channel capacity and translation loss. We had previously analyzed the protein-level information transfer, $I(X; g)$, through synthetic biochemical reaction networks (BRNs) under positive feedback. 27 We found that positive feedback stabilizes the protein-level mutual information to the same value for a large set of input distributions. The synthetic BRNs in that study 27 were based on the lac operon, where feedback directly controls the operator state. Information gain because of time integration of the transcript expression level is a separate phenomenon. Therefore, we expect information gain during translation to persist even under feedback.
Our information gain model shows that time integration during translation is a general feature that results in a gain of information at the protein expression level. Furthermore, the central dogma rate constants for four well-studied species are typically in a regime where the information loss because of stochasticity in translation is relatively low, suggesting that time integration with low translation loss while avoiding slow response times may be selection criteria for naturally evolved central dogma systems. Our findings also suggest that the typical integration time is lower in eukaryotes than prokaryotes, and decay rate data for additional species could confirm whether this trend is universal. Because the translation loss for natural central dogma systems is small, the ideal channel capacity provides a fast estimate of the protein-level channel capacity that does not require protein expression data, which could be useful for large surveys of biological information transfer.
Beyond transforming the central dogma process from a set of biochemical reactions to an information acquiring and integrating system, these insights are also relevant for engineering synthetic biological

Limitations of the study
Our model does not include positive or negative feedback during the gene induction process. Quantification of the maximum information gain because of time integration and translation loss in feedback-controlled systems will require additional mathematical and computational analysis. We have used existing techniques to correct the estimated channel capacity for finite-sampling bias and bin-size selection, but to the best of our knowledge there is no proof that these methods completely remove the estimation error. We need central dogma rate constant data for more species to confirm the universality of our findings about the integration time and translation loss in naturally occurring central dogma systems.

STAR★METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
We would like to especially thank David Ross for constructive discussions during the research and preparation of this manuscript. We would also like to thank Samuel Schaffter, Peter Tonner, Elizabeth Strychalski, and Charles Camp for thoughtful feedback on this manuscript.

AUTHOR CONTRIBUTIONS
S.S. and J.R. conceptualized the study. J.R. performed the experiments. S.S. and J.R. performed the data curation. S.S. performed the data analysis and the simulations. S.S. wrote the manuscript with contributions from J.R.

DECLARATION OF INTERESTS
The authors declare no competing interests. This research was conducted when both authors were employed at the National Institute of Standards and Technology. S.S. is currently employed at the Georgetown Lombardi Comprehensive Cancer Center. J.R. is currently employed at Booz Allen Hamilton. The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy, opinion, or position of their current employers.

Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Swarnavo Sarkar (ss4235@georgetown.edu).

Materials availability
This study did not generate new unique reagents.

Data and code availability
All original code has been deposited at https://github.com/sarkar-s/InCens 65 and is publicly available as of the date of publication. DOIs related to any other data sources are listed in the key resources table. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Computation of mutual information and channel capacity
To assess the transfer of information from the environmental input to either the transcript or the protein expression levels, we computed the mutual information in bits as

$$I(X; g) = \sum_{X} P(X) \sum_{g} P(g|X) \log_2 \frac{P(g|X)}{P(g)}. \tag{3}$$

Mutual information also depends on the input distribution, $P(X)$. The maximum possible mutual information through an information channel for all possible input distributions is the channel capacity, which is calculated by maximizing the mutual information with respect to the input distribution:

$$c(X; g) = \max_{P(X)} I(X; g). \tag{4}$$

The conditional transcript and gene expression distributions, $P(m|X)$ and $P(g|X)$, respectively, are input-to-output transition matrices. The information channel models of $X \rightarrow m$ and $X \rightarrow g$ are defined by the respective transition matrices. Throughout this work we computed the channel capacity from the transition matrices using the well-established Blahut-Arimoto algorithm, as described in refs. 4,12,25,27. Here we provide a short review of the Blahut-Arimoto algorithm, which we have also summarized in an earlier publication. 28 Given the input distribution $p := P(X)$ and the channel transition matrix $Q := P(g|X)$, the mutual information $I(p, Q) := I(X; g)$ is the solution to the following maximization problem,

$$I(p, Q) = \max_{R} J(p, Q, R), \qquad J(p, Q, R) = \sum_{X} \sum_{g} p(X) \, Q(g|X) \log_2 \frac{R(X|g)}{p(X)}, \tag{5}$$

where $R$ is a variable output-to-input transition matrix or $P(\mathrm{input}|\mathrm{output})$. From Equations 3 and 5, the maximum information transfer or the channel capacity is the solution to the double maximization problem,

$$c(X; g) = \max_{p} \max_{R} J(p, Q, R). \tag{6}$$

The Blahut-Arimoto algorithm to compute the channel capacity for a given input-to-output transition matrix, $Q$, is built using two properties.
1. For a fixed input distribution, $p$, the trial output-to-input transition matrix that maximizes $J(p, Q, R)$ is
$$R^{*}(X|g) = \frac{p(X) \, Q(g|X)}{\sum_{X'} p(X') \, Q(g|X')}. \tag{7}$$
2. For a fixed output-to-input transition matrix, the input distribution that maximizes $J(p, Q, R)$ is
$$p^{*}(X) = \frac{\exp\left(\sum_{g} Q(g|X) \ln R(X|g)\right)}{\sum_{X'} \exp\left(\sum_{g} Q(g|X') \ln R(X'|g)\right)}. \tag{8}$$
The Blahut-Arimoto algorithm is an iterative method that uses Equations 7 and 8 to compute the input distribution that solves the maximization problem in Equation 4.
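The alternating updates reviewed above can be sketched as follows. This is a minimal textbook-style implementation of the Blahut-Arimoto iteration, not the code used in this work: the function name, the tolerance, and the stopping rule based on the standard lower and upper capacity bounds are assumptions. Substituting the optimal output-to-input matrix into the input update gives the combined step $p_i \leftarrow p_i \exp(D_i)/Z$, where $D_i$ is the Kullback-Leibler divergence between row $i$ of $Q$ and the current output marginal.

```python
import math

def blahut_arimoto(Q, tol=1e-9, max_iter=10000):
    """Channel capacity (bits) and optimal input distribution for transition matrix Q.

    Q[i][j] = P(output j | input i); each row is assumed to sum to 1.
    """
    n_in, n_out = len(Q), len(Q[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output marginal for the current input distribution.
        q_out = [sum(p[i] * Q[i][j] for i in range(n_in)) for j in range(n_out)]
        # D_i = KL(Q[i] || q_out) in nats; zero-probability terms contribute nothing.
        d = [sum(Q[i][j] * math.log(Q[i][j] / q_out[j])
                 for j in range(n_out) if Q[i][j] > 0)
             for i in range(n_in)]
        # Combined update of the two alternating maximization steps.
        w = [p[i] * math.exp(d[i]) for i in range(n_in)]
        z = sum(w)
        p = [x / z for x in w]
        lower, upper = math.log(z), max(d)  # bounds on the capacity (nats)
        if upper - lower < tol:
            break
    return lower / math.log(2), p

# Example: a binary symmetric channel with a 10% error probability.
capacity, p_opt = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(capacity, p_opt)
```

In practice `Q` would be the binned estimate of $P(m|X)$ or $P(g|X)$, with one row per input level.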
The channel transition matrix $Q$ is obtained by binning the samples of the protein expression level for a fixed input value. The estimated channel capacity can be sensitive to the choice of bin size. If the bins are too coarse, then the channel capacity is underestimated. If the bins are too fine, producing combed conditional distributions, then the channel capacity will be overestimated due to finite-sampling bias. 4,12 We corrected for finite-sampling bias as described in refs. 4,27,28. Briefly, this procedure consists of sampling fractions of multiple sizes of the data, and multiple replicates for each fraction. The channel capacity values for all the fractional samples are computed, which provides a set of biased channel capacity values as a function of the inverse sample size. The unbiased channel capacity is then obtained by extrapolating to the infinite-sample limit. This process of estimating the unbiased channel capacity is repeated for an increasing number of bins to identify a number of bins that is large enough to capture the variance but not so large that it produces combed distributions that can overestimate the channel capacity. 4 The mutual information landscapes shown in Figures 1C and 1D were computed from the transcript and the protein expression distributions, $P(m|X)$ and $P(g|X)$, respectively, using Sparse Estimation of Mutual Information Landscapes (SEMIL) as described in ref. 27.
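The subsample-and-extrapolate idea can be sketched as below. Several simplifications are assumed and should not be read as the published procedure: the sketch extrapolates the plug-in mutual information with empirical input frequencies instead of the Blahut-Arimoto channel capacity, it uses a straight-line fit against the inverse sample size, and all function names, fractions, and replicate counts are hypothetical.

```python
import math
import random

def linear_intercept(xs, ys):
    """Least-squares intercept of the line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return my - (sxy / sxx) * mx

def plug_in_mi(pairs, n_inputs, n_bins, lo, hi):
    """Plug-in mutual information (bits) from (input index, output value) pairs."""
    width = (hi - lo) / n_bins
    joint = [[0] * n_bins for _ in range(n_inputs)]
    for x, y in pairs:
        j = min(max(int((y - lo) / width), 0), n_bins - 1)  # clamp into valid bins
        joint[x][j] += 1
    n = len(pairs)
    row = [sum(r) for r in joint]
    col = [sum(joint[i][j] for i in range(n_inputs)) for j in range(n_bins)]
    mi = 0.0
    for i in range(n_inputs):
        for j in range(n_bins):
            c = joint[i][j]
            if c:
                mi += (c / n) * math.log2(c * n / (row[i] * col[j]))
    return mi

def extrapolated_mi(pairs, n_inputs, n_bins, lo, hi,
                    fractions=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0), replicates=5, seed=0):
    """Estimate the infinite-sample value by fitting biased estimates against 1/N."""
    rng = random.Random(seed)
    inv_n, values = [], []
    for f in fractions:
        size = int(f * len(pairs))
        for _ in range(replicates):
            subset = rng.sample(pairs, size)  # subsample without replacement
            inv_n.append(1.0 / size)
            values.append(plug_in_mi(subset, n_inputs, n_bins, lo, hi))
    # The intercept at 1/N = 0 is the extrapolated estimate.
    return linear_intercept(inv_n, values)
```

Repeating `extrapolated_mi` for an increasing `n_bins` and looking for the range where the extrapolated value is stable mirrors the bin-selection step described in the text.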

Transcript and protein-level mutual information landscapes for biological replicates
To check the reproducibility of the transcript and the protein-level mutual information values we computed the mutual information landscapes with data from 3 biological replicates of the inducible gene expression system. 23 As mentioned in the main text, the transcript-level expression was measured using microscopy of FISH probes, and the protein-level expression was measured using flow cytometry. 22 The transcript and the protein-level mutual information landscapes for the 3 replicates are shown in Figure S1.
To check that the central dogma reactions, consisting only of sequential transcription and translation, can cause the gain in the protein-level mutual information, we computed the mutual information landscapes for a simulated reaction network that is analogous to our experimental system. This reaction network was developed from a previously published model of the lac operon, 27,29 consisting of the same set of reactions involving the input (IPTG) and the operator as present in the lac operon but without the positive feedback due to the lacY gene, as our experimental system does not contain feedback. We used Gillespie simulations to compute the transcript and the protein expression levels for this model reaction network. We used exactly the same set of reactions and rate constants listed in Tables 4 and 5, respectively, of the SI of ref. 27. We subsequently used the transcript and the protein expression data to compute the mutual information landscapes as described previously. 27 The transcript and the protein-level mutual information landscapes for the simulated inducible gene expression system are shown in Figure 1D of the main text.

Difference in the transcript and protein-level mutual information
To show that the gain in the protein-level mutual information exists for all input distributions, we present the difference between the protein-level and transcript-level mutual information landscapes in Figure S2.
The protein-level mutual information is higher than the transcript-level mutual information for all the input distributions in the landscape, so the gain in protein-level mutual information is not confined to the input probability distribution that causes maximum information transfer, but is most likely true for all input distributions.

Generic stochastic model of transcription and translation
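To generate the stochastic transcript and protein expression data for a generic central dogma system, we considered that the information transfer from the environment to the protein expression level occurs through the following three processes: switching of the operator between its inactive and active states,

$$O_{OFF} \underset{k_{OFF}}{\overset{k_{ON}}{\rightleftharpoons}} O_{ON},$$ (9)

transcription from the active operator and transcript decay,

$$O_{ON} \xrightarrow{k_m} O_{ON} + m, \qquad m \xrightarrow{k_{d,m}} \varnothing,$$ (10)

and translation and protein decay,

$$m \xrightarrow{k_g} m + g, \qquad g \xrightarrow{k_{d,g}} \varnothing.$$ (11)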
The set of processes, (9)-(11), is a sufficient but minimal model of the central dogma system. The interaction of the environmental input with the operator is condensed into the rate constants, $k_{ON}$ and $k_{OFF}$. In this work, we consider the following form for the $k_{ON}$ and $k_{OFF}$ rate constants

$$k_{ON} = a\left((1-l)X + l\right), \qquad k_{OFF} = a(1-l)(1-X),$$ (Equation 12)

where $0 \le X \le 1$ is the environmental input value. $l$ is the leakiness, or $l = k_{ON}/(k_{ON}+k_{OFF})$ when $X = 0$, which determines the leaky transcription in the absence of the environmental input. $a$ controls the rate or the frequency of switching between active and inactive operator states, as $k_{ON} + k_{OFF} = a$ independent of the input value $X$. The transcript and the protein-level channel capacities depend only on the expression distributions, $P(m|X)$ and $P(g|X)$, respectively. The shape of the mean dose-response curves, $\langle m\rangle$-vs-$X$ or $\langle g\rangle$-vs-$X$, does not change the channel capacity, as long as the distributions $P(m|X)$ and $P(g|X)$ remain unchanged.
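The two properties of Equation 12 stated above, a switching frequency that is independent of the input and an active fraction equal to the leakiness at zero input, can be checked numerically. The function and argument names below (`operator_rates`, `a`, `leak`) are illustrative choices, not names from this work.

```python
def operator_rates(X, a, leak):
    """Equation 12: (k_ON, k_OFF) for an input value 0 <= X <= 1."""
    k_on = a * ((1.0 - leak) * X + leak)
    k_off = a * (1.0 - leak) * (1.0 - X)
    return k_on, k_off

# Parameter values quoted later in this section: a = 1.0 per min, leakiness 0.01
for X in (0.0, 0.25, 0.5, 1.0):
    k_on, k_off = operator_rates(X, a=1.0, leak=0.01)
    # k_on + k_off stays equal to a; the active fraction rises from the leakiness to 1
    print(X, k_on + k_off, k_on / (k_on + k_off))
```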
To explain the transition in the transcript-level channel capacity observed in Figure 2A of the main text, we present a parametric simulation study on the growth in the transcript-level channel capacity, $c(X;m)$, as a function of the transcription power, $k_m/k_{d,m}$ (Figure S3). We computed $c(X;m)$-vs-$k_m/k_{d,m}$ for a fixed transcript decay rate constant, $k_{d,m} = 0.5$ min$^{-1}$, and three values of the frequency parameter, $a = \{0.1, 1, 10\}$ min$^{-1}$. We determined the transcript distributions $P(m|X)$ as a function of $k_m$, $k_{d,m}$, $k_{ON}$, and $k_{OFF}$ using the analytical result for the transcript expression distribution, 46 which we subsequently used to compute $c(X;m)$. We observed that when $a$ is comparable to or less than $k_{d,m}$, there is a sharp decrease in the rate at which $c(X;m)$ increases with $k_m/k_{d,m}$ (Figure S3). This transition occurs because at low $k_m/k_{d,m}$ the transcript expression distribution $P(m|X)$ is close to Poissonian (Fano factor $\approx 1$), but becomes increasingly over-dispersed (Fano factor $> 1$) for higher $k_m/k_{d,m}$. 6 When $a \gg k_{d,m}$, the transcript expression distribution remains close to Poissonian over a wider range of transcription powers. Therefore, we do not observe a change in the growth rate of $c(X;m)$ for $a = 10$ min$^{-1}$ in Figure S3. The transition will still occur for $a = 10$ min$^{-1}$, but at a higher value of $k_m/k_{d,m}$.

Master equations
The master equation for the active state of the operator is

$$\frac{dP(O=1)}{dt} = k_{ON}\,P(O=0) - k_{OFF}\,P(O=1).$$ (Equation 13)

Since the operator state can be either active or inactive at a time, the master equation for the inactive state of the operator is similar to Equation 13, with the right-hand side multiplied by $-1$, i.e.,

$$\frac{dP(O=0)}{dt} = -k_{ON}\,P(O=0) + k_{OFF}\,P(O=1).$$ (Equation 14)

The master equation for the transcript expression level (or transcript copy number), $m$, is

$$\frac{dP(m|O)}{dt} = k_m O\,P(m-1|O) + k_{d,m}(m+1)\,P(m+1|O) - k_m O\,P(m|O) - k_{d,m}\,m\,P(m|O).$$ (Equation 15)
The master equation for the protein expression level (or protein copy number), $g$, for a fixed value of transcript expression, $m$, is

$$\frac{dP(g|m)}{dt} = k_g m\,P(g-1|m) + k_{d,g}(g+1)\,P(g+1|m) - k_g m\,P(g|m) - k_{d,g}\,g\,P(g|m).$$ (Equation 16)

Both Equations 15 and 16 are one-step master equations.

Governing equations for mean transcript and protein expression levels
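The governing equation for the ensemble-averaged mean transcript expression level, $\langle m|O\rangle$, for a fixed operator state $O$ is obtained by multiplying the transcription master Equation 15 with the transcript expression and summing over all possible values as

$$\frac{d\langle m|O\rangle}{dt} = k_m O\sum_{m}(m+1)P(m|O) + k_{d,m}\sum_{m} m(m-1)P(m|O) - k_m O\langle m|O\rangle - k_{d,m}\langle m^2|O\rangle = k_m O - k_{d,m}\langle m|O\rangle.$$

The same operation on the translation master Equation 16 gives the governing equation for the mean protein expression level, $\langle g|m\rangle$, for a fixed transcript expression $m$,

$$\frac{d\langle g|m\rangle}{dt} = k_g m\sum_{g}(g+1)P(g|m) + k_{d,g}\sum_{g} g(g-1)P(g|m) - k_g m\langle g|m\rangle - k_{d,g}\langle g^2|m\rangle = k_g m - k_{d,g}\langle g|m\rangle.$$ (Equation 21)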
The solution to Equation 21 is

$$\langle g|m\rangle(t) = \frac{k_g m}{k_{d,g}} + \left(\langle g|m\rangle(0) - \frac{k_g m}{k_{d,g}}\right)e^{-k_{d,g}t},$$

which has a relaxation time constant $1/k_{d,g}$. The convolution kernel $f$ accounts for the time integration of the stochastic transcript expression that occurs due to the response time of the translation process, $1/k_{d,g}$. We omit the translation rate constant $k_g$ from the operator in Equation 25, because it only scales the output of the convolution without introducing more stochasticity to the output, which would be necessary to have any impact on the information transfer. An example of the convolution output, which we have named the ideal integration output, as a function of the ratio $k_{d,m}/k_{d,g}$ is shown in Figure S4. Using Equation 26 we transform $m(t)$ to $g_{ideal}(t)$. From a transcript trajectory we obtain the transcript expression distribution, $P(m|X)$, and from the ideal integration output trajectory we obtain the distribution, $P(g_{ideal}|X)$. The channel capacity of the integrated output, $c(X;g_{ideal})$, is a function of $k_{d,g}$, or more specifically of the ratio $k_{d,m}/k_{d,g}$. $c(X;g_{ideal})$ as a function of the integration time $T = k_{d,m}/k_{d,g}$ is the information gain due to deterministic time integration during the translation process.
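The transformation of a transcript trajectory into an ideal integration output can be sketched numerically. The sketch assumes a trajectory sampled on a uniform time grid and uses the exponential kernel $e^{-k_{d,g}t}$ on $t \in [0, 4/k_{d,g}]$, the form stated later in this section for the fluctuating-input study. The function name, the discretization, and the normalization of the kernel are assumptions; the normalization only rescales the output, which does not affect the information transfer.

```python
import math
import random

def ideal_integration(m, dt, k_dg, span=4.0):
    """Convolve samples m[i] = m(i * dt) with exp(-k_dg * t) for t in [0, span / k_dg]."""
    n_k = int(round(span / (k_dg * dt))) + 1
    kernel = [math.exp(-k_dg * j * dt) for j in range(n_k)]
    norm = sum(kernel)  # assumed normalisation: a constant input maps to itself
    out = []
    # Only positions where the full kernel fits inside the trajectory are kept
    for i in range(n_k - 1, len(m)):
        out.append(sum(kernel[j] * m[i - j] for j in range(n_k)) / norm)
    return out

def variance(v):
    mu = sum(v) / len(v)
    return sum((x - mu) ** 2 for x in v) / len(v)

# Illustrative noisy trajectory (independent integers, not a simulated transcript count)
rng = random.Random(0)
m_traj = [rng.randint(0, 10) for _ in range(2000)]
g_ideal = ideal_integration(m_traj, dt=1.0, k_dg=0.1)
print(variance(m_traj), variance(g_ideal))  # integration narrows the distribution
```

The narrowing of the output distribution relative to the input trajectory is the mechanism by which time integration can separate the conditional distributions for different inputs.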

Translation output for a time-dependent transcript expression
We constructed an analytical approximation of the ideal integration output of the transcript expression. We approximated that after every interval of the transcription response time, $1/k_{d,m}$, the transcript expression is represented using independent and identically distributed random variables, all with the distribution $P(m|X)$. Since the total number of intervals of $1/k_{d,m}$ during the translation response time is $T = k_{d,m}/k_{d,g}$, we define the ideal integration output as [...] where $k_{ON}$ and $k_{OFF}$ were calculated for each $X$ from Equation 12. When $b > 1$, the transcript expression distribution is $NB(r,p)$, with $p = (b-1)/b$ and $r = \langle m\rangle/(b-1)$. When $b = 1$, the transcript expression distribution is $Pois(l)$ with $l = \langle m\rangle$.
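The negative binomial parametrization above can be made concrete with a short sketch. With $p = (b-1)/b$ and $r = \langle m\rangle/(b-1)$ the distribution has mean $\langle m\rangle$ and variance-to-mean ratio $b$. The sketch further assumes that the integrated output is treated as the sum of $T$ independent copies, in which case it is again negative binomial, $NB(Tr, p)$. The use of the sum rather than a rescaled version is an assumption, and the function names are illustrative.

```python
import math

def nb_params(mean_m, b):
    """NB(r, p) with p = (b - 1) / b and r = <m> / (b - 1); requires b > 1."""
    return mean_m / (b - 1.0), (b - 1.0) / b

def nb_pmf(k, r, p):
    """P(k) = Gamma(k + r) / (k! Gamma(r)) * (1 - p)^r * p^k, with mean r p / (1 - p)."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1.0) - math.lgamma(r)
               + r * math.log(1.0 - p) + k * math.log(p))
    return math.exp(log_pmf)

def summed_output_pmf(mean_m, b, T, k_max):
    """Distribution of the sum of T i.i.d. NB(r, p) variables, which is NB(T r, p)."""
    r, p = nb_params(mean_m, b)
    return [nb_pmf(k, T * r, p) for k in range(k_max + 1)]

# Illustrative values: <m> = 2, b = 3, T = 10
pmf = summed_output_pmf(2.0, 3.0, 10, 400)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(mean, var / mean)  # mean scales with T; the variance-to-mean ratio stays at b
```

Because the mean grows as $T$ while the standard deviation grows only as $\sqrt{T}$, the conditional distributions for different inputs overlap less at larger integration times.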
Effect of number of input levels on the estimate of $c_{ideal}(T)$
The estimate of the ideal channel capacity is bounded from above by $\log_2 |X|$, where $|X|$ is the number of input values $X$ for which we have the distributions $P(g_{ideal}|X)$, either from numerical convolution or analytical approximation. Since we used 11 values of $X$ for Figure S5, all the information gain curves peak at $\log_2 11 \approx 3.5$ bits. However, the correct value of $c_{ideal}(T)$ is not bounded by the number of input values.
To remove the underestimation of $c_{ideal}(T)$ due to the number of input values, we systematically increased the number of input levels as $|X| \in \{4, 8, 16, 32, 64, 128\}$ for $X \in [0, 1]$, which increases the entropy of the input as $H(X) \in \{2, 3, 4, 5, 6, 7\}$ bits, respectively. For each set of input values, we computed $c_{ideal}(T)$, which is shown in Figure S6. In the range of integration time $T \in [1, 2500]$, 64 input levels are adequate to accurately estimate $c_{ideal}(T)$, because increasing the number of input values to 128 produces no noticeable difference (less than 0.04 bits). It is necessary to check the convergence in the estimated $c_{ideal}(T)$ for an increasing number of input levels $|X|$. Once $|X|$ is sufficiently high we avoid underestimation of $c_{ideal}(T)$, and also avoid underestimation of the protein-level channel capacity $c(X;g)$, because $c(X;g) < c_{ideal}(T)$.

Stochastic simulations of central dogma master equations
Parameters for the generic information gain curves
To produce the information gain curves in Figures 3A and 3B, the following parameters were chosen to model the central dogma system: leakiness $l = 0.01$, frequency parameter $a = 1.0$ min$^{-1}$, and input values $X \in [0, 1]$, which determine $k_{ON}$ and $k_{OFF}$ using Equation 12. For the ideal information gain curves, $c_{ideal}(T)$, in Figure 3A, the transcript decay rate was $k_{d,m} = 0.1$ min$^{-1}$, and the transcription rate constant $k_m = \{0.01, 0.1, 1.0, 10.0\}$ min$^{-1}$ was determined using the transcription power $k_m/k_{d,m}$ values shown in Figure 3A. The set of integration times was $T = \{1, 2, 5, 10, 20, 50, 100, 200, 500, 1000\}$. The distribution of the ideal integration output was obtained using Equation 27, which was then used to compute the ideal channel capacity, $c_{ideal}(T)$.
To produce the protein-level information gain curves in Figure 3B the following central dogma rate constants were used: $k_m = k_{d,m} = 0.1$ min$^{-1}$, close to the median transcript decay rate constants for E. coli and S. cerevisiae. The protein decay rate was $k_{d,g} = \{0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001, 0.0005, 0.0002, 0.0001\}$ min$^{-1}$, and the translation rate constant was determined using the value of the translation power, $k_g/k_{d,g} = \{1, 10, 10^2, 10^3, 10^4\}$. For integration times $T \le 20$, we chose a 4-bit input, i.e., 16 uniformly spaced values of $X$ in $[0, 1]$. For higher integration times, we chose 64 uniformly spaced values of $X$ in $[0, 1]$. We performed Gillespie simulations of the operator state change and the central dogma reactions (transcription, transcript decay, translation, and protein decay) together, (9), (10), and (11), to obtain the protein expression distribution for each input value, $P(g|X)$, which was then used to compute the channel capacity, $c(X;g)$. Each stochastic simulation was run for a duration of $10^6$ min, and $10^5$ samples of the protein expression level $g$ were obtained at a time interval of 10 min. The conditional distributions $P(g|X)$ were obtained from empirical distributions by binning the protein expression data. The number of bins, $n_b$, was 8 for $T = 1$ and 32 for $T = 1000$, and a linearly increasing function of $\log T$ for the intermediate integration times.
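A minimal sketch of such a simulation is given below. It is not the code used in this work. The reaction scheme is inferred from the master equations (Equations 15 and 16) with an operator state $O \in \{0, 1\}$, the function names are illustrative, and the run is far shorter than the $10^6$ min quoted above so that it finishes quickly. The sketch assumes a nonzero leakiness and transcription rate, so that the total reaction rate never vanishes. The helper `n_bins` is one assumed reading of the bin-count rule, interpolating linearly in $\log_{10} T$ between 8 and 32.

```python
import math
import random

def gillespie_central_dogma(X, k_m, k_dm, k_g, k_dg, a=1.0, leak=0.01,
                            t_end=1e4, dt_sample=10.0, seed=0):
    """Sample (m, g) every dt_sample minutes from a Gillespie trajectory."""
    rng = random.Random(seed)
    k_on = a * ((1.0 - leak) * X + leak)       # Equation 12
    k_off = a * (1.0 - leak) * (1.0 - X)
    O, m, g, t = 0, 0, 0, 0.0                  # assumed initial state: inactive, empty
    next_sample, samples = dt_sample, []
    while next_sample <= t_end:
        rates = [k_on * (1 - O), k_off * O,    # operator activation / inactivation
                 k_m * O, k_dm * m,            # transcription / transcript decay
                 k_g * m, k_dg * g]            # translation / protein decay
        total = sum(rates)
        t += rng.expovariate(total)            # waiting time to the next reaction
        # Record the state that held at every sampling time passed during the wait
        while next_sample <= t and next_sample <= t_end:
            samples.append((m, g))
            next_sample += dt_sample
        # Choose which reaction fires, with probability proportional to its rate
        u, i, acc = rng.random() * total, 0, rates[0]
        while u >= acc and i < 5:
            i += 1
            acc += rates[i]
        if i == 0:
            O = 1
        elif i == 1:
            O = 0
        elif i == 2:
            m += 1
        elif i == 3:
            m -= 1
        elif i == 4:
            g += 1
        else:
            g -= 1
    return samples

def n_bins(T):
    """Assumed rule: 8 bins at T = 1, 32 at T = 1000, linear in log10(T) in between."""
    return int(round(8 + 24 * math.log10(T) / 3.0))

traj = gillespie_central_dogma(X=1.0, k_m=0.1, k_dm=0.1, k_g=0.1, k_dg=0.01)
mean_g = sum(g for _, g in traj) / len(traj)
print(len(traj), mean_g, n_bins(10))
```

With these rate constants the integration time is $T = k_{d,m}/k_{d,g} = 10$, and at $X = 1$ the mean protein level should fluctuate around $k_g k_m/(k_{d,g} k_{d,m}) = 10$.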

Parameters for the information gain curves for the four species
The transcription and the translation rate constants were obtained from sources reported in the key resources table. We determined the distribution of the dimensionless integration time, $T = k_{d,m}/k_{d,g}$, from the paired transcript and protein decay rate constants for each species (Figure S7). The distributions of integration times are shown as violin plots in Figure 3C. For E. coli, the effective protein decay rate was determined using a doubling time of 2 h. 66 The 5%-95% confidence interval was computed using numpy's percentile function in Python, which ranks the samples and determines the percentile value using linear interpolation. The percentile values for the integration time, $T$, reported in the main text have been rounded off to the nearest integer. The violin plots in the main text were generated using the matplotlib library's violinplot function with the 'scott' method for the density estimate bandwidth. To determine the ideal and the protein-level information gain for the four species we performed stochastic simulations of the central dogma reactions (9), (10), and (11), using the central dogma rate constants in Table S1. The number of input levels, or values of $X$ in $[0, 1]$ for the Gillespie simulations, was chosen based on the integration time. Hence, for higher integration times a larger set of protein distributions, $P(g|X)$, was obtained from stochastic simulations to compute $c(X;g)$. The number of input values $X$ was selected by uniformly dividing the domain $[0, 1]$ into $2^{H(X)}$ intervals as reported in Table S2.
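The percentile calculation can be illustrated with a short standard-library sketch that mirrors the rank-and-interpolate rule described above, which is the default behavior of numpy's percentile function. The decay rate constants in the usage lines are made-up values for illustration only, not data from this work, and the function names are assumptions.

```python
import math

def percentile_linear(values, q):
    """Percentile q (0-100) by ranking the samples and interpolating linearly."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0       # fractional rank of the requested percentile
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def integration_times(k_dm, k_dg):
    """Dimensionless integration time T = k_dm / k_dg for paired decay rate constants."""
    return [a / b for a, b in zip(k_dm, k_dg)]

# Hypothetical paired decay rate constants (per min), for illustration only
k_dm = [0.20, 0.10, 0.15, 0.30, 0.12]
k_dg = [0.010, 0.005, 0.030, 0.010, 0.004]
T = integration_times(k_dm, k_dg)
interval = (round(percentile_linear(T, 5)), round(percentile_linear(T, 95)))
print(sorted(T), interval)
```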
Gillespie simulations for each $X$ were mainly performed for $10^6$ min, with the protein expression value sampled at an interval of 10 min to obtain $10^5$ samples of the protein expression level $g$. The exceptions were M. musculus and H. sapiens, for which the sampling interval was 100 min when $50 < T \le 200$ and 200 min when $T \ge 500$.
The protein expression distributions $P(g|X)$ were determined as the empirical distributions from the protein expression trajectory $g(t)$. The number of bins, $n_b$, used to construct the empirical distributions was chosen as $n_b$ = nearest integer larger than $2^{c_{ideal}(T)+h}$, where $h > 0$, to use a larger number of bins than $2^{c_{ideal}(T)}$. The value of $h$ for each species and integration time is in Table S3. For a more elaborate discussion on the selection of the number of bins for computing channel capacity, see previous work. 4,12,27
The protein expression trajectory from Gillespie simulations is an ergodic stationary process. 29 As described above, the total duration of the sampled protein expression trajectory was between $10^6$ min and $2 \times 10^7$ min depending on the species and the integration time. To test that we have captured the trajectory for a sufficient duration we performed a convergence analysis, by taking fractions of the protein expression trajectory data and computing the channel capacity. Specifically, we took the following fractions of the full trajectory data: 1/2, 1/5, 1/10, 1/20, 1/50, and 1/100. Figure S8 shows the estimated protein-level channel capacity from the smaller trajectories along with the channel capacity from the full trajectory data. Smaller trajectories can overestimate the channel capacity, because the data from smaller trajectories overestimate the relative entropy between the conditional distributions $P(g|X)$ for different values of $X$. For all four species the estimated channel capacities from the "Full" and the 1/2 trajectories are indistinguishable, establishing convergence in the estimate. If the duration of a trajectory is too small to accurately estimate the conditional distributions, $P(g|X)$, then doubling the trajectory length will produce a substantive difference in the estimated channel capacity. We observe this artifact of small trajectory data in Figure S8 when we compare the estimated channel capacity between the 1/50 and the 1/100 trajectories for M. musculus and H. sapiens.

Effect of fluctuation time period on information transfer
To determine how the time period of fluctuation of the input affects the information transfer from the input to the protein expression level, we determined the mutual information between the input and the ideal integration output, $I(X;g_{ideal})$, under fluctuating protocols of the input, $X$ (Figure S9). We chose the same $a$, $l$, $k_m$, and $k_{d,m}$ used in the simulation study for Figure S5, and selected two values of the protein decay rate constant, $k_{d,g} = k_{d,m}/10$ and $k_{d,m}/100$, which give integration times $T = 10$ and 100, respectively. For each of those two integration times we computed the ideal channel capacity $c_{ideal}(T)$ and the associated optimal input distribution, $P_{opt}(X)$, which is the input distribution that achieves the channel capacity (Figure S10A). Then we considered a range of fluctuation time periods for the input, $t_X = \frac{1}{k_{d,g}}\{0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200\}$. For each $t_X$ we performed a Gillespie simulation to capture the transcript trajectory $m(t)$ when the input $X$ fluctuates with time period $t_X$, assuming values according to the distribution $P_{opt}(X)$. The total duration of each simulation was $10^4\,t_X$. The stochastic transcript trajectory, $m(t)$, was then convolved with the integration kernel $e^{-k_{d,g}t}$ with $t \in [0, 4/k_{d,g}]$ to obtain $g_{ideal}(t)$, which was subsequently used to compute the mutual information $I(X;g_{ideal})$. An example of the stochastic trajectories under a fluctuating protocol of the input, $X$, is shown in Figure S9.
For the two integration times $T = 10$ and 100, we obtained a set of ideal mutual information values, $I(X;g_{ideal})$, as a function of the fluctuation time period of the input, shown in Figure S10B. We notice in Figure S10B that when the time period of fluctuation $t_X$ is larger than the translation response time $1/k_{d,g}$ by roughly a factor of 10, the ideal mutual information value $I(X;g_{ideal})$ approaches the ideal channel capacity for that integration time. When the fluctuation time period is smaller than $5/k_{d,g}$, the ideal mutual information is less than half of the ideal channel capacity value. So, a relatively slow fluctuation in the environmental input is necessary to achieve the information gain possible due to integration of the transcript expression.
iScience 26, 106767, June 16, 2023

EXPERIMENTAL MODEL AND SUBJECT DETAILS
The single-cell transcript and protein expression data for the inducible gene expression system were from a recently published manuscript by J.R. 22 All experiments were performed with E. coli strain NEB 10-beta (New England Biolabs, MA, C3019) and the plasmids used in that work, pAN1201, pAN1717 and pAN1818 are already available on Addgene.