Optimal temporal patterns for dynamical cellular signaling

Cells use temporal dynamical patterns to transmit information via signaling pathways. As optimality with respect to the environment plays a fundamental role in biological systems, organisms have evolved optimal ways to transmit information. Here, we use optimal control theory to obtain the optimal dynamical signal patterns that can transmit information efficiently (low energy) and reliably (high accuracy). Adopting an activation-inactivation decoding network, we reproduce several dynamical patterns found in actual signals, such as steep, gradual, and overshooting dynamics. Notably, when minimizing the energy of the input signal, the optimal signals exhibit overshooting, which is a biphasic pattern with transient and steady phases; this pattern is prevalent in actual dynamical patterns. We also identify conditions when these three patterns (steep, gradual, and overshooting) confer advantages.

Introduction-Cells transmit information through signal transduction and transcription networks [1,2]. Recent studies have revealed that, along with the identity and static concentration of molecules, cells also encode information into dynamical patterns [3][4][5][6][7][8][9]. Examples of dynamical patterns include extracellular signal-regulated kinase (ERK), the yeast transcription factor Msn2, the transcription factor NF-κB, a protein kinase (AKT), and calcium signaling. Many studies have used nonlinear and stochastic approaches to investigate properties of dynamical cellular information processing [10][11][12][13][14][15][16]. Because signal transduction plays central and crucial roles in the survival of cells, the time course of dynamical patterns is expected to be highly optimized so that they can efficiently and accurately transmit the information. Although the advantages of dynamical signals over static ones have been extensively studied [8,17], there has been little investigation into determining which dynamical signals are the best. We assume that two principles prevalent in many biological systems govern the optimality of signal patterns: energetic efficiency (low energy) and reliability (high accuracy). A major cause of interference with reliability is molecular noise, which degrades the quality of transmitted information. Biological systems are often characterized by low energy consumption. For instance, neuronal systems are known to function with remarkably low energy consumption ( [18,19] and references therein). As these two principles are of significance in biological systems, the dynamical transmission of information has evolved in such a way that it optimally satisfies these principles. In the present Letter, we will use optimal control theory [20,21] to investigate optimal signal dynamics that are energetically efficient and reliable by viewing the signal as a controlling function and the information decoding network as the system to be controlled ( Fig. 1(a)). For decoding the dynamical signals, we adopt two inactivation-activation networks: two-stage ( Fig. 1(b)) and three-stage models ( Fig. 1(c)). Receivers (decoding networks) may also co-evolve to maximally extract information from signals, but we here focus on optimization of senders, i.e., the dynamical signals. According to information theory, receivers cannot extract more information than that transmitted by senders. Therefore, the information transmission rate is bounded by the performance of the senders. We identify three basic patterns: steep ( Fig. 1(d)), gradual ( Fig. 1(e)), and overshooting ( Fig. 1(f)). We show that the steep pattern minimizes the energy, whereas the gradual pattern minimizes the variance. Intriguingly, when minimizing the energy of a dynamical pattern while achieving a higher output concentration or under limited molecule concentration, the optimal pattern exhibits overshooting, which can be often found in actual dynamical signals. We identify the conditions in which these three patterns (steep, gradual, and overshooting) confer advantages.
Methods-Dynamical signal transduction is typically separated into two stages [7]: encoding of extracellular stimuli into intracellular dynamical patterns and decoding of the dynamical patterns into the response. Our study focused on the latter process, namely, decoding of the dynamical patterns. In cells, dynamical signals are decoded by molecular networks. We consider a molecular network consisting of N molecular species X 1 , X 2 , ..., X N , including both input and output, and we define x i as the concentration of X i . The input signal is carried by a molecule U , whose concentration u(t) follows a dynamic pattern, and its onset is t = 0. Let the i out th molecular species X iout be the output of the network. The network reads the information from the input u(t) and outputs the result as the concentration of X iout at time T (T > 0) ( Fig. 1(a)), i.e., x iout (T ) carries information about the input signal. Consider the evolutionary design of a molecular network that attains the desired concentration of an output molecule at t = T after re- ceiving the input signal u(t). Although there might be many possible dynamics for u(t) (0 ≤ t ≤ T ) that result in the desired output concentration, the most biologically preferable ones are selected. We can expect that the signals with lower energetic cost will be selected. In addition, biochemical reactions are subject to noise due to the smallness of the cells. The noise degrades the information, and hence more-accurate transmission is desirable. Considering the energy of the input and the accuracy of the concentration of the output molecule, we wish to find a signal u(t) that minimizes a performance index R, defined as is the variance of the concentration of the ith molecule at time t [µ i (t) = x i (t) is the mean], Π u is the energetic cost of the signal, and w is a weight parameter in the range 0 < w < ∞ which represents the importance of the energy for the performance index. We define the energetic cost as the free energy dissipation during controlling the concentration u(t) to follow the desired temporal dynamics. In order to calculate the energetic cost of U , we consider a simple biochemical model. U is synthesized from U , whose concentration is u, and the total concentration u tot = u + u does not change with time. U and U undergo the following reaction with rate constants α u and β u : Here, we assume α u β u so that the equilibrium concentration u = u eq is very low. Following Ref. [22,23], we find that the free energy dissipation is where k B and T are the Boltzmann constant and temperature, respectively (for a detailed derivation, see the supplementary material). Because it is difficult to use the exact representation of Π u in the optimal control calculation, we approximate Π u with a more tractable expression. Π u crosses near the origin as it vanishes at u = u eq which is very low, and Π u increases superlinearly as u increases from an equilibrium point. Taking into account the conditions and computational feasibility, we use the approximation Π u = q´T 0 u(t) 2 dt, where q is a proportionality coefficient, and we set q = 1 because the scaling of q is offset by w. With this approximation, we set u(t) = 0 for t < 0 and u(t) ≥ 0 for t > 0 (we define that the onset is the time when u(t) becomes positive). As denoted, the output molecule X iout has the target concentration at time t = T . Therefore, the mean concentration of the output X iout , which we denote as µ iout , must attain the predefined target concentration µ T iout at time t = T , i.e., is a boundary condition for the optimal control analysis.
Molecular networks comprise interplay between mR-NAs and proteins, and their dynamics can be captured by the following rate equation: is the reaction velocity of the th reaction, and N r is the number of reactions. Due to the smallness of the cells, chemical reactions are subject to stochasticity. We describe the noisy dynamics by the following Fokker-Planck equation (FPE) [24,25]: where P = P (x; t) is the probability density of x at time t, and Q is the noise intensity related to the volume V via Q = (2V ) −1 . Optimal control theory and related variational methods have been employed by many researchers [26][27][28][29][30]. Although stochastic optimal control theory has been applied in various biological contexts [30], it is difficult to apply the method to multivariate nonlinear models. Instead, we describe the dynamics with the time evolution of moments derived from Eq. (3) [31,32]. For general nonlinear models, naive calculation of moment equations results in an infinite hierarchy of differential equations. However, as our adopted models are linear with respect to x, we can obtain closed differential equations (see the supplementary material). For moments of up to the second order [mean µ i (t), variance γ i (t), and covariance ρ ij = (x i − µ i ) (x j − µ j ) ], we have the following moment equation with respect to z = (µ 1 , ..., µ n , γ 1 , ..., γ n , ρ 12 , ρ 23 , ..., ρ 1n ): where the dimensionality of z is M = N (N + 3)/2. The moment equation yields reliable solutions when the noise intensity is sufficiently small. With the moment equation (4), we can reduce the stochastic control problem to a deterministic one. We wish to obtain the optimal control u(t) that minimizes R of Eq. (1) while satisfying Eq. (4) and the predefined target mean concentration of Eq.
Then, by virtue of optimal control theory [20,21], we minimize an augmented where ν and λ = (λ 1 , λ 2 , ..., λ M ) are Lagrange multipliers that force the constraints. Using the calculus of variations [20,21], finding an optimal signal u(t) is reduced to solving the differential equations given by Eq. (4) anḋ where H is the Hamiltonian [20,21]: . We assume vanishing initial values for all moments: For the boundary conditions, λ N +iout (T ) = 1 and λ i (T ) = 0 (i = i out , N + i out ) are required, and z iout (T ) = µ T iout for the final value of z i . There are boundary conditions at both t = 0 and t = T ; this two-point boundary value problem can be solved numerically by using a general solver [33].
Results-We consider an activation-inactivation decoding motif ( Fig. 1(b)): an inactive molecule X 1 (e.g., the transcription factor) is activated (e.g., by phosphorylation) to become X 1 , where the activation is dependent on the input molecule U . The activated molecule X 1 synthesizes an output molecule X 2 , and hence X 2 reports the result (i.e., i out = 2). The input signal u(t) has to yield dynamics that satisfy the constraint that the target mean concentration of X 2 at time t = T is µ T 2 [i out = 2 in Eq. (2)]. This type of decoding motif is prevalent and can be found in several biochemical systems [13,34,35]. The corresponding rate equations are given bẏ where x tot is the total concentration x 1 + x 1 = x tot , which does not change with time (x 1 is the concentration of X 1 ); α 1 and α 2 are the rates of synthesis; and β 1 and β 2 are the rates of degradation. We call this a two-stage model. By incorporating intrinsic noise due to a low number of molecules, we have a corresponding FPE from Eq. (3) (see the supplementary material). We then calculate the moment equation from the FPE. Using the two-stage model, we calculated the optimal signal u(t). Figure 2 shows the optimal signal u(t) as a function of t and log 10 (w) for two target concentrations: The other parameters are x tot = 1.0, α 1 = 1.0, α 2 = 2.0, β 1 = 1.0, β 2 = 0.25, Q = 0.001, and T = 1.0; typically, the time-scale of X 1 is shorter than that of X 2 , i.e., β 1 > β 2 . The range of w is assumed to be 1.
The minimum of w is determined so that u(t) satisfies u(t) ≥ 0, and the maximum is 10 2 times the minimum [36]. When w is relatively large, Fig. 2 shows that the optimal signals steeply increase at t = 0 and gradually decay as time elapses. For µ T 2 = 1.2, the decay right after t = 0 is especially rapid, and this is followed by a plateau state (t = 0.3-0.7); this is a typical overshooting pattern, similar to that shown in Fig. 1(f). Conversely, for smaller values of w, the optimal signal does not exhibit overshooting but gradually increases for both values of µ T 2 . As w decreases, the optimal pattern varies from steeper to more-gradual patterns. From these results, we see that the steep pattern minimizes the energy, whereas the gradual pattern minimizes the variance, and the overshooting pattern emerges when the target concentration µ T 2 is higher. We also evaluated the dependence on the weight w of the energy and the variance for µ T 2 = 0.2; this is shown in Fig. 3(a), using the same parameters as in Fig. 2. As w increases, the variance increases and the energy decreases; there is a trade-off between the energy and the variance as both cannot be minimized simultaneously. For cellular inference, the relation between accuracy and energy consumption was investigated in several studies [37][38][39], which showed a trade-off between them. Similarly, in biochemical clocks, temporal accuracy and energy consumption have been shown to exhibit a tradeoff relation [40]. Neural systems also have a trade-off between the energy cost and the information coding ( [19] and references therein). We show that a similar relation also holds for dynamical signals. In Fig. 3(b), we compare the optimal signal (solid line) of µ T 2 = 1.2, which exhibits the overshooting pattern, with a constant signal (dashed line) starting from t = 0 and ending at t = T , which can attain the target concentration (w = 1.0 for the optimal signal, which is large enough to show overshooting; the other parameters follow Fig. 2). Although the concentration of the optimal signal is larger than the constant one in the interval t = 0-0.2, the optimal signal yields a smaller concentration for t > 0.2. The energy of the optimal signal is Π u = 15.10, whereas that of the constant one is 19.65. We investigated how the overshooting pattern depends on the total concentration x tot (µ T 2 = 1.2 and w = 1.0; other parameters follow Fig. 2). As we are interested in the shape but not in the magnitude, we define a normalized signal: u(t) = u(t)/ √ Π u which guarantees the unit energy Π u = 1. In Fig. 3(c), we plotted u(t) for x tot = 0.8 (solid line), 1.0 (dashed line), and 2.0 (dotted line). We can see that when the total concentration x tot is smaller, the optimal signal has a sharper overshoot.
We also considered the case in which the output molecule X 3 is synthesized by X 2 (the three-stage model ; Fig. 1(c)) and found, qualitatively, the same behavior as in the two-stage model (see the supplementary material).
Discussion-We identified that the steep pattern minimizes the energy. Let us explain this mechanism with a simplified two-stage model,ẋ 1 = u(t),ẋ 2 = x 1 (t), along with a delta-function stimulus, u(t) = δ(t − t s ) (t s is the time of the stimulus). The final output concentration is x 2 (T ) =´T 0 dt´t 0 u(t )dt = T − t s , which shows that the signal at an earlier time has a higher effect than that at a later time; the steep pattern can allow the output concentration to reach the target concentration at a lower cost. On the other hand, the effect of the gradual pattern can be accounted for by the moment equation. We find that the main contribution responsible for the different variance γ 2 (T ) of the steep and gradual patterns is the area´T 0 ρ 12 (t)dt, namely, a smaller area yields a smaller γ 2 (T ). Because decay velocity of ρ 12 (t) depends on u, a higher concentration of u at a later time result in smaller ρ 12 (t), which corresponds to the gradual pattern of u(t) (cf. moment equations in the supplementary material).
Our result can provide insights into experimentally observed dynamical patterns. Reference [8] reported the dynamical pattern of the ERK activity in response to different strengths of input (i.e., the ligand concentration). ERK activity is steep when stimulated by a strong signal, and it is gradual when stimulated by a weak one. This experimental observation can be accounted for by our model. Figure 3(d) shows u(t) for three values of µ T 2 (µ T 2 = 0.2, 1.2, and 1.5), while the other parameters remain unchanged (w = 8.3 × 10 −4 , and the other parameters follow Fig. 2). We see that the optimal signal is steep for larger values of µ T 2 and gradual for smaller values. It is expected that the strong and weak ligand stimuli result in strong and weak responses, respectively, i.e., higher and lower output concentrations. Therefore, ERK activity induced by the strong ligand stimulus is related to µ T 2 = 1.5 and that induced by the weak one is related to µ T 2 = 0.2. When the target concentration is higher (i.e., µ T 2 = 1.5), the magnitude of the signal is larger, and hence the effect of the energy of the signal on the objective function R [Eq. (1)] is greater than that of the variance γ 2 (T ). In contrast, for the smaller value of µ T 2 (i.e., µ T 2 = 0.2), the variance γ 2 (T ) becomes the leading term because the energy of the signal is smaller. Therefore, the steep pattern is preferred for a higher target concentration, while the gradual one is advantageous for a lower target concentration. These theoretical results qualitatively agree with the observed dynamical patterns reported in Ref. [8].
The overshooting dynamics minimize the energy of the input signals when the total concentration is smaller or the target concentration of the output molecule is higher. Surprisingly, this behavior can be found in several dynamical patterns; for example, activities of the ERK, the IκB kinase (IKK), which regulates the transcription factor NF-κB, and the kinase AKT show this behavior [4,[41][42][43][44]. These examples indicate biological advantages of the pattern. We also note the biochemical origin of the overshoot. For example, a simple incoherent feed-forward loop [1,2,45,46] or activation-inactivation motifs [47] can generate such a pattern, and these motifs can indeed be found in signaling pathways. Furthermore, a strongly damped oscillation is indistinguishable from overshooting. Although NF-κB is known to exhibit damped oscil-lation upon stimulation, some studies [48,49] are skeptical about the functional role of the NF-κB oscillation; that is, the NF-κB oscillation may be a by-product of inducing the overshooting dynamics.

Yoshihiko Hasegawa
This supplementary material describes in detail the calculations introduced in the main text. Equation and figure numbers in this section are prefixed with S (e.g., Eq. (S1) or Fig. S1). Numbers without the prefix (e.g., Eq. (1) or Fig. 1) refer to items in the main text.

Energetic cost of signal pattern
We derive the energy of signal U with a simple biochemical model. Details of the thermodynamics of biochemical reactions can be found in Refs. [1,2]. We assume that input molecular species U is synthesized from U , which undergoes the following reaction with rate constants α u and β u : (S1) Let u and u be the concentrations of U and U , respectively (total concentration u tot = u + u does not change with time). At equilibrium, we have where u eq and u eq are equilibrium concentrations. We here assume α u β u so that the concentration u eq is very low. The chemical potential of U and U are U and φ 0 U are chemical potentials at the standard state u = u = c 0 , and k B and T are the Boltzmann constant and temperature, respectively. The chemical potential difference is At equilibrium, ∆φ = 0; therefore, where we used Eq. (S2). We simulate intracellular biochemical reactions by the following controlling protocol after Refs. [1,2]. The system is open, and a controller actively manipulates the concentrations u and u to obtain the desired temporal dynamics of u(t) [before the onset of signal (i.e. t < 0), we assume u(t) = u eq ]. If the concentration u(t) is below the desired concentration, then the controller converts a molecule from U to U (and vice versa). The dissipated power of the biochemical reactions, which is an analogue of [power] = [current]×[volage] of electric circuits, is given by P = −J ∆φ [1,2], where J is the net-reaction flux: Therefore, the power P is where we used Eq. (S3) in the last line. From Eq. (S4), we define the energetic cost of U by the free energy dissipated during 0 ≤ t ≤ T : Because it is numerically difficult to use Eq. (S5) in the optimal control calculations, we approximate the power P with a simple equation. At equilibrium (u = u eq and u = u eq ), P = 0 because the flux J vanishes. As we assume that u eq is very small (which can be approximately identified as 0), P as a function of u passes near the origin. Also, the approximation should be computationally feasible for the optimal control. Taking into account the above requirements, we may approximate where q > 0 is a proportionality coefficient. Figure (S1) shows a comparison between the exact expression of P, [Eq. (S4)], and its quadratic approximation [Eq. (S6)]; the exact and quadratic results are shown by solid and dashed lines, respectively. In Fig. (S1), we calculated the results for two settings: (a) u tot = 20, α u = 0.1, and β u = 2 (q = 0.56 for quadratic); and (b) u tot = 20, α u = 0.1, and β u = 20 (q = 8.7 for quadratic). The Boltzmann constant and temperature were set to k B = 1 and T = 1 without loss of generality. For both parameter settings, we see that the behavior of the quadratic approximation is similar to that of the exact one. The major difference between the exact and the quadratic representations is that Eq. (S5) diverges to ∞ for u → 0 and u → u tot . From Eq. (S6), we represents the energy of the signal as In the main calculation, we employ q = 1 because the scaling of q is offset by a weight parameter w in the performance index [cf. Eq. (1)].

Moment equation
In this section, we derive the equations that must be satisfied by the mean, variance, and covariance. We consider the Fokker-Planck equation (FPE): where x = (x 1 , x 2 , ..., x N ), P (x; t) is the probability density of x at time t, and F i (x; t) and G i (x; t) are the drift and diffusion terms, respectively (we do not consider cross terms, such as ∂ 2 /∂x i ∂x j (i = j), as these terms do not emerge in our model). We denote the range of x i as x i,min ≤ x i ≤ x i,max ; in the two-stage model, x 1,min = 0 and x 1,max = x tot for X 1 , and x 2,min = 0 and x 2,max = ∞ for X 2 . Because the concentration must satisfy these constraints, we impose reflecting walls at the boundaries. Writing the FPE (S7) as the continuity equation, we have where J i denotes the probability current: Due to the reflecting walls, the current vanishes at the boundaries, i.e., Here, we consider the (uncentralized) moment (k = ): The time evolution of the moment obeys where Eq. (S7) is used. Using integration by parts, we have where we formally defineˆd From Eq. (S10), the first term in Eq. (S13) vanishes, and we obtain where we again used integration by parts. If we assume that G i (x; t)P (x; t) is negligible at the boundaries (S14) Equation (S14) is an equation for uncentralized moments. In order to obtain closed equations for the mean, variance, and covariance for general F i (x; t) and G i (x; t), we expand x i around the mean values as x i − µ i = δx i , with µ i = x i . Retaining terms up to the second order, such as δx m k δx n with m + n = 2, we obtain closed equations with respect to µ i (t), γ i (t) = (x i (t) − µ i (t)) 2 , and ρ ij = (x i − µ i ) (x j − µ j ) . However note that in the two-and three-stage models, because F i (x; t) and G i (x; t) are linear with respect to x, we can obtain closed differential equations for µ i (t), γ i (t), and ρ ij (t) without the truncation.

Calculations by optimal control 3.1 Two-stage model
Deterministic equations for the two-stage model are given by Eqs. (8) and (9). From Eq. (3), the corresponding FPE is From Eq. (S14), the moment equations arė Differential equations for the Lagrange multiplier λ i are obtained from Eq. (6), as follows: Here, u(t) is obtained from Eq. (7):

Three-stage model
Along with the reactions in Eqs. (8) and (9), we have the additional reactioṅ where α 3 and β 3 are the rates of synthesis and degradation, respectively. Again we obtained an FPE of the three-stage model: The corresponding moment equations arė Here, u(t) is obtained from Eq. (7): 4 Results of three-stage model As in the two-stage case in the main text, we calculated the optimal signal u(t) for the three-stage model. Figure S2 shows the optimal signal as a function of t and log 10 (w) for two values of µ When the weight w is larger, the optimal signal increases rapidly at t = 0, and this is followed by a decay. In contrast, the optimal signals show gradually increasing patterns when w is smaller. Similar to the twostage case, the optimal signal exhibits overshooting when w is larger and the target concentration µ T 3 is higher [ Fig. S2(a)]. We also see that the steep pattern minimizes the energy, whereas the gradual pattern minimizes the variance. We calculated the dependence on the weight w of the energy and the variance for the three-stage model; this is plotted in Fig. S3(a), where it can be seen that there is a trade-off between energy and variance, as in the two-stage case (µ T 3 = 0.4, and the parameters follow Fig. S2). Figure S3(b) compares the optimal signal with the constant signal, where both signals achieve the same target concentration at time t = T (µ T 3 = 4.0 and w = 1.0; the other parameters follow Fig. S2). The optimal signal has a lower concentration in the interval t = 3-10, and hence the optimal signal yields a smaller energy (Π u = 345.7 and 452.7 in the optimal and constant cases, respectively). This shows that the overshooting pattern minimizes the energy when the target concentration is higher. We also  Fig. S2). Again we see that when the total concentration x tot is smaller, the optimal signal has a sharper overshoot.
The three-stage model can be interpreted from a different point of view. Instead of the concentration of the output molecular species at t = T , the amount of molecules synthesized during 0 ≤ t ≤ T could be the output quantity. Assume that the amount of X 2 synthesized during 0 ≤ t ≤ T , which we define as S 2 , is used to report the result in a two-stage model: This quantity can be incorporated into our model by considering an auxiliary variable x 3 satisfying dx 3 (t) dt = x 2 (t), x 3 (0) = 0.
From Eq. (S48), we have x 3 (T ) = S 2 , showing that the amount of synthesized molecules can be represented by the concentration x 3 at t = T . This is identical to a setting α 3 = 1 and β 3 = 0 in the three-stage model, which was indeed employed in the calculations.