Parameter estimation of systems with delays via structural sensitivity analysis

This article presents a method for sensitivity analysis of non-linear continuous-time models with delays and its application to parameter estimation. The method is universal and may be used for sensitivity analysis of any system given as a block diagram with arbitrary structure and any number of delays. The method gives sensitivity functions of model trajectories with respect to all model parameters, including delay times, and both forward and adjoint sensitivity analysis may be performed. Two examples application of the method are presented: identification of a Wiener model with delay and identification of a model of JAK-STAT cell signal transduction mechanism.


1.
Introduction. Dynamical systems with delays are an important class of systems for the description of many phenomena and processes occurring in a number of areas, including technology, industry, and biology. Systems of this type are the subject of research for many years. The existence of delays in the system required the development of new analysis methods including stability, controllability, observability analysis, and tools for their modeling and simulation. Sensitivity analysis of dynamic systems answers the question on how changes of model parameters affect the model solution. The answer to this question can be useful in solving many tasks, such as design of experiments, estimation of model parameters, or the optimization of the structure of the model. For example, a high value of a sensitivity factor with respect to a given parameter indicates that this parameter is important to explain the behavior of the model, and consequently it should not be reduced during model structure optimization. Sensitivity analysis of dynamic systems has been developed since the 60s of the 20th century [4], [5]. It has concerned primarily systems without delay, usually described using ordinary differential equations (ODEs). Sensitivity analysis for models given in the form of delay differential equations (DDEs) was presented in [17]. Such models can take various forms. One possible simple form of a DDE is: where x(t) ∈ R n is a state vector of state variables, u(t) is a model input, f (·) is in the general case a multidimensional, non-linear function, which is differentiable with respect to its arguments, τ is a non-negative time delay, and p ∈ R q is a vector of parameters for which the sensitivity is examined. More complex models can include not just one, but more delays, and not only the state may be delayed but also the control signal u(t). Sensitivity analysis of the system described by equation (1) gives n × q scalar sensitivity functions: which are, like the signals occurring in the original system, functions of time t. There are many practical approaches to determine the function (2), which can be divided into three groups: • finite difference approximation, which uses explicitly the difference quotient formula, • forward sensitivity analysis, where a so-called sensitivity model (tangentlinearized) for the variations of signals is built and simulated, • adjoint sensitivity analysis, where a so-called adjoint system is built and simulated.
These approaches vary in simplicity, accuracy, computational intensity and the ability for simultaneous (on-line) simulation together with the original model.
The literature related to parameter estimation for systems with delays, because of its importance, is very large. See for example [16], [1], [2], [14], [11], [15], [19], [20] and the literature cited therein. Unfortunately, the methods proposed are valid for linear systems and/or for only one delay. Recently, a more general approach [12] has been proposed which can be used for non-linear systems with an arbitrary number of delays. The key idea is to define an objective function as a least-squares error between the predicted and measured system outputs, and then calculate the partial derivatives of this function with respect to the estimated delays. The derivatives are generated by an auxiliary delay-differential system. In fact, this auxiliary system is a "sensitivity model" and the method proposed in [12] uses the forward sensitivity approach to gradient derivation.
The approach proposed in this paper is even more general. The structural approach for sensitivity analysis specifies rules for construction of the sensitivity model or, additionally, the adjoint system. The latter approach, the adjoint sensitivity analysis, is less computationally costly, especially when the number of estimated parameters is large.
Moreover, the structural approach for sensitivity analysis is much simpler and intuitive. It is possible to fully automate the process of building the sensitivity model or the adjoint system fully automated. The method is universal in the sense that it does not assume a particular structure of the model analyzed and can be used for any models with discrete delays, for example models containing transport and non-transport delays, retarded-type delay differential equations (RDE) and neutral type delay differential equations (NDE) etc.
2. Structural representation of dynamical systems. Any dynamical system denoted analytically by equations (differential, difference, algebraic equations, with or without delays) can be presented alternatively in a structural way, usually by a block diagram. These two methods of description are equivalent, meaning that every system described analytically by equations has a corresponding block diagram (which contains the basic elements) and vice versa, each block diagram corresponds to a system of equations. Moreover, it was shown [13] that topological transformations of the structural description of the system correspond to appropriate mathematical transformations of the analytical description.
The structural description, due to its readability, is very often used in many areas, for example in automatic control theory where a closed loop structure with negative feedback is mostly utilized, in dynamical system identification [10] or to describe artificial neural networks.
In this paper the analysis focuses only on continuous-time systems with delays. A complete list of elements sufficient to build any system of this type is shown in Table 1.  Branching node y 1 = y 2 = u Delay element One element specifies an input-output relationship between its input u and the output y. A dynamical linear time-invariant (LTI) element is described by a socalled transfer function which is defined as a quotient of the operator (Laplace) form of the output Y (s) and the input U (s): K(s) = Y (s) U (s) under zero initial conditions. The transfer function may be understood as an operator gain because Y (s) = K(s)U (s).
In the case of a linear static element (without dynamics) the transfer function reduces to a number A (or a matrix of numbers for a multidimensional element) which may be applied in the operator domain: Y (s) = AU (s) or, thanks to the linearity property of the Laplace transform, also in the time domain: y(t) = Au(t).
For a non-linear static element, usually presented by a block surrounded by a double line, the operator description by a transfer function K(s) is impossible and only a time-domain description y(t) = f (u(t)) may be applied.
As mentioned before, any dynamical system described analytically by equations may be presented as a block diagram which contains elements from Table 1. For example, a system of non-linear ordinary differential equations (ODE) can be shown as a block diagram presented in Fig. 1. The transfer function 1 s is the integral opeator.  (3) When the the function f in the model (3) is linear then it may be written aṡ with matrices A and B of appropriate dimensions. Figure 2 presents the corresponding block diagram. Finally we can present the DDE model (1), described in the previous section, as a block diagram. The last element in Table 1 is an element which delays the input signal by time τ . In the operator domain it corresponds to the transfer function e −sτ . Using this element we can build a block diagram representing the DDE system (1) -see Fig. 3.
The block-diagram description of dynamical systems is not limited to general forms of mathematical models, like for example (1), (3) or (4), but may be applied to any system with arbitrary structure. An example of such system, a Wiener model, will be used as an exemplary model at the end of this paper.
3. Structural sensitivity analysis. Sensitivity functions (2) for any non-linear ODE model (3) may be obtained as solutions of a so-called sensitivity model. The sensitivity model depends on the form of the original model (3) and nominal trajectories x(t) obtained for given initial conditions x(0), input signal u(t) and the parameter vector p. The sensitivity model is valid for variations of x(t), u(t) and p, denoted here asx(t),ū(t) andp, and has the form:  (1) where are time varying Jacobi matrices. Therefore, the sensitivity model is a linear time varying (LTV) model. Under the assumption that x(0) does not depend on p, the sensitivity model has zero initial conditions. The sensitivity model (5) may be used to obtain sensitivity functions (2). If we applyp j = 1 andp k = 0 for k = j then which for p j = 1 is reduced to The structural sensitivity analysis takes as a basis the block diagram representation of the model instead of its analytical description (3). There is a very simple way to construct the sensitivity model based on its structural representations. It turns out that the construction of the sensitivity model has a locality property. This property says that the sensitivity model for a system having some structure and a set of elements is a system with the same structure, containing sensitivity models of individual elements. This means that the construction of the sensitivity model depends on differentiation of each element of the system separately. This allows us to formulate very simple rules for construction of the sensitivity model, which specify for any element of the original model its equivalent element appearing in the sensitivity model.
Another way to perform sensitivity analysis is so-called adjoint sensitivity analysis, see for example [3]. This approach depends on using a model which is adjoint to the sensitivity model (5). This approach is especially useful when the number of quantities for which the sensitivities are searched is much less than the number of parameters. Such a situation takes place in optimization and parameter estimation problems (one objective function versus many decision variables/parameters).
Using the structural approach, we are able to construct a sensitivity model for any arbitrarily complex system. Here we can see some advantage of the structural approach compared to the analytical way of description (1), which always restricts the class of systems investigated.
In the papers [6], [7], [8], [9] a set of such rules has been proposed and used for parameter estimation for various classes of models. These rules allow to perform both forward and adjoint sensitivity analysis for any hybrid continuous-discrete time system. To expand the class of systems analyzed to systems with delays, it is sufficient to determine the differentiation rule for a single delay element. This is the main subject of this paper. 4. Sensitivity model of a single delay element. The delay element, presented in the last row of Table 1, is described by the input-output relation y(t) = u(t − τ ) which gives Y (s) = U (s) · e −sτ in the operator domain.
We will consider the behavior of the system for a finite time interval t ∈ [0, t f ], where t f is a given final time. When non-zero initial conditions are allowed, the input signal u(t) of the delay element has to be known for previous times t ∈ [−τ, 0).
Looking at the delay element in Table 1, one can see that the delay time is distinguished as a separate input. This is because we want to make the sensitivity analysis possible with respect to τ together with other parameters.
The variation of the output y(t) of the delay element depends on the variation of its input (delayed by τ ) and the variation of τ : In equation (9)ȳ,ū,τ are variations of y, u and τ , respectively. The partial differential appearing in (9) may be calculated using the chain rule: Finally, equations (9) and (10) together givē Looking at equation (11) one can see that the signal y(t) has to be differentiable for times t ∈ [0, t f ], which is assured when the input u(t) is differentiable for t ∈ [−τ, t f − τ ].

5.
Structural sensitivity analysis of DDE systems. The complete set of rules for construction of the sensitivity model for any DDE system given as a block diagram is presented in Table 2. All rules, except the last, are known from [6], [7]. The last row contains the rule just derived and described by the equation (11).
Looking at the rules presented in Table 2, we see that only two non-linear elements should be changed during the construction of the sensitivity model. The first element is the non-linear static element that should be replaced by a static linear time variant element, where the variability in time is the result of differentiation of non-linearity f along the nominal input signal. Another element which should be replaced is the delay element, but only when we are interested in the sensitivity analysis of the model with respect to the delay time.
Rules for the construction of the adjoint model are presented in the third column of Table 2. These rules, with the exception of the last (for the delay element), have been published previously in [6], [7]. The directions of the signals should be reversed: inputs become outputs and vice versa. All multidimensional elements should be transposed. In addition, if the element of the sensitivity model is timevariant then it should be reversed in time (see the rule for non-linear element). The sensitivity model of the delay element consists of the delay (linear dynamic element), the summing junction, and the time-variant linear element. Thus, the ∂u(t) nom adjoint model of this element is a consequence of the application of the previous rules for its components. The next two Sections present two examples of applications of the method of sensitivity analysis derived above. 6. Example 1 -Wiener model. Let us consider the non-linear system presented in Figure 4. This is a specific example of the wider class of so-called Wiener Wiener models, or the very similar in spirit Hammerstein models, are used as general gray-box models for non-linear system identification [10]. This means that not all physical phenomena in the system are known and may be used for mathematical modeling (white-box model). This is also not the case when the modeler has only information about the input and output signals (black-box model) without any other information about the process. Wiener and Hammerstein models assume specific structures of the models. The Wiener model consists of two parts, a linear dynamical part followed by a static non-linear part, and the Hammerstein model also has these two parts but in the reversed order.
The first part of the Wiener model presented in Figure 4 (linear and dynamical) contains the delay element in its feedback loop. The second part (non-linear and static) is represented as a non-linear function f .
The transfer function K(s) and non-linear function f in the Wiener model presented may be arbitrarily complicated but in order to write the corresponding equations, which should be easier to understood for the reader not familiar with structural description of dynamical systems, let us assume simple first order dynamics K(s) = 1 1+s . Based on the form of K(s) one can write an equation in the operator domain which in the time domain gives a differential equatioṅ The summing junction gives which together with (13) leads to following differential equatioṅ Equation (15) together with the following algebraic output equation fully describes the system. In the general case, parameter estimation for the Wiener model depends on searching for all parameter values of both parts. In this Section, for the sake of simplicity, we assume that only the delay time τ is unknown. In order to estimate this parameter we will use the gradient (one partial derivative in this case) of the quadratic performance index where d(t) is the measured output of the the real system. This partial derivative will be obtained by using the structural sensitivity analysis presented in the previous Section. Prior to this, the extended model will be drawn -see Figure 5.
In the extended model one can see the delay time τ presented as an input and the performance index J(t) as an output of the overall system.
The sensitivity model presented in Figure 6 is the direct consequence of the rules presented in Table 2.  Once more, it is possible to write the corresponding equations describing the model from Fig. 6. Using the same reasoning as that used for finding equation (15) one can find a following differential equatioṅ and the algebraic output equation The termq(t) in equation (18) is a derivative (over time) of the signal q(t) taken from the model. In the general case of practical application a numerical differentiation has to be applied. If the system from Fig. 6, under zero initial conditions, is stimulated as depicted, i.e. only variation of τ is equal to 1 and the rest of the input signals (variations) are equal to 0, then the output signal, according to (8), will be the searched partial derivative:J This derivative may be utilized during gradient-based estimation of τ , for example by using the simplest gradient descent approach. It leads to the possibility of on-line parameter estimation of the delay time τ , for example in continuous time: where α is a learning coefficient assuring convergence of the estimation procedure.
Results of such estimation are presented in Figure 7. The reference signal d(t) was generated by a system of the same structure and the following data: τ = 1, The input signal u(t) was the square wave with the amplitude equal to 1 i and the period equal to 13,3 [sec]. The initial condition for the delay element was zero for −τ ≤ t < 0. The initial condition for the estimated parameter was τ (0) = 0.1. The results presented in Figure 7 show that the estimation process is very fast, and after three periods of the input signal the predicted value of y(t) obtained in the model is very close to the real value d(t) observed in the plant.
7. Example 2 -JAK-STAT pathway. The next example shows an application of the proposed approach to parameter estimation for the model of the JAK-STAT cell signaling pathway described in [18]. This model is a set of ODEs containing delays. The model describes changes of protein concentrations involved in signal transduction from a cell membrane receptor to the nucleus. Binding of Epo (erythropoietin) to the extracellular part of the receptor leads to activation by phosphorylation of the so-called Janus kinase (JAK) at the intracellular, cytoplasmic domain of the receptor. In turn, this leads to phosphorylation of monomeric STAT-5, a member of the STAT (signal transduction and activator of transcription) family of transcription factors. The phosphorylated monomeric STAT-5 forms dimers and these dimers migrate into the nucleus where they bind to the promotor region of the DNA and initiate gene transcription. After transcription, STAT-5 dimers are dephosphorylated and transported to the cytoplasm. A detailed description may be found in [18].
The model is a set of four differential equations with delays: where the amount of activated Epo (erythropoietin) receptors is denoted by EpoR A , unphosphorylated monomeric STAT-5 by x 1 , phosphorylated monomeric STAT-5 by x 2 , phosphorylated dimeric STAT-5 in the cytoplasm by x 3 , phosphorylated dimeric STAT-5 in the nucleus by x 4 and sojourn time of STAT-5 in the nucleus by τ .
For parameter estimation the experimental data taken from [18] were used. The data contains information about the amount of phosphorylated STAT-5 in the cytoplasm (left panel of Fig. 8) and the total amount of STAT-5 in the cytoplasm (right panel). The data also contains information about the amount of activated Epo receptors, see Fig. 9. Estimation of the parameters of the model (22) depends on minimization of the quadratic objective function measuring the discrepancy between the data and the model solution. Once more, as for the previous example, the gradient of the objective function has been computed using the proposed sensitivity analysis and then used in a gradient-based optimization procedure (fmincon function in Matlab) that uses a trust-region optimization method.
Estimated parameter values are presented in Table 3 and model solutions for the estimated parameters are presented in Fig. 10. The value of τ is approximately equal to 5, which means that molecules of STAT-5 are present in the nucleus for about 5 minutes and then are transported back to the cytoplasm. Table 3. Results of parameter estimation -estimated parameters and initial conditions Parameter or init. cond. Estimated value k 1 0.0378 k 2 1.02 k 2 0.130 k 2 0.130 τ 5.00 x 1 (0) 6.73 x 2 (0) 0.065 x 3 (0) 0 x 4 (0) 0 Figure 10. Fitting of the model solutions to the experimental data after gradient-based parameter estimation 8. Conclusion. This paper presents a method for the structural, forward and adjoint sensitivity analysis of non-linear dynamical systems with delays. The proposed rules for construction of the sensitivity model as well as the adjoint system are very simple and universal. The method is suitable for sensitivity analysis of any system given by a block diagram of arbitrary structure and any number of delays. It is worth noting that delays can occur anywhere: both control signals and state variables can be delayed.
The method can be used not only in gradient parameter estimation as in the examples presented, but also for control optimization, planning experiments, identification and structure optimization of models.