Sensitivity of Dynamical Systems to Parameters in a Convex Subset of a Topological Vector Space

We develop a theory for sensitivity with respect to parameters in a convex subset of a topological vector space of dynamical systems in a Banach space. Specific motivating examples for probability measure dependent differential, partial differential and delay differential equations are given. Schemes that approximate the measures in the Prohorov sense are illustrated with numerical simulations for distributed delay differential equations.


Introduction
In this paper we develop sensitivity equations for a general nonlinear dynamical system in a Banach space depending on parameters in a convex subset of a topological vector space which is not necessarily a normed linear space.More specifically, we study the sensitivity equation of the following ordinary differential equation ẋ(t) = f (t, x(t), µ) if t ≥ t 0 , x(t 0 ) = x 0 , (1) whose solution x is assumed to exist and belong to a complex Banach space (X, | • | X ) for any choice of the parameter µ in a convex subset M of a topological vector space X .Our investigations are motivated by the fact that in many applications of practical interest, the parameter space or set is not a Banach space or even a linear space and the sensitivity analysis developed in [19] is therefore not applicable.
As we shall see below, our motivating example involves the set P(S) (or a convex subset of P(S)), the general metric space of probability measures defined on a set S, taken with the Prohorov metric (discussed in detail below), when P(S) is viewed as a subset of the dual X = (C * B (S), T w * ) of the bounded continuous functions C B (S), with X = C * B (S) taken with the weak * topology.
The topological vector space structure of X and the underlying convex subset M ⊂ X do not permit the usual framework where we can define and use a Frechet derivative of f with respect to µ.However, the convexity assumption for M allows us to perform a sensitivity analysis of (1) by means of directional derivatives of f with respect to µ, when µ stays within the convex set M. Given two points µ, ν in M, we define the derivative δf (t, x, µ; ν − µ) of f at µ in the direction ν − µ to be the value of the limit lim provided this limit exists in X.Our goal in this paper is to show that under suitable conditions imposed on f , the sensitivity of the solution x with respect to µ, defined as the directional derivative x(t, µ + (ν − µ)) − x(t, µ) , (3) exists in X and satisfies the sensitivity equation We note the different nature of the derivatives of f appearing above.Here f x is the Frechet derivative of f with respect to x, whereas δf (t, x, µ; ν − µ) is the directional derivative of f at µ in the direction ν − µ.
Motivation for investigating sensitivity is widespread and there is a substantial literature on the subject.The large literature includes a number of books devoted to both elementary and advanced aspects of sensitivity [34,36,38,40,41,50,61].Equations for the sensitivity of a system with respect to finite dimensional vector parameters are used in standard methodology for optimization and inverse problems including least squares, maximum likelihood, computation of standard errors in statistics, etc., [35], as well as in model discrimination/model selection related quantities (dispersion matrix, Fisher information matrix, etc., [29,44]).Applications of sensitivity methods for finite dimensional vector parameters are widespread and can be found in many applications including biology [24] and physiology [22], mechanics [1,41], and control theory [61].More recently, investigators' attention has turned to more complex formulations for sensitivity of infinite dimensional systems with function space parameters in problems involving shape sensitivities or sensitivities with discontinuous coefficients (for interesting and well motivated pioneering results in this area, we recommend that the reader see [25,26,30,31,32,33] and [54] and the references therein).However, sensitivity for the systems we investigate here of the form ẋ(t) = F (t, x(t), P ), (5) where P is a probability distribution or measure, represents another class of problems where the "parameter" sets are infinite dimensional and presents a new set of both theoretical and computational challenges.Again the applications motivating such systems are diverse and growing with the increased scientific interest in incorporating uncertainty into models and systems.The need to employ dynamics with probabilistic structures has recently received increased emphasis.In particular, systems with probability measures or distributions embedded in the dynamics (problems involving aggregate dynamics as discussed in [8]) have become important in applications in biology [6,7,8], electromagnetics [11,12] and hysteretic [15,16,28,43,45] and polymeric [17,18,20] materials.While the interest in sensitivity of systems such as (5) is recent, we point out that general interest in such systems is rather longstanding.We illustrate with a simple example from control theory arising in relaxed or chattering control problems [46,47,51,58,59,60] wherein the controls are probability measures.Indeed, such systems date back to the seminal work of L.C. Young on generalized curves in the calculus of variations [62,63].Briefly, early applications in what is now called classical optimal control theory led to the need for controls that switched frequently or "chattered" between constant (in time) values in a control restraint set U .To be more precise, one has a dynamical system of the form and one wishes to "control" the system through the input or control function u which takes values in the set U .Here the requirement that u(t) ∈ U, 0 ≤ t ≤ T, is dictated by physics or the engineering aspects of the problem.In the so-called "chattering controls" that were found to be "near optimal" (for certain given criteria), the controls were piecewise constant taking values u j on intervals [t j , t j+1 ) comprising a partition of [0, T ].Thus the near optimal controls had the form In differential form (with respect to the control set) this becomes This in turn leads to the near optimal form of ( 6) being given by Furthermore, it was found that by increasing the number of switches through refinement of the time partition (so that the control switches or "chatters" increasingly often), one does increasingly better in optimizing the cost criterion.In the limit one obtains an optimal control that switches infinitely often between certain values in U .Such a control "function" is not a function in the usual sense but it can be realized by a "relaxed control" or U -valued time dependent measure µ(t; •) defined on the Borel subsets B of U .The corresponding limiting control system becomes Here the time dependent measure µ is the weak * −limit (on C B (U ), the bounded continuous functions v → f (t, x, v)) as the partition mesh in (9) approaches zero.By formulating the optimization problem over the relaxed control system (10), one obtains not only closure but also convexity in the resulting control problem.In this relaxed control setting it is then possible to obtain well-posedness (existence, uniqueness, appropriate continuous dependence, etc.) for the optimal control problem and to argue that the resulting optimal limit or relaxed control can be well approximated by sequences of ordinary control functions of the form given in (7).While much theoretical work (existence, uniqueness and some approximation) was carried in the period 1960 − 1975, little to no effort was made on a sensitivity methodology for such problems.While the by-now classical references (in particular, the book [60] by Warga presents a beautiful summary) are difficult to read, such an undertaking is wellworth the effort for the wealth of ideas found in this area of control theory.We note that the weak * topology on the dual C * B (U ) was quite naturally used in the development of relaxed control theory and, as we shall explain below, this is precisely the topology induced by the Prohorov metric (although convergence of measures in this topology is often called "weak convergence" by probabilists in something of a misnomer from a functional analysis point of view).We also remark here that more recent [15,16] formulations for materials with memories in a Preisach measures setting also lead to systems of the form (10) and provide yet another motivation for the sensitivity efforts pursued in this paper.Before turning to our new theoretical and computational findings for these systems, in the next section we offer several current applications where measure or probability distribution dependent dynamical systems are playing a major role.

Motivating Examples with Measure/Probability Density Dependent Dynamics
Our first example from recent investigations [11,12] in which the systems of the type under investigation here arise naturally entails Maxwell's equations for electromagnetic waves impinging on a heterogeneous dielectric material such as biotissue.If we consider a domain D = Ω 0 ∪ Ω where the dielectric material is Ω and the ambient Ω 0 is treated as a vacuum, then Maxwell's equations govern the electric field E and the magnetic field H in D and are given by the system Here ρ is the charge density in Ω and, as usual, the current J is composed of a source current J s and a material induced conductive current J c in Ω.Within the domain D we need constitutive relations that relate the flux densities D, B and the conductive current J c to the electric and magnetic fields E and H, respectively.These are given by in Ω by where r = (1 + χ) is the relative permittivity of the dielectric medium and χ accounts for the instantaneous polarization due to the interface between Ω 0 and Ω. (Of course in Ω 0 we have J c = 0, P m = 0 and χ = 0 so r = 1.)The dielectric material polarization P m generally has convolution form where g is the general dielectric response function (DRF).In every practical example (Debye, Lorentz, etc.) DRFs are parameter dependent as well as time (and possibly space) dependent; we represent this as g = g(t, x; ν), where typically ν = ( ∞ , s , τ ) contains parameters such as the high frequency limit dielectric permittivity ∞ , the static permittivity s , and relaxation time τ .Examples of often-used DRFs (see [9]) are the Debye in a material region Ω defined in the time domain by g(t, x) = 0 ( s − ∞ )/τ e −t/τ , and the Lorentz given by g(t, x) = 0 ω 2 p /ν 0 e −t/2τ sin(ν 0 t).The macroscopic polarization model ( 13) can be derived from microscopic dipole, electron cloud, etc., formulations by passing to a limit over the molecular population.However, such derivations tacitly assume that one has similar individual (molecular, dipole, etc.) parameters; that is, all dipoles, molecules, "electron clouds", etc., have the same relaxation parameters, plasma frequencies, etc. Historically, such models based on molecular level homogeneity throughout the material have often not performed well when trying to compare models with experimental data.Indeed, in 1907 Von Schweidler [27,56] observed the need to assume multiple relaxation times when considering experimental data and in 1913 Wagner [27,57] proposed continuous distributions of relaxation times.There are now incontrovertible experimentally based arguments for distributions of relaxation parameters in mechanisms for heterogeneous materials (see [37]).Moreover, there is compelling evidence of the presence of multiple mechanisms in complex materials such as tissue and modern polymeric composites.These multiple mechanisms may involve interfacial polarization, dipolar orientation, ionic diffusion and may often require a selection of several types of distributional representations from examples such as the Debye and Lorentz models.To allow for a distribution P 1 of parameters ν over some admissible set N , it is useful to generalize the polarization law (13) to where h(t, x; We expect to chose P 1 from (or from a subset of) the space P(N ) of all probability measures P 1 on N .
To allow for dielectric materials with multiple mechanisms or multiple DRFs (i.e., heterogeneous molecular structures), one may further generalize (14) by considering a family G of possible DRFs and distributions P 2 over this family.This leads to the polarization constitutive relationship where for P 1 ∈ P(N ) and P 2 ∈ P(G), K is defined by When we use ( 16)-( 17) in the Maxwell system (11)- (12), we are led to a system of partial differential equations where lower order terms (in time) depend on probability measures.
These resulting systems have the form of a partial differential equation which can be written as an abstract ordinary differential equation in a Banach or Hilbert space ẋ(t) = F (t, x(t), P 1 , P 2 ), where P 1 , P 2 are probability distributions or measures.
A second class of models [5] depending explicitly on distributions to incorporate uncertainty in the system dynamics involves the use of shrimp grown in production "raceways" (essentially large growth chambers wherein environmental factors such as temperature, oxygen, nutrient levels, etc., can be carefully controlled) artificially infected to efficiently produce large quantities of an associated vaccine.In such systems one recruits the biochemical machinery in an existing biomass for the production of a vaccine or antibody by infection using a virus carrying a passenger gene for the desired antibody response.
In the example developed in [5] one begins with a population of healthy shrimp and infects them with a recombinant viral vector (e.g., recombinant Taura Syndrome virus or rTSV in the example developed in [5]) expressing a foreign antigen, resulting in vaccine production in live infected shrimp.For simplicity the amount of vaccine produced can be assumed equal to the total infected biomass so that vaccine production will essentially follow the course of the viral dynamics in the shrimp.This requires modeling the dynamics of shrimp at the population level.Since shrimp have size dependent characteristics as well as structuredependent responses to external environment, an appropriate model might be based on the classical McKendrick-von-Foerster/Sinko-Streifer size-structured population equations [42,48] with mass as the structure variable, i.e., one equates the size variable with the mass in this model.
Experimental results suggest that mortality rates in acutely infected shrimp depend on the length of time that the shrimp remain acute.Moreover individuals in a latent phase have varying residency times before they progress into the acute phase.To account for the different residency periods of individual shrimp one can record the variable residency times in the different stages by introducing a new variable which one calls the class age of an individual.The class age of an individual in a given stage represents the length of time that the individual spends in that stage and serves as a surrogate for time delays.
In the vaccine production stage the shrimp can be infected by distributing chopped dead shrimp infected with a recombinant virus evenly throughout the raceway and it might be reasonably assumed that all the shrimp have an equal chance of becoming infected by eating the infected biomass.The expected time interval for infection is 7 to 10 days.During this time interval almost no shrimp progress into the chronic state.Therefore it is reasonable to consider the following three compartment states: susceptible (S), latently infected (L) and acutely infected (A) in a model (as depicted schematically in Figure 1) where it is assumed that shrimp will become instantly infected (i.e., progress into latent state) as soon as they ingest some of the infected biomass.As we have noted earlier, however, experimental observations suggest that there exists a temporal delay between the initial latent infection and initial acute infection [39].Moreover, it is biologically unrealistic to expect all members of the shrimp population to progress into the acute phase at a fixed number of days after initial latent infection.In addition the shrimp in the acute phase have varying mortality rates because of the different times that they progress into the acute phase and also due to the differences in genetic make-up of the host.It is difficult to account for the class age history (i.e., the length of time that shrimp spend in a state) of shrimp in a particular (latent or acute) state using a system of delay PDE's with only size as the structure variable.So in order to model variable residency times one may keep track of the class age and the size of shrimp by incorporating both size structure and class age structure into the latent and acute states.
It is expected that there is a positive probability that shrimp can stay in each the latent and acute state for more than 7 to 10 days.Thus one can assume that the class age interval for both states is the same as the time interval T V that we consider in our model.
The vaccine production model based on the above discussions, the stages as depicted in Figure 1 and as developed in [5], is given by In the above S(x, t) denotes the density of susceptible individuals having mass x at time t and ∂ t = ∂ ∂t .The function L(x, t, θ) denotes the density of latent individuals having mass x at time t that have spent θ days in the We note that this is again a probability distribution (P L ) dependent dynamical system (in this case a complicated system of partial differential equations) for which the distribution P L must be estimated in some type of inverse problem.
Our next example is one developed and investigated in [6,7] for models of progression of Human Immunodeficiency Virus (HIV) at the cellular level in mice.The resulting system is typical of delay systems that arise in biochemical pathways and cellular level kinetics of drug metabolism as well as other synthesis models.The HIV system based on the schematic for activity within an acutely infected cell as depicted in Figure 2 has the form (18) which is a vector system of integro-differential equations.Special cases of such systems include those in which the probability measures are defined on some finite interval Q = [−r, 0] of possible delay values θ.In this model one has compartments T, A, C, and V for in vitro blood level counts in mice of target (CD4+) cells, acutely infected cells, chronically infected cells and active viral particles, respectively.Free virus V infects target cells T , transforming them into acutely infected cells A which at some time later become chronically infected cells C. The basic pathway for infection and production of virus for acutely infected cells is schematically depicted in Figure 2.For models in which the individual kinetics for loss of envelope and capsid, integration, transcription, and assembly are not detailed, it is necessary (see [7]) to include a delay τ 1 from the time of infection of a target cell T until it first produces free virus V .There is also some delay τ 2 before an acutely infected cell A as depicted in Figure 2 becomes a chronically infected cell C.However, these delays vary over the population of cells and hence it is the distributions P 1 , P 2 of these delays, not a specific value of the delays that are of interest in the modeling.The variables C and V then become the expected values of the chronic cell levels and the viral loads, respectively (see [7] for a detailed derivation).This model was successfully used (again see [7]) to describe in vitro mice data from Dr. Michael Emerman's lab at Fred Hutchinson Cancer Research Center.Using inverse problem methodology and statistical analysis it was shown that improvement of fit to data by inclusion of the delays τ 1 or P 1 is statistically significant while inclusion of delays τ 2 (or distributions P 2 ) is less important.Indeed, it was found that the experimental data could not be properly fit with an ODE version (i.e., with the delay distributions omitted) of the model.
In a final example we introduce briefly a model for sublethal damage in populations of insects (pea aphids in this case) being exposed to pesticides.Background for such models can be found in [3] where both Leslie matrix models and size-structured partial differential equation models are employed.Here we discuss a different approach involving models we are currently investigating; this formulation focuses on uncertainty with respect to the delay after exposure before death.While the exact mechanisms for death are still being investigated by entomologists, there is sufficient understanding to attempt to develop and test models against experimental data.Insecticides generally affect neonates and adults somewhat differently and at different rates.Therefore, we let A(t) and N (t) denote the population density of the adult pea aphids and neonates, respectively, at time t.From previous studies, entomologists learned that the adult pea aphids reproduce best around 65 degrees Fahrenheit (18 degrees Celsius) and 80 percent of humidity [49].Furthermore, each adult female aphid produces 6 to 8 neonates (nymphs) a day up to 100 nymphs.These reproductive characteristics of adult aphids can be incorporated into the birth rate b(t, e, s) as a function of time t, environment e (temperature and humidity) and offspring productivity s.However, for simplicity, we here just assume a constant birthrate b(t, e, s) = b.In addition, one finds that it takes a newborn pea aphid or neonate τ days to become an adult aphid where τ can range from 5 to 7 days.Hence, a simple model for the pea aphids population density without the insecticides can be described by a distributed delay differential system where d A (t) and d N (t) represent the natural death rate at time t of the adults and the neonates, respectively.Here m(τ ) denotes the rate (probability) at which nymphs N become adults at age τ and move into the A compartment.Next we introduce the effects of the insecticide Margosan-O on the pea aphids into equation (19).It is known that Margosan-O can act on neonates and adults in several ways.First, the surfactant that causes the insecticide to adhere to plants' leaves can coat an aphid's exoskeleton upon contact, causing it to effectively smother to death shortly after exposure.Define p A (t) and p N (t) to be the probability of the adult and neonate aphids dying at time t due to Margosan-O.In the 1994 experiments [55], Stark and Rangus concluded that Margosan-O also affects the reproduction rate of the adult pea aphids.Thus, the birthrate b is no longer a constant but a function of insecticide concentration γ ∈ [0, 100] for Margosan-O concentration ranges from 0 to 100 parts per million (ppm).The functional form of b(γ) can be any decreasing function where b(0) = b max and b(100) = 0.A further cause of death in the neonates is mutation and subsequent death during molting in the presence of insecticide.One manner in which one can represent this is to make the probability density p N for death of neonates also dependent on the insecticide concentration γ.Of course, the adult death rate p A also depends on γ.The aphids population density taking into account insecticide effects can then be modelled by the system of differential equations This is a simple first model in which the quantities of interest include the probability densities p A and p N (which carry information about the sublethal effects of the insecticide) as well as the probability density m.It is easy to extend this model to more sophisticated and more useful models.For example, one might assume the birthrate is a probability density at time t over a continuum of pesticide concentrations γ.That is, b(t, γ) is a probability density with corresponding time dependent distribution dB(t, γ) = b(t, γ)dγ describing frequencies at which adult aphids give birth to neonates.
Our group is currently using models such as (20) with the data investigated in [3]; initial results are quite promising.

Theoretical Framework
In order to achieve our goal in a rigorous manner, we establish first existence and uniqueness results for equations ( 1) and ( 4), continuous dependence of the solution x on the parameter µ, and finally the fact that the directional derivative y(t) = δx(t, µ; ν − µ) exists and satisfies (4).Throughout this section, we will make use of the following form of the Gronwall's inequality, whose proof we do not present.
Lemma 1 (Gronwall's inequality) Let ϕ, ψ : [t 0 , T ] → R be two continuous and nonnegative functions and let K and L be positive constants, such that the inequality Consider the abstract differential equation (1) where f : R + ×X ×M → X is a continuous mapping in all three arguments; it is clear that for t ≥ t 0 , a solution x(t, µ) of (1) satisfies the equivalent integral equation Before studying the sensitivity of the solution of (1) with respect to the parameter µ, one must first establish that the solution of (1) exists and is unique.We define the successive approximations for the system (1) to be the functions x 0 , x 1 , ..., given recursively by for k = 0, 1, 2, .... Lemma 2 (Existence and Uniqueness) Let f : R + × X × M → X be continuous in t, x and µ and Lipschitz continuous in x, i.e.
for some constant C > 0. Then the successive approximations x k converge uniformly on [t 0 , T ] to the unique solution x of (1).
Lemma 3 (Continuous Dependence on Parameters) Let f : R + ×X ×M → X be continuous in t, x and µ and Lipschitz continuous in x and for µ = µ 0 let x(t, µ 0 ) be a solution of (1) existing on [t 0 , T ].Assume further that uniformly in (t, x).Then the differential equation (1) has a unique solution x(t, µ) satisfying The proofs of Lemma 2 and Lemma 3 above are omitted since they are very similar to those provided in [19] and in standard abstract differential equation results in the literature.
Theorem 1 Let f : R + ×X ×M → X of (1) be a continuous function in t, x and µ and have a Frechet derivative f x (t, x, µ) with respect to x such that Then the directional derivative y(t) = δx(t, µ; ν − µ) with respect to µ exists, with y : R + × X × M → X and satisfies the equation , by applying Lemma 2 we find that the differential equation ( 25) has a unique solution which we denote by y(t).Let us consider the difference for any two arbitrarily fixed µ, ν ∈ M. Equivalently, we have In order to show that the directional derivative of the solution x of (1) with respect to the parameter µ satisfies the sensitivity equation ( 25), we will prove that lim From the Frechet differentiability of f with respect to x ∈ X, we have where as |m(t, µ, ν, )| → 0. Consequently, we define g 1 (t, µ, ν, ) by so that g 1 (t, µ, ν, ) → 0 uniformly in t as → 0. Now for y(t) satisfying ( 25), we have −f x (s, x(s, t 0 , x 0 , µ), µ)y(s) − δf (s, x(s, t 0 , x 0 , µ), µ; ν − µ) ds Since the Frechet derivative f x is bounded and continuous with respect to the parameter µ, we have that and converges to 0 uniformly in s as → 0. In addition, we note that the solution y(t) of equation ( 25) is continuous and therefore bounded by a constant K 0 > 0 for all t ∈ [t 0 , T ].Also, from the definition of the directional derivative δf (s, x(s, µ), µ; ν − µ), we have that converges in s to 0 as → 0. Using all the arguments above, we obtain that Next we want to show that for some constant K 1 > 0. Hence, we want to consider From equation (26), we obtain Using the definition of the directional derivative of δf (t, x, µ; ν − µ), we have that Furthermore, with the assumptions that Applying Gronwall's inequality, we obtain that where (M 0 +g 1 )ds and g 1 (s, µ, ν, ) and g 3 (s, µ, ν, ) converge to 0 in s as → 0. It follows Then we obtain Applying Gronwall's inequality one more time, we have M 0 ds and taking the limit of → 0, we have lim which completes the proof.
As noted in [19], under somewhat weaker assumptions of local Lipschitz conditions on f plus domination of f by an affine function one can establish similar results to those presented here.

Sensitivity Analysis for Probability Distribution Dependent Systems
We next apply the theoretical results established in the previous section to provide sensitivity equations for specific probability distribution dependent systems.We illustrate the ideas above with two examples possessing similarities to systems arising in modeling HIV cell dynamics [7] where a four-dimensional nonlinear distributed delay differential equation system is employed.Previous studies have focused on the theories and computations of sensitivity of the HIV model with respect to the distribution for separate cases of discrete distributions [6] and absolutely continuous (continuous) probability distributions or measures [19].We are interested in a theoretical and computational framework for the sensitivity with respect to a class of mixed probability distributions which have an absolutely continuous component and a saltus component in nonlinear distributed delay systems.Since we will be dealing with measures on real intervals, we recall that there is a one-to-one identification between the probability measures and their representations by probability distributions.Therefore, in general, we will not distinguish between a measure and its corresponding distribution when no confusion can occur.We first consider a scalar linear distributed delay differential equation and then a two-dimensional nonlinear distributed delay system.We present the theoretical framework for the sensitivity analysis in this section and discuss corresponding computations in the next section.We consider the delay differential equation where x 0 ∈ R, Φ ∈ C(−r, 0) and f ∈ L 2 (0, T ) for a fixed T > 0. We wish to analyze the sensitivity of the solution x with respect to the probability distribution P , which in this case is assumed to be the sum of an absolutely continuous part and a singular part with respect to the Lebesgue measure on the interval S = [−r, 0], i.e.
Here p is the density of the absolutely continuous part of P (guaranteed to exist a.e. by the Radon-Nikodym theorem) and ∆ α i are Dirac measures with atoms at α i , i = 1, . . ., I in the interval [−r, 0].The main inconvenience we encounter when we try to investigate the sensitivity of x with respect to P is due to the fact that the set of the probability distributions of the form ( 28) is no longer a linear subspace of a Banach space and therefore we have to make sense of a meaningful way in which to perturb the probability distribution P and how to quantify the corresponding change in the solution x.As we shall see below, our general theoretical framework derived in Section 3 provides an elegant answer to this question if we interpret P as a "parameter" in an appropriately chosen space.Let us denote by (P(S), ρ) the general space of probability measures defined on S = [−r, 0] endowed with the Prohorov metric ρ, which can be formally defined as follows.For any closed subset F ⊂ S and > 0, we consider F = {q ∈ S : d(q, q) < , q ∈ F } and define the Prohorov metric ρ : P(S) × P(S) → R + as Although the definition of ρ is not intuitive, one can show that (a) (P(S), ρ) is a complete metric space, and as k → ∞, for all bounded uniformly continuous f : S → R.
If we view P(S) as a subset of X = C * B (S), from the equivalence (b) above we obtain that convergence in the Prohorov metric is equivalent to the weak * convergence in P(S) [4,23].Here C * B (S) denotes the topological dual of C B (S), the usual space of bounded continuous functions on S taken with the supremum norm and we take X = C * B (S) with the weak * topology.It is important to note that X = C * B (S) endowed with the weak * topology is a topological vector space which is not metrizable.However, the restriction of this topology to P(S) is metrizable and the metric which induces it is precisely the Prohorov metric ρ.In subsequent discussions in this paper, we shall take P(S) as the equivalent set of probability distributions on the finite interval S = [−r, 0].
Next, let us denote by M the subset of P(S) of probability distributions P of the form (28), that is, where N is the set natural numbers.Thus M is the set of probability distributions which can be expressed as the sum of an absolutely continuous part with respect to the Lebesgue measure and a singular part consisting of a finite sum of Dirac measures.Using the definition (29), one can easily check that M is a convex subset of (P(S), ρ) and therefore a convex subset of C * B (S).If we take the convex set M defined above with the Prohorov metric, we can apply the previous theory developed in Section 3 to perform the sensitivity analysis of (27) with respect to P .
Let (x(t), x t ) ∈ X = R × C(−r, 0; R) and M be given by (29); then (27) can be reformulated as an abstract system of the form (5) where the right side F : R + × X × M → X is defined as which is precisely that of the linear delay equation (27).In order to quantify how the solution of the equation ( 27) changes with respect to variations in the probability distribution P , we fix another probability distribution Q in the space M and show that the sensitivity y = ∂x ∂P (t, P ; Q − P ) of x with respect to P defined as the directional derivative of x at P in the direction Q − P exists and is uniquely defined.Then, by the theory of the previous section, y : where F is defined in (30).In the equation above x t represents the delay function defined on [−r, 0] and given by x t (τ ) = x(t + τ ).Similarly, y t represents the delay sensitivity function given by y t (τ ) = y(t + τ ) for −r ≤ τ ≤ 0. The quantities ∂F/∂x, ∂F/∂x t and ∂F/∂P represent the Frechet derivatives of F with respect to x, x t and, respectively, the directional derivative of F at P in the direction Q − P .It is easy to verify that ∂F/∂x = 0, and because the integral appearing in (30) is linear in x t , the Frechet derivative of F with respect to x t is simply In order to evaluate the directional derivative ∂F/∂P , let Q ∈ M be another probability distribution of the form Then the partial derivative ∂F/∂P at P in the direction Q − P is well defined and given by ∂F ∂P (t, x, x t , P ; Given the form of P + ε(Q − P ) and P , we obtain that ∂F ∂P (t, x, x t , P ; With F given in (30) and ∂F/∂x t and ∂F/∂P , respectively, derived in equation (32) and equation (33), Theorem 1 provides the existence and uniqueness of y(t) defined in (31) above.More explicitly, the sensitivity y(t) = ∂x ∂P (t, P ; Q − P ) is well defined and satisfies the delay differential equation We next consider a nonlinear example where c 1 , c 2 ∈ R, Φ ∈ C(−r, 0) and f, g ∈ L 2 (0, T ) for fixed T > 0. We define X = R 2 × C(−r, 0; R 2 ) and let (x(t), x t ) ∈ X where x(t) = (x 1 (t), x 2 (t)) T .Then for M defined in (29), the system (35) can be written in the vector form where x t (τ ) = x(t + τ ) and F : R + × X × M → X is a vector function whose components are given by the right sides of (35).Again, applying Theorem 1 from Section 3, we find that the sensitivity y : R + × X × M → X given by y(t) = (∂x/∂P )(t, P ; Q − P ) of system (36) By substituting these derivatives in (38) we obtain the sensitivity equation for (35).

Examples
In this section we present numerical results for the two examples given in the previous section.We outline the scheme to approximate the state solution and the solution of the corresponding sensitivity equation and then illustrate the convergence of the scheme with numerical simulations.

A Linear Example
We first discuss numerical simulations for the linear delay differential equation defined in (27) and the corresponding sensitivity equation (34).The two equations can be written together as a coupled system where τ α i , i = 1, . . ., I, and τ β j , j = 1, . . ., J, are atoms in [−r, 0] for the associated Dirac measures.In order to define approximations for (39), we first rewrite the system as an operator equation in a Hilbert space Z (e.g., see [10,13]).We let v = (x, y) T and z(t) = (v(t), v t ) ∈ Z ≡ R 2 × L 2 (−r, 0; R 2 ), where v t is the function τ → v(t + τ ) for τ ∈ [−r, 0).Given η = (ψ 0 , ζ 0 ) T ∈ R 2 and ϕ = (ψ, ζ) T ∈ C(−r, 0; R 2 ), we define a linear operator where Then the delay system (39) above, when formulated as an abstract Cauchy problem, is given by where the l M j 's are the usual piecewise linear splines on [−r, 0].It is known (see [14]) that for any P with corresponding absolutely continuous part p, one can, by appropriate choice of the coefficients {a M k }, obtain convergence of p M = a M k l M k to p in L 2 .This guarantees convergence of P M to P in the Prohorov metric.We note that any approximation scheme that guarantees weak L 2 convergence of p M → p could be used in the approximations P M for P in the Prohorov sense (e.g., see [20]).
For a given P, Q suppose that P M , Q M ∈ M M have been chosen so that P M → P , Q M → Q in the Prohorov sense.Then we approximate the system (40) by The convergence of z N,M to the solution z of (40) as N, M → ∞ can be argued using standard convergence arguments as in [14] and the convergence z N → z, P M → P and Q M → Q as N, M → ∞.We illustrate this convergence with simulations for a specific case of the linear delay system (27) for special choices of the distributions P and Q.
We first fix N = 16 and solve the approximation system (43) for different values of M and plot the approximations x N,M , y N,M of the solution x and the corresponding sensitivity y(t), respectively, in Figure 3.These graphs illustrate the convergence guaranteed by arguments similar to those in [14]; in Figure 3(a) we observe that x N,M → x and in Figure 3(b) that y N,M → y for a fixed N = 16 and M increasingly large.We then fix M = 16 and solve system (43) for different N values.The approximating solutions x N,M and y N,M for a fixed M = 16 and N increasingly large are graphed in Figure 4. Again, one can see from Figure 4 that we have convergence for both the solutions x N,M and sensitivity y N,M as N → ∞ and M = 16.We remark that by solving here the approximation to the system (46), we approximate x and y simultaneously.Alternatively, one can approximate x and y separately by first approximating x to obtain x N,M and then using x N,M in solving the equation for y N,M .

A Nonlinear Example
We consider next the nonlinear delay differential system (35) and the corresponding sensitivity system (38) derived in Section 4. Upon substituting for P, Q and coupling the delay system (35) and its sensitivity equation (38), we obtain where τ α i and τ β i ∈ [−r, 0].We formulate an abstract Cauchy problem for system (47) by em- r, 0; R 4 ).We may then define a nonlinear operator where D (i,j) is a 4 × 4 matrix with 1 at the (i th , j th ) entry and zero everywhere else and Then the abstract Cauchy formulation of the distributed delay system (47) above is where G(t) = ((f (t), g(t), 0, 0) T , (0, 0, 0, 0) T ) and z 0 = ((x 0 1 , x 0 2 , 0, 0) T , (Φ, 0, 0, 0) T ).For numerical purposes one can employ an approximation framework similar to that in the linear example above; we do not repeat details here.We note that the combined results in ([2], [7], [13]) can be used to establish that the resulting piecewise linear spline approximating scheme {Z N , Π N , A N (P, Q)} provides convergence of z N (t) to z(t) for the system (47).Here z N (t) is the solution of the corresponding approximation system similar to system (41) given above for the linear example.We again may approximate the distributions P, Q in the Prohorov sense so that P M → P and Q M → Q as M → ∞, and again construct corresponding approximating systems similar to system (43) in the linear example.In this case, one also can argue the convergence of the approximations z N,M (t) → z(t) as N, M → ∞ using z N (t) → z(t) for N → ∞ and P M → P , Q M → Q as M → ∞.
We again illustrate convergence with simulations corresponding to specific distributions P and Q.In particular, we again choose P to be given in (44) and Q to be the uniform distribution dQ(τ ) = q(τ ) dτ = 1 dτ for τ ∈ [−1, 0].We choose f (t) = 1, g(t) = 2, a = −3 and b = −2 and define Φ(θ) = 0 for θ ∈ [−1, 0) and x 0 1 = x 0 2 = 1.Then we solve the corresponding approximating system which is similar to the system given in (43); we first fix N = 16 and let M vary.The approximations of the solution x N,M (t) are plotted in Figure 5 and the sensitivity y N,M (t) are depicted in Figure 6.We see from Figure 5 and Figure 6 that with a fixed N = 16 and M → ∞, both x N,M and the corresponding sensitivity y N,M converge.We then take M fixed at 16 and let N vary.Similar to the previous case where N is fixed, we observe from Figure 7 and Figure 8 where we plot x N,M and y N,M respectively, that x N,M and y N,M converge to x and y for M = 16 and N → ∞.

Concluding Remarks
In this paper we have developed both theoretical and computational aspects of a sensitivity methodology for probability measure dependent dynamical systems.In addition to the usual uses of sensitivities, the results developed here represent a first step toward a general mathematical and statistical methodology for estimation of uncertainties (in this case represented by probability distributions P ) in the dynamics of a system.In particular, we refer to a theory for the asymptotic properties of ordinary least squares (OLS) estimators POLS which could lead to standard errors and confidence "intervals" in a functional setting.In the standard finite dimensional parameter theory (see [35] and Chapter 12 of [52]), the asymptotic theory is given in terms of the covariance matrix Σ 2 which can be approximated by Σ 2 ≈ σ 2 0 (χ T χ) −1 , where χ is the sensitivity matrix χ = ∂x i ∂θ j .
For the systems of interest here, the goal is to develop a rigorous mathematical/statistical framework for OLS estimators for "parameters" P ∈ P(S), i.e., the parameters to be estimated are actually probability distributions in an infinite dimensional set.For this an operator analogue of the covariance matrix will involve the sensitivities χ = ∂x ∂P .
Thus, in addition to use in routine sensitivity analysis for uncertainty in dynamical systems, the theory presented above provides a foundation to develop an operator theoretic form of the needed asymptotic distribution theory similar to the one currently available for estimation in a finite dimensional parameter setting.

Figure 2 :
Figure 2: HIV infection pathway in acutely infected cells.