On differentiability of stochastic flow for a multidimensional SDE with discontinuous drift

We consider a $d$-dimensional SDE with an identity diffusion matrix and a drift vector that is a function of bounded variation. We give a representation for the derivative of the solution with respect to the initial data.


Introduction
Consider an SDE of the form
(1) dϕ_t(x) = a(ϕ_t(x))dt + dw_t,
where (w_t)_{t≥0} is a d-dimensional Wiener process and a = (a_1, . . . , a_d) is a bounded measurable mapping from R^d to R^d. According to [23] there exists a unique strong solution to equation (1).
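As an illustrative aside (not part of the original argument), equation (1) can be simulated by the Euler–Maruyama scheme even when the drift is discontinuous; the drift a(x) = −sign(x), applied coordinatewise, is a hypothetical example of a bounded drift of bounded variation:

```python
import numpy as np

def euler_maruyama(a, x0, T=1.0, n_steps=500, seed=0):
    """Euler-Maruyama scheme for d phi_t = a(phi_t) dt + dw_t in R^d.

    a  : bounded measurable drift, mapping R^d to R^d
    x0 : initial point; returns the states at t_k = k * T / n_steps.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Wiener increment
        x = x + a(x) * dt + dw
        path.append(x.copy())
    return np.array(path)

# A discontinuous drift of bounded variation: each coordinate is pushed
# toward the origin with unit speed.
drift = lambda x: -np.sign(x)
path = euler_maruyama(drift, x0=[2.0, -2.0])
```

The scheme is only a sketch to make the object ϕ_t(x) concrete; convergence rates of such schemes for irregular drift are a separate topic not discussed in the paper.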
It is well known that if a is continuously differentiable and its derivative is bounded, then equation (1) generates a flow of diffeomorphisms. It turns out that this condition can be essentially relaxed [12]: a flow of diffeomorphisms exists even for a possibly unbounded Hölder continuous drift vector a. Recently the case of discontinuous drift was studied in [10,11,19,20], and the weak differentiability of the solution to (1) was proved under rather weak assumptions on the drift. The authors of [10] consider a drift vector belonging to L_q(0, T; L_p(R^d)) for some p, q ∈ R such that p ≥ 2, q > 2, d/p + 2/q < 1.
They establish the existence of the Gâteaux derivative in L_2(Ω × [0, T]; R^d). In [20] it is proved that for a bounded measurable drift vector a the solution belongs to the space L_2(Ω; W_{1,p}(U)) for each t > 0, p > 1, and any open and bounded U ⊂ R^d. The Malliavin calculus is used in [19,20]. The aim of our paper is to find a natural representation of the derivative ∇_x ϕ_t(x) when a is discontinuous. We suppose that for 1 ≤ i ≤ d, a_i is a function of bounded variation on R^d, i.e. for each 1 ≤ j ≤ d, the generalized derivative µ^{ij} = ∂a_i/∂x_j is a signed measure on R^d. Let µ^{ij,+}, µ^{ij,−} be the measures from the Hahn–Jordan decomposition µ^{ij} = µ^{ij,+} − µ^{ij,−}. Denote |µ^{ij}| = µ^{ij,+} + µ^{ij,−}. Assume that for all 1 ≤ i, j ≤ d, |µ^{ij}| satisfies the following condition (see Condition (A) in Section 1):
lim_{t→0+} sup_{x∈R^d} ∫_{R^d} ∫_0^t s^{−d/2} exp(−|x − y|²/(2s)) ds |µ^{ij}|(dy) = 0.
The condition we impose on the drift is more restrictive than that of [10,20], but it allows us to obtain an explicit representation for the derivative in terms of intrinsic parameters of the initial equation (see Theorem 3). Our methods are different from the ones used in the papers cited above. We show that the derivative Y_t(x) in x is a solution of the integral equation
Y_t(x) = E + ∫_0^t dA_s(ϕ(x)) Y_s(x),
where (A_t(ϕ(x)))_{t≥0} is a continuous matrix-valued additive functional of the process (ϕ_t(x))_{t≥0} which is equal to ∫_0^t ∇a(ϕ_s(x)) ds if a is differentiable, and E is the d-dimensional identity matrix. This representation is a natural generalization of the expression for the derivative in the smooth case.
In the one-dimensional case (see [3,4]) the derivative was represented via the local time of the process. It is well known that in the multidimensional situation the solution of (1) does not have a local time at a point. We use continuous additive functionals for the representation of the derivative. This method can be considered as a generalization of the local time approach to the multidimensional case.
Our method can likely be used in the case of a non-constant diffusion as well. The paper is organized as follows. In Section 1 we collect some definitions and statements concerning continuous additive functionals. The main result of the paper is formulated in Section 2 (see Theorem 3). For the proof we approximate equation (1) by equations with smooth coefficients. The definitions and properties of the approximating equations are given in Sections 3 and 4. We prove Theorem 3 in Section 5.

Preliminaries: W-functionals
In this section we collect some facts about continuous additive functionals which will be used in the sequel. Further information can be found in [8], Ch. 6–8; [13], Ch. II, §6; see also the very detailed exposition on continuous additive functionals in [6].
Let (Ω, F, P) be a probability space with a filtration {F_t : t ≥ 0}. Let (ξ_t)_{t≥0} be a continuous R^d-valued homogeneous Markov process with infinite lifetime adapted to the filtration {F_t}. Denote N_t = σ{ξ_s : 0 ≤ s ≤ t}.
Definition 1. A random function A_t, t ≥ 0, adapted to the filtration {N_t} is called a continuous additive functional of the process (ξ_t)_{t≥0} if it is
• non-negative;
• continuous in t;
• homogeneous additive, i.e. for all t, s ≥ 0, A_{t+s} = A_s + A_t ∘ θ_s, where θ is the shift operator.
If additionally for each t ≥ 0, sup_{x∈R^d} E_x A_t < ∞, then A is called a W-functional, and f_t(x) = E_x A_t is called its characteristic.
Remark 1. It follows from Definition 1 that a W-functional is non-decreasing in t, and for all x ∈ R^d, P_x{A_0 = 0} = 1.
Proposition 1 (See [8], Theorem 6.3). A W-functional is defined by its characteristic uniquely up to equivalence.
The following theorem states the connection between convergence of functionals and convergence of their characteristics.
Theorem 1 (See [8], Theorem 6.4). Let A_{n,t}, n ≥ 1, be W-functionals of the process (ξ_t)_{t≥0} and let f_{n,t}(x) = E_x A_{n,t} be their characteristics. Suppose that for each t > 0, a function f_t(x) satisfies the condition
sup_{x∈R^d} |f_{n,t}(x) − f_t(x)| → 0, n → ∞,
uniformly in t on every finite interval. Then f_t(x) is the characteristic of a W-functional A_t. Moreover,
A_t = l.i.m._{n→∞} A_{n,t},
where l.i.m. denotes convergence in mean square (for any initial distribution of ξ_0).
Proposition 2 (See [8], Lemma 6.1′). If for any t ≥ 0 a sequence of non-negative additive functionals {A_{n,t} : n ≥ 1} of the Markov process (ξ_t)_{t≥0} converges in probability to a continuous functional A_t, then the convergence in probability is uniform in t, i.e. for each T > 0,
P-lim_{n→∞} sup_{0≤t≤T} |A_{n,t} − A_t| = 0.
Let h be a non-negative continuous bounded function on R^d, and let the process (ξ_t)_{t≥0} have a transition probability density p_t(x, y). Then
A_t = ∫_0^t h(ξ_s) ds
is a W-functional of the process (ξ_t)_{t≥0}, and its characteristic is equal to
f_t(x) = ∫_{R^d} k_t(x, y) h(y) dy, where k_t(x, y) = ∫_0^t p_s(x, y) ds.
Let a measure ν be such that ∫_{R^d} k_t(x, y) ν(dy) is a function continuous in (t, x). If we can choose a sequence of non-negative bounded continuous functions {h_n : n ≥ 1} such that for each T > 0,
sup_{t∈[0,T], x∈R^d} |∫_{R^d} k_t(x, y) h_n(y) dy − ∫_{R^d} k_t(x, y) ν(dy)| → 0, n → ∞,
then by Theorem 1 there exists a W-functional corresponding to the measure ν with its characteristic being equal to ∫_{R^d} k_t(x, y) ν(dy). Formally we will denote this functional by ∫_0^t (dν/dy)(ξ_s) ds. A sufficient condition for the existence of a W-functional corresponding to a given measure is stated in the following theorem.
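As a numerical illustration added here (not part of the original text), the characteristic identity f_t(x) = E_x ∫_0^t h(w_s) ds = ∫_0^t E h(x + w_s) ds can be checked by Monte Carlo for a one-dimensional Wiener process; for the Gaussian choice h(y) = exp(−y²/2) the inner expectation is explicit, E h(x + w_s) = (1 + s)^{−1/2} exp(−x²/(2(1 + s))):

```python
import numpy as np

def characteristic_mc(h, x, t=1.0, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of f_t(x) = E_x int_0^t h(w_s) ds,
    the characteristic of the W-functional A_t = int_0^t h(w_s) ds."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    w = x + np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    return float(h(w).sum(axis=1).mean() * dt)

h = lambda y: np.exp(-y ** 2 / 2.0)        # bounded continuous h >= 0
x0 = 0.0

# Trapezoidal quadrature of the explicit formula for E h(x0 + w_s), s in [0, 1].
s = np.linspace(0.0, 1.0, 2001)
vals = np.exp(-x0 ** 2 / (2 * (1 + s))) / np.sqrt(1 + s)
exact = float(((vals[1:] + vals[:-1]) / 2 * np.diff(s)).sum())  # = 2(sqrt(2)-1) at x0 = 0

approx = characteristic_mc(h, x0)
```

The agreement is up to Monte Carlo noise and the O(dt) bias of the Riemann sum along paths.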
Theorem 2 (See [8], Theorem 6.6). Let the condition
(4) lim_{t→0+} sup_{x∈R^d} ∫_{R^d} k_t(x, y) ν(dy) = 0
hold. Then f_t(x) = ∫_{R^d} k_t(x, y) ν(dy) is the characteristic of a W-functional A^ν_t.
Let us return to SDE (1). Let (ϕ_t)_{t≥0} be a solution of equation (1) with bounded measurable a. The transition probability density p^ϕ_t(y, z) of the process (ϕ_t)_{t≥0} satisfies the Gaussian estimates (see [2])
(5) C_1 t^{−d/2} exp(−c_1 |y − z|²/t) ≤ p^ϕ_t(y, z) ≤ C_2 t^{−d/2} exp(−c_2 |y − z|²/t),
where C_1, c_1, C_2, c_2 are positive constants that depend only on d, T, and ‖a‖_∞. Denote by k^w_t(x, y) the kernel k_t(x, y) built on the transition density of the Wiener process, i.e.
k^w_t(x, y) = ∫_0^t (2πs)^{−d/2} exp(−|x − y|²/(2s)) ds.
It is easy to see that for d > 1 the kernel k^w_t(x, y) has a singularity at x = y, so the integral ∫_{R^d} k^w_t(x, y) ν(dy) is well defined not for all measures. It follows from (5) that a measure ν satisfies condition (4) if and only if it satisfies (4) with k_t(x, y) replaced by k^w_t(x, y). Therefore (4) is equivalent to the following condition.
The proof is a slight modification of that for the case ν(dx) = f(x)dx given in [1], Theorem 4.5 (see also [22], Exercise 1 on p. 12), where f is a non-negative Borel measurable function; we use representation (7) in the proof. If a measure satisfies Condition (A), then it is called a measure of Kato's class (see [16]).
Example 1. Let d = 1. For each y ∈ R, the measure ν = δ_y satisfies Condition (A) and corresponds to the W-functional
L_t(y) = lim_{ε→0+} (1/(2ε)) ∫_0^t 1_{(y−ε, y+ε)}(w_s) ds,
which is called the local time of the Wiener process at the point y. Assume that ν is a measure satisfying Condition (A); this now means that sup_{x∈R} ν([x, x + 1]) < ∞. Then (see [21], Ch. X, §2) the corresponding W-functional can be represented in the form
A^ν_t = ∫_R L_t(y) ν(dy).
Remark 3. If d ≥ 2, then δ_y does not satisfy Condition (A). This agrees with the well-known fact that the local time at a point does not exist for a multidimensional Wiener process.
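A numerical illustration (added here, not from the original): the local time at zero can be approximated by the normalized occupation time of a small band, and its characteristic at x = 0 is E_0 L_t(0) = ∫_0^t p_s(0, 0) ds = √(2t/π):

```python
import numpy as np

def local_time_mc(y=0.0, t=1.0, eps=0.05, n_paths=20000, n_steps=400, seed=0):
    """Monte Carlo estimate of E_0 L_t(y) for a 1-d Wiener process,
    approximating the local time by the W-functional with density
    h = (2 eps)^{-1} 1_{(y - eps, y + eps)}."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    w = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    occupation = (np.abs(w - y) < eps).sum(axis=1) * dt  # time spent near y
    return float(occupation.mean() / (2 * eps))

estimate = local_time_mc()
exact = np.sqrt(2 / np.pi)   # characteristic E_0 L_1(0) of the local time at 0
```

Both the band width eps and the time step introduce a small downward bias, so only rough agreement should be expected.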
Let S be a bounded smooth surface in R^d and denote by σ_S the surface measure on it. Then for any non-negative bounded function f, the measure ν(dy) = f(y)σ_S(dy) satisfies Condition (A).
We will need the following modification of Khas'minskii's lemma (see [14] or [22], Ch. 1, Lemma 2.1). For the convenience of the reader we give a proof of this variant of the lemma.
Lemma 1. Let the characteristic f_t of a W-functional A_t satisfy condition (4). Then for all p > 0, t ≥ 0, there exists a constant C depending on p, t, and ‖f_t‖_∞ such that for all x ∈ R^d,
E_x A_t^p ≤ C.
To prove the lemma we make use of the following proposition.
Taking into account (2), we see that (9) is valid for all t ≥ 0. Lemma 1 is proved.
The exceptional set may depend on x. Here the measure P_x is the distribution of the process (ϕ_t(x))_{t≥0}. To emphasize that we consider the functional w.r.t. P_x, we will write A^ν_t(ϕ(x)).
If for the measure ν Condition (A) holds, then the functionals of the processes (ϕ t ) t≥0 and (w t ) t≥0 are well defined. Denote the corresponding measurable mappings by A ν,ϕ t and A ν,w t and the corresponding additive functionals by A ν,ϕ t (ϕ) and A ν,w t (w). By the Girsanov theorem, for each x ∈ R d , the distributions of the processes (ϕ t (x)) t≥0 and (x + w t ) t≥0 are equivalent. The question naturally arises whether the mappings A ν,ϕ t and A ν,w t are the same. The answer is positive and it is formulated in the next Lemma.

Lemma 2. Let ν satisfy Condition (A). Then for any x ∈ R^d and t ≥ 0, the functionals A^{ν,ϕ}_t and A^{ν,w}_t coincide almost surely w.r.t. P_x.
By the Girsanov theorem the corresponding P-limits coincide, where P-lim means the limit in probability. It remains to show that the characteristics converge as required in Theorem 1. This part of the proof is routine and technical, so we postpone it to the Appendix.

The main result
Let a be a bounded measurable function of bounded variation. Denote by ∇a the matrix (∂a_i/∂x_j)_{1≤i,j≤d}. Further on we suppose that for all 1 ≤ i, j ≤ d, the measure |µ^{ij}|, µ^{ij} = ∂a_i/∂x_j, satisfies Condition (A). By Theorem 2, there exist W-functionals A^{µ^{ij,±},w}_t (we will denote the corresponding mappings by A^{ij,±}_t(·)) with their characteristics defined by the formula
(11) f^{ij,±}_t(x) = ∫_{R^d} k^w_t(x, y) µ^{ij,±}(dy).
The main result on the differentiability of the flow generated by equation (1) with respect to the initial conditions is given in the following theorem.
Theorem 3. Let a be as above. Then for each t ≥ 0 and almost all x ∈ R^d, the Sobolev derivative ∇_x ϕ_t(x) has a modification Y_t(x) that satisfies the equation
(12) Y_t(x) = E + ∫_0^t dA_s(ϕ(x)) Y_s(x),
where E is the d × d identity matrix and the integral on the right-hand side of (12) is the Lebesgue–Stieltjes integral with respect to the continuous function of bounded variation t → A_t(ϕ(x)).
Remark 5. The differentiability was proved in [10,20]. We give a representation for the derivative. Note that the Sobolev derivative is defined up to the Lebesgue null set.

Remark 6. Consider a non-homogeneous SDE
Similarly to the reasoning of Section 1, a theory of non-homogeneous additive functionals of non-homogeneous Markov processes can be constructed. All the formulations and proofs can be rewritten literally, with the natural necessary modifications. Unfortunately, there are no corresponding references, therefore we do not carry out the corresponding arguments here.
Consider examples of functions a for which |µ ij |, 1 ≤ i, j ≤ d, satisfy Condition (A).
Example 5. Let a_i be a Lipschitz function for all 1 ≤ i ≤ d. By Rademacher's theorem [9] the Fréchet derivatives µ^{ij} = ∂a_i/∂x_j exist almost everywhere w.r.t. the Lebesgue measure. It is easy to verify that they are bounded and that the Fréchet derivative coincides with the derivative taken in the generalized sense. Then |µ^{ij}| satisfies Condition (A).
Example 6. Let D ⊂ R^d be a bounded domain with smooth boundary ∂D and let
(14) a(x) = h 1_D(x), h ∈ R^d.
Then µ^{ij}(dx) = −h_i n_j(x) σ_{∂D}(dx), where n(x) = (n_1(x), . . . , n_d(x)) is the outward unit normal vector at the point x ∈ ∂D and σ_{∂D} is the surface measure on ∂D. Condition (A) is also satisfied by the measures generated by a being a linear combination of functions of the form (14). Further examples of a can be obtained as limits of sequences of functions of the form (14).
In the one-dimensional case, all functions of bounded variation generate measures satisfying Condition (A) (see Example 1).
See also Example 4, which shows that if the |µ^{ij}| are measures of "Hausdorff type" with a parameter greater than d − 1, then a satisfies the assumptions of the Theorem.

Approximation by SDEs with smooth coefficients
For n ≥ 1, let g_n ∈ C^∞_0(R^d) be a non-negative function such that ∫_{R^d} g_n(z) dz = 1 and g_n(x) = 0 for |x| ≥ 1/n. Put
(15) a_n(x) = (g_n ∗ a)(x),
where the function a satisfies the assumptions of Theorem 3. Note that
(16) sup_n ‖a_n‖_∞ ≤ ‖a‖_∞,
and a_n → a, n → ∞, in L_{1,loc}(R^d). Passing to a subsequence we may assume without loss of generality that a_n(x) → a(x), n → ∞, for almost all x w.r.t. the Lebesgue measure.
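A one-dimensional sketch of the mollification (15), added for illustration; the bump profile below is the standard C_0^∞ example, and the drift a(x) = sign(x) is a hypothetical bounded function of bounded variation:

```python
import numpy as np

def mollify(a, n, xs):
    """Evaluate a_n = g_n * a at the points xs (one-dimensional sketch),
    where g_n is a C_0^infty bump supported in [-1/n, 1/n] with int g_n = 1."""
    r = 1.0 / n
    u = np.linspace(-r, r, 401)[1:-1]        # interior of supp(g_n)
    du = u[1] - u[0]
    g = np.exp(-1.0 / (1.0 - (u / r) ** 2))  # standard smooth bump profile
    g /= g.sum() * du                        # normalize so that int g_n = 1
    return np.array([np.sum(g * a(x - u)) * du for x in xs])

a = lambda x: np.sign(x)                     # bounded BV drift, d = 1
xs = np.linspace(-1.0, 1.0, 201)
a_5 = mollify(a, 5, xs)                      # n = 5 : smeared over [-0.2, 0.2]
a_50 = mollify(a, 50, xs)                    # n = 50: smeared over [-0.02, 0.02]
a_at_0 = mollify(a, 5, [0.0])[0]             # = 0 by symmetry of g_n and a
```

One can check (16) directly here: sup_n ‖a_n‖_∞ ≤ ‖a‖_∞ = 1, and a_n(x) → a(x) at every x ≠ 0.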
Consider the SDE
(17) dϕ_{n,t}(x) = a_n(ϕ_{n,t}(x))dt + dw_t, ϕ_{n,0}(x) = x.
Since a_n is smooth, the flow ϕ_{n,t}(x) is differentiable in x; put Y_{n,t}(x) = ∇_x ϕ_{n,t}(x). Then Y_{n,t}(x) satisfies the equation
(18) Y_{n,t}(x) = E + ∫_0^t ∇a_n(ϕ_{n,s}(x)) Y_{n,s}(x) ds,
where E is the d-dimensional identity matrix.
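For a smooth drift, (18) can be integrated jointly with (17) and compared against a finite difference of the flow computed with the same noise; a one-dimensional sketch with the hypothetical smooth drift a = tanh:

```python
import numpy as np

def flow_and_derivative(a, da, x0, T=1.0, n_steps=1000, seed=0):
    """Euler scheme (d = 1) for the flow phi_t(x0) of (17) together with
    Y_t = d phi_t / d x0 from the variational equation (18):
    dY_t = a'(phi_t) Y_t dt, Y_0 = 1."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x, y = float(x0), 1.0
    for dw in rng.normal(scale=np.sqrt(dt), size=n_steps):
        x, y = x + a(x) * dt + dw, y + da(x) * y * dt  # same phi_s in both
    return x, y

a = np.tanh                                # smooth bounded drift
da = lambda x: 1.0 / np.cosh(x) ** 2       # its derivative

# Finite-difference check with the SAME Wiener path (same seed).
h = 1e-4
x1, y1 = flow_and_derivative(a, da, 0.3)
x2, _ = flow_and_derivative(a, da, 0.3 + h)
fd = (x2 - x1) / h
```

Because both runs use the same noise realization, the finite difference agrees with Y_T up to O(h) plus discretization error.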
Lemma 3. For each p ≥ 1:
1) for all t ≥ 0 and any compact set U ⊂ R^d,
sup_{n≥1} sup_{x∈U} E sup_{0≤s≤t} ‖ϕ_{n,s}(x)‖^p < ∞;
2) for all t ≥ 0 and x ∈ R^d, E‖ϕ_{n,t}(x) − ϕ_t(x)‖^p → 0, n → ∞,
where ‖ · ‖ is the norm in the space R^d.
Proof. Statement 1) follows from the uniform boundedness of the coefficients and the finiteness of the moments of a Wiener process; 2) is proved in [18], Theorem 3.4.
By the properties of convolution of generalized functions (see [24], Ch. 2, §7), ∇a_n = g_n ∗ ∇a, i.e. µ^{ij}_n(dx) = (g_n ∗ µ^{ij})(x) dx. For each n ≥ 1, 1 ≤ i, j ≤ d, the measure µ^{ij}_n satisfies Condition (A) (see Example 2). Let µ^{ij}_n = µ^{ij,+}_n − µ^{ij,−}_n be the Hahn–Jordan decomposition of the measure µ^{ij}_n. Then, according to Theorem 2, there exist W-functionals A^{ij,±}_{n,t}(w) of a Wiener process on R^d which correspond to the measures µ^{ij,±}_n and have characteristics of the form
(19) f^{ij,±}_{n,t}(x) = ∫_{R^d} k^w_t(x, y) µ^{ij,±}_n(dy).
Since the measures µ^{ij,±}_n have continuous densities, the functional A^{ij}_{n,t}(w) = A^{ij,+}_{n,t}(w) − A^{ij,−}_{n,t}(w) is given by the formula
A^{ij}_{n,t}(w) = ∫_0^t (∂a_{n,i}/∂x_j)(w_s) ds.
The following simple proposition, used in the proof of Lemma 4, is easily checked.
Then the relation f_{n,t} = g_n ∗ f_t is fulfilled for the characteristics of the corresponding W-functionals of the Wiener process.
Proof of Lemma 4. To prove the convergence of the functionals in mean square it is sufficient to show that for each T > 0, 1 ≤ i, j ≤ d,
sup_{t∈[0,T]} sup_{x∈R^d} |f^{ij,±}_{n,t}(x) − f^{ij,±}_t(x)| → 0, n → ∞
(see Theorem 1). Then the uniform convergence in probability follows from Proposition 2.
For each 0 < δ < t, we split the difference into two parts, I_1 and I_2. Because of Condition (A), for each ε > 0, we can choose δ so small that I_2 is less than ε/4. To obtain the same estimate for I_1, note that by the associative, distributive and commutative properties of convolution (see [24], Ch. II, §7), we get I < ε/2. Consider II. The function involved is continuous in (t, x); hence there exists n_0 such that for all n > n_0, sup_{δ<t<T} II < ε/2.
For the proof we make use of the following proposition.
Proposition 5. Let X, Y be complete separable metric spaces and (Ω, F, P) a probability space. Let measurable mappings ξ_n : Ω → X, h_n : X → Y, n ≥ 0, be such that:
1) ξ_n → ξ_0, n → ∞, in probability;
2) h_n → h_0, n → ∞, in measure ν, where ν is a probability measure on X;
3) for all n ≥ 1 the distribution P_{ξ_n} of ξ_n is absolutely continuous w.r.t. the measure ν;
4) the sequence of densities {dP_{ξ_n}/dν : n ≥ 1} is uniformly integrable w.r.t. the measure ν.
Then h_n(ξ_n) → h_0(ξ_0), n → ∞, in probability.
According to Lemma 4, A^{ij}_{n,t}(w) → A^{ij}_t(w), n → ∞, in probability uniformly in t ∈ [0, T]. This means that h_n → h, n → ∞, as elements of C([0, T]) in measure P. So the second assumption of Proposition 5 is satisfied. The absolute continuity of the distribution of (ϕ_{n,t}(x))_{0≤t≤T} w.r.t. the measure P follows from Girsanov's theorem; the density is given by the formula
dP_{ϕ_n(x)}/dP = exp{∫_0^T (a_n(w_s(x)), dw_s(x)) − (1/2) ∫_0^T ‖a_n(w_s(x))‖² ds}.

As ‖a_n‖_∞ ≤ ‖a‖_∞, where ‖ · ‖ is the norm in R^d, we have that for each p > 1,
E exp{p ∫_0^T (a_n(w_s(x)), dw_s(x)) − (p²/2) ∫_0^T ‖a_n(w_s(x))‖² ds} = 1
(cf. [17], Theorem 6.1). The uniform integrability of the family {dP_{ϕ_n(x)}/dP : n ≥ 1} follows from the estimate
E exp{p ∫_0^T (a_n(w_s(x)), dw_s(x)) − (p/2) ∫_0^T ‖a_n(w_s(x))‖² ds}
= E exp{p ∫_0^T (a_n(w_s(x)), dw_s(x)) − (p²/2) ∫_0^T ‖a_n(w_s(x))‖² ds + ((p² − p)/2) ∫_0^T ‖a_n(w_s(x))‖² ds}
≤ exp{((p² − p)/2) ‖a_n‖²_∞ T} ≤ exp{((p² − p)/2) ‖a‖²_∞ T},
valid for p > 1. Thus all the assumptions of Proposition 5 are fulfilled, and we obtain A^{ij}_{n,t}(ϕ_n(x)) → A^{ij}_t(ϕ(x)), n → ∞, in probability. The Lemma is proved.
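The discrete analogue of this Girsanov reweighting can be checked numerically (an illustration added here; the drift a = tanh is a hypothetical bounded example): reweighting functionals of x_0 + w by the density reproduces expectations under the Euler dynamics with drift a.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 200, 20000
dt = T / n_steps
a = np.tanh                                  # bounded drift, d = 1
x0 = 0.0

# Driftless paths x0 + w_t and the (discrete) Girsanov density
# rho = exp( sum a(w_{k-1}) dw_k - 1/2 sum a(w_{k-1})^2 dt ).
dw = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
w = x0 + np.cumsum(dw, axis=1)
w_prev = np.hstack([np.full((n_paths, 1), x0), w[:, :-1]])
rho = np.exp((a(w_prev) * dw).sum(axis=1)
             - 0.5 * (a(w_prev) ** 2).sum(axis=1) * dt)

# Direct Euler simulation of phi_T with drift a, independent noise.
x = np.full(n_paths, x0)
for _ in range(n_steps):
    x = x + a(x) * dt + rng.normal(scale=np.sqrt(dt), size=n_paths)

lhs = float((w[:, -1] * rho).mean())  # E[(x0 + w_T) rho]
rhs = float(x.mean())                 # E[phi_T]
```

For the discrete schemes the two expectations coincide exactly, so any discrepancy is pure Monte Carlo noise.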

Convergence of the derivatives of solutions
Recall that Y_t(x), Y_{n,t}(x), t ≥ 0, x ∈ R^d, are the solutions of equations (12), (18), respectively. In this section we show the convergence of the sequence {Y_{n,t}(x) : n ≥ 1} in probability uniformly in t. This, together with Lemma 3, allows us to prove Theorem 3.
For the proof we need the following two propositions. The first one is a variant of the Gronwall–Bellman inequality and can be obtained by a standard argument.
Proposition 6. Let x(t) be a continuous function on [0, +∞), C(t) a non-negative continuous function on [0, +∞), and K(t) a non-negative, non-decreasing function with K(0) = 0. If for all 0 ≤ t ≤ T,
x(t) ≤ C(t) + ∫_0^t x(s) dK(s),
then for all 0 ≤ t ≤ T,
x(t) ≤ sup_{0≤s≤t} C(s) · exp{K(t)}.
Proof (of Proposition 7). The statement follows from Lemma 1 and inequalities (5), which allow us to obtain estimates uniform in n ≥ 1.
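A numerical illustration of Proposition 6 (added here; the concrete C and K below are arbitrary test data, and the bound is assumed in the standard form x(t) ≤ sup_{0≤s≤t} C(s) · exp{K(t)}): solve the equality version of the inequality on a grid with left-endpoint Stieltjes sums and compare with the bound.

```python
import numpy as np

t = np.linspace(0.0, 2.0, 2001)
C = 1.0 + 0.5 * np.sin(t)       # continuous, non-negative C(t)
K = t ** 2 / 2.0                # non-negative, non-decreasing, K(0) = 0
dK = np.diff(K)

# Discrete solution of x(t) = C(t) + int_0^t x(s) dK(s)
# (left-endpoint Stieltjes sums).
x = np.empty_like(t)
x[0] = C[0]
for k in range(1, len(t)):
    x[k] = C[k] + np.sum(x[:k] * dK[:k])

# The Gronwall-Bellman bound (in the assumed form of Proposition 6).
bound = np.maximum.accumulate(C) * np.exp(K)
```

The discrete solution satisfies x_k ≤ max_{i≤k} C_i · Π_{j<k}(1 + ΔK_j) ≤ max_{i≤k} C_i · exp{K_k}, mirroring the proof of the continuous statement.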
Making use of the Gronwall–Bellman lemma we get the required estimate. Statement 1) now follows from estimate (24) and Proposition 7.
Consider the first summand on the right-hand side of (26). Put g_n(s) = A^+_{n,s}(ϕ_n(x)), g(s) = A^+_s(ϕ(x)), and f(s) = Y_s(x). Then Lemma 5, Proposition 7, and Proposition 8 provide that
sup_{0≤u≤t} ‖∫_0^u (dA^+_{n,s}(ϕ_n(x)) − dA^+_s(ϕ(x))) Y_s(x)‖ · exp{Var A_{n,t}(ϕ_n(x))} → 0, n → ∞,
in probability. Similarly one proves that the second summand on the right-hand side of (26) tends to 0 as n → ∞. This and statement 1) entail statement 2) of the Lemma.

The proof of Theorem 3
Proof. Define the approximating equations by (17), where a_n, n ≥ 1, are determined by (15). From Lemma 3 and the dominated convergence theorem we get relation (27). Arguing similarly and taking into account Lemma 6, we arrive at the relation
(28) sup_{t∈[0,T]} ∫_U |Y^{ij}_{n,t}(x) − Y^{ij}_t(x)|^p dx → 0, n → ∞,
almost surely, which holds for all 1 ≤ i, j ≤ d, p ≥ 1. Since the Sobolev space is a Banach space, relations (27), (28) imply that Y_t(x) is the matrix of derivatives of the solution to (1).