The Yamada-Watanabe-Engelbert theorem for general stochastic equations and inequalities

A general version of the Yamada-Watanabe and Engelbert results relating existence and uniqueness of strong and weak solutions for stochastic equations is given. The results apply to a wide variety of stochastic equations including classical stochastic differential equations, stochastic partial differential equations, and equations involving multiple time transformations.


Introduction
In the study of stochastic equations, it is common to distinguish between "strong" solutions and "weak" or distributional solutions. Roughly, a strong solution is one that exists for a given probability space and given stochastic inputs, while existence of a weak solution simply ensures that a solution exists on some probability space for some stochastic inputs having the specified distributional properties. For example, given a Brownian motion W defined on a probability space (Ω, F, P), for X to be a strong solution of the Itô equation

X(t) = X(0) + ∫_0^t σ(X(s)) dW(s) + ∫_0^t b(X(s)) ds,

X must, in particular, be adapted to the filtration generated by W.

The issues addressed in these results arise naturally for any stochastic equation, and extensions to other settings occur frequently in the literature (1; 12; 13; 18; 21). The goal of the present paper is to give general results that cover all these cases as well as other settings in which these questions have not yet been addressed.
The notion of weak uniqueness used by Engelbert and also by Jacod (12), requiring that the joint distribution of the solution and the stochastic inputs be uniquely determined, is required in our extension of Engelbert's result as well, except in the simple setting of Section 2. For Itô equations, however, Cherny (6) has shown that it is sufficient to assume uniqueness in distribution for the solution X; in particular, uniqueness of the distribution of X implies uniqueness of the joint distribution of X and W. That result appears to depend heavily on the explicit construction of W from the solution X.
In Section 2, we consider simple equations of the form Γ(X, Y ) = 0, where Y represents the stochastic inputs and X is the solution. In this setting, strong existence is essentially existence of a measurable selection and the results are straightforward; however, considering the problem in this simple setting helps clarify the definitions and leads to further insight into what is really "going on." In particular, we see that the result has little to do with equations, but is really a simple consequence of the convexity of the collection of joint distributions of solutions.
The Yamada-Watanabe and Engelbert results do not follow from the results in Section 2, because measurability issues are key to the notion of strong uniqueness that is useful for stochastic differential equations. In Section 3, we introduce compatibility restrictions that enforce the necessary measurability and give results that cover stochastic differential equations of the usual form, as well as stochastic partial differential equations and other equations involving infinite dimensional semimartingales. We discover again in this more structured setting that convexity of the collection of joint distributions of solutions is still the foundation of the result and hence the result extends to problems involving inequalities or any other conditions that determine convex subsets of the collection of joint distributions.
For a Polish space S, M(S) will denote the collection of Borel measurable functions on S, B(S) the bounded, Borel measurable functions, and P(S) the Borel probability measures on S. For an S-valued random variable Y, µ_Y ∈ P(S) will denote its distribution.
The author would like to thank Philip Protter for helpful comments on an earlier version of the paper and the referee for pointing out an error in the original proof of Proposition 2.10.

Simple stochastic equations
Let S1 and S2 be Polish spaces, and let Γ : S1 × S2 → R be a Borel measurable function. Let Y be an S2-valued random variable with distribution ν. We are interested in solutions of the equation

Γ(X, Y) = 0. (2.2)

Stochastic equations arising in many contexts are at least equivalent to an equation of the form (2.2).
Ordinarily, one is attempting to use the stochastic equation to specify a stochastic model, and it is really the distribution of X or the joint distribution of (X, Y) that is of primary interest. Consequently, it is both natural and useful to think of Γ and ν, rather than Γ and Y, as the primary data of the problem, and we define a solution of (2.2) to be any pair of random variables (X, Y) with values in S1 × S2 such that

Γ(X, Y) = 0 a.s. and µ_Y = ν. (2.3)

We will say that (X, Y) is a solution for (Γ, ν) if (2.3) holds.
Clearly, (X, Y) being a solution is a property of the joint distribution of (X, Y), and following the terminology of Engelbert (7) and Jacod (12), we refer to the joint distribution of (X, Y) as a joint solution measure. In particular, µ is a joint solution measure if µ(S1 × ·) = ν and

∫_{S1×S2} |Γ(x, y)| µ(dx × dy) = 0.

(Without loss of generality, we can assume that Γ is bounded, so integrability is not an issue.) Let S_{Γ,ν} ⊂ P(S1 × S2) denote the collection of joint solution measures. Clearly, S_{Γ,ν} is convex, and if Γ is continuous, then S_{Γ,ν} is closed in the weak topology.
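The convexity claim can be checked directly, since both defining conditions are linear in µ (here the joint-solution condition is written as an integral against |Γ|, using that Γ may be taken bounded):

```latex
% Convexity of the set of joint solution measures (sketch).
% Take \mu_1, \mu_2 \in S_{\Gamma,\nu}, \alpha \in [0,1], and
% \mu = \alpha\mu_1 + (1-\alpha)\mu_2.  Then for B \in \mathcal{B}(S_2),
\mu(S_1\times B) = \alpha\,\mu_1(S_1\times B) + (1-\alpha)\,\mu_2(S_1\times B)
                 = \alpha\,\nu(B) + (1-\alpha)\,\nu(B) = \nu(B),
% and, with \Gamma bounded,
\int_{S_1\times S_2} |\Gamma(x,y)|\,\mu(dx\times dy)
  = \alpha\int |\Gamma|\,d\mu_1 + (1-\alpha)\int |\Gamma|\,d\mu_2 = 0,
% so \mu \in S_{\Gamma,\nu}.
```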
It is natural to hope that a solution for (Γ, ν) will have the property that X = F(Y) for some measurable F : S2 → S1, that is, that Y completely characterizes the randomness in the problem. However, it is easy to see from simple examples that X will not be of this form in general.
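One such simple example (hypothetical, not taken from the original text) can be checked numerically: take Γ(x, y) = x² − 1, so that for every y the solution set is {−1, +1}, and let X be a fair sign independent of Y. Then (X, Y) is a solution, but X is not a function of Y:

```python
import random

# Toy illustration: Gamma(x, y) = x**2 - 1, so every y admits the two
# solutions x = -1 and x = +1.  Choosing X by an independent coin flip
# gives a solution (X, Y) with X NOT a measurable function of Y.
random.seed(0)

def gamma(x, y):
    return x * x - 1

samples = []
for _ in range(10000):
    y = random.gauss(0.0, 1.0)
    x = 1 if random.random() < 0.5 else -1  # randomness independent of Y
    samples.append((x, y))

# Every sample satisfies the equation exactly.
assert all(gamma(x, y) == 0 for x, y in samples)

# Both solution values occur with roughly equal frequency, independently
# of Y, so knowing Y gives no information about the sign of X.
print(sum(1 for x, _ in samples if x == 1) / len(samples))
```

The printed frequency is close to 1/2; since the sign of X is independent of Y, no measurable F with X = F(Y) a.s. can exist.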
Definition 2.1. A solution (X, Y) for (Γ, ν) is a strong solution if there exists a Borel measurable function F : S2 → S1 such that X = F(Y) a.s.

Existence of a strong solution is essentially the existence of a measurable selection from Γ_y^0 = {x : Γ(x, y) = 0}. In particular, if {(x, y) : Γ(x, y) = 0} is closed and ν{y : Γ_y^0 ≠ ∅} = 1, then there exists a strong solution. See Wagner (19).
Note that (X, Y ) being a strong solution is a property of the distribution of (X, Y ) but that the collection of joint solution measures corresponding to strong solutions need not be convex. In fact, it is convex only if there is at most one strong solution.
Lemma 2.2. Let µ ∈ P(S1 × S2) satisfy µ(S1 × ·) = ν. Then there exists a transition function η(y, dx) such that µ(dx × dy) = η(y, dx)ν(dy). Furthermore, µ is the joint distribution of a strong solution if and only if there exists a Borel measurable F : S2 → S1 such that η(y, ·) = δ_{F(y)} for ν-almost every y, in which case µ_{X,Y} = µ implies X = F(Y) a.s.

Proof. The first statement is essentially just the existence of a regular conditional distribution. For the second, if η(y, ·) = δ_{F(y)} ν-almost everywhere and µ_{X,Y} = µ, then P{X = F(Y)} = ∫ η(y, {F(y)}) ν(dy) = 1; conversely, X = F(Y) a.s. forces η(y, ·) = δ_{F(y)} for ν-almost every y.
There are several notions of uniqueness that are useful. The strongest notion is pointwise uniqueness (or pathwise uniqueness if S1 and S2 are function spaces).

Definition 2.3. Pointwise uniqueness holds for (2.3) if X1, X2, and Y defined on the same probability space with µ_{X1,Y}, µ_{X2,Y} ∈ S_{Γ,ν} implies X1 = X2 a.s.

Engelbert (7) introduces a slightly weaker notion which in our present context is analogous to the following:

Definition 2.4. For µ ∈ S_{Γ,ν}, µ-pointwise uniqueness holds if X1, X2, and Y defined on the same probability space with µ_{X1,Y} = µ_{X2,Y} = µ implies X1 = X2 a.s.

Remark 2.5. If µ ∈ S_{Γ,ν} corresponds to a strong solution, then µ-pointwise uniqueness holds.

Lemma 2.6. If every solution is a strong solution, then pointwise uniqueness holds.
Proof. Let G1 and G2 be functions corresponding to strong solutions, let Y and ξ be independent with µ_Y = ν and ξ uniform on [0, 1], and define

X = 1_{[0,1/2]}(ξ)G1(Y) + 1_{(1/2,1]}(ξ)G2(Y).

Then (X, Y) is a solution. Alternatively, we could simply observe that µ defined by

µ = (1/2)µ_{G1(Y),Y} + (1/2)µ_{G2(Y),Y}

must be a solution by the convexity of S_{Γ,ν}. Since every solution is a strong solution, X must be a measurable function of Y alone, and it follows that G1 = G2, ν-almost everywhere. Consequently, any two solutions defined from the same Y agree almost surely, and hence pointwise uniqueness holds.
Lemma 2.7. If µ ∈ S_{Γ,ν} and µ-pointwise uniqueness holds, then µ is the joint distribution of a strong solution; hence, if there is a µ ∈ S_{Γ,ν} that does not correspond to a strong solution, then pointwise uniqueness does not hold.
Proof. Let Y, ξ1, and ξ2 be independent with µ_Y = ν and ξ1 and ξ2 uniformly distributed on [0, 1]. Let η(y, dx) be a regular conditional distribution of X given Y = y under µ, and let H : S2 × [0, 1] → S1 be Borel measurable with P{H(y, ξ1) ∈ ·} = η(y, ·) for each y (such an H exists since S1 is Polish). Setting X1 = H(Y, ξ1) and X2 = H(Y, ξ2), we have µ_{X1,Y} = µ_{X2,Y} = µ, so µ-pointwise uniqueness gives X1 = X2 a.s. But X1 and X2 are conditionally independent given Y, so η(Y, ·) must be degenerate; that is, there exists a Borel measurable F with η(y, ·) = δ_{F(y)} for ν-almost every y, implying that µ corresponds to a strong solution.
Corollary 2.8. If S_{Γ,ν} ≠ ∅ and pointwise uniqueness holds, then on any probability space supporting a random variable Y with distribution ν, there exists a unique solution for (Γ, ν) given by a measurable function of Y.
Two notions of uniqueness in law or weak uniqueness are useful.

Definition 2.9. Joint uniqueness in law holds for (Γ, ν) if S_{Γ,ν} contains at most one measure. Uniqueness in law holds for (Γ, ν) if all µ ∈ S_{Γ,ν} have the same marginal distribution on S1, that is, the distribution of X is the same for all solutions (X, Y).

Proposition 2.10. The following are equivalent:
(a) Pointwise uniqueness holds.
(b) µ-pointwise uniqueness holds for every µ ∈ S_{Γ,ν}.
(c) Joint uniqueness in law holds.
(d) Uniqueness in law holds.

Proof. Clearly, pointwise uniqueness implies µ-pointwise uniqueness for every µ ∈ S_{Γ,ν}. If µ-pointwise uniqueness holds for every µ ∈ S_{Γ,ν}, then every solution is strong by Lemma 2.7, and pointwise uniqueness follows by Lemma 2.6; hence (a) and (b) are equivalent. If (a) holds, then every solution is strong and, as in the proof of Lemma 2.6, the corresponding functions agree ν-almost everywhere, so S_{Γ,ν} contains at most one measure and (a) implies (c). Clearly, (c) implies (d).
Finally, we show that (d) implies (a). Suppose pointwise uniqueness does not hold. Then there exist X1, X2, and Y defined on the same probability space such that µ_{X1,Y}, µ_{X2,Y} ∈ S_{Γ,ν} and P{X1 ≠ X2} > 0. If µ_{X1} ≠ µ_{X2}, uniqueness in law already fails, so assume µ_{X1} = µ_{X2}. Since S1 is Polish and P{X1 ≠ X2} > 0, there exists B ∈ B(S1) such that, relabeling X1 and X2 if necessary, P{X1 ∈ B, X2 ∉ B} > 0. Setting X = 1_B(X1)X2 + 1_{B^c}(X1)X1, (X, Y) is a solution, but

P{X ∈ B} = P{X1 ∈ B, X2 ∈ B} = P{X1 ∈ B} − P{X1 ∈ B, X2 ∉ B} < P{X1 ∈ B},

so X cannot have the same distribution as X1 or X2, and hence uniqueness in law does not hold.
We close this section with the observation that if we drop any mention of the equation (2.3) and simply require that S_{Γ,ν} be a convex subset of P(S1 × S2) such that µ ∈ S_{Γ,ν} implies µ(S1 × ·) = ν, and say that (X, Y) is a solution for (Γ, ν) if µ_{X,Y} ∈ S_{Γ,ν}, then all of the definitions continue to make sense and all of the results continue to hold, except for Proposition 2.10. In Proposition 2.10, the equivalence of (a) and (b) continues to hold, (a) implies (c), and (c) implies (d).
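As an illustration of such a convex family not coming from an equation (a hypothetical sketch, with g and c not from the text above), one can impose an inequality constraint together with a moment bound:

```latex
% A convex family of "solution" measures defined by constraints rather
% than by an equation: for Borel \Gamma and g and a constant c, set
S \;=\; \Big\{ \mu \in \mathcal{P}(S_1\times S_2) \;:\;
      \mu(S_1\times\cdot)=\nu,\quad
      \mu\{(x,y) : \Gamma(x,y) \le 0\} = 1,\quad
      \int g\,d\mu \le c \Big\}.
% Each condition is linear or affine in \mu, so S is convex, and the
% definitions and results of this section, apart from Proposition 2.10,
% apply verbatim.
```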

Stochastic equations with compatibility restrictions
Let E1 and E2 be Polish spaces, and let S1 = D_{E1}[0, ∞) and S2 = D_{E2}[0, ∞) denote the corresponding spaces of cadlag functions with the Skorohod topology. For an S_i-valued random variable Z, let F_t^Z = σ(Z(s) : s ≤ t).

Definition 3.1. An S1-valued random variable X is compatible with an S2-valued random variable Y if for each t ≥ 0 and each h ∈ B(S2),

E[h(Y) | F_t^X ∨ F_t^Y] = E[h(Y) | F_t^Y].
The notion of compatibility is essentially (4.5) of Jacod (12). It is central to the extension of the results in Section 2 to stochastic differential equations and other, more general, stochastic equations. Buckdahn, Engelbert, and Răşcanu (5) state an equivalent condition in terms of martingales. If Y has independent increments, then compatibility can be restated as an independence condition. The proof of the following lemma is straightforward.
Lemma 3.2. If Y has independent increments, then X is compatible with Y if and only if for each t ≥ 0, {Y(t + s) − Y(t) : s ≥ 0} is independent of F_t^X ∨ F_t^Y.

We generalize the notion of compatibility in order to allow for stochastic equations involving processes with index sets other than [0, ∞). Note that if B_α^{S1} is a sub-σ-algebra of B(S1) and X is an S1-valued random variable on (Ω, F, P), then F_α^X ≡ {{X ∈ D} : D ∈ B_α^{S1}} is a sub-σ-algebra of F.

Definition 3.3. Let A be an index set, and for each α ∈ A, let B_α^{S1} be a sub-σ-algebra of B(S1) and B_α^{S2} be a sub-σ-algebra of B(S2). Let Y be an S2-valued random variable. An S1-valued random variable X is compatible with Y if for each α ∈ A and each h ∈ B(S2),

E[h(Y) | F_α^X ∨ F_α^Y] = E[h(Y) | F_α^Y]. (3.5)

C = {(B_α^{S1}, B_α^{S2}) : α ∈ A} will be referred to as a compatibility structure, and we will say X is C-compatible with Y when we want to emphasize the particular choice of C.

Remark 3.4. If A = [0, ∞) and B_t^{S_i} is the σ-algebra generated by the coordinate maps π_s^i : z ∈ S_i → z(s) ∈ E_i for s ≤ t, then Definitions 3.1 and 3.3 are the same.
Note that (3.5) is equivalent to requiring that for each h ∈ B(S2),

E[h(Y) | F_α^X ∨ F_α^Y] = h_α(Y), (3.6)

where h_α is a B_α^{S2}-measurable version of the conditional expectation of h given B_α^{S2} under ν, so compatibility is a property of the joint distribution of (X, Y).
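A discrete-time numerical illustration (hypothetical, with a ±1 random walk standing in for a driver with independent increments): a variable built from the past of Y is consistent with compatibility, while an anticipating choice correlates with future increments and therefore cannot satisfy the independence criterion above.

```python
import random
import statistics

# Hypothetical discrete-time sketch of the independence criterion for
# drivers with independent increments: future increments of Y must be
# independent of the past of (X, Y).  A nonzero correlation between a
# candidate X and a future increment certifies failure of compatibility.
random.seed(1)

T, N = 5, 20000
adapted_prod, anticipating_prod = [], []
for _ in range(N):
    steps = [random.choice((-1, 1)) for _ in range(T)]
    Y = [0]
    for s in steps:
        Y.append(Y[-1] + s)
    X_adapted = Y[1]        # depends only on Y up to time 1
    X_anticipating = Y[T]   # depends on the whole path of Y
    future_increment = Y[2] - Y[1]
    adapted_prod.append(X_adapted * future_increment)
    anticipating_prod.append(X_anticipating * future_increment)

# E[Y(1)(Y(2)-Y(1))] = 0, consistent with compatibility, while
# E[Y(T)(Y(2)-Y(1))] = 1 for +-1 steps, ruling compatibility out.
print(statistics.mean(adapted_prod), statistics.mean(anticipating_prod))
```

The first estimate is near 0 and the second near 1, matching the exact covariances for ±1 steps.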
Definition 3.6. Let C be a compatibility structure for S1, S2 and µ ∈ P(S1 × S2). Then µ is C-compatible if X is C-compatible with Y whenever (X, Y) is an S1 × S2-valued random variable with distribution µ.

Lemma 3.7. X is compatible with Y if and only if for each α ∈ A and each g ∈ B(B_α^{S1}) (the bounded, B_α^{S1}-measurable functions on S1),

E[g(X) | Y] = E[g(X) | F_α^Y] a.s. (3.7)

Proof. Suppose that X is compatible with Y. Then for f ∈ B(S2) and g ∈ B(B_α^{S1}),

E[g(X)f(Y)] = E[g(X)E[f(Y) | F_α^X ∨ F_α^Y]] = E[g(X)E[f(Y) | F_α^Y]] = E[E[g(X) | F_α^Y]E[f(Y) | F_α^Y]] = E[E[g(X) | F_α^Y]f(Y)],

and (3.7) follows. The reverse implication follows by a similar calculation.
Lemma 3.8. Let C be a compatibility structure and ν ∈ P(S2). Let S_{C,ν} be the collection of µ ∈ P(S1 × S2) with the following properties: µ(S1 × ·) = ν, and µ is C-compatible. Then S_{C,ν} is convex.

Proof. Note that the right side of (3.6) is determined by ν, so µ ∈ S_{C,ν} if and only if µ(S1 × ·) = ν and

∫ f(x, y)h(y) µ(dx × dy) = ∫ f(x, y)h_α(y) µ(dx × dy),

where h_α is a B_α^{S2}-measurable version of the conditional expectation of h given B_α^{S2} under ν, for each h ∈ B(S2), each α ∈ A, and each f ∈ B(B_α^{S1} × B_α^{S2}). Each of these conditions is linear in µ and hence is preserved under convex combinations.
In what follows, Γ will denote a collection of constraints that determine convex subsets of P(S1 × S2), and S_{Γ,C,ν} will denote the convex subset of µ ∈ P(S1 × S2) such that µ fulfills the constraints in Γ, µ is C-compatible, and µ(S1 × ·) = ν.

Examples of convex constraints include finiteness conditions on functionals of (X, Y) and conditions expressing an equation as a limit in probability of measurable approximations, each giving a collection of conditions of the form of (3.8).
Similarly, equations involving the quadratic variation [X] of X can be handled by including a constraint identifying [X] as a limit in probability of sums of squared increments. Equations involving local times L_X(t, y), as, for example, in Engelbert and Schmidt (8), Barlow and Perkins (3), and Le Gall (16), can be handled by including constraints of the corresponding approximating form. Stochastic differential equations driven by Poisson random measures, infinite systems of stochastic differential equations, and stochastic partial differential equations can be formulated in a similar manner.
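As a sketch of the kind of approximating constraint meant here (the exact displays are not preserved in this text; the dyadic-type partitions t_i^n = i/n are an assumption), the quadratic variation can be prescribed by

```latex
\lim_{n\to\infty} E\Big[\,1 \wedge \Big|\,[X]_t
   - \sum_{i} \big(X(t^n_{i+1}\wedge t)-X(t^n_i\wedge t)\big)^2 \Big|\,\Big] = 0,
\qquad t^n_i = i/n .
```

Each approximating condition is the expectation of a bounded functional of (X, Y) and hence linear in the joint distribution µ, so such a constraint determines a convex subset of P(S1 × S2).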
Remark 3.10. For Example 3.9, we could avoid the limit in (3.8) by applying results of Karandikar (14) that give a Borel measurable mapping Λ such that Λ(X, W) gives the stochastic integral almost surely for any compatible pair.

Definition 2.1 continues to apply in the current setting. In the context of stochastic differential equations, a strong solution is sometimes defined to be a solution X that is {F_t^Y}-adapted. The following lemma shows that under the compatibility restriction, the two definitions are equivalent.
Lemma 3.11. Suppose that X is compatible with Y and that X = F(Y) a.s. for some Borel measurable F : S2 → S1. Then X is {F_t^Y}-adapted.

Proof. For g ∈ B(B_t^{S1}), g(X) is σ(Y)-measurable, so by Lemma 3.7, g(X) = E[g(X) | Y] = E[g(X) | F_t^Y] a.s. Hence g(X) agrees almost surely with an F_t^Y-measurable random variable for every such g, and the adaptedness follows.
In order to take into account the compatibility requirement, we must change the definition of pointwise uniqueness.
Definition 3.12. Pointwise uniqueness holds for compatible solutions of (Γ, ν) if, for every triple of processes (X1, X2, Y) defined on the same probability space, with X1 and X2 S1-valued and Y S2-valued, such that µ_{X1,Y}, µ_{X2,Y} ∈ S_{Γ,C,ν} and (X1, X2) is jointly compatible with Y, we have X1 = X2 a.s.
The modification of the definition for µ-pointwise uniqueness is similar.
Lemma 3.13. If every µ ∈ S_{Γ,C,ν} corresponds to a strong solution, then pointwise uniqueness holds.

Proof. Using the convexity of S_{Γ,C,ν}, the proof is the same as for Lemma 2.6.
Theorem 3.14. Suppose S_{Γ,C,ν} ≠ ∅. Then pointwise uniqueness holds for compatible solutions of (Γ, ν) if and only if joint uniqueness in law holds and the unique µ ∈ S_{Γ,C,ν} corresponds to a strong solution.

Proof. If pointwise uniqueness holds for compatible solutions, then, as in Section 2, every µ ∈ S_{Γ,C,ν} corresponds to a strong solution and S_{Γ,C,ν} contains a single measure. Conversely, the strong solution must give the unique µ ∈ S_{Γ,C,ν}. Consequently, by Lemma 2.2, there exists F : S2 → S1 such that µ_{X,Y} = µ implies X = F(Y) almost surely, and pointwise uniqueness follows.
The proof of the following generalization of Theorem 2 of Engelbert (7) is similar.

Theorem 3.15. If µ-pointwise uniqueness holds for every µ ∈ S_{Γ,C,ν}, then every solution is strong and pointwise uniqueness holds.
Example 3.16. (Spatial birth and death processes.) Equations for birth and death processes of the following form are studied in (10). For t ≥ 0, η_t is a counting measure on R^d giving the locations of the particles that are alive at time t; λ(x, η_t) denotes the birth rate of a new particle at location x at time t, and δ(x, η_t) denotes the death rate of an existing particle located at x. That is, the probability of a new particle being born in the set K in the time interval (t, t + ∆t] is approximately ∫_K λ(x, η_t) dx ∆t for small ∆t, and if there is a particle at x at time t, then the probability that it dies in (t, t + ∆t] is approximately δ(x, η_t)∆t. For simplicity, assume δ and λ are bounded.
To formulate the corresponding stochastic equation, let ξ be a Poisson random measure on R^d × [0, ∞) × [0, ∞), independent of η_0 = Σ_i δ_{x_i}, where the x_i denote the locations of the initial population, and let the {τ_i} be independent unit exponentials, independent of η_0 and ξ, determining the death times. The birth and death process η = {η_t, t ≥ 0} should then satisfy the corresponding stochastic equation driven by (η_0, ξ, {τ_i}).

Compatibility then becomes the requirement that the portion of the driving randomness not yet used at time t be independent of the evolution of the solution up to time t. This identity is simply the requirement that the increments of ξ over sets of the form D × (t, t + s] × [0, u], together with the events {τ_i > r} for exponentials not yet used, be independent of the solution through time t, for all D satisfying |D| < ∞ and t, s, r, u ≥ 0.
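The rate mechanics described in this example can be illustrated with a minimal Gillespie-type simulation sketch. The particular choices below (particles on [0, 1], constant birth density λ = 2, constant per-particle death rate δ = 1) are illustrative assumptions, not the rates or the equation of the example:

```python
import random

# Minimal Gillespie-type sketch of a spatial birth-death process with
# bounded rates: birth rate density lam(x, eta) on [0, 1] and per-particle
# death rate delta(x, eta).  Constant rates are an illustrative assumption.
random.seed(2)

def lam(x, eta):
    return 2.0            # birth rate density at x; total birth rate 2

def delta(x, eta):
    return 1.0            # death rate of a particle located at x

T = 50.0
t = 0.0
eta = []                  # locations of the particles currently alive
time_avg = 0.0            # time integral of the population size
while True:
    birth_rate = 2.0      # integral of lam(x, eta) over [0, 1]
    death_rate = sum(delta(x, eta) for x in eta)
    dt = random.expovariate(birth_rate + death_rate)
    if t + dt >= T:
        time_avg += len(eta) * (T - t)
        break
    time_avg += len(eta) * dt
    t += dt
    if random.random() < birth_rate / (birth_rate + death_rate):
        eta.append(random.random())              # birth at a uniform location
    else:
        eta.pop(random.randrange(len(eta)))      # a uniformly chosen particle dies

print(time_avg / T)  # long-run mean population; roughly 2 for these rates
```

For these constant rates the process is an immigration-death process whose stationary population is Poisson with mean (total birth rate)/(death rate) = 2, which the time average approximates.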
Example 3.17. (Backward stochastic differential equations.) Let Y be a stochastic process with sample paths in D_R[0, T], and consider X satisfying E[∫_0^T |f(s, X(s), Y(s))| ds] < ∞ and, for each 0 ≤ t ≤ T,

X(t) = E[ X(T) + ∫_t^T f(s, X(s), Y(s)) ds | F_t^{X,Y} ]. (3.10)

Note that (3.10) is equivalent to the requirement that X(t) + ∫_0^t f(s, X(s), Y(s)) ds be a martingale. Consider also a system of stochastic differential equations for processes U and V driven by a Brownian motion W, with coefficients σ(s, U(s), V(s)) and b(s, U(s), V(s)), where U and V are required to be {F_t}-adapted and W is required to be an {F_t}-Brownian motion. Assume that U takes values in R^k, V in R^l, and W in R^m. Any solution of the system will also satisfy the system with F_t replaced by F_t^{U,V,W}, and the requirement then becomes that X = (U, V) be compatible with Y = (W, U(0)).
Translating the problem into our setting, ν is the joint distribution of (W, U(0)), and the requirements that give the convex constraints are

∫_0^t |σ(s, U(s), V(s))|^2 ds + ∫_0^t |b(s, U(s), V(s))| ds < ∞ a.s., (3.11)

together with the identification of the stochastic integral as a limit of its approximating sums in probability, for each 0 < t ≤ T. Note that the expression on the right of (3.11) is a Borel measurable function on D_{R^k×R^l×R^m}[0, ∞), and the limit will exist (that is, the stochastic integral will exist) provided (U, V) is compatible with W.
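The role of the approximating sums defining the stochastic integral can be illustrated numerically. The sketch below (the integrand U = W and the uniform mesh are illustrative assumptions) computes left-endpoint Riemann sums of ∫ W dW; the left endpoints are exactly the non-anticipating, "compatible" choice, and the sums converge to the Itô value (W(T)² − T)/2:

```python
import random

# Left-endpoint Riemann sums for the Ito integral of W against W.
# Exactly, sum_i W(t_i)(W(t_{i+1}) - W(t_i)) = (W(T)**2 - sum (dW)**2)/2,
# and sum (dW)**2 -> T, so the sums converge in probability to
# (W(T)**2 - T)/2 as the mesh goes to 0.
random.seed(3)

n, T = 100000, 1.0
dt = T / n
W = [0.0]
for _ in range(n):
    W.append(W[-1] + random.gauss(0.0, dt ** 0.5))

riemann = sum(W[i] * (W[i + 1] - W[i]) for i in range(n))
ito_limit = (W[n] ** 2 - T) / 2
print(abs(riemann - ito_limit))  # small for a fine mesh
```

Replacing the left endpoints with right endpoints or midpoints changes the limit (to the backward or Stratonovich integral), which is why the constraint must be formulated for compatible, non-anticipating integrands.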
Proposition 4.2 of (1) follows from Theorem 3.15. See also (17).

Example 3.18. (Multiple random time changes.) Kurtz (15) and Holley and Stroock (11) (see Chapters 6 and 11 of (9)) characterize processes as solutions of systems of the form

X_k(t) = Y_k( ∫_0^t β_k(s, X) ds ), (3.12)

where X = (X1, X2, . . .), β_k(t, X) = β_k(t, X(· ∧ t)) ≥ 0, and the Y_k are independent Markov processes. Set τ_k(t) = ∫_0^t β_k(s, X) ds, and for α ∈ [0, ∞)^∞, define

F_α^Y = σ(Y_k(s) : s ≤ α_k, k = 1, 2, . . .)

and F_α^X = σ({τ_1(t) ≤ s_1, τ_2(t) ≤ s_2, . . .} : s_i ≤ α_i, i = 1, 2, . . . , t ≥ 0). Let A_k ⊂ B(E_k) × B(E_k) be a linear operator with (1, 0) ∈ A_k, let ν_k^0 denote the distribution of Y_k(0), and assume that the distribution of Y_k is uniquely determined by the requirement that Y_k be a solution of the martingale problem for (A_k, ν_k^0). Then, for f_i ∈ D(A_i), the corresponding product-form process M_{f_1,...,f_k} is a martingale with respect to the filtration {F_α^Y}, and compatibility is equivalent to the requirement that M_{f_1,...,f_k} be a martingale with respect to {F_α^X ∨ F_α^Y} for all k and all f_i ∈ D(A_i). τ(t) = (τ_1(t), τ_2(t), . . .) is a stopping time with respect to {F_α^X ∨ F_α^Y} in the sense that {τ_1(t) ≤ α_1, τ_2(t) ≤ α_2, . . .} ∈ F_α^X ∨ F_α^Y, and it follows that a compatible solution of (3.12) is a solution of the multiple random time change problem as defined in Section 3 of (15).
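A standard special case of such a time-change system (included here as an illustration; the particular choice of the Y_k as unit-rate Poisson processes is an assumption) is a birth-death chain written in random time change form:

```latex
% With Y_1, Y_2 independent unit-rate Poisson processes and
% \beta_1, \beta_2 \ge 0,
X(t) = X(0)
  + Y_1\!\Big(\int_0^t \beta_1(X(s))\,ds\Big)
  - Y_2\!\Big(\int_0^t \beta_2(X(s))\,ds\Big),
% a birth-death chain with birth rate \beta_1(x) and death rate \beta_2(x).
```

Compatibility of X with (Y_1, Y_2) is precisely the non-anticipation needed for the time changes to preserve the Poisson nature of the unused increments of Y_1 and Y_2.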