Skorokhod's M1 topology for distribution-valued processes

Skorokhod's M1 topology is defined for càdlàg paths taking values in the space of tempered distributions (more generally, in the dual of a countably Hilbertian nuclear space). Compactness and tightness characterisations are derived which allow us to study a collection of stochastic processes through their projections on the familiar space of real-valued càdlàg processes. It is shown how this topological space can be used in analysing the convergence of empirical process approximations to distribution-valued evolution equations with Dirichlet boundary conditions.

The purpose of this paper is to extend the M1 topology to collections of càdlàg processes taking values in the space of tempered distributions or, more generally, in the dual of a countably Hilbertian nuclear space (CHNS). Following the work of Itô [14], in which a central limit theorem was developed for distribution-valued processes, the J1 topology was extended to these spaces by Mitoma [24]. These results were extended by Jakubowski to completely regular range spaces [16], and this is the focus of recent work by Kouritzin [19]. The advantage of working in the dual of a CHNS (as opposed to some Hilbertian subspace, as explored in [26]) is that these spaces have a strong finite-dimensional character. Consequently, compactness in the path space can be checked test-function-by-test-function: if $A$ is a collection of càdlàg paths taking values in the dual of a CHNS, then $A$ is J1-compact if and only if $\{f(\varphi) = (t \mapsto f_t(\varphi)) : f \in A\}$ is J1-compact in the space of real-valued càdlàg paths, for every test function $\varphi$ in the CHNS [24, Thm. 3.1 & 4.1]. Hence the tightness of a sequence of distribution-valued càdlàg processes can be established by projecting down to the familiar space of real-valued càdlàg processes. It should be mentioned that contributions like [16], [7] and [19] also contain results using real-valued projections for more general spaces, but they require establishing point-wise tightness or existence of a limit.
Our aim is to fill the gap in theory between these two settings and to combine the temporal properties of the M1 topology with the spatial properties of the tempered distributions. In Section 2 we construct the M1 topology for càdlàg paths taking values in the dual of a CHNS (Definition 2.6). The compactness and tightness criteria corresponding to those of [24] are stated and proved in Section 3 (Theorems 3.1, 3.2 and 3.3). Finally, we use our tools on a concrete example in Section 4: we prove the tightness of a sequence of discrete empirical-measure processes that approximate the solution of a stochastic evolution equation with Dirichlet boundary conditions. Here, the mass lost at the boundary is a monotone process, hence the M1 topology offers a simple decomposition trick for controlling the fluctuations in the approximating sequence (Proposition 4.2).

Construction of the topology
We refer the reader to [4,17,28,34] for the basic theory of countably Hilbertian nuclear spaces (CHNS) and the standard Skorokhod topologies for Banach range spaces. Our construction will mirror [24] for the J1 topology.
Throughout, $E$ will denote a general CHNS (a specific example being $S$, the space of rapidly decreasing functions [17, Ex. 1.3.2]). The properties we will use are:
• $E$ is a linear topological space generated by an increasing sequence of Hilbertian semi-norms n < ∞, whenever $\{e^m_i\}_{i \ge 1}$ is an orthonormal system of $E_m$,
• The topological duals, $E'$).
The space of càdlàg paths, $D_{E'}$, is defined to be the collection of functions mapping $[0, T]$ to $E'$ that are right-continuous and have left limits with respect to the strong topology on $E'$.
In practice, checking Definition 2.1 is straightforward due to [23]. As in the classical case, we define the M1 topology on $D_{E'}$ through a (pseudo-)graph distance on $E' \times [0, T]$.
The graph of an element in $D_{E'}$ is formed by joining up its points of discontinuity with intervals (see Figure 1):

Definition 2.2 (Interval). For $f$ and $g$ in $E'$, define the interval between these points to be

If $h_1$ and $h_2$ are two points in the interval,

For $(z_1, t_1), (z_2, t_2) \in \gamma_x$, we say that $(z_1, t_1) \le_{\gamma_x} (z_2, t_2)$ if either:

Definition 2.4 (Parametric representation). A parametric representation, $\lambda_x$, of the graph $\gamma_x$ is a continuous surjection $\lambda_x : [0, 1] \to \gamma_x$ that is non-decreasing with respect to the graph ordering on $\gamma_x$. For $x \in D_X$, let the collection of all such parametrisations of $\gamma_x$ be denoted $\Lambda_x$.
We can define a family of pseudometrics on $D_{E'}$ by using the family of semi-norms, $\{p_B\}_{B \in \mathcal{B}}$, on $E'$ to measure the graph distance between two parametric representations:

Definition 2.5 (A family of pseudometrics). Fix $B \in \mathcal{B}$. Let $x$ and $y$ be elements of $D_{E'}$. For parametric representations $\lambda_x = (z_x, t_x) \in \Lambda_x$ and $\lambda_y = (z_y, t_y) \in \Lambda_y$ define $g_B(\lambda_x, \lambda_y) := \sup$

The reader can verify that Definition 2.5 gives a pseudometric using [34, Thm. 12.3.1].
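The graph distance between two parametric representations can be illustrated in the real-valued case. The sketch below assumes the standard form of the distance, namely the supremum over $s \in [0, 1]$ of the larger of the spatial and temporal gaps, with the semi-norm $p_B$ replaced by the absolute value. The function names, the two example paths and the grid size are hypothetical choices made for illustration only.

```python
def step_param(s):
    """Parametrise the completed graph of x = 1_[1/2, 1] on [0, 1]; returns (z, t)."""
    if s <= 1 / 3:
        return 0.0, 1.5 * s                    # flat part before the jump
    if s <= 2 / 3:
        return 3.0 * (s - 1 / 3), 0.5          # vertical segment filling the jump
    return 1.0, 0.5 + 1.5 * (s - 2 / 3)        # flat part after the jump


def ramp_param(s, n):
    """Parametrise the graph of the continuous path rising from 0 to 1 on [1/2 - 1/n, 1/2]."""
    start = 0.5 - 1.0 / n                      # requires n > 2 so that start > 0
    if s <= 1 / 3:
        return 0.0, 3.0 * s * start
    if s <= 2 / 3:
        u = 3.0 * (s - 1 / 3)
        return u, start + u / n                # the ramp itself
    return 1.0, 0.5 + 1.5 * (s - 2 / 3)


def graph_distance(param_x, param_y, grid_size=300):
    """Grid approximation of sup_s max(|z_x(s) - z_y(s)|, |t_x(s) - t_y(s)|)."""
    worst = 0.0
    for k in range(grid_size + 1):
        s = k / grid_size
        zx, tx = param_x(s)
        zy, ty = param_y(s)
        worst = max(worst, abs(zx - zy), abs(tx - ty))
    return worst


distances = {
    n: graph_distance(step_param, lambda s, n=n: ramp_param(s, n))
    for n in (4, 10, 100)
}
```

Since the M1 distance is an infimum over all pairs of parametrisations, the value computed for one particular pair is only an upper bound. For these two parametrisations the gap is of order 1/n, which is consistent with the ramps converging to the step in M1 although they do not converge uniformly.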

Compactness and tightness characterisations
The following three theorems characterise compactness, tightness and weak convergence in $(D_{E'}, \mathrm{M1})$. All notation is as defined in Section 2.
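Before proving these results we need the following technical result, which exploits the nuclear space structure of $E$ to enable a generic subset $A \subseteq D_{E'}$ to be controlled in terms of its projection onto a finite number of basis vectors. For a normed linear space $X$, we will denote the M1 modulus of continuity on $D_X$ by
$$w_{X,\mathrm{M1}}(x; \delta) := \sup_{t \in [0, T]} \ \sup_{(t_1, t_2, t_3) \in \mathrm{trip}(t; \delta)} \ \inf_{\lambda \in [0, 1]} \bigl\| x_{t_2} - (1 - \lambda) x_{t_1} - \lambda x_{t_3} \bigr\|_X,$$
where $\mathrm{trip}(t; \delta) = \{(t_1, t_2, t_3) : \max(0, t - \delta) \le t_1 < t_2 < t_3 < \min(t + \delta, T)\}$.
$\|x_t\|_{-n} < \infty$.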
Then, for every $\varepsilon > 0$, there exist $k \ge 1$ and $\varphi_1, \varphi_2, \dots,$

The proof is a technical computation for which the next lemma will be helpful. This argument is just a repackaging of the Heine-Borel theorem and is adapted from [13, Lem. A.28].
) to be the unit vector in the direction of the pair's minimiser:
)∈S, therefore it is possible to take a finite sub-cover, $\{N_{v^i_1, v^i_2}\}_{i=1,\dots,k}$. To complete the proof let $\Theta = \{\theta_{v^i_1, v^i_2}\}_{i=1,\dots,k}$.
With this fixed choice of $\varepsilon$ and $M$, take
then, by normalising $v_1$ and $v_2$ by $2c$, it follows that there is some $i \in \{1, 2, \dots, k\}$ such that inf
3) fails to be true, then clearly the upper bound above can be replaced by $2c\varepsilon$.
For each $i \in \{1, 2, \dots, k\}$, $\varphi_i$ can now be constructed by defining
∈ R is the $j$th coordinate of $\theta_i$. Inequality (3.4) therefore reduces to
where the second inequality is due to switching the maximum and supremum. Taking a supremum over $x \in A$ and repeating the switch once more yields the result.
By taking $\lambda = 1$ in the above proof, the corresponding result for the increments of $x$ is given:
for every $\delta > 0$ and $s \in [0, T]$.
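For real-valued paths the M1 modulus of continuity is easy to compute on a time grid, which may help in building intuition for the quantity controlled above. The sketch below assumes the modulus takes the usual form: for $t_1 < t_2 < t_3$ in a common window, the distance from $x_{t_2}$ to the segment between $x_{t_1}$ and $x_{t_3}$. It simplifies the window condition to $t_3 - t_1 < 2\delta$ and only looks at grid points, so it is an approximation. The names `dist_to_segment` and `m1_modulus` and the sample paths are hypothetical.

```python
from itertools import combinations


def dist_to_segment(a, b, c):
    """inf over lam in [0, 1] of |b - (1 - lam) * a - lam * c| for real a, b, c."""
    lo, hi = min(a, c), max(a, c)
    return max(0.0, lo - b, b - hi)


def m1_modulus(times, values, delta):
    """Grid approximation of the M1 modulus of a real-valued path.

    `times` must be sorted increasingly; triples are restricted to
    t3 - t1 < 2 * delta, a simplification of the window trip(t; delta).
    """
    worst = 0.0
    for i, j, k in combinations(range(len(times)), 3):
        if times[k] - times[i] < 2 * delta:
            worst = max(worst, dist_to_segment(values[i], values[j], values[k]))
    return worst


times = [0.0, 0.25, 0.5, 0.75, 1.0]
monotone = [0.0, 0.0, 1.0, 1.0, 1.0]   # a single upward jump
spike = [0.0, 0.0, 1.0, 0.0, 0.0]      # an excursion that returns

monotone_modulus = m1_modulus(times, monotone, delta=0.3)
spike_modulus = m1_modulus(times, spike, delta=0.3)
```

A monotone path has zero modulus however large its jumps are, whereas the spike is penalised in full; this is the property exploited for the loss process in Section 4.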
The first half of the proof of [17, Thm.x t −n < ∞.
Proof of Theorem 3.2. To prove the second statement, assume that $(\mu_n)$ is tight on $(D_{E'}, \mathrm{M1})$.

Application to empirical processes with Dirichlet boundary conditions
In the remainder of the paper we will show how our machinery can be applied to the problem of approximating stochastic evolution equations through empirical averages of microscopic particles. We will be very concrete and consider a problem from mathematical finance, specifically large portfolio credit modelling. Our analysis will show that $(D_{E'}, \mathrm{M1})$ can be a convenient space on which to prove tightness when studying systems with Dirichlet boundary conditions. Define a collection of correlated Brownian motions, $\{X^{i,N}\}_{i=1,\dots,N}$, started on the half-line and evolving with the dynamics
where $\tau^{i,N} := \inf\{t > 0 : X^{i,N}_t \le 0\}$. Here, $W, W^1, W^2, \dots$ are independent Brownian motions, $\{X^i\}_{i \ge 1}$ are i.i.d. with some density $f : (0, \infty) \to [0, \infty)$ and $\rho : [0, 1] \to [0, 1]$ is a measurable function. We will not impose any further regularity constraints on $\rho$ in the following tightness calculations. This is a system in which the proportion of particles that have hit the origin determines the correlation in the system.
(To see that such $\{X^{i,N}\}_{i=1,\dots,N}$ exist, notice that $t \mapsto L^N_t$ is piecewise constant. Therefore, to construct the discrete system, take $N$ Brownian motions with initial correlation $\rho(0)$, stop the system at the first hitting of zero, restart the system with correlation $\rho(1/N)$ and repeat.) Our set-up extends the constant correlation model introduced in [8]. The motivation for this particular form is to address the correlation skew seen in [8, Sec. 5]. To analyse the model, the quantity of interest is the empirical measure of the population:
where $\delta_x$ is the usual Dirac delta measure at the point $x \in \mathbb{R}$.
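The construction described in the parenthesis can be imitated by a simple Euler scheme. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: each surviving particle is moved by $\rho(L)\,\Delta W + \sqrt{1 - \rho(L)^2}\,\Delta W^i$, where $L$ is the proportion of particles absorbed so far; the correlation is refreshed once per time step rather than exactly at each hitting time; the initial density is taken to be Exp(1); and hitting the origin is only detected at grid times. The names `simulate_loss` and `empirical_measure` are hypothetical.

```python
import math
import random


def simulate_loss(n_particles, rho, n_steps=200, horizon=1.0, seed=0):
    """Euler sketch of the absorbed particle system; returns (loss path, positions, alive flags)."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    # Assumed initial law: i.i.d. Exp(1) positions on (0, infinity).
    positions = [rng.expovariate(1.0) for _ in range(n_particles)]
    alive = [True] * n_particles
    loss_path = [0.0]
    for _ in range(n_steps):
        r = rho(loss_path[-1])                        # correlation set by the current loss
        common = rng.gauss(0.0, math.sqrt(dt))        # increment of the common noise W
        for i in range(n_particles):
            if alive[i]:
                own = rng.gauss(0.0, math.sqrt(dt))   # increment of the idiosyncratic noise W^i
                # Assumed dynamics: rho * dW + sqrt(1 - rho^2) * dW^i.
                positions[i] += r * common + math.sqrt(1.0 - r * r) * own
                if positions[i] <= 0.0:
                    alive[i] = False                  # absorbed at the origin
        loss_path.append(1.0 - sum(alive) / n_particles)
    return loss_path, positions, alive


def empirical_measure(positions, alive, phi):
    """Evaluate (1/N) * sum over surviving particles of phi(position)."""
    return sum(phi(x) for x, a in zip(positions, alive) if a) / len(positions)


loss_path, positions, alive = simulate_loss(50, rho=lambda loss: 0.2 + 0.6 * loss)
surviving_mass = empirical_measure(positions, alive, lambda x: 1.0)
```

Whatever the choice of `rho`, the simulated loss path is non-decreasing and piecewise constant, and the surviving mass equals one minus the final loss.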
We would like to establish the weak convergence (at the process level) of $(\nu^N)_{N \ge 1}$ to some limit $\nu$, which should be the solution of the non-linear evolution equation
with test functions $\varphi \in S$ that satisfy $\varphi(0) = 0$. This is an example of a (stochastic) McKean-Vlasov equation [30]. Proving existence and uniqueness of solutions to this equation would require further regularity constraints on $\rho$. For now, we will only demonstrate that $(\nu^N)_{N \ge 1}$ is tight on the space $(D_{S'}, \mathrm{M1})$, where $S'$ is the space of tempered distributions.
Notice that, for every $t$, $\nu^N_t$ is a sub-probability measure, so is an element of $S'$, and for every $\varphi$, $\nu^N_t(\varphi)$ is a real-valued càdlàg function. Therefore $\nu^N$ has a version that is càdlàg, by [23], so $D_{S'}$ is an appropriate space to work with. By Theorem 3.2, it suffices to show $\nu^N(\varphi)$ is tight in $(D_{\mathbb{R}}, \mathrm{M1})$ for every $\varphi \in S$, and for that it is sufficient to verify [34, Thm. 12.12.3], the first condition of which is trivial since $|\nu^N_t(\varphi)| \le \|\varphi\|_\infty$. For demonstrating that the second condition of [34, Thm. 12.12.
for all $N \ge 1$, $\eta > 0$, and $0 \le t_1 < t_2 < t_3 \le T$, and lim
Here
The challenge in working with $(\nu^N)_{N \ge 1}$ is the discontinuity presented by the absorbing boundary at the origin. For the constant $\rho$ case, the authors of [8] use explicit estimates based on 2d Brownian motion in a wedge [15,22] to control for the boundary effects. With the more complicated interactions in our present model such methods seem intractable; however, the M1 topology provides an alternative approach.
Introduce the process $\bar\nu^N \in D_{S'}$ defined by
This has the advantage of being continuous (in time), so its increments are easy to control, and it can be related to $\nu^N$ through the simple fact $\bar\nu^N_t(\varphi) = \nu^N_t(\varphi) + \varphi(0) L^N_t$, for every $\varphi \in S$. Since $L^N$ is monotone increasing, the final term is zero for $t_1 < t_2 < t_3$, and this completes the proof.
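One way to see why the monotone term drops out is the following short calculation for fixed $\varphi$ and $t_1 < t_2 < t_3$. It is a sketch under the stated decomposition; the symbol $\lambda^*$ and the final bound are introduced here for illustration and are not quoted from the source.

```latex
% Sketch, assuming L^N is non-decreasing and
% \bar\nu^N_t(\varphi) = \nu^N_t(\varphi) + \varphi(0) L^N_t.
\[
  \lambda^* :=
  \begin{cases}
    \dfrac{L^N_{t_2} - L^N_{t_1}}{L^N_{t_3} - L^N_{t_1}}, & L^N_{t_3} > L^N_{t_1},\\[1ex]
    0, & \text{otherwise},
  \end{cases}
  \qquad\text{so that}\qquad
  \lambda^* \in [0,1]
  \quad\text{and}\quad
  L^N_{t_2} - (1-\lambda^*) L^N_{t_1} - \lambda^* L^N_{t_3} = 0 .
\]
% Substituting \nu^N = \bar\nu^N - \varphi(0) L^N at \lambda = \lambda^*:
\[
  \inf_{\lambda \in [0,1]}
  \bigl| \nu^N_{t_2}(\varphi) - (1-\lambda)\,\nu^N_{t_1}(\varphi) - \lambda\,\nu^N_{t_3}(\varphi) \bigr|
  \;\le\;
  \bigl| \bar\nu^N_{t_2}(\varphi) - (1-\lambda^*)\,\bar\nu^N_{t_1}(\varphi) - \lambda^*\,\bar\nu^N_{t_3}(\varphi) \bigr|
  \;\le\;
  \bigl| \bar\nu^N_{t_2}(\varphi) - \bar\nu^N_{t_1}(\varphi) \bigr|
  + \bigl| \bar\nu^N_{t_3}(\varphi) - \bar\nu^N_{t_2}(\varphi) \bigr| .
\]
```

The right-hand side involves only increments of the time-continuous process $\bar\nu^N(\varphi)$, so the M1 oscillation of $\nu^N(\varphi)$ is controlled without any estimate on the jumps of $L^N$.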
Our trick makes the remainder of the tightness proof routine:

Lemma 3.4 (Controlling the modulus of continuity). Let $p > n$ be such that the inclusion $E_p \to E_n$ is Hilbert-Schmidt and let $A \subseteq D_{E'}$ be such that $c := \sup_{x \in A} \sup_{t \in [0, T]} \|x_t\|_{-n} < \infty$.

The first half of the proof of [17, Thm. 2.4.4] does not depend on the choice of temporal topology, hence we have $p > n$ for which $A \subseteq D_{E_{-n}} \subseteq D_{E_{-p}}$, $E_p \to E_n$ is Hilbert-Schmidt and $c := \sup_{x \in A} \sup_{t \in [0, T]} \|x_t\|_{-n} < \infty$.

For every $p \ge 0$, $D_{E_{-p}}$ is a Polish space, so [29, Sec. 3, Def. 2, Ex. 1] implies that $D_{E_{-p}}$ is a topological Radon space. From [29, Sec. 3, Ex. 4] $(D_{E'}, \mathrm{M1})$ is a topological Radon space, hence every probability measure on $(D_{E'}, \mathrm{M1})$ is a Radon measure. By Theorem 3.1 (iii), every compact subset of $(D_{E'}, \mathrm{M1})$ is metrizable, therefore [29, Sec. 5, Thm. 2] completes the proof of the second result, since $(D_{E'}, \mathrm{M1})$ is completely regular (Proposition 2.7 (iii)). The first part of the theorem follows from the work in the proof of Theorem 3.1 and [17, Thm. 2.5.1].

Proof of Theorem 3.3. Since the Borel and Kolmogorov σ-algebras on $(D_{E'}, \mathrm{M1})$ coincide (Proposition 2.7 (iv)), the result follows by [24, Prop. 5.1].
The strong topology on $E'$ is that generated by the collection of semi-norms $\{p_B : E' \to [0, \infty)\}_{B \in \mathcal{B}}$, where $p_B(f) := \sup_{x \in B} |f(x)|$ and $\mathcal{B}$ is the collection of bounded subsets of $E$.
and $t_1, t_2, t_3 \in [0, T]$. Then, recalling the notation of (3.1),
To construct the finite family of vectors, first notice that $E_{-n} \subseteq E_{-p}$, so for an orthonormal basis, $\{e^p_i\}_{i \ge 1}$, of E
and the triangle inequality gives that $\|y^\lambda\|_{-n} \le c$, whenever $x \in A$.
$|y^\lambda(e^p_i)| \le \|y^\lambda\|_{-n} \|e^p_i\|_n \le c \|e^p_i\|_n$, for $x \in A$. Therefore if $M$ is chosen large enough so that M .
By introducing the $\mathbb{R}^M$ vectors