A particle system with mean-field interaction: Large-scale limit of stationary distributions

We consider a system consisting of $n$ particles, moving forward in jumps on the real line. System state is the empirical distribution of particle locations. Each particle ``jumps forward'' at some time points, with the instantaneous rate of jumps given by a decreasing function of the particle's location quantile within the current state (empirical distribution). Previous work on this model established, under certain conditions, the convergence, as $n\to\infty$, of the system random dynamics to that of a deterministic mean-field model (MFM), which is a solution to an integro-differential equation. Another line of previous work established the existence of MFMs that are traveling waves, as well as the attraction of MFM trajectories to traveling waves. The main results of this paper are: (a) We prove that, as $n\to\infty$, the stationary distributions of (re-centered) states concentrate on a (re-centered) traveling wave; (b) We obtain a uniform across $n$ moment bound on the stationary distributions of (re-centered) states; (c) We prove a convergence-to-MFM result, which is substantially more general than that in previous work. Results (b) and (c) serve as ``ingredients'' of the proof of (a), but also are of independent interest.


Introduction
We consider a system consisting of n particles, moving forward on the real line. The particles move in jumps.
The system state at a given time is the current empirical distribution of particle locations. Each particle gets "urges to jump" as an independent Poisson process of constant rate. However, a particle getting a jump urge actually jumps with the probability given by a decreasing function of the particle's location quantile within the current state (i.e., empirical distribution); hence this is a mean-field type of interaction between the particles. When a particle does jump, the jump size is independent, distributed as a random variable Z > 0. We are interested in the system behavior when n is large.
This model was introduced in [5,6] as an idealized model of distributed parallel simulation. In this case the n particles represent n processors ("sites") simulating different parts of some large system, and a particle location is the current "local simulation time" of the corresponding processor. The following types of questions are of interest as n becomes large: how do the local times of the processors progress over time; do local times "stay closely together;" does the evolution of the empirical distribution of local times become that of a traveling wave; etc. This model, and similar models, are motivated by other applications as well (including recent applications, such as blockchains), where, roughly speaking, a distributed synchronization of a large number of sites is of interest; cf. [1, 10-15], and references therein, for examples of synchronization models.
There are two lines of work on the particle system described above. The first is in paper [5], where it was shown that, under certain additional conditions, as n → ∞, the system random dynamics converges to that of a deterministic mean-field model (MFM), which is a solution to an integro-differential equation. There are several additional assumptions made in [5], one of which is especially restrictive, in that the proof technique crucially relies on it: a particle jump probability depends on the current locations of K other particles chosen uniformly at random, where K > 0 is a fixed parameter, the same for all n. We will refer to this additional assumption as the finite-dependence assumption; it substantially restricts the model, in comparison with the more general model of this paper.
The second line of work is represented by papers [6,20], where formally defined mean-field models (solutions to an integro-differential equation) are studied. The results of [6] prove, in particular, that if an MFM that is a traveling wave exists, then "typically" the traveling wave is unique and MFM trajectories are attracted to it as time increases; the question of traveling wave existence under general assumptions was left open in [6]. Paper [20] proves the existence of a traveling wave under very general assumptions. (We also note that, for some MFMs that are different from the one in this paper, the existence and explicit forms of the traveling waves are obtained in [1,7,8], in some special cases of the jump size distribution.) The combination of the results of [5,6,20] strongly suggests the following asymptotic property of the stationary distributions: as n → ∞, the stationary distributions of (re-centered) states concentrate on a (re-centered) traveling wave. This property is stated as Conjecture 7.1 in [20, Section 7].
Our main results are: (a) We prove (Theorem 1) that, as n → ∞, the stationary distributions of (re-centered) states concentrate on a (re-centered) traveling wave. (This proves Conjecture 7.1 in [20, Section 7], under a slightly stronger assumption on the jump size, namely $EZ^{2+\chi} < \infty$ for some $\chi > 0$, as opposed to $EZ^2 < \infty$.) (b) As a key "ingredient" of the proof of (a), we obtain (Theorem 2) a uniform across n moment bound on the stationary distributions of (re-centered) states. This result is also of independent interest.
(c) We prove (Theorem 3) the convergence-to-MFM result in our general setting, without the additional finite-dependence assumption. (This substantially generalizes the result of [5]. The proof largely follows the approach used in [1], for a different model. The approach is more generic than that in [5]; in particular, it does not rely on the finite-dependence assumption.) This result is another ingredient of the proof of (a), but is also of independent interest.
The proof of (a) relies on (b) and (c), and on the results in [6,20] on the existence/uniqueness of, and attraction to, traveling waves.
2 The model and main results.
The particle system model is as follows. There are n particles, moving in the positive direction ("right") on the real axis $R$. Each particle moves in jumps, as follows. For each particle there is an independent Poisson process of rate $\mu > 0$ of "jump urges." When a particle gets an urge to jump, it actually jumps, to the right, with probability $\eta_n(\nu)$, where $\nu$ is its quantile in the current empirical distribution of the particles' locations; that is, $\nu = \ell/n$ when the particle location is $\ell$-th from the left. With the complementary probability $1 - \eta_n(\nu)$ the particle does not jump. To have the model well-defined, assume that quantile ties between co-located particles are broken uniformly at random. We adopt the convention that the function $\eta_n(\nu)$ is defined for a continuous argument $\nu \in [0,1]$, by assuming that it is constant in each interval $((\ell-1)/n, \ell/n]$ for $\ell = 1, \ldots, n$, and $\eta_n(0) = 1$. Assume that, for each n, the function $\eta_n(\nu)$, $0 \le \nu \le 1$, is non-increasing, and that, as $n \to \infty$, it converges uniformly to a continuous, strictly decreasing function $\eta(\nu)$, $0 \le \nu \le 1$, with $\eta(0) = 1$, $\eta(1) = 0$. The jump sizes, when a particle does jump, are given by i.i.d. non-negative random variables with CDF $J(y)$, $y \ge 0$; we denote by $\bar J(y) = 1 - J(y)$ the complementary CDF; a generic jump size is given by the r.v. $Z \ge 0$. Without loss of generality, we can and will assume that $Z > 0$, i.e. $J(0) = 0$. We will use the notation $m^{(\ell)} = E Z^{\ell}$ for the $\ell$-th moment of a jump size. (So, $m^{(1)} = EZ$ is the mean jump size.) For some (not all) of our results we will need the following two additional conditions on the jump size distribution:

$$m^{(2+\chi)} = E Z^{2+\chi} < \infty \ \text{for some}\ \chi > 0, \qquad (2)$$

and

$$J(\cdot) \ \text{is absolutely continuous, with density}\ J'(y)\ \text{bounded away from 0 on compact subsets of}\ R_+. \qquad (3)$$

Note that, without loss of generality (WLOG), we can and do assume $\mu = 1$; otherwise, we can achieve this condition by rescaling time. Also, if $m^{(1)} = EZ < \infty$, we can and do assume WLOG that $m^{(1)} = 1$; otherwise, $m^{(1)} = 1$ is achieved by rescaling space.
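The dynamics above are easy to simulate directly. The following sketch generates one trajectory of the n-particle system; the specific choices $\eta_n(\nu) = 1-\nu$ and $Z \sim \mathrm{Exp}(1)$ (so that $m^{(1)} = 1$) are purely illustrative and not prescribed by the model, and quantile ties are broken deterministically here for simplicity.

```python
import numpy as np

def simulate(n=200, t_end=5.0, seed=0):
    """Simulate the n-particle system with mu = 1 (time already rescaled).

    Assumed illustrative choices (not from the paper): eta_n(nu) = 1 - nu,
    jump sizes Z ~ Exp(1); ties are broken deterministically, not uniformly.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)                          # particle locations
    t = 0.0
    while True:
        # superposition of n unit-rate Poisson urge processes
        t += rng.exponential(1.0 / n)
        if t > t_end:
            break
        i = rng.integers(n)                  # urge goes to a uniformly chosen particle
        nu = (np.sum(x < x[i]) + 1) / n      # its quantile ell/n in the empirical distribution
        if rng.random() < 1.0 - nu:          # jump probability eta_n(nu) = 1 - nu
            x[i] += rng.exponential(1.0)     # jump size Z ~ Exp(1)
    return x

x = simulate()
```

With these choices the empirical mean advances at speed roughly $\int_0^1 (1-\nu)\, d\nu = 1/2$ per unit time, which can be checked against the output.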
Let $f^n(t) = (f^n_x(t), x \in R)$ be the (random) empirical distribution of the particle locations at time t; namely, $f^n_x(t)$ is the fraction of particles located in $(-\infty, x]$ at time t. As $n \to \infty$, it is very intuitive that $f^n_x(t)$ converges (in an appropriate sense, under appropriate conditions) to a deterministic function $f_x(t)$, such that $f(t) = (f_x(t), x \in R)$ is a distribution function for each t, and the following equation holds:

$$\frac{\partial}{\partial t} f_x(t) = -\int_{(-\infty, x]} \bar J(x-y)\, \eta(f_y(t))\, d_y f_y(t), \qquad (4)$$

where $d_y$ means the differential in y (and recall that $\mu = 1$ WLOG). We call a function $f_x(t)$ satisfying (4) a mean-field model. (The formal meaning of (4) and the definition of a mean-field model will be given later, in Definition 9 in Section 6.3.) The intuition for (4) is as follows. For each t, the distribution $f(t) = (f_x(t), x \in R)$ approximates the distribution of particles $f^n(t)$ when n is large. Since particles move right, $f_x(t)$ is non-increasing in t for each x. So, the partial derivative $(\partial/\partial t) f_x(t)$ is non-positive, and it should be equal to the RHS of (4), which gives the instantaneous rate (scaled by $1/n$ and taken with the minus sign) at which particles jump over the point x at time t: a particle located at $y \le x$, i.e. at quantile $f_y(t)$, jumps at rate $\eta(f_y(t))$, and its jump takes it over x with probability $\bar J(x-y)$.
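Equation (4) can be integrated numerically on a grid; a crude explicit-Euler sketch follows. The choices $\eta(\nu) = 1-\nu$ and $Z \sim \mathrm{Exp}(1)$ (so $\bar J(y) = e^{-y}$) are again purely illustrative; an atom of f is handled by averaging $\eta$ over the jump of f, in the spirit of the RCLL form discussed in Section 6.3, via $H(\nu) = \int_0^\nu \eta$.

```python
import numpy as np

def mfm_step(f, xs, dt):
    """One explicit Euler step for equation (4) (an illustrative sketch,
    not the paper's scheme).  f holds CDF values on the grid xs."""
    H = lambda nu: nu - 0.5 * nu ** 2                # H(nu) = int_0^nu (1 - u) du
    fprev = np.concatenate(([0.0], f[:-1]))
    w = H(f) - H(fprev)                              # eta-averaged mass d_y f over grid cells
    Jbar = lambda u: np.exp(-np.maximum(u, 0.0))     # complementary CDF of Exp(1)
    rhs = np.array([-(Jbar(x - xs[: i + 1]) * w[: i + 1]).sum()
                    for i, x in enumerate(xs)])      # RHS of (4) at each grid point
    return np.clip(f + dt * rhs, 0.0, 1.0)

xs = np.linspace(-5.0, 15.0, 400)
f = (xs >= 0.0).astype(float)                        # initial state: all mass at 0
for _ in range(50):                                  # integrate to t = 2.5
    f = mfm_step(f, xs, 0.05)
```

As t grows, the mass of f moves right, consistent with the speed $\int_0^1 \eta(\nu)\, d\nu = 1/2$ for this illustrative $\eta$.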
It is known [6,20] that, as long as $m^{(1)} < \infty$ (or, $m^{(1)} = 1$ WLOG), for any mean-field model, the speed at which the mean $\bar f(t) = \int x\, d_x f_x(t)$ advances is $v = m^{(1)} \int_0^1 \eta(\nu)\, d\nu = \int_0^1 \eta(\nu)\, d\nu$. A distribution $\phi$ is called a traveling wave shape (TWS) if $f_x(t) = \phi_{x - vt}$ is a mean-field model. By substituting into (4), we see that any TWS $\phi$ must satisfy the equation

$$v\, \phi'_x = \int_{(-\infty, x]} \bar J(x-y)\, \eta(\phi_y)\, d_y \phi_y.$$

It is known [20] that a TWS $\phi$ exists as long as $m^{(2)} < \infty$ (no other assumption on the jump size distribution is required), and it is unique (up to a shift) if, in addition, (3) holds.
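The substitution behind the TWS equation is elementary; for completeness, a sketch (under the normalization $\mu = m^{(1)} = 1$, and assuming $\phi$ is differentiable):

```latex
% A particle at quantile \nu jumps at rate \eta(\nu) with mean jump size 1,
% so the mean of a mean-field model advances at the constant speed
v \;=\; \int_0^1 \eta(\nu)\, d\nu .
% Substituting f_x(t) = \phi_{x-vt} into (4), and using
% \frac{\partial}{\partial t}\,\phi_{x-vt} = -v\,\phi'_{x-vt}, we get
-v\,\phi'_x \;=\; -\int_{(-\infty,\,x]} \bar J(x-y)\,\eta(\phi_y)\, d_y \phi_y ,
% i.e., canceling the minus signs, the TWS equation
v\,\phi'_x \;=\; \int_{(-\infty,\,x]} \bar J(x-y)\,\eta(\phi_y)\, d_y \phi_y .
```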
We now informally state the main results of this paper. (The formal results will be given later, in Section 4, after introducing more notation.) Let $\mathring f^n(t)$ denote the distribution $f^n(t)$, re-centered so that the distribution mean is at 0.

3 Basic notation
The set of real numbers is denoted by $R$, and is viewed as the usual Euclidean space. As a measurable space, $R$ is endowed with the Borel σ-algebra. A function h of x may be written as either $h(x)$ or $h_x$. The notation $d_x h(x,t)$, for a multivariate function $h(x,t)$ with $x \in R$, means the differential in x.
Denote by $M$ the set of scalar RCLL non-decreasing functions $f = (f(x), x \in R)$ which are (proper) probability distribution functions, i.e., such that $f(x) \in [0,1]$, $\lim_{x \downarrow -\infty} f(x) = 0$ and $\lim_{x \uparrow \infty} f(x) = 1$. For elements $f \in M$ we use the terms distribution function and distribution interchangeably. The space $M$ is endowed with the Levy-Prohorov metric (cf. [4]) and the corresponding topology of weak convergence (which is equivalent to convergence at every point of continuity of the limit); the weak convergence in $M$ is denoted $\stackrel{w}{\to}$. Note that, for $f, \phi \in M$, the $L_1$-norm of their difference, $\|f - \phi\|_1$, is equal to the Wasserstein $W_1$-distance between the corresponding two distributions. The inverse ($\nu$-th quantile) of $f \in M$ is $f^{-1}(\nu) = \inf\{x : f(x) \ge \nu\}$.
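The identity between $\|f - \phi\|_1$ and the $W_1$-distance can be sanity-checked numerically. For two equal-size empirical distributions, the $W_1$-distance is realized by the monotone (sorted) coupling, which the sketch below compares against the $L_1$-norm of the CDF difference on a fine grid (the particular sample distributions are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.sort(rng.normal(0.0, 1.0, 1000))   # sorted samples of distribution 1
b = np.sort(rng.normal(0.7, 1.3, 1000))   # sorted samples of distribution 2

# L1-norm of the difference of the two empirical CDFs, on a fine grid
grid = np.linspace(-8.0, 8.0, 8001)
Fa = np.searchsorted(a, grid, side="right") / a.size
Fb = np.searchsorted(b, grid, side="right") / b.size
l1 = np.abs(Fa - Fb).sum() * (grid[1] - grid[0])

# W1-distance: for equal sample sizes the optimal coupling is the sorted one
w1 = np.abs(a - b).mean()
```

Up to grid-discretization error, the two quantities agree.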
Unless explicitly specified otherwise, we use the following conventions regarding random elements and random processes. A measurable space is considered equipped with a Borel σ-algebra, induced by the metric which is clear from the context. A random process $Y(t)$, $t \ge 0$, always takes values in a complete separable metric space (clear from the context), and has RCLL sample paths. For a random process $Y(t)$, $t \ge 0$, we denote by $Y(\infty)$ the random value of $Y(t)$ in the stationary regime (which will be clear from the context). Symbol $\Rightarrow$ denotes convergence of random elements in distribution. For a distribution $f \in M$ and a scalar function $h(x)$, $x \in R$, we write $fh = \int_R h(x)\, d_x f(x)$. Denote by $C_b$ the set of continuous functions $h(x)$, $x \in R$, each of which is constant outside a closed interval.
For scalar functions $h(x)$, $x \in \mathcal X$, with some domain $\mathcal X$, $\|h\| = \sup_{x \in \mathcal X} |h(x)|$ is the sup-norm. When $G_k$, $G$ are operators mapping the space of such functions into itself, $\lim G_k h = Gh$ and $G_k h \to Gh$ mean the uniform convergence, $\|G_k h - G h\| \to 0$. Suppose we have a Markov process with state space $\mathcal X$ and transition function $P^t(x, H)$, $t \ge 0$. A measurable set $X \subseteq \mathcal X$ is called small (cf. [2]) if there exist constants $T > 0$ and $\delta > 0$, and a probability distribution $\alpha(\cdot)$ on $\mathcal X$, such that for any $x \in X$ and any measurable $H \subseteq \mathcal X$, $P^T(x, H) \ge \delta \alpha(H)$. We view $P^t$ as an operator, $P^t h(x) = \int_{\mathcal X} P^t(x, dy)\, h(y)$, where h is a scalar function with domain $\mathcal X$; $I = P^0$ is the identity operator. The process (infinitesimal) generator B is $Bh = \lim_{t \downarrow 0} (P^t - I) h / t$. A function h is within the domain of the generator B if Bh is well-defined. We say that a Markov process is stable if it is positive Harris recurrent (cf. [2]); if it is, it has a unique stationary distribution.
For real numbers a and b we use the notations: $\mathrm{sign}(a)$ for the sign of a, and $a \wedge b = \min\{a, b\}$. RHS and LHS mean right-hand side and left-hand side, respectively; WLOG means without loss of generality. The abbreviation w.r.t. means with respect to; a.e. means almost everywhere w.r.t. the Lebesgue measure. For an element $f \in M$, denote by $\bar f = \int_0^\infty [1 - f(x)]\, dx - \int_{-\infty}^0 f(x)\, dx$ the mean of the corresponding distribution, with the usual convention that the mean is well-defined and finite when both integrals in the RHS are finite.
If we denote by $M^{(n)} \subset M$ the state space of the process $f^n(\cdot)$, then $\mathring M^{(n)} = M^{(n)} \cap \mathring M$ is the state space of $\mathring f^n(\cdot)$, where $\mathring M \subset M$ is the set of distributions with mean 0. If we use the notation $\Phi_\ell(f) = \int_R |x|^\ell\, d_x f(x)$ for the $\ell$-th absolute moment of f, then we obviously have $\Phi_1(f) \le [\Phi_{1+\chi}(f)]^{1/(1+\chi)}$.

4 Main results

Theorem 1. Suppose conditions (2) and (3) hold. Then, as $n \to \infty$,

$$\mathring f^n(\infty) \Rightarrow \phi, \qquad (6)$$

where $\phi$ is the unique TWS with $\bar \phi = 0$. Moreover, $\phi$ is such that $\Phi_{1+\chi}(\phi) < \infty$, and a stronger form of convergence (6) holds: the convergence holds in the expected Wasserstein $W_1$-distance,

$$E\, \|\mathring f^n(\infty) - \phi\|_1 \to 0.$$

Proof of Theorem 1 is in Section 7.
Theorem 2. Suppose conditions (2) and (3) hold. Then there exist $\bar C > 0$ and $\bar n$ such that, for all $n \ge \bar n$, the Markov process $\mathring f^n(\cdot)$ is stable and we have

$$E\, \Phi_{1+\chi}(\mathring f^n(\infty)) \le \bar C. \qquad (9)$$

Proof of Theorem 2 is in Section 5.

Transient behavior: convergence to a mean-field model
Essentially all our asymptotic results on the transient behavior of the processes are for the non-centered processes f n (·). The result for the centered processes (Theorem 3(ii)) is obtained essentially as a corollary.
Denote by $L^{(n)}$ the generator of the process $f^n(\cdot)$. For any $h \in C_b$, the function $f^n h$ of $f^n$ is within the domain of $L^{(n)}$ (here we use the fact that each function in $C_b$ is constant outside a closed interval), and

$$L^{(n)} [f^n h] = \sum_{\ell=1}^n \eta_n(\ell/n)\, \frac{1}{n}\, E\left[ h\big([f^n]^{-1}(\ell/n) + Z\big) - h\big([f^n]^{-1}(\ell/n)\big) \right],$$

where the expectation is over the distribution of the random jump size Z. We also formally define the "limit" of $L^{(n)}$ as

$$L [f h] = \int_R \eta(f_x)\, E\left[ h(x+Z) - h(x) \right]\, d_x f_x,$$

and consider the equation

$$f(t) h = f(0) h + \int_0^t L [f(s) h]\, ds, \quad t \ge 0, \ h \in C_b. \qquad (10)$$

Theorem 3. Suppose, as $n \to \infty$, $f^n(0) \Rightarrow f(0)$, and $f(0) \in M$. (Note that we do not assume that $f(0)$ has a well-defined mean $\bar f(0)$.) Assume that the conditions $m^{(1)} = EZ < \infty$ (or, $m^{(1)} = EZ = 1$ WLOG) and (3) hold. Then we have: (i) $f^n(\cdot) \Rightarrow f(\cdot)$, where $f(\cdot)$ is the unique solution of (10) with initial condition $f(0)$; the dependence of $f(\cdot)$ (as an element of the space $D([0,\infty), M)$ with the $J_1$-convergence topology) on $f(0)$ (as an element of M with the weak convergence topology) is continuous. (ii) The analogous convergence holds for the re-centered processes: $\mathring f^n(\cdot) \Rightarrow \mathring f(\cdot)$.
Proof of Theorem 3 is in Section 6. Also in Section 6 we show (Theorem 10) that solutions to (10) are exactly the mean-field models (Definition 9). We note that many of the supplementary results in Section 6, which may be of independent interest, require assumptions on the jump size distribution that are much weaker than the conditions $m^{(1)} = EZ < \infty$ and (3). In particular, some of those results assume nothing about the jump size distribution besides it being a proper distribution.
5 Proof of Theorem 2

5.1 Equivalent view of the process $\mathring f^n(\cdot)$.
The state $\mathring f^n(t)$ can be equivalently described as $w^n(t) = (w_1(t), w_2(t), \ldots, w_n(t)) \in R^n$, where $w_1(t), w_2(t), \ldots, w_n(t)$ are the locations of the n particles w.r.t. the mean $\bar f^n(t)$, listed in non-decreasing order. (So, the average $(1/n) \sum_i w_i(t) = 0$ at all times.) From now on, for each $\mathring f^n \in \mathring M^{(n)}$ we will consider the corresponding vector $w^n = (w_1, w_2, \ldots, w_n)$, and vice versa. Any function of $\mathring f^n \in \mathring M^{(n)}$ may be expressed via the corresponding $w^n$, and vice versa. In particular, $\Phi_\ell(\mathring f^n) = (1/n) \sum_i |w_i|^\ell$. Note that the topology on $\mathring M^{(n)}$, induced by the (weak convergence) topology on M, is equivalent to the usual topology of component-wise convergence of the corresponding vectors $w^n$.
The evolution of $w^n(t)$ is as follows. Between the times of jump urges, $w^n(t)$ remains constant. At a time t of a jump urge, the following occurs. Let $\kappa_i(t)$ be the actual jump size of particle i, in the system without re-centering, upon this urge; $\kappa_i(t) \ge 0$, and it can be non-zero for at most one particle. Then, in the re-centered system, the jump size of particle i (i.e., the increment of $w_i(t)$) at t is $\zeta_i = \kappa_i(t) - \sum_s \kappa_s(t)/n$ (which may be positive or negative). After the jumps (if any) at t occur, the particle indices i are changed, if necessary, to keep $w_i(t)$ non-decreasing in i.
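One jump-urge transition of the re-centered state can be sketched as follows (again with the illustrative choices $\eta_n(\nu) = 1 - \nu$ and $\mathrm{Exp}(1)$ jump sizes, which are assumptions of this sketch, not of the paper); note that the mean-zero property is preserved exactly.

```python
import numpy as np

def urge_update(w, rng):
    """Apply one jump urge to the re-centered, sorted state w (mean 0).

    If the chosen particle jumps by kappa, all coordinates are shifted by
    -kappa/n (re-centering) and the jumper additionally gains kappa, so its
    net increment is kappa - kappa/n; the state is then re-sorted.
    """
    n = w.size
    i = rng.integers(n)                                  # urge to a uniform particle
    nu = (i + 1) / n                                     # quantile of the i-th smallest
    kappa = rng.exponential(1.0) if rng.random() < 1.0 - nu else 0.0
    w = w - kappa / n                                    # re-centering shift (new array)
    w[i] += kappa                                        # the actual jump
    return np.sort(w)

rng = np.random.default_rng(2)
w = np.sort(rng.normal(size=100))
w -= w.mean()                                            # start from a mean-0 state
for _ in range(1000):
    w = urge_update(w, rng)
```

After any number of updates, w remains sorted with mean 0 (up to floating-point error).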

5.2 Informal intuition for the proof.
The proof of stability, in Subsection 5.3, uses the fluid limit technique and is fairly straightforward. Let us discuss the intuition for the proof of the bound (9) in Subsection 5.4.
At a very high level, the bound (9) is due to a fundamental property of the system, which can be called the "egalitarian trend": in the re-centered system, the particles at high quantiles (large i) have a negative drift, while the particles at low quantiles (small i) have a positive drift, thus preventing the centered empirical distribution $\mathring f^n$ from "spreading out." To obtain the bound on the expected $(1+\chi)$-th moment of $\mathring f^n$, we need the finite $(2+\chi)$-th moment of a jump size. Informally speaking, we use $\Phi_{2+\chi}(\mathring f^n)$ as a Lyapunov function. If $B^{(n)}$ is the generator of the process $\mathring f^n(\cdot)$, then, "generally speaking," we have $E B^{(n)} \Phi_{2+\chi}(\mathring f^n(\infty)) = 0$; in other words, the expected drift of $\Phi_{2+\chi}(\mathring f^n(t))$ in steady state is 0. The function

$$G^{(n)}_{1+\chi}(\mathring f^n) = \sum_i E\left[\zeta^{(\mathring f^n)}_i\right] |w_i|^{1+\chi}\, \mathrm{sign}(w_i),$$

where $\zeta^{(\mathring f^n)}_i$ is the random jump size of particle i (in the re-centered system) upon a jump urge when the state is $\mathring f^n$, can be thought of as the "first-order approximation of the generator $B^{(n)}$, applied to the function $\Phi_{2+\chi}(\mathring f^n)$;" note that the derivative of $|w|^{2+\chi}$ is $(2+\chi) |w|^{1+\chi} \mathrm{sign}(w)$. The key property is that $G^{(n)}_{1+\chi}(\mathring f^n) \le -\epsilon\, \Phi_{1+\chi}(\mathring f^n)$ for some constant $\epsilon > 0$, when $\Phi_{1+\chi}(\mathring f^n)$ is large; this is where we use the egalitarian trend property, which ensures that $E \zeta^{(\mathring f^n)}_i$ is negative [resp., positive] for particles at high [resp., low] quantiles. From here we can obtain, informally speaking, $G^{(n)}_{1+\chi}(\mathring f^n) \le -\epsilon\, \Phi_{1+\chi}(\mathring f^n) + K$, which holds for some $K > 0$ and all $\mathring f^n$. Taking into account the fact that $G^{(n)}_{1+\chi}(\mathring f^n)$ is not the generator $B^{(n)}$ applied to $\Phi_{2+\chi}(\mathring f^n)$, but only its first-order approximation, and doing the corresponding estimates, we obtain, informally speaking, a bound of the form $B^{(n)} \Phi_{2+\chi}(\mathring f^n) \le (2+\chi) [-\epsilon\, \Phi_{1+\chi}(\mathring f^n) + K']$. Taking the expectation w.r.t. $\mathring f^n(\infty)$, we obtain, informally speaking, $\epsilon\, E \Phi_{1+\chi}(\mathring f^n(\infty)) \le K'$, which yields (9).
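In the notation above, the first-order expansion behind this heuristic reads as follows (a sketch only; constants are not tracked):

```latex
% Increment of the Lyapunov function when particle i moves by \zeta_i:
|w_i + \zeta_i|^{2+\chi} - |w_i|^{2+\chi}
   = (2+\chi)\,|w_i|^{1+\chi}\operatorname{sign}(w_i)\,\zeta_i
     \;+\; \text{higher-order terms in } \zeta_i .
% Summing over i and taking expectations gives the first-order drift
% G^{(n)}_{1+\chi}(\mathring f^n); combined with the egalitarian-trend bound
G^{(n)}_{1+\chi}(\mathring f^n) \;\le\; -\epsilon\,\Phi_{1+\chi}(\mathring f^n) + K ,
% the steady-state identity E\,B^{(n)}\Phi_{2+\chi}(\mathring f^n(\infty)) = 0 yields,
% up to the higher-order corrections,
E\,\Phi_{1+\chi}(\mathring f^n(\infty)) \;\le\; K/\epsilon ,
% which is a bound of the form (9).
```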
In the actual proof, instead of $\Phi_{2+\chi}(\mathring f^n)$ we use its truncated version $\Phi^{(C_1)}_{2+\chi}(\mathring f^n) = \Phi_{2+\chi}(\mathring f^n) \wedge C_1$ as the Lyapunov function, because the latter is certainly within the domain of the generator $B^{(n)}$. And then we let $C_1 \uparrow \infty$.
We note that this "program" for proving a property of the type of (9) is likely applicable to other models having the egalitarian trend property, while the technical details may differ.
5.3 Proof of stability.

To prove stability we will use the equivalent representation $w^n(\cdot)$ of the process $\mathring f^n(\cdot)$, given in Subsection 5.1.
Denote $\mathring W^{(n)} = \{ w^n \in R^n : w_1 \le \ldots \le w_n, \ \sum_i w_i = 0 \}$, i.e. the set of those vectors in $R^n$ corresponding to $\mathring f^n \in \mathring M^{(n)}$. The norm of a state $w^n = (w_1, \ldots, w_n) \in \mathring W^{(n)}$ is $\|w^n\| = \max_i |w_i|$. Using (3), it is straightforward to see that, for any fixed $a > 0$, the closed set $\mathring W^{(n)}(a) = \{ w^n \in \mathring W^{(n)} : \|w^n\| \le a \}$ is small (see the definition in Section 3).
Consider a sequence of versions of the process $w^n(\cdot)$, namely processes $w^{n,k}(\cdot)$ with increasing norms of the initial states, $\|w^{n,k}(0)\| = c_k \uparrow \infty$, $k \to \infty$, with $w^{n,k}(0)/c_k \to w(0) = (w_1(0), \ldots, w_n(0))$. Given that $\mathring W^{(n)}(a)$ is a closed small set for any $a > 0$, to establish stability it suffices to show that, for some fixed $T > 0$, $\|w^{n,k}(T)\| / c_k \Rightarrow 0$. This in turn follows from the fact that the limit (in the appropriate sense) $w(\cdot) = (w_1(\cdot), \ldots, w_n(\cdot))$ of the sequence of processes $w^{n,k}(c_k t)/c_k$, $t \ge 0$, has trajectories such that $w(t) \in \mathring W^{(n)}$ for all $t \ge 0$, $\|w(0)\| = 1$, and $(d/dt) \max_i w_i(t) \le -\epsilon < 0$ as long as $\max_i w_i(t) > 0$; therefore, $w(t) = 0$ for all $t \ge 1/\epsilon$. We omit further details, which are straightforward. ✷

5.4 Proof of (9)
At some level, this proof is similar to the proof of an analogous result in [19] for a different particle system. However, the difference of our model from that in [19] is substantial, so we give full details of the proof for our model.

Consider the following function:

$$G^{(n)}_{1+\chi}(\mathring f^n) = \sum_i E\left[\zeta^{(\mathring f^n)}_i\right] |w_i|^{1+\chi}\, \mathrm{sign}(w_i),$$

where $\zeta^{(\mathring f^n)}_i$ is the random jump size (which can have any sign) of particle i upon a jump urge when the state is $\mathring f^n$. (The sizes $\zeta^{(\mathring f^n)}_i$ are dependent across i, of course.) As we will see, the function $G^{(n)}_{1+\chi}(\mathring f^n)$ can be thought of as the "first-order approximation of the generator $B^{(n)}$ of the process $\mathring f^n(\cdot)$, applied to the function $\Phi_{2+\chi}(\mathring f^n)$;" but we do not even claim that $\Phi_{2+\chi}(\mathring f^n)$ is within the domain of the generator $B^{(n)}$. Note that each $E \zeta^{(\mathring f^n)}_i$ is a quantity of the order $O(1/n)$, which motivates the definition

$$\bar\zeta^{(\mathring f^n)}_i = n\, E \zeta^{(\mathring f^n)}_i. \qquad (11)$$

We observe that, if the particle i location $w_i$ is the $\ell$-th from the left, and it is not co-located with any other particle, then

$$\bar\zeta^{(\mathring f^n)}_i = \eta_n(\ell/n) - v_n, \quad \text{where } v_n = \frac{1}{n} \sum_{s=1}^n \eta_n(s/n) \qquad (12)$$

is the average drift of the mean of the non-centered particle system. (Clearly, $\lim_n v_n = v$.) In the more general case, when exactly k particles are co-located, namely the $\ell$-th, $(\ell+1)$-th, ..., $(\ell+k-1)$-th left-most particles are co-located, and particle i is one of them, we have

$$\bar\zeta^{(\mathring f^n)}_i = \frac{1}{k} \sum_{j=\ell}^{\ell+k-1} \eta_n(j/n) - v_n.$$

We define the function $\bar\zeta^{(\mathring f^n)}(x)$, $x \in R$, as follows: $\bar\zeta^{(\mathring f^n)}(x) = \bar\zeta^{(\mathring f^n)}_i$, where i is the particle whose location $w_i$ is the closest to x on the left; we also adopt the convention that, if $w_i$ is the location of the left-most particle, then $\bar\zeta^{(\mathring f^n)}(x) = \bar\zeta^{(\mathring f^n)}_i$ for all $x < w_i$ as well. $\qquad (13)$ Clearly, the function $\bar\zeta^{(\mathring f^n)}(x)$ is a piece-wise constant non-increasing function.
We can write

$$G^{(n)}_{1+\chi}(\mathring f^n) = \frac{1}{n} \sum_i \bar\zeta^{(\mathring f^n)}_i\, |w_i|^{1+\chi}\, \mathrm{sign}(w_i), \qquad (14)$$

or, by "integrating over the values $\nu$ of $\mathring f^n_x$" and using (13),

$$G^{(n)}_{1+\chi}(\mathring f^n) \le \int_0^1 [\eta_n(\nu) - v_n]\, \big|[\mathring f^n]^{-1}(\nu)\big|^{1+\chi}\, \mathrm{sign}\big([\mathring f^n]^{-1}(\nu)\big)\, d\nu. \qquad (15)$$

The inequality in (15) follows by comparing, for each block of co-located particles, its contributions to the two sides. Denote by $\nu^n_*$ the quantile at which $[\mathring f^n]^{-1}(\nu)$ changes sign; to be specific, consider the case when $[\mathring f^n]^{-1}(\nu^n_*) \le 0$. Then the comparison reduces to the block containing $\nu^n_*$, for which the required inequality holds because $|[\mathring f^n]^{-1}(\nu)|^{1+\chi}$ is non-increasing in $[0, \nu^n_*]$. Thus, (15) is proved.
Next, we claim the following property: there exist a sufficiently large $C > 0$ and some $\epsilon > 0$ such that, uniformly in all sufficiently large n and all $\mathring f^n \in \mathring M^{(n)}$ with $\Phi_{1+\chi}(\mathring f^n) \ge C$,

$$G^{(n)}_{1+\chi}(\mathring f^n) \le -\epsilon\, \Phi_{1+\chi}(\mathring f^n). \qquad (16)$$

The proof of (16) is given in Section 5.5.
From (16) and (15), and the observation that the RHS of (15) is always non-positive (since $\eta_n(\nu) - v_n$ is non-increasing in $\nu$ with zero integral, while the other factor of the integrand is non-decreasing), we obtain that, uniformly in all sufficiently large n and all $\mathring f^n \in \mathring M^{(n)}$,

$$G^{(n)}_{1+\chi}(\mathring f^n) \le -\epsilon\, \Phi_{1+\chi}(\mathring f^n) + \epsilon C. \qquad (17)$$

Denote by $\Phi^{(C_1)}_{2+\chi}(\mathring f^n) = \Phi_{2+\chi}(\mathring f^n) \wedge C_1$ the truncated version of $\Phi_{2+\chi}(\mathring f^n)$. Given that this is a continuous bounded function of $\mathring f^n \in \mathring M^{(n)}$, and the process of jump urges is Poisson, it is not hard to see that $\Phi^{(C_1)}_{2+\chi}$ is within the domain of the generator $B^{(n)}$. Next, we claim the following fact: there exists $C_2 > 0$ such that, for any fixed $C_1 > 0$, uniformly in all large n and $\mathring f^n$ such that $\Phi_{2+\chi}(\mathring f^n) \le C_1$, we have

$$B^{(n)} \Phi^{(C_1)}_{2+\chi}(\mathring f^n) \le (2+\chi) \left[ G^{(n)}_{1+\chi}(\mathring f^n) + C_2 \right], \qquad (18)$$

and then, by (17),

$$B^{(n)} \Phi^{(C_1)}_{2+\chi}(\mathring f^n) \le (2+\chi) \left[ -\epsilon\, \Phi_{1+\chi}(\mathring f^n) + C_3 \right], \qquad (19)$$

with $C_3 = \epsilon C + C_2$. The proof of (18) is given in Section 5.6.

From (19), and the observation that $B^{(n)} \Phi^{(C_1)}_{2+\chi}(\mathring f^n) \le 0$ for the states with $\Phi_{2+\chi}(\mathring f^n) > C_1$ (at such states $\Phi^{(C_1)}_{2+\chi}$ is at its maximum value $C_1$), we obtain, for sufficiently large fixed $C_4, C_5 > 0$ (independent of $C_1$ and n),

$$B^{(n)} \Phi^{(C_1)}_{2+\chi}(\mathring f^n) \le (2+\chi) \left[ -\epsilon\, \Phi_{1+\chi}(\mathring f^n) + C_5 \right] 1\{\Phi_{2+\chi}(\mathring f^n) \le C_1\}. \qquad (20)$$

Since $\Phi^{(C_1)}_{2+\chi}$ is bounded and within the domain of $B^{(n)}$, we have $E B^{(n)} \Phi^{(C_1)}_{2+\chi}(\mathring f^n(\infty)) = 0$, where, recall, $\mathring f^n(\infty)$ is the random value of $\mathring f^n(t)$ in the stationary regime. Combining this with (20), we have, for all large n,

$$\epsilon\, E\left[ \Phi_{1+\chi}(\mathring f^n(\infty))\, 1\{\Phi_{2+\chi}(\mathring f^n(\infty)) \le C_1\} \right] \le 2 C_5.$$

Letting $C_1 \uparrow \infty$, we finally obtain that $E \Phi_{1+\chi}(\mathring f^n(\infty)) \le 2 C_5 / \epsilon$ for all sufficiently large n, and then $E \Phi_{1+\chi}(\mathring f^n(\infty)) \le \bar C$ holds for all $n \ge \bar n$, for some large $\bar C > 0$. ✷

5.5 Proof of (16).
The definition of $\bar\zeta_i = \bar\zeta^{(\mathring f^n)}_i$ in (11) can be interpreted as follows: $\bar\zeta_i$ is the expected jump size of particle i, conditioned on this particle receiving the jump urge, and then centered by the expected jump size of any particle upon a jump urge in the system. The proof is by contradiction. Suppose property (16) does not hold. Then we can and do choose a subsequence of $n \to \infty$, and corresponding $\mathring f^n$, so that along this subsequence $\Phi_{1+\chi}(\mathring f^n) \uparrow \infty$ and

$$\liminf_n\, G^{(n)}_{1+\chi}(\mathring f^n) / \Phi_{1+\chi}(\mathring f^n) \ge 0.$$

Consider separately two cases: (a) $\liminf_n \Phi_1(\mathring f^n) < \infty$; (b) $\lim_n \Phi_1(\mathring f^n) = \infty$.
Case (a).
Consider a further subsequence of $\mathring f^n$ such that $\lim_n \Phi_1(\mathring f^n) = c < \infty$ and, moreover, $\mathring f^n \stackrel{w}{\to} f$, where f is a proper distribution.
For a fixed $0 < \delta < 1/2$, using the facts that f is proper and $\Phi_{1+\chi}(\mathring f^n) \to \infty$, we easily see that the dominant contribution to $\Phi_{1+\chi}(\mathring f^n)$ comes from the quantiles $\nu \not\in (\delta, 1-\delta)$. Pick a sufficiently small $\delta > 0$ such that, for some $\delta_1 > 0$ and all large n, $\eta_n(\delta) - v_n > \delta_1$ and $v_n - \eta_n(1-\delta) > \delta_1$. Considering the contributions of the quantiles $\nu \le \delta$ and $\nu \ge 1-\delta$ to $G^{(n)}_{1+\chi}(\mathring f^n)$, we arrive at a contradiction with the choice of the subsequence, which completes Case (a).

Case (b). Denote $c_n = \Phi_1(\mathring f^n) \uparrow \infty$, and consider the sequence of rescaled versions of $\mathring f^n$, namely $\tilde f^n_x = \mathring f^n_{c_n x}$, $x \in R$.
So, the remaining case to consider is (b.2).
For an $f \in M$, let us formally define a "limiting version" of the functional $G^{(n)}_{1+\chi}(\mathring f^n)$ appearing in the LHS of the inequality in (15):

$$G_{1+\chi}(f) = \int_0^1 [\eta(\nu) - v]\, \big|f^{-1}(\nu)\big|^{1+\chi}\, \mathrm{sign}\big(f^{-1}(\nu)\big)\, d\nu.$$

Note that $\Phi_{1+\chi}(\tilde f^n) \ge [\Phi_1(\tilde f^n)]^{1+\chi} = 1$, so that $0 < c < \infty$, where $c = \lim_n \Phi_{1+\chi}(\tilde f^n)$. Consider a further subsequence of $\tilde f^n$ such that $\lim_n \Phi_{1+\chi}(\tilde f^n) = c$ and $\tilde f^n \stackrel{w}{\to} f \in M$. The distribution f cannot be concentrated at a single point. (Otherwise, since $\Phi_1(\tilde f^n) = 1$ for all n, $\Phi_{1+\chi}(\tilde f^n)$ could not remain bounded.) Therefore, $|G_{1+\chi}(f)| > 0$, with $G_{1+\chi}(f) < 0$, and then $\liminf_n |G^{(n)}_{1+\chi}(\mathring f^n)| / \Phi_{1+\chi}(\mathring f^n) > 0$ with $G^{(n)}_{1+\chi}(\mathring f^n)$ negative, contradicting the choice of the subsequence. ✷
5.6 Proof of (18).

Consider a fixed state $\mathring f^n$, and consider the expected increment $\Delta$ of $\Phi_{2+\chi}(\mathring f^n)$ upon a jump urge occurring in this state. We have

$$\Delta = \frac{1}{n} \sum_i E\left[ |w_i + \zeta_i|^{2+\chi} - |w_i|^{2+\chi} \right],$$

where the expectation E is with respect to the uniform selection of the particle receiving the jump urge, the random event of it actually jumping, and the randomness of the jump size (if the jump occurs); each term is then estimated using the Taylor-type inequality (25).
For $\zeta_i$ we have $\zeta_i = \kappa_i - \frac{1}{n} \sum_s \kappa_s$, where $\kappa_i = \kappa_i(\mathring f^n)$ is the (random) jump size of particle i. Note that $E \zeta_i$ is a quantity of the order $O(1/n)$, since $\sum_s \kappa_s$ is of order $O(1)$ and $E \kappa_i$ is of order $O(1/n)$ (because $1/n$ is the probability of particle i receiving the jump urge). Therefore, $\bar\zeta_i = n E \zeta_i = O(1)$, and we can write $E \zeta_i = \bar\zeta_i / n$. Next,

$$E |\zeta_i|^{2+\chi} \le C_7 / n \qquad (26)$$

for some $C_7 > 0$, because $E (\sum_s \kappa_s)^{2+\chi}$ is upper bounded by the $(2+\chi)$-th moment of a jump size, and $E \kappa_i^{2+\chi}$ is upper bounded by $1/n$ times the $(2+\chi)$-th moment of a jump size. Note that (26) holds for $\chi = 0$ as well, with a possibly different $C_7$. Therefore, by choosing $C_7$ sufficiently large, we have both (26) and

$$E \zeta_i^2 \le C_7 / n. \qquad (27)$$

Assembling these bounds, we obtain

$$\Delta \le \frac{2+\chi}{n} \left[ G^{(n)}_{1+\chi}(\mathring f^n) + C_2 \right], \qquad (28)$$

where, recall, $\Delta$ is the expected increment of $\Phi_{2+\chi}$ upon a jump urge occurring in a fixed state $\mathring f^n$ such that $\Phi_{2+\chi}(\mathring f^n) \le C_1$. Now consider the value of the generator $B^{(n)}$, applied to $\Phi^{(C_1)}_{2+\chi}$, which we evaluate via the expected increment of $\Phi^{(C_1)}_{2+\chi}(\mathring f^n(t))$ over a small interval $[0, t/n]$, with $\mathring f^n(0) = \mathring f^n$. First, note that, as $t \downarrow 0$, the contribution into this expected increment of the event that more than one jump urge occurs is $o(t)$. (Because jump urges follow a Poisson process of rate n, and $\Phi^{(C_1)}_{2+\chi}$ is bounded.) With probability $t + o(t)$ there will be exactly one jump urge in $[0, t/n]$, which therefore occurs in the state $\mathring f^n$ (such that $\Phi_{2+\chi}(\mathring f^n) \le C_1$); then, the expected increment of $\Phi^{(C_1)}_{2+\chi}$ will not exceed that of $\Phi_{2+\chi}$. Using these observations and the estimate (28), we obtain (18). We omit the remaining straightforward $\epsilon/\delta$ formalities. ✷

6 Proof of Theorem 3
This proof largely follows the approach used in [1], for a different model. Unlike in [1], we work with the Levy-Prohorov metric (inducing the weak convergence topology) on M, as opposed to the stronger Wasserstein $W_1$-metric. This, in fact, simplifies some parts of the proof in our case; we will point out those parts as they appear. However, some other parts of our proof of Theorem 3 are completely different from (or not present in) the development in [1]. They are: Section 6.3 and Theorem 10, which establish the equivalence between solutions to (10) and mean-field models; and Theorem 11, which establishes the uniqueness of a mean-field model and its continuous dependence on the initial state.
We note that many of the supplementary results in this section, which may be of independent interest, require assumptions on the jump size distribution that are much weaker than the conditions $m^{(1)} = EZ < \infty$ and (3). In particular, some of the results assume nothing about the jump size distribution besides it being a proper distribution. We will emphasize such weaker assumptions, where applicable, in the results' statements.

6.1 C-relative compactness of the processes.
A sequence of random processes with sample paths in the Skorohod space $D([0,\infty), R)$ [resp., $D([0,\infty), M)$] is called C-relatively compact (see [1,16]) if it is: (a) relatively compact, i.e. any subsequence of it has a further subsequence converging in distribution to some limiting process; and (b) any such limiting process has continuous sample paths, a.s.

Theorem 4. For any $h \in C_b$, the sequence of processes $\{f^n(\cdot) h\}$ is C-relatively compact in the Skorohod space $D([0,\infty), R)$.

Proof. This result is analogous to Theorem 6.9 in [1]. Note that, although the model in [1] is different from ours, the only property used in the proof of Theorem 6.9 in [1] is that the rate of jumps of each particle is upper bounded by a finite constant a at all times. The latter property obviously holds for our model as well, with $a = \mu = 1$. Therefore, the proof of Theorem 6.9 in [1] applies essentially verbatim, with the following adjustments.
What in the proof of Theorem 6.9 in [1] are $f, \mu_n, a$, in our notation are $h, f^n, \mu = 1$, respectively. In the proof, $x_i(t)$ denote the locations of the particles, uniquely determined by the process state at time t, and vice versa. The notation $\hat x_i(t)$ is used for the locations of particles in an artificial system, with the same initial particle locations $\hat x_i(0) = x_i(0)$, but such that each particle jumps every time it gets a jump urge; the artificial system is coupled to the original one in the natural way, so that the corresponding particles have common processes of jump urges and common jump sizes (if the particle in the original system happens to jump at a jump urge). Clearly, with this coupling, $x_i(t) \le \hat x_i(t)$ for all t. ✷

Theorem 5. The sequence of processes $\{f^n(\cdot)\}$ is C-relatively compact in the Skorohod space $D([0,\infty), M)$.

Proof. This result is analogous to Corollary 6.10 in [1], with essentially the same proof. In fact, in our case the proof is simpler. We need to verify conditions (i) and (ii) of Theorem II.4.1 in [16], which in our case take the following form.
(i) For any $T > 0$ and $\epsilon > 0$, there exists $K > 0$ such that, uniformly in n, with probability at least $1 - \epsilon$, the distributions $f^n(t)$, $t \in [0,T]$, assign mass at least $1 - \epsilon$ to the interval $[-K, K]$. (ii) For any $h \in C_b$, the sequence of processes $\{f^n(\cdot) h\}$ is C-relatively compact in the Skorohod space $D([0,\infty), R)$. (Note that the class of functions $h \in C_b$ is separating, which means that a probability measure g is uniquely determined by the values of gh for $h \in C_b$.) Condition (ii) is verified by Theorem 4. The verification of condition (i) repeats the proof of Corollary 6.10 in [1], essentially verbatim, up to and including the display where the Markov inequality is used for the first time.
At that point it remains to observe that the probability in the RHS of the display can be made arbitrarily small by making K sufficiently large. The measure $\mu_n(t, \cdot)$ in the proof of Corollary 6.10 in [1] is, in our notation, the measure (distribution) $f^n(t)$ on R; the particle locations $x_i(t)$ and $\hat x_i(t)$ in the coupled original and artificial systems are as described above. ✷

6.2 Trajectories of a limit satisfy (10)
For trajectories $f^n(\cdot) \in D([0,\infty), M)$ with $f^n(t) \in M^{(n)}$ for all $t \ge 0$, let us define the following functional, for each $h \in C_b$ and $t \ge 0$:

$$A^n_{t,h}(f^n(\cdot)) = f^n(t) h - f^n(0) h - \int_0^t L^{(n)} [f^n(s) h]\, ds.$$

We will also formally define a "limit version" of $A^n_{t,h}$, for trajectories $f(\cdot) \in D([0,\infty), M)$, for each $h \in C_b$ and $t \ge 0$:

$$A_{t,h}(f(\cdot)) = f(t) h - f(0) h - \int_0^t L [f(s) h]\, ds.$$
Theorem 6. For any $t \ge 0$ and any $h \in C_b$, $A^n_{t,h}(f^n(\cdot)) \Rightarrow 0$ as $n \to \infty$. (Note that we assume neither (3), nor (2), nor even $m^{(1)} = EZ < \infty$. The jump size distribution only needs to be proper.)

Proof. The proof repeats the proof of Theorem 6.11 in [1], in which we replace: L by $L^{(n)}$; f by h; $\mu_n$ by $f^n$; a by $\mu = 1$. In particular, in our case, $I_n(t) = f^n(t) h = \int h(x)\, d_x f^n_x(t) = (1/n) \sum_i h(x_i(t))$, so that $L I_n(t)$ is replaced by

$$L^{(n)} [f^n(t) h] = \sum_{\ell=1}^n \eta_n(\ell/n)\, \frac{1}{n}\, E\left[ h\big(x_{(\ell)}(t) + Z\big) - h\big(x_{(\ell)}(t)\big) \right],$$

where $x_{(\ell)}(t)$ is the $\ell$-th left-most particle location, and the expectation is over a random jump size Z. The martingale $M_n(t)$, $t \ge 0$, in our case is $M_n(t) = A^n_{t,h}(f^n(\cdot))$, so that $L M_n^2(t)$ in our case is $L^{(n)} M_n^2(t)$. The last line of the last display of the proof in [1] can be removed, and the final estimate $L^{(n)} M_n^2(t) \le 4 \|h\|^2 / n$ for $h \in C_b$ can be observed without that line (because the jump urge rate of each particle is $\mu = 1$).
Finally, note that in our theorem we only need to consider $h \in C_b$. We do not need to consider the identity test function $h(x) = x$, and that is why in Theorem 6 we do not need the condition $EZ^2 < \infty$: it suffices that Z has a proper distribution. ✷

Theorem 7. Suppose $f^n(\cdot) \Rightarrow f(\cdot)$, where the process $f(\cdot)$ has continuous trajectories a.s. Then, for every $t \ge 0$ and any $h \in C_b$, $A^n_{t,h}(f^n(\cdot)) \Rightarrow A_{t,h}(f(\cdot))$. (Note that we do not assume that $f(0)$ has a well-defined mean $\bar f(0)$, or (3), or (2), or even $m^{(1)} = EZ < \infty$. The jump size distribution only needs to be proper.)

Proof. The statement of this theorem is analogous to that of Theorem 6.12 in [1]. However, the proof in our case is simpler, and is as follows. We know that, by Theorem 5, a.s. the limiting process $f(\cdot)$ has continuous trajectories in $D([0,\infty), M)$. We can use the Skorohod representation to construct all processes on a common probability space so that, w.p.1, $f^n(\cdot) \stackrel{J_1}{\to} f(\cdot)$ as $n \to \infty$; moreover, by the continuity of $f(\cdot)$, we see that $f^n(t) \to f(t)$ uniformly on compact sets of t; we also see that, w.p.1, $f_x(t)$ is non-increasing in t (because so are $f^n_x(t)$). Then it is easy to see (using, in particular, the facts that the convergence $\eta_n(\cdot) \to \eta(\cdot)$ is uniform, and $\eta(\cdot)$ is strictly decreasing continuous) that $A^n_{t,h}(f^n(\cdot)) \to A_{t,h}(f(\cdot))$ w.p.1. ✷

Theorem 8. Suppose, as $n \to \infty$, $f^n(0) \Rightarrow f(0)$, and $f(0) \in M$. Then the sequence of processes $\{f^n(\cdot)\}$ is such that any distributional limit $f(\cdot)$ of it in $D([0,\infty), M)$ is such that, a.s., $f(\cdot)$ is continuous and it satisfies $A_{t,h}(f(\cdot)) = 0$ (i.e., (10)) for every $t \ge 0$ and any $h \in C_b$. (Note that we do not assume that $f(0)$ has a well-defined mean $\bar f(0)$, or (3), or (2), or even $m^{(1)} = EZ < \infty$. The jump size distribution only needs to be proper.)

Proof. By Theorem 5, there exists a subsequence of $\{f^n(\cdot)\}$ which converges in distribution to a process $f(\cdot)$ with a.s. continuous trajectories. By Theorems 6 and 7, this process must satisfy $A_{t,h}(f(\cdot)) = 0$ for every $t \ge 0$ and any $h \in C_b$. ✷

6.3 Equivalent characterization of solutions to (10) as mean-field models.

Definition 9. A function $f_x(t)$, $x \in R$, $t \in R_+$, will be called a mean-field model if it satisfies the following conditions. (a) For any t, $f(t) = (f_x(t), x \in R) \in M$.
(b) For any x, f x (t) is non-increasing and c-Lipschitz in t, with the constant c independent of x. (c) For any x, at any t where the partial derivative (∂/∂t)f x (t) exists (which is almost all t w.r.t. Lebesgue measure, by the Lipschitz property), equation (29) holds.
Note that, by the change of variable y = f −1 (ν) in the integral in the RHS of (29), equation (29) can be equivalently written in the form (30), where ν 2 = f y (t), ν 1 = f y− (t). Equation (30) (or (29)) is a more general form of (4), allowing f x (t) to be RCLL in x, rather than continuous. If f u (t) is continuous at u = y, then η̄(y, f (t)) = η(f y (t)); if f u (t) has a jump at u = y, then η̄(y, f (t)) is η(ν) averaged over ν ∈ [f y− (t), f y (t)]. In paper [20], the equation in the form (30) is used to define a mean-field model.
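The averaging that defines η̄ can be written out explicitly. The following display is a reconstruction from the prose above (with ν 1 = f y− (t), ν 2 = f y (t)), not a verbatim copy of (30):

```latex
\bar\eta(y, f(t)) =
\begin{cases}
\eta(f_y(t)), & \text{if } f_u(t) \text{ is continuous at } u = y,\\[4pt]
\dfrac{1}{\nu_2 - \nu_1}\displaystyle\int_{\nu_1}^{\nu_2} \eta(\nu)\, d\nu,
& \text{if } \nu_1 = f_{y-}(t) \ne f_y(t) = \nu_2 .
\end{cases}
```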
This means that f (t)h = f x (t) is absolutely continuous in t, with the derivative equal to (∂/∂t)f x (t) = L[f (t)h] a.e. in t. This, in particular, implies that, for any fixed y, the jump size f y (t) − f y− (t) is continuous in t (in fact, Lipschitz); this, in turn, means that the possible discontinuity points y of f y (t) "cannot move" in time t. We can then conclude that, for any x, the derivative (∂/∂t)f x (t) = L[f (t)h] is in fact continuous in t. Therefore, (∂/∂t)f x (t) = L[f (t)h] at every t. It remains to observe that L[f (t)h] is exactly the RHS of (29).
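Equivalently, the absolute continuity of f x (t) in t can be stated in integral form; a sketch in the text's notation:

```latex
f_x(t) \;=\; f_x(0) + \int_0^t L[f(s)h]\, ds, \qquad t \ge 0,
```

where L[f (t)h] is the RHS of (29).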
'If.' The definition of a mean-field model f (·) implies that (10) holds, for any x, for the step function h defined above. Then, we have (10) for any h which is piecewise constant with a finite number of pieces; and the set of such functions h is dense within the space of test functions h ∈ C b , equipped with the uniform metric. Then (10) holds for any h ∈ C b . ✷

6.4 Uniqueness of the solution to (10) and continuity in the initial state. Proof of Theorem 3(i).
Theorem 11. Assume that m (1) = EZ < ∞ (WLOG, m (1) = EZ = 1) and condition (3) holds. Then, for any initial condition f (0) ∈ M, the solution f (·) of (10) (i.e., the mean-field model) is unique. (Note that we do not assume that f (0) has a well-defined mean f̄(0), or (2).)

Proof. We know that solutions to (10) are mean-field models. Papers [6,20] study the properties of mean-field models. It is easy to check that the proofs of all results in Sections 4.4 and 4.5 of [6], including Theorem 2, which states that the Wasserstein W 1 -distance (i.e., the L 1 -norm of the difference) between any two mean-field models f (1) (·) and f (2) (·) is non-increasing, never use the fact that the means f̄ (1) (0) and f̄ (2) (0) are well-defined and equal to 0. Those proofs only use the conditions EZ < ∞ and (3). (Technically, the proofs in [6] assume that the jump size distribution J(·) is exponential, but only property (3) is actually used.) For any f (0) ∈ M, a mean-field model f (·) satisfies these conditions (see [20]). Now, if f (1) (·) and f (2) (·) are two solutions with the same initial condition, the Wasserstein W 1 -distance between f (1) (t) and f (2) (t) cannot increase; since it is 0 at t = 0, it remains 0, which implies uniqueness. ✷

Proof of Theorem 3(i). We have established that the family of distributions of the processes f n (·) is C-tight, that any subsequential distributional limit is concentrated on solutions f (·) to (10) (i.e., mean-field models), and that the solution f (·) with a given initial condition f (0) is unique. This implies the convergence f n (·) ⇒ f (·). ✷
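The W 1 -distance invoked in this proof is simply the L 1 -norm of the difference of distribution functions. A minimal numerical sketch (an illustrative helper, not from the paper): for two laws that differ by a shift c, the W 1 -distance equals c.

```python
import math

def w1_distance(F1, F2, lo, hi, step=1e-3):
    """Approximate the Wasserstein W1 distance between two laws on [lo, hi],
    computed as the L1 norm of the difference of their CDFs (Riemann sum)."""
    total, x = 0.0, lo
    while x < hi:
        total += abs(F1(x) - F2(x)) * step
        x += step
    return total

# Exp(1) CDF, and the same law shifted right by 0.5.
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
F_shift = lambda x: F(x - 0.5)

d = w1_distance(F, F_shift, -1.0, 25.0)  # close to the shift size, 0.5
```

This also makes the contraction statement of Theorem 2 of [6] easy to check numerically on examples.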

Characterization of a limit of stationary distributions.
Suppose f (0) ∈ M̊, that is, f (0) ∈ M and f̄(0) = 0. If f (·) is a mean-field model with initial state f (0) (i.e., the solution to (10)), then the corresponding centered trajectory f̊ (·) will be called a centered mean-field model.
Lemma 13. The distribution of any subsequential limit f̊ (∞) is a stationary distribution of the deterministic process evolving along centered mean-field models.
Proof. By Theorem 3(ii), the dependence of the deterministic trajectory f̊ (·) on the initial state f̊ (0) is continuous. Then we can apply Theorem 8.5.1 in [9], adapted to our setting. Alternatively, the proof is easy to obtain directly, as follows. We need to show that, for any test function h ∈ C b and any t ≥ 0, we have

E f̊ (0)h = E f̊ (t)h, (34)

where f̊ (0) is equal in distribution to f̊ (∞). We obtain (34) by taking the limit of the equality

E f̊ n (0)h = E f̊ n (t)h, (35)

where f̊ n (0) is equal in distribution to f̊ n (∞); (35) clearly holds for all n. Clearly, E f̊ n (0)h → E f̊ (0)h. To demonstrate

E f̊ n (t)h → E f̊ (t)h, (36)

we can use the Skorohod representation, so that the convergence f̊ n (0) → f̊ (0) is a.s. For any deterministic converging sequence f̊ n (0) → f̊ (0) we have, by Theorem 3(ii), f̊ n (t) ⇒ f̊ (t) (which, the limit being deterministic, is equivalent to convergence of f̊ n (t) to f̊ (t) in probability), and then E f̊ n (t)h → E f̊ (t)h. Thus, we obtain (36), and then (34). ✷

Discussion
The main results of this paper, in a sense, complete the "program" represented by the previous work [5,6,20] on the specific model of this paper. Paper [5], informally speaking, proves the convergence to a deterministic mean-field model as n → ∞. (In this paper we generalize that result to a more general model, without the finite-dependence assumption.) Paper [6], informally speaking, proves that if a traveling wave exists, then each mean-field model trajectory is attracted to that traveling wave as t → ∞; paper [20] shows that a traveling wave does exist under very general assumptions. This paper proves that the convergence to the traveling wave also holds if we "interchange the limits": first consider the stationary distribution (take the limit t → ∞) and then take the n → ∞ limit of the stationary distributions; if we take the limits in this order, the limit is the same: a traveling wave. Thus, the results of this "program" answer essentially "all" questions about the behavior of the system when n is large, both about its transient behavior and about its stationary distribution.
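To make the behavior described by this "program" concrete, here is a toy simulation of the n-particle system itself. All specifics are illustrative choices, not mandated by the paper: jump urges are discretized with an Euler time step, an urge at location quantile ν leads to an actual jump with probability η(ν) = 1 − ν, and jump sizes are Exp(1). With average acceptance probability ∫ 0 1 η(ν)dν = 1/2 and EZ = 1, the empirical mean should advance at a speed of roughly 1/2, while the re-centered state settles near a fixed (traveling-wave) profile.

```python
import random

def simulate(n=300, steps=2000, dt=0.05, seed=7):
    """Toy Euler-type simulation of the n-particle system.

    Per step, each particle gets a jump urge w.p. dt; an urge received at
    location quantile nu results in an actual jump w.p. eta(nu) = 1 - nu,
    with an Exp(1) jump size.  (Illustrative choices, not the paper's
    general assumptions.)"""
    random.seed(seed)
    x = [0.0] * n
    for _ in range(steps):
        order = sorted(range(n), key=lambda i: x[i])
        nu = [0.0] * n
        for rank, i in enumerate(order):
            nu[i] = (rank + 1) / n               # location quantile of particle i
        for i in range(n):
            # urge prob dt, accepted w.p. eta(nu) = 1 - nu (leaders jump rarely)
            if random.random() < dt * (1.0 - nu[i]):
                x[i] += random.expovariate(1.0)  # jump size Z ~ Exp(1)
    return x

x = simulate()                        # total simulated time: 2000 * 0.05 = 100
mean = sum(x) / len(x)                # should be near 50 under these choices
centered = [xi - mean for xi in x]    # re-centered state, empirical mean zero
```

Plotting the sorted `centered` values at several late times (not done here) shows the re-centered empirical distribution stabilizing, which is the traveling-wave phenomenon the paper establishes rigorously.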