Analysis of Large Urn Models with Local Mean-Field Interactions

The stochastic models investigated in this paper describe the evolution of a set of $F_N$ identical balls scattered into $N$ urns connected by an underlying symmetrical graph with constant degree $h_N$. After some random amount of time {\em all the balls} of any urn are redistributed locally, among the $h_N$ urns of its neighborhood. The allocation of balls is done at random according to a set of weights which depend on the state of the system. The main original features of this context is that the cardinality $h_N$ of the range of interaction is not necessarily linear with respect to $N$ as in a classical mean-field context and, also, that the number of simultaneous jumps of the process is not bounded due to the redistribution of all balls of an urn at the same time. The approach relies on the analysis of the evolution of the local empirical distributions associated to the state of urns located in the neighborhood of a given urn. Under convenient conditions, by taking an appropriate Wasserstein distance and by establishing several technical estimates for local empirical distributions, we are able to prove mean-field convergence results. When the load per node goes to infinity, a convergence result for the invariant distribution of the associated McKean-Vlasov process is obtained for several allocation policies. For the class of power of $d$ choices policies, we show that the associated invariant measure has an asymptotic finite support property under this regime. This result differs somewhat from the classical double exponential decay property usually encountered in the literature for power of $d$ choices policies.


Introduction
The stochastic models investigated in this paper describe the evolution of a set of N urns indexed by i∈{1, . . . , N } with F N identical balls. There is an underlying deterministic symmetrical graph structure connecting the urns. The system evolves as follows. After some exponentially distributed amount of time with mean 1 all the balls of an urn with index i∈{1, . . . , N } say, are redistributed among a subset H N (i) of urns in the neighborhood of urn i. An important feature of the model is that the allocation of the balls into urns of H N (i) is done at random according to a probability vector depending on the number of balls in the urns.
Quite general allocation schemes are investigated but two policies stand out because of their importance. Their specific equilibrium properties are analyzed in detail in the paper. We describe them quickly. Assume that a ball of urn i has to be allocated.
The ball is allocated uniformly at random in one of the neighboring urns, i.e. in one of the urns whose index is in H N (i), this occurs with probability 1/Card(H N (i)). (

2) Power of d-Choices
For this scheme, a subset of d urns whose indices are in H N (i) is taken at random, the ball is allocated to the urn having the least number of balls among these d urns. Ties are broken with coin tossing. Under some weak symmetry assumption and supposing that the state of the system can be represented by an irreducible finite state Markov process, at equilibrium the average number of balls per urn is F N /N whatever the allocation procedure is. The main problem considered in this paper concerns the distribution of the number of balls in a given urn when the number of urns is large. It may be expected that if the algorithm used to perform allocation is chosen conveniently, then there are few heavily loaded urns, i.e. the probability of such event should be significantly small. An additional desirable feature of such an algorithm is that only a limited information should be required for the allocation of the balls. In our case this will be the knowledge of the occupancy of few urns among the neighboring urns.  [8], Godrèche and Luck [11] and Evans and Hanney [10] for recent overviews. Urn models have been used in statistical mechanics as toy models to understand conceptual problems related to non-equilibrium properties of particle systems. As a model they describe the evolution a set of particles moving from one urn to another one. Ehrenfest [8] is one of the most famous models: in continuous time models, each particle moves at random to another urn after an exponentially distributed amount of time. There may also be underlying structure for the urns, a ball can be allocated to the connected urns at random according to some weights on the corresponding edges or, via a metropolis algorithm, associated to some energy functional. Due to their importance and simplicity, these stochastic models have been thoroughly investigated over the years: reversibility properties, precise estimates of fluctuations, hitting times, convergence rate to equilibrium, . . . See, for example Diaconis [7]. For these models balls are moved one by one. Closely related to these models is the zero range process for which the corresponding rate is f (x) for some general function f on N. When f (·) is constant the jumps are in fact associated with the urns instead of the balls, as it is our case. See Evans and Hanney [10].
These problems are also considered in theoretical computer science to evaluate the efficiency of algorithms to allocate tasks to processors in a large scale computing network for example. See Karthik et al. [15], Maguluri et al. [22] and Sun et al. [25]. A classical problem in this context is the assignments of N balls into N urns. Balls are assumed to be allocated one by one. The constraints in this setting are of minimizing the maximum of the number of balls in the urns with a reduced information on the state of the whole system. When balls are distributed randomly, it has been proved that the maximum number of balls in an urn is, with high probability, of the order of log N/ log log N . See Kolchin et al. [17]. By using an algorithm of the type power of d-choices as described above, Azar et al. [2] has shown that the maximum is of the order of log log N/ log d. Hence, with a limited information on the system, only the state of d urns is required, the improvement over the random policy is striking. See also Mitzenmacher [23].
A related problem is of assigning the jobs to the queues of N processing units working at rate 1. The jobs are assumed to be arriving according to a Poisson process with rate λN . When the natural stability condition λ<1 holds, if the jobs are allocated at random, then, at equilibrium, it is easily shown that the tail distribution of the number of jobs at a given unit decreases exponentially. If the allocation follows a power of d choices, Vvedenskaya et al. [27] has shown that the corresponding tail distribution has a double exponential decay, i.e. of the type x→ exp(−α 1 exp(α 2 x)) for some positive constants α 1 and α 2 .
The initial motivation of this work is coming from a collaboration with computer scientists to study the efficiency of duplication algorithms in the context of a large distributed system. The urns are the disks of a set of servers and the balls are copies of files on these disks. The redistribution of balls of an urn correspond to a disk crash, in this case, the copies of its lost files are retrieved on other neighboring disks. See Sun et al. [25] for more details on the modeling aspects.

Results.
Mean-Field Convergence. For the stochastic models investigated in this paper there are N urns and a total of F N balls with F N ∼βN for some β>0. The underlying graph is symmetrical with degree h N =Card(H N (i)), 1≤i≤N . The sequence (h N ) is assumed to converge to infinity. The state of the system is represented by a vector =( i , 1≤i≤N ) describing the number of balls in the urns. The main mathematical difficulties in the analysis of the stochastic model have two sources: (1) Multiple Simultaneous Jumps.
When the exponential clock associated to urn i rings, then all its balls are redistributed to other urns of the system. For this reason, there are i ∈{1, . . . , F N } jumps occurring simultaneously.
be the local empirical distribution around urn i, where δ a is the Dirac mass at a∈N. When urn i has to allocate a ball, it is sent to one of h N neighboring urns, j∈H N (i), with a probability of the order of Ψ( This interaction with only local neighborhoods and the unbounded number of simultaneous jumps are at the origin of the main technical difficulties to establish a mean-field convergence theorem. See the quite intricate evolution equation (26) for these local empirical distributions below. More specifically, this is, partially, due to a factor i /h N which has to be controlled in several integrands of the evolution equations. This is where unbounded jumps play a role. See Relation (37) for example. A convenient Wasserstein distance (19) is introduced for this reason.
It should be noted a classical mean-field analysis can be achieved when the sequences (h N ) has a linear growth, this is in fact close to a classical mean-field framework with full interaction. In this case the term i /h N is not anymore a problem since i is bounded by F N ∼βN , The same is true if the simultaneous jumps feature is removed by assuming for example that only a ball is transferred for each event. In this case a mean-field result can be established with standard methods for the model with neighborhoods.
Literature. For an introduction to the classical mean-field approach, see Sznitman [26]. Specific results for power of d choices policies when h N =N are presented in Graham [12], see also Luczak and McDiarmid [21]. Budhiraja et al. [3] considers also such a local interaction for a queueing network with an underlying graph which is, possibly, random, with jumps of size 1. In this setting when the graph is deterministic, the mean-field analysis can be carried out by using standard arguments. Andreis et al. [1] investigates mean-field of jump processes associated to neural networks with a large number of simultaneous jumps with an infinitesimal amplitude. Luçon and Stannat [20] establishes mean-field results in a diffusion context when the mean-field interaction involves particles in a box whose size is linear with respect to N and a spatial component with possible singularities. A related setting is also considered in Müller [24].
Asymptotic Finite Support Property. The mean field results show that, for a fixed asymptotic load β per urn and under appropriate conditions, the evolution of the state of the occupancy of a given urn is converging in distribution to some nonlinear Markov process (L β (t)). A probability distribution π on N will be said to be invariant for this process when the following property holds: if the distribution of the initial value L β (0) is π then, for any T >0, the processes (L β (t+T )) and (L β (t)) have the same distribution. In particular the distribution of L β (t) is constant for all t≥0.
We show that, for all classes of allocations considered in this paper, when it exists, an invariant distribution of (L β (t)/β) has at a tail which is upper-bounded by an exponentially decreasing function. For the random algorithm this is an exact exponentially decreasing tail distribution in fact. This implies that a small, but significant, fraction of urns will have an arbitrarily large load. For this reason the performances of random algorithms are weak in terms of occupancy of urns. Recall that these algorithms do not use any information to allocate balls into urns, the question is that if some minimal information is used, can we improve the order of magnitude of the load of a given urn?
For the power of d choices algorithm, we show that, when the average load β is large, the invariant distribution of (L β (t)/β) is converging in distribution to a distribution with a support in the compact interval [0, d/(d−1)]. When d=2, this is a uniform distribution on [0, 2]. The striking feature is that, for an average load of β per urn, at equilibrium, the occupancy of a given urn is at most βd/(d−1) with probability arbitrarily close to 1. In some way this can be seen as the equivalent of the double exponential decay property of Vvedenskaya et al. [27] in this context. Note that, contrary to the model analyzed in Vvedenskaya et al. [27], we do not have an explicit expression for the invariant distribution of (L β (t)). It should be noted that this is an asymptotic picture for N large and also β large. Experiments seem to show nevertheless that, in practice, this is an accurate description. See for example Figure 2 of Sun et al. [25] where d=2, N =200 and β=150 and the uniform distribution on [0, 300] is quite neat. Explicit bounds on error terms would be of interest but seem to be out of reach for the moment.

1.3.
Outline of the Paper. The stochastic model is presented in Section 2. Section 3 introduces the main evolution equations for the local empirical distributions and establishes the existence and uniqueness properties of the corresponding asymptotic McKean-Vlasov process. The main convergence results are proved in Section 4 via several technical estimates. Section 5 is devoted to the analysis of the invariant distribution of the McKean-Vlasov process. The finite support property is proved in this section.
Acknowledgments. The paper has benefited from several useful remarks from two anonymous reviewers. The authors are really grateful for the work they have done on the first version of the paper.

The Stochastic Model
We give a precise mathematical description of our system with N urns and F N balls with the following scaling assumption for some β>0. An index N will be added on the important quantities describing the system to stress the dependence on N but not systematically to make the various equations involved more readable.
In this section, we describe the topology of the class of graphs considered and the allocation policies which are investigated. Finally, the mathematical tools used to represent and analyze the evolution of the state of the system, see Section 3, are introduced.
2.1. Graph Structure. We first describe the topology of the system, i.e. the set of links between the N urns. The urns connected to a given urn i determine the set of possible destinations when the balls of node i are redistributed.
The N nodes are labeled from 1 to N and a set H N ⊂{2, . . . , N } which is assumed to satisfy the following property of symmetry with the convention used throughout the paper that (0 mod N ) is N .
The set H N is the set of neighbors of node 1. The set H N (i) of neighbors of a node i∈{1, . . . , N } is defined by translation in the following way In particular H N (1)=H N , one denotes by h N the cardinality of H N . Relation (3) gives the property that the associated graph is symmetrical that is, if i∈H N (j) then j∈H N (i). We give some simple situations of graphs satisfying these assumptions.
Examples.  We have chosen that 1 ∈H N and, consequently, i ∈H N (i). For the allocation process it expresses the fact that a ball of a given urn cannot be re-allocated to this urn when the ball is redistributed. Our results in the following do not need this assumption in fact. It just simplifies some steps of the proofs, to compute the previsible increasing processes of some martingales in particular.

Assumption [T] (Topology).
(1) The sequence of the degrees of nodes (h N ) def.
(2) The interaction set of node i, 1≤i≤N , is defined as and its cardinality satisfies The set A N (i) is the set of nodes which interact with a node in the neighborhood of i. Technically, this subset appears naturally in the evolution equation (26) of the process (Λ N i (t)), the local empirical distribution at a given node i∈{1, . . . , N } defined by Relation (1). The condition on its cardinality is used in Lemma 3.4 to prove that the martingale part of these equations is asymptotically negligible. See also Relation (34).
Assumption [T] is clearly satisfied for the examples described above, note that in these cases Card(A N (1))≤Ch N , for some constant C.

Allocation Algorithms.
A set of F N balls are scattered in the nodes, the state of the system is thus described by a vector =( i ) ∈ S N , with for i∈{1, . . . , N }, i is the number of balls in urn i. For each ∈S N , one associates a probability vector P N ( )=(p N i ( ), 1≤i≤N ) with support on H N , i.e. p N i =0 if i ∈H N . The vector P N ( ) is in fact the set of weights associated to urn 1 in state and p N i ( ) is the probability that a given ball of urn 1 is allocated to urn i. As for the topology, we define the vector of weights for the other urns by translation.
For i∈{1, . . . , N } a probability vector P N i ( )=(p N ij ( )) with support on H N (i) is defined by (5) p N ij ( )=p N a (y), with a=(j−i+1 mod N ) and y=( (i+k−1 modN ) , 1≤k≤N ), for any j∈{1, . . . , N }. In particular P N 1 ( )=P N ( ) and P N i (·) is the vector P N (·) "centered" around node i. The quantity p N ij ( ) is the probability that, in state , a ball of urn i is allocated to urn j.
The dynamics are as follows: After an exponentially distributed amount of time with mean 1, the balls of an urn i, 1≤i≤N , are distributed into the neighboring urns in the subset H N (i) one by one, according to some policy depending on the state =( j ) of the system just before the event. Each of the i balls of the ith urn is distributed on H N (i) according to the probability vector P i ( ) and the corresponding i choices are assumed to be independent. As it will be seen our asymptotic results hold under general assumptions, the following cases will be discussed due to their practical importance. Examples.
Balls of a given urn i∈{1, . . . , N } are sent uniformly at random into an urn with index in H N (i). This is the simplest policy which is used in large distributed systems. This corresponds to the case where (2) Random Weighted Algorithm. Each ball is sent into urn j∈H N (i) with a probability proportional to W ( j ), that is where W is some function on N with values in (c, C) with 0<c<C.
For each ball, d urns are chosen at random in H N (i), the ball is allocated to the urn of this subset having the minimum number of balls. Ties are broken with coin tossing. The probability of assigning a ball to an urn with at least m balls, m∈N, is therefore with the convention that n d =0 if n<d. Hence, by taking into account the ties, for j∈H N , since p N j ( ) is the probability of assigning a ball to a given urn of H with j balls.
For simplicity we have chosen to consider only the state of the system just before the jump for all the balls which have to be moved. The state of the system could be also updated after each ball allocation and, consequently the vector p N i (·), the dynamical description would then be more intricate to express.
It should be noted that the exponential clock associated to the ith urn gives simultaneous jumps of j coordinates with high probability if N is large. In particular the magnitude of a possible jump is not bounded which leads to significant technical complications to prove a mean-field result as it will be seen.
We now introduce the main assumption on the allocation policies investigated. It describes the asymptotic behavior of the probability vector (P N (·)) when N is large with a key functional Ψ which is playing an important role in the subsequent analysis. The non-linear part of the dynamic of the associated McKean-Vlasov process is essentially expressed with this functional. See Relation (28) of Section 3 for example.
The set M 1 (N) is the space of probability distributions on N and · tv is the total variation norm defined by, if µ 1 , µ 2 ∈M 1 (N) where [µ 1 , µ 2 ] is the set of couplings Q associated to µ 1 and µ 2 , elements Q∈M 1 (N 2 ) such that Q(dx, N)=µ 1 and Q(N, dy)=µ 2 . See Proposition 4.7 of Levin et al. [19]. Relation (10) is a conservation of mass condition, all balls are reallocated. As it will be seen in the investigation of mean-field convergence, the specific nonlinear term in the limiting dynamical system is expressed with the functional Ψ, Condition (11) is just a classical Lipschitz condition as it is quite common in such a setting.

Random Weighted Algorithm. Assumption [A] is satisfied with the function Ψ given by
with the convention that 0/0=0.

Stochastic Representation of the Dynamics of Allocation.
In order to use a convenient stochastic calculus to study these allocation algorithms, one has to introduce marked Poisson processes. See Chapter 5 of Kingman [16] for an introduction on marked Poisson point processes.
We define the space of marks M=[0, 1] N , a mark u=(u k )∈M associated to an urn i∈{1, . . . , N } describes how the balls of this urn are allocated in the system: If the state is =( j ), assuming that the balls are indexed by 1≤k≤ i , the kth ball is allocated to urn j∈{1, . . . , N } if if u k is a uniform random variable on [0, 1], this occurs with probability p N ij ( ). For i, j∈{1, . . . , N }, we introduce the family of mappings Z ij on M×S N defined by, for u∈M and ∈S N , the quantity Z ij (u, ) is the number of balls of urn i which are allocated to urn j if the ith urn is emptied when the system is in state and with mark u. If U =(U k ) is an i.i.d. sequence of uniform random variables on [0, 1] then, clearly, for i∈{1, . . . , N } and j ∈ H N (i), and ∈S N , where B i ( ) is a multinomial distribution with parameters i and p N i1 ( ), . . . , p N iN ( ), in particular is the number of points of N in A. We denote by N the point process on R + defined by the first coordinates of the points of N , i.e. N (dt)=N (dt, M), N is a Poisson process on R + with rate 1. We denote by N i , i∈N, i.i.d. marked Poisson point processes with the same distribution as N .
The martingale property mentioned in the following is associated to the filtration (F t ) defined as follows, F 0 is the σ-field associated to the initial state of the urn process and, for t>0, We recall an elementary result concerning the martingales associated to marked Poisson point processes. It is used throughout the paper. See Section 4.5 of Jacobsen [14], see also Last and Brandt [18] for more details.
is a square integrable martingale with respect to the filtration (F t ), its previsible increasing process is given by We conclude this section with some notations which will be used throughout the paper.
2.4. Wasserstein Distances. Throughout the paper M 1 (X) denotes the set of probability distributions on the set X. If µ∈M 1 (N) and f :N → R, provided that the latter term is well defined. The function I on N is the identity function, The space M 1 (N) endowed with the total variation norm is a separable Banach space. For p>0, we define the Wasserstein distance, where the set [σ 1 , σ 2 ] of couplings of σ 1 and σ 2 is defined below Relation (12).
the distance associated to the topology of D T def.
= D([0, T ], M 1 (N)) as defined in Section 3.5 of Ethier and Kurtz [9], in this way (D T , d T ) is a complete separable metric space.
We introduce the Wasserstein metric W T on the corresponding stochastic process, on M 1 (D T ), the space of càdlàg processes with values in M 1 (N). Let P 1 and where, for a, b∈R, a∧b= min(a, b and [P 1 , P 2 ] is, similarly to Definition (12), the set of couplings associated to distributions P 1 and P 2 , i.e. Q∈[P 1 , P 2 ] is an element of M 1 (D 2 T ) with marginals are P 1 and P 2 respectively. The metric space (M 1 (D T ), W T ) is complete and separable. See Section 3 of Dawson [6] for a more specific presentation of measure-valued stochastic processes.
We will use the stronger Wasserstein distance W T to establish our convergence results,

Evolution Equations
In this section, we introduce the stochastic processes used to represent the evolution of the state of the system as well as the stochastic differential equations (SDE) they satisfy.
The state of the system at time t is denoted by a càdlàg process (L N (t)), with is the number of balls in urn i at time t. One defines the local empirical distribution at i, 1≤i≤N , at time t≥0 by (20) Λ the global empirical distribution is, classically, The process (L N i (t)) can be represented as the solution of the following SDE, for 1≤i≤N , (22) dL where f (t−) denotes the left limit of the function f at t>0. For i∈{1, . . . , N }, the points of the process N i (dt) correspond to the instants when the ith urn is emptied. Recall the notation N i (dt)=N i (dt, M). If time t is one of these instants, due to the uniform distribution assumption of the variables (U · k ), Relation (13) gives that, conditionally on F t− , a ball from urn i is allocated to urn j with probability p N ij (L N (t−)). This shows that the solution of Equation (22) does represent our allocation process of balls into the urns.
The distribution of the vector L N (0) def.
(2) The local empirical distribution of the initial state converges in distribution, for the total variation distance, to a random variable Π 0 ∈M 1 (N), i.e. to a random probability distribution on N, and such that P( Π(0), I =β)=1. (3) There exists some η>0 such that Relation (23) implies that, for 1≤i≤N , L N i (0) has the same distribution as L N i+1 (0), and therefore they are identically distributed. Definition (3) of the topology of the graph by translation of the set of neighboring nodes gives that the local empirical distribution Λ N i (0) at node i has the same distribution as Λ N 1 (0). The dynamics of the evolution, see Equation (22), are also invariant (in distribution) under translation, consequently, for t≥0, the variable Evolution Equations for Local Empirical Distributions. Recall that, for any function f with finite support on N, The SDE (22) can then be rewritten in the following way, where the variables (Z jk (·, ·)) are defined by Relation (14) and A N (i) by Relation (4).

A Heuristic Asymptotic Description.
We first present an informal, hopefully intuitive, motivation for the asymptotic SDE satisfied by the time evolution of the number of balls in a given urn. It should be noted that we will not establish our mean-field result in the same way. The method can be used in a simpler setting, see Sun et al. [25]. It does not seem to be possible for our current model. We assume for this section that the distribution of the Π 0 of Relation (24) is a Dirac measure at π 0 ∈M 1 (N). The integration of Equation (22) and the use of Proposition 1 lead to the relation where (C N i (t)) is the process associated to the interaction of the nodes in the neighborhood of i, As it will be seen, under Condition [A] of Section 2.2, with high probability the process (Z ji (u, L N (s−)), u∈M, s≤t) is either 0 or 1 and, consequently, (C N i (t)) is a counting process. This comes essentially from the fact that, on a bounded time interval, when the balls of an urn are re-distributed, the probability of having two balls assigned to the same urn of a given neighborhood will be of the order of 1/h N . Additionally, the process is the compensator of (C N i (t)). See Example 4.3.4 p. 56 of Jacobsen [14]. With the definition of (p N ji ( )), Relation (9)

of Condition [A] gives the equivalence
) with high probability on finite time intervals, then Assuming that the local empirical measures (Λ N j (t)) [resp. the processes (L N j (t))] are converging in distribution to a continuous deterministic process (Λ(t)) [resp. to a process (L(t))]. In particular (Λ(t)(f ))=(E(f (L(t)))) and, due to the fact that L N (t)∈S N and the scaling assumption (2), where I(x)=x is the identity. Under this hypothesis, one would have the equivalence in distribution for the compensator of (C N i (t)), This suggest that the sequence of counting processes (C N i (t)) is converging in distribution to a counting process (C(t)) given by (C(t)) = where P is an homogeneous Poisson point process on R 2 + .
In view of Relation (27), a possible limit (L N 1 (t)) of (L N 1 (t)) when N is large should satisfy the following SDE We first establish the existence and uniqueness of such a process. The proof of the convergence in distribution of (L N i (t)), 1≤i≤N , to this asymptotic process is achieved in the next section. = π(0)∈M 1 (N) and such that the SDE holds, where, for t>0, π(t) is the distribution of L(t) on N and P [resp. P] is an homogeneous Poisson point process on R 2 + [resp. R + ] with rate 1 and the random processes P and P are independent .
This will be referred to as the McKean-Vlasov process associated to this model. The associated process (π(t)) is a continuous M 1 (N)-valued function, for any function f with finite support on N, π(t), f = E(f (L(t))) and the equation holds where, for σ∈M 1 (N), the generator Ω σ is defined by for x∈N.
To prove the mean-field convergence under Condition [I], it is convenient to introduce the random process on M 1 (N ) satisfying the integral equation Recall that Π(0) is the asymptotic local empirical distribution of a node at time 0, it is a priori random. Its distribution is therefore an element of M 1 (M 1 (N)).
Proof of Theorem 1. The result is a direct consequence of Theorem 2.1 of Graham [13]. All we have to prove is that there exists a constant C 0 such that, for all σ, σ ∈M 1 (N) and , ∈N, the relation holds, where W 1 is the Wasserstein distance on M 1 (N) defined by Relation (17). This result can also be expressed as π(t), I =β for all t≥0. Recall the I is the identity function on N, I(x)=x for x∈N.
Proof. By integrating Equation (28) and by using the fact that Ψ is bounded by Assumption [A], we get that L(t) is integrable for all t≥0 and Relation (10) of Assumption [A] gives that E(Ψ(π(s), L(s)))=1, for all s≥0. It is then easy to conclude that (E(L(t))) is constant equal to β.
We conclude the section with two technical results which will be used in the proof of the mean-field convergence. The first one gives the existence of an exponential moment for the McKean-Vlasov process, this is in fact a key ingredient in the proof of Theorem 2 for the convergence in distribution of the local empirical distributions. Proof. With a convenient probability space, for π∈M 1 (N), we denote by (L π (t)) the solution of the SDE (28) with initial distribution π. From the SDE, we get the following inequality, for all t≤T , By applying this estimate we get, for η>0, almost surely, for the distribution of We conclude by integrating the inequality with respect to the distribution of Π(0).
The following simple lemma is used in the proof of Proposition 5 to derive a Grönwall-like inequality for the distance between the empirical distribution and the McKean-Vlasov process.
holds for any σ, σ ∈M 1 (N) such that f is integrable with respect to σ and σ .
Proof. This is a simple consequence of the inequality and Relation (11).

We establish the main convergence results, namely that under Assumptions [T], [A] and [I]
, the process of the local empirical distribution at a node is converging in distribution to the solution of the Fokker-Planck equation (30). It is also shown that the same result holds for the (global) empirical distribution. The important result of this section is the following theorem. In particular the process (Λ N i (t)) converges to (Π(t)) for the Wasserstein metric W T defined by Relation (19).

The statement of Assumption [T] is in Section 2.1, that of [A] is in Section 2.2 and for Assumption [I] it is in Section 3.
The proof of this result is quite technical. The second term within the expected value of Relation (31) is related to a uniform L 1 -convergence on finite time intervals of the local mean on the neighboring nodes of node i. This term plays an important role in the convergence result as it will be seen. When Π(0) of Assumption [I] is deterministic, the process (Π(t)) is deterministic and from Proposition 2 of Section 3, we have Π(t), I =β, for all t≥0. The theorem shows in particular that the limit of the process of the local average of the load in a given neighborhood is constant and equal to β. The total average, i.e. when all nodes are considered, is deterministic and equal to F N /N , its convergence is just the scaling assumption (2).
The general strategy to establish the mean-field convergence is to use the system of evolution equations (26) and decompose it in a convenient way with the help of stochastic calculus for Poisson process and with various estimates. This is done by proving successively technical results: Lemma 3.2 for the boundedness of the second moments of the number of balls in an urn, Lemma 3.3 to show that an urn receives at most one ball at each event on any finite time interval, and Lemma 3.4 to prove that the martingales vanish in the limit. A Grönwall's Inequality for the distance to the McKean-Vlasov process is established with Propositions 4 and 5. The theorem is then proved. The mean-field convergence, i.e. the convergence in distribution of the empirical distribution, is established in Theorem 3.
We begin by recalling and introducing some notations which will be used throughout this section.
-As before, if f is some function on N, for y∈N, ∇ y is the discrete gradient operator defined by, for x∈N, def.
-The set of 1-Lipschitz functions on N is denoted as Lip (1), and, for K≥0, = x1 {x≥K} and, as before, Relation (26) is rewritten by compensating the stochastic integrals with respect to the Poisson processes then, by using Proposition 1, one gets that, for a bounded function f on N and t≥0, B kj (·) is the binomial distribution defined by Relation (16) and (M N f,i (t)) is a martingale whose previsible increasing process ( M N f,i (t) ) is given, via some simple calculations, by where A N (i) is defined by Relation (4) and B j (·) is the multinomial distribution defined by Relation (15). We introduce the potential asymptotic process of Theorem 1 in this relation. A careful (and somewhat cumbersome) rewriting of Relation (33) gives the identity where Ω . is the operator defined by Relation (29) and I is the identity function. The other terms are where (Π(s)) is defined by Equation (30), Note that the almost sure relation Π(s), I =β, s≥0 has been used in this derivation. The term L N k (s)/h N in the expression of (Y N f,i (t)) is the main source of difficulty to prove the mean-field convergence. When the sequence (h N ) grows linearly with N then, since |L N k |≤F N ∼βN , this term is bounded and the usual contraction methods, via Grönwall's Inequality, can be used without too much difficulty. A more careful approach has to be considered if the growth of (h N ) is sublinear. Finally, where (B kj ( )) are the binomial distributions defined by Relation (16). We first consider the last four terms of Relation (35) via three technical lemmas.

Lemma 3.2. Under Assumptions [A] and [I] of Section 3, for any
Proof. Assumption [I] shows that the quantity is finite. From Relation (9) and the boundedness of the functional Ψ, we get the existence of a constant C 0 and N 0 ∈N, such that, for N ≥N 0 , the relation holds. Proposition 1 and Relation (22) give the identity For s≥0, by using Relation (39) and the symmetry of the model, we get by Cauchy-Shwartz's Inequality. Similarly, by using the expression of the second moment of a binomial variable, By plugging these estimates in Equation (40), we obtain the following inequality, for all N ≥1, . A straightforward use of Grönwall's Inequality gives then directly the estimation since the constants q 0 and C 0 do not depend on N . The lemma is proved.
then the sequence ( Z N T ) converges to 0. Proof. We denote by δ N 1 (f, t), δ N 2 (f, t) and δ N 3 (f, t) the three terms of the right hand side of Definition (38) of (Z N f,i (t)). Let, for l∈N, B(l) be binomial distribution with parameter l and p 0 def.

By using this inequality and the fact that if
From Lemma 3.2 we deduce that the right hand side of the relation is converging to 0 as N gets large. A similar argument can also be used for the term (δ N 2 (f, t)). For the last term of (Z N f,i (t)) Relation (9) of Assumption [A] shows that this term is converging to 0 when N goes to infinity. The lemma is proved.

Lemma 3.4. Under Assumptions [T] of Section 2.1 and [I] of Section 3, the relation
holds for any T ≥0, where, for f ∈Lip (1), (M N f,i (t)) is the martingale of SDE (33).
Proof. By using Relation (34), we get the inequality Note that, for ∈S N , the multinomial distribution B j ( ) on N N has the support {z∈N N :z 1 +z 2 + · · · +z N = j }, which gives the relation We conclude the proof by using again Lemma 3.2, Assumption [T] and Doob's Inequality.
Now we can turn to the remaining terms of Relation (35) to establish the convergence results. = E sup and for K>0, where I is the identity function, (X N f,1 (t)) and (Y N f,1 (t)) are defined respectively by Relations (36) and (37), and (Π(t)) by Equation (30).
Proof. The first Inequality is straightforward to derive from Relation (36).
Let f ∈Lip(1) and t≤T , by the Lipschitz property of Relation (11) we get that For s≥0, . We get therefore, by symmetry, that 1 h 2 , and this term is smaller than which gives the desired result.
For the next step to prove the main mean-field result, one has to estimate the deviations of the local mean, this is the next proposition. We define, for t≥0, d N (t) def.
We are going to show that, for T >0, the sequence (d N (T )) is converging to 0. Let

Proposition 5. Under Assumptions [T], [A] and [I]
, for any T >0, there exists a constant C 0 >0 such that the relation holds for all t≤T and K≥1.
Proof. Noting that I∈Lip (1), Relations (30) and (35) give the inequality, for T >0 and t≤T , with the notations of Proposition 4 and Lemma 3.3. From Lemma 3.1 we get the relation, for s≥0, , Ω Π(s) (I) ≤ (2 Ψ ∞ +D Ψ )β Λ N 1 (s)−Π(s) tv + Λ N 1 (s)−Π(s), I . By using Proposition 4 and Lemma 3.3, we get that We now turn to the estimation of d N 1 (t). For f :N→{0, 1} and T >0, by Lemma 3.1, if t≤T , (30) and (35) give then the inequality | By definition of the total variation norm, see Relation (12), we have With the same argument as before, we get Denote by E K ={f =1 F :F ⊂[0, K]}, By taking successively the supremum on all f ∈E K and s≤t for Relation (48), we obtain the inequality, for t≤T , Again the quantity X N t + Y N t is upper bounded with the help of Proposition 4. If we gather the estimates (47), (49), (50) and (51), we obtain that, for T ≥0, there exists a constant C 0 independent of K≥1 and T such that, for any t≤T , Note that, for s≥0, | Λ N 1 (s)−Π(s), I |≤F N /N +β, the mapping s →d N (s) is therefore bounded by a constant. By using Assumption [I], Lemmas 3.3 and 3.4, and by applying Fatou's Lemma, we obtain the relation, The proposition is proved.
We can now prove the main convergence result.
Proof of Theorem 2. The notations of the proof of the previous proposition are used. With Grönwall's Inequality and Relation (46), we get the relation for all t≤T . Proposition 3 gives the existence of some η>0 and C 1 >0 such that By letting K go to infinity, we obtain that d(t)=0 for all t≤t 1 def.
= η/(2C 0 )∧T and therefore the desired convergence on the time interval [0, t 1 ]. In particular if we make a time shift at t 1 , it is easy to check that Assumption [I] of Section 3 are also satisfied for this initial state and we can then repeat the same procedure until the time T is reached. The theorem is proved.
and, for any p≥2, and (n i )∈N p , where Q π0 is the distribution of the of McKean-Vlasov process (L(t)), the solution of the SDE (28) with L(0) dist.
Proof. Let (Λ N (t)) be the process of the empirical distribution associated to the vector (L N i (t), 1≤i≤N ), then, for t≥0, it is easily seen that, because of the structure of the underlying graph, the relation The first claim of the theorem follows directly from Theorem 2 and the fact that the processes (Λ N i (t)), 1≤i≤N , have the same distribution. The last assertion is a simple consequence of the fact that (Π(t)) is in this case a deterministic process and of Proposition 2.2 p. 177 of Sznitman [26].
Convergence Properties of the Invariant Distributions. We conclude this section with a mean-field convergence result for the invariant distributions of evolution equations (30) when the sequence of the sizes (h N ) of the neighborhoods grows at linearly with respect N . Further results, Propositions 6 and 7, will be given in Section 5.
For a fixed N , the irreducible Markov process (L N i (t)) with a finite state space has a unique invariant distribution. Let L N =( L N i ) be a random variable whose distribution is the invariant measure of (L N i (t)), and has the same distribution as the local empirical distribution at node 1 at equilibrium.
then the sequence ( Λ N 1 ) is tight and the distribution of any of its limiting points Q is an invariant distribution for the dynamical system of Relation (30), that is, if Law(Π(0))=Q, then, for all s∈R + , the process (Π(s+t), t≥0) has the same distribution as (Π(t), t≥0).
Proof. We first show that some exponential moments of the variables L N 1 , N ≥1, are bounded. For η>0, the balance equation for the function f ( )= exp(η 1 ), ∈S N , gives the relation, remember that L N 1 ≤F N , where, for ∈S N , By using Relation (39), we have p N i1 ( )≤C 0 /h N , we get then the estimation The assumption of the sequence (h N ) shows that there exists some η 1 >0 such that if η<η 1 then the last term is bounded by D 0 (e η −1) 2 for some constant D 0 ≥0. By using this inequality, we get the relation for some constant D 1 ≥0. If η 0 <η 1 is such that D 1 (e η0 −1)+D 0 (e η0 −1) 2 <1, then we get that the exponential moments of order η 0 of L N 1 are bounded, For K>0, from Lemma 3.2.8 of Dawson [6], we deduce that the sequence of local empirical distributions at equilibrium ( Λ N 1 ) is tight for the convergence in distribution. We take a subsequence (N k ) so that the sequence ( Λ N k 1 ) converge to a random probability distribution on N, Π.
With the same argument as in the proof of Theorem 3, we have that the sequence of global empirical distribution ( Λ N k ) is also converging in distribution to Π. The relation of conservation of mass Λ N k , I =F N k /N k , we thus get that, almost surely Π, I =β. The vector L N k satisfies therefore the conditions of Assumption [I].
We define the process ( L N i (t)) associated to the SDE (22) with the initial point ( L N i ) and ( Λ N 1 (t)) the corresponding local empirical distributions. Theorem 2 shows the convergence in distribution where (Π(t)) is the solution of Equation (30) with initial point Π. The last assertion comes from the fact that for s≥0 and k∈N, by invariance, we have the equality The proposition is proved.
The following section is devoted to the properties of the invariant distributions of the McKean-Vlasov process.

Equilibrium of the McKean-Vlasov Process
Recall that β is the average load of an urn at any time. The purpose of this section is to investigate the properties of the tail distribution of the number of balls in an urn at equilibrium when β get large. Recall that in our asymptotic picture, the time evolution of the number of balls in a given urn can be seen as the solution of the SDE defining the McKean-Vlasov process where P [resp. P] is an homogeneous Poisson point process on R 2 + [resp. R + ] with rate 1.
Suppose the process (L β (t)) starts from equilibrium, i.e. if π β ∈M 1 (N) is the distribution of L β (0) then, for all t≥0, the distribution π(t) of L β (t) is constant and equal to π β . In this case the process (L β (t)) is a classical Markov jump process on N. It is easily checked that any invariant distribution π β satisfies the following fixed point equation F β (π)= (F β (π)(n)) def.
The following proposition shows that, for a large class of functionals Ψ, a potential invariant distribution is always concentrated around values proportional to its average β. We will give more precise results later for power of d-choices algorithms. Additionally, an existence and uniqueness result is proved when the average load per urn is small enough. (1) if, for β>0, the distribution of a random variable Y β is invariant for the McKean-Vlasov process (28), then Y β is stochastically bounded by a geometric distribution with parameter β Ψ ∞ /(1+β Ψ ∞ ), in particular, for any x>0, .
It should be noted that for non-linear Markov processes, Inequalities like (57) are, in general, delicate to obtain, even when there is a unique invariant distribution. See for example, Carillo et al. [5] for non-linear diffusions associated to Langevin evolution equation, or Caputo et al. [4] in a discrete state space setting. In our case a convenient coupling and some calculus give the desired exponential decay of the Wasserstein distance to equilibrium.
Proof. Let π β be the distribution of Y β , If the initial distribution of the McKean-Vlasov process is π β , then (L β (t)) is a simple Markov process which jumps from n to n+1 at rate βΨ(π β , n) and returns at 0 with rate 1. Denote by ( L β (t)) a Markov process on N with the same characteristics except that the jumps from n to n+1 occur at rate β Ψ ∞ . It is easy to construct a coupling such that L β (0)=L β (0)=Y β and that L β (t)≤ L β (t) holds for all t≥0. By assumption, the distribution of L β (t) is constant with respect to t. It is easy to check that the invariant distribution of ( L β (t)) is a geometric distribution with parameter β Ψ ∞ /(1+β Ψ ∞ ). This proves the first part of the proposition.
This gives the Lipschitz property for the mapping F β for the total variation norm, Hence if βD Ψ (1+β Ψ ∞ ) 2 <1, F β is a contracting application for the total variation norm. Hence we get the existence and uniqueness of an invariant distribution. Let σ a and σ b ∈M 1 (N), two probability distributions on N with a second moment, and A and B two integer valued random variables whose distributions are respectively σ a and σ b . We denote (L a (t)) and (L b (t)) the solutions of the McKean-Vlasov equation (28) such that L a (0)=A and L b (0)=B and, for c∈{a, b}, where π c (t) is the distribution of L c (t). Note that for both processes we use the same Poisson processes P and P with intensity 1 on R 2 + and R + respectively. It is not difficult to show that, for all t≥0 and c∈{a, b}, L c (t) has a second moment.
By using the SDE's, we get that where J(s) is the intervals whose end points are β Ψ(π c (s), L c (s−)), c∈{a, b}. By taking the expectation in the last inequality we obtain that, for t≥0, Since the processes are integer-valued, by using that |x|≤x 2 for x∈Z, we have, as in the proof of Theorem 1, for s≥0, π a (s)−π b (s) tv ≤W 1 (π a (s), π b (s)) ≤ W 2 (π a (s), π b (s)) 2 .
and with the fact that the total variation distance is bounded by 1, We thus get that holds. If we choose β so that γ=(1−6βC)/2>0, it gives Grönwall's Inequality gives the relation By taking the minimum on all couplings (A, B) with marginals σ a and σ b , we get finally the inequality Hence if β is sufficiently small, Equation (56) shows that the unique invariant distribution has a second moment. Inequality (57) is then a consequence of the last relation. The proposition is proved.
We can now give a stronger version of Proposition 4 when β is sufficiently small. Recall that L N =( L N i ) is a random variable whose distribution is the invariant measure of (L N i (t)), and Λ N 1 def.
has the same distribution of the local empirical distribution at node 1 at equilibrium.
As it will be seen for specific functionals Ψ much more can be said, in particular the existence and uniqueness of the invariant distribution of the McKean-Vlasov process for all β>0. The rest of this section is devoted to the Random Weighted Algorithm and the Power of d-choices Algorithm, the existence and uniqueness of the invariant measure of the McKean-Vlasov process defined by SDE (28) is established and its limiting behavior when the load β is large is investigated. It is easy to see that this equation has a unique solution γ β which gives the existence and the uniqueness of the invariant distribution in this case.
When (W (k)) is constant equal to w>0, then balls are placed at random in the neighboring urns, independently of their loads in particular. In this case γ β =w and the corresponding invariant measure is the geometric distribution with parameter β/(1+β).
Proposition 8. For the Random Algorithm and any β>0, the geometric distribution with parameter β/(1+β) is the equilibrium distribution of the McKean-Vlasov process defined by SDE (28), if Y β is a random variable with such a distribution, for the convergence in distribution, where E 1 is an exponentially distributed random variable with parameter 1.
It turns out that the random algorithm behaves poorly in terms of the load of a given urn. For a large β, the asymptotic tail distribution of the occupancy of an urn at equilibrium is, as expected, the upper bound (56). The simple consequence of this result is that, if on average, there are β balls per urn, there is a significant fraction of urns with an arbitrarily large number of balls. We will see that the situation is completely different for the power of d-choices algorithm.
Proposition 9. For any β>0, there exists a unique invariant distribution π β of (L β (t)), it is given by The convergence in distribution of the McKean-Vlasov to its invariant measure obtained in Proposition 6 in a general setting is with the assumption that β is sufficiently small. Note that a related convergence (61) for the Power of Choices Algorithm is obtained for all β>0.
The proposition is proved.
The following theorem shows that the power of d-choices policy is efficient in terms of the load of an arbitrary urn, the invariant distribution of this load is asymptotically concentrated on the finite interval [0, d/(d−1)β]. Only an extra capacity β/(d−1) has to be added to the minimal capacity β for any urn in order to handle properly this allocation policy. This has important algorithmic consequences in some contexts. See Sun et al. [25].
Theorem 5 (Power of d-choices). If, for β>0, Y β is a random variable whose distribution is the unique invariant measure of the McKean-Vlasov process (L β (t)) defined by SDE (28) then, for the convergence in distribution, where U is a uniform random variable on [0, 1].
Proof. For k≥1, by summing up Equation (60) for 1 to k−1, one gets the relation = Y β /β, for x>0, Relation (56) shows that, for β 1 sufficiently large the family of random variables Z β , β≥β 1 is tight and that the corresponding second moments are bounded. Let Z be a limiting point when β gets large and h(x) def.
= P(Z≥x), x≥0. The previous relation gives the identity, for x≥ 0, It is easy to see that this relation determines h and therefore gives the desired convergence in distribution. The theorem is proved.