Concentration and local smoothness of the averaging process

We consider the averaging process on the discrete $d$-dimensional torus. On this graph, the process is known to converge to equilibrium on diffusive timescales, not exhibiting cutoff. In this work, we refine this picture in two ways. Firstly, we prove a concentration phenomenon of the averaging process around its mean, occurring on a shorter timescale than that of its relaxation to equilibrium. Secondly, we establish sharp gradient estimates, which capture its fast local smoothness property. This is the first setting in which these two features of the averaging process -- concentration and local smoothness -- can be quantified. These features carry useful information on a number of large-scale properties of the averaging process. As an illustration of this fact, we determine the limit profile of its distance to equilibrium and derive a quantitative hydrodynamic limit for it. Finally, we discuss their implications for cutoff for the binomial splitting process, the particle analogue of the averaging process.


Introduction
The averaging process on a graph is a random evolution of a probability mass function over its vertices. Its dynamics consists of repeatedly iterating the following two operations: (i) first, select an unordered pair of nearest neighbor vertices at the arrival times of i.i.d. Poisson clocks; (ii) then, level out the masses associated to them. Hence, step (i) is random, while step (ii) is deterministic. In particular, each update is performed at i.i.d. space-time locations, and conserves the total mass. Therefore, as time runs, for any mass initialization and provided the underlying graph is connected, the averaging dynamics will converge to a deterministic flat distribution. In view of the degeneracy of the equilibrium and of the intrinsic irreversibility of the averaging dynamics, a comprehensive quantitative picture of such a convergence in relation to the underlying graph structure is, despite the recent advances, still a largely open problem of both applied and theoretical interest.
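To make the dynamics concrete, here is a minimal simulation sketch on the one-dimensional torus (d = 1). The helper names are ours, and we discretize time by drawing the ringing edge uniformly at random, which amounts to observing the rate-one Poisson clocks at their successive arrival times. It illustrates the two properties just described: conservation of total mass and convergence to the flat distribution.

```python
import random

def averaging_step(eta, n):
    # (i) select a uniformly random nearest-neighbor pair {x, x+1}: on the
    # cycle all rate-one clocks are exchangeable, so the next clock to ring
    # is uniform among the n edges
    x = random.randrange(n)
    y = (x + 1) % n
    # (ii) level out the two masses (the deterministic averaging step)
    avg = (eta[x] + eta[y]) / 2.0
    eta[x] = eta[y] = avg

def simulate(n=20, steps=20000, seed=0):
    random.seed(seed)
    eta = [0.0] * n
    eta[0] = 1.0  # Dirac mass at the origin
    for _ in range(steps):
        averaging_step(eta, n)
    return eta

eta = simulate()
print(sum(eta))                           # total mass is conserved
print(max(abs(m - 1 / 20) for m in eta))  # small: eta is nearly flat
```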
The averaging process appears in the computer science literature as a basic model of opinion dynamics (or randomized distributed/gossip algorithms, see, e.g., [BGPS06, Sha09, MSW22] and references therein), together with a long list of variants, e.g., with updates involving only one vertex at a time (see, e.g., [BCN20] and references therein). In this realm, vertices are typically interpreted as agents, their masses as corresponding to their opinions, and the long-run limiting distribution arising from the local random interactions is then referred to as the consensus.
The mathematical interest in the averaging process as an interacting particle system was revived about a decade ago by a series of lectures and expository articles by Aldous [Ald11] and Aldous and Lanoue [AL12]. Since then, the list of works on the subject has grown rapidly (see, e.g., [CDSZ22, Spi22, QS23, MSW22, Cao23, CQS23]). In particular, the seminal paper [AL12] was the first one to link this model to the theory of Markov chain mixing times [AF02, LP17, MT06], and established the first general bounds on the mixing of the averaging process. Only recently, [CDSZ22, QS23, CQS23] provided the first sharp asymptotic mixing results on specific geometries, either proving or excluding the occurrence of the so-called cutoff phenomenon [LP17, Chapter 18].
In this article, we consider the averaging process on large discrete d-dimensional tori. Mixing in this setting is covered by the analysis carried out in [QS23], which shows that the averaging process mixes gradually (i.e., without exhibiting an abrupt cutoff phenomenon) on diffusive timescales. Our aim is to refine this picture by analyzing two new features of the averaging process (an early concentration phenomenon and a fast local smoothness property) and extract from them quantitative information on its mixing and scaling behaviors. After a quick recap on the averaging process and its main properties, we present our findings first informally in Section 1.4, and then in detail in Section 2.

1.1. The averaging process. The averaging process Avg(T^d_N) on the discrete torus T^d_N evolves as follows: each unordered pair {x, y} of nearest neighbor vertices carries an independent rate-one Poisson clock and, whenever the clock of {x, y} rings, the current configuration η is updated to η^{xy}. Here, |x − y| denotes the usual graph distance between vertices x, y ∈ T^d_N, while η^{xy} ∈ P(T^d_N) is obtained from η ∈ P(T^d_N) by updating the masses η(x) and η(y) with their average value: for all x, y and z ∈ T^d_N,

η^{xy}(z) = (η(x) + η(y))/2 if z ∈ {x, y},   η^{xy}(z) = η(z) otherwise.

Henceforth, although the updates occur at both random times and locations, as time runs, the deterministic averages and the conservation of mass lead the (random) distribution to become flat, for any mass initialization. In formulas, this reads as

η_t^ξ −→ π as t → ∞, (1.1)

where (η_t^ξ)_{t≥0} = (η_t^{ξ,N})_{t≥0} denotes a trajectory of Avg(T^d_N) when starting at time t = 0 from ξ ∈ P(T^d_N), π = π_N is the uniform distribution on T^d_N, and the above convergence holds a.s.
with respect to the random sequence of updates. For the probability law corresponding to such random updates, we write P = P_N, while E = E_N stands for the corresponding expectation.

1.2. Mixing for the averaging process. The statement in (1.1) concerns the qualitative long-run behavior of the averaging process on a fixed-size torus. A mixing analysis aims at quantifying such a convergence in the limit as N → ∞, with the scope of better capturing the relevant timescales governing the relaxation of η_t^ξ, when starting from a worst-case initial condition ξ ∈ P(T^d_N). As a sensible notion of distance between the law of η_t^ξ and that of π, we consider the (mean) L^p-distance to equilibrium, given, for all N ∈ N and p ∈ [1, 2], by

E[ ‖ η_t^ξ/π − 1 ‖_p ],  t ≥ 0, (1.2)

and define the corresponding (worst-case) L^p-mixing time as the first time t ≥ 0 for which the above quantity falls below 1/2 for all initial conditions ξ ∈ P(T^d_N). Note that ‖ • ‖_p, p ∈ [1, 2], stands for the usual L^p-norm on (T^d_N, π). Further, we remark that (1.2) may also be interpreted as the L^p-Wasserstein metric between the random η_t^ξ ∈ P(T^d_N) and the deterministic π ∈ P(T^d_N), when endowing P(T^d_N) with the distance induced by ‖ • ‖_p. Note that stronger distances, e.g., total variation, are less relevant in the context of the averaging process; indeed, the equilibrium π is deterministic, while η_t^ξ has a non-degenerate random distribution at any time t > 0, provided that ξ ≠ π.
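For concreteness, the distance in (1.2) is easy to evaluate numerically. The following sketch uses our own helper names and assumes the normalization ‖ψ‖_p^p = N^{−d} Σ_x |ψ(x)|^p, with density (η/π)(x) = N^d η(x) against the uniform π.

```python
def lp_distance(eta, p):
    # ||eta/pi - 1||_p with pi uniform on n sites:
    # ||psi||_p^p = (1/n) * sum_x |psi(x)|^p and (eta/pi)(x) = n * eta(x)
    n = len(eta)
    return (sum(abs(n * m - 1.0) ** p for m in eta) / n) ** (1.0 / p)

n = 8
flat = [1.0 / n] * n                 # the equilibrium pi
dirac = [1.0] + [0.0] * (n - 1)      # a worst-case (Dirac) initial condition

print(lp_distance(flat, 2))   # 0.0: the distance vanishes at equilibrium
print(lp_distance(dirac, 1))  # 2*(1 - 1/n): L^1-distance of a Dirac mass
```

In particular, for p = 1 this distance coincides with twice the total variation distance between the two mass profiles, which is maximal at a Dirac mass.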
Within this setting, a worst-case mixing analysis for Avg(T^d_N) has been carried out in [QS23]. There, the authors show that mixing occurs gradually at times t = Θ(N^2). This result was actually derived for a larger class of graph sequences, referred to as "finite-dimensional" [QS23, Assumption 1], establishing in all these examples that mixing of the averaging process is dictated by that of the corresponding simple random walk on the same graph. Other settings for which sharp asymptotics for the L^p-mixing times, with p = 1, 2, have been established are the hypercube [CQS23], and the complete and complete bipartite graphs [CDSZ22, CQS23]. In these last three examples, cutoff occurs. For a quick overview of these results and of some general tools recently developed for the study of mixing times of the averaging process, we refer the interested reader to [CQS23, Section 2].

1.3. Averaging process as a noisy heat flow. The averaging process may also be viewed as the probability distribution of a random walk (U_t)_{t≥0} in a highly degenerate dynamic random environment, in which the time-dependent conductances between nearest neighbors are either zero, or become "instantaneously infinite" as soon as the Poisson clock associated to the pair rings. At the occurrence of such an event, the walk originally sitting on one of the vertices will find itself with equal probability on either one of the two nearest neighbors.
With this interpretation, while a realization of η_t^ξ ∈ P(T^d_N) coincides with the distribution of U_t for a quenched realization of the random environment, the mean E[η_t^ξ] ∈ P(T^d_N) shall be considered as the corresponding annealed law. As already observed in [AL12, Lemma 1] and exploited in [QS23, CQS23] (see also [Tra23, Section 6.3] for a recent use of this fact), t ↦ E[η_t^ξ] describes the heat flow of RW(T^d_N), the "lazy" continuous-time simple random walk (X_t)_{t≥0} on T^d_N, with X_0 ∼ ξ and infinitesimal generator L^RW = L^RW_N given, for all ψ ∈ R^{T^d_N}, by

L^RW ψ(x) = (1/2) ∑_{y: |x−y|=1} (ψ(y) − ψ(x)),  x ∈ T^d_N. (1.3)

Hence, letting P^RW_ξ denote the law associated to this walk and π_t^ξ its distribution at time t ≥ 0, we obtain

E[η_t^ξ] = π_t^ξ,  t ≥ 0. (1.4)

In other words, (1.4) states that the "noisy" jump process η_t^ξ has the heat flow π_t^ξ as its mean. Although the heat flow does not necessarily identify the correct mixing behavior of the averaging process on all graphs, we will prove that a strong form of concentration of η_t^ξ around π_t^ξ holds on T^d_N.
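The identity (1.4) has a clean one-step discrete analogue which can be verified exactly: averaging η over a uniformly chosen edge has, in expectation, the same effect as one step of a lazy random walk kernel. The sketch below (d = 1, our own helper names, exact rational arithmetic) checks this; the continuous-time statement then follows by Poissonizing the number of steps.

```python
from fractions import Fraction

def edge_average(eta, x, n):
    # the deterministic map eta -> eta^{x, x+1}: level out the pair's masses
    out = list(eta)
    a = (eta[x] + eta[(x + 1) % n]) / 2
    out[x] = a
    out[(x + 1) % n] = a
    return out

def mean_update(eta, n):
    # expectation of a single update, the ringing edge being uniform
    acc = [Fraction(0)] * n
    for x in range(n):
        upd = edge_average(eta, x, n)
        acc = [a + u for a, u in zip(acc, upd)]
    return [a / n for a in acc]

def lazy_rw_step(eta, n):
    # one step of a lazy walk: stay with probability 1 - 1/n, otherwise
    # move to a uniformly chosen neighbor (the annealed one-step kernel)
    return [(1 - Fraction(1, n)) * eta[z]
            + Fraction(1, 2 * n) * (eta[z - 1] + eta[(z + 1) % n])
            for z in range(n)]

n = 6
eta = [Fraction(1)] + [Fraction(0)] * (n - 1)   # Dirac mass at 0
print(mean_update(eta, n) == lazy_rw_step(eta, n))  # True
```

Both maps are linear in η, so the exact equality for a Dirac initial condition extends to arbitrary mass profiles.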
1.4. Summary of main results. Let us provide a preliminary informal description of the two main features of Avg that we investigate in this article, together with some of their applications. In what follows, the standard asymptotic notation "O( • )", "o( • )", "Θ( • )", "≪", etc., refers to the limit N → ∞.
We focus on worst-case initial conditions ξ ∈ P(T^d_N), which, by simple convexity arguments (see, e.g., [CQS23, Section 2.3]), are all Dirac measures: for all t ≥ 0, the supremum over ξ ∈ P(T^d_N) of the distance in (1.2) is attained at ξ = δ_x, for some x ∈ T^d_N. (The analogous identity for RW(T^d_N) is well-known.) Here and all throughout, we write η_t(x, • ) = η_t^{δ_x} and, similarly, π_t(x, • ) = π_t^{δ_x}; by translation invariance, the laws of these objects do not depend on x ∈ T^d_N. Hence, for notational convenience, we consider η_t(0, • ) and π_t(0, • ), i.e., we fix the location of the initial mass as x = 0 ∈ T^d_N all throughout.
1.4.1. Early concentration phenomenon. The idea adopted in [QS23] in order to identify the correct timescale at which the averaging process mixes is, in a nutshell, the following. First, identify the timescale at which RW mixes; then, compare the distance-to-equilibrium of Avg with that of RW, and prove that the resulting error becomes arbitrarily small on that same timescale. More in detail, this strategy starts from two estimates. The first one, the lower bound in [QS23, Lemma 5.1], which is an immediate consequence of Jensen's inequality and (1.4), states that mixing for η_t(0, • ) does not occur before that of π_t(0, • ): for all p ∈ [1, 2] and t ≥ 0,

E[ ‖ η_t(0, • )/π − 1 ‖_p ] ≥ ‖ π_t(0, • )/π − 1 ‖_p . (1.5)

The second one, a slight sharpening of the upper bound in [QS23, §5.2], follows by the triangle inequality for the L^p-Wasserstein metrics in (1.2) and Jensen's inequality: for all p ∈ [1, 2] and t ≥ 0,

E[ ‖ η_t(0, • )/π − 1 ‖_p ] ≤ ‖ π_t(0, • )/π − 1 ‖_p + E[ ‖ (η_t(0, • ) − π_t(0, • ))/π ‖_p ]. (1.6)

In view of these two bounds, if one expects the mixing of η_t(0, • ) to be dictated by that of π_t(0, • ), the key to get matching lower and upper bounds in (1.5)-(1.6) consists in efficiently estimating the second term on the right-hand side of (1.6).
By implementing this strategy for the specific example of T^d_N, since π_t(0, • ) mixes gradually at times t = Θ(N^2), showing

E[ ‖ (η_{t_N}(0, • ) − π_{t_N}(0, • ))/π ‖_p ] −→ 0 for all t_N = Θ(N^2) (1.7)

readily implies that also η_t(0, • ) mixes at times t = Θ(N^2). Proving (1.7) is the main step in the proofs of [QS23, Proposition 2.8 & Theorem 2.9]. Our first main goal is the following improvement of (1.7):

E[ ‖ (η_{t_N}(0, • ) − π_{t_N}(0, • ))/π ‖_p ] −→ 0 for all t_N ≫ N^{2d/(d+2)}. (1.8)

Since N^{2d/(d+2)} ≪ N^2 for any d ≥ 1, (1.8) captures a new timescale, dimension-dependent but always shorter than the diffusive one, after which η_t(0, • ) and its expectation π_t(0, • ) stay close to each other, although both of them are still far from stationarity. For this reason, we refer to such a property of the averaging process as an early concentration phenomenon. For the precise quantitative version of this result, see Theorem 2.4 below.
In view of the bounds (1.5) and (1.6), which hold true on any graph, early concentration of Avg may be considered as a stronger form of mixing whenever Avg and RW mix on the same timescale. On the one hand, this suggests that, just like the mixing behaviors of Avg and RW are comparable on a large class of geometries (see, e.g., [QS23, Proposition 2.8] and [CQS23, Theorem 1.1]), early concentration should not be an exclusive feature of the example of Avg(T^d_N) considered here, and should be investigated in settings in which scaling limits of discrete gradients of the underlying heat flow are available. On the other hand, this automatically excludes early concentration from settings like the complete and complete bipartite graphs [CDSZ22, CQS23], in which mixing of Avg occurs on a strictly longer timescale than that of RW. For what concerns L^2-mixing, this is further explained by the following identity, which resembles the inequality in (1.6) with p = 2: for all t ≥ 0,

E[ ‖ η_t(0, • )/π − 1 ‖_2^2 ] = ‖ π_t(0, • )/π − 1 ‖_2^2 + E[ ‖ (η_t(0, • ) − π_t(0, • ))/π ‖_2^2 ]. (1.9)

The proof of (1.9) combines the definition of the L^2-norm in (1.2) and the identity in (1.4).
A quantitative version of (1.8) bears several useful consequences related to both mixing and scaling limits of Avg(T^d_N). For instance, as a refinement of mixing, our estimate combined with a quantitative local CLT for RW(T^d_N) identifies the limit profile for Avg(T^d_N): for all p ∈ [1, 2] and t > 0,

lim_{N→∞} E[ ‖ η_{tN^2}(0, • )/π − 1 ‖_p ] = ‖ h_t(0, • ) − 1 ‖_{L^p(T^d)} ,

where h_t(0, • ) is the heat kernel of the diffusion on T^d with generator (1/2)∆. We detail this result and its simple proof in Proposition 2.5 below. For related recent results about limit profiles also in the absence of cutoff, see, e.g., [BDCJ22].
Along the same lines, (1.8) with t_N = tN^2 may also be interpreted as a quantitative hydrodynamic limit for Avg(T^d_N) (Corollary 2.6). Here, the heat equation ∂_t ρ = (1/2)∆ρ on T^d arises as a hydrodynamic equation, and convergence is established in the stronger L^p-Wasserstein metrics rather than in probability, as most typically done (see, e.g., the classical monograph [KL99]).

1.4.2. Fast local smoothness. Mixing of the averaging process typically deals with measuring how far the averaging process η_t(0, • ) is from its global equilibrium π. Next to this measure of distance-to-equilibrium, Aldous and Lanoue [AL12, Section 2.3] proposed a natural notion of local smoothness, aiming at quantifying more accurately the local flatness of η_t(0, • ). As intuitively clear, local smoothness should occur at least as fast as global mixing and, as we will show in Theorem 2.4, is intimately related in our context to the phenomenon of early concentration discussed in the previous paragraph. Nevertheless, analyzing local smoothness in general settings turns out to be a delicate issue, and, compared to the range of tools recently developed for the study of global mixing for Avg, techniques and results for local mixing are rather limited (see, e.g., [AL12, Proposition 4] and [BBG20, Corollary 1]).
Following [AL12] and the analogy with the corresponding heat flow, we employ the (mean) Dirichlet form to quantify the local roughness of the averaging process. Indeed, while L^p-distances are natural quantities to measure the distance to equilibrium of π_t(0, • ), the Dirichlet form associated to RW(T^d_N) with generator L^RW (1.3), given by

E(ψ) = (1/2) ∑_{x ∈ T^d_N} π(x) ∑_{y: |x−y|=1} (1/2) (ψ(y) − ψ(x))^2 ,

relates to an infinitesimal, thus local, stage of the convergence of π_t(0, • ), as encoded in

∂_t ‖ π_t(0, • )/π − 1 ‖_2^2 = −2 E(π_t(0, • )/π). (1.10)

In [AL12, Proposition 2], the authors prove an analogous identity for η_t(0, • ):

∂_t E[ ‖ η_t(0, • )/π − 1 ‖_2^2 ] = −E[ E(η_t(0, • )/π) ]. (1.11)

These two relations are, indeed, similar, although differing by a factor 2; this dissimilarity is reflected by different exponential decay rates for the (mean) L^2-distances. Nevertheless, both (1.10) and (1.11) show that quantifying the decay rates of the terms on the right-hand side implies refined mixing results for π_t(0, • ) and η_t(0, • ), respectively. A first simple estimate for the (mean) Dirichlet form along the averaging process may be obtained via a convexity argument as done to derive (1.5). Indeed, Jensen's inequality and the identity (1.4) yield the following lower bound:

E[ E(η_t(0, • )/π) ] ≥ E(π_t(0, • )/π). (1.12)

Our main contribution is to show that, for Avg(T^d_N), the inequality in (1.12) can actually be reversed, at the negligible cost of including some dimension-dependent constant C = C(d) > 0 and a term φ(t/N^d) which diverges only for very large times t ≫ N^d. Ultimately, since the Dirichlet form along the heat flow π_t(0, • ) may be efficiently estimated on T^d_N, this yields a quantitative control for the local smoothness of Avg(T^d_N) for all times t ≥ 0.
This allows us, in particular, to capture both the fast local smoothing of Avg(T^d_N) (on the same timescale at which early concentration in (1.8) occurs), as well as the correct exponential contraction rate for large times. As an immediate consequence, integrating this bound over time yields the corresponding control for the global distance-to-equilibrium (Corollary 2.3). The latter result and (1.13) are both employed in Section 7.
The bound in (1.13) and an improved version of (1.12) are the content of Theorem 2.1, and together represent the first sharp results concerning local smoothness for the averaging process. It is worth emphasizing that an analogous result, although for a different model, has been recently proven by Banerjee and Burdzy in [BB21, Theorem 2.6]. There, the authors study (among other things) the smoothing process on T^d_N, a model discussed by Liggett within the class of linear systems [Lig05, Chapter IX]. Like the averaging process, the smoothing process falls into the class of Markovian mass redistribution models. However, since averages are performed over one vertex at a time, the smoothing dynamics does not conserve mass. This seemingly marginal variation leads to some qualitative differences between the two processes. For instance, equilibrium for the smoothing process is a truly random mass profile, while this is not the case for the averaging process. Moreover, as proved in [BB21, Proposition 2.5], averaging over a single vertex deterministically does not increase the value of E(η_t/π); as simple examples show, this monotonicity property does not hold when performing our edge-averaging dynamics. Nevertheless, despite these dissimilarities, we import some of their arguments into our analysis, yielding comparable local smoothness behaviors of the two processes when considered on T^d_N. As already anticipated, fast local smoothness comes with some relevant applications. One of them, and probably the most important, is the early concentration phenomenon (Theorem 2.4). As a second application, we show how to transfer the integral information encoded in the (mean) Dirichlet form into pointwise gradient estimates for Avg(T^d_N). This result is presented in Proposition 2.2, and may be viewed as an analogue of the annealed gradient estimates for random walks in dynamic random environment recently obtained in [DKS23].

Main results
We are now ready to present our main results and some of their consequences in full detail. Before that, recalling the definition of the random walk RW(T^d_N) (Section 1.3), we define λ = λ_N > 0 as its spectral gap, namely the smallest non-zero eigenvalue of the (negative) generator −L^RW. In what follows, B, C, C_1, C_2, ..., C′_1, C′_2, ... > 0 denote constants whose exact value is unimportant and may change from line to line. Moreover, unless stated otherwise, such constants may depend on d ≥ 1, but not on other variables, e.g., N ∈ N and t ≥ 0.

Local smoothness and gradient estimates.
The following is the first of our main results.
Theorem 2.1 (Local smoothness). For all d ≥ 1, for all N ∈ N large enough, and for all t ≥ 0, we have

(1 + C_1 (t ∧ 1)) E(π_t(0, • )/π) ≤ E[ E(η_t(0, • )/π) ] ≤ C_2 exp(Bt/N^{d+2}) E(π_t(0, • )/π), (2.1)

for some constants B, C_1, C_2 > 0 (depending only on d ≥ 1).
Although we formulated the above result as a comparison between the (mean) Dirichlet form of Avg(T^d_N) and that of RW(T^d_N), the latter quantity is rather explicit thanks to the full knowledge of eigenvalues and eigenfunctions for RW(T^d_N) (see also Section 4.1 below for more details). Indeed, one obtains matching upper and lower bounds (2.2) on E(π_t(0, • )/π), valid for all N ∈ N large enough and for all t ≥ 0, with constants C′_1, C′_2 > 0 (depending only on d ≥ 1). Plugging the bounds in (2.2) into (2.1) allows one to quantify the fast local smoothness of Avg(T^d_N). Not only does local smoothness on the diffusive timescale follow, but we also extract quantitative information for t ≪ N^2 and t ≫ N^2. In particular, as long as t = O(N^2), the resulting upper bound states that η_t(0, • ) smooths out at the same rate as π_t(0, • ); moreover, the exponential factor in the upper bound of (2.1) plays no significant role as long as t = O(N^{d+2}), a timescale much longer than the diffusive one. As for the lower bound in (2.1), the factor (1 + C_1 (t ∧ 1)) is a seemingly irrelevant improvement upon (1.12). However, this term is strictly larger than one as soon as t > 0, suggesting that, even after a very small time, the fluctuations of the random gradients of η_t(0, • ) become comparable with their mean.
Local smoothness is an integral (mean) quantity regarding the gradients of the averaging process. As a consequence of the estimate in Theorem 2.1, we derive pointwise (rather than integral) estimates for such gradients. We remark that, keeping in mind the interpretation from Section 1.3 of η_t(x, y) as the distribution of a random walk in a dynamic random environment, such bounds may be thought of as the analogues (for p = 2) of the annealed gradient estimates recently obtained in [DKS23, Theorem 1.6(i)] for the time-dependent random conductance model with space-time ergodic and uniformly elliptic environment on Z^d. We emphasize that, although our result resembles that in [DKS23], the two proofs are quite different. We postpone the proof of the following result to Section 6.

Proposition 2.2 (Pointwise gradient estimates). For all d ≥ 1, for all N ∈ N large enough, for all t ≥ 0, and for all x, y ∈ T^d_N with |x − y| = 1, we have for some constants B, C > 0 (depending only on d ≥ 1).
Let us conclude this discussion on local smoothness with an important result on the mixing of Avg(T^d_N), which will turn out to be useful in Section 7, and which is immediately derived by integrating the infinitesimal relation (1.11) over the time interval [t, +∞), combined with the upper bounds in (2.1) and (2.2).
Corollary 2.3. For all d ≥ 1, for all N ∈ N large enough, and for all t ≥ 0, we have for some constants B, C > 0 (depending only on d ≥ 1).
2.2. Concentration and limit profile. We now present our second main result, whose proof is based on an identity (which we prove in Lemma 5.1) and on the findings in Theorem 2.1.
The proof of the following consequence of Theorem 2.4 may be found in Section 6.
Proposition 2.5 (Limit profile). Fix d ≥ 1, and let h_t(0, u), u ∈ T^d and t ≥ 0, be the heat kernel of the diffusion on T^d with generator (1/2)∆, started from the origin 0 ∈ T^d. Then, for all bounded intervals [a, b] ⊂ (0, +∞), there exists C = C(a, b, d) > 0 satisfying, for all p ∈ [1, 2] and for all N ∈ N large enough, the estimate (2.3).

As already discussed in Section 1.4.1, specializing the upper bound in (2.3) to times t = Θ(N^2) yields a quantitative hydrodynamic limit for Avg(T^d_N). We report the precise statement below. Its simple proof is analogous to that of Proposition 2.5 and, thus, is left to the reader.
Corollary 2.6. In the setting of Proposition 2.5, for all bounded intervals [a, b] ⊂ (0, +∞), there exists C = C(a, b, d) > 0 satisfying the corresponding estimate for all N ∈ N large enough and for all ξ ∈ P(T^d_N).

2.3. Structure of the paper. The rest of the paper is organized as follows. In Section 3, we discuss some preliminary facts which allow us to conveniently reformulate our problem. In Sections 4 and 5, we present the proofs of our main results, Theorems 2.1 and 2.4, respectively. The proofs of Propositions 2.2 and 2.5 are the content of Section 6. As a further application of our main results, in Section 7, we prove cutoff (in total variation distance) for an interacting particle system dual to the averaging process, known as the binomial splitting process, strengthening a result in [QS23].

Preliminaries
In this section, we introduce two auxiliary Markov processes, whose properties will play a key role in Sections 4 and 5 below, dedicated to the proofs of our main results.
3.1. Coupled random walks. As discussed around (1.4), first-order moments of the averaging process may be expressed in terms of the random walk X_t. This property is a particular instance of duality (see, e.g., [Lig05]). This connection was already noted in [AL12], further exploited and generalized in [QS23, CQS23], and holds more generally for all k-th-order moments. In particular, in order to prove our main results, we will need to introduce a dual system of k = 2 interacting walks, which we now describe in detail.
Place two particles on the vertices of T^d_N, and endow each unordered pair of nearest neighbors of T^d_N with exponential clocks of unit rate, independent over the pairs. Assume that the clock associated to the pair {x, y}, |x − y| = 1, rings. Particles sitting at that time on either x or y decide, independently from each other and with probability 1/2, to change vertex, i.e., moving from x to y if originally at x, or vice versa. Let ((X_t, Y_t))_{t≥0} ⊂ T^d_N × T^d_N denote the Markov process of the positions of these two particles. Both marginals X_t and Y_t evolve as copies of RW(T^d_N). Moreover, they move independently as long as they are at graph distance larger than one. However, when sitting at distance smaller than two, the two walks may interact by experiencing synchronous jumps. For this reason, we refer to (X_t, Y_t) as a system of two coupled random walks, shortly CRW(T^d_N). We let P^CRW_ν denote its law when (X_0, Y_0) ∼ ν. The importance of CRW(T^d_N) lies in the second-order duality relation (3.1), analogous to that in (1.4), stated in Proposition 3.1 below.

Proof. By definition of E( • ) and the duality relation (3.1), we obtain the claimed expression. We get the desired conclusion by rearranging the terms in the last summation, where for the last identity we used the translation and permutation invariance of the marginal laws of (X_t, Y_t), as well as the fact that (X_t, Y_t) and (Y_t, X_t) have the same law under P^CRW_{0,0}.
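The defining feature of CRW(T^d_N), namely that each marginal is a copy of RW(T^d_N) regardless of the position of the other particle, can be checked exactly on a small cycle by building the two-particle generator. The sketch below (d = 1, our own helper names) verifies that the first coordinate jumps to each of its neighbors at rate 1/2, whatever the second coordinate is.

```python
from itertools import product

def crw_generator(n):
    # off-diagonal jump rates of the coupled pair on the cycle Z/nZ;
    # A[s][s'] is the rate from state s = (x, y) to s' != s
    states = list(product(range(n), repeat=2))
    A = {s: {} for s in states}
    for (x, y) in states:
        for a in range(n):
            e = {a, (a + 1) % n}  # this edge's clock rings at rate 1
            # each particle sitting on the edge flips to the other
            # endpoint with probability 1/2, independently of the other
            x_opts = [x, (e - {x}).pop()] if x in e else [x]
            y_opts = [y, (e - {y}).pop()] if y in e else [y]
            p = 1.0 / (len(x_opts) * len(y_opts))
            for xo in x_opts:
                for yo in y_opts:
                    if (xo, yo) != (x, y):
                        A[(x, y)][(xo, yo)] = A[(x, y)].get((xo, yo), 0.0) + p
    return A

def marginal_rate(A, x, y, xo):
    # total rate at which the first walk jumps x -> xo from state (x, y)
    return sum(r for (x2, _), r in A[(x, y)].items() if x2 == xo)

n = 5
A = crw_generator(n)
for y in range(n):  # the marginal jump rates do not depend on y
    assert abs(marginal_rate(A, 0, y, 1) - 0.5) < 1e-12
    assert abs(marginal_rate(A, 0, y, n - 1) - 0.5) < 1e-12
print("each marginal jumps to each neighbor at rate 1/2")
```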
3.2. Difference process. Next to CRW(T^d_N), motivated by the result in Proposition 3.1, we consider a second auxiliary process. Given ((X_t, Y_t))_{t≥0} evolving as CRW(T^d_N), define (Z_t)_{t≥0} as Z_t := X_t − Y_t, t ≥ 0. We refer to Z_t as the difference process associated to (X_t, Y_t). In our context, Z_t turns out to be a Markov process. Indeed, if we were considering the difference process Z⁰_t := X⁰_t − Y⁰_t associated to (X⁰_t, Y⁰_t), two independent copies of RW(T^d_N), then it is well-known that Z⁰_t is the simple random walk on T^d_N with infinitesimal generator A⁰ = 2L^RW, i.e., Z⁰_t jumps like RW(T^d_N), but at twice the rate. As for Z_t, due to the interaction between X_t and Y_t when |X_t − Y_t| ≤ 1, Z_t moves like Z⁰_t with a defect represented by slow bonds attached to the origin. More in detail, the infinitesimal generator A of Z_t reads as A = A⁰ + R, where we recall that A⁰ = 2L^RW is the generator of Z⁰_t, while R, acting on functions ψ : T^d_N → R, encodes the slow-bond defect near the origin. Moreover, for all t ≥ 0, let S_t = e^{tA} (resp. S⁰_t = e^{tA⁰}) denote the transition kernels of the random walk Z_t (resp. Z⁰_t). Observe that, since both A⁰ and R are symmetric kernels, A and S_t are symmetric, too. Symmetry of S_t and Proposition 3.1 imply the following result.

Proof of Theorem 2.1
This section is devoted to the proof of Theorem 2.1, which we split into three parts.First, we exploit the representations in (3.3) and (3.4) to express the mean Dirichlet form of Avg(T d N ) as an infinite series (Section 4.1).Then, for this infinite series, we provide lower and upper bounds in Sections 4.2 and 4.3, respectively.We gather all these facts together in Section 4.4.

4.1. A renewal-type equation. Our main goal in this section is to find a closed-form expression for the remainder in the inequality (3.5). For this purpose, we employ the identities in (3.3) and (3.4) involving Dirichlet forms and the transition kernels S_t and S⁰_t, respectively, both introduced in Section 3.2. Let us further recall the definition of the associated generators A = A⁰ + R and A⁰ from Section 3.2.
We start by applying the integration by parts formula for the semigroups S and S⁰, which expresses, for all x ∈ T^d_N and t ≥ 0, the difference S_t(x, 0) − S⁰_t(x, 0) as a time integral of terms of the form S_s(y, 0) − S_s(0, 0). By evaluating the resulting expression at x = 0 and x = e ∈ T^d_N, |e| = 1, and subtracting, we get

S_t(0, 0) − S_t(e, 0) = S⁰_t(0, 0) − S⁰_t(e, 0) + ∫_0^t f(t − s) (S_s(0, 0) − S_s(e, 0)) ds, (4.1)

where the kernel f arises as follows. By exploiting the symmetry and translation invariance of the random walk Z⁰_t, the summation produced by the integration by parts simplifies: for e, e′ ∈ T^d_N with |e| = |e′| = 1 and e′ ≠ ±e, the terms (1/2)(S⁰_t(0, 0) − S⁰_t(0, y) − S⁰_t(e, 0) + S⁰_t(e, y)) add up to (d + 1/2) S⁰_t(0, 0) − 2d S⁰_t(e, 0) + (d − 1) S⁰_t(e + e′, 0) + (1/2) S⁰_t(2e, 0). Hence, adopting the shorthand notation

f(t) := (d + 1/2) S⁰_t(0, 0) − 2d S⁰_t(e, 0) + (d − 1) S⁰_t(e + e′, 0) + (1/2) S⁰_t(2e, 0), (4.2)

g(t) := S⁰_t(0, 0) − S⁰_t(e, 0),  u(t) := S_t(0, 0) − S_t(e, 0), (4.3)

the identity in (4.1) reads as the renewal equation

u = g + f ∗ u, (4.4)

for which, by iterating this integral relation, a solution may be expressed in terms of an infinite series expansion involving only the functions f and g:

u = ∑_{k≥0} g ∗ f^{∗k}. (4.5)

Here, the symbol ∗ denotes the usual convolution for functions defined on R, although we will always apply it to functions which vanish on (−∞, 0); with our notation, f^{∗1} = f. Moreover, note that both f and g are bounded and continuous. Therefore, in order to ensure that u = ∑_k g ∗ f^{∗k} is the unique solution to (4.4), it suffices to show that the functions f and g (and, thus, their iterated convolutions) are non-negative. We prove this property in Lemma 4.1 below. In the same lemma, we gather other simple properties of these functions to be employed later.
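The series representation in (4.5) and the renewal equation (4.4) can be sanity-checked numerically: for any bounded, positive kernels vanishing on (−∞, 0) whose mass is small enough for the series to converge, the truncated series Σ_k g ∗ f^{∗k} satisfies u = g + f ∗ u up to the truncation error. The sketch below uses toy decreasing kernels (not the f, g of (4.2)-(4.3)) and a left-endpoint Riemann approximation of the convolution.

```python
import math

h, m = 0.01, 400  # time step and number of grid points on [0, 4)
f = [0.5 * math.exp(-k * h) for k in range(m)]  # toy positive, decreasing
g = [math.exp(-2 * k * h) for k in range(m)]    # kernels vanishing on t < 0

def conv(a, b):
    # left-endpoint Riemann sum for (a*b)(t) = int_0^t a(t-s) b(s) ds
    return [h * sum(a[i - j] * b[j] for j in range(i + 1)) for i in range(m)]

# truncated series  u = sum_{k=0}^{40} g * f^{*k}
u, term = list(g), list(g)
for _ in range(40):
    term = conv(f, term)            # term becomes g * f^{*(k+1)}
    u = [x + y for x, y in zip(u, term)]

# u solves the renewal equation u = g + f * u up to the truncation error,
# which is geometrically small since the discrete mass of f is below 1
fu = conv(f, u)
residual = max(abs(u[i] - g[i] - fu[i]) for i in range(m))
print(residual < 1e-8)  # True
```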
In preparation for the proof of this lemma, we remark that the functions f and g may be further simplified, since they only depend on transition probabilities of the simple random walk Z⁰_t (without defects) on T^d_N, and Z⁰_t moves independently among each of the d components. Indeed, letting p_t(i) = p_t(i, 0), for i ∈ T_N and t ≥ 0, denote the transition probabilities to the origin of the one-dimensional simple random walk on T_N (i.e., Z⁰_t for d = 1), we obtain, for all d ≥ 1, x = (i_1, ..., i_d) ∈ T^d_N and t ≥ 0,

S⁰_t(x, 0) = ∏_{k=1}^d p_t(i_k).

Henceforth, we have, for all d ≥ 1,

g(t) = p_t(0)^{d−1} (p_t(0) − p_t(1)),

and a corresponding expression for f(t) in terms of p_t(0), p_t(1) and p_t(2), with an extra term for d ≥ 2. In what follows, we will repeatedly employ this convenient rewriting in combination with the explicit eigendecomposition (or, equivalently, the Laplace inversion formula) of p_t(i) (see, e.g., [LP17, Section 12.3.1]): for all i ∈ T_N and t ≥ 0,

p_t(i) = (1/N) ∑_{j=0}^{N−1} exp(−2t(1 − cos(2πj/N))) cos(2πij/N). (4.9)

Lemma 4.1. For every d ≥ 1, we have: (a) For all t > 0, f(t) and g(t) are positive and uniformly (with respect to N ∈ N) bounded away from zero. (b) For all N ∈ N large enough, f and g are decreasing. (c) For all N ∈ N large enough, f ≤ (d + 1/2) g.

Proof.
By the inversion formula (4.9), we obtain at once that t ↦ p_t(0) and t ↦ p_t(0) − p_t(1) are positive and decreasing; the claims for f and g then follow from the product representations above.

4.2. Lower bound in Theorem 2.1. In view of (3.3)-(3.4), (4.2)-(4.3), and the infinite series representation in (4.5), we recast the lower bound in (2.1) in terms of the functions u(t) and g(t). This is the content of the following lemma. Remark that, as in Theorem 2.1, for t > 0, we aim at a strict, uniform in N ∈ N, inequality u > g, improving upon the obvious estimate u ≥ g given in (3.5).
In what follows, we derive (4.12) by showing that (4.13) holds for all k ∈ N. We proceed by induction on k ∈ N. The claim for k = 1, namely ∫_0^t f(s) ds ≥ t f(t), clearly holds since f is decreasing. Fix k ≥ 2, and assume (4.13) to be true for k − 1; then the inductive step goes through for all t ≥ 0, where for the first and third inequalities we used the monotonicity and positivity of f, while for the second one we used the induction hypothesis. This shows the validity of (4.13). The desired conclusion now follows by the infinite-series representation of u(t) in (4.5), as well as the monotonicity of g and the positivity of f (Lemma 4.1), for all t ≥ s ≥ 0, where the last inequality is a consequence of (4.13) and the positivity of g. Optimizing over s ≤ t yields (4.12) and, thus, concludes the proof.
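The inversion formula (4.9), which underlies the monotonicity arguments above, can be cross-checked numerically against an independent computation of the same kernel by uniformization: Z⁰_t in d = 1 jumps at total rate 2, each jump going to a uniform neighbor of the cycle. The helper names below are ours.

```python
import math

def p_fourier(i, t, N):
    # eigendecomposition / inversion formula (4.9) for p_t(i)
    return sum(math.exp(-2 * t * (1 - math.cos(2 * math.pi * j / N)))
               * math.cos(2 * math.pi * i * j / N) for j in range(N)) / N

def p_uniformized(i, t, N, terms=200):
    # p_t = sum_k e^{-2t} (2t)^k / k! * P^k, with P the one-step kernel
    # of the rate-2 walk: move to each neighbor with probability 1/2
    v = [0.0] * N
    v[0] = 1.0
    total, weight = [0.0] * N, math.exp(-2 * t)
    for k in range(terms):
        total = [s + weight * x for s, x in zip(total, v)]
        v = [(v[z - 1] + v[(z + 1) % N]) / 2 for z in range(N)]
        weight *= 2 * t / (k + 1)
    return total[i % N]

N, t = 7, 1.3
print(max(abs(p_fourier(i, t, N) - p_uniformized(i, t, N))
          for i in range(N)))  # tiny: the two computations agree
```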

4.3. Upper bound in Theorem 2.1. Also for the derivation of the upper bound in Theorem 2.1, we exploit the infinite-series representation (4.5) for u involving convolutions of the functions f and g. However, since we are not interested in optimizing constants in this case, we may employ f ≤ (d + 1/2) g from Lemma 4.1(c) and the positivity of f and g (Lemma 4.1(a)) to estimate u(t) from above by ũ(t), defined through the same series with f replaced by (d + 1/2) g. Hence, the following estimate combined with (2.2) (cf. (4.3) and (3.4)) concludes the proof of the upper bound in Theorem 2.1.
Lemma 4.3. For all d ≥ 1 and all N ∈ N large enough, we have for some constants B, C > 0 (depending only on d).
Before entering the details of the proof of Lemma 4.3, we remark that, compared to the bound for some C′ = C′(d) > 0, estimating the infinite series in (4.15) requires an extra factor exp(Bt N^{−(d+2)}). With this approach, this exponential factor is actually unavoidable, as we now explain. Recall the lower bound for g (thus, for g̃, too) in (2.2), analogous to (4.16): for some C′′ > 0. Finally, letting c := C′′ N^{−(d+2)}, we get g(t) ≥ c exp(−2t/t_rel) and, thus,

Proof of Lemma 4.3. Since we can follow closely the proof of [BB21, Lemma 5.6], our task boils down to establishing the analogues of the two preliminary lemmas therein, namely [BB21, Lemmas 5.4 & 5.5].
From this stage, the rest of the proof follows as in [BB21, Lemma 5.6]; we streamline the main arguments of [BB21, pp. 1150-1153] for the reader's convenience, while implementing the small modifications required. Recall θ ∈ (0, 1) from (4.18), and fix σ ∈ (0, 1) such that θ/σ ∈ (0, 1). For such a σ ∈ (0, 1), we define the sequence Next, adopting the following shorthand notation for the integrand in (4.18), we claim that the following holds true: for all k ∈ N and t > 0, We refer to [BB21, Eq. (5.35) & pp. 1150-1151] for the proof by induction of this inequality. Here, we only remark that t ↦ h(t) given in (4.24) (and, thus, the right-hand side of (4.25)) is a non-increasing function; this fact, (4.23), and the inequalities in (4.18) and (4.24) are the only inputs required for the proof of (4.25).
By (4.25) and the monotonicity of t ↦ h(t), we obtain, for all t > 0. Now, for t ≤ 2N^2, since t a_* ≤ t a_m ≤ N^2, we further get, for some constant, where for the last inequality we used that θ/σ ∈ (0, 1). Analogously, we have the corresponding bound for t > 2N^2; this concludes the proof of the lemma.
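The remark before the proof, that the factor exp(Bt N^{−(d+2)}) is unavoidable with this approach, rests on an elementary resummation: the k-fold self-convolution of g(s) = c e^{−λs} is g^{*k}(t) = c^k t^{k−1} e^{−λt}/(k−1)!, so a geometrically weighted series of convolution powers resums to c e^{(ac−λ)t}, i.e., the decay rate λ = 2/t_rel gets shifted by a multiple of c, which is of order N^{−(d+2)}. A numerical sketch (function names and parameter values are ours):

```python
import math

def conv_power_exp(c, lam, k, t):
    """k-fold self-convolution of g(s) = c * exp(-lam * s):
    g^{*k}(t) = c**k * t**(k-1) * exp(-lam*t) / (k-1)!
    (the standard Gamma-density shape)."""
    return c**k * t ** (k - 1) * math.exp(-lam * t) / math.factorial(k - 1)

def weighted_series(c, lam, a, t, kmax=60):
    """Sum_{k>=1} a**(k-1) * g^{*k}(t), which resums to the closed form
    c * exp((a*c - lam) * t): an extra exponential factor exp(a*c*t)."""
    return sum(a ** (k - 1) * conv_power_exp(c, lam, k, t) for k in range(1, kmax))
```

With c small (here c stands in for C′′N^{−(d+2)}) the shift a·c is small, which is exactly the size of the exponent Bt N^{−(d+2)} in Lemma 4.3.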

Proof of Theorem 2.4
We divide the proof of Theorem 2.4, on the phenomenon of early concentration, into two parts. First, we express the quantity of interest in terms of (mean) Dirichlet forms for Avg(T^d_N) and RW(T^d_N); this is carried out in Lemma 5.1 by specializing to our setting some expressions obtained in [QS23, CQS23]. Then, thanks to this rewriting, we exploit the estimates (and the proof arguments) from Theorem 2.1 to conclude.
Proof. Let N_t denote the left-hand side of (5.1). Then, by the last two displays in [CQS23, Proposition 2.5], we have

(π_s(0, x) − π_s(0, y))^2 Φ_{t−s}(x, y) ds ,   (5.3)

where Φ_t(x, y) is defined in terms of CRW(T^d_N) (Section 3.1) as follows: By translation invariance of the dynamics of CRW(T^d_N), and recalling the definition of the difference process Z_t and its kernel S_t (Section 3.2), we obtain, for all nearest-neighbor vertices x, y, where e ∈ T^d_N is any vertex satisfying |e| = 1. Plugging this identity into (5.3), we get The identities in (5.1) and (5.2) then follow by (3.3) and (3.4), respectively.
Proof of Theorem 2.4. Let us adopt the notation from Section 4. Recalling the definitions of u(t) and g(t) in (4.3), the identity in (5.2) reads as For the upper bound in (2.3), since u ≤ ũ and 0 ≤ g ≤ g̃ (cf. (4.5) and (4.14)), we obtain and observe that the right-hand side was estimated, for all t > 0 and N ∈ N large enough, in Lemma 4.3. This provides the desired estimate for t ≥ 1; for t ∈ [0, 1], it suffices to recall g ≤ u ≤ 1 (cf. (4.5) and (4.26)), yielding For the lower bound in (2.3), using again 0 ≤ g ≤ u and the monotonicity of t ↦ g(t) (Lemma 4.1(b)), we get, for all t ≥ 0, where the last inequality follows from Lemma 4.1(a)-(b). The first inequality in (2.2), together with (3.4) and (4.3), yields the desired lower bound, thus concluding the proof of the theorem.
6. Proofs of Propositions 2.2 and 2.5

We start with the proof of Proposition 2.2.
Proof of Proposition 2.2. Recall that, for P-a.e. realization of the Poisson point process used for the updates, η_t(0, ·) may also be interpreted as the probability distribution over T^d_N of an "infinitesimal chunk" of mass U_{0,t} ∈ T^d_N, a time-inhomogeneous random walk started at 0 ∈ T^d_N and evolving at later times as follows: nothing happens until the vertex on which it sits, say x ∈ T^d_N, experiences an update with a nearest neighbor, say y ∈ T^d_N; in that case, the chunk moves from x to y with probability 1/2, while with the remaining probability it stays put. Since we are describing a time-inhomogeneous Markov process, we adopt, only in this proof, the slightly more convenient notation η_{s,t}(·, ·), s ≤ t, indicating both starting and terminal times.
In view of this representation, the Chapman-Kolmogorov formula holds in our context, and reads as follows: for all 0 ≤ s ≤ t,

η_{0,t}(0, x) − η_{0,t}(0, y) = Σ_{z ∈ T^d_N} η_{0,s}(0, z) (η_{s,t}(z, x) − η_{s,t}(z, y)) .   (6.1)

Then, by (6.1) and the Cauchy-Schwarz inequality, we get, for all 0 ≤ s ≤ t, where the third step is a consequence of the fact that the Poisson updates over the time intervals (0, s) and (s, +∞) are independent, while for the fourth one we used (1.4). By using the well-known heat kernel estimate for π_s(0, z), where for the last step we used the duality relation (3.1), the reversibility of CRW(T^d_N) with respect to the counting measure on T^d_N × T^d_N, and Proposition 3.1. After inserting s = t/2 ∧ N^2 into this last estimate, the upper bounds in (2.1) and (2.2) yield the desired result.
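The random-walk representation used in the proof above can be illustrated on the cycle: conditionally on a fixed realization of the edge updates, the law of the chunk U evolves by exactly the same recursion as η (each update replaces the two masses at the updated edge by their average). The sketch below (one-dimensional, function names ours) compares the deterministic averaging recursion with a Monte Carlo estimate of the chunk's conditional law.

```python
import random

def averaging_eta(N, updates):
    """Deterministic evolution of eta_t(0, .) on the cycle Z/NZ for a
    fixed realization `updates` of the Poisson clocks: the update of
    the edge {x, x+1} levels out the two masses."""
    eta = [0.0] * N
    eta[0] = 1.0
    for x in updates:
        y = (x + 1) % N
        eta[x] = eta[y] = (eta[x] + eta[y]) / 2.0
    return eta

def chunk_walk(N, updates, rng):
    """One realization of the mass-chunk walk U driven by the same edge
    updates: whenever the chunk's current vertex is involved in an
    update, it crosses the edge with probability 1/2."""
    pos = 0
    for x in updates:
        y = (x + 1) % N
        if pos in (x, y) and rng.random() < 0.5:
            pos = y if pos == x else x
    return pos
```

Averaging `chunk_walk` over many independent coin sequences, with the updates held fixed, recovers `averaging_eta` up to Monte Carlo error.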
We conclude this section with the proof of Proposition 2.5, concerning the limit profile.
Proof of Proposition 2.5. Recall that, although we omit N ∈ N from the notation, ‖·‖_p denotes the L^p-norm on (T^d_N, π). By the triangle inequality and (1.5)-(1.6), we get where the last term has been estimated via a first-order Taylor expansion (recall that h_t ∈ C^∞(T^d) for all t > 0). The first term on the right-hand side above is O(1/N) thanks to Theorem 2.4, while the second term is o(1/N) by the quantitative local CLT for RW(T^d_N) (see, e.g., [LL10] for the analogous result on Z^d). Since all these estimates are uniform over finite time intervals bounded away from zero and infinity, this concludes the proof.
7. An application to cutoff of the dual particle system

As a further consequence of Corollary 2.3 and the quantitative concentration result in Theorem 2.4, we provide sharp estimates for the mixing time of the particle system discussed in [QS23], dual to the averaging process and known as the binomial splitting process. In a few words, this model is the k-particle generalization of the processes RW(T^d_N) and CRW(T^d_N) previously introduced. Here, we provide a quick description of this particle system and refer to [QS23, Section 2.1] (see also [PR23, Section 1.2]) for more details, results, and background.
Given the discrete torus T^d_N, d ≥ 1, and an integer k ∈ N, let Ω_k = Ω_{N,k} denote the configuration space of k indistinguishable particles on T^d_N. On such a configuration space, we consider (ζ^k_t)_{t≥0} = (ζ^{N,k}_t)_{t≥0}, the Markov process which evolves as follows: start with a particle configuration ζ ∈ Ω_k at time t = 0; at the Poisson event times of Avg(T^d_N) involving, say, the nearest neighbors x, y ∈ T^d_N, reassign, independently and uniformly at random, x or y as the new position of each particle originally sitting on either x or y. The name "binomial splitting" is now explained: the update corresponds to placing on x a binomially-distributed fraction of the total ζ(x) + ζ(y) particles. Further, the dynamics conserves the total number of particles and, although particles interact when performing simultaneous jumps, at equilibrium they are distributed as if they were independent. Indeed, a simple detailed balance computation ensures that

μ_{k,π} = Multinomial(k, π)

is the unique reversible measure for the particle system with k particles. Finally, it is immediate to see that the particle systems with k = 1 and k = 2 particles correspond, respectively, to the processes RW(T^d_N) and CRW(T^d_N) described in Sections 1.3 and 3.1, once the particles' labels are removed.
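A single binomial splitting update, as described above, may be sketched as follows (the function name is ours; the Binomial(·, 1/2) draw is realized by independent fair coins, one per particle on the updated edge):

```python
import random

def binomial_split_update(zeta, x, y, rng):
    """One binomial splitting update on the edge {x, y}: each of the
    zeta[x] + zeta[y] particles currently on x or y is reassigned to x
    or y independently and uniformly at random, so that x ends up with
    a Binomial(zeta[x] + zeta[y], 1/2) number of particles."""
    total = zeta[x] + zeta[y]
    to_x = sum(1 for _ in range(total) if rng.random() < 0.5)
    zeta[x], zeta[y] = to_x, total - to_x
```

By construction the update conserves the total number of particles and touches only the two endpoints of the updated edge.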
We are interested in proving a total variation (TV) cutoff phenomenon for this process, as k = k_N → ∞, strengthening the result in [QS23, Theorem 2.3].

where for the second inequality we used the stochastic domination Z_{k,q} ⪰ Z_{k,p} for q > p, while for the last one we used Cantelli's inequality. Furthermore, provided that e^{−a w_k/2} > 4c, we get, again by Cantelli's inequality, Hence, we are done as soon as we can show that σ^2/r^2 → 0. Recall (7.6) and e^{−a w_k} > 4c; then we have where for the second line we used Jensen's inequality, while for the third one we used the second inequality in (2.2).

1.1. Setting and model definition. For every integer d ≥ 1, let T^d_N := (Z/NZ)^d be the discrete d-dimensional torus of size N ∈ N. Moreover, let P(T^d_N) denote the space of probability mass functions on T^d_N, namely,

P(T^d_N) := { η ∈ [0, 1]^{T^d_N} : Σ_{x ∈ T^d_N} η(x) = 1 } .

For such a compact subset P(T^d_N) of R^{T^d_N}, we further let C(P(T^d_N)) denote the Banach space of continuous functions on P(T^d_N), endowed with the uniform norm. The averaging process on T^d_N (shortly, Avg(T^d_N)) is the Markov jump process evolving on P(T^d_N) with infinitesimal generator L = L_N given, for all f ∈ C(P(T^d_N)) and η ∈ P(T^d_N), by

L f(η) = Σ_{{x,y} : x ∼ y} [ f(η^{xy}) − f(η) ] ,

where η^{xy} ∈ P(T^d_N) is the configuration obtained from η by replacing both η(x) and η(y) with (η(x) + η(y))/2.
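A crude simulation of Avg(T^1_N) on the cycle, following the generator above, can be sketched as below; the function names and the realization of the edge clocks (total rate N, uniform choice of the ringing edge) are our illustrative choices.

```python
import random

def averaging_step(eta, x, y):
    """Level out the masses at the nearest neighbors x and y."""
    eta[x] = eta[y] = (eta[x] + eta[y]) / 2.0

def simulate_avg_cycle(N, t_max, rng):
    """Crude simulation of Avg(T_N^1) on the cycle up to time t_max:
    the N edges {i, i+1} carry i.i.d. rate-1 Poisson clocks, realized
    by exponential waiting times of total rate N and a uniform choice
    of the ringing edge."""
    eta = [0.0] * N
    eta[0] = 1.0                 # start from a point mass at the origin
    t = rng.expovariate(N)
    while t <= t_max:
        i = rng.randrange(N)
        averaging_step(eta, i, (i + 1) % N)
        t += rng.expovariate(N)
    return eta
```

Each update conserves the total mass, and as t grows the profile flattens toward the uniform distribution, in line with the convergence described in the introduction.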
(3.1)

In analogy with the notation used for RW(T^d_N), when ξ ⊗ ξ′ = 1_x ⊗ 1_y for some x, y ∈ T^d_N, we simply write P^CRW_{x,y} = P^CRW_{ξ⊗ξ′}. The introduction of CRW(T^d_N) and the duality relation with Avg(T^d_N) allow us to efficiently rewrite the mean Dirichlet form of Avg(T^d_N).

Proposition 3.1. For all d ≥ 1, N ∈ N, t ≥ 0 and e ∈ T^d_N with |e| = 1, we have
For this purpose, let, for all k ∈ N, (P^k_t)_{t≥0} denote the Markov transition kernel associated to (ζ^k_t)_{t≥0}, and write μP^k_t for the distribution of ζ^k_t when ζ^k_0 ∼ μ, for some μ ∈ P(Ω_k). (Here, in analogy with P(T^d_N), P(Ω_k) stands for the space of probability distributions on Ω_k.) Further, we consider the worst-case total variation distance

d_k(t) = d_{N,k}(t) := sup_{μ ∈ P(Ω_k)} ‖ μP^k_t − μ_{k,π} ‖_TV ,   t ≥ 0 .

Lower bound (a). The approach we follow goes via the intertwining relation (7.4), which we employ as follows:

d_k(t) ≥ ‖ μ_{k,ξ} P^k_t − μ_{k,π} ‖_TV = sup