A lecture on the averaging process

To interpret interacting particle system style models as social dynamics, suppose each pair {i, j} of individuals in a finite population meet at random times of arbitrary specified rates ν_{ij}, and update their states according to some specified rule. The averaging process has real-valued states and the rule: upon meeting, the values X_i(t−), X_j(t−) are each replaced by (X_i(t−) + X_j(t−))/2. It is curious that this simple process has not been studied very systematically. We provide an expository account of basic facts and open problems.


Introduction
The models in the field known to mathematical probabilists as interacting particle systems (IPS) are often exemplified, under the continuing influence of Liggett [9], by the voter model, contact process, exclusion process, and Glauber dynamics for the Ising model, all on the infinite d-dimensional lattice. A major motivation for the original development of the field was the rigorous study of phase transitions in statistical physics. Models with similar mathematical structure have long been used as toy models in many other disciplines, in particular in social dynamics, as models for the spread of opinions, behaviors or knowledge between individuals in a society. In this context the nearest-neighbor lattice model for contacts between individuals is hardly plausible, and because one has a finite number of individuals, the finite-time behavior is often more relevant than the time-asymptotics. So the context is loosely analogous to the study of mixing times for finite Markov chains (Levin-Peres-Wilmer [8]).
A general mathematical framework for this viewpoint, outlined below, was explored in a Spring 2011 course by the first author, available as informal beamer slides [2]. In this expository article (based on two lectures from that course) we treat a simple model where the "states" of an individual are continuous.

The general framework and the averaging process
Consider n agents (interpret as individuals) and a nonnegative array (ν_{ij}), indexed by unordered pairs {i, j}, which is irreducible (i.e. the graph of edges corresponding to strictly positive entries is connected). Interpret ν_{ij} as the "strength of relationship" between agents i and j. Assume
• Each unordered pair {i, j} of agents with ν_{ij} > 0 meets at the times of a rate-ν_{ij} Poisson process, independent for different pairs.
Call this collection of Poisson processes the meeting process. In the general framework, each agent i is in some state X_i(t) at time t, and the state can only change when agent i meets some other agent j, at which time the two agents' states are updated according to some rule (deterministic or random) which depends only on the pre-meeting states X_i(t−), X_j(t−). In the averaging process the states are the real numbers R and the rule is: when agents i and j meet at time t, both take the average of their pre-meeting values,
X_i(t) = X_j(t) = (X_i(t−) + X_j(t−))/2,
with all other agents' values unchanged. A natural interpretation of the state X_i(t) is as the amount of money agent i has at time t; the model says that when agents meet they share their money equally. A mathematician's reaction to this model might be "obviously the individual values X_i(t) converge to the average of initial values, so what is there to say?", and this reaction may explain the comparative lack of mathematical literature on the model. Here's what we will say.
• If the initial configuration is a probability distribution (i.e. unit money split unevenly between individuals) then the vector of expectations in the averaging process evolves precisely as the probability distribution of an associated (continuous-time) Markov chain with that initial distribution (Lemma 1).
• There is an explicit bound on the closeness of the time-t configuration to the limit constant configuration (Proposition 2).
• Complementary to this global bound there is a "universal" (i.e. not depending on the meeting rates) bound for an appropriately defined local roughness of the time-t configuration (Proposition 4).
• There is a duality relationship with coupled Markov chains (section 2.4).
To an expert in IPS these four observations will be more or less obvious, and three are at least implicit in the literature, so we are not claiming any essential novelty. Instead, our purpose is to suggest using this model as an expository introduction to the topic of these social dynamics models for an audience not previously familiar with IPS but familiar with the theory of finite-state Markov chains (which will always mean continuous time chains). This "suggested course material" occupies section 2. In a course one could then continue to the voter model (which has rather analogous structure) and comment on similarities and differences in mathematical structure and behavior for models based on other rules. We include some such "comments for students" here.
As a side benefit of the averaging process being simple but little-studied and analogous to the well-studied voter model, one can easily suggest small research projects for students to work on. Some projects are given as open problems in section 3, and the solution to one (obtaining a bound on entropy distance from the limit) is written out in section 3.4.

Related literature
The techniques we use have been known in IPS for a very long time, so it is curious that the only literature we know that deals explicitly with models like the averaging process is comparatively recent. Three such lines of research are mentioned below, and are suggested further reading for students. Because more complicated IPS models have been studied for a very long time, we strongly suspect that the results here are at least implicit in older work. But the authors do not claim authoritative knowledge of IPS.
Shah [12] provides a survey of gossip algorithms. His Theorems 4.1 and 5.1 are stated very differently, but upon inspection the central idea of the proof is the discrete-time analog of our Proposition 2, proved in the same way. Olshevsky and Tsitsiklis [11] study that topic under the name distributed consensus algorithms, emphasizing worst-case graph behavior and time-varying graphs.
The Deffuant model is a variant where the averaging only occurs if the two values differ by less than a specified threshold. Originating in sociology, this model has attracted substantial interest amongst statistical physicists (see e.g. Ben-Naim et al. [4] and citations thereof) and very recently has been studied as rigorous mathematics in Häggström [6] and in Lanchier [7]. Acemoglu et al [1] treat a model where some agents have fixed (quantitative) opinions and the other agents update according to an averaging process. They appeal to the duality argument in our section 2.4, as well as a "coupling from the past" argument, to study the asymptotic behavior.

Basic properties of the averaging process
Write I = {i, j, . . .} for the set of agents and n ≥ 2 for the number of agents. Recall that the array of non-negative meeting rates ν_{{i,j}} for unordered pairs {i, j} is assumed to be irreducible. We can rewrite the array as the symmetric matrix N = (ν_{ij}) in which
(2.1)   ν_{ij} := ν_{{i,j}} for j ≠ i;   ν_{ii} := −Σ_{j≠i} ν_{ij}.
Then N is the generator of the Markov chain with transition rates ν_{ij}; call this the associated Markov chain. The chain is reversible with uniform stationary distribution.
Comment for students. The associated Markov chain is also relevant to the analysis of various social dynamics models other than the averaging process. Throughout, we write X(t) = (X_i(t), i ∈ I) for the averaging process run from some non-random initial configuration x(0). Of course the sum is conserved:
Σ_i X_i(t) = Σ_i x_i(0) for all t ≥ 0.
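To make the definition concrete, here is a minimal simulation sketch (not from the original article): it runs the meeting process as superposed Poisson processes over the edges with rates ν_{ij}, applies the averaging rule at each meeting, and checks that the sum is conserved. The graph, rates, time horizon, and the helper name simulate_averaging are illustrative choices.

```python
import numpy as np

def simulate_averaging(nu, x0, t_max, rng):
    """Run the averaging process up to time t_max.

    nu : dict mapping unordered pairs (i, j) to meeting rates nu_{ij} > 0
    x0 : initial configuration, one real value per agent
    Returns the configuration at time t_max.
    """
    x = np.asarray(x0, dtype=float).copy()
    edges = list(nu.keys())
    rates = np.array([nu[e] for e in edges])
    total_rate = rates.sum()
    t = 0.0
    while True:
        t += rng.exponential(1.0 / total_rate)      # next meeting time (superposed Poisson processes)
        if t > t_max:
            return x
        i, j = edges[rng.choice(len(edges), p=rates / total_rate)]   # which pair meets
        x[i] = x[j] = 0.5 * (x[i] + x[j])            # the averaging rule

rng = np.random.default_rng(0)
# illustrative example: 5 agents on a cycle, unit rates
nu = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 0): 1.0}
x0 = [1.0, 0.0, 0.0, 0.0, 0.0]                       # unit money, all held by agent 0
x_t = simulate_averaging(nu, x0, t_max=10.0, rng=rng)
print(x_t, x_t.sum())                                # the sum stays equal to 1
```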

Relation with the associated Markov chain
We note first a simple relation with the associated Markov chain. Write 1_i for the initial configuration (1_{(j=i)}, j ∈ I), that is, agent i has unit money and other agents have none, and write p_{ij}(t) for the transition probabilities of the associated Markov chain.

Lemma 1.
For the averaging process with initial configuration 1_i we have E X_j(t) = p_{ij}(t/2). More generally, from any deterministic initial configuration x(0), the expectations x(t) := E X(t) evolve exactly as the dynamical system
d/dt x(t) = (1/2) x(t) N.
The time-t distribution p(t) of the associated Markov chain evolves as d/dt p(t) = p(t) N. So if x(0) is a probability distribution over agents, then the expectation of the averaging process evolves as the distribution of the associated Markov chain started with distribution x(0) and slowed down by factor 1/2. But keep in mind that the averaging process has more structure than this associated chain.

Proof.
The key point is that we can rephrase the dynamics of the averaging process as follows: when two agents meet, each gives half their money to the other. In informal language, this implies that the motion of a random penny (which, at a meeting involving its owner agent, is given to the other agent with probability 1/2) is that of the associated Markov chain at half speed, that is, with transition rates ν_{ij}/2.
To say this in symbols, we augment a random partition X = (X_i) of unit money over agents i by also recording the position U of the "random penny", required to satisfy
P(U = i | X) = X_i, i ∈ I.
Given a configuration x and an edge e = {i, j}, write x^e for the configuration of the averaging process after a meeting of the agents comprising edge e. So we can define the augmented averaging process to have transitions
(x, u) → (x^e, u) at rate ν_e, if u ∉ {i, j};
(x, u) → (x^e, i) at rate ν_e/2 and (x, u) → (x^e, j) at rate ν_e/2, if u ∈ {i, j}.

This defines a process (X(t), U(t)) consistent with the averaging process and (intuitively at least; see below) satisfying
(2.2)   P( U(t) = i | F(t) ) = X_i(t), i ∈ I,
where F(t) denotes the σ-field generated by the meeting process on [0, t]. The latter implies E X_i(t) = P(U(t) = i), and clearly U(t) evolves as the associated Markov chain slowed down by factor 1/2. This establishes the first assertion of the lemma. The case of a general initial configuration follows via the following linearity property of the averaging process. Writing X(y, t) for the averaging process with initial configuration y, one can couple these processes as y varies by using the same realization of the underlying meeting process. Then clearly y → X(y, t) is linear.
How one writes down a careful proof of (2.2) depends on one's taste for details. We can explicitly construct U(t) in terms of "keep or give" events at each meeting, and pass to the embedded jump chain of the meeting process, in which time m is the time of the m'th meeting and F_m its natural filtration. Then, on the event that the m'th meeting involves i and j, the inductive hypothesis P(U_{m−1} = k | F_{m−1}) = X_k(m−1) gives
P( U_m = i | F_m ) = (1/2)( X_i(m−1) + X_j(m−1) ) = X_i(m),
as required.
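A quick numerical sanity check of Lemma 1 (not part of the article): for the initial configuration 1_0, the Monte Carlo average of X_j(t) should be close to p_{0j}(t/2), computed via the matrix exponential of the generator N. The path graph, rates, time t, and sample size below are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# illustrative rates on 4 agents (a path graph)
nu = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0}
n = 4

# generator N of the associated Markov chain: off-diagonal nu_ij, rows summing to zero
N = np.zeros((n, n))
for (i, j), r in nu.items():
    N[i, j] = N[j, i] = r
np.fill_diagonal(N, -N.sum(axis=1))

def run(x, t_max):
    edges, rates = list(nu.keys()), np.array(list(nu.values()))
    total = rates.sum()
    t = 0.0
    while True:
        t += rng.exponential(1.0 / total)
        if t > t_max:
            return x
        i, j = edges[rng.choice(len(edges), p=rates / total)]
        x[i] = x[j] = 0.5 * (x[i] + x[j])

t = 1.5
samples = np.array([run(np.array([1.0, 0, 0, 0]), t) for _ in range(20000)])
print("Monte Carlo  E X_j(t):", samples.mean(axis=0))
print("Lemma 1      p_0j(t/2):", expm(0.5 * t * N)[0])   # the chain slowed down by factor 1/2
```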
For a configuration x, write x̄ for the "equalized" configuration in which each agent has the average n^{−1} Σ_i x_i. Lemma 1, and convergence in distribution of the associated Markov chain to its (uniform) stationary distribution, immediately imply E X(t) → x̄(0) as t → ∞. Amongst several ways one might proceed to argue that X(t) itself converges to x̄(0), the next leads to a natural explicit quantitative bound.

The global convergence theorem
A function f : I → R has (with respect to the uniform distribution) average f̄, variance var f and L^2 norm ||f||_2 defined by
f̄ := n^{−1} Σ_i f_i,   var f := n^{−1} Σ_i (f_i − f̄)^2,   ||f||_2 := ( n^{−1} Σ_i f_i^2 )^{1/2}.
The L^2 norm will be used in several different ways. For a possible time-t configuration x(t) of the averaging process, the quantity ||x(t)||_2 is a number, and so the quantity ||X(t)||_2 appearing in the proposition below is a random variable.
Proposition 2 (Global convergence theorem). From an initial configuration x(0) = (x_i) with average zero, the time-t configuration X(t) of the averaging process satisfies
E ||X(t)||_2 ≤ ||x(0)||_2 exp(−λt/4), 0 ≤ t < ∞,
where λ is the spectral gap of the associated MC.
Before starting the proof let us recall some background facts about reversible chains, here specialized to the case of uniform stationary distribution (that is, ν_{ij} = ν_{ji}) and in the continuous-time setting. See Chapter 3 of Aldous-Fill [3] for the theory surrounding (2.4) and Lemma 3.
The associated Markov chain, with generator N at (2.1), has Dirichlet form
E(f, f) := n^{−1} Σ_{ {i,j} } ν_{ij} (f_j − f_i)^2,
where Σ_{ {i,j} } indicates summation over unordered pairs. The spectral gap of the chain, defined as the gap between eigenvalue 0 and the second eigenvalue of N, is characterized as
(2.4)   λ = inf{ E(f, f)/var f : f non-constant }.
Writing π for the uniform distribution on I, one can define a distance from uniformity for probability measures ρ to be the L^2 norm of the function i → (ρ_i − π_i)/π_i, and we write this distance in the equivalent form
d_2(ρ) := ( n Σ_i (ρ_i − n^{−1})^2 )^{1/2}.
Lemma 3. For the associated Markov chain with arbitrary initial distribution, the time-t distribution ρ(t) satisfies
d_2(ρ(t)) ≤ d_2(ρ(0)) e^{−λt}, 0 ≤ t < ∞.
This is optimal, in the sense that the rate of convergence really is Θ(e^{−λt}).
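For a small example these background quantities are easy to compute numerically. Here is a minimal sketch (illustrative rates and test function, not from the article) that builds N, computes the spectral gap as the second-smallest eigenvalue of −N, and checks the extremal characterization (2.4) on a mean-zero function.

```python
import numpy as np

# illustrative rates on 4 agents
nu = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (0, 3): 0.5}
n = 4

N = np.zeros((n, n))
for (i, j), r in nu.items():
    N[i, j] = N[j, i] = r
np.fill_diagonal(N, -N.sum(axis=1))

# spectral gap: second-smallest eigenvalue of -N (the smallest is 0)
lam = np.sort(np.linalg.eigvalsh(-N))[1]

def dirichlet(f):
    # E(f, f) = n^{-1} * sum over unordered pairs of nu_ij (f_j - f_i)^2
    return sum(r * (f[j] - f[i]) ** 2 for (i, j), r in nu.items()) / n

f = np.array([1.0, 0.0, 0.0, -1.0])
f = f - f.mean()                        # center, so var f = ||f||_2^2
print("spectral gap lambda:", lam)
print("E(f,f) / ||f||_2^2 =", dirichlet(f) / np.mean(f ** 2), ">= lambda =", lam)
```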
A few words about notation for process dynamics. For a functional Φ we will write
(2.5)   E( dΦ(X(t)) | F(t) ) = φ(X(t)) dt
as shorthand for the statement that Φ(X(t)) − ∫_0^t φ(X(s)) ds is a martingale, because we find the former "differential" notation much more intuitive than the integral notation. In the context of a social dynamics process we typically want to choose a functional Φ and study the process Φ(X(t)); φ is determined by (2.5). We can immediately write down the expression for φ in terms of Φ and the dynamics of the particular process; for the averaging process,
(2.6)   φ(x) = Σ_{ {i,j} } ν_{ij} ( Φ(x^{ij}) − Φ(x) ),
where x^{ij} is the configuration obtained from x after agents i and j meet and average. This is just saying that agents i, j meet during [t, t + dt] with chance ν_{ij} dt and such a meeting changes Φ(X(t)) by the amount Φ(x^{ij}) − Φ(x).
Comment for students. In proofs we refer to (2.5, 2.6) as "the dynamics of the averaging process". Everybody actually does the calculations this way, though some authors manage to disguise it in their writing.
Proof of Proposition 2. Writing Q(t) := ||X(t)||_2^2, the dynamics of the averaging process give
E( dQ(t) | F(t) ) = Σ_{ {i,j} } ν_{ij} ( ||X^{ij}(t)||_2^2 − ||X(t)||_2^2 ) dt = −(1/2) E(X(t), X(t)) dt ≤ −(λ/2) Q(t) dt.
The first equality is by the dynamics of the averaging process, the middle equality is just the definition of E for the averaging process (a meeting of i and j changes ||x||_2^2 by −(x_i − x_j)^2/(2n)), and the final inequality is the extremal characterization λ = inf{ E(g, g)/||g||_2^2 : ḡ = 0, g not ≡ 0 }.
The rest is routine. Take expectations:
d/dt E Q(t) ≤ −(λ/2) E Q(t),
and then solve to get E Q(t) ≤ E Q(0) exp(−λt/2), in other words
E ||X(t)||_2^2 ≤ ||x(0)||_2^2 exp(−λt/2).
Finally take the square root, using E ||X(t)||_2 ≤ ( E ||X(t)||_2^2 )^{1/2}.
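Here is a minimal simulation sketch (not from the article) checking the bound of Proposition 2: it estimates E||X(t)||_2 by Monte Carlo on an illustrative 4-cycle with a mean-zero initial configuration and compares it with ||x(0)||_2 e^{−λt/4}, with λ computed from N. Graph, rates, initial condition, and sample sizes are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
nu = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}   # illustrative: 4-cycle
n = 4
edges, rates = list(nu.keys()), np.array(list(nu.values()))

N = np.zeros((n, n))
for (i, j), r in nu.items():
    N[i, j] = N[j, i] = r
np.fill_diagonal(N, -N.sum(axis=1))
lam = np.sort(np.linalg.eigvalsh(-N))[1]                    # spectral gap

def run(x, t_max):
    t, total = 0.0, rates.sum()
    while True:
        t += rng.exponential(1.0 / total)
        if t > t_max:
            return x
        i, j = edges[rng.choice(len(edges), p=rates / total)]
        x[i] = x[j] = 0.5 * (x[i] + x[j])

x0 = np.array([1.0, -1.0, 1.0, -1.0])                       # average zero, as Proposition 2 requires
l2 = lambda x: np.sqrt(np.mean(x ** 2))
for t in [0.5, 1.0, 2.0, 4.0]:
    est = np.mean([l2(run(x0.copy(), t)) for _ in range(2000)])
    bound = l2(x0) * np.exp(-lam * t / 4)
    print(f"t={t}: E||X(t)||_2 ~ {est:.4f} <= bound {bound:.4f}")
```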

A local smoothness property
Thinking heuristically of the agents who agent i most frequently meets as the "local" agents for i, it is natural to guess that the configuration of the averaging process might become "locally smooth" faster than the "global smoothness" rate implied by Proposition 2. In this context we may regard the Dirichlet form E(f, f ) as measuring the "local smoothness", more accurately the local roughness, of a function f , relative to the local structure of the particular meeting process. The next result implicitly bounds EE(X(t), X(t)) at finite times by giving an explicit bound for the integral over 0 ≤ t < ∞. Note that, from the fact that the spectral gap is strictly positive, we can see directly that EE(X(t), X(t)) → 0 exponentially fast as t → ∞; Proposition 4 is a complementary result.

Proposition 4.
For the averaging process with arbitrary initial configuration x(0),
∫_0^∞ E E(X(t), X(t)) dt ≤ 2 var x(0).
This looks slightly magical because the bound does not depend on the particular rate matrix N, but of course the definition of E involves N.
Proof. By linearity we may assume x̄(0) = 0, so that var x(0) = ||x(0)||_2^2. As in the proof of Proposition 2, the dynamics of the averaging process give
d/dt E ||X(t)||_2^2 = −(1/2) E E(X(t), X(t)).
Integrating over 0 ≤ t ≤ T and using E ||X(T)||_2^2 ≥ 0 gives (1/2) ∫_0^T E E(X(t), X(t)) dt ≤ ||x(0)||_2^2; now let T → ∞.
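The integral in Proposition 4 can be estimated directly by simulation, since E(X(t), X(t)) is piecewise constant between meetings. The sketch below (illustrative path graph, initial configuration, horizon T and sample size; not from the article) accumulates the Dirichlet form along each path and compares the average with 2 var x(0).

```python
import numpy as np

rng = np.random.default_rng(3)
nu = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0}   # illustrative path graph
n = 5
edges, rates = list(nu.keys()), np.array(list(nu.values()))
total = rates.sum()

def dirichlet(x):
    return sum(r * (x[j] - x[i]) ** 2 for (i, j), r in nu.items()) / n

def integrated_dirichlet(x, t_max):
    """Integrate E(X(t), X(t)) dt along one path; X is constant between meetings."""
    x, t, acc = x.copy(), 0.0, 0.0
    while True:
        dt = rng.exponential(1.0 / total)
        if t + dt > t_max:
            return acc + (t_max - t) * dirichlet(x)
        acc += dt * dirichlet(x)
        t += dt
        i, j = edges[rng.choice(len(edges), p=rates / total)]
        x[i] = x[j] = 0.5 * (x[i] + x[j])

x0 = np.array([2.0, 0.0, 0.0, 0.0, -2.0])
T = 30.0                                # large enough to stand in for t -> infinity here
est = np.mean([integrated_dirichlet(x0, T) for _ in range(2000)])
print("integral estimate:", est, "<= 2 var x(0) =", 2 * np.mean((x0 - x0.mean()) ** 2))
```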

Duality with coupled Markov chains
Comment for students. Notions of duality are one of the interesting and useful tools in classical IPS, and equally so in the social dynamics models we are studying. The duality between the voter model and coalescing chains is the simplest and most striking example. The relationship we develop here for the averaging model is less simple but perhaps more representative of the general style of duality relationships. The technique we use is to extend the "random penny" (augmented process) argument used in Lemma 1. Now there are two pennies, and at any meeting there are independent decisions to hold or pass each penny. The positions (Z_1(t), Z_2(t)) of the two pennies behave as the following MC on the product space I × I: at each meeting of a pair {i, j}, each penny currently held by i or j is, independently, equally likely to stay where it is or to move to the other agent of the pair; pennies held elsewhere are unaffected. Marginally each penny moves as the associated chain at half speed, but jointly the two pennies are dependent, since they react to the same meetings.
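To make the two-penny dynamics concrete, here is a minimal simulation sketch (not from the article); the triangle graph, rates, starting positions, and the helper name two_pennies are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
nu = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0}       # illustrative triangle
edges, rates = list(nu.keys()), np.array(list(nu.values()))
total = rates.sum()

def two_pennies(z1, z2, t_max):
    """Positions of two pennies: at each meeting of {i, j}, each penny currently
    at i or j independently stays or moves to the other endpoint with prob. 1/2."""
    t = 0.0
    while True:
        t += rng.exponential(1.0 / total)
        if t > t_max:
            return z1, z2
        i, j = edges[rng.choice(len(edges), p=rates / total)]
        if z1 in (i, j):
            z1 = i if rng.random() < 0.5 else j
        if z2 in (i, j):
            z2 = i if rng.random() < 0.5 else j

# marginally each penny moves as the associated chain at half speed;
# jointly they are dependent because they react to the same meetings
print([two_pennies(0, 1, 2.0) for _ in range(5)])
```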

Foundational issues
There are several "foundational" issues, or details of rigor, we have skipped over, and to students who are sufficiently detail-oriented to notice, we suggest they investigate the issues themselves.
(i) Give a formal construction of the averaging process consistent with the process dynamics described in section 2.2.
(ii) Relate this notion of duality to the abstract setup in Liggett [9].
(iii) Consider the variant of the averaging process with added noise, where the "noise" processes W_i(t) are defined as follows. First take n independent standard Normals conditioned on their sum equalling zero; call them (W_i(1), 1 ≤ i ≤ n). Now take W(t) to be the n-dimensional Brownian motion associated with the time-1 distribution W(1) = (W_i(1), 1 ≤ i ≤ n). By modifying the proof of Proposition 2, show this process has a limit distribution X(∞) such that
E ||X(∞)||_2^2 ≤ 2σ^2 (n − 1)/(λn).
(iv) Give an example to show that E E(X(t), X(t)) is not necessarily a decreasing function of t. Answers to (iii, iv) are outlined in [2].

Open problems
(i) Can you obtain any improvement on Proposition 4? In particular, assuming a lower bound on the meeting rates, is it true that
E E(X(t), X(t)) ≤ φ(t) var x(0)
for some universal function φ(t) ↓ 0 as t ↑ ∞?
(ii) One can define the averaging process on the integers (that is, ν_{i,i+1} = 1, −∞ < i < ∞) started from the configuration with unit total mass, all at the origin. By Lemma 1 we have
E X_j(t) = p_j(t),
where the right side is the time-t distribution of a continuous-time simple symmetric random walk (the associated chain slowed down by factor 1/2), which of course we understand very well. What can you say about the second-order behavior of this averaging process? That is, how does var(X_j(t)) behave, and what is the distributional limit of (X_j(t) − p_j(t))/√var(X_j(t))? Note that duality gives an expression for the variance in terms of the coupled random walks, but the issue is how to analyze its asymptotics explicitly. (An exploratory simulation sketch for this problem appears after this list.)
(iii) The discrete cube {0, 1}^d graph is a standard test bench for Markov chain related problems, and in particular its log-Sobolev constant is known [5]. Can you get stronger results for the averaging process on this cube than are implied by our general results?
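Here is the exploratory sketch promised under problem (ii) (not from the article): it truncates the integer line to a finite window (an approximation), starts with unit mass at the origin, and estimates E X_0(t) and var(X_0(t)) by simulation. Window size, time t, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
L = 20                                  # truncate Z to {-L, ..., L}; an approximation to the integer line
sites = np.arange(-L, L + 1)
m = len(sites)
origin = L                              # index of site 0

def run(t_max):
    x = np.zeros(m)
    x[origin] = 1.0                     # unit mass at the origin
    total = m - 1                       # edges {i, i+1}, each with rate 1
    t = 0.0
    while True:
        t += rng.exponential(1.0 / total)
        if t > t_max:
            return x
        k = rng.integers(m - 1)         # a uniformly chosen edge {k, k+1} rings
        x[k] = x[k + 1] = 0.5 * (x[k] + x[k + 1])

t = 4.0
samples = np.array([run(t) for _ in range(4000)])
j = origin                              # look at the site at the origin
print("E X_0(t)   ~", samples[:, j].mean())   # matches p_0(t), the half-speed walk, by Lemma 1
print("var X_0(t) ~", samples[:, j].var())    # the second-order behavior asked about in problem (ii)
```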

Quantifying convergence via entropy
Parallel to Lemma 3 are quantifications of reversible Markov chain convergence in terms of the log-Sobolev constant of the chain, defined (cf. (2.4)) as
α := inf{ E(f, f)/L(f) : L(f) ≠ 0 },
where
L(f) := n^{−1} Σ_i f_i^2 log( f_i^2 / ||f||_2^2 ).
See Montenegro and Tetali [10] for an overview, and Diaconis and Saloff-Coste [5] for more details of the theory, which we do not need here. One problem posed in the Spring 2011 course was to seek a parallel of Proposition 2 in which one quantifies closeness of X(t) to uniformity via entropy, anticipating a bound in terms of the log-Sobolev constant of the associated Markov chain in place of the spectral gap. Here is one solution to that problem.
For a configuration x which is a probability distribution write
Ent(x) := −Σ_i x_i log x_i
for the entropy of the configuration (note [5] writes "entropy" to mean relative entropy w.r.t. the stationary measure). Consider the averaging process where the initial configuration is a probability distribution. By concavity of the function −x log x it is clear that in the averaging process Ent(X(t)) can only increase, and hence Ent(X(t)) ↑ log n a.s. (recall log n is the entropy of the uniform distribution). So we want to bound E(log n − Ent(X(t))). For this purpose note that, for a configuration x which is a probability distribution,
log n − Ent(x) = Σ_i x_i log(n x_i) = L(f)   for   f := ( √(n x_i), i ∈ I ).
Proposition 7. For the averaging process whose initial configuration is a probability distribution x(0),
E( log n − Ent(X(t)) ) ≤ ( log n − Ent(x(0)) ) exp(−αt/2),
where α is the log-Sobolev constant of the associated Markov chain.
Proof. By the dynamics of the averaging process, an elementary inequality for the entropy gained at a single meeting, and the log-Sobolev inequality E(f, f) ≥ α L(f) applied to f := ( √(n X_i(t)), i ∈ I ), the quantity D(t) := log n − Ent(X(t)) satisfies
E( dD(t) | F(t) ) ≤ −(α/2) D(t) dt.
The rest of the proof exactly follows Proposition 2.
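As a small numerical companion (not from the article, and not a check of Proposition 7 itself, since estimating the log-Sobolev constant is beyond this sketch): along a single simulated path, Ent(X(t)) should be nondecreasing and approach log n, as noted in the discussion above. The 4-cycle, initial distribution, and checkpoints are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
nu = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}   # illustrative 4-cycle
n = 4
edges, rates = list(nu.keys()), np.array(list(nu.values()))
total = rates.sum()

def ent(x):
    # entropy of a configuration that is a probability distribution, with 0 log 0 = 0
    p = x[x > 0]
    return -np.sum(p * np.log(p))

x = np.array([0.7, 0.2, 0.1, 0.0])      # initial configuration: a probability distribution
t, checkpoints, history = 0.0, [0.5, 1.0, 2.0, 4.0], []
for t_max in checkpoints:
    while True:
        dt = rng.exponential(1.0 / total)
        if t + dt > t_max:               # memorylessness lets us restart the clock at t_max
            t = t_max
            break
        t += dt
        i, j = edges[rng.choice(len(edges), p=rates / total)]
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    history.append(ent(x))

print("Ent(X(t)) at t =", checkpoints, ":", np.round(history, 4))
print("log n =", np.log(n))              # Ent(X(t)) increases towards log n
```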