Convergence of the Population Dynamics algorithm in the Wasserstein metric

We study the convergence of the population dynamics algorithm, which produces sample pools of random variables having a distribution that closely approximates that of the {\em special endogenous solution} to a stochastic fixed-point equation of the form: $$R\stackrel{\mathcal D}{=} \Phi( Q, N, \{ C_i \}, \{R_i\}),$$ where $(Q, N, \{C_i\})$ is a real-valued random vector with $N \in \mathbb{N}$, and $\{R_i\}_{i \in \mathbb{N}}$ is a sequence of i.i.d. copies of $R$, independent of $(Q, N, \{C_i\})$; the symbol $\stackrel{\mathcal{D}}{=}$ denotes equality in distribution. Specifically, we show its convergence in the Wasserstein metric of order $p$ ($p \geq 1$) and prove the consistency of estimators based on the sample pool produced by the algorithm.


Introduction
We study an iterative bootstrap algorithm, known as the "population dynamics" algorithm, that can be used to efficiently generate samples of random variables whose distribution closely approximates that of the so-called special endogenous solution to a stochastic fixed-point equation (1.1). These equations appear in a variety of problems, ranging from computer science to statistical physics, e.g.: in the analysis of divide-and-conquer algorithms such as Quicksort [27,14,28] and FIND [13], the analysis of Google's PageRank algorithm [30,18,9,23], the study of queueing networks with synchronization requirements [22,26], and the analysis of the Ising model [12], to name a few. In general, SFPEs of the form in (1.1) can have multiple solutions, but in most cases we are interested in computing those that can be explicitly constructed on a weighted branching process, known as endogenous solutions. In some cases even the endogenous solution is not unique [5], but all endogenous solutions can be characterized in terms of the special endogenous solution, which is the only attracting one, and which can be constructed by iterating (1.1) starting from some well-behaved initial distribution.
This work focuses on the analysis of a simulation algorithm that can be used to generate samples from a distribution that closely approximates that of the special endogenous solution to a variety of SFPEs. The need for such an approximate algorithm lies in the numerical complexity of simulating even a few generations of a weighted branching process using naive Monte Carlo methods. The population dynamics algorithm, described in §14.6.4 in [25] and §8.1 in [1], circumvents this problem by resampling with replacement from previously computed iterations of (1.1), i.e., by using an iterative bootstrap technique. However, as is the case with the standard bootstrap algorithm, the samples obtained are neither independent nor exactly distributed according to the target distribution, which raises the need to study the convergence properties of the algorithm.
Before presenting the algorithm and stating our main results, it may be helpful to describe in more detail some of the examples mentioned above. Throughout the paper, we use x ∨ y = max{x, y} and x ∧ y = min{x, y} to denote the maximum and the minimum, respectively, of x and y.
• The linear SFPE or "smoothing transform": appears in the analysis of the number of comparisons required by the sorting algorithm Quicksort [27,14,28], and can also be used to describe the distribution of the ranks computed by Google's PageRank algorithm on directed complex networks [30,18,9,23].
• The maximum SFPE or "high-order Lindley equation": arises as the limiting waiting time distribution on queueing networks with parallel servers and synchronization requirements [22,26] and in the analysis of the branching random walk [1].
• The discounted tree-sum SFPE: appears in the worst-case analysis of the FIND algorithm [13] and the analysis of the "discounted branching random walk" [6].
• Although the analysis presented here does not directly apply to this case, we mention that the population dynamics algorithm can also be used to simulate the fixed points of systems of belief propagation equations on random graphical models [25], where the {R̃_i} are i.i.d. copies of R̃ independent of the vector (Q, N, {C_i}), and the {R_i} are i.i.d. copies of R independent of the vector (Q̃, Ñ, {C̃_i}), with the maps Φ and Ψ potentially different.
We refer the reader to [1] for even more examples, including some involving minimums.
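To make the maps concrete, the sketch below implements two of the maps Φ in Python. The precise forms of (1.2) and (1.3) are not reproduced in this excerpt, so the code assumes the standard smoothing-transform and high-order-Lindley shapes, Φ = Q + Σ_i C_i R_i and Φ = Q ∨ (⋁_i C_i R_i); all function names are illustrative.

```python
def phi_linear(q, n, c, x):
    """Linear SFPE / smoothing transform: Phi = q + sum_{i<n} c[i] * x[i]."""
    return q + sum(c[i] * x[i] for i in range(n))

def phi_max(q, n, c, x):
    """Maximum SFPE / high-order Lindley (assumed form): Phi = q v max_i c[i] * x[i]."""
    best = q
    for i in range(n):
        best = max(best, c[i] * x[i])
    return best
```

Both maps take the components of the generic branching vector plus a list `x` of (re)sampled copies of R, mirroring the notation Φ(Q, N, {C_i}, {R_i}).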
The existence and uniqueness of solutions to any of these SFPEs is in itself a non-trivial problem. We refer the reader again to [1] for a broad survey of known results and open problems on this topic. The best-understood equations are the linear (1.2) and maximum (1.3) SFPEs, which have been studied in [24,16,2,4,5,3,17] and [7,21], respectively. However, to provide some context for where the population dynamics algorithm fits in, we briefly mention that the existence of solutions is often established by showing that the transformation T, which maps a distribution µ on R to the distribution of Φ(Q, N, {C_i}, {X_i}), where the {X_i} are i.i.d. random variables distributed according to µ, independent of the vector (Q, N, {C_i}), is strictly contracting under some suitable metric. In this case, the sequence of probability measures µ_{n+1} = T(µ_n) converges as n → ∞ to a fixed point of (1.1). Moreover, as long as the initial distribution µ_0 has sufficiently light tails, one can show that {µ_n} converges to the special endogenous solution to (1.1), and the contracting nature of T provides an upper bound of the form d(µ_n, µ) ≤ d(T(µ_{n−1}), T(µ)) ≤ c d(µ_{n−1}, µ) ≤ c^n d(µ_0, µ), n = 1, 2, . . . , for some constant 0 < c < 1, where d is the distance under which T is a contraction.
As will be discussed in more detail later (see Examples 2.7), all the examples provided earlier define contractions under d_p, the Wasserstein metric of order p, for some p ≥ 1. For completeness, we also include a result (Theorem 2.5) that gives easy-to-verify conditions guaranteeing that d_p(µ_k, µ) ≤ c^k d_p(µ_0, µ) for some 0 < c < 1, where R^{(k)} and R have distributions µ_k and µ, respectively.
It follows that from a computational point of view, it suffices to have an algorithm for computing µ k for a fixed number of iterations k ∈ N. The population dynamics algorithm produces a sample of observations approximately distributed according to µ k , which can also be helpful in searching for the existence of endogenous solutions, as stated in [1]. We now describe how to obtain an exact sample of µ k , which will also make clear the need for a computationally efficient method.

Constructing endogenous solutions on weighted branching processes
As mentioned earlier, the attracting endogenous solution to (1.1), provided it exists, can be constructed on a structure known as a weighted branching process [27]. We now elaborate on this point. To ease the exposition, we will use (i, j) = (i_1, . . . , i_n, j) to denote the index concatenation operation, and U to denote the set of all finite node labels. Next, let (Q, N, {C_i}_{i≥1}) be a real-valued vector with N ∈ N. We will refer to this vector as the generic branching vector. Now let {(Q_i, N_i, {C_{(i,j)}}_{j≥1})}_{i∈U} be a sequence of i.i.d. copies of the generic branching vector. To construct a weighted branching process we start by defining a tree as follows: let A_0 = {∅} denote the root of the tree, and define the nth generation according to the recursion A_n = {(i, j) : i ∈ A_{n−1}, 1 ≤ j ≤ N_i}. Now, assign to each node i in the tree a weight Π_i according to the recursion Π_∅ = 1, Π_{(i,j)} = C_{(i,j)} Π_i; see Figure 1. If P(N < ∞) = 1 and C_i ≡ 1 for all i ≥ 1, the weighted branching process reduces to a Galton-Watson process.
To generate a sample from µ_k we first need to fix the initial distribution µ_0, e.g., by letting µ_0 be the probability measure of a constant, say zero or one. Now construct a weighted branching process with k generations, and let {R^{(0)}_i}_{i∈A_k} be i.i.d. random variables having distribution µ_0. Next, define recursively, for each i ∈ A_{k−r}, 1 ≤ r ≤ k, R^{(r)}_i = Φ(Q_i, N_i, {C_{(i,j)}}_{j≥1}, {R^{(r−1)}_{(i,j)}}_{j≥1}). The random variable R^{(k)}_∅ is distributed according to µ_k, and its generation requires on average (E[N])^k i.i.d. copies of the generic branching vector (Q, N, {C_i}_{i≥1}). It follows that if the goal were to obtain an i.i.d. sample of size m from distribution µ_k, one would need to generate on average m(E[N])^k copies of the generic branching vector. However, in applications one typically has E[N] > 1, e.g., N ≡ 2 for Quicksort, E[N] ≈ 30 in the analysis of PageRank on the WWW graph, and E[N] can be in the hundreds for MapReduce implementations related to the maximum SFPE. This makes the exact simulation of R^{(k)} using a weighted branching process impractical.
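As a sketch of the exact construction described above, the recursion can be implemented directly; one independent subtree of depth k − 1 is generated per child of the root, so the expected cost per sample is (E[N])^k. The generic branching vector used here (N ≡ 2, Q ≡ 1, C_i uniform on (0, 1/2)) and the linear map are purely illustrative, not taken from the paper.

```python
import random

def generic_vector():
    # Illustrative generic branching vector (Q, N, {C_i}); not from the paper.
    n = 2
    return 1.0, n, [random.uniform(0.0, 0.5) for _ in range(n)]

def phi_linear(q, n, c, x):
    # Linear SFPE map: Phi = q + sum_i c_i * x_i.
    return q + sum(ci * xi for ci, xi in zip(c, x))

def sample_exact(k):
    """Exact sample of R^(k) on a weighted branching process.

    mu_0 is taken to be the point mass at zero; on average (E[N])^k
    copies of the generic branching vector are generated per call.
    """
    if k == 0:
        return 0.0
    q, n, c = generic_vector()
    # One independent depth-(k-1) subtree per child of the root.
    return phi_linear(q, n, c, [sample_exact(k - 1) for _ in range(n)])
```

With E[N] = 2 here, the cost already doubles with every additional level, which illustrates why the exact construction is impractical for moderate k.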
The population dynamics algorithm, described below, uses a bootstrap approach to produce a sample of size m of random variables that are approximately distributed according to µ_k, and that, although not independent, can be used to obtain consistent estimators for moments, quantiles and other functions of µ_k.

The population dynamics algorithm
The population dynamics algorithm is based on the bootstrap, i.e., on the idea of sampling random variables with replacement from a common pool. As described above, the algorithm starts by generating a sample of i.i.d. random variables having distribution µ_0, with the difference that when computing the next level of the recursion, it samples with replacement from this pool as needed by the map Φ. In other words, to obtain a pool of approximate copies of R^{(j)} we bootstrap from the previously obtained pool of approximate copies of R^{(j−1)}. The approximation lies in the fact that we are not sampling from R^{(j−1)} itself, but from a finite sample of conditionally independent observations that are only approximately distributed as R^{(j−1)}. The algorithm is described below.
Let (Q, N, {C_r}) denote the generic branching vector defining the weighted branching process. Let k be the depth of the recursion that we want to simulate, i.e., the algorithm will produce a sample of random variables approximately distributed according to µ_k. Choose m ∈ N₊ to be the bootstrap sample size. For each 0 ≤ j ≤ k, the algorithm outputs P^{(j,m)} = (R̂^{(j,m)}_1, . . . , R̂^{(j,m)}_m), which we refer to as the sample pool at level j.
a.) Initialize: generate P^{(0,m)} as a sample of m i.i.d. random variables with distribution µ_0, and set j = 1.
b.) While j ≤ k: for each 1 ≤ i ≤ m, sample uniformly with replacement from the pool P^{(j−1,m)} as many values as needed by the map Φ, and set R̂^{(j,m)}_i = Φ(Q_i, N_i, {C^{(i)}_r}, {R̂^{(j−1,m)}_{(i,r)}}), where the {(Q_i, N_i, {C^{(i)}_r})} are i.i.d. copies of the generic branching vector, independent of everything else. Then set j = j + 1.
We conclude this section by pointing out that the complexity of the algorithm described above is of order km, while the naive Monte Carlo approach described earlier, which consists of sampling m i.i.d. copies of a weighted branching process up to the kth generation, has order (E[N])^k m.
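The two steps above can be sketched as follows. The branching vector and the linear map are the same illustrative choices as before (not from the paper); only the resampling-with-replacement structure is the point here.

```python
import random

def phi_linear(q, n, c, x):
    # Linear SFPE map: Phi = q + sum_i c_i * x_i.
    return q + sum(ci * xi for ci, xi in zip(c, x))

def generic_vector():
    # Illustrative generic branching vector (Q, N, {C_i}); not from the paper.
    n = 2
    return 1.0, n, [random.uniform(0.0, 0.5) for _ in range(n)]

def population_dynamics(k, m, phi=phi_linear, gen=generic_vector, mu0=lambda: 0.0):
    """Return the level-k sample pool P^(k,m).

    Total work is O(k * m) generic branching vectors, versus order
    (E[N])^k * m for the naive weighted-branching simulation.
    """
    pool = [mu0() for _ in range(m)]        # step a.): P^(0,m) i.i.d. from mu_0
    for _ in range(k):                      # step b.): levels j = 1, ..., k
        new_pool = []
        for _ in range(m):
            q, n, c = gen()                 # fresh copy of the generic vector
            x = random.choices(pool, k=n)   # bootstrap with replacement
            new_pool.append(phi(q, n, c, x))
        pool = new_pool
    return pool
```

Note that each level reuses the single pool of size m, which is exactly what breaks the exponential growth of the naive construction at the price of introducing dependence within the pool.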
Our main results establish the convergence of the algorithm in the Wasserstein metric of order p (p ≥ 1), as well as the consistency of estimators constructed using the pool P (k,m) . The following section contains all the statements, and the proofs are given in Section 3.

Main results
We start by defining the Wasserstein metric of order p.
Definition 2.1 Let M(µ, ν) denote the set of joint probability measures on R × R with marginals µ and ν. Then, the Wasserstein metric of order p (1 ≤ p < ∞) between µ and ν is given by $$d_p(\mu, \nu) = \inf_{\pi \in M(\mu,\nu)} \left( \int_{\mathbb{R}\times\mathbb{R}} |x-y|^p \, d\pi(x,y) \right)^{1/p}.$$ An important advantage of working with the Wasserstein metrics is that on the real line they admit the explicit representation $$d_p(\mu, \nu) = \left( \int_0^1 |F^{-1}(u) - G^{-1}(u)|^p \, du \right)^{1/p},$$ where F and G are the cumulative distribution functions of µ and ν, respectively, and f^{−1}(t) = inf{x ∈ R : f(x) ≥ t} denotes the generalized inverse of f. It follows that an optimal coupling of two real random variables X and Y with distribution functions F and G is given by (X, Y) = (F^{−1}(U), G^{−1}(U)), where U is a Uniform(0,1) random variable. With some abuse of notation, we use d_p(F, G) to denote the Wasserstein distance of order p between the probability measures µ and ν, where F(x) = µ((−∞, x]) and G(x) = ν((−∞, x]) are their corresponding cumulative distribution functions.
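For two empirical distributions with the same number of atoms, the quantile representation above reduces d_p to pairing order statistics, which the following minimal sketch implements:

```python
def wasserstein_p(xs, ys, p=1.0):
    """d_p between the empirical distributions of two equal-size samples.

    On the real line the optimal coupling pairs order statistics:
    d_p^p = (1/m) * sum_i |x_(i) - y_(i)|^p.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)
```

This is the quantity one would monitor in practice to compare the pool P^{(k,m)} against an exact i.i.d. sample from µ_k.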
Our main results establish the convergence of d_p(F̂_{k,m}, F_k) as m → ∞, both in mean and almost surely, where F_k is the distribution function of µ_k, F̂_{k,m}(x) = m^{−1} Σ_{i=1}^m 1(R̂^{(k,m)}_i ≤ x) is the empirical distribution function of the pool, and P^{(k,m)} = (R̂^{(k,m)}_1, . . . , R̂^{(k,m)}_m) is the pool generated by the population dynamics algorithm. The theorems are proven under two different assumptions, the first one imposing a Lipschitz condition on the mean of Φ, and the second one requiring Φ to be Lipschitz continuous almost surely.
For the linear SFPE (1.2), it suffices that the inequality in Assumption 2.2 holds for {X_i} and {Y_i} having the same mean.

Assumption 2.3
Suppose that for any vector (q, n, {c_r}), with n ∈ N ∪ {∞}, and any sequences of real numbers {x_r} and {y_r} for which Φ(q, n, {c_r}, {x_r}) and Φ(q, n, {c_r}, {y_r}) are well defined, there exists a function φ : R → R₊ such that $$|\Phi(q, n, \{c_r\}, \{x_r\}) - \Phi(q, n, \{c_r\}, \{y_r\})| \le \sum_{r=1}^{n} \varphi(c_r) |x_r - y_r|.$$

Remarks 2.4 (i) To see that Assumption 2.3 implies Assumption 2.2, note that Lemma 4.1 in [20] gives the corresponding moment bound, and therefore Assumption 2.2 holds with H_p = E[(Σ_{r=1}^N φ(C_r))^p], provided the expectation is finite. However, much tighter bounds can be obtained for specific examples, and we can usually find p ≥ 1 such that H_p < 1.
(ii) The existence of a p ≥ 1 for which H_p < 1 is important for obtaining estimates for the rate of convergence of the algorithm that are uniform in k, and it also has important implications for the convergence of R^{(k)} → R as k → ∞, as the next result shows.
Theorem 2.5 gives conditions under which d_p(µ_k, µ) ≤ c_p^k d_p(µ_0, µ) for some 0 < c_p < 1, (2.2) where R^{(k)} and R are distributed according to µ_k and µ, respectively. For the linear SFPE (1.2), we have that (2.2) also holds under either of two additional sets of conditions, (i) and (ii). As the proof of Theorem 2.5 shows, one can take c_p = H_p under the main set of conditions as well as under conditions (i), whereas for (ii) we have c_p = ρ_1 ∨ ρ_p. As a consequence of the proof of Theorem 2.5 we also obtain an explicit bound (Lemma 2.6) for the moments of R^{(k)}.

Examples 2.7
• The linear SFPE (1.2), which appears in the analysis of Quicksort [27,14,28] and of the PageRank algorithm studied in [30,18,9], satisfies Assumption 2.3 with φ(t) = |t|.
• Using the inequality |⋁_{i=1}^n x_i − ⋁_{i=1}^n y_i| ≤ ⋁_{i=1}^n |x_i − y_i|, valid for any real numbers {x_i, y_i} and any n ≥ 1, we obtain that the maximum SFPE (1.3) satisfies Assumption 2.3 with φ(t) = |t| as well. Furthermore, in the analysis of queueing networks with parallel servers and synchronization requirements from [22,26], where T ≡ 0 (equivalently, Q ≡ 1), the stability condition of the system implies that H_p < 1 for any p ≥ 1 whenever the system is stable. Lemma 2.6 then implies that E[|R^{(k)}|^p] is uniformly bounded in k for all p ≥ 1.
• In the case of the discounted tree sum SFPE (1.4), inequality (2.3) implies that we can also take φ(t) = |t| in Assumption 2.3. For the analysis of the FIND algorithm in [13] in particular, we have N ≡ 2, C_1 = U = 1 − C_2 and Q ≡ 1, with U uniformly distributed on [0, 1], and we can take H_p = 2E[U^p] = 2/(p + 1) < 1 for any p > 1 in Assumption 2.2. Lemma 2.6 then gives that E[|R^{(k)}|^p] is uniformly bounded in k for all p > 1.
• For maps Φ that are smooth in their last arguments, a first-order expansion of the form Φ(x) − Φ(y) = Φ′(ξ)(x − y), for some ξ between x and y, can be used. Assumption 2.2 is then satisfied for the corresponding bound on the derivative.
Our first result establishes the convergence in mean of d_p(F̂_{k,m}, F_k) under the "optimal" moment conditions, that is, assuming only that E[|R^{(j)}|^q] < ∞ for some q > p. Theorem 2.8 Fix 1 ≤ p < ∞, and for each 0 ≤ j ≤ k let {R^{(j)}_1, . . . , R^{(j)}_m} be an i.i.d. sample from distribution F_j, and let F_{j,m} denote their corresponding empirical distribution function. Then, E[d_p(F̂_{k,m}, F_k)^p] admits an explicit upper bound in which the leading constant only depends on p and q.
Remarks 2.9 (i) Note that Assumption 2.2 does not require that H_p < 1, i.e., it is not necessary for Φ to define a contraction for the algorithm to work. However, when H_p < 1 the bound provided by Theorem 2.8 becomes independent of k, ensuring that the complexity of the population dynamics algorithm remains linear in k, rather than exponential, i.e., of order (E[N])^k, as it is for the naive algorithm. When H_p ≥ 1 for all p ≥ 1, the bound given above may grow with the level of the recursion, i.e., with the value of k, and the convergence of the sequence {µ_k} as k → ∞ may not be guaranteed.
(ii) Even in the case when H p ≥ 1 for all p ≥ 1, the explicit bounds provided by Theorem 2.8 may be useful for determining whether endogenous solutions exist, since they guarantee that we can accurately approximate R (k) .
(iii) We also point out that the first inequality in Theorem 2.8 implies that d_p(F̂_{k,m}, F_k) converges to zero at the same rate as the Wasserstein distance between F_k and the empirical distribution of an exact i.i.d. sample from it; since the latter corresponds to implementing the population dynamics algorithm by sampling without replacement from a "perfect" i.i.d. pool of observations from µ_{j−1}, this convergence rate is in some sense optimal.
(iv) For all the examples given in Examples 2.7, we have H_p < 1 and sup_{k≥0} E[|R^{(k)}|^q] < ∞ for some q > p, making the bound provided by Theorem 2.8 independent of k. Moreover, for the Quicksort and FIND algorithms, as well as for the queueing networks with parallel servers and synchronization requirements, the best possible rate of convergence is achieved.

We now turn our attention to the almost sure convergence of d_p(F̂_{k,m}, F_k), for which we provide two different results. The first one (Theorem 2.10) holds under Assumption 2.2 as above, but under rather strong moment conditions; note that for the linear case, Assumption 2.2 in its general form holds with H_p = E[(Σ_{r=1}^N |C_r|)^p]. The moment condition requiring the finiteness of the 2p absolute moment also appears in some related (stronger) results for the convergence of the Wasserstein distance between a distribution function and its empirical measure, specifically, concentration inequalities [15] and a central limit theorem [11]. In our case, where we seek only to establish the almost sure convergence of the algorithm, this condition is too strong, so we provide below an improved result under the finer Assumption 2.3. Theorem 2.11 Fix 1 ≤ p < ∞ and suppose that Φ satisfies Assumption 2.3, along with the corresponding moment conditions.

Our last result relates the convergence of d_p(F̂_{k,m}, F_k) to the consistency of estimators based on the pool P^{(k,m)}. More precisely, the value of the algorithm lies in the fact that it efficiently produces a sample of identically distributed random variables whose distribution is approximately F_k. A natural estimator for quantities of the form E[h(R^{(k)})] is then given by Θ_{k,m} = m^{−1} Σ_{i=1}^m h(R̂^{(k,m)}_i). (2.4) However, the random variables in P^{(k,m)} are not independent of each other, and the consistency of such estimators requires proof. In the sequel, the symbol →_P denotes convergence in probability.

Definition 2.12
We say that Θ_n is a weakly consistent estimator for θ if Θ_n →_P θ as n → ∞. We say that it is a strongly consistent estimator for θ if Θ_n → θ a.s.
Our last result shows the consistency of estimators of the form in (2.4) for a broad class of functions. Proposition 2.13 Fix 1 ≤ p < ∞ and suppose that h : R → R satisfies |h(x)| ≤ C(1 + |x|^p) for all x ∈ R and some constant C > 0. Then, the corresponding estimators are weakly, respectively strongly, consistent under the conditions of the preceding theorems.

We conclude that the population dynamics algorithm can be used to efficiently generate sample pools of random variables having a distribution that closely approximates that of the special endogenous solution to SFPEs of the form in (1.1). Furthermore, these sample pools can be used to produce consistent estimators for a broad class of functions. The gain in efficiency of the algorithm compared to a naive Monte Carlo approach, combined with the consistency guarantees proved in this paper, makes it extremely useful for the numerical analysis of many problems where SFPEs appear.
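As an illustration of estimators of the form in (2.4), the sketch below computes the plug-in sample mean of h over a pool, together with an empirical quantile via the generalized inverse F̂^{-1}(t) = inf{x : F̂(x) ≥ t}; both function names are illustrative.

```python
import math

def estimate(pool, h):
    """Plug-in estimator (1/m) * sum_i h(R_i) for E[h(R^(k))], as in (2.4)."""
    return sum(h(r) for r in pool) / len(pool)

def quantile(pool, t):
    """Empirical quantile F^{-1}(t) = inf{x : F(x) >= t}, for 0 < t <= 1."""
    s = sorted(pool)
    return s[max(0, math.ceil(t * len(s)) - 1)]
```

Proposition 2.13 is what justifies applying these to the dependent pool P^{(k,m)} produced by the algorithm, e.g. `estimate(pool, lambda x: x ** 2)` for the second moment of µ_k.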

Proofs
This section includes the proofs of all the results stated in Section 2. To start, for any k ∈ N let E_k denote the collection of generic branching vectors and uniform random variables used by the algorithm up to level k, where each (Q_i, N_i, {C_{(i,r)}}_{r≥1}) has the same distribution as the generic branching vector (Q, N, {C_r}_{r≥1}), and the {U_i} are the i.i.d. Uniform(0,1) random variables used for the resampling. For 1 ≤ j ≤ k and each 1 ≤ i ≤ m, construct the pair (R̂^{(j,m)}_i, R^{(j)}_i) from the same inputs. Note that the random variables {R^{(j)}_i : 1 ≤ i ≤ m} are i.i.d. and have distribution F_j, and therefore F_{j,m} is an empirical distribution function for F_j. The distribution functions F̂_{j,m} are those obtained through the population dynamics algorithm.
Throughout the proofs we will also use repeatedly the sigma-algebra F_k = σ(E_k) for k ∈ N. We point out that all the random variables {(R̂^{(j,m)}_i, R^{(j)}_i) : 1 ≤ i ≤ m, 0 ≤ j ≤ k} are constructed on the same probability space. We are now ready to prove Theorem 2.8.

Proof of Theorem 2.8
Let {(R̂^{(j,m)}_i, R^{(j)}_i) : 1 ≤ i ≤ m, 0 ≤ j ≤ k} be a sequence of random vectors constructed as explained above.
Next, note that from the triangle inequality we obtain d_p(F̂_{k,m}, F_k) ≤ d_p(F̂_{k,m}, F_{k,m}) + d_p(F_{k,m}, F_k). Now let χ be a Uniform(0,1) random variable independent of everything else, and define the random variables F̂^{−1}_{j,m}(χ) and F^{−1}_{j,m}(χ), which conditionally on F_j are distributed according to F̂_{j,m} and F_{j,m}, respectively. Then, from the definition of d_p we obtain the corresponding coupling bound. Next, suppose first that Assumption 2.2 holds for any {X_i} and {Y_i}. For the linear case, when only the restricted version of Assumption 2.2 holds, note that the bootstrapped values and their i.i.d. counterparts have the same conditional mean, and therefore the same bound applies. It now follows from (3.2) and Minkowski's inequality that a recursive bound holds, and iterating the recursion above we obtain the first statement. This completes the first part of the proof.
Next, assume that max_{0≤r≤k} E[|R^{(r)}|^q] < ∞ for q > p ≥ 1, q ≠ 2p, and use Theorem 1 in [15] to obtain a bound in which C = C(p, q) is a constant that does not depend on F_r. The second statement of the theorem now follows.
We now turn to the proof of Theorem 2.10. To simplify its exposition we first provide a preliminary result for the mean Wasserstein distance between a distribution and its empirical distribution function.
Proof. Fix ε > 0 and define for x ≥ 0 the functions a and b. Next, use Proposition 7.14 in [8] followed by the monotonicity of the L_p norm to see that the claimed bound holds, where g^{−1}(t) = inf{x ∈ R : g(x) ≥ t} is the generalized inverse of the function g.

Next, to bound (3.4) note that
where in the last equality we used the observations that {x < a^{−1}(m)} = {a(x) < m} and {x < b^{−1}(m)} = {b(x) < m}, respectively.

Hence, (3.4) is bounded from above by a constant times
Since sup_{t≥1} log(t + 1)/log t < ∞, we obtain that (3.5) is finite. Finally, the same steps used to bound (3.5) give that (3.6) is bounded as well.

We now give the proof of the first result on the almost sure convergence of the algorithm. The idea of the proof is to first identify a recursive formula for the Wasserstein distance d_p(F̂_{k,m}, F_k), as was done for the convergence in mean theorem. Once we do this, the main difficulty lies in ensuring that the errors in the bound converge sufficiently fast to satisfy the criterion for almost sure convergence in the Borel-Cantelli lemma. In the case when we have a bit more than 2p finite moments, this can be done using Chebyshev's inequality, similarly to the proof of the strong law of large numbers under finite fourth moment conditions. We start with this case below.
Proof of Theorem 2.10. We will start the proof by deriving an upper bound for d p (F k,m , F k ).
To this end, we construct the random variables according to the construction given at the beginning of the section. Recall that F j = σ(E j ), where E j is given by (3.1), and that Assumption 2.2 holds for both p and 2p.
We start by noting that the triangle inequality followed by (3.3) gives a recursive bound. Next, define for j ≥ 1 the random variables {Z^{(j,m)}_i}. It follows that a bound of the form (3.8) holds, where in the last step we used the inequality (x + y)^β ≤ x^β + y^β for 0 < β ≤ 1 and x, y ≥ 0. Iterating (3.8) k − 1 more times we obtain the desired bound. Now note that by the Glivenko-Cantelli lemma and the strong law of large numbers, as m → ∞, and therefore, by Definition 6.8 and Theorem 6.9 in [29], d_p(F_{j,m}, F_j) → 0 a.s. for each j ≥ 1. It then suffices to show that for each 1 ≤ j ≤ k the sums m^{−1} Σ_{i=1}^m Z^{(j,m)}_i → 0 a.s. as well.
To see this note that for any ε > 0 the corresponding tail probability can be bounded. Moreover, using the same arguments we used in the proof of Theorem 2.8, we obtain the analogous estimate, and by Theorem 2.8 the required moment bound holds. It follows that for any 1 ≤ j ≤ k the bound is summable, and finally, since by Lemma 3.1 the corresponding series converges, the limit is 0 a.s. This completes the proof.
We now move on to the proof of Theorem 2.11, where we only have a bit more than p finite moments. In this case, we cannot use Chebyshev's inequality to verify the condition for the Borel-Cantelli lemma, and a finer analysis of the errors is required. In particular, our proof uses the Lipschitz condition from Assumption 2.3 to derive a large-deviations bound for the sum of independent random variables appearing in the recursive analysis of d_p(F̂_{k,m}, F_k). Before proceeding to the main proof, we give three preliminary results. The first one provides an upper bound for the generalized inverse of any distribution function having finite absolute moments of order q. Lemma 3.2 Let G be a distribution function on R, and let G^{−1} be its generalized inverse. Suppose that G has finite absolute moments of order q > 0. Then, the stated bound holds for any u ∈ (0, 1). Proof. Let X be a random variable having distribution G, and define G₊(x) = P(X⁺ ≤ x) = G(x)1(x ≥ 0) and G₋(x) = P(X⁻ ≤ x) = P(X ≥ −x)1(x ≥ 0). Now use Markov's inequality to obtain that for all x > 0 the corresponding tail bounds hold. The first inequality implies the claimed bound for any u ∈ (0, 1). The next two preliminary results provide key steps for the proof of Theorem 2.11, which essentially consist of giving a large-deviations bound (uniform in m) for the sample mean of (conditionally) i.i.d. random variables. The random variables {Y^{(j,m)}_i} defined below will be used as upper bounds for d_{p+δ_{j+1}}(F̂_{j,m}, F_j) in the proof of Theorem 2.11, and the estimates we need have to be very tight considering that we no longer have finite second moments, so the rate of convergence to their mean can be very slow. The lemma below gives an upper bound for the truncated summands.
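One way to make the Markov step in the proof of Lemma 3.2 explicit is the following sketch, written for the upper tail only (the lower tail, via G₋, is symmetric), under the lemma's assumption that E[|X|^q] < ∞:

```latex
% Markov's inequality applied to |X|^q gives, for all x > 0,
P(X^+ > x) \le P(|X| > x) \le \frac{E[|X|^q]}{x^q},
\qquad\text{so}\qquad
G_+(x) \ge 1 - \frac{E[|X|^q]}{x^q}.
% Hence, whenever x \ge \bigl(E[|X|^q]/(1-u)\bigr)^{1/q} we have G_+(x) \ge u,
% and the definition of the generalized inverse yields
G_+^{-1}(u) \le \left( \frac{E[|X|^q]}{1-u} \right)^{1/q},
\qquad u \in (0,1).
```

Combining this with the analogous bound on G₋ gives an upper bound on |G^{−1}(u)| that blows up only as u approaches 0 or 1, which is what makes the truncation argument in the subsequent lemmas work.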

for i = 1, . . . , m. Then, on the event {sup_{m≥n} d_{p+δ_j}(F_{j,m}, F_j)^{p+δ_j} ≤ η}, the stated bound holds. Proof. We start by noting the decomposition (3.15). To bound each of the probabilities in (3.15), use Chernoff's bound. Note that by Remark 2.4(i), the corresponding estimate holds on the event {sup_{m≥n} d_{p+δ_j}(F_{j,m}, F_j)^{p+δ_j} ≤ η}. Next, use the inequality e^x ≤ 1 + xe^x for x ≥ 0 to bound E[e^{θY^{(j,m)}_i}]. It follows that by choosing θ = (2/ε) log m/m we obtain the stated bound. Let the {Y^{(j,m)}_i} be defined according to (3.9). Then, for any q_{j+1} < r_j < q_j and all t ≥ n, the next bound holds. Proof. To simplify the notation, let (Q, N, {C_r}_{r≥1}) denote (Q_1, N_1, {C_{(1,r)}}_{r≥1}).
Next, note that , where E j is given by (3.1), and note that Moreover, if we let q j = p + δ j and use Lemma 3.2, we obtain that, conditionally on F j , Furthermore, by Minkowski's inequality, we have that on the event It follows that conditionally on where K j η 1/q j + ||R (j) || q j < ∞ by Remark 2.4(ii).
Thus, we have that on the event {sup_{m≥n} d_{q_j}(F_{j,m}, F_j)^{q_j} ≤ η}, the union bound and Markov's inequality yield the desired estimate, where by assumption q_{j+1} < r_j < q_j, and we have used the observation that U_i is equal in distribution to 1 − U_i. Finally, note that by Remark 2.4(i) the remaining bound holds, and we conclude the proof. We are now ready to prove Theorem 2.11, whose proof shows by induction that d_{p+δ}(F̂_{k,m}, F_k) → 0 a.s. as m → ∞.
To prove that d_{p+δ_{j+1}}(F̂_{j+1,m}, F_{j+1}) → 0 a.s. as m → ∞, we start by constructing the random variables {(R̂^{(j,m)}_i, R^{(j)}_i)} as explained at the beginning of this section. Now note that for any ε, η > 0, the probability of interest splits into the terms (3.12)-(3.14). To analyze (3.13), note that its convergence to zero as n → ∞ is equivalent to the a.s. convergence of d_{p+δ_j}(F̂_{j,m}, F_j) to zero as m → ∞, which corresponds to the induction hypothesis (3.11).
Next, to prove that (3.12) converges to zero, we first define the random variables {Y^{(j,m)}_i : 1 ≤ i ≤ m} according to (3.9), along with the corresponding events. Now use (3.3) and Assumption 2.3 to bound P(sup_{m≥n} d_{p+δ_{j+1}}(F̂_{j+1,m}, F_{j+1,m})^{p+δ_{j+1}} > ε, . . .). Now set q_j = p + δ_j and r_j = q_{j+1} + δ/(2k), and note that q_{j+1} < r_j < q_j ≤ p + δ.

Proof of Proposition 2.13. Let Θ_{k,m} = m^{−1} Σ_{i=1}^m h(R̂^{(k,m)}_i) and θ_k = E[h(R^{(k)})]. By assumption, we have that d_p(F̂_{k,m}, F_k) → 0 in L_p, and therefore in probability, as m → ∞. Hence, for every subsequence {m_i}_{i≥1} there is a further subsequence {m_{i_j}}_{j≥1} such that d_p(F̂_{k,m_{i_j}}, F_k) → 0 a.s. as j → ∞. Definition 6.8 and Theorem 6.9 in [29] now give that Θ_{k,m_{i_j}} → θ_k a.s. as j → ∞. (3.17) We conclude that for any subsequence {m_i}_{i≥1} we can find a further subsequence {m_{i_j}}_{j≥1} such that (3.17) holds, and therefore, Θ_{k,m} →_P θ_k as m → ∞.
The remaining two proofs in the paper correspond to Theorem 2.5 and Lemma 2.6, which although not directly related to the Population Dynamics algorithm, may be of independent interest.
We now move to the linear SFPE (1.2), for which it is known (see [19]) that R admits the explicit representation as described in Section 1.1. When conditions (i) hold we have E[R (k) ] = 0 for all k ≥ 0 and the arguments used above remain valid.
Suppose now that conditions (ii) hold, in which case we can take R^{(k)} = Σ_{j=0}^{k−1} Σ_{i∈A_j} Π_i Q_i + Σ_{i∈A_k} Π_i R^{(0)}_i, where the {R^{(0)}_i : i ∈ U} are i.i.d. copies of R^{(0)}. Therefore, Minkowski's inequality gives a bound in terms of W_j = Σ_{i∈A_j} |Π_i||Q_i| and W_k(R^{(0)}) = Σ_{i∈A_k} |Π_i||R^{(0)}_i|. Now use Lemma 4.4 in [19] to obtain that under conditions (ii) there exist constants K_p, K′_p < ∞ such that the required moment bounds hold. This completes the proof.
Finally, we provide the proof of Lemma 2.6.
Proof of Lemma 2.6. The stated bound follows from the recursion defining R^{(k)} together with the estimates derived in the proof of Theorem 2.5.