Finitary coding for the sub-critical Ising model with finite expected coding volume

It has been shown by van den Berg and Steif that the sub-critical Ising model on $\mathbb{Z}^d$ is a finitary factor of a finite-valued i.i.d. process. We strengthen this by showing that the factor map can be made to have finite expected coding volume (in fact, stretched-exponential tails), answering a question of van den Berg and Steif. The result holds at any temperature above the critical temperature. An analogous result holds for Markov random fields satisfying a high-noise assumption and for proper colorings with a large number of colors.


Introduction and main results
Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be two measurable spaces, and let $X = (X_v)_{v \in \mathbb{Z}^d}$ and $Y = (Y_v)_{v \in \mathbb{Z}^d}$ be $(S, \mathcal{S})$-valued and $(T, \mathcal{T})$-valued stationary random fields (i.e., $\mathbb{Z}^d$-processes) for some $d \ge 1$. A coding from $Y$ to $X$ is a measurable function $\phi \colon T^{\mathbb{Z}^d} \to S^{\mathbb{Z}^d}$ which is translation-equivariant, i.e., commutes with every translation of $\mathbb{Z}^d$, and which satisfies that $\phi(Y)$ and $X$ are identical in distribution. Such a coding is also called a factor map or homomorphism from $Y$ to $X$, and when such a coding exists, we say that $X$ is a factor of $Y$.
The coding radius of $\phi$ at a point $y \in T^{\mathbb{Z}^d}$, denoted by $R(y)$, is the minimal integer $r \ge 0$ such that $\phi(y')_0 = \phi(y)_0$ for almost all $y' \in T^{\mathbb{Z}^d}$ which coincide with $y$ on the ball of radius $r$ around the origin in the graph-distance, i.e., $y'_v = y_v$ for all $v \in \mathbb{Z}^d$ such that $\|v\|_1 \le r$. It may happen that no such $r$ exists, in which case $R(y) = \infty$. Thus, associated to a coding is a random variable $R = R(Y)$ which describes the coding radius. We refer to $R^d$ as the coding volume. A coding is called finitary if $R$ is almost surely finite. When there exists a finitary coding from $Y$ to $X$, we say that $X$ is a finitary factor of $Y$.
We say that a non-negative random variable $R$ has exponential tails if $\mathbb{P}(R \ge r) \le Ce^{-cr}$ for some $C, c > 0$ and all $r \ge 0$, and that it has stretched-exponential tails if $\mathbb{P}(R \ge r) \le Ce^{-r^c}$ holds instead. When there exists a coding from $Y$ to $X$ whose coding radius has (stretched-)exponential tails, we say that $X$ is a finitary factor of $Y$ with (stretched-)exponential tails.
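The abstract's claim that stretched-exponential tails imply finite expected coding volume can be checked in one line. The following is a short sketch of that computation, assuming only that $R$ is integer-valued and satisfies the stretched-exponential bound above with constants $C, c > 0$; it is an illustration added here and not a display taken from the paper.

```latex
% Summation by parts for the integer-valued variable R >= 0, then the tail bound.
\mathbb{E}\big[R^d\big]
  = \sum_{r \ge 1} \big(r^d - (r-1)^d\big)\, \mathbb{P}(R \ge r)
  \le \sum_{r \ge 1} d\, r^{d-1} \cdot C e^{-r^c}
  < \infty ,
% using r^d - (r-1)^d <= d r^{d-1} and the fact that r^{d-1} e^{-r^c}
% is summable for every fixed c > 0.
```

The same computation shows that every moment of $R$ is finite, so finite expected coding volume is the weakest of the conclusions available from the tail bound.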
In this paper, we shall be concerned with finitary factors of i.i.d. (independent and identically distributed) processes, distinguishing between the cases when the i.i.d. process is finite-valued or infinite-valued. We use the abbreviation ffiid to denote a finitary factor of an i.i.d. process (perhaps infinite-valued), and fv-ffiid to denote a finitary factor of a finite-valued i.i.d. process.
Our main example is the (ferromagnetic) Ising model in d ≥ 2 dimensions -a classical discrete spin system in statistical mechanics. A Gibbs measure for the Ising model on Z d at inverse temperature β > 0 is a probability measure µ on {−1, +1} Z d which satisfies that, if the random field X = (X v ) v∈Z d has distribution µ, then for any vertex v ∈ Z d , interactions weaker (the high-temperature model even satisfies high-noise; see below), and indeed, we believe that Theorem 1.7 extends to the anti-ferromagnetic q-state Potts model at any temperature (and q as in the theorem above), though we do not pursue this here.
Our third result is not about a particular model, but rather about a class of translation-invariant high-noise Markov random fields, which we proceed to define. Let S be finite, let µ be a probability measure on S Z d and let X = (X v ) v∈Z d be distributed according to µ. We say that µ is a Markov random field if its conditional finite-dimensional distributions depend only on the immediate neighborhood of the finite set being inspected, i.e., if for any finite V ⊂ Z d and any ξ ∈ S V , where ∂V denotes the set of vertices at distance 1 from V . The Ising model and proper colorings (or rather the Gibbs measures for those models) are two examples of Markov random fields. Suppose that µ is a translation-invariant Markov random field and, for s ∈ S, denote We say that µ satisfies high-noise if The quantity γ is called the multigamma admissibility. It is essentially the probability that an update can be made to the spin at the origin without knowing anything about the values of the spins at its neighbors (see [10] for a more detailed explanation). We remark that Dobrushin's uniqueness condition [6] (or, alternatively, the "disagreement percolation" condition of van den Berg and Maes [4]) implies that if µ satisfies high-noise, then it is the only random field with the same conditional finite-dimensional distributions as µ.
Theorem 1.9. Let µ be a translation-invariant Markov random field satisfying high-noise. Then µ is fv-ffiid with stretched-exponential tails.
Theorem 1.9 improves on a result of Häggström and Steif [10] who showed that any translation-invariant high-noise Markov random field is fv-ffiid. Theorem 1.9 applies to numerous models of statistical physics, including the Potts model (both ferromagnetic and anti-ferromagnetic) at high temperature, the hard-core model at low fugacity and the Widom-Rowlinson model at low fugacity (see [10] for more details on this for the Potts and Widom-Rowlinson models). On the other hand, Theorem 1.1 does not follow from Theorem 1.9, as the Ising model does not satisfy high-noise when $\beta$ is only slightly smaller than $\beta_c(d)$. Similarly, Theorem 1.7 does not follow from Theorem 1.9 (even for large values of $q$), as it is clear that $\gamma = 0$ for proper $q$-colorings, regardless of how large $q$ is.
The three theorems will be proved using a general result introduced in Section 2 about finitary codings for limiting distributions of probabilistic cellular automata.
Background. We give here only a brief background and refer the reader to [5] for a more complete description of known results. A fundamental problem in ergodic theory is to understand which processes are isomorphic to which other processes (meaning that there is an almost everywhere invertible factor from one to the other). The very simplest of processes are the i.i.d. processes, and therefore, of particular interest are those processes which are isomorphic to an i.i.d. process; such processes are termed Bernoulli. The celebrated isomorphism theorem of Ornstein [19] states that any two i.i.d. processes of equal entropy are isomorphic (this result was later extended by Keane and Smorodinsky [15] who showed that any two such finite-valued processes are in fact finitarily isomorphic). Ornstein [19] further showed that any factor of an i.i.d. process is Bernoulli. This shed a more probabilistic light on the notion of Bernoullicity.
The notion of a finitary factor of i.i.d. has the advantage that it allows one to compute a symbol in the target process by revealing (almost surely) only finitely many variables of the i.i.d. process. This gives a more concrete construction of the target process, which may also be useful for exact simulation algorithms. Besides this appealing feature, finitary factors of i.i.d. have particular relevance in the context of probabilistic models such as those considered here. Let us take the Ising model as an example. It has been shown [20] (see also [1]) that the so-called "plus state" (this is the Gibbs measure obtained by taking + boundary conditions) is a factor of an i.i.d. process (i.e., is Bernoulli) for any value of the inverse temperature $\beta$. Thus, the phase transition is not reflected in this notion. However, as shown in [5], it is indeed reflected in the notion of a finitary factor: the "plus state" is a finitary factor of an i.i.d. process when $\beta < \beta_c(d)$, but not when $\beta > \beta_c(d)$.
In constructing a finitary coding from an i.i.d. process to a given process, it is desirable for efficiency purposes (e.g., for simulation algorithms) that the i.i.d. process be "small" and that the coding radius also be typically small. One qualitative meaning of this is that the i.i.d. process is finite-valued and that the coding volume has finite expectation. A more quantitative meaning would be to require bounds on the entropy of the i.i.d. process and on the tail of the coding radius. Our results are a mixture of the two, as they yield a finitary coding from a finite-valued i.i.d. process with stretched-exponential tails for the coding radius. In particular, our result about the Ising model (Theorem 1.1) answers a question of van den Berg and Steif [5, Question 2], who asked whether the sub-critical Ising measure is fv-ffiid with finite expected coding volume.
Notation. We consider $\mathbb{Z}^d$ as a graph in which two vertices $u$ and $v$ are adjacent if $|u - v| = 1$, where $|u|$ denotes the $\ell_1$-norm of $u$. We use $\mathbf{0}$ to denote the origin $(0, \dots, 0) \in \mathbb{Z}^d$ and $e_1 := (1, 0, \dots, 0) \in \mathbb{Z}^d$. We use $\mathbb{N}$ to denote the non-negative integers.
Organization. In Section 2, we formulate the result about finitary codings for limiting distributions of probabilistic cellular automata (Theorem 2.1) and use it to prove Theorem 1.1, Theorem 1.7 and Theorem 1.9. In Section 3, we introduce an abstract tool (Proposition 3.1 and the more general Proposition 3.2) and show how to deduce Theorem 2.1 from it. In Section 4, we introduce and explain an algorithm, which is then used in Section 5 to prove Proposition 3.2. We end with open questions in Section 6.
Acknowledgments. I would like to thank Nishant Chandgotia, Peleg Michaeli, Ron Peled and Jeff Steif for useful discussions and comments, and Matan Harel for help in proving Lemma 5.5. I am also grateful to the anonymous referee for suggestions which greatly improved the presentation.

Finitary codings for limiting distributions of PCAs
The goal of this section is to define the notion of a probabilistic cellular automaton (PCA) and other relevant notions, formulate a general result about finitary codings for limiting distributions of PCAs (Theorem 2.1 below), and then use this theorem to deduce the results stated in Section 1.
Before doing so, we give an informal description of the relevant ideas and concepts in the case of the Ising model. Consider the continuous-time Glauber dynamics for the sub-critical Ising model, in which each vertex has an exponential clock (with rate 1), and when its clock rings, it updates its spin value according to the conditional distribution given by the values of its neighbors as in (1). This is an ergodic process, whose unique stationary measure is $\mu$ (of Theorem 1.1), and thus, the distribution at time $t$ converges to $\mu$ as $t \to \infty$, regardless of the initial configuration. As we are interested in finding a coding from a finite-valued process, we instead opt to use a discrete analogue of these dynamics, given by a PCA: at each discrete time step $n$, every vertex is independently set to active or inactive with some fixed probability, and every active vertex which has no active neighbors then updates its spin value as before. This too is an ergodic process and the distribution at time $n$ converges to $\mu$ as $n \to \infty$. Convergence alone is not sufficient to obtain a coding of $\mu$, as the latter requires an exact sample from $\mu$. To get such a sample, one can employ the coupling-from-the-past technique of Propp and Wilson [21]. This then yields a finitary coding for $\mu$ from an infinite-valued i.i.d. process (showing that $\mu$ is ffiid). Using a result of Martinelli and Olivieri [18] that the convergence of the above process to stationarity occurs at an exponential rate, one may further show that this coding has a coding radius with exponential tails (showing that $\mu$ is ffiid with exponential tails). Obtaining from this a (finitary) coding from a finite-valued i.i.d. process still requires considerable work. All the above, including this last step, has been carried out by van den Berg and Steif [5]. Thus, they showed that $\mu$ is fv-ffiid. However, they gave no information on the coding radius beyond its almost sure finiteness.
Our main contribution is to show how one can carry out this last step in a controlled manner which preserves the good tails of the coding radius (yielding stretched-exponential tails). We elaborate on this in the next sections.
The above includes general arguments about certain dynamics, along with some model-specific information. Indeed, van den Berg and Steif separated the two parts of the argument, and proved the more general result [5,Theorem 3.4] that the limiting distribution of a monotone, exponentially ergodic PCA is fv-ffiid. In order to accommodate for the different situations considered in Section 1, which include non-monotone models (proper colorings and high-noise Markov random fields), we work here in the more general setting of exponentially uniformly ergodic PCAs (instead of monotone, exponentially ergodic PCAs), defined below. The proof of [5,Theorem 3.4] may be extended to this setting to show that the limiting distribution of an exponentially uniformly ergodic PCA is fv-ffiid. As mentioned before, the main challenge, and our primary contribution, is to show that this can be done while simultaneously controlling the coding radius.
Theorem 2.1. The limiting distribution of an exponentially uniformly ergodic PCA is fv-ffiid with stretched-exponential tails.
The results of Section 1 will follow from Theorem 2.1 by showing that the corresponding measures are limiting distributions of exponentially uniformly ergodic PCAs. Theorem 2.1 will be proved in Section 3. Let us also mention the following result which will easily follow from our definition of an exponentially uniformly ergodic PCA (unlike Theorem 2.1 which requires work).
Theorem 2.2. The limiting distribution of an exponentially uniformly ergodic PCA is ffiid with exponential tails.
We emphasize the differences between the two theorems: the second gives a finitary coding with exponential tails but does not provide any control on the i.i.d. process, while the first gives a coding from a finite-valued i.i.d. process but does slightly worse in terms of the tails of the coding radius.
Let us now proceed to give precise definitions. We begin by defining what a PCA is. For our purposes, a PCA is a discrete-time evolution on $S^{\mathbb{Z}^d}$ for some non-empty finite set $S$, which can be described as follows. Let $(W_{v,i})_{v \in \mathbb{Z}^d, i \in \mathbb{Z}}$ be a collection of i.i.d. random variables taking values in a finite set $A$. Let $F, F' \subset \mathbb{Z}^d$ be finite and let $f$ : We stress that different choices of $W_{v,i}$ and $f$ could give rise to the same time evolutions (i.e., the same distribution); however, for our purposes, a PCA is the data of the distribution of the $W_{v,i}$, the sets $F$ and $F'$ and the function $f$. In particular, we note that a PCA comes equipped with a simultaneous coupling of the time evolutions started from all starting states $\xi$. We remark that the usual definition of a PCA requires that $F' = \{\mathbf{0}\}$, in which case, conditioned on $\{\omega_{v,i}\}_v$, the random variables $\{\omega_{v,i+1}\}_v$ are mutually independent. For the above approach to the construction of finitary codings, we will have $F = F' = N(\mathbf{0}) \cup \{\mathbf{0}\}$ (recall that $N(\mathbf{0})$ is the neighborhood of the origin), in which case there are local conditional dependencies.
A PCA is said to be ergodic if there exists a probability measure µ on S Z d such that, for any starting state ξ, the distribution of (ω v,i ) v∈Z d converges weakly to µ as i → ∞. An ergodic PCA converging to µ can be used to obtain an approximate sample from µ| Λ , the marginal of µ on a finite subset Λ of Z d , by running the time evolution of the PCA until some large time t and observing the restricted process (ω v,t ) v∈Λ at that time, noting also that the latter is determined by a finite collection of random variables, namely, As is usual in these situations, determining how large t should be in order to obtain a sample whose distribution is close to the limiting distribution, is not an easy task. One way around this is to devise a method to exactly sample from the limiting distribution. Coupling-from-the-past provides such a method, at the cost, however, of requiring a type of uniform ergodicity. To define this notion, we first extend the definition given in (2) of the time evolution of the PCA to allow starting at any integer time as follows. The time evolution started from ξ We say that an ergodic PCA is uniformly ergodic if is almost surely finite for all v. We remind the reader that in our definitions, a PCA always comes equipped with a function f , so that the notion of uniform ergodicity depends on this f . While this might not be the standard notion of uniform ergodicity, it will be the relevant one for us. We also remark that for monotone PCAs, ergodicity implies uniform ergodicity (see [5,Lemma 3.5]). We say that an ergodic PCA is exponentially uniformly ergodic if τ v has exponential tails. For a uniformly ergodic PCA, we define the random field ω * = (ω * v ) v∈Z d by noting that this is almost surely well-defined and does not depend on ξ. We point out that while earlier we needed W v,i with i ≥ 0, for (5) and (6) we use W v,i with i < 0.
The following proposition encompasses the essence of coupling-from-the-past (in its infinite-volume version). An analogous statement for monotone ergodic PCAs was shown in [5] and a similar statement for PCAs arising from high-noise Markov random fields was shown in [10]. The proofs of these statements are easily adapted to the setting described here, and we include a short proof for completeness.
Proposition 2.3. Suppose $\mu$ is the limiting distribution of a uniformly ergodic PCA having time evolution $\omega$. Then $\omega^*$ has distribution $\mu$.
The proposition implies that the limiting distribution of a uniformly ergodic PCA is ffiid. Indeed, a moment of thought reveals that (4)-(6) describe such a finitary coding from the process $((W_{v,i})_{i<0})_{v \in \mathbb{Z}^d}$. Moreover, if the PCA is exponentially uniformly ergodic, then the coding radius of this coding has exponential tails, so that the limiting distribution is in fact ffiid with exponential tails. This establishes Theorem 2.2. The point is, however, that this coding is not from a finite-valued process. Restricting the i.i.d. process to be finite-valued, while keeping control of the coding radius, is the missing step in order to establish Theorem 2.1 and is what most of the remainder of the paper is devoted to. Before coming back to this in the next section, we explain how to deduce the results of Section 1 from Theorem 2.1.
We now prove Theorem 1.1, Theorem 1.7 and Theorem 1.9. In light of Theorem 2.1, this boils down to showing that in each case the corresponding measure is the limiting distribution of an exponentially uniformly ergodic PCA.
2.1. The Ising model - proof of Theorem 1.1. To deduce from Theorem 2.1 that the sub-critical Ising measure is fv-ffiid with stretched-exponential tails, we must know that it is the limiting distribution of an exponentially uniformly ergodic PCA. This was shown by van den Berg and Steif (see the proof of Theorem 4.1 in [5]) who relied on a deep result of Martinelli and Olivieri [18] about the continuous-time Glauber dynamics for the Ising model (see also [5, Proposition 4.2]).
Although for our purposes, we only need to know that a PCA as in Proposition 2.4 exists and its details are not important for us, in order to provide the reader with a full picture for the case of the Ising model, we nevertheless give a complete and formal description of the PCA used in the proof of Proposition 2.4 (but written in a slightly different way than in [5]). In fact, we have already given an informal description of this PCA in the beginning of Section 2. To define it precisely, set To complete the description of the PCA, we must also describe the distribution of the i.i.d. random variables $(W_{v,i})_{v \in \mathbb{Z}^d, i \in \mathbb{Z}}$. We let each $W_{v,i}$ consist of a pair of independent random variables, the first of which is a Bernoulli random variable with parameter, say, $1/2$, and the second of which has the distribution of $W$, where $W$ takes values in $\{-2d, \dots, 2d+1\}$ and satisfies Observe that such a random variable exists since $(p_k)_{-2d \le k \le 2d}$ is increasing. Recalling (1), one may easily verify that any Gibbs measure for the Ising model on $\mathbb{Z}^d$ at inverse temperature $\beta$ is a stationary measure for this PCA.
We note that this PCA is monotonic in the sense that $f(\eta, (\varphi, \psi)) \le f(\eta', (\varphi, \psi))$ for any $(\varphi, \psi)$ and $(\eta, \eta')$ such that $\eta_v \le \eta'_v$ for all $v \in F$, and we remark that due to this monotonicity, Proposition 2.4 is essentially a statement about the probability that the value of the spin at the origin after time $t$ depends on whether the starting state is the constant plus or constant minus state, namely, that this probability is exponentially small in $t$.
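To make the combination of monotonicity and coupling-from-the-past concrete, here is a minimal Python sketch on a finite two-dimensional torus rather than on $\mathbb{Z}^d$. It is an illustration under stated assumptions and not the construction of [5]: the names `nbr_sum`, `pca_step` and `cftp_ising` are hypothetical, the update uses a continuous uniform variable instead of the finite-valued $W$ described above, and the conditional law of a spin given its neighbors is assumed to be the standard heat-bath rule $e^{\beta s}/(e^{\beta s} + e^{-\beta s})$, where $s$ is the sum of the neighboring spins.

```python
import numpy as np

def nbr_sum(a):
    """Sum of a over the four nearest neighbors on a 2D torus."""
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1))

def pca_step(spins, active, u, beta):
    """One synchronous step: active vertices with no active neighbor resample."""
    isolated = active & (nbr_sum(active.astype(int)) == 0)
    # Assumed heat-bath probability of a plus spin given the neighbor sum.
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * nbr_sum(spins)))
    resampled = np.where(u < p_plus, 1, -1)
    return np.where(isolated, resampled, spins)

def cftp_ising(L, beta, rng, max_T=1 << 20):
    """Run the all-plus and all-minus chains from time -T to 0, doubling T.

    rand[k] holds the randomness of the step ending at time -k, so that the
    randomness close to time 0 is reused when T is doubled.
    """
    rand = []
    T = 1
    while T <= max_T:
        while len(rand) < T:
            rand.append((rng.random((L, L)) < 0.5, rng.random((L, L))))
        top = np.ones((L, L), dtype=int)
        bot = -top
        for k in range(T - 1, -1, -1):
            active, u = rand[k]
            top = pca_step(top, active, u, beta)
            bot = pca_step(bot, active, u, beta)
        # By monotonicity, every starting state is sandwiched between the two.
        if np.array_equal(top, bot):
            return top, T
        T *= 2
    raise RuntimeError("no coalescence within max_T steps")
```

For instance, `cftp_ising(8, 0.2, np.random.default_rng(0))` returns a spin configuration on the $8 \times 8$ torus together with the look-back time at which the two extremal chains were found to agree; that time plays the role of the stopping times $\tau_v$ in this finite setting.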

2.2. High-noise Markov random fields - proof of Theorem 1.9. To deduce Theorem 1.9 from Theorem 2.1, we need to know that a translation-invariant high-noise Markov random field is the limiting distribution of an exponentially uniformly ergodic PCA. This was shown by Häggström and Steif in [10] (essentially Proposition 2.1 there). Given this proposition, Theorem 1.9 is an immediate corollary of Theorem 2.1.

2.3. Proper colorings - proof of Theorem 1.7. Theorem 1.7 will follow from Theorem 2.1 once we establish the following.
Proposition 2.6. Let $d \ge 2$ and $q \ge 4d(d+1)$. Let $\mu$ be the unique Gibbs measure for proper $q$-colorings of $\mathbb{Z}^d$. Then $\mu$ is the limiting distribution of an exponentially uniformly ergodic PCA.
Proof. The proof uses ideas of Huber [13,14] for exact sampling of proper colorings on finite graphs and ideas of Häggström and Steif [10] from the proof of Proposition 2.5. The proof of the latter proposition uses an auxiliary PCA on a larger space (which the authors there call a super-PCA), which "bounds" the original PCA simultaneously for all starting states, and thus allows to "detect" when the original PCA has coalesced. Huber used a similar idea (which he called bounding chains), together with model-specific arguments, to provide an exact sampling algorithm for proper colorings (and other models) on a finite graph. Putting these ideas together, we show how this can be done for proper colorings of Z d .
We first describe the PCA in words: at each time step, every vertex is independently set to active or inactive with some fixed probability, and every active vertex which has no active neighbors then updates its color to be uniformly chosen from the set of colors not appearing at any of its neighbors. More precisely, a uniform permutation of the colors is chosen, and the first color not appearing at any neighbor is chosen. This PCA may be realized as follows. Let S := {1, . . . , q} and let S q be the symmetric group on S.
The time evolution ω of this PCA is then given by (4), where the i.i.d. random variables (W v,i ) are chosen to be uniformly distributed over A, so that each W v,i represents an unbiased coin toss (the unbiasedness will not be important for us) and an independent uniformly chosen permutation of the colors. It is straightforward to check that any Gibbs measure for proper q-colorings is a stationary distribution for this PCA.
To show that this PCA is exponentially uniformly ergodic, we use the method of bounding chains discussed above. Consider the following PCA (or super-PCA in the language of [10]) on $(2^S)^{\mathbb{Z}^d}$ given by $\hat f$ : The time evolution $\hat\omega$ of this PCA is then defined as in (4), using the same random variables $(W_{v,i})$ as above, so that the two PCAs are coupled, with the crucial property that $\hat\omega$ bounds $\omega$ in the sense that where $\hat\xi$ is the maximal element in $(2^S)^{\mathbb{Z}^d}$ defined by $\hat\xi_v := S$ for all $v \in \mathbb{Z}^d$. In particular, recalling (5), we have It therefore suffices to show that $\hat\tau_v$ has exponential tails. We begin by observing that, for $t \ge 0$, where $p_t$ is of course independent of $v$. To ease notation, let us denote $Y_t(v) := \hat\omega^{\hat\xi,0}_{v,t}$. Let $\alpha$ denote the probability that a vertex is updated in any given time step, i.e., $\alpha = \beta(1-\beta)^{2d}$, where $\beta$ is the probability that a vertex is activated (for an unbiased coin toss, we have $\beta = 1/2$ and $\alpha = 2^{-2d-1}$, but this will not be used). Note that $|\hat g(\hat\eta, \pi)| \le 2d+1$ for any $\hat\eta \in (2^S)^{N(\mathbf{0})}$ and $\pi \in S_q$. Hence, since $|Y_t(u)| > 2d+1$ if and only if $u$ has never been updated by time $t$.
Let us see what happens when the origin is updated. Let $D := \bigcup_{u \in N(\mathbf{0})} Y_t(u)$ be the set of colors which may appear in some neighbor of $\mathbf{0}$, and let $D' := \bigcup_{u \in N(\mathbf{0}), |Y_t(u)| = 1} Y_t(u)$ be those colors which are known to appear in some neighbor of $\mathbf{0}$. Observe that if $\pi \in S_q$ is such that $g(D', \pi) \notin D$, then $g(D, \pi) = g(D', \pi)$ and $\hat g((Y_t)|_{N(\mathbf{0})}, \pi) = \{g(D', \pi)\}$. Thus, given $Y_t$ and given that $\mathbf{0}$ is updated at time $t+1$, the probability that $|Y_{t+1}(\mathbf{0})| > 1$ is at most the probability that $g(D', \pi) \in D$. When $\pi \in S_q$ is chosen uniformly, $g(D', \pi)$ is uniformly distributed in $S \setminus D'$, so that the latter probability equals $|D \setminus D'| / |S \setminus D'|$. This shows that Together with (7), this yields Thus, $p_t$ decays exponentially in $t$ when $q \ge 4d(d+1)$, and Proposition 2.6 follows.
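The bounding-chain argument in the proof above can be sketched in Python on a finite torus with $d = 2$. This is an illustration only, with several labeled assumptions: the names `neighbors`, `bounding_update` and `sample_coloring` are hypothetical, colors are numbered $0, \dots, q-1$, and the set-valued rule below is one choice that has the two properties used in the proof (it contains the color the true rule would pick, and it has at most $2d+1$ elements); the exact definition of $\hat g$ in the paper may differ.

```python
import numpy as np

def neighbors(v, L):
    """The four nearest neighbors of v on the L x L torus (L >= 3)."""
    x, y = v
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]

def bounding_update(nbr_sets, perm, max_size):
    """Set-valued update containing every color the true rule could choose."""
    may = set().union(*nbr_sets)                                # plays the role of D
    known = set().union(*(s for s in nbr_sets if len(s) == 1))  # plays the role of D'
    out = set()
    for c in perm:
        if c in known:
            continue  # a color known to appear at a neighbor is never chosen
        out.add(c)
        # Stop at the first color that cannot appear at any neighbor,
        # or once max_size candidates have been collected.
        if c not in may or len(out) == max_size:
            break
    return out

def sample_coloring(L, q, rng, max_T=1 << 16):
    """Coupling-from-the-past with the bounding chain started from the full sets."""
    d = 2
    rand = []  # rand[k] = randomness of the step ending at time -k
    T = 1
    while T <= max_T:
        while len(rand) < T:
            active = rng.random((L, L)) < 0.5
            perms = np.argsort(rng.random((L, L, q)), axis=-1)  # uniform permutations
            rand.append((active, perms))
        Y = {(x, y): set(range(q)) for x in range(L) for y in range(L)}
        for k in range(T - 1, -1, -1):
            active, perms = rand[k]
            new = dict(Y)
            for v in Y:
                nb = neighbors(v, L)
                if active[v] and not any(active[u] for u in nb):
                    new[v] = bounding_update([Y[u] for u in nb],
                                             perms[v].tolist(), 2 * d + 1)
            Y = new
        if all(len(s) == 1 for s in Y.values()):
            return {v: next(iter(s)) for v, s in Y.items()}
        T *= 2
    raise RuntimeError("bounding chain did not coalesce within max_T steps")
```

Calling `sample_coloring(4, 30, np.random.default_rng(0))` returns a dictionary mapping each vertex of the $4 \times 4$ torus to a color. Once every bounding set is a singleton, the output agrees with the true chain started from any coloring, in particular from a proper one, and the update rule preserves properness, so the returned coloring is proper.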

A general result and proof of Theorem 2.1
In this section, we introduce a general result which will allow us to deduce Theorem 2.1. This result is an abstract tool and is not, a priori, related to the problems originally discussed in Section 1.
Let $X = (X_{v,i})_{v \in \mathbb{Z}^d, i \ge 0}$ be a process taking values in a finite set $S$. Let $B = (B_n)_{n \ge 0}$ be a strictly increasing sequence of subsets of $\mathbb{Z}^d \times \mathbb{N}$ with $B_0 := \{(\mathbf{0}, 0)\}$, and consider the associated $\sigma$-algebras An $\mathbb{N}$-valued random field $\tau = (\tau_v)_{v \in \mathbb{Z}^d}$ is said to be a $B$-stopping-process for $X$ if, for every $v$, $\tau_v$ is an almost surely finite stopping time with respect to the filtration $(\mathcal{F}^n_v)_{n \ge 0}$. When we say that such a stopping-process is stationary, we shall mean that the same stopping rule is used at every vertex (rather than just meaning that its law is translation-invariant). Given a $B$-stopping-process $\tau$, we denote by $X^\tau$ the random field Note that $(X^\tau)_v$ takes values in the finite-configuration space $\bigcup_{n \ge 0} S^{B_n}$. We say that $B$ is linear if $\Delta_n := \max\{\max\{|u|, i\} : (u, i) \in B_n\} \le \Delta n$ for some $\Delta \ge 1$ and all $n \ge 0$.
Proposition 3.1. Let $X = (X_{v,i})_{v \in \mathbb{Z}^d, i \ge 0}$ be a finite-valued i.i.d. process, let $B$ be linear and let $\tau$ be a stationary $B$-stopping-process for $X$. Suppose $\tau_v$ has exponential tails and $\mathbb{E}|B_{\tau_v}| < M$ for some integer $M$. Then $X^\tau$ is a finitary factor of $((X_{v,i})_{0 \le i < M})_{v \in \mathbb{Z}^d}$ with stretched-exponential tails.
Before using Proposition 3.1 to prove Theorem 2.1, we briefly explain the proposition and how it relates to the setting of the theorem. Recall that, given a uniformly ergodic PCA, (4)-(6) explicitly express the random field ω * as a finitary factor of the i.i.d. process ((W v,i ) i<0 ) v∈Z d , defined via certain stopping times. Moreover, it is clear from this and from (3) that the value of the output ω * u for any given u depends only on the variables W v,i within a certain "cone" in space-time emanating from (u, 0) (this is because as one goes back in time, the spatial dependency grows linearly). The above setup generalizes this situation to an abstract setting (which has nothing to do with coupling-from-the-past or PCAs), where the sequence (B n ) n replaces the cones arising from (3), the stopping process replaces the coupling-from-the-past stopping times given in (5), and the variables . With this interpretation in mind, for any given u, we may think of (X τ ) u as containing all the variables that are "needed" for the computation of the output at u, and the proposition states (ignoring the tails of τ v and the coding) that if, on average, the number of variables needed to compute the output at a given vertex is less than M , then one can "emulate" the process X τ (consisting of all the needed variables) from a process which has precisely M variables at each vertex. In other words, if one has an algorithm which can a priori need access to any number of variables at a given vertex, but typically does not need many such variables, then by "transporting" variables from one space-time location to another as needed, it is possible to rewrite the algorithm in such a way that it only has access to a bounded number of variables at each vertex. We note that we continue to refer to Z d × N as space-time, although the interpretation of N as a time dimension is perhaps less proper.
Proof of Theorem 2.1. Suppose that µ is the limiting distribution of an exponentially uniformly ergodic PCA with time evolution ω, defined via variables (W v,i ), sets F and F , and function f . By Proposition 2.3, it suffices to show that ω * , defined by (6), is fv-ffiid with stretched-exponential tails.
Recall the definition of $\tau_v$ from (5) and the definition of $\Delta$ from (3). By definition of $\tau_v$ and (3) (or rather the analogue of (3) for the time evolution started at time $-t$ and run up to time 0), the value of $\omega^*_v$ is a deterministic function of the variables $(W_{v+u,-i})_{|u| \le \Delta i,\, 0 \le i \le \tau_v}$ (actually, the variable $W_{v,0}$ corresponding to $i = 0$ is not needed, but we include it nevertheless). Moreover, this function does not depend on $v$, in the sense that, for some deterministic function $\psi$, we have that that $\omega^*$ is a finitary factor of $X^\tau$ with coding radius 0. It therefore suffices to show that $X^\tau$ is fv-ffiid with stretched-exponential tails. Indeed, letting $M$ be any integer larger than $\mathbb{E}|B_{\tau_v}|$, Proposition 3.1 yields that $X^\tau$ is a finitary factor of (( process, this yields the required coding for $\omega^*$.
Our method of proof of Proposition 3.1 gives a slightly stronger result. We call $\sigma$ a simple stopping-process if it is a $B^*$-stopping-process, where $B^*$ is defined by $B^*_n := \{\mathbf{0}\} \times \{0, 1, \dots, n\}$. In this case, $X^\sigma$ can unambiguously be thought of as $(X_{v,i})_{v \in \mathbb{Z}^d, 0 \le i \le \sigma_v}$.
Proposition 3.2. Let $X = (X_{v,i})_{v \in \mathbb{Z}^d, i \ge 0}$ be a finite-valued i.i.d. process, let $B$ be linear, let $\tau$ be a stationary $B$-stopping-process for $X$ and $\sigma$ a stationary simple stopping-process for $X$. Suppose $\tau_v$ has exponential tails and $\mathbb{E}|B_{\tau_v}| < \mathbb{E}\sigma_v + 1$. Then $X^\tau$ is a finitary factor of $X^\sigma$ with stretched-exponential tails.
Note that, since $\sigma$ is simple, the condition $\mathbb{E}|B_{\tau_v}| < \mathbb{E}\sigma_v + 1$ may be more naturally written as $\mathbb{E}|B_{\tau_v}| < \mathbb{E}|B^*_{\sigma_v}|$. Proposition 3.1 is the special case of Proposition 3.2 in which $\sigma$ is taken to be the deterministic simple stopping-process given by $\sigma_v = M - 1$ for all $v$. The rest of the paper is devoted to the proof of Proposition 3.2.
Remark 3.3. One may make slight modifications to the proof of the proposition to obtain various improvements. For instance, the same conclusion holds under the weaker assumptions that $\tau_v$ has only stretched-exponential tails and that $\Delta_n$ grows polynomially fast in $n$. In fact, one could even allow somewhat heavier tails and faster growing $\Delta_n$ at the expense of obtaining a coding radius with heavier tails. This is true even to the extent that, with no assumptions on the tails of $\tau_v$ or on the growth of $\Delta_n$, the conclusion still holds albeit with no information on the coding radius. On the other hand, under the stronger assumption that $\tau$ is also a simple stopping-process, the coding radius can be shown to have exponential tails.
Remark 3.4. Proposition 3.2 holds also for random simple stopping-processes $\sigma$ (though we do not allow randomness in the $B$-stopping-process $\tau$), provided the randomness is made independent for each vertex in the following sense: There exists an i.i.d. process $X' = (X'_v)_{v \in \mathbb{Z}^d}$, independent of $X$, such that, for each $v$, $\sigma_v$ is an almost surely finite stopping time with respect to the filtration is the smallest $\sigma$-algebra containing $\mathcal{F}^n_v$ and the one generated by $X'_v$. By working conditionally on $X'$, the proofs go through essentially unchanged.

The algorithm
In this section, we provide the algorithm used to construct the finitary coding stated in Proposition 3.2. We then use it in Section 5 to prove the proposition.
Throughout this section, we work in the setting of Proposition 3.2 so that X = (X v,i ) v∈Z d ,i≥0 is an i.i.d. process taking values in a finite set S, B is linear, τ is a stationary B-stopping-process for X and σ is a stationary simple stopping-process for X. In addition, τ v has exponential tails and E|B τv | < Eσ v + 1. We may also assume without loss of generality that σ v is bounded.
We construct an algorithm which, given a realization Y of the "source" process $X^\sigma$, deterministically computes an output Z having the distribution of the "target" process $X^\tau$. In the special case where $\sigma_v = M - 1$ deterministically for all $v$ (as in Proposition 3.1), we could imagine that there is a single space-time landscape, initially containing variables in the subset $\mathbb{Z}^d \times \{0, 1, \dots, M-1\}$ of space-time, and that these variables may be "transported" from their original locations to new locations as needed to construct Z. However, in general, as $\sigma$ is a stopping-process, we do not know which subset of space-time initially contains variables unless we expose some of the variables, but this would bias them and so we could not easily use them to construct Z. Thus, instead of revealing the entire random field Y at once, the algorithm slowly reveals more and more of Y as is needed to generate more and more of Z. As both Y and Z are realizations of stopping-processes, it is convenient to think that the input to the algorithm is in fact a realization of $X$, which the algorithm uses to simultaneously construct both Y and Z in such a manner that the variables of $X$ used to construct Z are a subset of those used to construct Y. We may thus imagine that there are in fact three space-time landscapes: one corresponding to the original process $X$, one to the source process Y, and one to the target process Z. When the algorithm wishes to reveal an additional piece of Y, the required variable is easily generated: it is simply read from the same location in the $X$ process. On the other hand, when an additional piece of Z needs to be generated, it must be matched to a variable used by Y. Here the crucial assumption that $\mathbb{E}|B_{\tau_v}| < \mathbb{E}\sigma_v + 1$ comes into play, as it ensures that Z uses fewer variables than Y on average.
Thus, from the point of view of Z, as any variable used by Y is "available" to be used by Z, there are many available variables (much more than needed) for Z, and one needs only to find a suitable way of "transporting" these from the source to the target.
We call the variables of the source process Y inputs and the variables of the target process Z outputs. We stress that transporting a variable from (u, i) to (w, j) simply means that the source location (u, i) and target location (w, j) are matched to one another so that the input Y u,i and the output Z w,j are identified. Moreover, when we say that an input is generated, say at location (u, i), we simply mean that the corresponding variable X u,i is revealed and identified with Y u,i , and when we say that an output is generated, say at location (u, i), we mean that a suitable input is transported to (u, i).
The algorithm consists of a "simulator" for each vertex v ∈ Z d , which has an associated source location (thought of as a space-time location of Y) and target location (thought of as a space-time location of Z). At each time step n, the simulators simultaneously execute a common procedure (this will guarantee that any output of the algorithm is translation-equivariant). The goal of the v-simulator is to ensure that its stopping time τ v is reached (with respect to the target process Z) and that all relevant outputs for Z v (i.e., those corresponding to space-time locations in v + B τv ) have been generated (that is, to determine an integer t v ≥ 0 and a configuration ξ ∈ S v+Bt v for which τ v (ξ) = t v ). Once this happens, v will be "satisfied", the final output Z v will be known, and the v-simulator will remain idle; until then, the v-simulator will be in a constant state of searching (in that its source location will change at every time step), trying to find an unused input at the source location which it can transport to the target location. We note that there is a complex interplay between the different simulators. On the one hand, they are competing for shared resources, namely, the inputs. On the other hand, as different sites v may rely on common outputs in order to compute their final output Z v , the simulators may occasionally "unintentionally help" each other reach their goals (as long as it helps them too) by generating an output which is also required by another simulator (though we do not exploit this in the proof). This is in fact the origin of some complications, which presumably cannot be avoided. Our algorithm is inspired partly by the algorithms in [5,12] (see Section 4.4 for a comparison between our algorithm and the one in [5]).

4.1.
Informal description of the algorithm. The goal of the v-simulator is to make sure that the final output Z v becomes known after some finite number of steps. To do this, the v-simulator proceeds as follows: Initially, at time step n = 0, it reveals the variable X v,0 , which corresponds to the single space-time location in v + B 0 = {(v, 0)} (see Section 4.3 for a formal definition of sets of the form v + A). It then consults the stopping rule τ v to see whether or not it should continue. If it has reached the stopping time, i.e., τ v = 0, then the final output is known, namely, Z v is the element in S B 0 given by (Z v ) (v,0) = X v,0 , so that the simulator is satisfied and can stop. If it has not reached the stopping time, i.e., τ v > 0, its next goal becomes to generate the outputs in v + (B 1 \ B 0 ). Let us come back to how this is done in a moment. Once these have been generated (which may require many steps of the algorithm), the v-simulator consults the stopping rule τ v again, this time to check whether τ v = 1. If indeed τ v = 1, then it is satisfied and the final output is known, namely, Z v is an element in S B 1 given by the generated outputs at space-time locations v + B 1 . If instead τ v > 1, the v-simulator continues in a similar manner, with the general rule being that once the v-simulator learns that τ v > k, it continues to generate the outputs in v + (B k+1 \ B k ), and then to check the stopping rule in order to determine whether or not it should continue. Eventually the stopping time is reached, the final output is known and the v-simulator is satisfied.
Let us now explain how the v-simulator generates the outputs in v + (B k \ B k−1 ). Firstly, it does so one output at a time (in an arbitrary order), and so we merely focus on how it generates a single output at space-time location (w, j). Of course, one way to do this is simply to use the original variable residing at that location, namely, X w,j . However, since we want to obtain a coding from X σ , we must be sure to only use inputs (those variables residing in the scope of the source process), i.e., we cannot use X w,j unless σ w ≥ j. We also cannot use an input if it has already been used (transported away) by some other simulator at a previous time. Thus, we may need to search for an input at a different location (u, i) and transport it from there to (w, j). Roughly speaking, the simulator moves along the space-time landscape of the source process, checking to see whether there is an unused input which it can transport to the target location (w, j). At every time step, it checks a single source location (u, i). If the input at that location is not available for use, the simulator simply advances its current source location, and does nothing further in that step of the algorithm. This procedure is repeated until the simulator eventually finds an unused input that it can transport. At that time, assuming the required output has not meanwhile been generated by another simulator, it transports it. Either way, the output at (w, j) is sure to have been generated by the end of that step.
Of course, as we are trying to construct a coding, the above procedure must be carried out simultaneously by all the simulators. This leads to some interaction between the different simulators. Let us now give some more specific details about this and the above procedure.

We first explain how the v-simulator behaves with regard to the source process in each step:
• If the simulator is satisfied, it does nothing. If it is unsatisfied, it necessarily moves its source location, and it does so as follows. It first tries to move up one step in the pile of the vertex u it is currently at. If it cannot, i.e., if it is already at the top of an exhausted pile (in the sense that the stopping time σ u has been reached), then it moves to the bottom of the pile located one step to the right of u (i.e., to u + e 1 ). Here we informally refer to the inputs at locations (u, i) as the pile at u, and think of the pile there as initially empty and then growing as inputs there are generated until it becomes exhausted (i.e., until it reaches its full size given by the stopping time σ u ).
• The above choice implies that if the v-simulator is at the top of a pile which has not yet been exhausted (we shall later call such a pile loaded), then the input just above the top of that pile has not yet been used/revealed by any simulator. Thus, it is an unbiased input (having the same distribution as X 0,0 ) and is available to be transported. In this situation, regardless of whether or not it is indeed transported, the source location is moved one step up the pile.
• We initially set the v-simulator's source location to be (v, −1) so that it is necessarily at the top of a loaded pile when the algorithm starts.
• Let us point out that when the pile sizes are deterministically fixed (as in the situation of Proposition 3.1), the evolution of the source location is also deterministic (up to knowing at what time the simulator becomes satisfied and stops). However, in general, as σ is a simple stopping-process, the evolution is random: to decide whether or not a pile is exhausted, we must inspect the variables in the pile.
• We could have chosen different conventions here. Our choice has the advantage that there cannot be more than one unsatisfied simulator at any location at any given time. This means that we do not need to worry about different simulators trying to transport the same input.

Next, we explain how the v-simulator behaves with regard to the target process:
• If the simulator is satisfied, it does nothing. If it is unsatisfied, it may or may not move its target location. Specifically, it moves precisely when its source is at the top of a loaded pile. Indeed, when this happens, we are assured that the required output can be generated. Moreover, when it moves, it moves to the next element in $v + B_\infty$, where the elements of $B_\infty = \bigcup_{n \ge 0} B_n$ are ordered in any way which respects the inclusions $B_0 \subset B_1 \subset \cdots$.
• Note that the times at which the target location changes are completely determined by the source. In particular, even if the output at the target location has been previously generated by some other simulator, this does not mean that the v-simulator will necessarily advance its target location. In other words, the output at the target location may have already been generated, and it may take the simulator many more steps until it finds an unused input (i.e., its source is at the top of a loaded pile), only to realize at that point in time that it does not need it after all (in which case that input will be wasted: it will not be transported later). This is not the most efficient choice, but it is the one we make.
• We point out that, unlike for the source, there may be many different simulators at a given target location at the same time. This situation just means that the different simulators all wish to generate the same output. Among these simulators, many may also be at the top of a loaded pile (in the source), which means that they can transport an input. Thus, we must take care that different simulators do not generate the same output. We must therefore prioritize the simulators in some manner. To this end, we simply make the choice that the lexicographically minimal simulator (among those at the top of a loaded pile) takes priority, namely, it is the one to generate the output, while the others do not transport an input (note that this is again not the most efficient way to do things, since we are throwing away inputs which could have been used later, but this is not too wasteful and we simply made a choice which we found convenient).
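The movement rule for the source location can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions that are not part of the construction: d = 1, the pile tops σ u are known in advance (as in the deterministic situation of Proposition 3.1), and the simulator never becomes satisfied. The names `advance_source`, `source_trajectory` and `top` are hypothetical.

```python
def advance_source(u, i, top):
    """One move of an unsatisfied simulator's source location in d = 1.

    `top[u]` is the index of the last input in the pile at u (the value of
    sigma_u), assumed here to be known in advance.
    """
    if i < top[u]:
        return (u, i + 1)   # move one step up the current pile
    return (u + 1, 0)       # top of an exhausted pile: go to the bottom of the pile to the right


def source_trajectory(v, top, steps):
    """Source locations visited by the v-simulator, starting from (v, -1)."""
    loc = (v, -1)
    path = []
    for _ in range(steps):
        loc = advance_source(loc[0], loc[1], top)
        path.append(loc)
    return path


# Hypothetical pile tops: sigma_1 = 1, sigma_2 = 0, sigma_3 = 2.
print(source_trajectory(1, {1: 1, 2: 0, 3: 2}, 5))
```

The sketch deliberately ignores transport and satisfaction; it only shows that the source location sweeps each pile from bottom to top and then moves one pile to the right.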
We emphasize that the algorithm may transport an input away from a certain location at some point in time, and then transport some other input into that same location at a later point in time. That is, even if eventually there is an input at location (u, i) (in the sense that σ u ≥ i) and the output at that same location is eventually needed by some simulator (in the sense that (u, i) ∈ v + B τv for some v), there is no guarantee that the variable that will eventually end up to be the output at (u, i) is the one that was originally the input there. The important property is that any given input can only be transported away once, and any given output can only be generated (i.e., transported into) once. This is another reason it is helpful to imagine separate space-time landscapes for the source and target processes.
We refer the reader to Figure 1 for an illustration of the algorithm.

4.2. Further explanation of the figure. Figure 1 illustrates the first several steps of the algorithm. The figure contains a detailed caption, and here we provide some additional information.
Let us first address the setting considered in the figure. Of course we consider d = 1 as it would be difficult to provide a useful picture for two or more dimensions. On the other hand, the specific B n considered there is not essential, and the reason for that choice was to allow the simulators to "climb up" in a small number of steps. We note that this choice for B n may be regarded as a simplification of what would be used for the one-dimensional case of Theorem 2.1 (since the B n are only "one-sided cones", whereas the theorem would require symmetric "two-sided cones").
Let us now consider the evolution of the simulators throughout the steps depicted in the figure. Initially, the source and target locations of each v-simulator are set to (v, −1) and (v, 0), respectively. This means that v-simulator is currently trying to generate the output at location (v, 0) and it is currently looking for an unused input (which it would like to transport) just above the source location (v, −1), namely, at (v, 0). Indeed, initially there is always an unused input there (since σ v ≥ 0 by assumption). This situation is depicted at the top of the figure. Thus, at step n = 1 of the algorithm, every v-simulator moves its source location one step up the pile to (v, 0), (vacuously) transports the input from (v, 0) to (v, 0), and advances its target location to (v, 1) (note that (0, 1) is the successor of (0, 0) in the chosen ordering of B ∞ ). At this stage, some simulators have already become green (satisfied) and thus have τ v = 0 -these are simulators 0, 3 and 8. Let us follow what happens next to simulator 1 (which is still unsatisfied): since the 1-simulator is yellow (it is at the top of a loaded pile), it moves its source location one step up the pile to (1, 1), (vacuously) transports the input from (1, 1) to (1,1), and advances its target location to (2, 1) (because (1, 1) follows (0, 1) in the order on B ∞ ). Since at the end of step 2, the 1-simulator is red (it is no longer in a loaded pile), in step 3 it does not change its target location and simply moves its source location to the bottom of the next pile, which is (2, 0). Since it is still red, in step 4 it again only moves its source location, this time to (3,0). We stress that even though, at the end of step 3, the output at the 1-simulator's target location (2, 1) has already been generated (it was transported from location (3, 0) by the 2-simulator in step 3), the 1-simulator still does not advance its target location; it will only do so once it becomes yellow. 
Finally, since the 1-simulator is still red at the end of step 4, in the next step (which is not depicted in the figure) it will move its source location one step up the pile to (3, 1). At this stage, we still do not know the eventual value of $\tau_1$; we only know that $\tau_1 \ge 1$ (since there is an output in $1 + B_1$ which is needed). Similarly, at the end of step 4, we know that $\tau_0 = \tau_3 = \tau_8 = 0$, $\tau_2 = \tau_7 = 1$, $\tau_6 \ge 1$, $\tau_4 \ge 2$ and $\tau_5 \ge 2$. In particular, we know the final output for vertices {0, 2, 3, 7, 8}, but not yet for {1, 4, 5, 6}.

Figure 1 (caption). Left: The source process $Y^n$ and the source locations $(U^n_v, I^n_v)$ of the simulators. A gray background at space-time location (u, i) indicates that the input $Y^n_{u,i}$ has been generated. An × indicates an unloaded vertex, while a question mark indicates a loaded vertex. Right: The target process $Z^n$ and the target locations $(W^n_v, J^n_v)$ of the simulators. A gray background at space-time location (w, j) indicates that the output $Z^n_{w,j}$ has been generated. Simulators: The simulators are depicted in green, yellow or red according to whether they are satisfied, unsatisfied but at the top of a loaded pile, or otherwise. A green simulator does not move as it has finished running (case (i) in the algorithm). A yellow simulator advances its source location by moving up one step in its current pile, reads the unused input at that new location, transports this input to its current target location (if needed), and then advances its target location by moving to the "next place in line" according to the ordering on $B_\infty$ (case (iv) in the algorithm). We note that when two yellow simulators occupy the same target location, only one of them actually generates the output (i.e., transports an input to that location). A red simulator does not have access to an unused input, and so it advances its source location by either moving up the current pile if it is not yet at the top (case (ii) in the algorithm) or otherwise by moving to the bottom of the next pile (case (iii) in the algorithm), while its target location remains unchanged. In particular, a red simulator does not advance its target location even if the corresponding output is (or was previously) generated by a different simulator. See Section 4.2 for further details about the figure.

4.3. Formal definition of the algorithm. Before providing the algorithm, we require some preparation.
Let us employ the following useful convention regarding stopping times. Suppose that $\pi$ is an almost surely finite stopping time with respect to the filtration $(\mathcal{F}^n_0)_{n \ge 0}$ defined in (8). We may regard $\pi$ as a deterministic function from $\bigcup_{n \ge 0} S^{B_n}$ to $\mathbb{N} \cup \{*\}$ having the property that, for any $n \ge 0$, $\xi \in S^{B_n}$ and $\xi' \in S^{B_{n+1}}$ such that $\xi'|_{B_n} = \xi$, we have $\pi(\xi) \in \{0, \dots, n, *\}$, we have $\pi(\xi') = \pi(\xi)$ when $\pi(\xi) \ne *$, and we have $\pi(\xi') \in \{n+1, *\}$ when $\pi(\xi) = *$. The interpretation here is that a value of $*$ means that the stopping time has not been reached. Note, in particular, that for $m \ge 0$ and $\eta \in S^{B_{n+m}}$, the expression $\pi(\eta) > n$ depends only on $\eta|_{B_n}$ (where it is understood that $* > n$ for all integers $n$). With this in mind, we note that (with a slight abuse of notation), if $A \subset \mathbb{Z}^d \times \mathbb{N}$ contains $B_n$, then the expression $\pi(\eta) \le n$ is well-defined for any $\eta \in S^A$ and depends only on $\eta|_{B_n}$, and thus, the expression $b \in B_{\pi(\eta)}$ is also well-defined for any $b \in B_{n+1}$ (and depends only on $\eta|_{B_n}$). We further abuse notation by identifying an element $\eta \in (S \cup \{\emptyset\})^{\mathbb{Z}^d \times \mathbb{N}}$ with the element $\eta' \in S^A$ in the obvious way, by taking $A := \{a \in \mathbb{Z}^d \times \mathbb{N} : \eta(a) \ne \emptyset\}$ and $\eta' := \eta|_A$.
We order the elements of $B_\infty := \bigcup_{n \ge 0} B_n$ in such a manner that, for any $n$, every element of $B_n$ appears before every element of $B_\infty \setminus B_n$. This induces a notion of successor for elements in $B_\infty$.

At each step $n \ge 0$, we define variables:
• $(U^n_v, I^n_v) \in \mathbb{Z}^d \times \mathbb{N}$, the source location of the $v$-simulator.
• $(W^n_v, J^n_v) \in \mathbb{Z}^d \times \mathbb{N}$, the target location of the $v$-simulator.
• $T^n_v \in \{0, 1\}$, the indicator of whether the $v$-simulator transported input (generated output).

Once the above variables are defined at step $n$, we further define several objects, all of which are deterministic functions of the above variables. For some of these definitions to make sense, it is important to note that properties (10) and (11) are satisfied at every step $n$.

Equation (10) says that each input is transported away at most once by at most one simulator. Similarly, (11) says that every output is generated (transported into) at most once by at most one simulator. As the target location of the $v$-simulator will be updated immediately after the required output is generated, there is a shift in the time index in (11). Thus, $T^n_v = 1$ means that at time step $n$ the $v$-simulator transported an input from the source location $(U^n_v, I^n_v)$ to the target location $(W^{n-1}_v, J^{n-1}_v)$, thus generating the output at $(W^{n-1}_v, J^{n-1}_v)$.

Consider the set $D^n$ of source-target locations of simulators at transport times, whose elements are tuples $(v, u, i, w, j)$. We use $D^n$ and $X$ to construct two $(S \cup \{\emptyset\})$-valued processes $Y^n = (Y^n_{u,i})_{u \in \mathbb{Z}^d, i \ge 0}$ and $Z^n = (Z^n_{w,j})_{w \in \mathbb{Z}^d, j \ge 0}$, which represent the partial information on Y and Z (the realizations of X σ and X τ ) that has been revealed by time $n$. The algorithm may use one of two "update methods": for $(v, u, i, w, j) \in D^n$, we define
(A) $Y^n_{u,i} = Z^n_{w,j} := X_{u,i}$,
(B) $Y^n_{u,i} = Z^n_{w,j} := X_{w,j}$.
If $(u, i)$ is not in the projection of $D^n$ on the 2nd and 3rd coordinates, then set $Y^n_{u,i} := \emptyset$, and similarly, if $(w, j)$ is not in the projection of $D^n$ on the 4th and 5th coordinates, then set $Z^n_{w,j} := \emptyset$. Note that (10) and (11) ensure that both update methods are well-defined. We stress that the two update methods are never used in conjunction with one another: either update method (A) is used throughout all steps of the algorithm or update method (B) is.
The Y process is associated with σ, and the Z process with τ . Thus, in update method (A), σ "sees" the original process X, while τ "sees" a transformed process in which inputs have been transported between space-time locations; in update method (B), the situation is reversed: τ sees the original process and σ sees a transformed process. Another point of view is that D n (after forgetting the first coordinate) defines a bipartite graph between two copies of Z d × N in which any vertex of one copy is matched to at most one vertex in the other copy. The two update methods can then be thought of as orienting all edges from the first copy to the second, or vice versa, where the orientation of an edge determines the direction of flow of information, with the original process X always associated with the copy from which the edges are oriented outwards (so that variables are transported along the edges in the direction of orientation). As we are interested in realizing the τ process via the σ process, the natural choice is to transport variables from the latter to the former as in update method (A). Nevertheless, it will turn out to be helpful to consider also the reversed direction of flow. Thus, update method (A) will yield the required coding, whereas update method (B) will only be used as a comparison tool in the analysis (namely in the proof of Lemma 5.3). As such, we mainly have update method (A) in mind in our definitions.
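To make the two update methods concrete, the following Python sketch builds the partial processes from a finite set of transports. It is an illustration only: the function name `build_partial_processes`, the dictionary representation (a missing key plays the role of ∅) and the sample data are assumptions made for this example, not part of the construction.

```python
def build_partial_processes(D, X, method="A"):
    """Return (Y, Z) as dicts keyed by space-time location.

    D is a finite set of tuples (v, u, i, w, j): the v-simulator transported
    the input at (u, i) to the output at (w, j). A missing key in Y or Z
    stands for the empty symbol.
    """
    if method not in ("A", "B"):
        raise ValueError("method must be 'A' or 'B'")
    Y, Z = {}, {}
    for (v, u, i, w, j) in D:
        # (A) reads X at the source location, (B) reads X at the target location.
        value = X[(u, i)] if method == "A" else X[(w, j)]
        Y[(u, i)] = value
        Z[(w, j)] = value
    return Y, Z


# Hypothetical data: one vacuous transport and one genuine transport.
X = {(0, 0): "a", (1, 0): "b", (1, 1): "c"}
D = {(0, 0, 0, 0, 0), (0, 1, 0, 1, 1)}
Y_A, Z_A = build_partial_processes(D, X, "A")
Y_B, Z_B = build_partial_processes(D, X, "B")
```

In both methods the matched pair (u, i), (w, j) carries a common value; the methods differ only in which copy of space-time that value is read from.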
We further define
• $L^n_u := \max\{i : (v, u, i, w, j) \in D^n \text{ for some } v, w, j\}$, the last input revealed at $u$.
Thus, $L^n_u$ is the size of the pile at $u$ (in the source process) at time $n$. A vertex $u$ is loaded at time $n$ if there are more inputs available at $u$ than have already been used by time $n$, i.e., if the pile at $u$ has not been exhausted by time $n$. A vertex $v$ is satisfied at time $n$ if the output at the target location of the $v$-simulator is not needed in order to compute the final output $Z_v$. In particular, due to the way that the target location evolves, this implies (but is not precisely equivalent to) that the outputs that $v$ needs for its final output have already been generated by time $n$ (see (13) below), so that the final output is known at this time.
The fact that the notions of loaded and satisfied are well-defined is not obvious from their definitions. The fact that the notion of loaded is well-defined follows from the above discussion about stopping times and the following property which will hold at each step n: Similarly, the fact that the notion of satisfied is well-defined follows from the following property, which will hold for all n: Finally, for (w, j) ∈ Z d × N, we also define Thus, Q n (w, j) consists of those simulators who both wish to generate the output at (w, j) and can also do so (they wish to do so as they are unsatisfied, meaning that they need that output, and as the output has not yet been generated; they can do so as they are at the top of a loaded pile in the source process). Since only one such simulator can be allowed to actually generate the output at (w, j), we will let the lexicographical-minimal one do so.
With these definitions, we can now present the algorithm. We refer the reader to Section 4.1 for an informal description and to Figure 1 for an illustration.
Algorithm: Finitary coding from X σ to X τ (listing not reproduced here).

4.4. Comparison between our algorithm and that of van den Berg and Steif in [5]. The two algorithms are similar in spirit (though they are not set up in the same way) and we focus here on the moral differences between the two. We have identified two such differences, the primary one being in how they relate to unneeded variables and, consequently, in how they transport such variables between space-time locations. Here, "needed" may refer to either an input or an output, where an input (output) at location $(u, i)$ is needed by time $n$ if $Y^n_{u,i} \ne \emptyset$ ($Z^n_{u,i} \ne \emptyset$). Roughly speaking, the algorithm in [5] declares an input variable unneeded at a certain time once it is guaranteed that the output variable at the same location will not be needed at any later time (and was also not needed until that time). Only inputs which are marked as unneeded in this sense are allowed to be transported. On the other hand, our algorithm never declares an input variable unneeded. Instead, we only concern ourselves with whether an input was not needed by a certain time, and any such variable is allowed to be transported at that time. If at a later time it turns out that the output at the same location was needed after all, another input variable will be transported to that location. In other words, the algorithm in [5] transports an input from location $(u, i)$ to another location $(w, j)$ only if the output at $(u, i)$ is never needed, whereas our algorithm does not have this restriction, and may transport from $(u, i)$ to $(w, j)$ at some time, and then from $(u', i')$ to $(u, i)$ at a later time. The latter approach is essential in the generality of Proposition 3.1 and Proposition 3.2.
The reason is that, while for some choices of B = (B n ) n , any particular output variable could only be potentially needed by finitely many vertices (e.g., as for the "cones" used in the proof of Theorem 2.1, where the output at (w, j) can only be needed by vertices at distance at most ∆j from w), in general, any vertex might need that variable at some time (e.g., as for the "cubes" given by B n = {(u, i) : |u| ≤ ∆n, 0 ≤ i ≤ n}) so that it is not possible to know (in a finitary manner) whether or not an output variable will be needed eventually. The second difference between the algorithms is that, unlike the algorithm in [5], ours is somewhat wasteful (by design; see Section 4.1) in that in certain situations it decides not to use an available input variable (and to simply throw it away). We found this useful (though it is probably not essential) for keeping track of how far variables are transported, which was important for understanding the coding radius.

Proof of Proposition 3.2
In this section, we use the algorithm described in Section 4 to prove Proposition 3.2.
The following claim establishes some simple properties of the algorithm. Let $\preceq$ denote the partial order on $\mathbb{Z}^d$ in which $u \preceq u'$ if $u' = u + ke_1$ for some $k \ge 0$. We also denote by $\preceq$ the partial order

Proof. The claim follows easily by induction on $n$.
The following lemma states precisely the intuitive fact that transporting inputs from one spacetime location to another does not change the resulting distribution. Denote the state at time n by S n := (U n , I n , W n , J n , T n , D n ), where U n = (U n v ) v∈Z d , I n = (I n v ) v∈Z d and so forth.
Lemma 5.2. The distribution of (S n , Y n , Z n ) n≥0 does not depend on whether update method (A) or (B) is used in the algorithm.
Proof. Observe that the algorithm does not explicitly depend on the update method used, but rather depends on it implicitly through the definitions of $Y^n$ and $Z^n$. We prove by induction that the distribution of $\bar{S}^n := (S^m, Y^m, Z^m)_{0 \le m \le n}$ does not depend on the update method. This is immediate for $n = 0$, since $\bar{S}^0$ is deterministic. Fix $n \ge 1$ and observe that $S^n$ is measurable with respect to $\bar{S}^{n-1}$. It thus suffices to show that (i) when using update method (A), conditioned on $\bar{S}^{n-1}$, $(X_{u,i})_{(v,u,i,w,j) \in D^n \setminus D^{n-1}}$ is a sequence of independent random variables having the distribution of $X_{0,0}$, and (ii) when using update method (B), conditioned on $\bar{S}^{n-1}$, $(X_{w,j})_{(v,u,i,w,j) \in D^n \setminus D^{n-1}}$ is such a sequence. Indeed, (i) follows easily from (10) and (ii) from (11).
As will be explained in the proof of Proposition 3.2 below, Lemma 5.3 implies that the algorithm "locally terminates" in finite time in the sense that the final output at any vertex is determined at some finite step. Nevertheless, this does not yet imply that the algorithm yields a finitary coding. What is missing is some control on the propagation of information in each step. This is the content of the following lemma. Let ∆ be as in (9). Denote D n v := {(u, i, w, j) : (v, u, i, w, j) ∈ D n }.
Lemma 5.4. When using update method (A), for any $n \ge 0$ and $v \in \mathbb{Z}^d$, the following random variables are measurable with respect to $(X_{u,i})_{|u-v| \le 5\Delta n^2,\, 0 \le i \le \sigma_u}$:

Since all the operations in the algorithm are translation-equivariant, we have thus obtained a coding from X σ to X τ .
Let us check that this coding is finitary and that its coding radius $R$ has stretched-exponential tails. Indeed, since Lemma 5.4 implies that $\{N_0 \le n\}$ and $Z^n_0$ are measurable with respect to $(X_{u,i})_{|u| \le 5\Delta n^2,\, 0 \le i \le \sigma_u}$, it follows that $R \le 5\Delta N_0^2$. Lemma 5.3 then yields that $\mathbb{P}(R > 5\Delta n^2) \le \mathbb{P}(N_0 > n) = \mathbb{P}(0 \text{ is not satisfied at time } n) = e^{-\Omega(n^{1/(d+2)})}$.
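For completeness, here is one way to pass from the bound at the radii $5\Delta n^2$ to a tail bound at an arbitrary radius $r$. This is a short worked step under the assumption that the bound above holds in the form $e^{-c n^{1/(d+2)}}$ for some constant $c > 0$ and all large $n$; the constant $c$ and the choice of $n$ are introduced only for this illustration.

```latex
\begin{align*}
n &:= \Big\lfloor \sqrt{r/(5\Delta)} \Big\rfloor
  \quad\Longrightarrow\quad 5\Delta n^2 \le r
  \quad\text{and}\quad n \ge \tfrac12 \sqrt{r/(5\Delta)} \ \text{ for } r \ge 20\Delta, \\
\mathbb{P}(R > r) &\le \mathbb{P}(R > 5\Delta n^2)
  \le e^{-c\, n^{1/(d+2)}}
  \le \exp\!\Big(-c \big(\tfrac12\big)^{\frac{1}{d+2}} (5\Delta)^{-\frac{1}{2(d+2)}}\, r^{\frac{1}{2(d+2)}}\Big)
  = e^{-\Omega\left(r^{1/(2d+4)}\right)}.
\end{align*}
```

In particular, the tail of $R$ is of the stretched-exponential form defined in the introduction.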

5.2. Proof of Lemma 5.3. For the proof of Lemma 5.3, we require a large-deviation-type result, which we now describe. Let $X = (X_i)_{i \in \mathbb{Z}}$ be a sequence of non-negative random variables. We say that $X$ is stopping-like if there exists $\Delta > 0$ such that for any finite $I, J \subset \mathbb{Z}$ and any non-negative numbers $(r_i)_{i \in I \cup J}$, the two events $\{X_i > r_i \text{ for } i \in I\}$ and $\{X_j > r_j \text{ for } j \in J\}$ are independent whenever the two sets $\bigcup_{i \in I} [i - \Delta r_i, i + \Delta r_i]$ and $\bigcup_{j \in J} [j - \Delta r_j, j + \Delta r_j]$ are disjoint. Observe that, if there exists a sequence $(Y_i)_{i \in \mathbb{Z}}$ of independent random variables satisfying that, for any $i \in \mathbb{Z}$ and any $r \ge 0$, the event $\{X_i > r\}$ is measurable with respect to $\{Y_j\}_{|i-j| \le \Delta r}$, then $X$ is stopping-like. Observe also that, if $X$ is a stopping-like process, then $(X_i \mathbf{1}_{\{X_i \le r\}})_{i \in \mathbb{Z}}$ is a $2\Delta r$-dependent process for any $r > 0$, where a process $(Y_i)_{i \in \mathbb{Z}}$ is said to be $k$-dependent if $(Y_i)_{i \in I}$ and $(Y_j)_{j \in J}$ are independent whenever $I, J \subset \mathbb{Z}$ satisfy that $|i - j| > k$ for all $i \in I$ and $j \in J$.
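As a concrete illustration of a process built locally from independent variables, the following Python sketch computes the streak-of-heads sequence used as an example in Remark 5.6, following the formula given there. It is a sketch under assumptions: the sequence is restricted to a finite window (so streaks are truncated at the window boundary), and the function name `streak_lengths` is hypothetical.

```python
def streak_lengths(Y):
    """For a finite 0/1 list Y, return X with X[i] = k + m, where k is the
    number of consecutive ones immediately to the left of position i and m is
    the number of consecutive ones starting at position i (truncated at the
    ends of the window)."""
    n = len(Y)
    X = []
    for i in range(n):
        k = 0
        while i - 1 - k >= 0 and Y[i - 1 - k] == 1:
            k += 1          # extend the streak to the left of i
        m = 0
        while i + m < n and Y[i + m] == 1:
            m += 1          # extend the streak from i to the right
        X.append(k + m)
    return X


print(streak_lengths([1, 1, 0, 1, 1, 1, 0]))
```

Each value is determined by the coin tosses in the streak around the position together with the tosses that terminate it, which is the kind of locality that the definition of stopping-like is designed to capture.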
Thus, it suffices to bound separately the two terms on the right, showing that each is $e^{-\Omega(n^\beta)}$. For the first term, we prove the stronger bound $\mathbb{P}(Y_1 + \cdots + Y_n \ge \mu n + \varepsilon n) = e^{-\Omega(n^{1 - 2\alpha/\beta})}$.
For 0 ≤ i ≤ n, denote Note that (Z i ) 0≤i≤n is a martingale satisfying Z 0 = (EY 0 )n ≤ µn and Z n = Y 1 + · · · + Y n . Hence, by the Azuma-Hoeffding inequality (see, e.g., [24]), Thus, (14) will follow if we show that c i ≤ Cn α/β . Indeed, since Y is a Bn αb -bounded 2∆n α -dependent process, We now turn to the second term. Note that {Y 1 + · · · + Y n ≥ n} ⊂ E ∪ F , where For I ⊂ Z, denote d(I) := min{|i − j| : i, j ∈ I, i = j}. Since, for any I ⊂ Z and integer d ≥ 1, there exists a subset I ⊂ I such that |I | ≥ |I|/d and d(I ) ≥ d, we obtain P(E) ≤ P ∃I ⊂ I, |I| = ( n) β 2n α +2 , d(I) ≥ 2n α + 1 . Since the events {X i > r} i∈I are independent for any finite I ⊂ Z and 0 ≤ r < d(I)/2, we have Finally, it is immediate that P(F ) ≤ n · P(X 0 ≥ 1 B ( n) β ) = e −Ω(n β ) . Remark 5.6. The bound in Lemma 5.5 is tight, as the following simple example shows. Let (Y i ) i∈Z be independent unbiased coin tosses, and let X i be the length of the streak of heads containing position i, i.e., X i := max{k +m : Y j = 1 for i−k ≤ j < i+m, k, m ≥ 0}. Clearly, X is a stationary sequence (in fact, it is ffiid with exponential tails) and X 0 has exponential tails. Moreover, since X i is a stopping time with respect to ({Y j } |j−i|≤n ) n , it follows that X is stopping-like (with ∆ = 1). On the other hand, P(X b 1 + · · · + X b n ≥ an) ≥ P(Y 1 = · · · = Y (an) β = 1) = 2 − (an) β . Proof of Lemma 5.3. As Lemma 5.2 implies that both update methods yield the same probability for the event in question, we may assume here that update method (B) is used in the algorithm. Denote L ∞ w := sup n L n w . Let L denote the vertices which remain loaded indefinitely. Let v ∈ Z d . For an integer i, we write v + i for the element v + ie 1 ∈ Z d . Let us check that if, for some k ≥ 0, then v is satisfied at time N k . Assume towards a contradiction that v is not satisfied at time N k . Then, by Claim 5.1, .
In particular, the set of times T := {1 ≤ n ≤ N k : U n v ≺ U n+1 v } at which the v-simulator moved its source location to the right is of size |T | ≥ k + 1. Moreover, since, for 1 ≤ n ≤ N k , n ∈ T if and only if case (iii) of the algorithm is executed at step n for the vertex v, which in turn occurs only if U n−1 v / ∈ L and I n−1 v = L n−1 Note that if the input from some location (u, i) is transported by some v -simulator by time n (i.e., (v , u, i, w, j) ∈ D n for some (w, j)), then v u and every v such that v ≺ v u must be satisfied at time n. Thus, since by step N k +1, N k inputs were transported from locations (u, i) with v u v + k, but no more than M k inputs were transported by v -simulators with v v v + k, it follows that v is satisfied at time N k , which is a contradiction. Hence, v is satisfied at time N k .
We have thus shown that $\mathbb{P}(v \text{ is not satisfied at time } n) \le \mathbb{P}\big(\forall k \ge 0:\ (L_k = \emptyset \text{ and } N_k \le M_k) \text{ or } N_k > n\big)$. Using that $L^n_v \le \sigma_v(Y^n) \le m$ almost surely for some $m \ge 1$, and taking $k = n/4m$, we get $\mathbb{P}(v \text{ is not satisfied at time } n) \le \mathbb{P}\big(L_{n/4m} = \emptyset \text{ and } N_{n/4m} \le M_{n/4m}\big)$. Let $a$ be such that $\mathbb{E}\sigma_v + 1 > a > \mathbb{E}|B_{\tau_v}|$, and note that $\mathbb{P}\big(L_{n/4m} = \emptyset \text{ and } N_{n/4m} \le M_{n/4m}\big) \le \mathbb{P}\big(L_{n/4m} = \emptyset \text{ and } N_{n/4m} \le \tfrac{an}{4m}\big) + \mathbb{P}\big(M_{n/4m} \ge \tfrac{an}{4m}\big)$. It remains to bound the terms on the right-hand side.

Note that, if $u \notin L$ then $L^\infty_u = \sigma_u(X)$. Thus, since $(\sigma_{v+i}(X) + 1)_{i \in \mathbb{Z}}$ is an i.i.d. sequence of bounded random variables with expectation strictly larger than $a$, standard large deviation bounds yield that $\mathbb{P}\big(L_{n/4m} = \emptyset \text{ and } N_{n/4m} \le \tfrac{an}{4m}\big)$ is exponentially small in $n$ (alternatively, we could appeal to Lemma 5.5 with the sequence $(m - \sigma_{v+i}(X))_{i \in \mathbb{Z}}$ to obtain the required stretched-exponential bound).

Towards establishing the bound on the second term, observe that, by (8) and (9), $(\tau_{v+i})_{i \in \mathbb{Z}}$ is a non-negative stopping-like stationary sequence with exponential tails. Thus, since $|B_n| = O(n^{d+1})$ by (9), Lemma 5.5 implies that $\mathbb{P}\big(M_{n/4m} \ge \tfrac{an}{4m}\big) \le e^{-\Omega(n^{1/(d+2)})}$ as $n \to \infty$.
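The exponent $1/(d+2)$ in the last bound can be traced by a short calculation. The following is only a sketch under stated assumptions: that Lemma 5.5 is applied with the power $b = d+1$ coming from $|B_n| = O(n^{d+1})$, and that its conclusion has the form $e^{-\Omega(n^{\beta})}$ with $\beta = 1/(b+1)$, which is the rate matched by the example in Remark 5.6. The constant $C$ is a placeholder introduced here and does not appear in the text.

```latex
% Assumption: |B_t| <= C t^{d+1} for a placeholder constant C (cf. (9)),
% and Lemma 5.5 yields a tail of order exp(-Omega(n^beta)), beta = 1/(b+1).
\[
  |B_{\tau_{v+i}}| \le C\,\tau_{v+i}^{\,d+1}
  \quad\Longrightarrow\quad
  b = d+1, \qquad
  \beta = \frac{1}{b+1} = \frac{1}{d+2},
\]
\[
  \mathbb{P}\Big(M_{n/4m} \ge \tfrac{an}{4m}\Big)
  \le e^{-\Omega\left((n/4m)^{\beta}\right)}
  = e^{-\Omega\left(n^{1/(d+2)}\right)} .
\]
```

Since $m$ is a fixed constant, replacing $n/4m$ by $n$ only changes the implicit constant in the $\Omega$-notation.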

Proof of Lemma 5.4. Let $\mathcal{F}_{v,r}$ denote the σ-algebra generated by $(X_{u,i})_{|u-v| \le r,\ 0 \le i \le \sigma_u}$. Set $r_0 := 0$ and let $r_n$ denote the smallest integer $r > r_{n-1}$ for which the random variables stated in the lemma are $\mathcal{F}_{v,r}$-measurable. To prove the lemma, it suffices to show that $r_n \le r_{n-1} + 5\Delta n$ for $n \ge 1$, as this implies that $r_n \le 5\Delta n^2$. We henceforth abbreviate "is $\mathcal{F}$-measurable" to "is in $\mathcal{F}$". Let $n \ge 1$ and denote $r := r_{n-1}$. We aim to show that $S^n_v$ and $\{Z^n_{w,j}\}_{|w-v| \le \Delta n,\ j \ge 0}$ are in $\mathcal{F}_{v,r+5\Delta n}$, using that $S^{n-1}_v$ and $\{Z^{n-1}_{w,j}\}_{|w-v| \le \Delta(n-1),\ j \ge 0}$ are in $\mathcal{F}_{v,r}$. Throughout the proof, we repeatedly make use of the following easily verifiable properties: $v \preceq U^n_v \preceq v + ne_1$ and $(W^n_v, J^n_v) \in v + B_n$ for all $v \in \mathbb{Z}^d$ and $n \ge 0$, which, in particular, by (9), imply that $|U^n_v - v| \le n$ and $|W^n_v - v| \le \Delta n$ for all $v \in \mathbb{Z}^d$ and $n \ge 0$.
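For completeness, the passage from the one-step bound $r_n \le r_{n-1} + 5\Delta n$ to the quadratic bound $r_n \le 5\Delta n^2$ is a telescoping sum. The derivation below uses only $r_0 = 0$ and the one-step bound.

```latex
\[
  r_n \;=\; \sum_{k=1}^{n} \bigl(r_k - r_{k-1}\bigr)
      \;\le\; \sum_{k=1}^{n} 5\Delta k
      \;=\; \frac{5\Delta\, n(n+1)}{2}
      \;\le\; 5\Delta n^2
  \qquad (n \ge 1),
\]
% The last inequality uses n + 1 <= 2n for n >= 1.
```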
Step 1: Consider step $n$ of the algorithm for $v$ and let $C^n_v \in \{\mathrm{i}, \dots, \mathrm{iv}\}$ denote which case of the algorithm was executed. Let us show that $C^n_v$ is in $\mathcal{F}_{v,r+n}$. To this end, we first check that the event ) and $\{Z^{n-1}_{w,j}\}_{(w,j) \in v + B_{n-1}}$, both of which are in $\mathcal{F}_{v,r}$ by the definition of $r$ and by (9).
Next, let us check that L n−1 for all $v'$, it suffices to check that the event $\{(v',u,i,w,j) \in D^{n-1} \text{ for some } w,j\}$ is in $\mathcal{F}_{v,r+n}$ for any $v - ne_1 \preceq v' \preceq u \prec v + ne_1$ and any $i$. Indeed, this event is in $\mathcal{F}_{v',r} \subset \mathcal{F}_{v,r+|v'-v|} \subset \mathcal{F}_{v,r+n}$. Finally, we check that the event that $U^{n-1}_v$ is loaded at time $n-1$ is in $\mathcal{F}_{v,r+n}$. Note that, by (12)

Step 2: Observe that $(U^n_v, I^n_v, W^n_v, J^n_v)$ is in $\mathcal{F}_{v,r+n}$. Indeed, this follows from step 1, since in any case of the algorithm, $(U^n_v, I^n_v, W^n_v, J^n_v)$ is a deterministic function of $(U^{n-1}_v, I^{n-1}_v, W^{n-1}_v, J^{n-1}_v)$.
Step 3: Let us check that $T^n_v$ is in $\mathcal{F}_{v,r+3\Delta n}$. To this end, it suffices to check that the event that $v$ is the lexicographically minimal element of $Q^{n-1}(W^{n-1}_v, J^{n-1}_v)$ is in $\mathcal{F}_{v,r+3\Delta n}$, as $T^n_v$ equals 1 in this case and 0 otherwise. For this, it suffices to check that the set $Q^{n-1}(W^{n-1}_v, J^{n-1}_v)$ itself is in $\mathcal{F}_{v,r+3\Delta n}$. Since $(W^{n-1}_v, J^{n-1}_v)$ is in $\mathcal{F}_{v,r}$ and since $|W^{n-1}_v - v| \le \Delta(n-1)$, it suffices to check that $Q^{n-1}(w,j)$ is in $\mathcal{F}_{v,r+3\Delta n}$ for any $(w,j)$ such that $|w-v| \le \Delta n$. Fix such a $(w,j)$. We need to show that the event $\{v' \in Q^{n-1}(w,j)\}$ is in $\mathcal{F}_{v,r+3\Delta n}$ for any $v' \in \mathbb{Z}^d$. Fix $v'$ and note that $v' \notin Q^{n-1}(w,j)$ unless $|v'-w| \le \Delta(n-1)$. Thus, we may assume that $|v'-w| \le \Delta(n-1)$. Then, by what we have shown in step 1 and since $Z^{n-1}_{w,j}$ is in $\mathcal{F}_{v',r}$ (by the definition of $r$), the event $\{v' \in Q^{n-1}(w,j)\}$ is in $\mathcal{F}_{v',r+n} \subset \mathcal{F}_{w,r+n+|v'-w|}$. We have therefore shown that $Q^{n-1}(w,j)$ is in $\mathcal{F}_{w,r+2\Delta n} \subset \mathcal{F}_{v,r+2\Delta n+|w-v|} \subset \mathcal{F}_{v,r+3\Delta n}$.
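The radii in the final chain of inclusions of Step 3 can be checked by the following arithmetic. This is a sketch under two assumptions that are not spelled out in this part of the text: that $\Delta \ge 1$, and that the σ-algebras are monotone under re-centering, i.e., $\mathcal{F}_{u,s} \subset \mathcal{F}_{v,s+|u-v|}$, which follows from the triangle inequality.

```latex
% Assumptions: Delta >= 1, and F_{u,s} \subset F_{v,s+|u-v|} (triangle inequality).
\[
  r + n + |v' - w|
  \;\le\; r + n + \Delta(n-1)
  \;\le\; r + \Delta n + \Delta n
  \;=\; r + 2\Delta n ,
\]
\[
  r + 2\Delta n + |w - v|
  \;\le\; r + 2\Delta n + \Delta n
  \;=\; r + 3\Delta n .
\]
```

The first line uses $|v'-w| \le \Delta(n-1)$ and $n \le \Delta n$; the second uses $|w-v| \le \Delta n$.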
Step 4: Observe that $S^n_v$ is in $\mathcal{F}_{v,r+3\Delta n}$. Indeed, since $D^n_v$ is determined by $\{(U^t_v, I^t_v, W^t_v, J^t_v, T^t_v)\}_{1 \le t \le n}$, this follows immediately from steps 2 and 3.

Open questions
We have shown in Theorem 1.1 that the sub-critical Ising measure is fv-ffiid with stretched-exponential tails, and we know from Remark 1.3 that it is also ffiid with exponential tails. The following question naturally arises:

Question 6.1. Let $d \ge 2$ and let $\mu$ be the unique Gibbs measure for the Ising model on $\mathbb{Z}^d$ at inverse temperature $\beta < \beta_c(d)$. Is $\mu$ fv-ffiid with exponential tails?
A similar situation occurs in the more general setting of PCAs considered in Section 2, where we have shown in Theorem 2.1 that the limiting distribution of an exponentially uniformly ergodic PCA is fv-ffiid with stretched-exponential tails, and we know from Theorem 2.2 that it is also ffiid with exponential tails. A positive answer to the following natural question would yield a positive answer to the previous one:

Question 6.2. Let $\mu$ be the limiting distribution of an exponentially uniformly ergodic PCA (as defined in Section 2). Is $\mu$ fv-ffiid with exponential tails?
As mentioned in the remarks after Theorem 1.1, the critical Ising measure is known to be ffiid, but is not known to be fv-ffiid. This question was raised by van den Berg and Steif [5, Question 1] and we reiterate it here:

Question 6.3. Let $d \ge 2$ and let $\mu$ be the unique Gibbs measure for the Ising model on $\mathbb{Z}^d$ at the critical inverse temperature $\beta = \beta_c(d)$. Is $\mu$ fv-ffiid?