Ergodicity of some classes of cellular automata subject to noise

Cellular automata (CA) are dynamical systems on symbolic configurations on the lattice. They are also used as models of massively parallel computers. As dynamical systems, one would like to understand the effect of small random perturbations on the dynamics of CA. As models of computation, they can be used to study the reliability of computation against noise. We consider various families of CA (nilpotent, permutive, gliders, CA with a spreading symbol, surjective, algebraic) and prove that they are highly unstable against noise, meaning that they forget their initial conditions under the slightest positive noise. This is manifested as the ergodicity of the resulting probabilistic CA. The proofs involve a collection of different techniques (couplings, entropy, Fourier analysis), depending on the dynamical properties of the underlying deterministic CA and the type of noise.


Introduction
Consider a configuration of symbols (or colors) from a finite set $S$ on the sites of the hypercubic lattice $\mathbb{Z}^d$. A cellular automaton (CA) is a dynamical system on such configurations, obtained by iterating a local update rule simultaneously at every site of the lattice. When the updates are random, we have a Markov process called a probabilistic cellular automaton (PCA): at each time step, the new symbol at each site is randomly updated, independently of the others, according to a distribution prescribed by the current pattern of symbols on a finite collection of neighbouring sites.
CA and PCA have been widely studied with various motivations [60,56,58,21,9,61,38,34,1,7,52,46,44]. Despite the multiplicity of viewpoints, a central problem is to describe the asymptotic behaviour of the system and its dependence on the initial condition. Indeed, even when the local behaviour is simple, the global behaviour is generally difficult to predict, and there are only few CA or PCA for which we have a complete and explicit description of the asymptotic behaviour.
The most basic question about the asymptotic behaviour of a PCA is its ergodicity. A PCA is said to be ergodic if it asymptotically "forgets" its initial condition, meaning that the distribution of its configuration always converges to one and the same distribution regardless of the initial condition. In other words, a PCA is ergodic if its action on probability measures has a unique fixed point that attracts all the other measures. This paper concerns the ergodicity problem for the family of PCA obtained by perturbing CA with noise.
In computer science, deterministic CA are used as models of massively parallel computers (see e.g., [21] and the relevant chapters of [1] and [52]). In order to study the reliability of computation against noise, one is interested in the effect of small random perturbations on the dynamics of the CA. A prerequisite for the ability to perform computation reliably in the presence of noise is that the system should be able to remember at least one bit of information from its input, for otherwise the output will be pure noise and independent of the input. Thus, a CA that becomes ergodic when perturbed by noise cannot serve as a fault-tolerant computer in the presence of noise.
From the perspective of probability theory, noisy CA constitute a class of PCA that are close to being deterministic. In models originating from statistical physics, low noise corresponds to low temperature, and the study of low-noise PCA shares the same kind of challenges as in low-temperature models. In particular, the ergodicity question in the low-noise regime is closely related to the question of presence or absence of phase transition at low temperature [36,23,39,12]. From a more abstract point of view, CA form a rich class of topological dynamical systems, and the introduction of random perturbations allows one to study probabilistic notions of sensitivity and stability. For deterministic CA, the common tools for describing the possible asymptotic behaviour require the update rule to have specific algebraic or combinatorial structure. One approach to analyze the asymptotic distribution is to interpret the dynamics in terms of "particles" that move and interact [2,3,18,37,26]. An alternative approach relies on the CA to have an algebraic structure [41,50,17,28]. For deterministic CA, ergodicity is equivalent to nilpotency, a property which is algorithmically undecidable [33]. Nonetheless, nilpotent CA are not so widespread and a typical CA often exhibits different asymptotic behaviour depending on its initial condition. In fact, using the computation capabilities of CA, one can design deterministic CA having about any behaviour wanted [27].
The case of PCA is quite different: constructing a CA whose trajectories remain distinguishable under the influence of noise is a notoriously difficult problem. Most CA seem to be highly unstable against noise, meaning that they forget their initial conditions under the slightest positive noise. This is manifested as the ergodicity of the resulting PCA. The only known example of a one-dimensional CA that remains non-ergodic under sufficiently small positive noise has a sophisticated construction due to Peter Gács [19,20]. In higher dimensions, a family of examples is provided by Andrei Toom [57], but the problem remains highly non-trivial.
A variety of tools have been developed to study the ergodicity of PCA. However, most of these tools can only handle PCA for which all the transition probabilities are sufficiently large (i.e., the high-noise/high-temperature regime). In particular, ergodicity is often difficult to prove for noisy CA when the noise is small, even in cases where it appears clear from heuristics or simulations. Consider for instance the simplest case of a one-dimensional PCA with binary alphabet and neighbourhood of size 2, under the additional assumption of left-right symmetry of the update rule. Such a PCA is identified by three parameters. The standard methods can be used to handle more than 90% of the volume of the cube $[0,1]^3$ where the parameters lie [58, Chap. 7]. However, when approaching some edges of the cube, none of the known criteria for ergodicity holds, although one may expect ergodicity to be the norm as soon as the parameters belong to the interior of the cube.
To understand the frontiers between ergodicity and non-ergodicity, we pursue the program of identifying dynamical and combinatorial properties for a CA that guarantee the ergodicity of its random perturbations. We prove the ergodicity of various families of CA (nilpotent, permutive, gliders, CA with spreading symbols, surjective, algebraic) subject to noise, using a collection of different techniques (couplings, entropy, Fourier analysis).
The results are summarized in Section 2.4. Section 2.1 is dedicated to notation and terminology. In Section 2.2, we discuss the notion of ergodicity and prove two general results regarding the unique invariant measure of ergodic PCA. The various models of noise considered in this paper are introduced in Section 2.3. The ergodicity results are divided into three sections based on the method of proof they use: the coupling method (Sec. 3), the entropy method (Sec. 4) and the Fourier analysis method (Sec. 5). We conclude with some open problems in Section 6.

Notation and terminology
We shall generally refer to the book by Kůrka [38] and the survey by Kari [34] for background on deterministic cellular automata and to the surveys by Toom et al. [58] and Mairesse and Marcovici [46] for background on probabilistic cellular automata.
Let $S$ be a finite set of symbols, and $d \ge 1$ an integer. A configuration on the $d$-dimensional lattice $\mathbb{Z}^d$ is a map $x : \mathbb{Z}^d \to S$ assigning a symbol $x_k$ from $S$ to each site $k$ of $\mathbb{Z}^d$. We will often denote the set of all configurations by $\mathcal{X} := S^{\mathbb{Z}^d}$. A map $F : \mathcal{X} \to \mathcal{X}$ is a cellular automaton (CA) on $\mathcal{X}$ if there exist $n_1, n_2, \ldots, n_m \in \mathbb{Z}^d$ and $f : S^m \to S$ such that
$$(Fx)_k := f(x_{k+n_1}, \ldots, x_{k+n_m}) \tag{1}$$
for each $x \in \mathcal{X}$ and $k \in \mathbb{Z}^d$. The function $f : S^m \to S$ is called the local rule of the CA and the set $N := \{n_1, n_2, \ldots, n_m\}$ its neighbourhood. The set $N(k) := k + N = \{k + n : n \in N\}$ consists of the neighbours of site $k$. We also introduce $N^0 := \{0\}$, and $N^{t+1} := N^t + N = \{a + b : a \in N^t, b \in N\}$ for $t \ge 0$, so that $N^t$ can be thought of as the neighbourhood of the CA $F^t$. Similarly, we define $N^t(k) := k + N^t$.
The restriction of a configuration $x$ to a set $K \subseteq \mathbb{Z}^d$ is denoted by $x_K$. The translation (or shift) by $a \in \mathbb{Z}^d$ is the map $\sigma^a : \mathcal{X} \to \mathcal{X}$ defined by $(\sigma^a x)_k := x_{a+k}$ for each $k \in \mathbb{Z}^d$.
The set $\mathcal{X}$ of configurations is equipped with the product topology. If $K \subseteq \mathbb{Z}^d$ is a finite set and $y_K \in S^K$, we call the set $[y_K] := \{x \in S^{\mathbb{Z}^d} : x_k = y_k \text{ for all } k \in K\}$ a cylinder with base $K$. Each cylinder set is both open and closed, and the collection of all cylinder sets is a countable basis for the product topology of $\mathcal{X}$. According to the Curtis-Hedlund-Lyndon theorem (see e.g. [38, Thm. 5.2]), the CA on $\mathcal{X}$ are precisely identified by the maps $F : \mathcal{X} \to \mathcal{X}$ that are continuous and commute with all translations.
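As a concrete illustration of definition (1), the following Python sketch applies a local rule on a finite cyclic lattice $\mathbb{Z}/n\mathbb{Z}$ rather than on $\mathbb{Z}$ (an assumption made here so that configurations are finite). The names `ca_step` and `shift` are hypothetical and do not come from the paper. The example also checks, for one configuration, that the global map commutes with a translation.

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[int, ...]

def ca_step(x: Config, neighbourhood: Sequence[int],
            local_rule: Callable[..., int]) -> Config:
    """One step of a 1-D CA on the ring Z/nZ (periodic boundary is an assumption)."""
    n = len(x)
    # (F x)_k = f(x_{k+n_1}, ..., x_{k+n_m}), with indices taken modulo n
    return tuple(local_rule(*(x[(k + d) % n] for d in neighbourhood))
                 for k in range(n))

def shift(x: Config, a: int) -> Config:
    """Translation sigma^a: (sigma^a x)_k = x_{a+k}, with indices modulo n."""
    n = len(x)
    return tuple(x[(a + k) % n] for k in range(n))

# Example: the XOR rule with neighbourhood {0, 1} on a ring of 4 sites.
def xor(a: int, b: int) -> int:
    return (a + b) % 2

x = (1, 0, 0, 1)
y = ca_step(x, (0, 1), xor)
```

On this ring, `ca_step(shift(x, 1), (0, 1), xor)` and `shift(y, 1)` coincide, which is the finite analogue of commuting with translations.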
For a probabilistic cellular automaton (PCA) the local rule is randomized, and is independently applied at every site. More specifically, the local rule of a PCA is a stochastic matrix $\varphi : S^m \times S \to [0,1]$, so that $\sum_{b \in S} \varphi(a_1, a_2, \ldots, a_m)(b) = 1$ for each $a_1, a_2, \ldots, a_m \in S$. Starting from a configuration $x$, the symbol at each site $k$ is updated at random according to the distribution $\varphi(x_{k+n_1}, x_{k+n_2}, \ldots, x_{k+n_m})(\cdot)$, independently of the other sites. This is described by a transition kernel $\Phi$, where
$$\Phi(x, [y_K]) := \prod_{k \in K} \varphi(x_{k+n_1}, x_{k+n_2}, \ldots, x_{k+n_m})(y_k) \tag{2}$$
for every configuration $x \in \mathcal{X}$ and each cylinder set $[y_K]$. An evolution (or trajectory) of the PCA is a Markov process with transition kernel $\Phi$, that is, a sequence $X^0, X^1, \ldots$ of random configurations satisfying
$$\mathbb{P}\big(X^{t+1} \in [y_K] \,\big|\, X^0, X^1, \ldots, X^t\big) = \Phi(X^t, [y_K])$$
almost surely for every cylinder set $[y_K]$ and every $t \ge 0$. A bi-infinite evolution $\ldots, X^{-1}, X^0, X^1, \ldots$ is defined similarly. A PCA has positive rates if its local rule is strictly positive, meaning that for each $a_1, a_2, \ldots, a_m \in S$ and $b \in S$, we have $\varphi(a_1, a_2, \ldots, a_m)(b) > 0$.
The set of all continuous observables $h : \mathcal{X} \to \mathbb{C}$, denoted by $C(\mathcal{X})$, is a Banach space with the uniform norm $\|h\| := \sup_{x \in \mathcal{X}} |h(x)|$. A local observable is an observable $h \in C(\mathcal{X})$ that can be written as a linear combination of characteristic functions of cylinder sets, so that $h(x)$ depends on the symbols at only finitely many sites. The local observables form a dense linear subspace of $C(\mathcal{X})$, which we shall denote by $C_0(\mathcal{X})$.
The set of all Borel probability measures on $\mathcal{X}$ is denoted by $\mathcal{M}(\mathcal{X})$. A measure $\mu \in \mathcal{M}(\mathcal{X})$ is uniquely determined by the probabilities it associates to cylinder sets. Furthermore, a sequence $\mu_1, \mu_2, \ldots \in \mathcal{M}(\mathcal{X})$ of probability measures converges weakly to another measure $\mu \in \mathcal{M}(\mathcal{X})$ if and only if $\mu_n(E) \to \mu(E)$ for every cylinder set $E \subseteq \mathcal{X}$. With the weak topology, the space $\mathcal{M}(\mathcal{X})$ is compact and metrizable. Let $\mathcal{F}_A$ denote the sub-σ-algebra of the Borel sets generated by the cylinder sets with base $A$. We denote by $\|\mu - \nu\|_A$ the total variation distance between the restrictions of $\mu$ and $\nu$ to $\mathcal{F}_A$. This is the distance between the distributions of $X_A$ and $Y_A$ where $X$ and $Y$ are random configurations distributed according to $\mu$ and $\nu$.
A PCA kernel $\Phi$ naturally defines two continuous linear operators, one on $\mathcal{M}(\mathcal{X})$ and the other on $C(\mathcal{X})$. Following the usual convention (e.g. [58]), we write $\Phi$ on the right-hand side of measures and on the left-hand side of observables. Given a measure $\mu$, we denote by $\mu\Phi$ the measure defined by
$$(\mu\Phi)(E) := \int \Phi(x, E)\, \mu(\mathrm{d}x).$$
We will be concerned with PCA that are close to being deterministic. We say that a PCA $\Phi$ is an ε-perturbation of a deterministic CA $F$ if $\Phi$ and $F$ share the same alphabet $S$, and have a common neighbourhood $N$ for which their local rules satisfy $\varphi(a_1, a_2, \ldots, a_m)\big(f(a_1, a_2, \ldots, a_m)\big) \ge 1 - \varepsilon$ for all $a_1, a_2, \ldots, a_m \in S$, meaning that under $\Phi$, a deviation from $F$ may occur independently at each site with probability at most $\varepsilon$. In other words, $\Phi$ is an ε-perturbation of $F$ if
$$\Phi\big(x, \{y : y_k \ne (Fx)_k \text{ for all } k \in K\}\big) \le \varepsilon^{|K|}$$
for every configuration $x \in \mathcal{X}$ and every finite set $K \subseteq \mathbb{Z}^d$.

Ergodicity
The compactness of $\mathcal{M}(\mathcal{X})$ ensures that every PCA has at least one invariant measure (see e.g. [58, Prop. 2.5]). The non-empty set of invariant measures for a PCA is closed and convex. A PCA $\Phi$ is ergodic if it has a unique invariant measure $\pi$ that attracts every initial measure $\mu$, in the sense that $\mu\Phi^t \to \pi$ weakly as $t \to \infty$. Note that a PCA with a unique invariant measure may not be ergodic [8] (see also [32]). When the convergence is uniform among all initial measures, equivalently, when $\Phi^t(\,\cdot\,, [u]) \to \pi([u])$ uniformly for each cylinder set $[u]$, we say that $\Phi$ is uniformly ergodic. It is not known whether there exists a PCA that is ergodic but not uniformly ergodic. We conjecture that every ergodic PCA is uniformly ergodic.
Observe that the unique invariant measure of an ergodic PCA is shift-invariant, that is, $\pi \circ \sigma^{-k} = \pi$ for every $k \in \mathbb{Z}^d$. In view of the result of Goldstein et al. [23], it seems plausible that the unique invariant measure of a positive-rate ergodic PCA is always spatially mixing (≡ mixing under the shift action), that is,
$$\pi\big([u] \cap \sigma^{-k}[v]\big) \to \pi([u])\,\pi([v]) \quad \text{as } \|k\|_\infty \to \infty$$
for every two cylinder sets $[u]$ and $[v]$. The spatial mixing of the invariant measure is known for certain classes of ergodic PCA (see e.g. [39,55,45,43,10,6], as well as [59,47,31] in which the unique invariant measure is explicitly known). However, even the weaker condition of spatial ergodicity (≡ ergodicity under the shift action) is not known for general ergodic PCA.
We now present a more general condition that guarantees the spatial mixing of the unique invariant measure. Let $\Phi$ be a uniformly ergodic PCA with unique invariant measure $\pi$. Then, for every finite set $A \subseteq \mathbb{Z}^d$, the (maximum) distance from stationarity on $A$ at time $t$,
$$d_A(t) := \sup_{x \in \mathcal{X}} \big\|\Phi^t(x, \cdot) - \pi\big\|_A,$$
decreases to 0 as $t \to \infty$. Roughly speaking, the next proposition shows that if the speed at which $d_A(t)$ approaches zero depends only on $|A|$, but not on the shape of $A$, then the unique invariant measure is spatially mixing.
Proposition 2.1 (Spatial mixing of unique invariant measure). Let Φ be a uniformly ergodic PCA, and for each finite set A ⊆ Z d , let d A (t) denote the distance from stationarity on A at time t. Suppose there is a family of functions ρ n (t), n ∈ N such that d A (t) ≤ ρ |A| (t) and ρ n (t) → 0 as t → ∞. Then, the unique invariant measure of Φ is spatially mixing.
Proof. Let $\pi$ be the unique invariant measure of $\Phi$ and $N$ the neighbourhood of its local rule. Consider two finite patterns $u \in S^A$ and $v \in S^B$, and let $k \in \mathbb{Z}^d$. Then, This is because, given $x$, the random choices used to determine the patterns on $A$ and $k + B$ at time $t$ are independent (see Fig. 1). Thus, choosing Observe that $t_k \to \infty$ as $\|k\|_\infty \to \infty$. Hence,
All the ergodic PCA appearing in this paper satisfy the hypothesis of the above proposition. It would be interesting to know whether the unique invariant measures of these PCA satisfy any stronger mixing property, such as the strong mixing property of extremal Gibbs measures [22, Sec. 7.1]. A similar argument as above shows that under the hypothesis of Proposition 2.1, the unique invariant measure is spatially $k$-fold mixing for all $k$. We conjecture that for all the classes of ergodic PCA studied in this paper (with the possible exception of those in Section 3.5), the unique invariant measure is in fact measure-theoretically isomorphic to a Bernoulli process (see [55,4]).
A rather different type of question about a probability measure on $\mathcal{X}$ is whether the probabilities it associates to cylinder sets can be computed by an algorithm. It turns out that the unique invariant measure of an ergodic PCA is always computable provided the transition probabilities of the PCA are computable numbers. A real number $x$ is said to be computable if it can be approximated with arbitrary accuracy using an algorithm, that is, if there exists a computable function $f_x : \mathbb{N} \to \mathbb{Q}$ such that $|f_x(n) - x| < 1/n$ for all $n \in \mathbb{N}$. We say that a PCA $\Phi$ is computable if the values of its local rule $\varphi$ are computable real numbers. Let $S^\#$ denote the set of finite patterns, that is, the patterns $u \in S^A$ on finite sets $A \subseteq \mathbb{Z}^d$. A measure $\mu \in \mathcal{M}(\mathcal{X})$ is said to be computable if there exists a computable function $f_\mu : S^\# \times \mathbb{N} \to \mathbb{Q}$ such that $|f_\mu(u, n) - \mu([u])| < 1/n$ for every $u \in S^\#$ and $n \in \mathbb{N}$. Observe that if $\Phi$ is a computable PCA and $\mu$ is a computable measure, then $\mu\Phi$ is also a computable measure.
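To make the definition of a computable real number concrete, here is a minimal Python sketch for the number $\sqrt{2}$ (chosen only as an illustration; the function name `approx_sqrt2` is hypothetical). It returns a rational within $1/n$ of $\sqrt{2}$ using exact integer arithmetic.

```python
from fractions import Fraction
from math import isqrt

def approx_sqrt2(n: int) -> Fraction:
    """Return a rational q with |q - sqrt(2)| < 1/n, for any integer n >= 1."""
    # isqrt(2 * n * n) equals floor(n * sqrt(2)), so q = floor(n * sqrt(2)) / n
    # satisfies 0 < sqrt(2) - q < 1/n (strictly, since sqrt(2) is irrational).
    return Fraction(isqrt(2 * n * n), n)
```

The accuracy can be certified without floating point: the returned value $q$ satisfies $q^2 < 2 < (q + 1/n)^2$.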
Proposition 2.2 (Computability of unique invariant measure). Let Φ be a computable PCA with a unique invariant measure π. Then, π is computable.
Proof. We first present the sketch of the proof and then get into more details. Let $w \in S^A$ be a finite pattern and suppose we want to approximate $\pi([w])$ within accuracy $1/n$. The idea is that for every finite set $B \supseteq A$, we can approximately identify the set of measures that are close to being invariant when restricted to the σ-algebra $\mathcal{F}_B$ of events happening on $B$. More specifically, for $B \supseteq A$ and $m \in \mathbb{N}$, let
$$Q_{B,m} := \big\{\mu \in \mathcal{M}(\mathcal{X}) : \|\mu\Phi - \mu\|_B < 1/m\big\}.$$
We will show that given $B$ and $m \ge n$, we can algorithmically generate a finite set $R_{B,m}$ of representatives from $Q_{B,m}$ such that for every $\mu \in Q_{B,3m}$, there is a $\nu \in R_{B,m}$ with $\|\nu - \mu\|_B < 1/(3m) < 1/(2n)$. A compactness argument will show that for all sufficiently large $B$ and $m$, every two measures $\nu, \nu' \in R_{B,m}$ associate approximately the same probabilities to the cylinder set $[w]$, namely $|\nu'([w]) - \nu([w])| < 1/(2n)$. Since $\pi \in Q_{B,3m}$, it will then follow that for each $\nu \in R_{B,m}$, the value $\nu([w])$ approximates $\pi([w])$ with accuracy $1/n$.
More precisely, the algorithm proceeds as follows. Denote by $I_k := [-k, k]^d \cap \mathbb{Z}^d$ the centered hypercube of size $(2k+1)^d$ in $\mathbb{Z}^d$. We choose $m_0$ such that $I_{m_0} \supseteq A$. For $m = m_0, m_0 + 1, \ldots$ we generate a set $R_{I_m,m}$ with the above-mentioned property and calculate $\varepsilon := \max\{|\nu'([w]) - \nu([w])| : \nu, \nu' \in R_{I_m,m}\}$. Once $\varepsilon < 1/(2n)$, we stop and return $\nu([w])$ for an arbitrarily chosen element $\nu$ of $R_{I_m,m}$.
Let us first show that $\varepsilon$ will eventually become smaller than $1/(2n)$. Indeed, suppose that for every $m$, there are two elements $\mu_m, \mu'_m \in Q_{I_m,m}$ such that $|\mu'_m([w]) - \mu_m([w])| \ge 1/(2n)$. By compactness, there is a sequence $m_1 < m_2 < \cdots$ such that $\mu_{m_i}$ converges weakly to a measure $\mu$ and $\mu'_{m_i}$ converges weakly to a measure $\mu'$. Clearly $|\mu'([w]) - \mu([w])| \ge 1/(2n)$ and in particular, $\mu' \ne \mu$. On the other hand, from the definition of $Q_{B,m}$, it follows that both $\mu$ and $\mu'$ must be invariant under $\Phi$, hence a contradiction with the uniqueness of the invariant measure.
It remains to show that for each $B$ and $m$, a set $R_{B,m}$ with the prescribed properties can be generated. The PCA $\Phi$ induces an affine mapping from probability measures on $S^{N(B)}$ to probability measures on $S^B$. It follows easily that $\|\mu'\Phi - \mu\Phi\|_B \le \|\mu' - \mu\|_{N(B)}$ for every $\mu, \mu' \in \mathcal{M}(\mathcal{X})$. Fix an arbitrary symbol ∈ $S$, and for finite $C \subseteq \mathbb{Z}^d$ and $k \in \mathbb{N}$, define where $\delta_x$ denotes the Dirac measure centered at $x$. When restricted to $\mathcal{F}_C$, the elements of $M_{C,k}$ are precisely those measures whose probabilities are rational with denominator $k$. In particular, for every $\mu \in \mathcal{M}(\mathcal{X})$, there exists a measure $\nu \in M_{C,k}$ such that $\|\nu - \mu\|_C < |S|^{|C|}/k$. Given $B$ and $m$, construct the set For this measure, we have which means $\nu \in R_{B,m}$. Hence, $R_{B,m}$ has the desired properties.

Models of noise
In this article, we study the ergodicity problem for perturbations of deterministic CA. We mainly focus on perturbations obtained when adding random and independent errors to the updates of a deterministic CA. The transition probabilities of the resulting PCA will thus have the form $\Phi(x, E) := \Theta(Fx, E)$, where $F$ is the deterministic CA and $\Theta$ a noise kernel. We call such a perturbation a noisy version of $F$. The noise kernel is itself assumed to be a PCA transition kernel (albeit a simple one) so that the updates of the symbols at distinct sites are independent. The noise is said to be positive if its kernel has positive rates.
Zero-range noise. 2) concern zero-range noise. Various classes of zero-range noise will be considered, each with its own interpretation. Each of our proof techniques will be well suited for some of these noise models.
Memoryless noise. A zero-range noise is memoryless if its noise matrix can be written as θ(a, b) = (1 − ε)δ a (b) + εq(b), where 0 ≤ ε ≤ 1, q is a probability distribution on S, and δ a is the distribution with unit mass at a. Under a memoryless noise, a symbol is erased with probability ε and replaced with an independent random symbol drawn from distribution q. We call ε the error probability and q the replacement distribution of the noise.
Additive noise. Suppose that the alphabet $S$ is identified with a finite Abelian group $(G, +)$. Under an additive noise with noise distribution $q$, each symbol $a$ is replaced with a symbol $a + N$, where $N$ is an $S$-valued random variable with distribution $q$. The noise matrix can thus be written as $\theta(a, b) := q(b - a)$ for each $a, b \in S$.
Permutation noise. The permutation noise is an extension of additive noise, where each symbol $a$ is replaced with a symbol $\varsigma(a)$, where $\varsigma$ is a random permutation of $S$ drawn according to a fixed distribution $q$. Observe that the noise matrix of a permutation noise can be written as a convex combination of permutation matrices $A_\varsigma(a, b) := \delta_{\varsigma(a)}(b)$, and therefore is a doubly-stochastic matrix. Conversely, the Birkhoff-von Neumann theorem implies that every zero-range noise with a doubly-stochastic matrix is in fact a permutation noise. In particular, a permutation noise is precisely a zero-range noise that preserves the uniform distribution on $S$. The notion of noise in a weakly symmetric communication channel (see [11]) is a special case of the permutation noise.
Birth-death noise. In some of our examples (see Section 3.5), the alphabet has the form $S = \{0, 1\}^n$, where 1 and 0 represent the presence and absence of "particles" or "walls" at $n$ different "layers" of the system. Under a birth-death noise, particles/walls appear and disappear independently at each layer, thus the noise matrix has the form for $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$ in $S$.
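The memoryless and additive noise matrices defined above can be written down and checked directly. The following Python sketch does so with exact rational arithmetic. It assumes that the alphabet is identified with the cyclic group $\mathbb{Z}/s\mathbb{Z}$ (the text allows any finite Abelian group), and the function names are hypothetical.

```python
from fractions import Fraction

def memoryless_noise(eps, q):
    """Noise matrix theta(a, b) = (1 - eps) * delta_a(b) + eps * q(b)."""
    s = len(q)
    return [[(1 - eps) * (1 if a == b else 0) + eps * q[b] for b in range(s)]
            for a in range(s)]

def additive_noise(q):
    """Noise matrix theta(a, b) = q(b - a), with S taken to be Z/sZ (assumption)."""
    s = len(q)
    return [[q[(b - a) % s] for b in range(s)] for a in range(s)]

def is_stochastic(theta):
    """Every row sums to 1."""
    return all(sum(row) == 1 for row in theta)

def is_doubly_stochastic(theta):
    """Every row and every column sums to 1."""
    return is_stochastic(theta) and all(sum(col) == 1 for col in zip(*theta))

q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
eps = Fraction(1, 10)
```

With this non-uniform `q`, the additive noise matrix is doubly stochastic (so it is a permutation noise in the sense above), whereas the memoryless noise matrix is stochastic but not doubly stochastic; it becomes doubly stochastic when `q` is uniform.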

Summary of results
We prove several results regarding the ergodicity of noisy CA. Each result concerns a class of CA with a specific dynamical property subject to a specific type of noise. The results are divided into three categories, depending on the type of tools used in their proofs: coupling arguments (Sec. 3), entropy (Sec. 4) and Fourier analysis (Sec. 5).
The ergodicity in the high-noise regime (Thm. 3.5) is rather standard and can be proven using various approaches. Here we present a coupling proof using the so-called envelope PCA (introduced in [6]), which we find most elegant.
The nilpotent CA are special in that they are ergodic in the absence of noise. A coupling argument will show that the ergodicity persists for small perturbations of nilpotent CA (Thm. 3.9).
The ergodicity of a CA that has a spreading symbol is intuitively plausible. We provide three different proofs (Thms. 3.10, 3.11 and 5.3), each with a different model of noise and a different degree of generality.
Theorems 3.12 and 3.14 concern the ergodicity of simple systems of "particles" (or "gliders") moving and interacting on the lattice, where the noise occasionally destroys particles or creates new ones.
The ergodicity of permutive CA subject to permutation noise (Thm. 3.16) is a special case of a result of Vasilyev [59]. The argument is based on the identification of a certain finite-state time-inhomogeneous Markov chain that is hidden inside the model. We also present an alternative (though similar) argument using entropy (Sec. 4.4).
Surjective CA constitute a broad class of CA (including e.g., those addressed in Theorems 3.14, 3.16 and 5.1). For general surjective CA with additive noise, we are only able to prove "ergodicity modulo shift" (Thm. 4.1), that is, the convergence towards equilibrium when the starting measure is shift-invariant.
The ergodicity of the XOR CA subject to noise (Thm. 5.1) is an application of the Fourier analysis approach to the ergodicity problem developed by Toom et al. [58, Chap. 4].
Aside from cases VII and VIII in which the invariant measures are explicitly known, in all the classes of ergodic PCA treated in this paper, the unique invariant measure is spatially mixing and computable. The computability of the unique invariant measure holds in general, as demonstrated in Proposition 2.2. The spatial mixing is proven in each case with the help of Proposition 2.1. Furthermore, Proposition 3.3 below provides a perfect sampling algorithm for the unique invariant measure in cases I-IV and VI.
Let us remark that except for Theorems 3.11, 3.14 and 3.16, all the results in this paper are valid in any number of dimensions. The proof of Theorem 3.11 makes use of a result on oriented bond percolation in 1 + 1 dimensions, and thus relies crucially on the CA being one-dimensional. Nevertheless, it might be possible to use the same idea in higher dimensions. Theorems 3.14 and 3.16 are restricted to the one-dimensional case for expositional convenience. In higher dimensions, the definition of a permutive CA is more cumbersome.
For the sake of comparison, let us now recall an example of a simple CA which, in presence of small noise, remains non-ergodic. Needless to say, this example belongs to none of the CA families II-X mentioned in the above table.
The NEC-majority CA (north-east-centre majority) is the CA $T$ on $\{0,1\}^{\mathbb{Z}^2}$ defined by $(Tx)_{(i,j)} := \mathrm{maj}\big(x_{(i,j)}, x_{(i,j+1)}, x_{(i+1,j)}\big)$. In other words, in one iteration of $T$, each symbol on the lattice is replaced with the symbol that is in majority among its northern neighbour, eastern neighbour and itself. Observe that $T$ is monotonic (i.e., switching some 0s into 1s in a configuration $x$ may turn some 0s in $Tx$ into 1s but not the other way around) and symmetric with respect to the $0 \leftrightarrow 1$ exchange. Moreover, it can be shown that $T$ has the erosion property on the all-0 configuration (and by symmetry, also on the all-1 configuration). Namely, $T$ keeps the all-0 (resp., all-1) configuration unchanged, and if $x$ is any configuration in which all but finitely many sites have symbol 0 (resp., symbol 1), then there is a finite time $t$ for which $T^t x$ is the all-0 configuration (resp., the all-1 configuration). Toom [57] (see [58, Chaps. 9 and 10]) proved that for sufficiently small $\varepsilon > 0$, every ε-perturbation of the NEC-majority CA is non-ergodic. In fact, he showed that in any monotonic CA $T$, any homogeneous configuration $z$ on which $T$ has the erosion property is stable against perturbations in the sense that the trajectory of any small perturbation of $T$ starting from $z$ remains forever concentrated on configurations that agree with $z$ on the great majority of sites.
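The erosion property of the NEC-majority CA can be observed on a small example. The sketch below is an illustration only: it runs the rule on a finite torus rather than on $\mathbb{Z}^2$ (an assumption), takes "north" to be row $i-1$ and "east" to be column $j+1$ (a convention chosen here), and uses hypothetical function names.

```python
def nec_majority_step(x):
    """One step of NEC-majority on a torus; x is a list of rows with entries 0/1."""
    h, w = len(x), len(x[0])
    # New symbol = majority among the site itself, its northern neighbour
    # (row i - 1) and its eastern neighbour (column j + 1).
    return [[1 if x[i][j] + x[(i - 1) % h][j] + x[i][(j + 1) % w] >= 2 else 0
             for j in range(w)] for i in range(h)]

def island(h, w, rows, cols):
    """All-0 configuration on an h x w torus with 1s on rows x cols."""
    return [[1 if i in rows and j in cols else 0 for j in range(w)]
            for i in range(h)]

def erosion_time(x, max_steps=100):
    """Number of steps until the all-0 configuration is reached, or None."""
    for t in range(max_steps + 1):
        if not any(any(row) for row in x):
            return t
        x = nec_majority_step(x)
    return None

square = island(8, 8, range(2, 5), range(2, 5))  # a 3 x 3 island of 1s
```

In this example the island is eaten away from its north-eastern corner, and `erosion_time(square)` is finite, as the erosion property predicts for a finite island.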

Coupling method
Intuitively, a PCA is ergodic if it "forgets" its initial condition. In some cases, it is possible to prove ergodicity in a constructive fashion by means of a coupling, that is by running the process simultaneously from different initial conditions using a common source of randomness, and showing that all trajectories eventually merge.
In this section, we use coupling arguments to prove the ergodicity of some classes of noisy CA. The arguments for most of the results in this section (Secs. 3.2-3.4, 3.5.2) are based on "backward" couplings (a.k.a. coupling from the past). Only in Section 3.5.1 do we use a "forward" coupling. The coupling in the last result (Sec. 3.6) is rather different and merges the trajectories only on a finite window.

Forward and backward couplings
Recall that a coupling of two probability measures $\mu$ and $\nu$ is simply a pair $(X, Y)$ of random variables defined on the same probability space such that $X$ is distributed according to $\mu$ and $Y$ is distributed according to $\nu$. Couplings can be used to obtain upper bounds on the total variation distance between two measures. In the special case where $\mu, \nu \in \mathcal{M}(\mathcal{X})$ are measures on the configuration space $\mathcal{X}$, the inequality
$$\|\mu - \nu\|_A \le \mathbb{P}(X_A \ne Y_A)$$
holds for every coupling $(X, Y)$ of $\mu$ and $\nu$ and each finite set $A \subseteq \mathbb{Z}^d$. This is known as the coupling inequality (see e.g. [42]). By a coupling of a PCA $\Phi$ we mean a coupling of two trajectories of $\Phi$, that is, a sequence $(X^t, Y^t)_{t \ge 0}$ where both $(X^t)_{t \ge 0}$ and $(Y^t)_{t \ge 0}$ are distributed according to the evolution of the PCA $\Phi$.
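The coupling inequality can be checked by hand on a finite example. The Python sketch below considers two distributions on a finite set (standing in for the distributions of $X_A$ and $Y_A$), assumes the convention that the total variation distance equals half the $\ell^1$ distance, and compares it with the mismatch probability under two couplings: the independent coupling and a maximal coupling. The function names are hypothetical.

```python
from fractions import Fraction

def tv_distance(mu, nu):
    """Total variation distance, taken here as half the l1 distance."""
    return sum(abs(m - n) for m, n in zip(mu, nu)) / 2

def mismatch_independent(mu, nu):
    """P(X != Y) when X ~ mu and Y ~ nu are drawn independently."""
    return 1 - sum(m * n for m, n in zip(mu, nu))

def mismatch_maximal(mu, nu):
    """P(X != Y) under a maximal coupling, which puts mass min(mu, nu) on X = Y."""
    return 1 - sum(min(m, n) for m, n in zip(mu, nu))

mu = [Fraction(1, 2), Fraction(1, 2)]
nu = [Fraction(3, 4), Fraction(1, 4)]
```

Here the distance is $1/4$, the independent coupling only gives the weaker bound $1/2$, and the maximal coupling attains the distance exactly. This is why a well-chosen coupling matters in ergodicity proofs.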
The following lemma is a basic tool for proving the ergodicity of a PCA.
Then, (µΦ t ) t≥0 converges weakly to π. Proof. For every finite set A ⊂ Z d , we have, by the coupling inequality which goes to 0 as t → ∞, meaning that (µΦ t ) t≥0 converges weakly to π.
Following the same idea, we have the following criterion for uniform ergodicity.
This CA satisfies $F^{12}(x) = 0^{\mathbb{Z}}$ for all $x \in \{0, 1, 2\}^{\mathbb{Z}}$. Without noise the system dies out; the noise adds small local perturbations that do not propagate. Ergodicity is proven in Theorem 3.9.
Spreading CA (Sec. 3.4 and Sec. 5.2). The local rule is given by $F(x)_i := x_{i-1} x_i x_{i+1} \bmod 3$. Without noise, fractal patterns can appear but they are unstable because of the spreading symbol. Noise helps destroy these patterns by introducing the spreading symbol at random positions evenly distributed on the lattice. Ergodicity is established in Theorems 3.10 and 3.11.
Gliders with annihilation (Sec Then, the PCA is uniformly ergodic and its unique invariant measure is spatially mixing.
Proof. Let π be an invariant measure for Φ and µ any other measure. Following the argument of Lemma 3.1, for every two configurations x, y ∈ X and each finite set Integrating over x with respect to µ and over y with respect to π, we find that µΦ t − π A ≤ |A| ρ(t). Therefore, the PCA is uniformly ergodic with unique invariant measure π. Furthermore, d A (t) ≤ |A| ρ(t) and the spatial mixing of π follows from Proposition 2.1.

This CA consists of non-interacting particles moving with constant speed in between walls. The particles reflect upon hitting the walls. Without noise, the behaviour is very regular: the walls are static and the movement of each particle is periodic. Noise mixes things up. Theorem 3.14 shows the ergodicity. Since the CA is surjective, Theorem 4.1 also shows the ergodicity "modulo translations". The invariant measure is the uniform measure.
The local rule is given by The noisy version is ergodic by Theorems 3.16 or 4.1.

Additive CA
The local rule is given by $F(x)_i := x_{i-1} + x_i + x_{i+1} \bmod 3$. This CA randomizes its initial condition even in the absence of noise: starting from a sufficiently random configuration, its distribution converges to the uniform Bernoulli measure (see e.g. [41]). The ergodicity of the noisy version is given by Theorem 3.16 or 4.1. See also Theorem 5.1.
One way to couple the evolutions of a given PCA from two different initial configurations is to update the configurations iteratively using a common source of randomness. Let $\Phi$ be a PCA with local rule $\varphi$. An update function for $\varphi$ is a function $f : S^m \times [0, 1] \to S$ such that for all $(a_1, a_2, \ldots, a_m) \in S^m$ and $b \in S$, we have
$$\mathbb{P}\big(f(a_1, a_2, \ldots, a_m; U) = b\big) = \varphi(a_1, a_2, \ldots, a_m)(b)$$
whenever $U$ is a random variable uniformly distributed over the unit interval $[0, 1]$. The update function together with a collection of independent random samples uniformly drawn from $[0, 1]$ can be used to simulate the PCA. Let $\mathcal{S} := [0, 1]^{\mathbb{Z}^d}$. Given an update function $f$, we define the global update map $\Psi : \mathcal{X} \times \mathcal{S} \to \mathcal{X}$ by
$$\Psi(x; u)_k := f(x_{k+n_1}, x_{k+n_2}, \ldots, x_{k+n_m}; u_k).$$
For $t \ge 1$, we recursively define $\Psi^t : \mathcal{X} \times \mathcal{S}^t \to \mathcal{X}$ by $\Psi^1(x; u) := \Psi(x; u)$ and
$$\Psi^{t+1}(x; u^1, \ldots, u^{t+1}) := \Psi\big(\Psi^t(x; u^1, \ldots, u^t); u^{t+1}\big).$$
By construction, when $U := (U_i)_{i \in \mathbb{Z}^d}$ is a collection of independent random variables uniformly distributed over $[0, 1]$, the configuration $\Psi(x; U)$ is distributed according to the measure $\Phi(x, \cdot)$. More generally, if $U^1, U^2, \ldots, U^t$ are independent random configurations uniformly chosen from $\mathcal{S}$, that is, if $(U^n_i)_{i \in \mathbb{Z}^d, 1 \le n \le t}$ is a collection of independent random variables uniformly distributed over $[0, 1]$, then the sequence
$$x,\ \Psi^1(x; U^1),\ \Psi^2(x; U^1, U^2),\ \ldots,\ \Psi^t(x; U^1, U^2, \ldots, U^t)$$
is distributed according to the evolution of $\Phi$ from time 0 to time $t$ with initial configuration $x$.
It is sometimes useful to simulate the PCA from the past. Let $(U^n_i)_{i \in \mathbb{Z}^d, n \in \mathbb{N}^-}$ be a collection of independent uniformly distributed random variables chosen from . . 
, U 0 ) can be interpreted as the configuration at time 0 obtained when simulating the PCA Φ from configuration x at time −t and using the random samples ( In words, p t (Φ) is the probability that, when we simulate Φ with configuration x at time −t and using the random samples (U n i ) i∈Z d ,n∈N − , the symbol at the origin at time 0 is independent of x. The following proposition provides another criterion for uniform ergodicity in terms of p t (Φ). Under the same criterion, one can algorithmically generate a perfect sample from the unique invariant measure of Φ. This is an adaptation to PCA of the coupling-from-the-past algorithm of Propp and Wilson [51], which is developed in [6]. In the present setting, a perfect sampling algorithm for a probability measure µ ∈ M (X ) is an algorithm that, given a finite set A ⊆ Z d and using an unbounded source of independent random samples uniformly drawn from [0, 1], outputs a random pattern W A such that P( Then, Φ is uniformly ergodic. Furthermore, the unique invariant measure of Φ is spatially mixing and has a perfect sampling algorithm. Proof. Let us imagine simulating the PCA Φ from time −t in the past up to time 0, starting from two configurations X −t and Y −t . We can couple the configurations obtained at time 0 by using a family U = (U n i ) i∈Z d ,n∈N − of independent uniform random samples from [0, 1], and setting X 0 Take X −t to be a fixed configuration x and choose Y −t at random, independently from U , according to an invariant measure π of the PCA. By the coupling inequality, for every finite set A ⊂ Z d , we have Since x is arbitrary and p t (Φ) → 1 as t → ∞, it follows that Φ is uniformly ergodic with unique invariant measure π. Furthermore, from (31) we get d A (t) ≤ |A| 1 − p t (Φ) . Therefore, the conditions of Proposition 2.1 are satisfied and π is spatially mixing.
Let us now present a perfect sampling algorithm for the unique invariant measure $\pi$ of $\Phi$. We assume that we have access to a family $U = (U^n_i)_{i\in\mathbb{Z}^d,\,n\in\mathbb{N}^-}$ of independent uniform random samples from $[0,1]$. Let $A$ be a finite subset of $\mathbb{Z}^d$. Since $p_t(\Phi) \to 1$ as $t \to \infty$, we know that almost surely, there exists an integer $T \ge 1$ depending on $U$, such that the map $x \mapsto \Psi^T(x;U^{-T},U^{-T+1},\dots,U^0)_A$ is constant. This constant is distributed exactly according to $\pi$. More specifically, for a finite pattern $w \in S^A$, the probability that $\Psi^T(x;U^{-T},U^{-T+1},\dots,U^0)_A = w$ is exactly $\pi([w])$. Furthermore, since $\Psi^t(x;U^{-t},U^{-t+1},\dots,U^0)_A$ depends only on $x_{A+\mathcal{N}^t}$ and on $(U^n_i)_{i\in A+\mathcal{N}^{-n},\,-t<n\le 0}$, we can indeed check for each $t = 1, 2, \dots$ whether the function $x \mapsto \Psi^t(x;U^{-t},U^{-t+1},\dots,U^0)_A$ is constant or not.
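The procedure above can be sketched in a few lines of Python for a toy case. This is only an illustration under stated assumptions, not the algorithm of [6]: the lattice is one-dimensional, the neighbourhood is assumed to be $\{0, 1, \dots, r\}$, only the symbol at the origin is sampled, and constancy of the map is checked by brute force over all starting patterns, which is exponential in $t$. The names `cftp_origin`, `noisy_xor` and `draw` are hypothetical. The essential point is that samples, once drawn, are stored and reused when the starting time is pushed further into the past.

```python
import itertools
import random


def noisy_xor(pattern, u, eps=0.6):
    """Hypothetical update function: XOR rule on {0, 1} with memoryless noise."""
    if u < eps / 2:
        return 0  # error, symbol replaced by 0
    if u < eps:
        return 1  # error, symbol replaced by 1
    return (pattern[0] + pattern[1]) % 2  # no error: apply the deterministic rule


def cftp_origin(f, symbols, r, draw, max_t=10):
    """Coupling from the past for the symbol at site 0 and time 0.

    f(pattern, u) is an update function, the neighbourhood is {0, ..., r},
    and draw() returns a fresh uniform sample from [0, 1].
    Returns (symbol, t) on coalescence, or (None, max_t) if none was observed.
    """
    samples = {}  # (site, time) -> sample; drawn once and then reused

    def u(i, n):
        if (i, n) not in samples:
            samples[(i, n)] = draw()
        return samples[(i, n)]

    for t in range(1, max_t + 1):
        outcomes = set()
        # Only sites 0..r*t at time -t can influence site 0 at time 0.
        for start in itertools.product(symbols, repeat=r * t + 1):
            row = list(start)
            for n in range(-t + 1, 1):  # times -t+1, ..., 0
                row = [f(tuple(row[i:i + r + 1]), u(i, n))
                       for i in range(len(row) - r)]
            outcomes.add(row[0])
        if len(outcomes) == 1:  # the map x -> symbol at the origin is constant
            return outcomes.pop(), t
    return None, max_t


value, t = cftp_origin(noisy_xor, (0, 1), 1, random.Random(0).random)
```

The brute-force check is practical only for small `max_t`; the envelope PCA introduced in the next section is one way to avoid enumerating all starting patterns.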

The high-noise regime
In this section, we prove an ergodicity criterion holding in the high-noise regime. In particular, it gives a simple condition ensuring the ergodicity of deterministic CA when perturbed by a high enough zero-range noise (see Prop. 3.6 and its two corollaries).
Let $\Phi$ be a PCA with alphabet $S$, neighbourhood $\mathcal{N} = \{n_1, \dots, n_m\}$ and local rule $\varphi$. In order to prove the ergodicity of $\Phi$ using Proposition 3.3, we need to design an update function $f : S^m \times [0,1] \to S$ for which the dependence of $f(a_1,\dots,a_m;u)$ on $(a_1,\dots,a_m) \in S^m$ is weak. A natural idea is to choose an update function with the property that for every $b \in S$, we have $\mathbb{P}\bigl(f(a_1,\dots,a_m;U) = b \text{ for all } a_1,\dots,a_m \in S\bigr) = \min_{a_1,\dots,a_m\in S}\varphi(a_1,\dots,a_m)(b)$ whenever $U$ is a uniform sample from $[0,1]$. In that case, with probability at least $\sum_{b\in S}\min_{a_1,\dots,a_m\in S}\varphi(a_1,\dots,a_m)(b)$, the knowledge of $(a_1,\dots,a_m) \in S^m$ will not be used for computing the value $f(a_1,\dots,a_m;U)$. The notion of envelope PCA pursues this idea and provides a simple ergodicity criterion in the high-noise regime.
Instead of running the PCA from different initial configurations, we define a new PCA on an extended alphabet, containing a symbol ? representing sites whose values are not known (i.e., which may differ between the different copies) and we run it from a single initial configuration containing only the symbol ? . Each time we are able to make the different copies match on a site, the symbol ? is replaced by a symbol b ∈ S on which the different copies agree. An evolution of the envelope PCA thus encodes a coupling of different copies of the original PCA, with a symbol ? denoting sites where the copies disagree. If the density of symbol ? converges to 0 when time goes to infinity, it means that the original PCA is forgetting its initial condition, hence it is ergodic.
Let us now go into more detail. We introduce a new alphabet $\tilde S \triangleq S \cup \{?\}$, containing an additional question mark symbol, and we define a partial order on $\tilde S$ by declaring $a \prec\ ?$ for every $a \in S$. We say that $a \in S$ is compatible with $b \in \tilde S$ if $a \preceq b$. The envelope of the PCA $\Phi$ is another PCA $\tilde\Phi$ with alphabet $\tilde S$, neighbourhood $\mathcal{N}$ and local rule $\tilde\varphi$ defined as follows: for $a_1,\dots,a_m \in \tilde S$ and $b \in S$, $\tilde\varphi(a_1,\dots,a_m)(b) \triangleq \min\{\varphi(a'_1,\dots,a'_m)(b) : a'_1 \preceq a_1, \dots, a'_m \preceq a_m\}$, where the minimum is taken over all $a'_1,\dots,a'_m$ in $S$. The probability of transition to symbol $?$ is then given by $\tilde\varphi(a_1,\dots,a_m)(?) \triangleq 1 - \sum_{b\in S}\tilde\varphi(a_1,\dots,a_m)(b)$. From a configuration $x \in \tilde S^{\mathbb{Z}^d}$, the symbol at site $k$ is thus updated to a symbol $b \in S$ with a probability that is the minimum of transition probabilities according to $\Phi$ to symbol $b$, among all possible neighbourhood patterns for site $k$ that are compatible with $x$. With the remaining probability, the symbol at site $k$ is updated to $?$.
The envelope PCA was introduced in [6] as a tool to prove the ergodicity of a PCA and to generate perfect samples from its unique invariant measure. While it is particularly convenient for the high-noise regime, the envelope PCA has also been successfully exploited to prove the ergodicity of some models in the low-noise regime [30]. Similar ideas have been pursued by others [16]. The idea of the envelope PCA is reminiscent of the minorant PCA introduced by Toom et al. [58, Chap. 3], which can be used in a more or less similar fashion to prove ergodicity in the high-noise regime.
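As a concrete illustration of the definition of the envelope local rule, the following Python sketch computes the transition probabilities of the envelope for a given neighbourhood pattern. It assumes the local rule is supplied as a function `phi(pattern)` returning a dictionary from symbols to probabilities; the names `envelope_rule`, `noisy_xor_rule` and `UNKNOWN` are hypothetical and chosen for this example only.

```python
import itertools

UNKNOWN = "?"  # the extra symbol of the extended alphabet


def noisy_xor_rule(pattern, eps=0.25):
    """Hypothetical local rule: XOR on {0, 1}, then uniform memoryless noise."""
    target = (pattern[0] + pattern[1]) % 2
    return {b: (1 - eps) * (b == target) + eps / 2 for b in (0, 1)}


def envelope_rule(phi, symbols, pattern):
    """Transition probabilities of the envelope PCA for one neighbourhood pattern.

    pattern is a tuple over symbols + {UNKNOWN}. For each b in symbols, the
    result is the minimum of phi(p)[b] over all patterns p compatible with
    pattern; the remaining mass goes to UNKNOWN.
    """
    choices = [symbols if a == UNKNOWN else (a,) for a in pattern]
    compatible = list(itertools.product(*choices))
    result = {b: min(phi(p).get(b, 0.0) for p in compatible) for b in symbols}
    result[UNKNOWN] = 1.0 - sum(result.values())
    return result


known = envelope_rule(noisy_xor_rule, (0, 1), (0, 1))
partly_unknown = envelope_rule(noisy_xor_rule, (0, 1), (UNKNOWN, 1))
```

With a fully known pattern the envelope coincides with the original rule and never produces `?`; with one unknown neighbour of the XOR rule, each symbol keeps only the mass contributed by the noise, and the rest is assigned to `?`.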
The following corollary of Proposition 3.3 gives a sufficient condition for ergodicity in terms of the envelope PCA.
Lemma 3.4. Suppose that the density $\tilde\Phi^t\bigl(?^{\mathbb{Z}^d}, [?]\bigr)$ of symbols $?$ at time $t$ starting from the initial configuration $?^{\mathbb{Z}^d}$ converges to $0$ as $t \to \infty$. Then, the PCA $\Phi$ is uniformly ergodic, and its unique invariant measure is spatially mixing and admits a perfect sampling algorithm.
The fact that the symbol $?$ dies out is equivalent to the ergodicity of the envelope PCA $\tilde\Phi$, but the ergodicity of the original PCA $\Phi$ does not in general imply the ergodicity of $\tilde\Phi$. When the alphabet has more than two elements, the definition of the envelope PCA can be refined so as to keep more information about the possible values that a question mark symbol represents [6].
In the evolution of the envelope PCA, at each time step, the symbol at a site is updated to $?$ only if at least one of its neighbours is in state $?$, and in that case, it becomes a $?$ with probability at most $p_?(\Phi) \triangleq 1 - \sum_{b\in S}\min_{a_1,\dots,a_m\in S}\varphi(a_1,\dots,a_m)(b)$. This quantity measures the dependence of the transition probabilities on the value of the neighbourhood.
Let us consider an oriented graph $G$ describing the dependence relation between the sites in the space-time diagram of the PCA. The vertices of $G$ are the elements of $\mathbb{Z}^d \times \mathbb{N}$, and there is an edge from $(k,t)$ to $(\ell,t+1)$ if $k \in \ell + \mathcal{N}$. For a given parameter $p \in [0,1]$, the directed site percolation on $G$ consists in declaring each site to be open with probability $p$ and closed otherwise, independently for different sites. One can show that there is a critical value $p_c(\mathcal{N}) \in (0,1)$, such that when $p < p_c(\mathcal{N})$, there is almost surely no infinite open (oriented) path. By comparison with a branching process, one can easily show that $p_c(\mathcal{N}) \ge 1/|\mathcal{N}|$. In one dimension, bounds on the value of $p_c(\mathcal{N})$ are known (see [48]).
By dominating the appearances of symbol $?$ in the space-time diagram of the envelope PCA by a directed site percolation with parameter $p_?(\Phi)$, one proves that when $p_?(\Phi) < p_c(\mathcal{N})$, the symbol $?$ dies out.
Theorem 3.5. Let $\Phi$ be a PCA with neighbourhood $\mathcal{N}$, and let $p_c(\mathcal{N})$ denote the critical value of the $(d+1)$-dimensional directed site percolation with neighbourhood $\mathcal{N}$. If $p_?(\Phi) < p_c(\mathcal{N})$, then the PCA $\Phi$ is uniformly ergodic, and the unique invariant measure of $\Phi$ is spatially mixing and admits a perfect sampling algorithm.
As a consequence, we obtain the following proposition, and the two corollaries that follow from it.
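Proposition 3.6. Let $F$ be a deterministic CA with neighbourhood $\mathcal{N}$, and let $\theta$ be a zero-range noise. If $\sum_{b\in S}\min_{a\in S}\theta(a,b) > 1 - p_c(\mathcal{N})$, then the noisy version of $F$ with noise $\theta$ is uniformly ergodic. Furthermore, the unique invariant measure in that case is spatially mixing and admits a perfect sampling algorithm.
Proof. The noisy version of $F$ with noise $\theta$ satisfies $p_? \le 1 - \sum_{b\in S}\min_{a\in S}\theta(a,b)$, so the claim follows from Theorem 3.5.
Corollary 3.7. Let $F$ be a deterministic CA with neighbourhood $\mathcal{N}$, and let $\theta$ be a memoryless noise with error probability $\varepsilon$. If $\varepsilon > 1 - p_c(\mathcal{N})$, then the noisy version of $F$ with noise $\theta$ is uniformly ergodic, and has an invariant measure that is spatially mixing and admits a perfect sampling algorithm.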
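The step from the bound on $p_?$ to Corollary 3.7 can be spelled out as follows. This is a short worked computation rather than part of the original argument; it assumes $|S| \ge 2$ and writes the memoryless noise in the form $\theta(a,b) = (1-\varepsilon)\delta_a(b) + \varepsilon q(b)$, with $q$ the replacement distribution.

```latex
\begin{align*}
\min_{a\in S}\theta(a,b)
  &= \min_{a\in S}\bigl[(1-\varepsilon)\,\delta_a(b) + \varepsilon\,q(b)\bigr]
   = \varepsilon\,q(b)
   \qquad\text{(attained at any } a \neq b\text{)},\\
\sum_{b\in S}\min_{a\in S}\theta(a,b)
  &= \varepsilon\sum_{b\in S} q(b) = \varepsilon,\\
p_? &\le 1 - \varepsilon .
\end{align*}
```

Hence $p_? < p_c(\mathcal{N})$ as soon as $1 - \varepsilon < p_c(\mathcal{N})$, that is, as soon as $\varepsilon > 1 - p_c(\mathcal{N})$, which is the hypothesis of Corollary 3.7.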

Small perturbations of nilpotent CA
A CA $F$ is nilpotent if there is a non-negative integer $N$ such that $F^N$ is a constant function. Clearly, the unique value of $F^N$ has to be a configuration $\alpha^{\mathbb{Z}^d}$ with the same symbol $\alpha \in S$ at each site. Observe that the NEC-majority CA (Example 2.3) is not nilpotent, for it has two distinct fixed points. Without noise, a nilpotent CA "forgets" its initial configuration in a finite number of steps. It is therefore hard to imagine that adding noise could keep the CA from forgetting its initial configuration. On the other hand, the envelope PCA introduced in the previous section is not directly applicable to prove the ergodicity of the noisy CA. Indeed, suppose that $F$ is nilpotent. If $F$ itself is not a constant function, then for an $\varepsilon$-perturbation of $F$ with small $\varepsilon$, the value $p_?$ is close to $1$, hence Theorem 3.5 does not say anything about the ergodicity of such perturbations of $F$. Nevertheless, the ergodicity can still be shown using a different coupling-from-the-past argument.
Theorem 3.9. Let $F$ be a nilpotent CA. There exists $\varepsilon_c > 0$ such that for $\varepsilon < \varepsilon_c$, every $\varepsilon$-perturbation of $F$ is uniformly ergodic. Furthermore, the unique invariant measure of such a perturbation is spatially mixing and admits a perfect sampling algorithm.
Proof. Let ε > 0, and let Φ be an ε-perturbation of F . We prove that if ε is small enough, we can couple all the trajectories of Φ.
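Since $\Phi$ is an $\varepsilon$-perturbation of $F$, its local rule can be written as $\varphi(a_1,\dots,a_m)(b) = (1-\varepsilon)\,\delta_{f(a_1,\dots,a_m)}(b) + \varepsilon\,\tilde\varphi(a_1,\dots,a_m)(b)$, where $f$ is the local rule of $F$ and $\tilde\varphi$ is another local rule. In particular, $\Phi$ admits an update function $\hat f$ satisfying $\hat f(a_1,\dots,a_m;u) = f(a_1,\dots,a_m)$ whenever $u > \varepsilon$. Let $U = (U^n_k)_{k\in\mathbb{Z}^d,\,n\in\mathbb{N}^-}$ be a collection of independent random samples uniformly drawn from $[0,1]$. We simulate $\Phi$ from the past using the update function $\hat f$ and the samples $U$. Let $K$ be a finite subset of $\mathbb{Z}^d$. We prove that almost surely, there exists a time $T > 0$ such that the trajectories from all possible starting configurations at time $-T$ provide the same pattern $X^0_K$ on $K$ at time $0$. In particular, $p_t(\Phi) \to 1$ as $t \to \infty$, and the uniform ergodicity and the spatial mixing of the invariant measure follow from Proposition 3.3.
Let $N \ge 1$ be such that $F^N$ is constant. The value of this constant has to be a configuration $\alpha^{\mathbb{Z}^d}$ with the same symbol $\alpha$ at every site. Let $\mathcal{N}$ denote the neighbourhood of the local rule of $F$. Consider the following subset of the space-time $\mathbb{Z}^d \times \mathbb{N}^-$: $W \triangleq \{(n,-j) : 0 \le j \le N-1 \text{ and } n \in \mathcal{N}^j\}$. We say that an error has occurred at position $(k,-t)$ if $U^{-t}_k \le \varepsilon$. Since $F^N$ is a constant function, if the set $(k,-t) + W$ contains no error, then we know that $X^{-t}_k = \alpha$. For $k \in \mathbb{Z}^d$ and $t \ge 0$, let us define the random set $E(k,-t) \triangleq (k + \mathcal{N}^N) \times \{-t-N\}$ if $(k,-t) + W$ contains an error, and $E(k,-t) \triangleq \varnothing$ otherwise. We recursively define a sequence of sets $A_0, A_1, \dots$ by setting $A_0 \triangleq K \times \{0\}$ and $A_{i+1} \triangleq \bigcup_{(k,-t)\in A_i} E(k,-t)$ for $i \ge 0$. Clearly, $t = iN$ for every $(k,-t) \in A_i$. Observe that if $A_i = \varnothing$, then running the simulation from time $-iN$ till $0$ using the samples in $U$ will lead to a pattern $X^0_K$ on $K$ at time $0$ that does not depend on the choice of the configuration $X^{-iN}$ at time $-iN$ (see Fig. 4).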
It remains to prove that if ε is small enough, then almost surely, there exists an integer after which all the sets A i are empty.
We set $m_i \triangleq |\mathcal{N}^i|$. If there is an error inside $(k,-iN) + W$, then $|E(k,-iN)| = m_N$. Let $(\ell,-t)$ be a space-time position with $t = iN + j$ and $0 \le j \le N-1$. Then, we have $(\ell,-t) \in (k,-iN) + W$ if and only if $\ell \in k + \mathcal{N}^j$. Consequently, $|A_{i+1}|$ is bounded by the sum of at most $|A_i| \times M$ independent random variables, each taking value $L$ with probability $\varepsilon$, and $0$ with probability $1 - \varepsilon$, where $L$ and $M$ are constants depending only on $N$ and $\mathcal{N}$. If $\varepsilon < 1/(LM)$, a comparison with a branching process shows that there is extinction: almost surely, the sets $A_i$ are eventually empty. The claim follows.
Let us remark that the bound given for ε in the above proof is rough and can certainly be improved.
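One way to make the extinction step explicit is a first-moment computation, sketched below. It is offered as an illustration and not as the argument of the original proof; it assumes, as in the proof, that each of the at most $|A_i| \times M$ bounding variables takes value $L$ with probability $\varepsilon$ conditionally on $A_i$, and it uses $|A_0| = |K|$.

```latex
\begin{align*}
\mathbb{E}\bigl[\,|A_{i+1}| \,\big|\, A_i\,\bigr] &\le \varepsilon L M\,|A_i|,\\
\mathbb{E}\,|A_i| &\le (\varepsilon L M)^i\,|A_0| = (\varepsilon L M)^i\,|K|,\\
\mathbb{P}(A_i \neq \varnothing) &\le \mathbb{E}\,|A_i| \le (\varepsilon L M)^i\,|K|
  \xrightarrow[i\to\infty]{} 0
  \qquad\text{when } \varepsilon < 1/(LM).
\end{align*}
```

Since $A_i = \varnothing$ implies $A_{i+1} = \varnothing$, the events $\{A_i \neq \varnothing\}$ are decreasing in $i$, so the probability that no $A_i$ is empty is the limit of the right-hand side, namely $0$.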

CA with a spreading state
Let $F$ be a deterministic CA with symbol set $S$ and neighbourhood $\mathcal{N}$. We say that a symbol $\alpha \in S$ is spreading under $F$ if $|\mathcal{N}| \ge 2$ and $F(x)_k = \alpha$ whenever $x_{k+n} = \alpha$ for some $n \in \mathcal{N}$. By definition, a CA can have at most one spreading symbol. For comparison, let us note that in Toom's NEC-majority CA (Example 2.3), neither of the two symbols 0 and 1 is spreading. Here, we prove the ergodicity of perturbations of a CA with a spreading symbol for two classes of perturbations. Another class of perturbations is treated in Section 5.2, under the extra assumption that the alphabet is binary.

Memoryless noise
Consider a memoryless noise $\theta$ with error probability $\varepsilon$ and replacement distribution $q$, so that $\theta(a,b) = (1-\varepsilon)\,\delta_a(b) + \varepsilon\,q(b)$. We say that the noise is $\alpha$-positive if $q(\alpha) > 0$.
Theorem 3.10. Let $F$ be a CA with spreading state $\alpha$. Then, every perturbation of $F$ by an $\alpha$-positive memoryless noise is uniformly ergodic. Furthermore, the unique invariant measure of the perturbation is spatially mixing and admits a perfect sampling algorithm.
The proof we propose below has the same flavour as the one of Theorem 3.9 for nilpotent CA, and uses the idea of coupling from the past. Observe however that unlike for nilpotent CA, in some sense, the errors that are introduced here by the random noise favour ergodicity.
Let $\hat q : [0,1] \to S$ be a function with the property that if $U$ is a random variable uniformly distributed over $[0,1]$, then $\mathbb{P}(\hat q(U) = b) = q(b)$. We use an update function $\hat f : S^m \times [0,1] \to S$ defined by $\hat f(a_1,\dots,a_m;u) \triangleq \hat q(u/\varepsilon)$ if $u \le \varepsilon$, and $\hat f(a_1,\dots,a_m;u) \triangleq f(a_1,\dots,a_m)$ otherwise, where $f$ denotes the local rule of $F$. Observe that if $U$ is a random variable uniformly distributed over $[0,1]$, then $\mathbb{P}\bigl(\hat f(a_1,\dots,a_m;U) = b \,\big|\, U \le \varepsilon\bigr) = q(b)$ and $\mathbb{P}\bigl(\hat f(a_1,\dots,a_m;U) = f(a_1,\dots,a_m) \,\big|\, U > \varepsilon\bigr) = 1$.
As in the proof of Theorem 3.9, we let $U = (U^n_k)_{k\in\mathbb{Z}^d,\,n\in\mathbb{N}^-}$ be a collection of independent random samples uniformly drawn from $[0,1]$. We simulate $\Phi$ from the past using the update function $\hat f$ and the samples $U$. We prove that almost surely, there exists a time $T > 0$ such that the trajectories from all possible starting configurations at time $-T$ provide the same value $X^0_0$ for site $0$ at time $0$. It follows that $p_t(\Phi) \to 1$ as $t \to \infty$, and the uniform ergodicity of $\Phi$ and the spatial mixing of its invariant measure follow from Proposition 3.3.
We say that an error has occurred at space-time position $(k,-t)$ if $U^{-t}_k \le \varepsilon$. By construction, we know that if there is an error at position $(k,-t)$, then the value $X^{-t}_k$ does not depend on the past: it is only a function of $U^{-t}_k$. For $k \in \mathbb{Z}^d$ and $t \ge 0$, let us define the set $E(k,-t) \triangleq \varnothing$ if there is an error at position $(k,-t)$, and $E(k,-t) \triangleq (k + \mathcal{N}) \times \{-t-1\}$ otherwise. We recursively define sets $A_0, A_1, \dots$ by setting $A_0 \triangleq \{(0,0)\}$ and $A_{i+1} \triangleq \bigcup_{(k,-t)\in A_i} E(k,-t)$ for $i \ge 0$. The set $A \triangleq \bigcup_{i\ge 0} A_i$ can be seen as an oriented tree, that is, a directed acyclic graph with edges from each $(k,-t) \in A$ to the points of $E(k,-t)$. Observe that a point $(k,-t) \in A$ is a leaf of the tree if and only if there is an error at position $(k,-t)$. Now, let us distinguish two cases (see Fig. 5):
(I) The tree $A$ is finite. In this case, there exists an integer $T \ge 0$ such that $A_T = \varnothing$ (hence $A_i = \varnothing$ for all $i \ge T$), and the value $X^0_0$ is only a function of the finite family of samples $U^{-t}_k$ with $k \in \mathcal{N}^t$ and $0 \le t \le T-1$.
(II) The tree $A$ is infinite. In this case, almost surely the tree contains an infinite number of leaves. Indeed, each point $(k,-t)$ is an error with probability $\varepsilon$, independently for different points. Furthermore, conditioned on the event that $(k,-t)$ is a leaf, the symbol $X^{-t}_k$ takes value $\alpha$ with probability $q(\alpha) > 0$, independently for different leaves. Thus, almost surely, the tree $A$ contains at least one leaf labeled by the symbol $\alpha$, at some time $-T$. Using the fact that $\alpha$ is a spreading symbol, we can then trace the tree up to time $0$ to find that $X^0_0 = \alpha$.
In both cases, the value $X^0_0$ is almost surely uniquely determined by a finite number of samples in the family $U$. In particular, almost surely there is a time $T > 0$ such that if we simulate the PCA from time $-T$ using the samples in $U$, all possible choices of the configuration $X^{-T}$ lead to the same value $X^0_0$ for site $0$ at time $0$.
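The backward exploration of the tree can be illustrated by the following Python sketch. It is a toy instance under explicit assumptions and not taken from the paper: the alphabet is $\{0,1\}$, the neighbourhood is $\{0,1\}$, the deterministic rule is $\min$ (so that $\alpha = 0$ is spreading), and the replacement distribution gives weight `q_alpha` to $\alpha$. The names `sample_origin`, `local_rule` and `draw` are hypothetical. A value of `None` stands for "not yet determined by the samples explored so far"; the spreading property lets a single known $\alpha$ among the neighbours settle the value even when the other neighbour is unknown.

```python
ALPHA = 0  # spreading symbol of the example rule below


def local_rule(a0, a1):
    """Example rule on {0, 1}: F(x)_k = min(x_k, x_{k+1}), so 0 is spreading."""
    return min(a0, a1)


def sample_origin(eps, q_alpha, draw, max_depth=200):
    """Determine the symbol at site 0 and time 0 by exploring samples backwards.

    Returns (symbol, T), where T is the number of time layers explored, or
    (None, max_depth) if the symbol is still undetermined at that depth.
    """
    samples = {}  # (site, time) -> uniform sample, drawn once and reused

    def u(k, n):
        if (k, n) not in samples:
            samples[(k, n)] = draw()
        return samples[(k, n)]

    def value(k, n, horizon, memo):
        # Symbol at (k, n) if forced by samples at times > -horizon, else None.
        if n <= -horizon:
            return None
        if (k, n) in memo:
            return memo[(k, n)]
        s = u(k, n)
        if s <= eps:
            # Error: the symbol is drawn from q, independently of the past.
            result = ALPHA if s <= eps * q_alpha else 1
        else:
            left = value(k, n - 1, horizon, memo)
            right = value(k + 1, n - 1, horizon, memo)
            if left == ALPHA or right == ALPHA:
                result = ALPHA  # spreading symbol wins regardless of the rest
            elif left is None or right is None:
                result = None
            else:
                result = local_rule(left, right)
        memo[(k, n)] = result
        return result

    for horizon in range(1, max_depth + 1):
        x = value(0, 0, horizon, {})
        if x is not None:
            return x, horizon
    return None, max_depth
```

A call such as `sample_origin(0.1, 0.5, random.Random(1).random)` (after `import random`) returns a symbol together with the depth at which it became determined; both cases (I) and (II) of the proof are handled by the same loop.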
Theorem 3.11. Let $F$ be a one-dimensional CA with neighbourhood $\mathcal{N} = \{0,1\}$ and spreading state $\alpha$. There exists an $\varepsilon_c > 0$ such that for $\varepsilon < \varepsilon_c$, every $\alpha$-positive $\varepsilon$-perturbation of $F$ is uniformly ergodic, with an invariant measure that is spatially mixing and admits a perfect sampling algorithm.
Figure 5: Illustration of the proof of Theorem 3.10. Errors are represented by red dots. In the first case, the tree is finite, and $X^0_0$ is a function of the values given by the memoryless noise $q$ at the errors. In the second case, the tree is infinite: then it contains an infinite number of leaves, and there is almost surely one leaf having value $\alpha$, so that $X^0_0 = \alpha$.
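Proof. Let $\Phi$ be an $\alpha$-positive $\varepsilon$-perturbation of $F$. The local rule of $\Phi$ can be written as $\varphi(a_0,a_1)(b) = (1-2\varepsilon)\,\delta_{f(a_0,a_1)}(b) + 2\varepsilon\,\tilde\varphi(a_0,a_1)(b)$, where $f$ is the local rule of $F$ and $\tilde\varphi$ is another local rule. We have used $2\varepsilon$ instead of $\varepsilon$ to make sure that $\tilde\varphi$ is also $\alpha$-positive. Let $\delta \triangleq 2\varepsilon \cdot \min\{\tilde\varphi(a_0,a_1)(\alpha) : a_0, a_1 \in S\}$ and note that $\delta > 0$. Let $\hat f : S^2 \times [0,1] \to S$ be an update function for $\varphi$ with the property that $\hat f(a_0,a_1;u) = f(a_0,a_1)$ when $u > 2\varepsilon$ and $\hat f(a_0,a_1;u) = \alpha$ when $u \le \delta$.
Let $U \triangleq (U^n_i)_{i\in\mathbb{Z},\,n\in\mathbb{Z}}$ be a collection of independent uniform samples from $[0,1]$. We use the update function $\hat f$ and the collection $U$ to simulate $\Phi$ from a time far in the past. Let $X^t$ denote the configuration at time $t$.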
Since $\alpha$ is a spreading symbol for $F$ and at each space-time point the local rule is applied with probability at least $1 - 2\varepsilon$, the spread of $\alpha$ dominates an oriented site percolation with parameter $1 - 2\varepsilon$. More specifically, consider the "space-time" graph with vertex set $\mathbb{Z} \times \mathbb{Z}$ and oriented edges $(i,n-1) \to (i,n)$ and $(i+1,n-1) \to (i,n)$ for all $i, n \in \mathbb{Z}$. Declare a point $(i,n)$ open if $U^n_i > 2\varepsilon$, and let $C$ denote the open cluster of $(0,0)$. If $X^0_0 = \alpha$, then for every point $(i,n)$ in the open cluster of $(0,0)$, we necessarily have $X^n_i = \alpha$. But even more is true. Let $Q_n \triangleq \{i : (i,n) \in C\}$ be the set of descendants of $(0,0)$ at time $n$ and denote by $L_n \triangleq \inf Q_n$ and $R_n \triangleq \sup Q_n$ the leftmost and rightmost elements of $Q_n$ (with the convention $\inf\varnothing \triangleq +\infty$ and $\sup\varnothing \triangleq -\infty$). Observe that if $X^0_0 = \alpha$, then for every $n > 0$ and $i$ with $L_n \le i \le R_n$, the value $X^n_i$ is uniquely determined by the samples $U^m_j$ with $0 < m \le n$ and $-m \le j \le 0$. Let us call the set $\mathcal{C} \triangleq \{(i,n) : n > 0 \text{ and } L_n \le i \le R_n\}$ the cone of $(0,0)$. The cone of a point $(k,t)$ is defined in a similar fashion and is denoted by $\mathcal{C}(k,t)$.
In order to prove ergodicity, we claim that when $\varepsilon$ is small enough (in particular, when $2\varepsilon < (1-p_c)^2$, where $p_c$ is the critical value for oriented bond percolation on $\mathbb{Z} \times \mathbb{Z}$), the point $(0,0)$ is almost surely in the cone of a point $(k,-t)$ with $U^{-t}_k < \delta$ (see Fig. 6). This implies that $p_t(\Phi) \to 1$ as $t \to \infty$, and the uniform ergodicity of $\Phi$ and the spatial mixing of its invariant measure follow from Proposition 3.3.
To prove the latter claim, we invoke a result of Durrett [14, Sec. 3] on oriented bond percolation. In the oriented bond percolation, each edge of the above-mentioned space-time graph is declared open with probability $p$, independently of the other edges. Observe that when $p = 1 - \sqrt{2\varepsilon}$, the oriented bond percolation with parameter $p$ and the oriented site percolation with parameter $1 - 2\varepsilon$ can be coupled in such a way that a point $(i,n)$ is open if and only if at least one of its two incoming edges is open. With such a coupling, the open bond-cluster of $(0,0)$ will be included in the open site-cluster of $(0,0)$. Let $L \triangleq \limsup_{n\to\infty} L_n/n$ and $R \triangleq \liminf_{n\to\infty} R_n/n$. It follows from the result of Durrett that when $p > p_c$, on the event that the open bond-cluster of $(0,0)$ is infinite, we almost surely have $L < -1/2 < R$.
As a consequence, when $2\varepsilon < (1-p_c)^2$, there exists a value $i_0 > 0$ such that, with positive probability, every point $(-i,2i)$ with $i \ge i_0$ is in the cone of $(0,0)$. Observe that the cone of $(0,0)$ is independent of the variable $U^0_0$. Therefore, with positive probability, $U^0_0 < \delta$ and every point $(-i,2i)$ with $i \ge i_0$ is in the cone of $(0,0)$. Let $E(k,t)$ denote the event that $U^t_k < \delta$ and every point $(k-i,t+2i)$ with $i \ge i_0$ is in the cone of $(k,t)$. Since the process $(U^n_i)_{i\in\mathbb{Z},\,n\in\mathbb{Z}}$ is ergodic with respect to the shift along $(-1,2)$, we find that with probability $1$, the events $E(k,-2k)$ occur for infinitely many $k > 0$. In particular, almost surely, there exists a point $(k,-2k)$ with $k \ge i_0$ for which $U^{-2k}_k < \delta$ and the cone of $(k,-2k)$ includes $(0,0)$. This concludes the proof.
The assumption $\mathcal{N} = \{0,1\}$ is not essential, and the proof can be extended to the case where $\mathcal{N} = \{\ell, \ell+1, \dots, r\}$ is an interval in $\mathbb{Z}$. Extending the result to more general neighbourhoods would require additional technical details.

Gliders with annihilation
A gliders CA is a deterministic CA describing the movement of particles of different types according to given velocities. More specifically, a gliders CA with $N \ge 1$ particle types and particle velocities $v_1, \dots, v_N \in \mathbb{Z}^d$ is a CA $G$ with alphabet $S \triangleq \{0,1\}^N$ defined by $G(x)_{k,i} \triangleq x_{k+v_i,i}$ for every $x \in S^{\mathbb{Z}^d}$, $k \in \mathbb{Z}^d$ and $i \in \{1,\dots,N\}$. Here, $x_{k,i}$ denotes the $i$th component of the symbol at site $k$ in $x$, and $x_{k,i} = 1$ indicates the presence of a particle of type $i$ at site $k$. Thus, $G$ simply shifts the particles of type 1 with vector $v_1$, the particles of type 2 with vector $v_2$ and so forth. The neighbourhood of $G$ is clearly $\mathcal{N}_G \triangleq \{v_1,\dots,v_N\}$.
An annihilation rule is a composition $h \triangleq h_{i_n,j_n} \circ \cdots \circ h_{i_1,j_1}$ of elementary annihilation rules. Observe that elementary annihilation rules may not commute. An annihilation CA is a CA $A$ with neighbourhood $\mathcal{N}_A \triangleq \{0\}$ whose local rule is an annihilation rule. A gliders with annihilation is a composition $F \triangleq A \circ G$ of a gliders CA $G$ followed by an annihilation CA $A$. In words, a gliders with annihilation represents the movement of $N$ types of particles where certain pairs of particles annihilate upon encounter at the same position. Note that, due to the discrete nature of time, particles moving in opposite directions can possibly pass each other without encountering at the same position.
Recall that a birth-death noise on $S = \{0,1\}^N$ is a zero-range noise under which particles of different types appear and disappear independently from one another. The matrix of a birth-death noise can therefore be written as $\theta(a,b) = \prod_{i=1}^{N}\theta_i(a_i,b_i)$. Each matrix $\theta_i$ has the form $\theta_i = \begin{pmatrix} 1-\beta_i & \beta_i \\ \delta_i & 1-\delta_i \end{pmatrix}$, where $\beta_i \in [0,1]$ and $\delta_i \in [0,1]$ respectively represent the birth rate and death rate of particles of type $i$.
Theorem 3.12. Let $F \triangleq A \circ G$ be a gliders with annihilation, and let $\theta$ be a positive birth-death noise. The noisy version of $F$ with noise $\theta$ is uniformly ergodic, with a spatially mixing invariant measure.
Proof. We couple the action of the noise $\theta$ on two configurations $x$ and $y$ in the following manner. For each site $k \in \mathbb{Z}^d$ and each $i \in \{1,\dots,N\}$, we draw independently a random number $U_{k,i}$, uniformly distributed on $[0,1]$. We update $x$ and $y$ using the same samples, and according to the following rule:
• if $x_{k,i} = 0$ (resp. $y_{k,i} = 0$) and $1 - \beta_i \le U_{k,i} \le 1$, we add a particle of type $i$ at position $k$ in configuration $x$ (resp. in $y$),
• if $x_{k,i} = 1$ (resp. $y_{k,i} = 1$) and $0 \le U_{k,i} \le \delta_i$, we remove the particle of type $i$ at position $k$ in configuration $x$ (resp. in $y$),
• otherwise, $x_{k,i}$ (resp. $y_{k,i}$) remains unchanged.
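The coupling rule for a single particle type at a single site can be written as a small Python function, which also makes it easy to see for which values of the shared sample two copies are forced to agree. This is a minimal sketch; the names `noisy_bit` and `agreement_fraction` are hypothetical, and the second function merely estimates, on a regular grid of sample values, the fraction of $[0,1]$ on which a copy holding $0$ and a copy holding $1$ end up equal.

```python
def noisy_bit(bit, u, beta, delta):
    """Apply the birth-death noise to one particle bit using the shared sample u."""
    if bit == 0 and u >= 1 - beta:
        return 1  # birth of a particle
    if bit == 1 and u <= delta:
        return 0  # death of a particle
    return bit    # unchanged


def agreement_fraction(beta, delta, grid=1000):
    """Fraction of sample values for which copies starting from 0 and 1 agree."""
    hits = 0
    for j in range(grid):
        u = (j + 0.5) / grid
        if noisy_bit(0, u, beta, delta) == noisy_bit(1, u, beta, delta):
            hits += 1
    return hits / grid
```

For instance, with `beta = 0.2` and `delta = 0.3` the two copies agree when the sample is at most $0.3$ or at least $0.8$, so the estimated fraction is close to $0.5$.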
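Let us first assume that $\delta_i < 1 - \beta_i$. Then, if $U_{k,i} \in [0,\delta_i]$, whatever the values of $x_{k,i}$ and $y_{k,i}$ are, we know that after the update, there is no particle of type $i$ at position $k$ in either configuration. On the other hand, if $U_{k,i} \in [1-\beta_i,1]$, we know that after the update, there is a particle of type $i$ at position $k$ in both configurations. In either event, a disagreement of type $i$ at position $k$ is removed, which happens with probability $\beta_i + \delta_i$. If instead $\delta_i \ge 1 - \beta_i$, a disagreement survives only when $U_{k,i} \in [1-\beta_i,\delta_i]$ (in which case both bits are flipped), so it is removed with probability $2 - (\beta_i + \delta_i)$. In both cases, a disagreement of type $i$ is removed with probability $\varepsilon_i \triangleq \min\{\beta_i + \delta_i,\ 2 - (\beta_i + \delta_i)\}$, which is positive since the noise is positive.
Let us make a coupling $(X^t, Y^t)_{t\ge 0}$ of the PCA recursively as follows. Let $U \triangleq (U^t_{k,i})_{k\in\mathbb{Z}^d,\,1\le i\le N,\,t\in\mathbb{N}}$ be a collection of independent random samples uniformly drawn from $[0,1]$. Starting with arbitrary configurations $X^0 \triangleq x^0$ and $Y^0 \triangleq y^0$, at each time step, we first apply the deterministic CA $F = A \circ G$ and then perturb the two configurations with the noise, using the random samples in $U$ and the coupling strategy sketched above.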
We say that two configurations $x$ and $y$ have a disagreement of type $i$ at position $k$ if $x_{k,i} \ne y_{k,i}$. For a finite subset $K \subset \mathbb{Z}^d$, let $D_K(x,y) \triangleq \sum_{k\in K}\|x_k - y_k\|_1$ be the number of disagreements between $x$ and $y$ in $K$. Note that $D_K(x,y) \le N \cdot |K|$.
In the two configurations $G(x)$ and $G(y)$, there can be a disagreement $G(x)_{k,i} \ne G(y)_{k,i}$ of type $i$ at position $k$ if and only if $x_{k+v_i,i} \ne y_{k+v_i,i}$. Let us recall that $G$ has neighbourhood $\mathcal{N}_G \triangleq \{v_1,\dots,v_N\}$. It follows that $D_K\bigl(G(x),G(y)\bigr) \le D_{K+\mathcal{N}_G}(x,y)$. Next, observe that the action of the annihilation CA $A$ does not increase the number of disagreements. Indeed, when applying an elementary annihilation rule for the types $i$ and $j$ at position $k$,
• if there is no disagreement of types $i$ and $j$, then after the action of the annihilation rule, there is still no disagreement,
• if exactly one of the two components $i$ and $j$ contains a disagreement, then in the updated configuration, still exactly one of the two components contains a disagreement,
• if there are two disagreements of types $i$ and $j$, then in the updated configuration, there is either no disagreement (if there were particles of both types in one of the configurations, and none in the other) or still two disagreements (if one configuration has only a particle of type $i$ and the other only a particle of type $j$).
The other components are not affected by the annihilation rule. Combining the effects of the glider $G$ and the annihilation $A$, we find that $D_K\bigl(F(x),F(y)\bigr) \le D_{K+\mathcal{N}_G}(x,y)$ for each two configurations $x$ and $y$.
Applying the noise, the expected number of disagreements decreases by a factor at least $1 - \varepsilon$, where $\varepsilon \triangleq \min_{i\in\{1,\dots,N\}}\varepsilon_i$. It follows that $\mathbb{E}\bigl[D_K(X^{t+1},Y^{t+1})\bigr] \le (1-\varepsilon)\,\mathbb{E}\bigl[D_{K+\mathcal{N}_G}(X^t,Y^t)\bigr]$ and thus $\mathbb{E}\bigl[D_K(X^t,Y^t)\bigr] \le (1-\varepsilon)^t\,D_{K+\mathcal{N}_G^t}(x^0,y^0)$. Consequently, for every $k \in \mathbb{Z}^d$ and $t \ge 0$, we have $\mathbb{P}(X^t_k \ne Y^t_k) \le \mathbb{E}\bigl[D_{\{k\}}(X^t,Y^t)\bigr] \le (1-\varepsilon)^t\,N\,|\mathcal{N}_G^t|$. Let $r = \max_{i\in\{1,\dots,N\}}|v_i|$ be the neighbourhood radius of $G$. The cardinality of the set $\mathcal{N}_G^t$ is bounded by $(2rt+1)^d$. Thus, we obtain $\mathbb{P}(X^t_k \ne Y^t_k) \le N\,(2rt+1)^d\,(1-\varepsilon)^t$. It follows that $\mathbb{P}(X^t_k \ne Y^t_k) \to 0$ as $t \to \infty$, uniformly in the position $k$ and the choice of the initial configurations $x^0$ and $y^0$. The uniform ergodicity of the PCA and the spatial mixing of its unique invariant measure follow from Proposition 3.2.
For a finite set $K \subset \mathbb{Z}^d$ and two configurations $x, y$, we define $D_K(x,y) \triangleq \sum_{i\in K}\delta(x_i,y_i)$.
• We have a CA $F$ that is almost contractive, meaning that $D_K\bigl(F(x),F(y)\bigr) \le D_{K+\mathcal{N}}(x,y)$ for all $x, y \in S^{\mathbb{Z}^d}$ and $K \subset \mathbb{Z}^d$.
• We have a zero-range noise, identified by a matrix $\theta$, that is contractive in the sense that there exists an $\varepsilon > 0$ with the following property: for every $a, b \in S$, there is a coupling $(U,V)$ of $\theta(a,\cdot)$ and $\theta(b,\cdot)$ such that $\mathbb{E}[\delta(U,V)] \le (1-\varepsilon)\,\delta(a,b)$.
If all these conditions are fulfilled, then the argument above shows that the noisy version of $F$ with noise $\theta$ is uniformly ergodic. For instance, the uniform ergodicity of Theorem 3.12 persists if we replace the annihilation rule with any other interaction rule $h : S \to S$ under which the number of disagreements does not increase. In the next section, we show how the coupling presented in the proof of Theorem 3.12 can be used to prove the ergodicity of another type of gliders with noise, even in a case where the approach via discrepancy functions is not sufficient.

Simple gliders with reflecting walls
Let us consider a one-dimensional gliders CA $G$ with three types of particles:
• particles of type 'W' have velocity 0; they play the role of walls,
• particles of type 'R' have velocity 1; they move one unit to the right at each time step,
• particles of type 'L' have velocity −1; they move one unit to the left at each time step.
The set of symbols is thus $S = \{0,1\}^3$ and the neighbourhood is $\mathcal{N} = \{-1,0,1\}$. We keep the same notation as in the previous section: for $x \in S^{\mathbb{Z}}$, $k \in \mathbb{Z}$ and $i \in \{W,R,L\}$, $x_{k,i} = 1$ means that in $x$, there is a particle of type $i$ at position $k$.
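We combine $G$ with a reflection rule $I$ modeling the reflection of left and right particles on walls (see Fig. 8). The reflection rule $I$ is the CA of neighbourhood $\{0\}$ defined on the same configuration space $S^{\mathbb{Z}}$ by $I(x)_{k,W} \triangleq x_{k,W}$, and $I(x)_{k,R} \triangleq x_{k,L}$, $I(x)_{k,L} \triangleq x_{k,R}$ if $x_{k,W} = 1$, while $I(x)_{k,R} \triangleq x_{k,R}$, $I(x)_{k,L} \triangleq x_{k,L}$ if $x_{k,W} = 0$, for each $x \in S^{\mathbb{Z}}$ and $k \in \mathbb{Z}$. We call the composition $I \circ G$ the (one-dimensional) gliders with reflecting walls.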
As in the previous section, we consider a birth-death noise $\theta$, defined by some parameters $\beta_W, \beta_R, \beta_L \in [0,1]$ and $\delta_W, \delta_R, \delta_L \in [0,1]$ respectively representing the birth and death rates of the three types of particles.
Theorem 3.14. Let $F = I \circ G$ be the gliders with reflecting walls, and let $\theta$ be a positive birth-death noise. The noisy version of $F$ with noise $\theta$ is uniformly ergodic, with an invariant measure that is spatially mixing and admits a perfect sampling algorithm.
Proof. We couple the action of the noise on configurations in the same manner as in the proof of Theorem 3.12. However, unlike in the previous result, we couple the PCA from the past.
To be more specific, we use an update function of the form $n : S \times [0,1]^3 \to S$ for the noise $\theta$, where $n(a,u)_i \triangleq 1$ if $a_i = 0$ and $u_i \ge 1 - \beta_i$, $n(a,u)_i \triangleq 0$ if $a_i = 1$ and $u_i \le \delta_i$, and $n(a,u)_i \triangleq a_i$ otherwise, for each $a \in S$, $u \in [0,1]^3$ and $i \in \{W,R,L\}$. If $U$ is uniformly drawn from $[0,1]^3$, then for every $a \in S$, the value $n(a,U)$ is distributed according to $\theta(a,\cdot)$.
We use a family of independent samples $(U^n_k)_{k\in\mathbb{Z},\,n\in\mathbb{N}^-}$ uniformly drawn from $[0,1]^3$ to simulate the PCA from the past. To determine $X^n$, we first apply the CA $F$ on $X^{n-1}$ and then update the value at each site $k$ using the update function $n$ and the sample $U^n_k$. First, observe that the evolutions of the walls at different sites are independent and are not affected by the other types of particles. Namely, walls have velocity 0 and are not affected by the reflection rule, and moreover, the noise is zero-range and acts on walls independently of the other two types of particles. It follows that the presence or absence of a wall at position $0$ and time $0$ is almost surely uniquely determined by a finite (though random) number of samples $U^m_{0,W}$ with $m \le 0$. We claim that the presence of left- or right-moving particles at position $0$ and time $0$ is also almost surely a function of a finite number of random samples $U^m_k$. In order to know if there is a right-moving particle at position $0$ and time $0$, we trace back the possible trajectory of the particle in time. Each time we take a step back, we first determine the presence or absence of a wall at the current position so as to know whether the particle has changed direction or not. The potential ancestor at time $-t$ can either be a right-moving particle or a left-moving particle, depending on whether the backward trajectory has met an even or odd number of walls.
Let $\varepsilon_R = \min\{\beta_R + \delta_R,\ 2 - (\beta_R + \delta_R)\}$ and $\varepsilon_L = \min\{\beta_L + \delta_L,\ 2 - (\beta_L + \delta_L)\}$ and set $\varepsilon \triangleq \min\{\varepsilon_R, \varepsilon_L\} > 0$. When tracing back the trajectory of a potential right-moving particle, at each step, we have a probability at least $\varepsilon$ of learning whether there is indeed an ancestor particle or not. Therefore, almost surely, we eventually learn about the presence or absence of an ancestor. If so, when going up again in time, we can determine whether there is a right-moving particle at position $0$ and time $0$ or not. In the same fashion, we can almost surely determine the presence or absence of a left-moving particle at position $0$ and time $0$ by exploring a finite part of the samples in $U$.
It follows that $p_t(\Phi) \to 1$ as $t \to \infty$, and Proposition 3.3 concludes the proof.
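The first step of the proof, determining the wall at position $0$ and time $0$ from finitely many samples, is simple enough to sketch in Python. The sketch assumes $\delta_W < 1 - \beta_W$, in which case a sample at most $\delta_W$ forces the absence of a wall and a sample at least $1 - \beta_W$ forces its presence, whatever the earlier state, while any other sample leaves the wall bit unchanged. Under that assumption, the state at time $0$ is simply the outcome of the most recent decisive sample. The names `wall_at_origin` and `draw` are hypothetical.

```python
def wall_at_origin(draw, beta, delta, max_steps=10_000):
    """Wall bit at site 0 and time 0, read off from samples explored backwards.

    Assumes delta < 1 - beta. draw() returns the wall samples for times
    0, -1, -2, ... in that order. Returns (bit, steps_back), or
    (None, max_steps) if no decisive sample was found.
    """
    for steps_back in range(max_steps):
        u = draw()
        if u <= delta:
            return 0, steps_back  # the wall was removed at this time
        if u >= 1 - beta:
            return 1, steps_back  # a wall was created at this time
        # Otherwise the bit was left unchanged: look one step further back.
    return None, max_steps
```

For example, with `beta = 0.2`, `delta = 0.3` and backward samples `0.5, 0.5, 0.9`, the first two samples are indecisive and the third forces a wall, so the function returns `(1, 2)`.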
Remark 3.15. The two-dimensional version of gliders with reflecting walls is often called the mirror model (or the discrete Lorentz gas model) [53]. In the mirror model, mirrors are placed at some sites of the lattice $\mathbb{Z}^2$ in either of the two diagonal directions. Particles (or beams of light) travel with speed 1 vertically or horizontally and are reflected upon hitting the mirrors. An argument similar to the one above shows the ergodicity of the mirror model in the presence of positive birth-death noise.

Permutive CA with permutation noise
In this section, the kind of coupling is quite different, since it involves only finite Markov chains: for permutive CA with permutation noise, it is indeed possible to couple the evolution of all trajectories in any finite window. For simplicity of presentation, we focus on the one-dimensional setting. Analogous results can be obtained in higher dimensions.
Let $F$ be a CA with neighbourhood $\mathcal{N} = \{\ell, \ell+1, \dots, r\}$ and local function $f : S^m \to S$, with $m = r - \ell + 1 \ge 2$. We say that $F$ is left-permutive (resp. right-permutive) if, for all $w \in S^{m-1}$, the mapping $\tau_w : S \to S$ given by $\tau_w(a) \triangleq f(aw)$ (resp., $\tau_w(a) \triangleq f(wa)$) is bijective. A CA is permutive if it is either left- or right-permutive; it is bipermutive if it is both left- and right-permutive. For example, when $S$ is the ring $\mathbb{Z}_n$ of integers modulo $n$, the affine CA defined by $f(x,y) \triangleq ax + by + c$ for $a, b, c \in \mathbb{Z}_n$ is left-permutive (resp., right-permutive) if $a$ (resp. $b$) is invertible in $\mathbb{Z}_n$.
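The definition of permutivity can be checked mechanically for a small alphabet. The Python sketch below tests left- and right-permutivity of a local rule given as a function on tuples, and applies the test to the affine example; the helper names `is_left_permutive`, `is_right_permutive` and `affine_rule` are hypothetical.

```python
import itertools


def is_left_permutive(f, symbols, m):
    """True if a -> f(a w) is a bijection of symbols for every word w of length m-1."""
    for w in itertools.product(symbols, repeat=m - 1):
        if len({f((a,) + w) for a in symbols}) != len(symbols):
            return False
    return True


def is_right_permutive(f, symbols, m):
    """True if a -> f(w a) is a bijection of symbols for every word w of length m-1."""
    for w in itertools.product(symbols, repeat=m - 1):
        if len({f(w + (a,)) for a in symbols}) != len(symbols):
            return False
    return True


def affine_rule(a, b, c, n):
    """Local rule f(x, y) = a*x + b*y + c over the integers modulo n."""
    return lambda word: (a * word[0] + b * word[1] + c) % n


Z4 = tuple(range(4))
rule_12 = affine_rule(1, 2, 0, 4)  # a = 1 is invertible mod 4, b = 2 is not
rule_31 = affine_rule(3, 1, 1, 4)  # both coefficients are invertible mod 4
```

Here `rule_12` is left-permutive but not right-permutive, while `rule_31` is bipermutive, in agreement with the invertibility criterion stated above.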
Let F be a permutive CA. Using the bijections τ w one can prove that F is surjective. Every surjective CA with configuration space X preserves the uniform Bernoulli measure λ on X (see e.g. [38,Thm. 5.21]). The next proposition shows that when a permutive CA is subjected to a zero-range noise that preserves λ, the resulting PCA indeed converges to λ. The proof below is adapted from a work of Vasilyev [59,58]. An alternative proof (for additive noise) is provided at the end of Section 4.4.
Theorem 3.16. Every PCA resulting from adding positive permutation noise to a permutive CA is uniformly ergodic with the uniform Bernoulli measure as its unique invariant measure.
Proof. Let $F$ be a permutive CA with local rule $f$, and $\Theta$ a permutation noise with noise matrix $\theta$. Let $\Phi$ denote the resulting noisy CA. We will prove that for every $n \in \mathbb{N}$ and every initial measure $\mu$ on $\mathcal{X}$, the marginal distribution of $\mu\Phi^t$ on $K = \{-n, -n+1, \ldots, n\}$ converges exponentially to the uniform Bernoulli distribution on $S^K$, which we denote by $\lambda_K$. More specifically, we will prove that for each $n \in \mathbb{N}$, there exists a real number $\rho < 1$ such that for every $\mu \in \mathcal{M}(\mathcal{X})$ and each $t \in \mathbb{N}$, we have $\|\mu\Phi^t - \lambda\|_K \leq \rho^t$, where as before, $\|\nu' - \nu\|_K$ denotes the total variation distance between the marginal distributions of $\nu'$ and $\nu$ on $K$.
Let us first assume that $F$ is left-permutive with neighbourhood $N = \{0, 1, \ldots, r\}$. By permutivity of $F$, for every $w \in S^r$ we have a bijection $\tau^{(K)}_w : S^K \to S^K$ given by $\tau^{(K)}_w(x) \triangleq f^{(K)}(xw)$, where $f^{(K)}$ denotes the map $S^{N(K)} \to S^K$ induced by the local rule $f$. When fixing the word $w$ as a boundary condition on the right of $K$, the PCA $\Phi$ transforms a word $x$ in $S^K$ to a random word $Z$ in $S^K$ distributed according to a product distribution with marginal distribution $\theta(y_k, \cdot)$ at site $k \in K$, where $y = \tau^{(K)}_w(x)$. We denote by $P_w(x, z)$ the probability that $x \in S^K$ is transformed into $z \in S^K$, that is, $P_w(x, z) = \prod_{k \in K} \theta(y_k, z_k)$.
Since the map $\tau^{(K)}_w$ is bijective, it preserves the uniform distribution $\lambda_K$. By assumption, the noise matrix $\theta$ also preserves the uniform distribution on $S$, so we obtain $\lambda_K P_w = \lambda_K$.
For each $w \in S^r$, the matrix $P_w$ is a positive stochastic matrix. Therefore, there exists $\rho_w < 1$ such that for every two probability distributions $q, q'$ on $S^K$, we have $\|q'P_w - qP_w\|_{TV} \leq \rho_w \|q' - q\|_{TV}$, where $\|q' - q\|_{TV}$ denotes the total variation distance between $q$ and $q'$. Let us set $\rho \triangleq \max\{\rho_w ; w \in S^r\}$. It follows that for any sequence $(w_t)_{t \geq 0}$ of words of $S^r$, we have $\|q'P_{w_0}P_{w_1}\cdots P_{w_{t-1}} - qP_{w_0}P_{w_1}\cdots P_{w_{t-1}}\|_{TV} \leq \rho^t \|q' - q\|_{TV}$.
In particular, for $q' = \lambda_K$, we obtain that for every distribution $q$ on $S^K$ and every sequence $(w_t)_{t \geq 0}$ of words in $S^r$, $\|qP_{w_0}P_{w_1}\cdots P_{w_{t-1}} - \lambda_K\|_{TV} \leq \rho^t \|q - \lambda_K\|_{TV} \leq \rho^t$.
Let now $\mu$ be a distribution on $\mathcal{X}$. When iterating $\Phi$, it induces a random sequence of words $(W_t)_{t \geq 0}$ on $\{n+1, \ldots, n+r\}$. Conditioning on this sequence and using the above inequality, we get $\|\mu\Phi^t - \lambda\|_K \leq \rho^t$ for every $t \in \mathbb{N}$ (see Fig. 9).
If the neighbourhood of $F$ is not of the form $N = \{0, 1, \ldots, r\}$, then there exists a number $s \in \mathbb{Z}$ such that $F \circ \sigma^s$ is a left-permutive CA having a neighbourhood of that form. If we denote the noisy version of $F \circ \sigma^s$ by $\Phi_s$, the above inequality yields $\|\mu'\Phi_s^t - \lambda\|_K \leq \rho^t$ for every distribution $\mu'$, in particular, for $\mu' \triangleq \sigma^{-st}\mu$. With this choice, $\mu'\Phi_s^t = \mu\Phi^t$ and we obtain $\|\mu\Phi^t - \lambda\|_K \leq \rho^t$, which concludes the proof. The right-permutive case is analogous.
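The finite-window kernels $P_w$ in the proof can be built explicitly for a small example. The sketch below makes several assumptions that are not in the text: the alphabet is $\mathbb{Z}_2$, the left-permutive rule is $f(a, b) = a + b \pmod 2$ with neighbourhood $\{0, 1\}$ (so $r = 1$), the window has three sites, and the noise is a symmetric bit flip with probability `EPS`. As the contraction constant $\rho_w$ it uses the Dobrushin coefficient, which is one standard choice; the proof only needs that some $\rho_w < 1$ exists.

```python
from itertools import product

S = (0, 1)
EPS = 0.1  # assumed flip probability of a symmetric (doubly stochastic) noise
theta = [[1 - EPS, EPS], [EPS, 1 - EPS]]

def f(a, b):
    """Assumed left-permutive local rule on Z_2 with neighbourhood {0, 1}."""
    return (a + b) % 2

def tau(x, w):
    """The map tau_w^{(K)}: apply f to the word x extended by the boundary word w."""
    xw = x + w
    return tuple(f(xw[k], xw[k + 1]) for k in range(len(x)))

def kernel(w, size):
    """P_w(x, z) = prod_k theta(y_k, z_k) with y = tau(x, w)."""
    words = list(product(S, repeat=size))
    P = {}
    for x in words:
        y = tau(x, w)
        for z in words:
            prob = 1.0
            for yk, zk in zip(y, z):
                prob *= theta[yk][zk]
            P[x, z] = prob
    return words, P

def dobrushin(words, P):
    """Largest total variation distance between two rows of P."""
    return max(0.5 * sum(abs(P[x, z] - P[x2, z]) for z in words)
               for x in words for x2 in words)

rho = 0.0
for w in product(S, repeat=1):  # all boundary words of length r = 1
    words, P = kernel(w, size=3)
    # lambda_K P_w = lambda_K is equivalent to every column of P_w summing to 1
    assert all(abs(sum(P[x, z] for x in words) - 1.0) < 1e-12 for z in words)
    rho = max(rho, dobrushin(words, P))
print(rho)  # strictly less than 1 because every P_w is positive
```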

Entropy method: surjective CA with additive noise
The purpose of this section is to prove that under the action of a surjective CA perturbed by positive additive noise, every shift-invariant probability measure is attracted towards the uniform Bernoulli measure. This does not settle the ergodicity question because we do not know whether non-shift-invariant measures are attracted towards the same measure, nor whether the uniform Bernoulli measure is the only invariant measure.
The idea of the proof is as follows: we know that a surjective CA preserves the entropy per site of shift-invariant probability measures. On the other hand, positive additive noise increases the entropy unless the measure has maximal entropy. Combining these two, we get that a surjective CA followed by positive additive noise increases the entropy unless the measure has maximal entropy. This however is not quite enough to prove convergence to the measure of maximal entropy because entropy per site is not a continuous function of the measure and hence cannot serve as a simple Lyapunov function; we need to control how much the entropy increases.
The analysis of finite-state Markov chains via entropy is classic and goes back to the ideas of Boltzmann (see e.g. [49,Sec. II.7] or [40,Sec. II.4]). The use of entropy to describe the asymptotic behaviour of continuous-time interacting particle systems was pioneered by Holley [29,40] and has been very successful. For applications of the entropy method to PCA see [36,62,12].
In this section, we prove the following result.
Theorem 4.1. Let $\Phi$ be a PCA on configuration space $\mathcal{X}$ obtained by perturbing a surjective CA with a positive additive noise. Then, the uniform Bernoulli measure $\lambda$ on $\mathcal{X}$ is invariant under $\Phi$ and $\mu\Phi^t \to \lambda$ weakly as $t \to \infty$ for every shift-invariant measure $\mu$ on $\mathcal{X}$.
Before entering the proof, let us note that the NEC-majority CA of Example 2.3 is not surjective. The non-surjectivity in that example follows easily from the Garden-of-Eden theorem, which is discussed below in the proof of Lemma 4.2. For a more direct argument, one can verify that, for instance, any configuration that has an occurrence of the pattern
For clarity, we present the proof of Theorem 4.1 in the one-dimensional setting, but everything goes through similarly in the higher-dimensional case. The notion of additive noise requires that the set of symbols for the PCA is identified with a finite Abelian group. This identification is however arbitrary. In fact, the theorem remains true if the additive noise is replaced with any positive permutation noise. We stick to the additive noise to keep the presentation simple. At the end of this section, we also use the entropy method to give an alternate proof of Theorem 3.16 in the case of additive noise.

Entropy
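Let us fix the notation and terminology for entropy. The entropy of a random variable $A$ taking values from a finite set $\Sigma$ will be denoted by
$$H(A) \triangleq -\sum_{a \in \Sigma} \mathbb{P}(A = a) \log \mathbb{P}(A = a).$$
We recall that $H(A) \leq \log|\Sigma|$ and the equality holds if and only if $A$ is uniformly distributed over $\Sigma$. If $B$ is another random variable on the same probability space, the entropy of the conditional distribution of $A$ given $B$ is
$$-\sum_{a \in \Sigma} \mathbb{P}(A = a \mid B) \log \mathbb{P}(A = a \mid B).$$
Note that this is a random variable, and is not the same as the usual notion of conditional entropy, which is a number. The usual conditional entropy of $A$ given $B$ is the expected value of this random variable.
The entropy per site of a shift-invariant measure $\mu$ on $\mathcal{X}$ is
$$h(\mu) \triangleq \lim_{n \to \infty} \frac{H(X_{[-n,n]})}{2n+1},$$
where $X$ is a (one-dimensional) random configuration with distribution $\mu$. Among the shift-invariant measures on $\mathcal{X} \triangleq S^{\mathbb{Z}}$, the uniform Bernoulli measure is the unique measure with maximum entropy per site $h \triangleq \log|S|$.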
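The distinction between the entropy of a conditional distribution (a random quantity depending on the value of $B$) and the usual conditional entropy (its average) can be made concrete on finite distributions. This is a minimal sketch; the dictionary representation of distributions and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from math import log

def entropy(dist):
    """Entropy (natural logarithm) of a distribution given as {value: probability}."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """Usual conditional entropy of A given B for joint[(a, b)] = P(A=a, B=b).

    For each value b, the entropy of the conditional law of A given B=b is
    computed, and these values are averaged with weights P(B=b).
    """
    marginal_b = defaultdict(float)
    for (a, b), p in joint.items():
        marginal_b[b] += p
    total = 0.0
    for b, mass in marginal_b.items():
        if mass == 0:
            continue
        conditional_law = {a: p / mass for (a, bb), p in joint.items() if bb == b}
        total += mass * entropy(conditional_law)
    return total

uniform4 = {a: 0.25 for a in range(4)}
print(entropy(uniform4), log(4))                  # equal: the uniform law is maximal
copy = {(0, 0): 0.5, (1, 1): 0.5}                 # A is a copy of B
print(conditional_entropy(copy))                  # 0: B determines A
```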
For background on the entropy, we refer to the book of Cover and Thomas [11] in the context of information theory and to the book of Denker, Grillenberger and Sigmund [13] in the context of dynamical systems.

The effect of a surjective CA on entropy
We start by looking at how a surjective CA affects the entropy of a finite region.
Lemma 4.2. Let $F$ be a one-dimensional surjective CA. There is a constant $c > 0$ such that for every random configuration $X$ and every finite interval $J \subseteq \mathbb{Z}$, we have $H\big((FX)_J\big) \geq H(X_J) - c$.
Proof. Without loss of generality, we may assume that the neighbourhood of the local rule of $F$ is of the form $N \triangleq \{-r, -r+1, \ldots, r\}$. We write $\partial N(J) \triangleq N(J) \setminus J$ for the external boundary of a set $J \subseteq \mathbb{Z}$ with respect to $N$. Similarly, we write $\partial N^2(J) \triangleq N^2(J) \setminus J$.
Let $x \in \mathcal{X}$ be an arbitrary configuration. For an interval $J$, the pattern $(F(x))_J$ is uniquely determined by the patterns $x_J$ and $x_{\partial N(J)}$. Conversely, since by the Garden-of-Eden theorem (see e.g. [7, Theorem 5.3.1]), every surjective CA is pre-injective, the pattern $x_J$ is uniquely determined by the patterns $(F(x))_J$, $(F(x))_{\partial N(J)}$ and $x_{\partial N^2(J)}$.
To see the latter, let $y$ be any configuration such that $y_J \neq x_J$ and $y_{\partial N^2(J)} = x_{\partial N^2(J)}$. Define a configuration $y'$ that agrees with $y$ on $N^2(J)$ and with $x$ outside $J$. Then $x$ and $y'$ are asymptotic to each other. Since $x$ and $y$ disagree on $J$, so do $x$ and $y'$. By pre-injectivity, $F(x)$ and $F(y')$ must be different from each other. Since $x$ and $y'$ disagree only on $J$, $F(x)$ and $F(y')$ can only disagree on $N(J)$. On the other hand, $F(y)$ and $F(y')$ agree on $N(J)$. Therefore, $F(x)$ and $F(y)$ must disagree on $N(J) = J \cup \partial N(J)$.
Now consider the random configuration $X$. Since $X_J$ is uniquely determined by $(FX)_J$, $(FX)_{\partial N(J)}$ and $X_{\partial N^2(J)}$, we have the inequality $H(X_J) \leq H\big((FX)_J\big) + H\big((FX)_{\partial N(J)}, X_{\partial N^2(J)}\big)$ for the entropy. Since $|\partial N(J)| = 2r$ and $|\partial N^2(J)| = 4r$, the second term on the right-hand side is bounded from above by $6r \log|S|$. Therefore, $H\big((FX)_J\big) \geq H(X_J) - c$ with $c \triangleq 6r \log|S|$.
Remark 4.3. The same argument is used in [35] to show that $h(F\mu) = h(\mu)$ for every shift-invariant measure $\mu$ on $\mathcal{X}$. Indeed, for a random configuration $X$ with distribution $\mu$ one has $h(F\mu) = \lim_{n \to \infty} \frac{H((FX)_{[-n,n]})}{2n+1} \geq \lim_{n \to \infty} \frac{H(X_{[-n,n]}) - c}{2n+1} = h(\mu)$. The opposite inequality is true in general.

The effect of noise on entropy
Lemma 4.2 says that a one-dimensional surjective CA reduces the entropy of a finite window by at most a constant $c$, uniformly in the size of the window. We now show that if the window is large, the extra entropy added by the noise is large enough to compensate for the lost entropy, at least if the entropy of the window is not too close to maximal. We divide the argument into a few lemmas.
Recall that in order to describe an additive noise, we identify the alphabet $S$ with a finite Abelian group $(G, +)$. Under an additive noise, each symbol $a$ is replaced with a symbol $a + N$, where $N$ is a $G$-valued random variable. The noise variables at different sites are independent and all have distribution $q$. We are assuming that the noise is positive, hence $q(b) > 0$ for each $b \in S$. We denote by $h \triangleq \log|S|$ the maximum possible entropy carried by a single site.
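The effect of positive additive noise on the entropy of a single symbol can be observed numerically. The sketch below assumes the group $G = \mathbb{Z}_3$ and particular distributions `p` and `q` chosen only for illustration; the law of $a + N$ is the cyclic convolution of the two distributions.

```python
from math import log

def entropy(p):
    """Entropy (natural logarithm) of a probability vector."""
    return -sum(x * log(x) for x in p if x > 0)

def add_noise(p, q):
    """Law of A + N on Z_n for independent A ~ p and N ~ q (cyclic convolution)."""
    n = len(p)
    return [sum(p[a] * q[(c - a) % n] for a in range(n)) for c in range(n)]

p = [0.7, 0.2, 0.1]   # assumed law of the symbol before noise
q = [0.8, 0.1, 0.1]   # assumed positive noise distribution on Z_3
noisy = add_noise(p, q)
print(entropy(p) < entropy(noisy) <= log(3))  # True

# Repeated noise alone drives the law towards the uniform distribution.
law = p
for _ in range(200):
    law = add_noise(law, q)
print(max(abs(x - 1 / 3) for x in law) < 1e-9)  # True
```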
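Lemma 4.4. For every $\varepsilon > 0$, there is a $\delta(\varepsilon) > 0$ with the following property. If $A$ and $N$ are independent $G$-valued random variables and $N$ is distributed according to $q$, then $H(A + N) \geq H(A) + \delta(\varepsilon)$ whenever $H(A) \leq h - \varepsilon$. The inequality $H(A + N) \geq H(A)$ holds in general as long as $A$ and $N$ are independent.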
Proof. The entropy of a $G$-valued random variable $A$ and its noisy version $A + N$ (where $N$ is independent of $A$) are related in the following way:
Lemma 4.5. For every $\varepsilon > 0$, there is a $\rho(\varepsilon) > 0$ with the following property. Let $A$ and $N$ be $G$-valued random variables and $C$ another random variable. Suppose that $N$ is distributed according to $q$, and is independent of $A$ and $C$. Then, $H(A + N \mid C) \geq H(A \mid C) + \rho(\varepsilon)$ whenever $H(A \mid C) \leq h - \varepsilon$. The inequality $H(A + N \mid C) \geq H(A \mid C)$ holds in general as long as $A$ and $N$ are independent conditioned on $C$.
Proof. For each $\varepsilon > 0$, denote by $\delta(\varepsilon)$ the number whose existence is guaranteed by Lemma 4.4. Lemma 4.4 immediately gives a corresponding almost sure statement about the entropy of conditional distributions $H(A \mid C)$ and $H(A + N \mid C)$. Namely, if conditioned on $C$, the random variables $A$ and $N$ are independent and $N$ has distribution $q$, then with probability 1. (In the proof of Theorem 4.1, we will only need Lemma 4.5 in situations where $C$ is a discrete variable and the conditional distributions are elementary.) Now, suppose that Using Markov's inequality, we get
Let $\pi$ be an accumulation point of the measure orbit $\mu \to \mu\Phi \to \mu\Phi^2 \to \cdots$ starting from a shift-invariant measure $\mu$. We show that $\pi$ is the uniform Bernoulli measure. In order to do that, we show that $h(\pi) \geq h - \varepsilon$ for every $\varepsilon > 0$, and use the fact that the uniform Bernoulli measure is the only shift-invariant measure with entropy $h$.
To be specific, let us use the following construction of a trajectory of the noisy CA with initial distribution $\mu$. Let $X^{(0)}$ be a configuration with distribution $\mu$. Let $Z^{(1)}, Z^{(2)}, \ldots$ be a sequence of independent random configurations independent of $X^{(0)}$, each distributed according to the product measure with marginal $q$ at each site. Construct $X^{(t)}$ recursively by setting $X^{(t+1)} \triangleq FX^{(t)} + Z^{(t+1)}$.
By Lemma 4.2, for every finite interval $J \subseteq \mathbb{Z}$ and every $t \in \mathbb{N}$, we have $H\big((FX^{(t)})_J\big) \geq H\big(X^{(t)}_J\big) - c$.
Let us conclude this section by giving an alternate proof of Theorem 3.16 in case the noise is additive. For permutive CA under positive additive noise, the entropy argument can be easily formulated in terms of conditional entropy, hence providing convergence for every (not necessarily shift-invariant) measure. The argument is however not entirely different from the Markov chain proof given in Section 3.6; the Markov chain interpretation is implicit in the following proof.
Alternate proof of Theorem 3.16 with additive noise. Let $F$ be a right-permutive CA with neighbourhood $N \triangleq \{\ell, \ell+1, \ldots, r\}$. Let $X$ be a random configuration with arbitrary distribution and set $Y \triangleq FX$. Then, for every $k \in \mathbb{Z}$, The first equality is by permutivity, and the second inequality is by the fact that $Y_{(-\infty,k)}$ is a function of $X_{(-\infty,k+r)}$. Next, let $Z$ be a noise configuration independent of $X$, and distributed according to a product measure with marginal $q$ at each site. Then, where the last equality follows from the independence of $Z_{(-\infty,k)}$ and $Y_k + Z_k$. Combining these two with Lemma 4.5, we get that for every $\varepsilon > 0$, In particular, if $X^{(0)}, X^{(1)}, \ldots$ represents the evolution of the noisy CA, then as $t \to \infty$, uniformly in $k$. This implies convergence to the uniform Bernoulli measure of the distribution of $X^{(t)}$.
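The recursive construction of the trajectory can be simulated on a finite ring. This is only a sketch under stated assumptions: the lattice $\mathbb{Z}$ is replaced by a ring of `L` sites with periodic boundary, the CA is the two-site XOR rule on $\mathbb{Z}_2$, and the names `noisy_ca_step` and `xor_rule` are hypothetical.

```python
import random

def noisy_ca_step(x, rule, nbhd, n, q, rng):
    """One step X -> F X + Z on a ring, with Z i.i.d. with law q on Z_n."""
    L = len(x)
    # Deterministic CA update with periodic boundary (an assumption of this sketch).
    fx = [rule(*(x[(i + j) % L] for j in nbhd)) for i in range(L)]
    # Independent additive noise at every site.
    z = rng.choices(range(n), weights=q, k=L)
    return [(a + b) % n for a, b in zip(fx, z)]

def xor_rule(a, b):
    """Assumed surjective (indeed permutive) local rule on Z_2."""
    return (a + b) % 2

rng = random.Random(0)
x = [rng.randrange(2) for _ in range(16)]   # initial configuration X^(0)
q = [0.9, 0.1]                              # positive noise distribution
for t in range(50):
    x = noisy_ca_step(x, xor_rule, (0, 1), 2, q, rng)
print(x)
```

Setting `q = [1.0, 0.0]` switches the noise off and recovers the deterministic CA, which is a convenient sanity check.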

Fourier analysis method
In this section, we apply (generalized) Fourier analysis to establish ergodicity under noise of CA with certain algebraic properties. For clarity and brevity, we focus on two concrete examples (the XOR CA and the binary spreading CA) and prove ergodicity under zero-range noise. Further development of this approach will be left to another paper. Our exposition is based on Chapter 4 of the survey by Toom et al. [58]. The idea is to show that the action of the PCA on local observables is "contractive" in an appropriate sense. When the CA has an algebraic property (e.g., additive), it is sometimes possible to choose a basis for the space of observables (e.g., the Fourier basis) with respect to which the CA maps each basis element into another basis element. Proving the ergodicity of the noisy CA would then be reduced to showing that the action of noise on the same basis is contractive.

XOR CA with zero-range noise
Let $S \triangleq \{0, 1\}$ be the binary alphabet. We identify $S$ with the cyclic group $\mathbb{Z}/2\mathbb{Z}$. The XOR CA with neighbourhood $N \subseteq \mathbb{Z}^d$ is identified with the map $x \mapsto Fx$ on $\mathcal{X} \triangleq S^{\mathbb{Z}^d}$, where
$$(Fx)_i \triangleq \sum_{j \in N} x_{i+j} \pmod 2.$$
We consider the PCA $\Phi$ obtained by combining $F$ with a zero-range noise kernel $\Theta$, identified by the matrix
$$\theta \triangleq \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix},$$
which modifies each symbol independently according to transition probabilities $0 \xrightarrow{p} 1$ and $1 \xrightarrow{q} 0$. Since $F$ is permutive, we already know (Theorem 3.16) the ergodicity of the noisy version as long as the noise is positive and preserves the uniform distribution, that is, if $q = p \in (0, 1)$. In the case $q = p \in (0, 1)$, the ergodicity also follows by a classic application of Fourier analysis (see [58, Example 1.3]) or by coupling from the past (see [15, Sec. 5d]). In this case, the convergence to the limit measure is super-exponentially fast (i.e., the probability of each cylinder set converges super-exponentially fast to its limit value). In the degenerate case, that is, when $p \in \{0, 1\}$ or $q \in \{0, 1\}$, Bramson and Neuhauser [5] have proved that the system is not ergodic, at least in the one-dimensional case with $N = \{-1, 0, 1\}$.
Following [58,Chap. 4], Fourier analysis can in fact be used to prove ergodicity in the entire domain 0 < p, q < 1.
Theorem 5.1. The XOR CA with positive zero-range noise is uniformly ergodic. Moreover, its unique invariant measure is spatially mixing.
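Proof. Define the function $\chi : \mathbb{Z}_2 \to \mathbb{C}$ by $\chi(a) \triangleq (-1)^a$ (i.e., $\chi(0) = 1$ and $\chi(1) = -1$). This is a character of the group $\mathbb{Z}_2$ (i.e., a homomorphism into the multiplicative group of $\mathbb{C}$), and along with the constant 1 (the trivial character), forms a basis for the two-dimensional space of functions $\mathbb{Z}_2 \to \mathbb{C}$. For a finite set $A \subseteq \mathbb{Z}^d$, define $\chi_A : \mathcal{X} \to \mathbb{C}$ by
$$\chi_A(x) \triangleq \prod_{i \in A} \chi(x_i).$$
(In particular, $\chi_\emptyset \equiv 1$.) The collection of all functions $\chi_A$ (for finite $A \subseteq \mathbb{Z}^d$) is a basis (the Fourier basis) for the linear space $C_0(\mathcal{X})$, which is orthonormal with respect to the inner product $\langle g, h \rangle \triangleq \pi(g\overline{h})$, where $\overline{h}$ is the complex conjugate of $h$ and $\pi$ is the uniform Bernoulli measure on $\mathcal{X}$ (a.k.a. the Haar measure).
The basis $\{\chi_A : A \subseteq \mathbb{Z}^d \text{ finite}\}$ is particularly convenient, because the XOR CA $F$ maps each character $\chi_A$ into another character $\chi_{F^*A}$. Namely,
$$\chi_A(Fx) = \prod_{i \in A} \chi\Big(\sum_{j \in N} x_{i+j}\Big) = \prod_{i \in A} \prod_{j \in N} \chi(x_{i+j}) = \chi_{F^*A}(x),$$
where $F^*A$ denotes the set of all $j \in \mathbb{Z}^d$ for which the set $\{i \in A : j \in i + N\}$ has an odd number of elements. (If we represent $A$ as a configuration $c : \mathbb{Z}^d \to \mathbb{Z}_2$ with $c_i = 1$ if and only if $i \in A$, then $F^*A$ will be represented by $F^*c$ where $(F^*c)_k \triangleq \sum_{j \in N} c_{k-j} \pmod 2$.) To calculate the effect of noise, let $x$ be an arbitrary configuration and $Y$ a random configuration chosen according to $\Theta(x, \cdot)$, so that each $Y_i$ is obtained from $x_i$ independently at random with transition probabilities prescribed by $\theta$. We have
$$(\Theta\chi_A)(x) = \mathbb{E}[\chi_A(Y)] = \prod_{i \in A} \mathbb{E}[\chi(Y_i)] = \prod_{i \in A} (\theta\chi)(x_i).$$
Note how the multiplicative form of $\chi_A$ and the independence of noise at different sites reduce the calculation of $\Theta\chi_A$ to the calculation of $\theta\chi$. For the latter, we have
$$(\theta\chi)(0) = (1-p) - p = 1 - 2p, \qquad (\theta\chi)(1) = q - (1-q) = 2q - 1,$$
which can be written as the linear combination $\theta\chi = (q - p) + (1 - p - q)\chi$. It follows that
$$\Theta\chi_A = \prod_{i \in A} \big[(q - p) + (1 - p - q)\chi_{\{i\}}\big] = \sum_{B \subseteq A} (q - p)^{|A \setminus B|} (1 - p - q)^{|B|} \chi_B.$$
Combining the effect of the CA $F$ and the noise $\Theta$, we get the representation of $\Phi\chi_A$ in the Fourier basis.
In order to prove the ergodicity of a PCA $\Phi$, we show that for each local function $h \in C_0(\mathcal{X})$, the sequence $\Phi^t h$ converges exponentially fast to a constant. In particular, ergodicity follows if we are able to show that $\Phi$ contracts the non-constant component of $h$. The non-constant part of $h$ can, for instance, be measured by
$$⟪h⟫ \triangleq \sum_{\emptyset \neq A \subseteq \mathbb{Z}^d} |h_A|,$$
where $h = \sum_{A \subseteq \mathbb{Z}^d} h_A \chi_A$ is the representation of $h$ in the Fourier basis. This is a semi-norm satisfying $⟪h⟫ = 0$ if and only if $h$ is constant. Suppose that $\Phi$ is contractive with respect to $⟪\cdot⟫$, in the sense that there is a constant $0 \leq \rho < 1$ such that $⟪\Phi h⟫ \leq \rho ⟪h⟫$ for all $h \in C_0(\mathcal{X})$. Then, $⟪\Phi^t h⟫ \leq ⟪h⟫ \rho^t$ for every $h \in C_0(\mathcal{X})$ and $t \geq 0$. In particular,
$$\big|\Phi^t(x, [u]) - \Phi^t(y, [u])\big| \leq 2 ⟪1_{[u]}⟫ \rho^t \tag{125}$$
for every cylinder set $[u]$, every two configurations $x, y \in \mathcal{X}$ and each $t \geq 0$. Hence, we obtain the uniform ergodicity of $\Phi$.
In order to verify that $\Phi$ is contractive, it is sufficient to verify that $⟪\Phi\chi_A⟫ \leq \rho$ for each non-empty finite $A \subseteq \mathbb{Z}^d$. Namely, for an arbitrary $h \in C_0(\mathcal{X})$, the latter condition gives
$$⟪\Phi h⟫ = ⟪\sum_{A} h_A \Phi\chi_A⟫ \leq \sum_{A \neq \emptyset} |h_A| \, ⟪\Phi\chi_A⟫ \leq \rho ⟪h⟫.$$
For the PCA $\Phi(x, E) \triangleq \Theta(Fx, E)$, we have
Note that $\rho \triangleq |q - p| + |1 - p - q| < 1$ for $p, q \in (0, 1)$. Therefore, $⟪\Phi\chi_A⟫ \leq \rho$ for every finite $\emptyset \neq A \subseteq \mathbb{Z}^d$, and the uniform ergodicity of $\Phi$ follows.
To see the spatial mixing of the unique invariant measure $\pi$ of $\Phi$, observe that for $u \in S^A$, we have $⟪1_{[u]}⟫ = 1 - 2^{-|A|} \leq 1$, because
$$1_{[u]}(x) = \prod_{i \in A} \frac{1 + \chi(u_i)\chi(x_i)}{2} = 2^{-|A|} \sum_{B \subseteq A} \Big(\prod_{i \in B} \chi(u_i)\Big) \chi_B(x).$$
Integrating (125) over $y$ with respect to $\pi$, we therefore get $\big|\pi([u]) - \Phi^t(x, [u])\big| \leq 2\rho^t$. Now, using (4), we obtain that $d_A(t) \leq 2^{|A|} \rho^t$ for every finite set $A \subseteq \mathbb{Z}^d$ and $t \geq 1$. The spatial mixing of the invariant measure thus follows from Proposition 2.1.
Remark 5.2. Observe that $⟪\Phi\chi_A⟫ < 1$ even in the degenerate (but non-deterministic) case, for instance, when $p = 0$ and $q \in (0, 1)$. Namely, in the latter case we have $⟪\Phi\chi_A⟫ = 1 - |q - p|^{|F^*A|} < 1$. However, this is not sufficient for ergodicity, as the upper bound for $⟪\Phi\chi_A⟫$ depends on $A$ and approaches 1 as $A$ grows.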
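Both facts used here, that the XOR CA sends $\chi_A$ to $\chi_{F^*A}$ and that $\theta\chi = (q - p) + (1 - p - q)\chi$, can be verified by exhaustive enumeration in a small case. The sketch assumes $d = 1$, the neighbourhood $N = \{0, 1\}$ and the XOR rule $(Fx)_i = \sum_{j \in N} x_{i+j} \bmod 2$; all function names are illustrative.

```python
from itertools import product

NBHD = (0, 1)  # assumed one-dimensional neighbourhood N = {0, 1}

def chi(a):
    """Non-trivial character of Z_2: chi(a) = (-1)^a."""
    return -1 if a else 1

def xor_ca(x, sites):
    """(F x)_i = sum over j in N of x_{i+j} mod 2, computed at the given sites."""
    return {i: sum(x[i + j] for j in NBHD) % 2 for i in sites}

def chi_set(A, x):
    """chi_A(x) = product over i in A of chi(x_i)."""
    out = 1
    for i in A:
        out *= chi(x[i])
    return out

def f_star(A):
    """Sites k covered an odd number of times by the translates i + N, i in A."""
    counts = {}
    for i in A:
        for j in NBHD:
            counts[i + j] = counts.get(i + j, 0) + 1
    return {k for k, c in counts.items() if c % 2 == 1}

def theta_chi(a, p, q):
    """E[chi(Y)] when Y is the noisy copy of a (0 -> 1 w.p. p, 1 -> 0 w.p. q)."""
    if a == 0:
        return (1 - p) * chi(0) + p * chi(1)
    return q * chi(0) + (1 - q) * chi(1)

A = {0, 1, 3}
sites = range(5)
ok = all(
    chi_set(A, xor_ca(dict(zip(sites, bits)), A)) == chi_set(f_star(A), dict(zip(sites, bits)))
    for bits in product((0, 1), repeat=5)
)
print(sorted(f_star(A)), ok)  # [0, 2, 3, 4] True
```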
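The contraction constant $\rho = |q - p| + |1 - p - q|$ and the degenerate bound of Remark 5.2 are easy to tabulate. The sketch below only evaluates these two expressions; the function names and the sample values of $p$, $q$ are assumptions. It may help to note the elementary identity $|a| + |b| = \max(|a + b|, |a - b|)$, which gives $\rho = \max(|1 - 2p|, |1 - 2q|)$.

```python
def rho(p, q):
    """Contraction constant |q - p| + |1 - p - q| from the proof of Theorem 5.1."""
    return abs(q - p) + abs(1 - p - q)

def degenerate_bound(q, size):
    """Value 1 - q^size from Remark 5.2 (case p = 0), where size stands for |F*A|."""
    return 1 - q ** size

# Strictly inside the unit square the constant stays below 1 ...
print(rho(0.1, 0.3), rho(0.5, 0.5), rho(0.9, 0.2))
# ... while on the boundary p = 0 it equals 1 up to rounding,
print(rho(0.0, 0.3))
# and the A-dependent bound of Remark 5.2 creeps up to 1 as |F*A| grows.
print([degenerate_bound(0.3, k) for k in (1, 2, 5, 20)])
```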

Binary spreading CA with zero-range noise
Consider a non-constant CA $F$ with binary alphabet $S \triangleq \{0, 1\}$ in which 0 is spreading. Namely, $x \mapsto Fx$ is given by
$$(Fx)_i \triangleq \begin{cases} 1 & \text{if } x_{i+j} = 1 \text{ for every } j \in N, \\ 0 & \text{otherwise,} \end{cases}$$
where $N \subseteq \mathbb{Z}^d$ is a finite set. As in the case of the XOR CA, we consider a general zero-range noise kernel $\Theta$ defined by the transition matrix
$$\theta \triangleq \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}.$$
When $q = 0$, we recover Stavskaya's PCA (a.k.a. directed site percolation), which is non-ergodic for sufficiently small $p \geq 0$ (see [58, Chap. 1]). Using coupling arguments, we already know the ergodicity of a CA with spreading symbol with either memoryless noise (Theorem 3.10) or sufficiently weak positive perturbation (Theorem 3.11). In the binary case, we get an alternative argument via (generalized) Fourier analysis, covering most of the parameter space.
Proof. The proof is similar to that of Theorem 5.1 except that we use a different basis for $C_0(\mathcal{X})$. Define $\chi : S \to \mathbb{C}$ by $\chi(0) \triangleq 0$ and $\chi(1) \triangleq 1$. Clearly, $\{1, \chi\}$ is a basis for the linear space $\mathbb{C}^S$. For a finite $A \subseteq \mathbb{Z}^d$, define $\chi_A : \mathcal{X} \to \mathbb{C}$ by
$$\chi_A(x) \triangleq \prod_{i \in A} \chi(x_i).$$
It is easy to verify (e.g., using the inclusion-exclusion principle) that the functions $\chi_A$ (for finite $A \subseteq \mathbb{Z}^d$) form a basis for $C_0(\mathcal{X})$. We call this basis the Möbius basis and each $\chi_A$ a character of $\mathcal{X}$. The advantage of the above basis is that the CA $F$ maps characters into characters. Namely, $\chi_A(Fx) = 1$ if and only if $(Fx)_i = 1$ for every $i \in A$, which is in turn the case if and only if $x_{i+j} = 1$ for every $i \in A$ and $j \in N$. Therefore, $F\chi_A = \chi_{F^*A}$, where $F^*A \triangleq A + N$.
As in the case of the Fourier basis, calculating the effect of the noise $\Theta$ on characters boils down to calculating the effect of the transition matrix $\theta$ on $\chi$. For the latter, we obtain $(\theta\chi)(0) = p$ and $(\theta\chi)(1) = 1 - q$, which gives $\theta\chi = p + (1 - p - q)\chi$. It follows, as in the previous case, that
$$\Theta\chi_A = \prod_{i \in A} \big[p + (1 - p - q)\chi_{\{i\}}\big] = \sum_{B \subseteq A} p^{|A \setminus B|} (1 - p - q)^{|B|} \chi_B.$$
For the combination of the CA $F$ and noise $\Theta$, we get
Each local function $h \in C_0(\mathcal{X})$ has a unique representation $h = \sum_{A \subseteq \mathbb{Z}^d} h_A \chi_A$ as a linear combination of characters. We define a semi-norm on $C_0(\mathcal{X})$ by
$$⟪h⟫ \triangleq \sum_{\emptyset \neq A \subseteq \mathbb{Z}^d} |h_A|$$
for each $h \in C_0(\mathcal{X})$. Following the same argument as in the case of the XOR CA, a sufficient condition for the uniform ergodicity of $\Phi$ is that $\Phi$ is contractive with respect to $⟪\cdot⟫$, in the sense that there is a constant $0 \leq \rho < 1$ such that $⟪\Phi h⟫ \leq \rho ⟪h⟫$ for every $h \in C_0(\mathcal{X})$. The property $⟪\Phi h⟫ \leq \rho ⟪h⟫$ for every $h \in C_0(\mathcal{X})$ in turn is equivalent to the condition that $⟪\Phi\chi_A⟫ \leq \rho$ for each non-empty finite $A \subseteq \mathbb{Z}^d$. Clearly, $\Phi\chi_\emptyset = \chi_\emptyset$, hence $⟪\Phi\chi_\emptyset⟫ = ⟪\chi_\emptyset⟫ = 0$. For a non-empty finite $A \subseteq \mathbb{Z}^d$, we have
We get uniform ergodicity if $p + |1 - p - q| < 1$, that is if either $p + q \leq 1$ and $q > 0$, or $p + q > 1$ and $p + \tfrac{1}{2}q < 1$.
The spatial mixing of the unique invariant measure follows in a similar fashion as in Theorem 5.1. Note that $⟪1_{[u]}⟫ < 2^{|A|}$ for a cylinder with base $A$, because
$$1_{[u]} = \prod_{i \in A :\, u_i = 1} \chi_{\{i\}} \prod_{i \in A :\, u_i = 0} \big(1 - \chi_{\{i\}}\big)$$
expands into a sum of $2^{|\{i \in A : u_i = 0\}|}$ characters with coefficients $\pm 1$. Integrating (125) over $y$ with respect to $\pi$, we therefore get $\big|\pi([u]) - \Phi^t(x, [u])\big| \leq 2 \times 2^{|A|} \rho^t$. Now, using (4), we obtain that $d_A(t) \leq 2^{2|A|} \rho^t$ for every finite set $A \subseteq \mathbb{Z}^d$ and $t \geq 1$. The spatial mixing of the invariant measure hence follows from Proposition 2.1.
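The ingredients of this argument can also be checked mechanically in a small case: the identity $F\chi_A = \chi_{A+N}$ in the Möbius basis, the formula $\theta\chi = p + (1 - p - q)\chi$, and the equivalence between $p + |1 - p - q| < 1$ and the two-case description of the parameter region. The sketch assumes $d = 1$ and $N = \{0, 1\}$; function names are illustrative, and the grid uses dyadic fractions so that the floating-point comparisons are exact.

```python
from itertools import product

NBHD = (0, 1)  # assumed one-dimensional neighbourhood N = {0, 1}

def spread_ca(x, sites):
    """(F x)_i = 1 iff x_{i+j} = 1 for every j in N (the symbol 0 spreads)."""
    return {i: min(x[i + j] for j in NBHD) for i in sites}

def chi_set(A, x):
    """Moebius character: 1 iff x_i = 1 for every i in A."""
    return int(all(x[i] == 1 for i in A))

def f_star(A):
    """F*A = A + N."""
    return {i + j for i in A for j in NBHD}

def theta_chi(a, p, q):
    """E[chi(Y)] = P(Y = 1) when Y is the noisy copy of a."""
    return p if a == 0 else 1 - q

def contractive(p, q):
    return p + abs(1 - p - q) < 1

def region(p, q):
    return (p + q <= 1 and q > 0) or (p + q > 1 and p + q / 2 < 1)

A = {0, 2}
sites = range(4)
ok = all(
    chi_set(A, spread_ca(dict(zip(sites, bits)), A)) == chi_set(f_star(A), dict(zip(sites, bits)))
    for bits in product((0, 1), repeat=4)
)
grid = [k / 16 for k in range(17)]
same = all(contractive(p, q) == region(p, q) for p in grid for q in grid)
print(sorted(f_star(A)), ok, same)  # [0, 1, 2, 3] True True
```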

Open problems
We conclude with several open problems, some of which are already mentioned in the text.

Problem 1. Is every ergodic PCA uniformly ergodic?
For deterministic CA, ergodicity and uniform ergodicity are known to be equivalent [25,54,46]. We conjecture that the same is true for general PCA. The ergodic PCA discussed in this article are all exponentially ergodic, in the sense that the probability of each cylinder set converges exponentially fast to its stationary value. We do not know any example of an ergodic PCA that is not exponentially ergodic.
Problem 2. Find an example of a (uniformly) ergodic PCA that is not exponentially ergodic.
For the class of PCA that are monotonic with respect to a total ordering of the alphabet, Louis [43] has provided a necessary and sufficient condition for exponential ergodicity in terms of a spatial mixing condition.
Proposition 2.2 above established the computability of the unique invariant measure for every ergodic PCA. However, for the PCA discussed in this article, one can exploit the exponential ergodicity to give a "fast" algorithm for computing the unique invariant measure.
Problem 3. Give an example of a (uniformly) ergodic PCA for which the unique invariant measure is not computable by a "fast" algorithm.
Problem 4. Is the unique invariant measure of every (uniformly) ergodic PCA spatially mixing? Find an example of a (uniformly) ergodic PCA whose unique invariant measure is not measure-theoretically isomorphic to a Bernoulli process.
Proposition 2.1 above provides a sufficient condition for the unique invariant measure of a uniformly ergodic PCA to be spatially mixing. In view of the result of Goldstein et al. [23], we conjecture that the unique invariant measure of a positive-rate uniformly ergodic PCA is always spatially mixing.
For perturbations of a nilpotent CA with noise, we know ergodicity when noise is sufficiently high (Thm. 3.5) or sufficiently low (Thm. 3.9). When the noise has zero range, one may expect ergodicity to hold for the whole parameter range.
One of the simplest CA for which the ergodicity under noise is unknown is the majority rule. A majority CA is a CA with binary alphabet under which the symbol at each site is updated to the symbol that is in majority among the neighbouring sites (see Fig. 10). The neighbourhood has to have an odd cardinality to avoid ties.
Problem 7. Is every small positive perturbation of a one-dimensional majority CA ergodic? Is every perturbation of the two-dimensional nearest-neighbour majority CA with sufficiently small positive zero-range noise non-ergodic?
For the one-dimensional case, Gray has outlined a proof of ergodicity for the nearest-neighbour majority CA under small symmetric zero-range noise [24]. On the other hand, Toom has proven the non-ergodicity of sufficiently small perturbations of the two-dimensional majority CA with the NEC-neighbourhood (see Example 2.3). It is conjectured that in two dimensions, the non-ergodicity holds also for the symmetric nearest-neighbour majority rule.
The local rule is given by $F(x)_i \triangleq \mathrm{majority}(x_{i-1}, x_i, x_{i+1})$. The noisy version appears to be ergodic. See [57] for a class of two-dimensional examples, and [19,20] for a one-dimensional example.
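For experimentation, the one-dimensional nearest-neighbour majority rule with noise can be simulated in a few lines. This is a sketch under assumptions not made in the text: the lattice is a finite ring with periodic boundary, and the noise is a symmetric zero-range bit flip with probability `eps`. A finite simulation of this kind cannot decide the ergodicity question on the infinite lattice; it only illustrates the dynamics.

```python
import random

def majority_step(x):
    """F(x)_i = majority(x_{i-1}, x_i, x_{i+1}) on a ring (periodic boundary assumed)."""
    L = len(x)
    return [1 if x[i - 1] + x[i] + x[(i + 1) % L] >= 2 else 0 for i in range(L)]

def noisy_majority_step(x, eps, rng):
    """Majority update followed by independent symmetric bit flips with probability eps."""
    return [b ^ 1 if rng.random() < eps else b for b in majority_step(x)]

rng = random.Random(0)
x = [rng.randrange(2) for _ in range(32)]
for t in range(100):
    x = noisy_majority_step(x, 0.05, rng)
print(x)
```

With `eps = 0` the update is deterministic, and both constant configurations are fixed points of `majority_step`.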