Bernoullicity of equilibrium measures on countable Markov shifts

We study the equilibrium behaviour of a two-sided topological Markov shift with a countable number of states. We assume that the shift is topologically transitive and that the potential associated with it is Walters with finite first variation. We show that the equilibrium measure of the system is Bernoulli up to a period. In the process we generalize several theorems on countable Markov shifts: we prove a variational principle and the uniqueness of equilibrium measures. A key step is to show that functions with the Walters property on a two-sided shift are cohomologous to one-sided functions with the Walters property. Finally, we show that functions with summable variations on two-sided CMS are cohomologous to one-sided functions, also with summable variations.

1. Introduction

1.1. Countable Markov shifts. Let S be a finite or countable alphabet. Let A be an |S| × |S| matrix with entries in {0, 1}. A (one-sided) topological Markov shift (TMS) is a pair (X, T) where $X := \{x \in S^{\mathbb N} : A_{x_i x_{i+1}} = 1 \text{ for all } i \in \mathbb N\}$ and $T : X \to X$, $T(x_0, x_1, x_2, \dots) = (x_1, x_2, \dots)$. If $|S| = \aleph_0$ we call X a countable Markov shift (CMS). X is called a one-sided shift space, T is called a shift operator, and clearly $TX \subseteq X$. The two-sided shift space $(\hat X, \hat T)$ is defined similarly, except that now $\hat X \subseteq S^{\mathbb Z}$ is made of two-sided sequences and $\hat T$ is the left shift. Objects related to the two-sided shift will have hats: a member of the two-sided shift space is, for example, $\hat x \in \hat X$. The topology we use is always the product topology induced by the discrete topology on the alphabet S. It is metrizable with $d(x, y) := \exp(-\min\{|i| : x_i \neq y_i\})$ (this applies to both one-sided and two-sided shift spaces). For the two-sided shift, a basis is given by the cylinders ${}_m[a_0, \dots, a_n] := \{\hat x \in \hat X : \hat x_m = a_0, \dots, \hat x_{m+n} = a_n\}$. Similarly for one-sided shifts, $[a_0, \dots, a_n] := \{x \in X : x_0 = a_0, \dots, x_n = a_n\}$ (note that in the one-sided case cylinders always start at the zeroth coordinate). A TMS is topologically mixing if for every two states a, b there exists $N_{ab} \in \mathbb N$ such that for all $n \geq N_{ab}$ there exist $\xi_1, \dots, \xi_{n-1}$ with $A_{a\xi_1}A_{\xi_1\xi_2}\cdots A_{\xi_{n-1}b} = 1$. A TMS is called topologically transitive if for every two states a, b there exist $N = N_{ab} \in \mathbb N$ and $\xi_1, \dots, \xi_{N-1}$ with $A_{a\xi_1}A_{\xi_1\xi_2}\cdots A_{\xi_{N-1}b} = 1$. Clearly mixing implies transitivity. A fixed real-valued function on a shift space (usually referred to as a potential) may give rise to equilibrium measures, the analogue of an equilibrium distribution in statistical mechanics.
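For a finite alphabet, both conditions can be checked mechanically from A. The Python sketch below is an illustration only. It assumes a finite 0-1 matrix, so a genuinely countable alphabet cannot be handled this way, and the function names are hypothetical. It relies on two standard facts. First, a finite matrix is transitive iff every state reaches every state by a path of length between 1 and |S|. Second, it is mixing iff it is primitive, which by Wielandt's bound happens iff all entries of $A^{(|S|-1)^2+1}$ are positive.

```python
from typing import List

Matrix = List[List[int]]


def bool_mult(a: Matrix, b: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is 1 iff some k has a[i][k] = b[k][j] = 1."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def is_transitive(A: Matrix) -> bool:
    """True iff every state b is reachable from every state a by an admissible
    path of length >= 1 (for n states, lengths 1..n suffice)."""
    n = len(A)
    power = [row[:] for row in A]  # admissible paths of length exactly m
    reach = [row[:] for row in A]  # admissible paths of some length in 1..m
    for _ in range(n - 1):
        power = bool_mult(power, A)
        reach = [[reach[i][j] | power[i][j] for j in range(n)] for i in range(n)]
    return all(all(row) for row in reach)


def is_mixing(A: Matrix) -> bool:
    """True iff A is primitive: A^k > 0 entrywise for k = (n - 1)^2 + 1
    (Wielandt's bound). For a finite alphabet this is topological mixing."""
    n = len(A)
    power = [row[:] for row in A]
    for _ in range((n - 1) ** 2):
        power = bool_mult(power, A)
    return all(all(row) for row in power)


golden_mean = [[1, 1], [1, 0]]  # the transition 1 -> 1 is forbidden
flip = [[0, 1], [1, 0]]         # 0 <-> 1: transitive with period 2
print(is_mixing(golden_mean), is_transitive(flip), is_mixing(flip))  # True True False
```

The flip example is exactly the situation handled by the spectral decomposition in section 2: transitive but periodic, hence not mixing.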
1.2. Equilibrium measures. Let $\phi \in C(X)$ be real-valued and continuous. A T-invariant Borel probability measure µ is called an equilibrium measure if it maximizes the quantity $h_\mu(T) + \int\phi\,d\mu$ (subject to the requirement that $h_\mu(T) + \int\phi\,d\mu$ is not of the form ∞ − ∞). Equilibrium measures are important because they appear naturally, via symbolic dynamics, in smooth dynamics (as absolutely continuous invariant measures, physical measures, etc.), and there is great interest in their ergodic properties. One of the most important tools for studying them is Ruelle's operator (a special case of the transfer operator); see [Sar09] for a thorough development of the theory. It is defined for $f : X \to \mathbb R$ by $(L_\phi f)(x) := \sum_{Ty=x} e^{\phi(y)} f(y)$. We will state the facts we need about it as we use them. Ruelle's operator is very useful on one-sided shifts, since the terms $e^{\phi(y)}$ act as averaging weights over the preimages of x. It is less useful in the two-sided invertible case: $\hat T^{-1}\{\hat x\}$ is always a singleton, so no averaging takes place. Still, there is a way to use this operator on two-sided shifts, as we explain in the next subsection.
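The following minimal Python sketch makes the averaging concrete. It assumes a finite alphabet, a potential of the hypothetical form $\phi(x) = \psi(x_0, x_1)$, and a function f that depends only on $x_0$. The preimages of x are the sequences $(a, x_0, x_1, \dots)$ with $A_{a x_0} = 1$. So $L_\phi f$ again depends only on $x_0$ and is a weighted sum over admissible predecessors. With $\psi \equiv 0$ and $f \equiv 1$ it simply counts preimages. On a CMS this count can be infinite, which is why the domain of $L_\phi$ needs care.

```python
import math
from typing import List


def ruelle_step(A: List[List[int]], psi: List[List[float]], f: List[float]) -> List[float]:
    """One application of L_phi for phi(x) = psi[x_0][x_1] and f(x) = f[x_0].

    The preimages of x with x_0 = b are y = (a, b, x_1, ...) with A[a][b] = 1,
    and phi(y) = psi[a][b], so (L_phi f)(x) = sum_a A[a][b] exp(psi[a][b]) f[a].
    """
    n = len(A)
    return [sum(A[a][b] * math.exp(psi[a][b]) * f[a] for a in range(n))
            for b in range(n)]


golden_mean = [[1, 1], [1, 0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(ruelle_step(golden_mean, zero, [1.0, 1.0]))  # [2.0, 1.0]: numbers of preimages
```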
1.3. Cohomology to one-sided functions. Two real-valued functions f, g on a TMS are said to be cohomologous if there exists h such that $f - g = h - h\circ T$ ($h - h\circ T$ is called a coboundary and h is called a transfer function). Cohomology is an equivalence relation. It is of particular interest when a two-sided function (i.e. one depending on both positive and negative coordinates) is cohomologous to a one-sided function (one depending only on non-negative coordinates). We define the natural projection $\pi : \hat X \to X$ by $\pi(\hat x) := (\hat x_0, \hat x_1, \hat x_2, \dots)$. We are interested in the cases where, for a two-sided $\hat f$, there exists a one-sided f such that $\hat f - f\circ\pi = h - h\circ\hat T$. We consider three regularity conditions. Define the nth variation of $\hat\phi \in C(\hat X)$ as $\operatorname{var}_n\hat\phi := \sup\{|\hat\phi(\hat x) - \hat\phi(\hat y)| : \hat x_{-n+1}^{n-1} = \hat y_{-n+1}^{n-1}\}$.
• $\hat\phi$ is weakly Hölder if there exist C > 0 and 0 < θ < 1 such that $\operatorname{var}_n\hat\phi < C\theta^n$ for n ≥ 2,
• $\hat\phi$ has summable variations if $\sum_{n=2}^{\infty}\operatorname{var}_n\hat\phi < \infty$.
We defer the definition of Walters' condition to section 3 (definitions 3.1 and 3.2 there). It is known that weak Hölder continuity implies summable variations which, in turn, implies Walters' condition. In the finite alphabet case, Sinai [Sin72] considered weakly Hölder two-sided functions and showed that each is cohomologous to a one-sided weakly Hölder function ([Bow75] gives a more accessible account). Coelho & Quas [CQ98] did the same for functions with summable variations, and Walters [Wal03] did this for functions satisfying Walters' condition. All these results, however, were proven in a compact setting. In order to consider an infinite alphabet (equivalently, non-compact shift spaces), one needs to develop the theory for such spaces. We show that the proof in [Sin72] also works for a countable alphabet. The proof in [CQ98] does too, with some modifications. The proof in [Wal03] relies on a lemma from [Bou01] which is hard to generalize to a non-compact setting. Instead, we show that Sinai's original construction can be used to find, for a two-sided Walters function on a non-compact shift space, a cohomologous one-sided Walters function (section 3).
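A basic consequence of cohomology is that a coboundary telescopes along orbits: $\sum_{k=0}^{n-1}(h - h\circ T)(T^k x) = h(x) - h(T^n x)$. Two things follow. Cohomologous functions have the same sums along every periodic orbit. And for bounded h the Birkhoff averages of $h - h\circ T$ are at most $2\|h\|_\infty/n$. The Python sketch below checks the periodic-orbit statement numerically on a two-sided full shift. Sequences are modelled as functions $i \mapsto x_i$. The functions `h` and `f_hat` are arbitrary hypothetical examples that depend on finitely many coordinates.

```python
import math
from typing import Callable, Sequence

Seq = Callable[[int], int]  # a two-sided sequence, modelled as i -> x_i


def periodic_point(word: Sequence[int]) -> Seq:
    """The two-sided periodic sequence with x_i = word[i mod p]."""
    p = len(word)
    return lambda i: word[i % p]


def shift(x: Seq, k: int = 1) -> Seq:
    """(T^k x)_i = x_{i+k}: the left shift applied k times."""
    return lambda i: x(i + k)


def coboundary(h: Callable[[Seq], float]) -> Callable[[Seq], float]:
    """The coboundary x -> h(x) - h(Tx)."""
    return lambda x: h(x) - h(shift(x))


def orbit_sum(F: Callable[[Seq], float], word: Sequence[int]) -> float:
    """Sum of F over the periodic orbit of the point determined by word."""
    x = periodic_point(word)
    return sum(F(shift(x, k)) for k in range(len(word)))


def h(x: Seq) -> float:
    """Hypothetical bounded transfer function (depends on x_{-1}, x_0, x_2)."""
    return math.sin(x(-1) + 2 * x(0)) + 0.3 * x(2)


def f_hat(x: Seq) -> float:
    """Hypothetical two-sided potential."""
    return x(-2) * x(1) - 0.5 * x(0)


def f_cohomologous(x: Seq) -> float:
    """f_hat minus the coboundary h - h o T, hence cohomologous to f_hat."""
    return f_hat(x) - coboundary(h)(x)


word = [0, 1, 1, 0, 2]
print(orbit_sum(coboundary(h), word))  # zero up to rounding
print(orbit_sum(f_hat, word), orbit_sum(f_cohomologous, word))  # equal up to rounding
```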
1.4. Bernoullicity. A Bernoulli scheme with finite probability vector $(p_a)_{a\in S}$ is the left shift T on $S^{\mathbb Z}$, with the Borel σ-algebra $\mathcal B(S^{\mathbb Z})$ generated by cylinders and the measure $\mu_p({}_m[a_m, \dots, a_n]) := p_{a_m}\cdots p_{a_n}$. Bernoulli schemes are a model of ideal randomness and, as such, have been extensively studied. Knowing that a particular system is measure theoretically isomorphic (see [Wal00] for the definition) to a Bernoulli scheme gives us complete knowledge of its ergodic properties. We prove that equilibrium measures of Walters potentials are isomorphic to the product of a Bernoulli scheme and a finite rotation (see theorem 1.1 below for the exact statement).
1.5. Results. Our main result is the following theorem.
Theorem 1.1. Let $\hat\mu$ be an equilibrium measure of a Walters potential $\hat f \in C(\hat X)$ with finite first variation ($\operatorname{var}_1\hat f < \infty$) on a two-sided topologically transitive CMS. Assume $\sup\hat f < \infty$ and $h_{\hat\mu}(\hat T) < \infty$. Then $(\hat X, \hat{\mathcal B}, \hat\mu, \hat T)$ is measure theoretically isomorphic to the product of a Bernoulli scheme and a finite rotation.
Note that if we assume $\sup\hat f < \infty$, then $h_{\hat\mu}(\hat T) < \infty$ is equivalent to having finite Gurevich pressure, $P_G(f) < \infty$ (see section 4 for the definition; this follows immediately from the variational principle, theorem 4.2). Results similar to ours can be found in [Bow74a], [Wal05], [Sar11], [Ber87] and [Rat74]. Our results assume very little: we only assume that our potential is Walters with finite first variation (as opposed to summable variations in [Sar11]). We do not assume compactness, as opposed to [Wal05], and we use different conditions than [Ber87]. We prove that every two-sided Walters potential with finite first variation is cohomologous to a one-sided potential; this is theorem 3.1. We were also able to prove a similar result for potentials with summable variations, following [CQ98] (we show that the compactness assumed there can be removed); this is theorem 7.1. We also prove that an equilibrium measure, if it exists, is unique. This was proved in [BS03] for potentials with summable variations (see section 5 for the precise statement). Finally, we prove a variational principle for Walters potentials on non-compact (i.e. countable) TMS; this is theorem 4.2.
1.6. Main idea and organization of the proof. To prove theorem 1.1 we go through several steps. First, we show that we may restrict ourselves to topologically mixing TMS. In this case we show isomorphism to a Bernoulli scheme (without the finite rotation factor; this is theorem 2.1). The reduction is stated and proved in section 2, using the spectral decomposition. From there on we only concern ourselves with the reduced case of topologically mixing CMS. In section 3 we prove that functions that are Walters with finite first variation are cohomologous to one-sided Walters functions. Section 4 presents the machinery used in section 5, where the uniqueness of equilibrium measures is established (theorem 5.1). What we actually need is corollary 5.1, which gives us important information on equilibrium measures for one-sided shift spaces (with a corresponding one-sided potential). The key is to understand how two-sided equilibrium measures relate to one-sided equilibrium measures. This is explained at the beginning of section 6 (the original idea is due to Sinai [Sin72]). Having established this relation, we use corollary 5.1 (stated originally for one-sided equilibrium measures) to prove the Bernoullicity of the two-sided equilibrium measure, using Ornstein theory. Finally, we show that cohomology to a one-sided function can also be achieved for two-sided potentials with summable variations, giving rise to a one-sided potential that also has summable variations. This is done, again, in a non-compact setting, using the proof in [CQ98].

Reduction to the topologically mixing case
Suppose we know that the following is true.
Theorem 2.1. Let $\hat\mu$ be an equilibrium measure of a Walters potential $\hat f \in C(\hat X)$ with finite first variation ($\operatorname{var}_1\hat f < \infty$) on a two-sided topologically mixing CMS. Assume $\sup\hat f < \infty$, $h_{\hat\mu}(\hat T) < \infty$ and $\int\hat f\,d\hat\mu < \infty$. Then $(\hat X, \hat{\mathcal B}, \hat\mu, \hat T)$ is measure theoretically isomorphic to a Bernoulli scheme.
We can use the following lemma in order to show theorem 2.1 implies theorem 1.1.
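Lemma 2.1. Let (X, B, µ, T) be an ergodic invertible probability preserving transformation with a measurable set $X_0$ such that
(1) $T^p(X_0) = X_0 \bmod \mu$,
(2) $X_0, T(X_0), \dots, T^{p-1}(X_0)$ are pairwise disjoint mod µ,
(3) $T^p : X_0 \to X_0$, with the normalized restriction $\mu(\cdot \mid X_0)$, is measure theoretically isomorphic to a Bernoulli scheme.
Then (X, B, µ, T) is measure theoretically isomorphic to the product of a Bernoulli scheme and a finite rotation.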
Let $\hat X$, $\hat f$, etc. be as in theorem 1.1. By the spectral decomposition (Remark 7.1.35 in [Kit98]), there exist $p \in \mathbb N$ and $\hat X_0, \hat X_1, \dots, \hat X_{p-1}$ with the following properties: the $\hat X_i$ are pairwise disjoint mod $\hat\mu$, $\hat T(\hat X_i) = \hat X_{(i+1) \bmod p}$, and $(\hat X_i, \hat T^p)$ is topologically mixing. Since we assume theorem 2.1 to be true, this implies that $\hat T^p$ restricted to $\hat X_0$ is Bernoulli. It is known that $\hat\mu$, as an equilibrium measure, is ergodic ([Sar09], theorem 4.7). So the hypotheses of lemma 2.1 are satisfied and theorem 1.1 holds. Thus, without loss of generality, we may restrict ourselves to topologically mixing TMS and prove that (under the conditions of theorem 2.1) they are measure theoretically isomorphic to Bernoulli schemes.

Cohomology to one-sided function -Walters case
Let (Y, T) be a dynamical system on a metric space Y. We define Bowen's metric at time n by $d_n(x, y) := \max_{0\leq k<n} d(T^k x, T^k y)$. Now let $g : Y \to \mathbb R$ and write $g_n := \sum_{k=0}^{n-1} g\circ T^k$. We say that g is Walters (satisfies Walters' condition, has the Walters property) [Wal78] if for every ε > 0 there exists δ > 0 such that for all n ≥ 1 and all x, y ∈ Y, $d_n(x, y) < \delta \Rightarrow |g_n(x) - g_n(y)| < \varepsilon$. The careful reader may check that this definition specializes to the definitions we present (and use) for the special case of TMS; details can be found in e.g. [Bou01].
We use hats (e.g. $\hat x \in \hat X$, $\hat T$, etc.) to distinguish objects defined on the two-sided shift space from those defined on the one-sided shift space; when no confusion may arise, we might drop the hats. Let $f : X \to \mathbb R$. Its nth variation is defined as $\operatorname{var}_n f := \sup\{|f(x) - f(y)| : x_0^{n-1} = y_0^{n-1}\}$, and we write $f_n := \sum_{k=0}^{n-1} f\circ T^k$. This is how the above definition specializes to CMSs:
Definition 3.1. Let (X, T) be a one-sided CMS and f ∈ C(X). f is said to satisfy Walters' condition if $\lim_{k\to\infty}\sup_{n\geq1}\operatorname{var}_{n+k} f_n = 0$ and $\sup_{n\geq1}\operatorname{var}_{n+k} f_n < \infty$ for every k ≥ 1. If f is Walters, then it is uniformly continuous. However, it need not be bounded.
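To see definition 3.1 at work, consider the hypothetical potential $f(x) = \sum_{j<J}\theta^j x_j$ on the one-sided full 2-shift. Here $\operatorname{var}_m g$ is the supremum of $|g(x) - g(y)|$ over x, y agreeing on their first m coordinates. Points agreeing on the first n + k coordinates can differ in $f_n$ only through later coordinates. A short computation gives $\sup_{n\geq1}\operatorname{var}_{n+k}f_n \leq \theta^{k+1}/(1-\theta)^2 \to 0$, so f is Walters. The Python sketch below computes $\operatorname{var}_{n+k}f_n$ directly from the definition. It is a brute-force illustration, not an efficient algorithm, and the names `THETA`, `J` and `var_birkhoff` are ours.

```python
from itertools import product

THETA = 0.5
J = 4  # the hypothetical f below depends only on x_0, ..., x_{J-1}


def f(w) -> float:
    """Hypothetical one-sided potential f(x) = sum_{j<J} THETA**j * x_j."""
    return sum(THETA ** j * w[j] for j in range(J))


def birkhoff(w, n: int) -> float:
    """f_n(x) = sum_{i<n} f(T^i x), evaluated on a word w covering x_0..x_{n+J-2}."""
    return sum(f(w[i:]) for i in range(n))


def var_birkhoff(n: int, k: int) -> float:
    """var_{n+k} f_n: the largest |f_n(x) - f_n(y)| over x, y in the full 2-shift
    that agree on x_0, ..., x_{n+k-1} (brute force over all relevant words)."""
    extremes = {}
    for w in product((0, 1), repeat=n + J - 1):
        value = birkhoff(w, n)
        key = w[:n + k]  # the common (n+k)-cylinder of x and y
        lo, hi = extremes.get(key, (value, value))
        extremes[key] = (min(lo, value), max(hi, value))
    return max(hi - lo for lo, hi in extremes.values())


for k in range(1, 5):
    worst = max(var_birkhoff(n, k) for n in range(1, 6))
    print(k, worst, THETA ** (k + 1) / (1 - THETA) ** 2)  # worst case vs bound
```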
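Theorem 3.1. Let $(\hat X, \hat T)$ be a two-sided TMS (possibly with a countable alphabet). Let $\hat f$ be Walters with $\operatorname{var}_1\hat f < \infty$. Then there exists a one-sided $f : X \to \mathbb R$ that is also Walters such that $\hat f - f\circ\pi = h - h\circ\hat T$ for a bounded, uniformly continuous h.
Proof. Following Sinai [Sin72], for every a ∈ S we fix $z^a$, some arbitrary left-infinite sequence that can precede a. Let $\hat x \in \hat X$. Define $\bar x$ by $\bar x_i := \hat x_i$ for i ≥ 0 and $(\dots, \bar x_{-2}, \bar x_{-1}) := z^{\hat x_0}$. Set $h(\hat x) := \sum_{k=0}^{\infty}[\hat f(\hat T^k\hat x) - \hat f(\hat T^k\bar x)]$, with partial sums $H_k(\hat x) := \sum_{j=0}^{k-1}[\hat f(\hat T^j\hat x) - \hat f(\hat T^j\bar x)]$. We claim that h is well defined, uniformly continuous and bounded. To see this, note that $\hat T$ is uniformly continuous (for ε > 0 choose δ = ε/e). The map $\hat x \mapsto \bar x$ only replaces the negative coordinates, so it is also uniformly continuous.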
Each $H_k$ is uniformly continuous as a finite sum of compositions of uniformly continuous functions. Now we show that the sequence $(H_k)$ is uniformly Cauchy. Let ε > 0. We want to find N such that n ≥ N and k > 0 imply $|H_{n+k}(\hat x) - H_n(\hat x)| < \varepsilon$ for all $\hat x \in \hat X$. This shows the uniform continuity of h (clearly it is well defined).
By the uniform Cauchy property of $(H_k)_{k=1}^{\infty}$, $H_k \to h$ uniformly. Hence there is some k for which $|h(\hat x) - H_k(\hat x)| < 1$ for all $\hat x \in \hat X$. For that k, $|H_k(\hat x)| \leq k\cdot\operatorname{var}_1\hat f < \infty$, since $\hat T^j\hat x$ and $\hat T^j\bar x$ share their zeroth coordinate. This shows that h is bounded. Now we turn to the cohomology. Let $\hat x \in \hat X$. A telescoping computation shows that $\hat f(\hat x) - h(\hat x) + h(\hat T\hat x)$ depends only on the coordinates $\hat x_0, \hat x_1, \dots$. The appended $z^a$ were completely arbitrary; they were only required to make $\bar x$ admissible. Hence $\hat f - h + h\circ\hat T = f\circ\pi$ for some one-sided $f : X \to \mathbb R$. Now we turn to show that f is Walters.
The first summand approaches zero by the Walters property of $\hat f$.
The assumption that $\operatorname{var}_1\hat f < \infty$ is not too restrictive. The Walters property implies $\operatorname{var}_2\hat f < \infty$, so we can recode the shift space using 2-cylinders, and in the recoded space $\hat f$ has finite first variation.
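The telescoping step can be checked numerically in the simplest setting. The Python sketch below makes two assumptions. First, it works on the full 2-shift, so that a single left tail $z = (\dots, 0, 0)$ can precede every symbol; in a general CMS the tail $z^{\hat x_0}$ depends on $\hat x_0$. Second, it uses a hypothetical $\hat f$ that depends only on the coordinates −2, …, 1. With $\bar x$ defined as $\hat x$ with its past replaced by z, Sinai's transfer function is $h(\hat x) = \sum_{k\geq0}[\hat f(\hat T^k\hat x) - \hat f(\hat T^k\bar x)]$. Since $\hat T^k\hat x$ and $\hat T^k\bar x$ agree on the coordinates ≥ −k, the series has only finitely many nonzero terms here. Two points with the same future but different pasts get different values of $\hat f$, yet the same value of $f = \hat f - h + h\circ\hat T$.

```python
from typing import Callable

Seq = Callable[[int], int]  # a two-sided 0-1 sequence, modelled as i -> x_i
R = 2  # the hypothetical f_hat below depends only on coordinates -R..R
Z = 0  # hypothetical left tail z = (..., 0, 0), admissible before any symbol


def shift(x: Seq, k: int = 1) -> Seq:
    """(T^k x)_i = x_{i+k}."""
    return lambda i: x(i + k)


def bar(x: Seq) -> Seq:
    """x-bar: keep x_i for i >= 0 and replace the past by the fixed tail z."""
    return lambda i: x(i) if i >= 0 else Z


def f_hat(x: Seq) -> float:
    """Hypothetical two-sided potential depending on x_{-2}, x_{-1}, x_0, x_1."""
    return x(-1) * x(1) + 0.5 * x(-2) + x(0)


def h(x: Seq) -> float:
    """Sinai's transfer function sum_{k>=0} [f_hat(T^k x) - f_hat(T^k x-bar)].
    T^k x and T^k x-bar agree on coordinates >= -k, so terms with k >= R vanish."""
    return sum(f_hat(shift(x, k)) - f_hat(shift(bar(x), k)) for k in range(R + 1))


def f_one_sided(x: Seq) -> float:
    """f := f_hat - h + h o T, so that f_hat - f = h - h o T."""
    return f_hat(x) - h(x) + h(shift(x))


def x1(i: int) -> int:
    return 1 if i % 2 == 0 else 0          # ... 1 0 1 0 1 ...


def x2(i: int) -> int:
    return x1(i) if i >= 0 else 1 - x1(i)  # same future as x1, flipped past


print(f_hat(x1), f_hat(x2))              # different: f_hat sees the past
print(f_one_sided(x1), f_one_sided(x2))  # equal: f depends only on x_0, x_1, ...
```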

The GRPF theorem with some consequences
We will rely on the Generalized Ruelle-Perron-Frobenius theorem. This was originally proved in [Sar01b] for weakly Hölder potentials. The generalization to Walters potentials which we use may be found in [Sar09]. Note that in this section we are solely concerned with one-sided TMS.
We will also call such a measure non-singular.
Proof. We prove the claim for indicator functions; by linearity and approximation, the same then holds for any integrable Borel function.
One can check that the transfer operator is well defined as a Radon-Nikodym derivative.
Fact 3 (Formula for the transfer operator). Suppose X is a TMS and µ is T-non-singular. Then the transfer operator of µ is given by $(T_\mu f)(x) = \sum_{Ty=x} g_\mu(y) f(y)$, where $g_\mu := \frac{d\mu}{d\mu\circ T}$.
Fact 4 (Properties of the transfer operator). Let µ be a non-singular σ-finite measure on X. Then:

This means that the transfer operator behaves like the adjoint of the Koopman operator $f \mapsto f\circ T$ (except that it acts on $L^1$, while the Koopman operator acts on $L^\infty$).
The definition is not yet precise, since we did not specify the domain and range. In our case the sum might even be infinite, since a point may have infinitely many preimages (recall that we wish to consider TMS with infinitely many states). However, we will restrict ourselves to functions that satisfy Walters' condition, and for such functions this operator turns out to be well defined and well behaved. Note that in the particular case where φ is the log Jacobian of some measure µ, we have $L_\phi = T_\mu$, and all the good properties of the transfer operator hold also for the Ruelle operator.
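Recall that a TMS is topologically mixing if for every two states a, b there exists $N_{ab}$ such that for every $n \geq N_{ab}$ there is an admissible word of length n + 1 starting at a and ending at b. For a state a, let $Z_n(\phi, a) := \sum_{T^n x = x} e^{\phi_n(x)} 1_{[a]}(x)$, where $\phi_n := \sum_{k=0}^{n-1}\phi\circ T^k$.
Proposition 4.1 (Gurevich pressure). Let X be a topologically mixing TMS and let φ : X → R be Walters. For every state a ∈ S, $\lim_{n\to\infty}\frac1n\log Z_n(\phi, a)$ exists and is independent of a. We call this limit the Gurevich pressure of φ and denote it $P_G(\phi)$.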
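Consider a finite alphabet and a potential of the hypothetical form $\phi(x) = \psi(x_0, x_1)$. Then the partition function $Z_n(\phi, a) = \sum_{T^n x = x} e^{\phi_n(x)} 1_{[a]}(x)$ is a matrix entry. A periodic point of period n in [a] is a loop a → ⋯ → a of length n, so $Z_n(\phi, a) = (M^n)_{aa}$ with $M_{ij} := A_{ij}e^{\psi(i,j)}$. In this case $P_G(\phi)$ is the logarithm of the spectral radius of M. The Python sketch below (names ours) estimates $\frac1n\log Z_n(\phi, a)$, rescaling at each step to avoid floating-point overflow. For the golden mean shift with ψ ≡ 0, it approaches the logarithm of the golden ratio from either state, illustrating that the limit does not depend on a.

```python
import math


def gurevich_pressure_estimate(A, psi, a: int, n: int) -> float:
    """(1/n) log Z_n(phi, a) for phi(x) = psi[x_0][x_1] on a finite TMS.

    Z_n(phi, a) = (M^n)[a][a] with M[i][j] = A[i][j] * exp(psi[i][j]). The row
    vector e_a M^k is renormalised at each step and the scale factors are
    accumulated in log form, so large n does not overflow.
    """
    size = len(A)
    M = [[A[i][j] * math.exp(psi[i][j]) for j in range(size)] for i in range(size)]
    row = [1.0 if i == a else 0.0 for i in range(size)]
    log_scale = 0.0
    for _ in range(n):
        row = [sum(row[i] * M[i][j] for i in range(size)) for j in range(size)]
        s = max(row)
        if s == 0.0:
            return -math.inf  # no admissible words of this length start at a
        log_scale += math.log(s)
        row = [v / s for v in row]
    if row[a] == 0.0:
        return -math.inf  # no periodic points of period n in [a]
    return (log_scale + math.log(row[a])) / n


golden = [[1, 1], [1, 0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
for a in (0, 1):
    print(a, gurevich_pressure_estimate(golden, zero, a, 5000))
print(math.log((1 + math.sqrt(5)) / 2))  # log of the golden ratio
```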
(2) φ is null recurrent if there exist λ > 0, a positive continuous function h and a conservative measure ν, finite on cylinders, such that $L_\phi h = \lambda h$, $L_\phi^*\nu = \lambda\nu$ and $\int h\,d\nu = \infty$.
If X is compact, then φ is positive recurrent [Sar09], so the last two cases cannot occur. Here we are considering non-compact shift spaces, where the last two cases may, in fact, occur. We call the probability measure $dm := h\,d\nu$ from part (1) (normalized if required) an RPF measure. Our focus will be on the recurrent case. We turn to some consequences of the GRPF theorem.
This can be extended to the algebra generated by cylinders, and Carathéodory's extension theorem extends it to the Borel σ-algebra and completes it. Thus µ = ν up to a multiplicative factor. We require $\int h\,d\nu = 1$, so the RPF measure is indeed uniquely determined. The following variational principle was proved in a compact setting by Ruelle [Rue73] (see also [Wal00]); Sarig [Sar99] proved it for countable (i.e. non-compact) Markov shifts.
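Theorem 4.2 (Variational principle). Let X be a topologically mixing TMS and let φ : X → R be Walters with $\sup\phi < \infty$. Then $P_G(\phi) = \sup\{h_\mu(T) + \int\phi\,d\mu\}$, where the supremum ranges over shift-invariant Borel probability measures for which $h_\mu(T) + \int\phi\,d\mu$ is well defined.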
The proof here follows [Sar09] almost verbatim. We include it since there the result is stated for potentials with summable variations rather than for Walters potentials. Before we prove it, we state a few useful facts. We start with a lemma from [Sar09]. The proof there is stated for potentials with summable variations, but the same proof works verbatim if the potential is Walters. For that lemma we need the following definition.
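Definition 4.10 (Sub-system). Let X be a TMS over the set of states S with transition matrix A. A sub-system of X is a TMS of the form $Y = \{x \in X : x_i \in S' \text{ for all } i\}$ for some $S' \subseteq S$.
Lemma 4.1. Let X be a topologically mixing TMS and let φ : X → R be Walters. Then $P_G(\phi) = \sup_Y P_G(\phi|_Y)$, where the supremum ranges over Y's that are topologically mixing compact sub-systems of X.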
In this context it may be useful to recall that a TMS is compact if and only if it has a finite number of states. We will also need the following. Let E be a sweep-out set. Then µ(E) > 0. If µ is ergodic, any set of positive measure is a sweep-out set. The following are classic results.
We can now turn to finish the proof using the above analysis.
We now show the reverse inequality. Fix some ε > 0. By the lemma on pressure over sub-systems (lemma 4.1), there is a topologically mixing compact sub-system Y ⊆ X such that $P_G(\phi) \leq P_G(\phi|_Y) + \varepsilon$. Denote $\psi := \phi|_Y$. Since Y is compact, the GRPF theorem shows that ψ is positive recurrent (in fact, Ruelle's original Perron-Frobenius theorem suffices here). So there exist a positive eigenfunction h > 0 of Ruelle's operator and a conservative probability measure ν on Y such that $L_\psi h = e^{P_G(\psi)}h$, $L_\psi^*\nu = e^{P_G(\psi)}\nu$ and $\int h\,d\nu = 1$. The measure ν is indeed a probability measure (after normalization), since it is finite on cylinders and there are only finitely many cylinders of any fixed length. We set $dm = h\,d\nu$. This is a shift-invariant probability measure, since for every bounded Borel f,
$$\int f\circ T\,h\,d\nu = e^{-P_G(\psi)}\int L_\psi(h\cdot f\circ T)\,d\nu = e^{-P_G(\psi)}\int f\,L_\psi h\,d\nu = \int f h\,d\nu.$$
One consequence of the above is that $\psi = \log\frac{d\nu}{d\nu\circ T} + P_G(\psi)$, by the properties of the transfer operator (fact 4), which also apply to Ruelle's operator. Now let $\alpha_Y := \{[a] \cap Y : a \in S'\}$, where S′ denotes the alphabet over which Y is defined. Since Y is compact, $\alpha_Y$ is finite, so $H_m(\alpha_Y) < \infty$. This means we may use Rokhlin's formula:
$$h_m(T|_Y) = -\int_Y\log\frac{dm}{dm\circ T}\,dm = -\int_Y\left(\psi - P_G(\psi) + \log h - \log h\circ T\right)dm = P_G(\psi) - \int_Y\psi\,dm.$$
The last equality holds since log h is continuous on a compact space (h > 0), hence absolutely integrable, and m is T-invariant. Thus $h_m(T|_Y) + \int_Y\psi\,dm = P_G(\psi)$. This implies that $P_G(\psi) \leq \sup\{h_\mu(T) + \int\phi\,d\mu\}$. Since, by construction, $P_G(\phi) \leq P_G(\psi) + \varepsilon$, we get $P_G(\phi) \leq \sup\{h_\mu(T) + \int\phi\,d\mu\} + \varepsilon$. But ε was arbitrary, so we are done.
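The Rokhlin-formula argument above can be reproduced exactly in the simplest compact case. Assume (hypothetically) a finite mixing alphabet and $\psi(x) = \psi(x_0, x_1)$. Let $M_{ij} := A_{ij}e^{\psi(i,j)}$, with spectral radius ρ, right eigenvector r and left eigenvector l. Then m is the Markov measure with transition probabilities $P_{ij} = M_{ij}r_j/(\rho\,r_i)$ and stationary vector $\pi_i \propto l_i r_i$. The eigenvector terms telescope, giving $h_m(T) + \int\psi\,dm = \log\rho = P_G(\psi)$. The Python sketch below (names ours, eigenvectors by plain power iteration) checks this numerically.

```python
import math


def perron_vector(M, iters=2000):
    """Power iteration for a primitive nonnegative matrix M: returns (rho, r)
    with M r = rho r, r > 0 and max(r) = 1."""
    n = len(M)
    r = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        new = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        rho = max(new)
        r = [v / rho for v in new]
    return rho, r


def equilibrium_check(A, psi):
    """For psi(x) = psi[x_0][x_1] on a finite mixing TMS, return
    (h_m(T), integral of psi dm, log rho) for the Markov measure m built from
    the Perron data of M[i][j] = A[i][j] * exp(psi[i][j])."""
    n = len(A)
    M = [[A[i][j] * math.exp(psi[i][j]) for j in range(n)] for i in range(n)]
    rho, r = perron_vector(M)                          # right eigenvector
    _, l = perron_vector([list(c) for c in zip(*M)])   # left eigenvector
    P = [[M[i][j] * r[j] / (rho * r[i]) for j in range(n)] for i in range(n)]
    norm = sum(l[i] * r[i] for i in range(n))
    pi = [l[i] * r[i] / norm for i in range(n)]       # stationary vector of P
    pairs = [(i, j) for i in range(n) for j in range(n) if A[i][j]]
    entropy = -sum(pi[i] * P[i][j] * math.log(P[i][j]) for i, j in pairs)
    integral = sum(pi[i] * P[i][j] * psi[i][j] for i, j in pairs)
    return entropy, integral, math.log(rho)


golden = [[1, 1], [1, 0]]
psi = [[0.3, -0.2], [0.5, 0.0]]  # psi[1][1] is irrelevant: 1 -> 1 is forbidden
entropy, integral, pressure = equilibrium_check(golden, psi)
print(entropy + integral, pressure)  # agree up to rounding
```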

Theorem 4.3 (Cohomology for g-functions). Let X be a topologically mixing TMS and let φ : X → R be Walters. Suppose $P_G(\phi) < \infty$.
(1) If φ is recurrent, then $\phi - P_G(\phi) = \log g + \varphi - \varphi\circ T$, where g is a g-function, log g is Walters and ϕ is continuous.
(2) If φ is transient, then $\phi - P_G(\phi) = \log g + \varphi - \varphi\circ T$, where log g is Walters, g is a sub g-function and ϕ is continuous.
In both cases the cohomology can be done so that $\operatorname{var}_1\varphi < \infty$.
A proof may be found in [Sar09, Sar01a]. This will be used in the reduction at the beginning of the proof of the uniqueness theorem (theorem 5.1).
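In the finite, locally constant case the cohomology of theorem 4.3 is explicit. Assume (hypothetically) a finite mixing alphabet and $\phi(x) = \psi(x_0, x_1)$. Let l > 0 satisfy $lM = \rho l$ for $M_{ij} := A_{ij}e^{\psi(i,j)}$; then $h(x) := l_{x_0}$ satisfies $L_\phi h = \rho h$. Set $g(y) := M_{y_0y_1}l_{y_0}/(\rho\,l_{y_1})$ and $\varphi(x) := -\log l_{x_0}$. Then $\phi - \log\rho = \log g + \varphi - \varphi\circ T$ and $\sum_{Ty=x}g(y) = 1$, so g is a g-function. Also $e^{-\varphi} = h$, matching the relation $L_\phi e^{-\varphi} = e^{-\varphi}$ in the proof of theorem 5.1 once the pressure is normalized to 0. The Python sketch below (names ours) checks both identities numerically.

```python
import math


def left_perron(M, iters=2000):
    """Power iteration on the transpose: (rho, l) with l M = rho l, l > 0 and
    max(l) = 1. Assumes M is nonnegative and primitive."""
    n = len(M)
    l = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        new = [sum(l[i] * M[i][j] for i in range(n)) for j in range(n)]
        rho = max(new)
        l = [v / rho for v in new]
    return rho, l


def g_cohomology(A, psi):
    """Return (rho, l, g, varphi) for phi(x) = psi[x_0][x_1] on a finite TMS."""
    n = len(A)
    M = [[A[i][j] * math.exp(psi[i][j]) for j in range(n)] for i in range(n)]
    rho, l = left_perron(M)
    g = [[M[i][j] * l[i] / (rho * l[j]) for j in range(n)] for i in range(n)]
    varphi = [-math.log(v) for v in l]  # varphi(x) = -log l[x_0]
    return rho, l, g, varphi


A = [[1, 1], [1, 0]]
psi = [[0.3, -0.2], [0.5, 0.0]]
rho, l, g, varphi = g_cohomology(A, psi)
# g-function: summing g over the admissible predecessors a of each state b gives 1.
print([sum(A[a][b] * g[a][b] for a in range(2)) for b in range(2)])
# cohomology on each admissible transition a -> b (y_0 = a, (Ty)_0 = b).
print(max(abs(psi[a][b] - math.log(rho) - (math.log(g[a][b]) + varphi[a] - varphi[b]))
          for a in range(2) for b in range(2) if A[a][b]))
```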

The uniqueness theorem
The focus of this section is on proving the following theorem, which generalizes a theorem from [BS03] proved there for potentials with summable variations. For the uniqueness of equilibrium measures on compact spaces, see [Bow74b].
(2) this equilibrium measure, if it exists, equals the RPF measure of φ.
(3) In particular, if φ has an equilibrium measure then φ is positive recurrent and the RPF measure has finite entropy.
Proof. Let us first assume that µ is an equilibrium measure for φ. By subtracting a constant from φ we may assume, without loss of generality, that $P_G(\phi) = 0$. By the previous theorem, $\phi = \log g + \varphi - \varphi\circ T$, where g is a sub g-function, ϕ is continuous and $\operatorname{var}_1\varphi < \infty$. We first show that $L_\phi e^{-\varphi} = e^{-\varphi}$ and $L_\phi^*(e^{\varphi}\mu) = e^{\varphi}\mu$. This implies that $\mu = e^{-\varphi}(e^{\varphi}\mu)$ is an RPF measure, and by proposition 4.2 it is unique. We divide the proof into several claims.
Having proven the first claim, we proceed with another claim.
Note that the $E_i$ are measurable (as intersections of such) and disjoint. Hence $h_{\mu_i}(T) + \int\phi\,d\mu_i = 0$, and so µ is a convex combination of the required equilibrium measures. Now assume that $\{p_i\}$ is countable. For any N write $q_{N+1} := \sum_{i>N}p_i$, so that $\mu^*_{N+1} := \frac{1}{q_{N+1}}\sum_{i>N}p_i\mu_i$ is a probability measure. Apply the same argument to $\mu_1, \dots, \mu_N, \mu^*_{N+1}$ and send N to infinity. This gives the required decomposition and the claim is proved.
We proceed towards the proof of the main theorem with yet another claim. Let $\beta := \{[a_i, \xi_1, \dots, \xi_{n-1}, a_i] : n \geq 1,\ \xi_j \neq a_i \text{ for } 1 \leq j \leq n-1\} \setminus \{\emptyset\}$. This is a generator for T. Assume for a moment that $H_{\mu_i}(\beta) < \infty$. This assumption implies the claim, using Kac's formula and fact 1. We now show that β is a generator with finite entropy. Define a Bernoulli measure $\mu_i^B$. By its definition, and with the aid of Kac's formula, $\mu_i^B(\cdot \mid [a_i]) = \mu_i^B$; also, this measure is a probability measure. We now proceed to show that $\phi \in L^1(\mu_i^B)$. Set $M := \sup_{n\geq2}\operatorname{var}_{n+1}\phi_n$ (finite by the Walters property of φ). By the definition of the partition β, partition sets are cylinders. We also see from the same definition that $\mathrm{length}(B) - 1 = \varphi_{[a_i]}(x)$ for any x ∈ B, where $\varphi_{[a_i]}$ is the first return time to $[a_i]$. So $\varphi_{[a_i]}$ is constant on partition elements. Let B ∈ β and fix $x_B \in B$.

Proof. Let $g_i := \frac{d\mu_i}{d\mu_i\circ T}$ and let $T_{\mu_i}$ be the transfer operator of $\mu_i$. We have $(T_{\mu_i}f)(x) = \sum_{Ty=x}g_i(y)f(y)$. By the properties of the transfer operator (fact 4), $T_{\mu_i}1$ is the unique $L^1$ element such that for all $\varphi \in L^\infty$, $\int\varphi\,T_{\mu_i}1\,d\mu_i = \int\varphi\circ T\,d\mu_i = \int\varphi\,d\mu_i$, by the T-invariance of $\mu_i$. Thus $T_{\mu_i}1 = 1$ $\mu_i$-a.e., which implies that $g_i$ is a g-function. Now the construction from the first step shows that for almost every ergodic component, $\int(\varphi - \varphi\circ T)\,d\mu_x = 0$. By the definition of $\mu_i$ (and the fact that we discarded all i's for which $\mu(E_i) = 0$) we get that $\int(\varphi - \varphi\circ T)\,d\mu_i = 0$ for all i. Thus
$0 = h_{\mu_i}(T) + \int\phi\,d\mu_i$ (since $\mu_i$ is an equilibrium measure) $= h_{\mu_i}(T) + \int\log g\,d\mu_i$ (by the cohomology and the claim).
The term in brackets is defined for $\mu_i$-a.e. x. Suppose there were a set A with $\mu_i(A) > 0$ such that every x ∈ A has some $y \in T^{-1}\{x\}$ with $g_i(y) < 0$. Then the above term would be undefined on a set of positive $\mu_i$ measure, a contradiction. So for our purposes, $g_i(y) \geq 0$. We sum over those y's for which $g_i(y) > 0$.
This does not change the sum, since we neglect only y's for which $g_i(y) = 0$ (and agree that 0 log 0 = 0 log ∞ = 0). This allows us to use Jensen's inequality (recalling that log is concave). The resulting bound is ≤ 0 since g is a sub g-function, and the inequalities are actually equalities. Using Jensen's inequality again, recalling that $g_i$ is a g-function and g is a sub g-function, we get
$$\sum_{Ty=x,\ g_i(y)>0}g_i(y)\log\frac{g(y)}{g_i(y)} \leq \log\sum_{Ty=x,\ g_i(y)>0}g(y) \leq 0.$$
By the equality case of Jensen's inequality, for $\mu_i$-a.e. x there is c(x) such that $c(x)g_i(y) = g(y)$ for all $y \in T^{-1}\{x\}$. Thus, since $g_i$ is a g-function, $c(x) = \sum_{Ty=x}g(y)$, and equality in the last inequality forces $c(x) = 1$ for $\mu_i$-a.e. x. Hence $g = g_i$ $\mu_i$-a.e.
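Kac's formula, used in the claim above, concerns an ergodic invariant probability measure µ with µ[a] > 0. It says that the first return time to [a] has conditional expectation 1/µ[a], i.e. $\int_{[a]}\varphi_{[a]}\,d\mu = 1$. As a sanity check only, the Python sketch below verifies this for a hypothetical three-state Markov chain. It computes the expected return time from the linear equations for hitting times (solved by fixed-point iteration) and compares it with the stationary vector.

```python
def stationary(P, iters=5000):
    """Stationary row vector of an irreducible aperiodic stochastic matrix P,
    obtained by iterating pi <- pi P from the uniform vector."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def expected_return_time(P, a, iters=5000):
    """E_a[first return time to a]. The expected hitting times t_i of state a
    satisfy t_i = 1 + sum_{j != a} P[i][j] t_j; iterating this map converges when
    P is irreducible, and t_a is then the expected return time."""
    n = len(P)
    t = [0.0] * n
    for _ in range(iters):
        t = [1.0 + sum(P[i][j] * t[j] for j in range(n) if j != a) for i in range(n)]
    return t[a]


P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.0, 0.6]]
pi = stationary(P)
for a in range(3):
    print(a, expected_return_time(P, a), 1.0 / pi[a])  # Kac: the two agree
```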
We now complete the proof. First, notice that by assumption $\lambda = \exp P_G(\phi) = e^0 = 1$, and so by the last claim $L^*_{\log g}\mu_i = \mu_i$. The following holds by the cohomology relation from the beginning.
So we got $L_\phi^*(e^{\varphi}\mu_i) = e^{\varphi}\mu_i$ for every i. Since $\mu = \sum p_i\mu_i$, this holds for µ as well: $L_\phi^*(e^{\varphi}\mu) = e^{\varphi}\mu$. Now we show that $L_\phi e^{-\varphi} = e^{-\varphi}$. We saw already that $L_{\log g}1 = 1$ $\mu_i$-a.e. for every i, hence $L_{\log g}1 = 1$ µ-a.e. Now $(L_\phi e^{-\varphi})(x) = \sum_{Ty=x}e^{\log g(y) + \varphi(y) - \varphi(x)}e^{-\varphi(y)} = e^{-\varphi(x)}(L_{\log g}1)(x)$, so $L_\phi e^{-\varphi} = e^{-\varphi}$ µ-a.e. We now need the following fact.
Fact 8. Let X be a topologically mixing TMS and let φ : X → R be Walters. Let ν be such that $L_\phi^*\nu = \lambda\nu$ with λ > 0. Then ν[a] > 0 for any cylinder [a].
Proof. Let n := |a|. Fix p ∈ S and denote by $N := N_{a_n p}$ the length of the path $a_n \to p$ given by the topological mixing property. Then
$$\lambda^{N+n}\nu[a] = \int L_\phi^{N+n}1_{[a]}\,d\nu = \int\sum_{T^{N+n}y=x}e^{\phi_{N+n}(y)}1_{[a]}(y)\,d\nu(x).$$
We show that $\sum_{T^{N+n}y=x}e^{\phi_{N+n}(y)}1_{[a]}(y) > 0$ for all x ∈ [p]. The exponential is always positive, so it can be ignored. We ask whether for every x ∈ [p] there exists $y \in T^{-(N+n)}\{x\}$ with $y_0^n = a$. This is true since, after taking N preimages of x, there has to be one preimage with prefix $(a_n, \dots, p)$; taking n further preimages guarantees that one of them has prefix (a, p). For that preimage $1_{[a]} = 1$, so at least one summand is positive. Hence the sum is positive on [p] and we are done.
The relation $L_\phi^*(e^{\varphi}\mu) = e^{\varphi}\mu$ implies that $\int_{[a]}e^{\varphi}\,d\mu > 0$ for every word a, hence µ[a] > 0 for every cylinder [a]. A property that holds a.e. for a measure that is positive on open sets necessarily holds on a dense set, so $L_\phi e^{-\varphi} = e^{-\varphi}$ on a dense set. Continuity of ϕ (theorem 4.3) implies that equality holds everywhere. By the discussion at the beginning, we are done.
We finish with a corollary which will be important in the next section.
Hence $\log\frac{h(x)}{h(y)} \leq \sup_{n\geq1}\operatorname{var}_{n+m}\phi_n \to 0$ by the Walters property.
Claim 6. $M := \sup_{n\geq1}\operatorname{var}_{n+1}\phi^*_n < \infty$.
Assume now that $x_0 = y_0 = b$ and denote $K := \max\{1, \sup_{n\geq1}\operatorname{var}_{n+1}\phi_n\}$ (finite, by the Walters property). We need the assumption $x_0 = y_0$ since it allows us to sum over the same p's. Had this not been the case, we would have no control over the number of summands and the above argument would not hold. Denote $M' := \sup_{n\geq1}\operatorname{var}_{n+1}\phi_n$ and recall that $[\log h - \log h\circ T]_n$ is telescopic. These two claims show that $\phi^*$ is Walters: the relevant variations are all bounded and approach zero (using the first claim and the Walters property of φ).

Ornstein theory
In this section we finish the proof of theorem 2.1, which implies theorem 1.1 by the discussion in section 2. On two-sided shift spaces one needs to specify where a cylinder begins: for $a = (a_0, \dots, a_{n-1})$ we write ${}_m[a] := \{\hat x \in \hat X : \hat x_m^{m+n-1} = a\}$, the cylinder of a starting at the mth coordinate. To simplify notation, we assume that if a starting coordinate is not specified, it is zero (i.e. $[a] = \{\hat x \in \hat X : \hat x_0^{n-1} = a\}$). Let m < n be integers. For a partition β, denote $\beta_m^n := \bigvee_{i=m}^{n}\hat T^{-i}\beta$. By the work of Friedman and Ornstein [FO70], we know that if an invertible probability preserving transformation has a generating sequence of weak Bernoulli partitions, then it is measure theoretically isomorphic to a Bernoulli scheme. This is the heart of the proof of theorem 2.1 (and consequently of theorem 1.1) and the focus of this section.
In order to use the above notions, we would like to take the information we gathered on one-sided shifts and apply it to two-sided shifts. We now explain how. Recall that $\hat\phi = \phi\circ\pi + h - h\circ\hat T$, where h is bounded and π is the natural projection (see section 3). It is known from [Roh64] that a T-invariant measure µ on X extends to a $\hat T$-invariant measure $\hat\mu$ on the natural extension with the same entropy, $h_{\hat\mu}(\hat T) = h_\mu(T)$. Consequently, a $\hat T$-invariant measure on the two-sided shift induces a T-invariant measure on the one-sided shift with the same entropy. Let $\hat\mu$ be $\hat T$-invariant and let $\mu := \hat\mu\circ\pi^{-1}$ be the induced T-invariant measure. Since h is bounded, $\int(h - h\circ\hat T)\,d\hat\mu = 0$. Consequently, $h_{\hat\mu}(\hat T) + \int\hat\phi\,d\hat\mu = h_\mu(T) + \int\phi\,d\mu$ (here we rely on the fact that the transfer function constructed in the proof of theorem 3.1, here denoted h, is bounded). The above implies $P_G(\hat\phi) = P_G(\phi)$. So the equilibrium measure for the one-sided shift with the potential φ is the projection of the two-sided equilibrium measure of the potential $\hat\phi$. This means that any bound we find for equilibrium measures of cylinders in the one-sided shift space automatically applies to the equilibrium measure of $\hat\phi$. In particular, we may use corollary 5.1. Using the assumptions and definitions there, and assuming without loss of generality that $P_G(\phi) = 0$, we have $\int L_{\phi^*}F\,d\mu = \int F\,d\mu$ for all $F \in L^1(\mu)$, so $L_{\phi^*}^*\mu = \mu$. Now it is easy to derive the following. Before we prove theorem 2.1, let us first introduce a useful notation.

Proof of theorem 2.1. Suppose $\hat\mu$ is an equilibrium measure of $\hat\phi \in C(\hat X)$ as in the statement of the theorem. As in [Sar11], we make some choice of parameters; the theorem holds, given that we are able to choose these parameters. We first fix some small $\delta_0 > 0$ such that every $0 < t < \delta_0$ satisfies $1 - e^{-t} \in (\frac12t, t)$. We fix some smaller $\delta < \delta_0$ to be determined later. Then we choose:
• some finite collection $S^* \subseteq S$ of states such that $\hat\mu(\bigcup_{a\in S^*}[a]) > 1 - \delta$,
• a constant $C^* = C^*(S^*) > 1$ as in corollary 5.1,
• $m \in \mathbb N$, $m = m(\delta)$, such that $\sup_{n\geq1}\operatorname{var}_{n+m}\phi^*_n < \delta$.
This can be done since $\phi^*$ is Walters. Given these choices of parameters, we can complete the proof, which proceeds very similarly to [Sar11]. We start with a claim.
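The choice of $\delta_0$ can be made explicit. We have $1 - e^{-t} < t$ for every t > 0. The function $t \mapsto 1 - e^{-t} - t/2$ is concave and vanishes at 0. So the condition $1 - e^{-t} \in (\frac12t, t)$ holds exactly for $0 < t < t_*$, where $t_* \approx 1.594$ is the positive root of $1 - e^{-t} = t/2$. In particular, $\delta_0 = 1$ is admissible. The short Python sketch below (names ours) locates $t_*$ by bisection and spot-checks the inequality on a grid.

```python
import math


def gap(t: float) -> float:
    """1 - e^{-t} - t/2, computed with expm1 for accuracy at small t."""
    return -math.expm1(-t) - t / 2.0


def t_star(lo: float = 1.0, hi: float = 2.0, steps: int = 100) -> float:
    """Bisection for the positive root of 1 - e^{-t} = t/2 (gap(1) > 0 > gap(2))."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def admissible(delta0: float, samples: int = 10_000) -> bool:
    """Spot-check that 1 - e^{-t} lies in (t/2, t) at grid points of (0, delta0)."""
    grid = (delta0 * k / samples for k in range(1, samples))
    return all(t / 2 < -math.expm1(-t) < t for t in grid)


print(t_star())                          # the threshold t_*
print(admissible(1.0), admissible(2.0))  # True False
```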
Claim 7. Let $A := {}_{-n}[a_0, \dots, a_n]$ and $B := {}_k[b_0, \dots, b_n]$ be two non-empty cylinders of length n + 1, with $b_0, a_n \in S^*$. Then for every $k > K(\delta)$ and every $n \geq 0$,
Proof. We denote by $\alpha_m$ the collection of m-cylinders [c]. For every k > 2m, we proceed to estimate the terms. The first is called the main term and the second the error term. We start with a preliminary estimate that we will use throughout.
By our choice of m(δ) above, we can estimate $\exp(\phi^*_{n+1}(a, z)) = e^{\pm\delta}\exp(\phi^*_{n+1}(a, w))$ for every w, z ∈ [c]. Fix z and average over all w's. The reason we made the substitution with $L_{\phi^*}$ is that there is actually just one summand in the expression for $L^{n+1}_{\phi^*}1_{[a,c]}$. Using the previous two estimates, we get what we refer to as the master estimate.
Main term. First, we assume $k > K(\delta)$ and $k > 2m(\delta)$. We plug the information available for the main term into the master estimate (equation 6.1 above) and integrate. The first bracketed term in the result is bounded from above by µ[a]; to bound it from below we use corollary 5.1.
Error term. Since $a_n \in S^*$, we may use the master estimate (equation 6.1) in conjunction with corollary 5.1.
Proof. Write $A := {}_{-n}[a_0, \dots, a_n]$ and $B := {}_k[b_0, \dots, b_n]$. We break the sum into
• the sum over A, B for which $a_n, b_0 \in S^*$,
• the sum over A, B with $a_n \notin S^*$,
• the sum over A, B with $a_n \in S^*$, $b_0 \notin S^*$.
The first sum is bounded (using the previous claim) by 15δ. The second and third are each bounded by $2\hat\mu(\bigcup_{a\notin S^*}[a]) < 2\delta$. The claim follows.

Cohomology-summable variations
Theorem 7.1. Suppose $\hat f : \hat X \to \mathbb R$ has summable variations with finite first variation (i.e. $\sum_{n=1}^{\infty}\operatorname{var}_n\hat f < \infty$). Then there exists a function g with summable variations that depends only on its non-negative coordinates and is cohomologous to $\hat f$ via a bounded continuous transfer function.
Define also $g(\hat x) := I_{\hat x_0} + \sum_{i=i_0}^{\infty}h_i(\hat T^{n_i-1}\hat x)$. We use the following observations:
• g depends only on $\hat x_0^{\infty}$,
• g is well defined and continuous,
• $\operatorname{var}_n(h_i\circ\hat T^{n_i-1}) \leq 4\cdot2^{-i}$ for all n, by the bound on $\|h_i\|_\infty$,
• $\operatorname{var}_n(h_i\circ\hat T^{n_i-1}) = 0$ for $n > 2n_i$, since $h_i$ is constant on cylinders from $-n_i+1$ to $n_i-1$.
Now we can show that g has summable variations. Moreover, $\|F_M - F_N\|_\infty$ can be made arbitrarily small for M > N large. Thus $(F_N)$ is a uniformly Cauchy sequence of uniformly continuous functions, so F is well defined and continuous. A calculation shows that F is indeed the transfer function. Since each $F_N$ is a finite sum of terms with bounded sup-norm, the limit $F = \lim_{N\to\infty}F_N$ is also bounded (by the uniform Cauchy property).
The function g we have constructed is one-sided, so we are done.

Acknowledgements
I would like to express my deepest gratitude to my M.Sc. advisor Omri Sarig for, simply put, being the best. I would also like to thank the Weizmann Institute of Science and the Department of Mathematics and Computer Science for giving me the best possible environment for conducting scientific research.