On regularity of functions of Markov chains

We consider processes which are functions of finite-state Markov chains. It is well known that such processes are rarely Markov. However, such processes are often regular in the following sense: the distant past values of the process have diminishing influence on the distribution of the present value. In the present paper, we present novel sufficient conditions for regularity of functions of Markov chains.


Introduction
Suppose $\{X_n\}$ is a stationary Markov chain taking values in a finite set $A$, and assume that we are not able to observe the values $\{X_n\}$ directly. Instead, we observe the values of some function of $X_n$ that groups some elements of $A$ together. To be precise, let $\pi : A \to B$, with $B$ a smaller alphabet, and assume that we observe the process $\{Y_n\}$ given by
$$Y_n = \pi(X_n) \quad \text{for all } n. \tag{1.1}$$
Processes of this form have been studied extensively over the past 60 years and appear under a variety of names in various fields: in Probability Theory, as functions of Markov chains [7], grouped [21], lumped [25], amalgamated [8], or aggregated Markov chains [37]; in Ergodic Theory, as one-block factors of Markov measures [31] or sofic measures [27]; in Statistical Mechanics, as fuzzy Markov measures [20]. Note also that Hidden Markov models [1], which are very popular in Statistics, can be cast in the form (1.1) as well.
The factor process $\{Y_n\}$ is rarely Markov; necessary and sufficient conditions have been found by Kemeny & Snell and Dynkin [10,25]. This raises the principal question: what is the dependence structure of the factor process?
It turns out that, under rather mild conditions on the underlying Markov chain and the coding map $\pi$, the resulting process can be seen as approximately or nearly Markov in the following sense: the conditional distribution of the next value $Y_1$ depends on the complete past $Y_{-\infty}^0 := (\ldots, Y_{-2}, Y_{-1}, Y_0)$, but this dependence is regular, i.e., the distant past values $\{Y_{-n}\}$, for $n \gg 1$, have a diminishing effect on the distribution of $Y_1$. Stochastic processes with such properties occur naturally in many contexts; as a consequence, many authors have introduced concepts that formalize the notion of a measure that is approximately Markov. Among these concepts are chains with complete connections [32], chains of infinite order [21], g-measures [24] and uniform martingales [23]. Although these concepts are very similar, they are not always equivalent; for a more detailed discussion see [13]. Among these notions, g-measures are the most convenient for the purposes of this paper. Usually g-measures are defined on some subset of the product space $A^{\mathbb{Z}_+}$. This space can be thought of as the collection of all allowed paths of a process starting at time 0. Like a Markov measure, a g-measure is introduced via its transition probabilities. The differences are that the transitions of a g-measure are described by a function $g : A^{\mathbb{Z}_+} \to (0, 1)$, rather than a matrix $P : A \times A \to [0, 1]$, and that the time direction is reversed. That is, the vector $g(\cdot\, x_1^\infty)$ represents the distribution of the symbol at the origin, conditioned on the 'future' configuration $x_1^\infty$. This time reversal is common in ergodic theory and is mostly inconsequential for our purposes, as a Markov measure satisfies the Markov property in both directions. A g-measure is approximately Markov due to the additional constraint that the function $g$ is continuous.
To clarify, continuity corresponds to a vanishing influence from far away symbols since, in the product topology, a function $g : A^{\mathbb{Z}_+} \to \mathbb{R}$ is continuous if and only if
$$\sup\big\{ |g(x) - g(\tilde x)| : x, \tilde x \in A^{\mathbb{Z}_+},\ x_0^n = \tilde x_0^n \big\} \to 0 \quad \text{as } n \to \infty.$$
We can now use the language of g-measures to phrase the main result of this paper: we provide a novel sufficient condition for functions (factors) of Markov chains to belong to the class of g-measures. This condition is based on the application of the so-called fibre approach that originated in Ergodic Theory [28], but seems to be less known in Probability Theory.
Let us now describe this method briefly. Suppose $\{X_n\}$ is an $A$-valued stationary process, and $\mu$ is its translation invariant measure on $A^{\mathbb{Z}_+}$. Denote by $\nu = \mu \circ \pi^{-1}$ the stationary law of the $B$-valued factor process $Y_n = \pi(X_n)$. Define the fibre over $y \in B^{\mathbb{Z}_+}$ as the set $\Omega_y = \{x \in A^{\mathbb{Z}_+} : \pi(x_n) = y_n \text{ for all } n \ge 0\}$. By a well-known Disintegration Theorem, there exists a family of measures $\{\mu_y\}$ indexed by points $y \in B^{\mathbb{Z}_+}$, called a disintegration of $\mu$, such that $\mu_y$ is concentrated on the fibre $\Omega_y$ and $\mu = \int \mu_y \, d\nu$, meaning that for any $\mu$-integrable function $f$ on $A^{\mathbb{Z}_+}$, one has
$$\int f \, d\mu = \int \Big( \int_{\Omega_y} f(x)\, \mu_y(dx) \Big)\, \nu(dy).$$
The measures $\mu_y$ should be viewed as conditional measures on the fibres $\Omega_y$. We will show in Theorem 3.2 that the factor measure $\nu = \mu \circ \pi^{-1}$ of a Markov measure $\mu$ is consistent with a function $\tilde g$, i.e., $\nu(y_0 \mid y_1^\infty) = \tilde g(y)$ for $\nu$-a.a. $y$, where $\tilde g$ can be expressed in terms of the disintegration $\{\mu_y\}$. Based on the expression for $\tilde g$, we identify two sufficient conditions for $\nu$ to be a g-measure. The first condition turns out to be equivalent to a known condition of weak lumpability, i.e., it covers the case when the factor measure $\nu$ is Markov. The second sufficient condition is that the disintegration can be chosen in such a way that $y \mapsto \mu_y$ is continuous. In this case, we say that $\mu$ admits a Continuous Measure Disintegration (CMD) $\{\mu_y\}$. Our main result is that existence of a CMD for $\mu$ implies that the factor measure is a g-measure.
Previously, this condition has been applied successfully to the analogous question: when is a factor of a fully supported g-measure itself a g-measure [22,40]? In the context of factors of Markov measures, we show that the condition supersedes the currently known conditions. These results are presented in the following way. First, we introduce the necessary definitions; then we review known results in Sections 2.1, 2.2 and 2.4. Subsequently, we state our main theorem in Section 3. In order to demonstrate that the condition in Section 3 is more general than known results, we apply the theory of non-homogeneous equilibrium states in Section 4.2. We also recall the constructive approach to continuous measure disintegrations by Tjur in Section 4.3, which provides an interesting alternative way to recover the known conditions in Section 4.5. Finally, in Section 5, we discuss some examples to show that existence of a continuous measure disintegration is strictly weaker than the previously known conditions and that, unfortunately, it is not a necessary condition.
We equip $\Omega_M$ with the product topology. We use the shorthand notation $a_n^m = (a_n, a_{n+1}, \ldots, a_m)$ for words in the alphabet $A$, and denote the corresponding cylinder sets by $[a_n^m] = \{x \in \Omega_M : x_n^m = a_n^m\}$. Similarly, for a given finite set $\Lambda \subset \mathbb{Z}_+$, denote the configuration on the subset $\Lambda$ by $a_\Lambda = (a_i)_{i \in \Lambda}$. A concatenation of two configurations $a_\Lambda$ and $b_\Delta$ on disjoint sets $\Lambda, \Delta \subset \mathbb{Z}_+$ is denoted by $a_\Lambda b_\Delta$; to be precise,
$$(a_\Lambda b_\Delta)_i = \begin{cases} a_i, & i \in \Lambda, \\ b_i, & i \in \Delta. \end{cases}$$
For a given subshift of finite type $\Omega_M$, a Markov chain with probability transition matrix $P$ is said to be compatible with $\Omega_M$ if $P_{ij} > 0 \iff M_{ij} = 1$ for all $i, j \in A$. In complete analogy with the terminology for Markov chains, the subshift of finite type $\Omega_M$ is called irreducible if for all $i, j \in A$ there exists $n \ge 1$ with $(M^n)_{ij} > 0$, and mixing if there exists $n \ge 1$ such that $M^n > 0$. If the subshift of finite type $\Omega_M$ is irreducible, and $P$ is a compatible probability transition matrix, then the (unique) stationary Markov measure $\mu$ has $\Omega_M$ as its support.

1.2.
Single block factor maps. Suppose A and B are finite sets, |A| > |B|, and π : A → B is a surjective map. We use the same symbol π to denote the map from A Z + to B Z + given by π(x) n = π(x n ) for all n ∈ Z + . Let µ be a stationary Markov measure corresponding to a Markov chain {X n }, supported on an irreducible subshift of finite type Ω = Ω M ⊂ A Z + , define the push-forward (or factor) measure ν as ν = µ • π −1 . The measure ν is supported on a subshift Σ = π(Ω) ⊂ B Z + . In symbolic dynamics, Σ and ν are called the sofic shift and the sofic measure, respectively. Note that Σ is not necessarily a subshift of finite type. Throughout the paper we make the following standing assumptions on Ω and π: (A1) Ω = Ω M is an irreducible SFT, (A2) the one block factor map π : A Z + → B Z + is such that Σ = π(Ω) is an SFT i.e., Σ = Σ M ′ for some {0, 1} matrix M ′ . We note that using standard methods of symbolic dynamics (Fisher covers), it is possible to decide algorithmically whether for a given pair (Ω, π), the image Σ is indeed an SFT [29].
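As a rough illustration of the push-forward $\nu = \mu \circ \pi^{-1}$, the following sketch computes the $\nu$-measure of a cylinder $[y_0 \ldots y_n]$ by summing the Markov weights $p_{x_0} P_{x_0,x_1} \cdots P_{x_{n-1},x_n}$ over all words $x_0^n$ with $\pi(x_k) = y_k$, using a forward recursion. The function name `factor_cylinder_probability`, the list-based encoding of $p$, $P$ and $\pi$, and the three-state example chain are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import List, Sequence


def factor_cylinder_probability(
    p: Sequence[float],
    P: Sequence[Sequence[float]],
    pi: Sequence[int],
    word: Sequence[int],
) -> float:
    """Return nu([y_0 ... y_n]) for the factor of the stationary chain (p, P).

    p    : stationary distribution of P (assumed to satisfy pP = p)
    P    : transition matrix as a list of rows
    pi   : pi[a] is the image in B of the state a in A
    word : the symbols y_0, ..., y_n
    """
    states = range(len(p))
    # alpha[a] = mu(X_k = a and pi(X_j) = y_j for all j <= k)
    alpha: List[float] = [p[a] if pi[a] == word[0] else 0.0 for a in states]
    for symbol in word[1:]:
        alpha = [
            sum(alpha[a] * P[a][b] for a in states) if pi[b] == symbol else 0.0
            for b in states
        ]
    return sum(alpha)


# Hypothetical example: a doubly stochastic chain on {0, 1, 2}, states 0 and 1 glued.
P_example = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
p_example = [1 / 3, 1 / 3, 1 / 3]
pi_example = [0, 0, 1]
print(factor_cylinder_probability(p_example, P_example, pi_example, [0, 0]))
```

The recursion keeps one weight per state of $A$, so the cost is linear in the length of the word rather than exponential in it.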
Remark 1.1. The class of 1-block factor maps $\pi$ between subshifts of finite type, which we consider in the present paper, is in fact quite large. Indeed, by the Curtis-Lyndon-Hedlund theorem any equivariant map $\pi : \Omega \to \Sigma$ is necessarily a sliding block code of some finite length, i.e., there exists $k \ge 0$ such that if $y = \pi(x)$, then for any $n \ge 0$, $y_n$ is a function of $x_n^{n+k}$. By going to a higher block representation (cf. [29, Section 1.4]) $\Omega^{[k]}$ of $\Omega$, one immediately concludes that the $k$-block factor map $\pi : \Omega \to \Sigma$ can be equivalently represented as a 1-block factor map $\pi^{[k]} : \Omega^{[k]} \to \Sigma$. Finally, note also that if $\mu$ is a Markov measure on the SFT $\Omega$, then the corresponding measure $\mu^{[k]}$ on $\Omega^{[k]}$ is again Markov, and hence $\nu = \mu^{[k]} \circ (\pi^{[k]})^{-1}$ is a 1-block factor of the Markov measure $\mu^{[k]}$.
1.3. g-measures. As we will see below, factors of Markov measures are rarely Markov. Instead, it is far more common for factors of Markov measures to belong to the class of g-measures, i.e., measures having positive continuous conditional probabilities. Suppose $\Sigma \subseteq B^{\mathbb{Z}_+}$ is an SFT and consider the following set of functions:
$$G(\Sigma) = \Big\{ g \in C(\Sigma, (0,1)) : \sum_{b \in B:\, b y \in \Sigma} g(b y) = 1 \ \text{ for all } y \in \Sigma \Big\}.$$
A translation invariant measure $\nu$ on $\Sigma$ is called a g-measure for $g \in G(\Sigma)$ if $\nu(y_0 \mid y_1^\infty) = g(y)$ for $\nu$-a.a. $y \in \Sigma$. For any $g \in G(\Sigma)$, at least one g-measure exists; however, such a measure might not be unique [6]. A useful property of g-measures is that they are characterized by the uniform convergence of finite one-sided conditional probabilities. In the opposite direction, one can conclude that a given measure $\nu$ is not a g-measure if one is able to find a so-called bad configuration for $\nu$. Definition 1.3. A point $y \in \Sigma$ is called a bad configuration for $\nu$ if there exists an $\epsilon > 0$ such that, for every $n \in \mathbb{N}$, one can find two points $\bar y, \tilde y \in \Sigma$ and $m \in \mathbb{N}$ such that $\bar y_0^n = \tilde y_0^n = y_0^n$ and
$$\big| \nu(y_0 \mid y_1^n\, \bar y_{n+1}^{n+m}) - \nu(y_0 \mid y_1^n\, \tilde y_{n+1}^{n+m}) \big| \ge \epsilon > 0.$$
Existence of a bad configuration $y$ implies that no version of the conditional probabilities $\nu(y_0 \mid y_1^\infty)$ (defined $\nu$-a.s.) can be continuous at $y$, and hence $\nu$ cannot be a g-measure for any continuous $g \in G(\Sigma)$.

Markov factors of Markov measures.
Note that the Markovianity of the factor measure might depend on the initial distribution of the underlying Markov chain. The notion of lumpability was developed to address this question in a uniform fashion, i.e., independently of the initial distribution.
Let $P$ be a stochastic matrix indexed by $A \times A$, and let $\pi : A \to B$ be a factor map. Let $\{X_n\}$ be a Markov chain with transition matrix $P$; then $P$ is called lumpable for $\pi$ if the process $Y_n = \pi(X_n)$ is Markov for all choices of the initial distribution $p$. The necessary and sufficient conditions for lumpability are quite restrictive, as demonstrated by the following result: Theorem 2.1 ([25]). Suppose $P$ is an irreducible stochastic matrix. Then $P$ is lumpable with respect to $\pi : A \to B$ if and only if for any $y_1, y_2 \in B$ we have
$$\sum_{x_2 \in \pi^{-1}(y_2)} P_{x_1, x_2} = \sum_{x_2 \in \pi^{-1}(y_2)} P_{\tilde x_1, x_2} \tag{2.1}$$
for any $x_1, \tilde x_1 \in \pi^{-1}(y_1)$. The transition matrix of the factor chain $\{Y_n = \pi(X_n)\}$ is then given by
$$P^{\pi}_{y_1, y_2} = \sum_{x_2 \in \pi^{-1}(y_2)} P_{x_1, x_2}, \qquad x_1 \in \pi^{-1}(y_1).$$
This condition is indeed very restrictive, in part due to the relatively strong requirement that the factor process $\{Y_n\}$ must be Markov for all initial distributions. Instead, one could require Markovianity only for a specific given initial distribution; this is the so-called weak lumpability property. It turns out that this question can be answered algorithmically in polynomial time [17]. Even though weak lumpability is indeed a weaker condition than lumpability, it is still rather exceptional.
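Condition (2.1) is easy to test numerically: for every pair of blocks, the total transition probability into the second block must be the same from every state of the first block. The sketch below is one possible way to do this; the names `is_lumpable` and `lumped_matrix`, the tolerance, and the example matrices are hypothetical choices made for illustration, and symbols of $B$ are assumed to be encoded as $0, \ldots, |B|-1$.

```python
from typing import List, Sequence


def is_lumpable(P: Sequence[Sequence[float]], pi: Sequence[int], tol: float = 1e-12) -> bool:
    """Check the strong lumpability condition (2.1) for P and the map pi."""
    n = len(P)
    blocks = sorted(set(pi))
    for y1 in blocks:
        members = [a for a in range(n) if pi[a] == y1]
        for y2 in blocks:
            # Probability of jumping into the block y2 from each state of the block y1.
            sums = [sum(P[a][b] for b in range(n) if pi[b] == y2) for a in members]
            if max(sums) - min(sums) > tol:
                return False
    return True


def lumped_matrix(P: Sequence[Sequence[float]], pi: Sequence[int]) -> List[List[float]]:
    """Transition matrix of the factor chain; meaningful only if is_lumpable(P, pi)."""
    n = len(P)
    blocks = sorted(set(pi))
    # Any representative of a block gives the same row when (2.1) holds.
    representative = {y: next(a for a in range(n) if pi[a] == y) for y in blocks}
    return [
        [sum(P[representative[y1]][b] for b in range(n) if pi[b] == y2) for y2 in blocks]
        for y1 in blocks
    ]


# Hypothetical examples: the first matrix satisfies (2.1), the second does not.
P_good = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.3, 0.3, 0.4]]
P_bad = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
print(is_lumpable(P_good, [0, 0, 1]), is_lumpable(P_bad, [0, 0, 1]))
```

Note that this only tests strong lumpability; deciding weak lumpability requires a different algorithm, which is not sketched here.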

2.2. Fully supported Markov chains. Sufficient conditions for a factor measure to be a g-measure are substantially less restrictive than the conditions for (weak) lumpability. We will discuss some positive and negative results, starting with the very basic positive result for Markov chains with strictly positive transition matrices $P$. This case was first considered in [21] and comes with an estimate of the continuity rate of the conditional probabilities (g-functions) of the factor measure: Theorem 2.2 ([21]). Let $\nu$ be a one-block factor of a Markov measure $\mu$ with a positive transition matrix $P$. Then $\nu$ is a g-measure whose g-function has an exponentially decaying continuity rate: its variation over points that agree on the first $n$ coordinates is $O(c^n)$ for some $0 < c < 1$.
Let us only mention an intuitive, rough, argument for this result; suppose P > 0 is the transition matrix of the Markov process {X n }. Suppose y ∈ Σ then, ignoring some technicalities, we can consider the behaviour of µ on Ω y = π −1 (y). In particular, the transition from X n+1 to X n in Ω y will be given by a positive rectangular matrix. It is well known that, if this matrix is square, then the corresponding map between the distributions of X n+1 and X n is a contraction. For a rectangular matrix we can obtain the same result by using the Hilbert projective metric on the relevant distribution spaces. It is easy to show that this contraction will be uniform in n and therefore the result follows. A version of this argument can also be used to prove the more general results in [8,34,35,42].
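The contraction mentioned above can be observed numerically. The sketch below uses the standard Hilbert projective metric $d(u,v) = \log\big(\max_i (u_i/v_i) / \min_i (u_i/v_i)\big)$ on positive vectors and Birkhoff's contraction coefficient $\tanh(\Delta/4)$, where $\Delta$ is the largest logarithmic cross-ratio of the matrix entries. The function names and the particular $2 \times 3$ matrix are illustrative assumptions; the paper does not specify an implementation.

```python
import math
from typing import List, Sequence


def hilbert_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Hilbert projective distance between two strictly positive vectors."""
    ratios = [ui / vi for ui, vi in zip(u, v)]
    return math.log(max(ratios) / min(ratios))


def apply_matrix(M: Sequence[Sequence[float]], u: Sequence[float]) -> List[float]:
    """Matrix-vector product M u; M may be rectangular."""
    return [sum(m * x for m, x in zip(row, u)) for row in M]


def birkhoff_coefficient(M: Sequence[Sequence[float]]) -> float:
    """tanh(Delta / 4), where Delta is the max log cross-ratio of the positive matrix M."""
    delta = 0.0
    for i in range(len(M)):
        for j in range(len(M)):
            for k in range(len(M[0])):
                for l in range(len(M[0])):
                    cross_ratio = (M[i][k] * M[j][l]) / (M[j][k] * M[i][l])
                    delta = max(delta, math.log(cross_ratio))
    return math.tanh(delta / 4.0)


# Hypothetical positive rectangular matrix and two positive vectors.
M = [[1.0, 2.0, 1.0], [2.0, 1.0, 3.0]]
u, v = [1.0, 1.0, 1.0], [1.0, 2.0, 4.0]
before = hilbert_distance(u, v)
after = hilbert_distance(apply_matrix(M, u), apply_matrix(M, v))
print(before, after, birkhoff_coefficient(M))
```

For a strictly positive matrix the coefficient is strictly smaller than 1, which is what makes the contraction uniform along a fibre when all transition blocks are positive.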

2.3. Highly non-regular factor measure. A factor measure $\nu$ of a Markov measure $\mu$ is not necessarily a g-measure. This situation can arise when any version of the conditional probabilities has an essential discontinuity in at least one point of $\Sigma$. In more extreme cases the conditional probabilities can be discontinuous everywhere. One such example was discussed by Blackwell [4], Furstenberg [15, Theorem IV.6], Walters [41] and Lőrinczi et al. [30]. Let $(X_n)_{n \in \mathbb{Z}_+}$ be a Bernoulli process taking values in $\{-1, 1\}$ with $\mu(X_n = 1) = 1 - \mu(X_n = -1) = p$, for $0 < p < 1$, $p \ne \tfrac12$. Then the process $(\tilde X_n)_{n \in \mathbb{Z}_+}$ with $\tilde X_n = (X_n, X_{n+1})$ is Markov. Consider the factor process $Y_n = \pi(\tilde X_n) = \pi(X_n, X_{n+1}) = X_n X_{n+1}$. Thus the factor process $\{Y_n\}$ can be viewed either as a two-block factor of a Bernoulli (and hence also Markov) process $\{X_n\}$, or as a 1-block factor of the extended Markov process $\{\tilde X_n\}$, cf. Remark 1.1 above.
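Note that $\Sigma = \pi(\Omega)$ is the full shift on two symbols $\{-1, 1\}$. In this example the conditional probabilities of the factor process $\{Y_n\}$ are discontinuous everywhere. Indeed, it is easy to see that every fibre over $y \in \Sigma$, i.e. $\Omega_y = \pi^{-1}(y) \subset \Omega$, consists of two points, corresponding to the two sequences $x^{+}$ and $x^{-} = -x^{+}$ determined by the choice $x_0 = \pm 1$ and the recursion $x_{k+1} = x_k y_k$, i.e., $x^{\pm}_{k+1} = \pm\, y_0 y_1 \cdots y_k$. We can now explicitly compute the conditional probabilities:
$$\nu(y_0 \mid y_1^n) = \frac{\nu(y_0^n)}{\nu(y_1^n)}.$$
Since $\mu$ is the Bernoulli measure, it is easy to see that, with $S_n = \sum_{k=0}^n y_0 y_1 \cdots y_k$ and $\lambda = p/(1-p)$, one has
$$\nu(y_0^n) = \big(p(1-p)\big)^{\frac{n+2}{2}} \Big( \lambda^{\frac{1+S_n}{2}} + \lambda^{-\frac{1+S_n}{2}} \Big).$$
Similarly,
$$\nu(y_1^n) = \big(p(1-p)\big)^{\frac{n+1}{2}} \Big( \lambda^{\frac{1+\tilde S_n}{2}} + \lambda^{-\frac{1+\tilde S_n}{2}} \Big),$$
where $\tilde S_n = \sum_{k=1}^n y_1 y_2 \cdots y_k$. Since $S_n = y_0 (1 + \tilde S_n)$, one has
$$\nu(y_0 = 1 \mid y_1^n) = \sqrt{p(1-p)}\; \frac{\lambda^{\frac{2+\tilde S_n}{2}} + \lambda^{-\frac{2+\tilde S_n}{2}}}{\lambda^{\frac{1+\tilde S_n}{2}} + \lambda^{-\frac{1+\tilde S_n}{2}}}.$$
Suppose for simplicity that $\lambda > 1$. For any $y_1^n$, one can choose a continuation $z_{n+1}^{5n}$ such that $\tilde S_{5n} \gg 0$. Equally well, one can choose a continuation $w_{n+1}^{5n}$ such that $\tilde S_{5n} \ll 0$. In the first case,
$$\nu(y_0 = 1 \mid y_1^n z_{n+1}^{5n}) \simeq \sqrt{p(1-p)}\, \lambda^{1/2} = p,$$
and in the second case,
$$\nu(y_0 = 1 \mid y_1^n w_{n+1}^{5n}) \simeq \sqrt{p(1-p)}\, \lambda^{-1/2} = 1 - p.$$
Therefore, the conditional probabilities $\nu(y_0 = 1 \mid y_1 y_2 \cdots)$ are everywhere discontinuous. In some sense this is the most irregular behaviour possible. At the same time, when $p = \tfrac12$, $\nu$ is the Bernoulli$(\tfrac12, \tfrac12)$ product measure on $\{-1, 1\}^{\mathbb{Z}_+}$. This example therefore highlights that regularity of the factor measure depends both on the properties of the coding map and on the transition probabilities.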
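The discontinuity in this example can be checked by direct computation, since every word $y_0^n$ has exactly two preimages, one for each choice of $x_0 = \pm 1$. The sketch below computes $\nu(y_0 = 1 \mid y_1^N)$ exactly for two continuations of the same prefix. The function names, the value $p = 0.8$, the prefix and the continuation length are illustrative assumptions, not values from the paper.

```python
from typing import List, Sequence


def word_probability(y: Sequence[int], p: float) -> float:
    """nu of the cylinder [y] for Y_k = X_k X_{k+1}, with X i.i.d. and P(X = 1) = p."""
    total = 0.0
    for x0 in (1, -1):  # the two points of the fibre
        x = x0
        prob = p if x == 1 else 1.0 - p
        for yk in y:
            x = x * yk  # x_{k+1} = x_k y_k, because y_k = x_k x_{k+1} and x_k^2 = 1
            prob *= p if x == 1 else 1.0 - p
        total += prob
    return total


def conditional_probability(y0: int, tail: Sequence[int], p: float) -> float:
    """nu(Y_0 = y0 | Y_1 ... Y_N = tail)."""
    tail_list: List[int] = list(tail)
    return word_probability([y0] + tail_list, p) / word_probability(tail_list, p)


prefix = [1, -1, 1, -1]
z = [1] * 40           # keeps the fibre point started at x_1 = 1 at +1
w = [-1] + [1] * 39    # flips it once, so it then stays at -1
print(conditional_probability(1, prefix + z, 0.8))
print(conditional_probability(1, prefix + w, 0.8))
```

Under these assumptions the two printed values are close to $p$ and $1 - p$ respectively, although both conditioning words share the same prefix; for $p = 1/2$ the function returns $1/2$ for every conditioning word.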

Fibre mixing condition.
In previous examples we saw that it was important to consider the behaviour of the Markov process {X n }, given a realisation of the factor process {Y n }.
In particular, the structure of the fibres $\pi^{-1}(y)$, $y \in \Sigma$, plays a crucial role (cf. the Blackwell-Furstenberg example above). Theorem 2.2 can also be interpreted in this way. To see this, recall that positivity of $P$ implies that transitions between any letters consistent with the fibre are allowed. The regularity of the factor process is a consequence of the fact that each transition in this fibre, described by a positive rectangular matrix, acts as a contraction on distributions. The most general sufficient condition [42] for factors of Markov measures to be regular has a similar flavour. In particular, in [42] the above idea is generalised from positive matrices to the analogue of primitive matrices in the context of fibres: fibre mixing.

Definition 2.3 (Fibre mixing). Let $\Omega, \Sigma$ be subshifts of finite type and let $\pi : \Omega \to \Sigma$ be a surjective 1-block factor map. We say that $\pi$ is fibre mixing if, for all $y \in \Sigma$, for all $x, \tilde x \in \Omega_y$ and every $n \in \mathbb{Z}_+$, there exists an $\hat x \in \Omega_y$ such that
$$\hat x_0^n = x_0^n \quad \text{and} \quad \hat x_i = \tilde x_i \ \text{ for all sufficiently large } i.$$
Indeed, fibre mixing is a sufficient condition for the factor measure to be regular.
Theorem 2.4 (Yoo [42]). Suppose (i) $\pi : \Omega \to \Sigma$ is a surjective 1-block factor map between irreducible subshifts of finite type $\Omega$ and $\Sigma$, (ii) $P$ is an irreducible stochastic matrix, compatible with the SFT $\Omega$, and $\mu$ is the corresponding stationary Markov measure on $\Omega$. Suppose the factor map $\pi$ is fibre mixing. Then $\nu = \mu \circ \pi^{-1}$ is a g-measure on $\Sigma$, for a Hölder continuous g-function.
This result provides the most general set of sufficient conditions for regularity of factors of Markov chains known to date. Other sufficient conditions, e.g., found in [8,26] imply fibre mixing and are strictly stronger. Let us reiterate that imposing conditions on fibres alone (i.e., the topological conditions on Ω, Σ, and π) is not optimal: the necessary and sufficient conditions must also take P into account, as demonstrated by the Blackwell-Furstenberg example discussed above.
Yoo has also established the variant of Theorem 2.4 for factors of Gibbs measures with Hölder continuous potentials. Piraino [34,35] has provided an alternative proof in case of Hölder continuous potentials, and extended the result to classes of potentials satisfying the Walters and Bowen conditions, showing in particular, that these classes are preserved under 1-block factorization.

Continuous measure disintegrations
In this section we will argue that imposing conditions on the behaviour of conditional measures on the fibres provides a more appropriate framework to study properties of the factor measures. As the first step, one has to properly define the conditional measures on the fibres. Fortunately, general results of measure theory provide the necessary tools.
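Definition 3.1. We call $\mu_\Sigma = \{\mu_y\}_{y \in \Sigma}$ a family of conditional measures for $\mu$ on the fibres $\Omega_y$ if (a) $\mu_y$ is a Borel probability measure on the fibre $\Omega_y$, and (b) for every $\mu$-integrable function $f$ on $\Omega$, the map $y \mapsto \int_{\Omega_y} f \, d\mu_y$ is measurable and
$$\int_\Omega f \, d\mu = \int_\Sigma \int_{\Omega_y} f(x)\, \mu_y(dx)\, \nu(dy).$$
We will also refer to a family of conditional measures $\mu_\Sigma = \{\mu_y\}_{y \in \Sigma}$ for $\mu$ on the fibres $\Omega_y$ as a disintegration of $\mu$ with respect to $\pi : \Omega \to \Sigma$.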
By a celebrated theorem of von Neumann, for all subshifts $\Omega$, $\Sigma$, a given continuous surjection $\pi : \Omega \to \Sigma$ and any Borel measure $\mu$ on $\Omega$, there exists a disintegration $\mu_\Sigma = \{\mu_y\}_{y \in \Sigma}$ of $\mu$ with respect to $\pi$. Moreover, the disintegration is essentially unique in the sense that for any two disintegrations $\{\mu_y\}$ and $\{\tilde\mu_y\}$ of $\mu$, we have $\nu(\{y : \tilde\mu_y(\cdot) = \mu_y(\cdot)\}) = 1$. We will be interested in continuous measure disintegrations: a measure disintegration $\mu_\Sigma = \{\mu_y\}$ is called a Continuous Measure Disintegration (CMD) if for every continuous function $f : \Omega \to \mathbb{R}$, the function
$$y \mapsto \int_{\Omega_y} f(x)\, \mu_y(dx)$$
is continuous on $\Sigma$. Note that any measure $\mu$ admits at most one continuous disintegration.
As the conditional measures $\mu_y$ are not, in general, translation invariant, we introduce the following notation for cylinder sets in $\Omega_y$: ${}_n[a_k^m] = \{x \in \Omega_y : x_{n+k}^{n+m} = a_k^m\}$, for $a \in \Sigma$ and $n, k, m \in \mathbb{Z}_+$. Using an approach similar to that of [40], we will now show that a measure disintegration can be used to find an expression for the conditional probabilities of a factor measure.

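Theorem 3.2. Suppose (i) $\pi : \Omega \to \Sigma$ is a surjective 1-block factor map between irreducible subshifts of finite type $\Omega$ and $\Sigma$, (ii) $P$ is an irreducible stochastic matrix, compatible with the SFT $\Omega$, and $\mu$ is the corresponding stationary Markov measure on $\Omega$. Suppose $\{\mu_y\}_{y \in \Sigma}$ is a disintegration of $\mu$. Then $\nu = \mu \circ \pi^{-1}$ is consistent with the positive measurable normalized function $\tilde g : \Sigma \to (0, 1)$, i.e., $\nu(y_0 \mid y_1^\infty) = \tilde g(y)$ for $\nu$-a.a. $y$, where
$$\tilde g(y) = \sum_{a' \in \pi^{-1}(y_1)} \Big[ \sum_{a \in \pi^{-1}(y_0)} \frac{p_a P_{a,a'}}{p_{a'}} \Big]\, \mu_{Ty}\big({}_0[a']\big), \tag{3.1}$$
$p$ is the invariant distribution of $P$, and $Ty = (y_1, y_2, \ldots)$.
Proof. The expression for $\tilde g$ originates from the following 'finite-dimensional' equality: denote by $\mathbb{P}$ the joint distribution of $(\{X_n\}, \{Y_n\})$, where $\{X_n\}$ is the stationary Markov chain with the transition probability matrix $P$, and $Y_n = \pi(X_n)$ for all $n$. Then
$$\mathbb{P}(Y_0 = y_0 \mid Y_1^n = y_1^n) = \sum_{a' \in \pi^{-1}(y_1)} \Big[ \sum_{a \in \pi^{-1}(y_0)} \frac{p_a P_{a,a'}}{p_{a'}} \Big]\, \mathbb{P}(X_1 = a' \mid Y_1^n = y_1^n),$$
where $p$ is the invariant distribution: $pP = p$. We will now show that $\tilde g$, given by (3.1), is positive and normalized, and finally that $\nu = \mu \circ \pi^{-1}$ is consistent with $\tilde g$. It is easy to check that $\tilde g$ is normalized. Indeed,
$$\sum_{b \in B} \tilde g(b\, y_1^\infty) = \sum_{a' \in \pi^{-1}(y_1)} \Big[ \sum_{a \in A} \frac{p_a P_{a,a'}}{p_{a'}} \Big]\, \mu_{Ty}\big({}_0[a']\big) = \sum_{a' \in \pi^{-1}(y_1)} \mu_{Ty}\big({}_0[a']\big) = 1,$$
where we used that $p$ is the invariant distribution, $pP = p$, or $\sum_{a \in A} p_a P_{a,a'} = p_{a'}$ for all $a' \in A$; hence $\tilde g$ is normalized. The measurability of $\tilde g$ follows immediately from the measurability of the measure disintegration $\{\mu_y\}$. The positivity of $\tilde g$ is readily checked as well. Let $y = (y_0, y_1, \ldots) \in \Sigma$; then the transition from $y_0$ to $y_1$ is allowed in $\Sigma$. Since $\pi : \Omega \to \Sigma$ is surjective, there is at least one pair $(a, a')$ such that $\pi(a) = y_0$, $\pi(a') = y_1$ and $P_{aa'} > 0$. Since the Markov chain is assumed to be irreducible, the invariant distribution $p$ is strictly positive, and hence
$$\kappa = \min_{a, a' :\, P_{a,a'} > 0} \frac{p_a P_{a,a'}}{p_{a'}} > 0.$$
Therefore, $\tilde g(y) > 0$. Now we are going to show that $\nu = \mu \circ \pi^{-1}$ is consistent with $\tilde g$, or, equivalently, that for any $h \in C(\Sigma)$,
$$\int_\Sigma h(y)\, \nu(dy) = \int_\Sigma \sum_{b \in B:\, by \in \Sigma} \tilde g(by)\, h(by)\, \nu(dy).$$
We show consistency of $\nu$ with $\tilde g$ by using the fact that $\mu$ is a g-measure for
$$g(x) = \frac{p_{x_0} P_{x_0, x_1}}{p_{x_1}}.$$
Consider an arbitrary $h \in C(\Sigma)$, and let $\{\mu_y\}$ be a measure disintegration for $\mu$ and $\pi$; then
$$\int_\Sigma h \, d\nu = \int_\Omega h(\pi(x))\, \mu(dx) = \int_\Omega \sum_{a \in A} g(ax)\, h(\pi(a)\pi(x))\, \mu(dx) = \int_\Sigma \sum_{b \in B} h(by) \Big[ \sum_{a \in \pi^{-1}(b)} \int_{\Omega_y} g(ax)\, \mu_y(dx) \Big] \nu(dy) = \int_\Sigma \sum_{b \in B} h(by)\, \tilde g(by)\, \nu(dy),$$
where $g(ax) = 0$ whenever $ax \notin \Omega$. Thus, $\nu$ is consistent with a positive normalized function $\tilde g : \Sigma \to (0, 1)$.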
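Therefore, if for some disintegration $\mu_\Sigma = \{\mu_y\}$ the function $\tilde g$, as defined in equation (3.1), is continuous, then $\nu$ is a g-measure. There are two obvious sets of sufficient conditions for continuity of $\tilde g$.
Corollary 3.3. Under the conditions of Theorem 3.2, $\tilde g$ is continuous, and hence $\nu$ is a g-measure, if one of the following conditions holds: 1) one has
$$\sum_{a \in \pi^{-1}(b)} \frac{p_a P_{a,a'}}{p_{a'}} = \sum_{a \in \pi^{-1}(b)} \frac{p_a P_{a,a''}}{p_{a''}} \tag{3.3}$$
for any $b \in B$ and any $a', a'' \in \pi^{-1}(b')$, where $b' \in B$; 2) $\mu$ admits a continuous measure disintegration on the fibres $\{\Omega_y = \pi^{-1}(y) : y \in \Sigma\}$.
Proof. If $\tilde g$ is indeed a continuous function, then $\nu$ is a g-measure by definition. We only have to show that conditions (1) and (2) imply continuity of $\tilde g$. Let us start with the first condition (3.3). Since
$$\tilde g(y) = \sum_{a' \in \pi^{-1}(y_1)} \Big[ \sum_{a \in \pi^{-1}(y_0)} \frac{p_a P_{a,a'}}{p_{a'}} \Big]\, \mu_{Ty}\big({}_0[a']\big),$$
condition (3.3) implies that for all $a' \in \pi^{-1}(y_1)$, the sums in the square brackets have the same value. Let us denote the common value by $S_{y_0, y_1}$. Therefore,
$$\tilde g(y) = S_{y_0, y_1} \sum_{a' \in \pi^{-1}(y_1)} \mu_{Ty}\big({}_0[a']\big) = S_{y_0, y_1},$$
since $\mu_{Ty}$ is a Borel probability measure on the fibre $\Omega_{Ty} = \Omega_{(y_1, y_2, \ldots)}$. Hence $\tilde g$ depends only on $(y_0, y_1)$ and is continuous. Let us now consider the second assumption: suppose $\mu$ admits a continuous measure disintegration on the fibres $\{\Omega_y\}$; then for any $f \in C(\Omega)$, $y \mapsto \mu_y(f) = \int f \, d\mu_y$ is continuous. In particular, since for any $b \in B$ the function
$$G_b(x) = \sum_{a \in \pi^{-1}(b)} \frac{p_a P_{a, x_0}}{p_{x_0}}$$
is continuous on $\Omega$ as a function of $x$, and $\tilde g(y) = \int G_{y_0}(x)\, \mu_{Ty}(dx)$, we conclude that $\tilde g$ is continuous and hence $\nu$ is a g-measure.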
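The first sufficient condition is a finite check on the kernel $Q(a, a') = p_a P_{a,a'} / p_{a'}$: for each symbol $b$, the sum of $Q(a, a')$ over $a \in \pi^{-1}(b)$ must not depend on the choice of $a'$ within a fibre $\pi^{-1}(b')$. A minimal sketch follows; the function names `reversed_kernel` and `satisfies_first_condition`, the tolerance and the two example chains are hypothetical, and the stationary distribution $p$ is assumed to be supplied by the caller.

```python
from typing import List, Sequence


def reversed_kernel(p: Sequence[float], P: Sequence[Sequence[float]]) -> List[List[float]]:
    """Q[a][a2] = p_a P_{a,a2} / p_{a2}; every column sums to 1 when pP = p."""
    n = len(p)
    return [[p[a] * P[a][a2] / p[a2] for a2 in range(n)] for a in range(n)]


def satisfies_first_condition(
    p: Sequence[float],
    P: Sequence[Sequence[float]],
    pi: Sequence[int],
    tol: float = 1e-12,
) -> bool:
    """Check that sum_{a in pi^{-1}(b)} Q(a, a2) is constant for a2 in each fibre."""
    n = len(p)
    Q = reversed_kernel(p, P)
    for b in set(pi):
        # column_sums[a2] plays the role of G_b evaluated at a point with x_0 = a2
        column_sums = [sum(Q[a][a2] for a in range(n) if pi[a] == b) for a2 in range(n)]
        for b2 in set(pi):
            values = [column_sums[a2] for a2 in range(n) if pi[a2] == b2]
            if max(values) - min(values) > tol:
                return False
    return True


# Hypothetical examples with uniform stationary distribution.
uniform = [1 / 3, 1 / 3, 1 / 3]
P_sym = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
P_cyc = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
print(satisfies_first_condition(uniform, P_sym, [0, 0, 1]))
print(satisfies_first_condition(uniform, P_cyc, [0, 0, 1]))
```

When the check fails, the second condition (existence of a continuous measure disintegration) may still hold, so a negative answer here is not conclusive about regularity of $\nu$.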
Remark 3.1. The first condition is simply a standard (strong) lumpability condition for the time-reversal of the original Markov chain. Note that lumpability conditions for the chain and its reversal are not equivalent in general. In this instance, however, we only consider stationary chains, and hence one should compare the weak lumpability conditions for the chain and its time reversal. It is somewhat surprising that we end up with the strong lumpability condition for the reversed chain, and not the weak lumpability condition.
Remark 3.2. The second sufficient condition requires existence of a continuous disintegration for $\mu$, i.e., continuity of the map $y \mapsto \int_{\Omega_y} f(x)\, \mu_y(dx)$ for every continuous $f$ on $\Omega$. However, we only need continuity of integrals of rather 'simple' functions of the form
$$G_b(x) = \sum_{a \in \pi^{-1}(b)} \frac{p_a P_{a, x_0}}{p_{x_0}}, \qquad b \in B.$$
Thus the question is what the relation is between the requirement that there exists a continuous measure disintegration for $\mu$, and the requirement that there exists a disintegration such that for all $b \in B$, the map $\Sigma \ni y \mapsto \int_{\Omega_y} G_b(x)\, \mu_y(dx) \in \mathbb{R}_+$ is continuous. The first condition of Corollary 3.3 then reads: for all $b \in B$, $G_b$ is constant on each fibre. In the last section we present an example of an irreducible Markov chain for which this holds, but $\mu$ does not admit a continuous disintegration. However, in the 'non-trivial' case, when $G_b$ is not constant on fibres, we believe that the difference between requiring continuity of $y \mapsto \int_{\Omega_y} f(x)\, \mu_y(dx)$ for all continuous $f$, versus only for simple functions depending only on the first coordinate, $f(x) = f(x_0)$, is not substantial. The main reason is that we believe that the general hypothesis on regularity of factor measures proposed in Statistical Mechanics [11] applies to Markov chains as well.
Remark 3.3. In the following sections we will show that the second condition of Corollary 3.3 includes sufficient conditions found earlier. More specifically, we will show that conditions of Theorem 2.4 imply existence of a continuous measure disintegration for µ.
We will proceed by investigating existence of a continuous measure disintegration using methods developed in thermodynamic formalism for fibred systems.

Thermodynamic formalism for fibred systems
There has been a lot of work done on thermodynamic formalism, equilibrium states and variational principles for fibred systems: starting from the celebrated work of Ledrappier and Walters [28] on relativized variational principles to the relatively comprehensive theory of Denker and Gordin [9], as well as extensive work on random subshifts of finite type [5]. We apply the methods developed in this field to provide sufficient conditions for the existence of continuous fibre disintegrations of Markov measures. Moreover, we apply, for the first time in a dynamical setting, a method originating in Mathematical Statistics, developed by Tjur [38,39] in the 1970's, which provides a constructive approach to the construction of a continuous measure disintegration.

4.1.
Fibres as non-homogeneous subshifts of finite type. The fibres of the factor map π : Ω → Σ are not translation invariant. However, they admit a nice topological description: namely, as non-homogeneous or random subshifts of finite type. It is easy to see that if Ω = Ω M is a SFT, π : Ω → Σ is a 1-block factor map, then for any y ∈ Σ, the fibre Ω y is a non-homogeneous SFT: indeed, let S y n = π −1 (y n ), and put M y n (x n , x n+1 ) = 0 ⇔ M (x n , x n+1 ) = 0 for all n ∈ Z + and x n ∈ S n , x n+1 ∈ S n+1 . In other words, Ω y = Ω M y , for M y = {M y n }, where M y n is a submatrix of M corresponding to rows π −1 (y n ) and columns π −1 (y n+1 ).
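The construction of the matrices $M^y_n$ amounts to slicing $M$ by the preimages of consecutive symbols of $y$. The sketch below extracts these submatrices and multiplies them along a finite word, which allows one to test whether a product of fibre matrices is strictly positive. The function names and the $3 \times 3$ example are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def fibre_matrix(
    M: Sequence[Sequence[int]], pi: Sequence[int], y_n: int, y_next: int
) -> List[List[int]]:
    """Submatrix of M with rows pi^{-1}(y_n) and columns pi^{-1}(y_next)."""
    rows = [a for a in range(len(M)) if pi[a] == y_n]
    cols = [b for b in range(len(M)) if pi[b] == y_next]
    return [[M[a][b] for b in cols] for a in rows]


def fibre_product(
    M: Sequence[Sequence[int]], pi: Sequence[int], y_word: Sequence[int]
) -> List[List[int]]:
    """Product M^y_0 M^y_1 ... M^y_{k-1} for y_word = (y_0, ..., y_k), k >= 1."""
    product = fibre_matrix(M, pi, y_word[0], y_word[1])
    for n in range(1, len(y_word) - 1):
        step = fibre_matrix(M, pi, y_word[n], y_word[n + 1])
        product = [
            [sum(row[j] * step[j][c] for j in range(len(step))) for c in range(len(step[0]))]
            for row in product
        ]
    return product


def is_positive(matrix: Sequence[Sequence[int]]) -> bool:
    """True if every entry is strictly positive."""
    return all(entry > 0 for row in matrix for entry in row)


# Hypothetical example: states 0 and 1 are mapped to symbol 0, state 2 to symbol 1.
M_example = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
pi_example = [0, 0, 1]
print(fibre_product(M_example, pi_example, [0, 0]))     # a single fibre matrix
print(fibre_product(M_example, pi_example, [0, 0, 1]))  # product of two fibre matrices
```

Entry $(a, b)$ of such a product counts the paths in the fibre from $a$ to $b$ that are compatible with the given word, so strict positivity means that every admissible start can be connected to every admissible end.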
We recall the notion of a transitive non-homogeneous subshift of finite type introduced by Fan and Pollicott [12]: a non-homogeneous SFT defined by a sequence of matrices $\{M_n\}$ is called transitive if for every $n$ there exists $m \ge 1$ such that the product $M_n M_{n+1} \cdots M_{n+m}$ is strictly positive. It turns out that the fibre mixing condition is equivalent to the requirement that for every $y \in \Sigma$ the fibre $\Omega_y$ is a transitive non-homogeneous SFT. Moreover, the transitivity index $m_y$ of $\Omega_y$ is bounded; in other words, there exists $m \ge 1$ such that the products
$$M^{(y)}_{n,m} = M^y_n M^y_{n+1} \cdots M^y_{n+m} \tag{4.3}$$
are positive for all $n \in \mathbb{N}$ and all $y \in \Sigma$. Let us start by recalling a standard notion of allowed words in subshifts: if $X \subset A^{\mathbb{Z}}$ is a subshift, then the word $(a_0, \ldots, a_k) \in A^{k+1}$ is called allowed in $X$ if there exists $x \in X$ such that $x_0 = a_0, x_1 = a_1, \ldots, x_k = a_k$. The set of all allowed words of a subshift $X$ is denoted by $L(X)$, and is called the language of $X$.

Lemma 4.3. Suppose $\Omega, \Sigma$ are mixing subshifts of finite type, and let $\pi : \Omega \to \Sigma$ be a 1-block factor map. Then the following are equivalent: (1) $\pi$ is fibre mixing, i.e., for every $y \in \Sigma$, every $x, \tilde x \in \Omega_y$ and all $n \in \mathbb{N}$, there exist $m \ge 0$ and $\hat x \in \Omega_y$ such that $\hat x_0^n = x_0^n$ and $\hat x_{n+m}^\infty = \tilde x_{n+m}^\infty$; (2) $\pi$ is sub-positive, i.e., there exists $k \in \mathbb{N}$ such that for any word $y_0^k = (y_0, y_1, \ldots, y_k) \in L(\Sigma)$ and any two allowed words $a_0^k, b_0^k \in L(\Omega)$ such that $\pi(a_0^k) = \pi(b_0^k) = y_0^k$, there exists a third allowed word $c_0^k \in L(\Omega)$ satisfying $\pi(c_0^k) = y_0^k$, $c_0 = a_0$, and $c_k = b_k$.
A simple corollary of this result is the following: $\pi$ is fibre mixing if and only if every fibre $\Omega_y$, $y \in \Sigma$, is a transitive non-homogeneous SFT, and in this case the transitivity index is bounded uniformly in $y$.
Proof. If $\pi$ is fibre mixing, then $\pi$ is sub-positive for some integer $k$. Consider an arbitrary point $y \in \Sigma$ and an integer $n$. Suppose that the $|S_n| \times |S_{n+k}|$ matrix $M^{(y)}_{n,k-1} = M^y_n M^y_{n+1} \cdots M^y_{n+k-1}$ is not strictly positive. Thus there exist $a_n \in S_n = \pi^{-1}(y_n)$ and $b_{n+k} \in S_{n+k} = \pi^{-1}(y_{n+k})$ such that the $(a_n, b_{n+k})$-element of $M^{(y)}_{n,k-1}$ is zero. Informally, this means that we cannot connect $a_n$ and $b_{n+k}$ by an allowed path in $\Omega_y$ of length $k+1$. On the other hand, we can extend $a_n$ and $b_{n+k}$ to configurations $x, \tilde x \in \Omega_y$ such that $x_n = a_n$ and $\tilde x_{n+k} = b_{n+k}$. Therefore, if we consider two words of length $k+1$, namely $x_n^{n+k}$ and $\tilde x_n^{n+k}$, then $\pi(x_n^{n+k}) = \pi(\tilde x_n^{n+k}) = y_n^{n+k}$. Since $\pi$ is sub-positive, there exists a third word $\hat x_n^{n+k}$ such that $\pi(\hat x_n^{n+k}) = y_n^{n+k}$ and $\hat x_n = x_n = a_n$, $\hat x_{n+k} = \tilde x_{n+k} = b_{n+k}$. Therefore, we arrive at a contradiction with the assumption that $M^{(y)}_{n,k-1}(a_n, b_{n+k}) = 0$. Thus all products of the form (4.3) are strictly positive. In particular, this implies that for each fibre $\Omega_y$, the transitivity index $m_y$ is bounded by $k$ from above.
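In the opposite direction, assume that each fibre is a transitive non-homogeneous subshift of finite type. Suppose $x, \tilde x \in \Omega_y$. Since $\Omega_y$ is assumed to be transitive, for any $i \in \mathbb{Z}_+$ we have $\prod_{n=i}^{i+m(i)} M^y_n > 0$ for some finite $m(i)$, and therefore there exists an $\hat x \in \Omega_y$ with
$$\hat x_0^i = x_0^i \quad \text{and} \quad \hat x_{i+m(i)+1}^\infty = \tilde x_{i+m(i)+1}^\infty.$$
This is precisely the property required in Definition 2.3, and therefore $\pi$ is fibre mixing.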

4.2. Non-homogeneous equilibrium states. Now we are ready to apply methods of thermodynamic formalism to construct directly the conditional measures on the fibres. The first and the most direct method is to use the approach of [40], which relies on the fundamental results of Fan and Pollicott [12] for transitive non-homogeneous subshifts of finite type. Since the proof in the Markov case considered in the present paper is almost identical to (and, in fact, simpler than) the proof in the case of fully supported g-measures in [40], we will only sketch the necessary steps. We start by introducing the averaging operators acting on spaces of continuous functions on the fibres $\Omega_y$:
$$P^y_n f(x) = \sum_{a_0^n \in \pi^{-1}(y_0^n):\ a_0^n x_{n+1}^{\infty} \in \Omega_y} G^y_n(a_0 \ldots a_n x_{n+1} \ldots)\, f(a_0 \ldots a_n x_{n+1} \ldots),$$
where $G^y_n(x)$ is defined on $\Omega_y$ by
$$G^y_n(x) = \frac{Q(x_0, x_1) \cdots Q(x_n, x_{n+1})}{\sum_{a_0^n:\ a_0^n x_{n+1}^{\infty} \in \Omega_y} Q(a_0, a_1) \cdots Q(a_n, x_{n+1})}, \qquad Q(a, a') = \frac{p_a P_{a,a'}}{p_{a'}}, \quad a, a' \in A. \tag{4.4}$$
Note that
$$\sum_{a_0^n:\ a_0^n x_{n+1}^{\infty} \in \Omega_y} G^y_n(a_0 \ldots a_n x_{n+1} \ldots) = 1$$
for all $x \in \Omega_y$, and hence $P^y_n 1 = 1$. A probability measure $\mu^y$ on $\Omega_y$ is called a non-homogeneous equilibrium state associated to $G^y = \{G^y_n\}$ if
$$\int P^y_n f \, d\mu^y = \int f \, d\mu^y$$
for all $f \in C(\Omega_y)$ and every $n \in \mathbb{N}$. Next we will show that the equilibrium states $\mu^y$ form a continuous measure disintegration; for now, we use a superscript to distinguish them from the notation for a disintegration. The sequence $G^y = \{G^y_n\}$ given by (4.4) can easily be seen to satisfy the conditions of Theorem 1 of [12], and we immediately get the following corollary: Suppose $\Omega, \Sigma$ are irreducible SFTs, and a 1-block surjective factor map $\pi : \Omega \to \Sigma$ is such that $\Omega_y$ is a transitive non-homogeneous SFT for every $y \in \Sigma$. Then for each $y \in \Sigma$ there exists a unique non-homogeneous equilibrium state $\mu^y$ associated to $G^y = \{G^y_n\}$. Moreover, for every $f \in C(\Omega_y)$,
$$P^y_n f \to \int f \, d\mu^y \tag{4.5}$$
uniformly on $\Omega_y$, as $n \to \infty$.
Furthermore, the convergence in (4.5) turns out to be uniform in $y$ as well. Using this rather strong property, we also immediately get the following corollary of Lemmas 3.4 and 3.5 of [40]: Proposition 4.6. Under the above conditions, the family $\{\mu^y\}$ of non-homogeneous equilibrium states on $\Omega_y$ associated to $G^y$ forms a disintegration of $\mu$, i.e., for every continuous function $f$ one has
$$\int_\Omega f \, d\mu = \int_\Sigma \int_{\Omega_y} f(x)\, \mu^y(dx)\, \nu(dy).$$
Moreover, the family $\{\mu^y\}$ is in fact continuous: for every continuous $f$, the map
$$y \mapsto \int_{\Omega_y} f(x)\, \mu^y(dx)$$
is a continuous function on $\Sigma$.
Therefore, by Corollary 3.3, we conclude that $\nu$ is a g-measure.
Remark 4.1. The above method can be summarized as follows. The conditional measures on fibres are equilibrium states for the same potential as the starting measure µ. One needs to establish uniqueness of equilibrium states on the fibres first, and then prove continuity of the resulting family. In this particular case, one obtains continuity from the double uniform convergence of the averaging (transfer) operators. In the following section, we are going to show that uniqueness on each fibre is in fact sufficient, and one obtains continuity effectively for free.

4.3. Constructive approach to conditioning on fibres. General results on the existence of measure disintegrations are not constructive. To alleviate this problem, Tjur [38,39] proposed a more direct method: the conditional measures $\mu_y$ on fibres can be obtained directly, in a unique way, as a limit of measures conditioned on sets of positive measure around $y$.
Suppose $y \in \Sigma$ and let $D_y$ be the set of pairs $(V, B)$, where $V$ is an open neighbourhood of $y$ and $B$ is a subset of $V$ with positive measure: Now equip the collection $D_y$ with a partial order given by ( This partial order is upwards directed, as, for any ( Since $D_y$ is upwards directed, the collection of conditional measures $N_y = \{\mu_B(\cdot) : (V, B) \in D_y\}$ is a net, or a generalized sequence, in the space of probability measures on $\Omega$. We can now define the limit or accumulation points of this net as follows: By standard compactness arguments we immediately conclude that $M_y \neq \emptyset$, and for each $\lambda_y \in M_y$, one has $\lambda_y(\Omega_y) = 1$.
Definition 4.8. The point y ∈ Σ is called a Tjur point if M y is a singleton, i.e., the net N y has a limit, which we denote by µ y .
Two basic theorems by Tjur provide sufficient conditions for the existence of continuous measure disintegrations. The first theorem states that, when conditional measures µ y are defined ν-almost everywhere, they form a measure disintegration.

4.4. Gibbs measures on fibres. The main result of the previous section states that existence of a unique limit of the sequence of conditional measures $\mu(\cdot \mid \pi^{-1}B)$, $B \searrow y$, for all $y \in \Sigma$, is sufficient for the regularity of $\nu$. However, this condition is not easy to verify directly. The general principle for renormalization of Gibbs random fields, formulated by van Enter, Fernández, and Sokal in the seminal paper [11], states that the conditional measures must be Gibbs for the original potential. Since the original measure $\mu$ is Markov, i.e., Gibbs for a two-point interaction, we have to study the Gibbs Markov measures on the fibres. In the setting of this paper that means that the conditional measures are Markov. In fact, we have already seen this indirectly in the Fan-Pollicott construction of non-homogeneous equilibrium states on fibres. In this section we define Gibbs Markov measures on fibres and show the absence of phase transitions, i.e., prove uniqueness on each fibre. In the following section we show that any limit measure in $M_y$ must be Markov and, given that there is only one Markov measure on each fibre, we conclude that $|M_y| = 1$ for all $y \in \Sigma$.
Suppose $\Omega$, $\Sigma$ and $\pi : \Omega \to \Sigma$ are defined as above and $\mu$ is a stationary Markov measure with $\Omega$ as its support.
$$\frac{Q(x_0, x_1) \cdots Q(x_n, x_{n+1})}{\sum_{a_0^n :\, a_0^n x_{n+1}^{+\infty} \in \Omega_y} Q(a_0, a_1) \cdots Q(a_n, x_{n+1})}, \qquad Q(a, a') = \frac{p_a P_{a,a'}}{p_{a'}}, \quad a, a' \in A.$$
If we define the interaction $\Phi = \{\Phi_\Lambda(\cdot)\}$, a collection of functions indexed by finite subsets $\Lambda$ of $\mathbb{Z}_+$, by then the expression (4.6) can be rewritten in a more traditional Gibbsian form: and $Z_{[0,n]}(x_{n+1}^\infty) = \sum_{a_0^n :\, a_0^n x_{n+1}^\infty \in \Omega_y} \exp\big(-H_{[0,n]}(a_0^n x_{n+1}^\infty)\big)$ is the corresponding partition function. We denote by $G_{\Omega_y}(\Phi)$ the set of all Gibbs probability measures for the interaction $\Phi$.
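As an illustration of the finite-volume kernels above, the following minimal sketch evaluates the normalized product of $Q$-weights over all words $a_0^n$ in the fibre over a given $y_0^n$, with a fixed boundary symbol $x_{n+1}$. The function name `fibre_kernel`, the three-state matrix `Q` and the map `pi` are hypothetical toy choices made only for this example; they are not taken from the paper.

```python
from itertools import product

def fibre_kernel(Q, pi, y_word, boundary):
    """Distribution over words a_0..a_n with pi(a_i) = y_i, given the boundary symbol x_{n+1}.

    Q is a square matrix of non-negative weights (list of lists), pi maps
    states to output symbols, and a zero weight marks a forbidden transition.
    Assumes at least one admissible word exists (otherwise Z would be 0).
    """
    # Candidate symbols at each site: the preimages of y_i under pi.
    sites = [[a for a in pi if pi[a] == y] for y in y_word]
    weights = {}
    for word in product(*sites):
        w = 1.0
        # Q(a_0, a_1) ... Q(a_{n-1}, a_n) Q(a_n, x_{n+1})
        for a, b in zip(word, word[1:] + (boundary,)):
            w *= Q[a][b]
        if w > 0:  # words with a forbidden transition do not lie in the fibre
            weights[word] = w
    Z = sum(weights.values())  # partition function for this boundary condition
    return {word: w / Z for word, w in weights.items()}

# Hypothetical toy data (an assumption for illustration only).
Q = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
pi = {0: "a", 1: "a", 2: "b"}

kernel = fibre_kernel(Q, pi, y_word="aa", boundary=0)
for word, prob in sorted(kernel.items()):
    print(word, round(prob, 4))
```

The enumeration is exponential in the length of the window, so this is only meant for checking small cases by hand, for instance to compare the kernels obtained from two different boundary symbols.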
Since Ω y is a non-homogeneous subshift of finite type, i.e., the lattice system as described in [36], the standard theory of Gibbs states implies G Ωy (Φ) is a non-empty convex set of measures. Moreover, the extremal measures are tail-trivial. Thus two extremal measures in G Ωy (Φ) are either singular or equal.
Then $\rho \in G_{\Omega_y}(\Phi)$ if and only if for every continuous $f$ on $\Omega_y$ the Dobrushin-Lanford-Ruelle equations are valid for every $n \ge 0$ Given the fact that the non-homogeneous subshift of finite type $\Omega_y$ is transitive and $\Phi$ is a finite range potential, it is easy to check that the family of probability kernels $\gamma_{[0,n]}(\cdot \mid x_{n+1}^\infty)$ satisfies the so-called boundary uniformity condition: there exists $c > 0$ such that for any $a_0^m \in \pi^{-1}(y_0^m)$, and every $x, \tilde{x} \in \Omega_y$, for all sufficiently large $n$, one has (4.8) . Applying standard arguments for uniqueness of Gibbs measures under the boundary uniformity condition [14], one gets: Suppose $\Omega_y$ is a transitive non-homogeneous subshift of finite type, and the potential $\Phi$ is such that the family of probability kernels $\{\gamma_{[0,n]}\}$ satisfies (4.8). Then there exists a unique Gibbs measure for $\Phi$ on $\Omega_y$, i.e., $|G_{\Omega_y}(\Phi)| = 1$.

4.5. Conditional measures are Markov. We are now going to show that any limit point of the net $N_y$ must be a Gibbs Markov measure on $\Omega_y$, i.e., $M_y \subseteq G_{\Omega_y}(\Phi)$. Since we have already shown that $|G_{\Omega_y}(\Phi)| = 1$ for all $y \in \Sigma$, we conclude that $|M_y| = 1$ for all $y$, i.e., all points in $\Sigma$ are Tjur, and hence $\nu$ is a g-measure. Proof. Let $y \in \Sigma$, $\lambda \in M_y$, and let $(V_m, B_m) \in D_y$ be a sequence such that $\mu_{B_m} \to \lambda$. It suffices to show that each $\mu_{B_m}$ is a limit point of linear combinations of measures in $M_y$. For any $m, n \in \mathbb{N}$ we can find a collection $\{C^{(m)}_{n,l}\}$ of disjoint cylinder sets in $\Sigma$, indexed by a finite set $L_{m,n}$, such that $\nu\big(B_m \,\triangle\, \bigcup_{l \in L_{m,n}} C^{(m)}_{n,l}\big) < 2^{-n}\nu(B_m)$.
Given any measurable set $A$, we have that $\mu_{B_m}(A) - \mu_{\bigcup_{l \in L_{m,n}} C^{(m)}_{n,l}}(A) \to 0$ as $n \to \infty$.
Also note that .
In other words, each $\mu_{B_m}$ is a limit point of linear combinations of measures of the form $\mu_{C^{(m)}_{n,l}}$. Therefore $\lambda$ is a limit point of linear combinations of measures in $M_y$.
Hence, if we are able to prove that $M_y \subseteq G_{\Omega_y}(\Phi)$, then we are able to conclude that $M_y \subseteq G_{\Omega_y}(\Phi)$ as well. Suppose where $z^{(m)}$ is some finite word in the alphabet $B$, such that $\nu([y_0^m z^{(m)}]) > 0$ for all $m$. We are going to show that $\rho$ is a Markov measure on $\Omega_y$, in other words (4.9) $\rho(x_0^n \mid x_{n+1}^{n+\ell}) = \rho(x_0^n \mid x_{n+1})$ for all $n \ge 0$, $\ell \ge 1$, and $x \in \Omega_y$. Since $\rho$ is the weak limit of the $\rho_m$'s, it is sufficient to establish (4.9) for $\rho_m$ for all sufficiently large $m$.
Consider $x \in \Omega_y$, and fix $n \ge 0$, $\ell \ge 1$. Choose $m_0$ such that for all $m \ge m_0$ the length $K_m$ of the word $y_0^m z^{(m)}$ satisfies $K_m > n + \ell$; e.g., $m_0 = n + \ell + 1$ suffices. Then , is independent of $m$ and of $x_{n+2}^{n+\ell}$. Hence $\rho$, which is the weak limit of the $\rho_m$'s, satisfies (4.9), and is thus a Markov measure on $\Omega_y$.
These results can now be used to show that fibre mixing does indeed imply existence of a continuous measure disintegration, and hence by Corollary 4.11 regularity of the factor measure ν.

Corollary 4.16. Let $\Omega \subset A^{\mathbb{Z}_+}$ and $\Sigma \subset B^{\mathbb{Z}_+}$ be mixing subshifts of finite type, and $\pi : \Omega \to \Sigma$ a 1-block factor map which is fibre mixing. Suppose $\mu$ is the stationary Markov measure consistent with $\Omega$; then $\mu$ admits a continuous measure disintegration and hence $\nu = \mu \circ \pi^{-1}$ is a g-measure.
Proof. By Lemma 4.13, for every $y \in \Sigma$ there is a unique Gibbs Markov measure on $\Omega_y$: $|G_{\Omega_y}(\Phi)| = 1$. By Proposition 4.15, $\emptyset \neq M_y \subset G_{\Omega_y}(\Phi)$, and hence $|M_y| = 1$ for all $y \in \Sigma$. Thus all points in $\Sigma$ are Tjur, and hence by Corollary 4.11, $\mu$ admits a continuous disintegration, which allows us to conclude that $\nu$ is a g-measure.
If $p = \tfrac{1}{2}$, then $\{Y_n\}$ are independent, and $\nu = \mu \circ \pi^{-1}$ is the Bernoulli measure . We now show that $\{\mu_y\}_{y \in \Sigma}$ defined by is a continuous measure disintegration of $\mu$. It is clear that, given $y$, the measure $\mu_y$ is a Borel measure supported on $\Omega_y$. Moreover, one has and since $y \mapsto x^+_y$ and $y \mapsto x^-_y$ are continuous maps, for any continuous function $f$ and any $\epsilon > 0$, one can choose $\delta > 0$ such that $d(y, \tilde{y}) < \delta$ implies $|\mu_y(f) - \mu_{\tilde{y}}(f)| < \epsilon$.
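The continuity of $y \mapsto x^+_y$ and $y \mapsto x^-_y$ can be checked on finite prefixes. The sketch below assumes the explicit form of the fibre points stated later in this section, $x^+_y = (1, y_0, y_0 \cdot y_1, \ldots)$ and $x^-_y = -x^+_y$, together with the factor map $y_n = x_n \cdot x_{n+1}$. The helper names `fibre_points` and `factor` are hypothetical.

```python
def fibre_points(y):
    """Prefixes of x^+_y and x^-_y determined by a finite prefix y_0..y_{n-1} in {-1, 1}."""
    x_plus = [1]
    for s in y:
        x_plus.append(x_plus[-1] * s)  # x_{k+1} = x_k * y_k, so x_{k+1} = y_0 * ... * y_k
    x_minus = [-v for v in x_plus]
    return x_plus, x_minus

def factor(x):
    """Apply the map y_n = x_n * x_{n+1} to a finite word x."""
    return [a * b for a, b in zip(x, x[1:])]

y1 = [1, -1, -1, 1]
y2 = [1, -1, 1, 1]   # agrees with y1 on the first two coordinates
xp1, xm1 = fibre_points(y1)
xp2, xm2 = fibre_points(y2)

print(xp1, xm1)
print(factor(xp1) == y1, factor(xm1) == y1)  # both fibre points project to y1
print(xp1[:3] == xp2[:3])                    # agreement on 2 coordinates of y gives 3 of x
```

If two sequences $y$, $\tilde{y}$ agree on their first $n$ coordinates, the corresponding fibre points agree on their first $n + 1$ coordinates, which is the quantitative form of continuity used in the paragraph above.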
We now show that $\{\mu_y\}$ is indeed a disintegration of $\mu$. For $x = (x_i)_{i \ge 0}$, let $\bar{x} = (\bar{x}_i)$ with $\bar{x}_i = -x_i$ for all $i \ge 0$; note that $x^-_y = \overline{x^+_y}$. It is sufficient to validate consistency of the disintegration $\{\mu_y\}$ for indicators of cylinder sets: Hence $\mu$ admits a continuous disintegration. This example only works for the very specific parameter value $p = 1/2$. Interestingly, there exists another example that has exactly the same continuous measure disintegration. Let $p \in (0, 1)$ and let $\{X_n\}_{n \in \mathbb{Z}_+}$ be a Markov chain taking values in $\{-1, 1\}$, with the transition probability matrix The stationary distribution is $\rho = (\tfrac{1}{2}, \tfrac{1}{2})$. Then the factor process $Y_n = \pi(X_n, X_{n+1}) = X_n \cdot X_{n+1}$.
for any $n \ge 1$, where we again used the notation $\bar{y}_0 = -y_0$. It follows that the process $\{Y_n\}_{n \in \mathbb{Z}_+}$ is Bernoulli with parameter $p$. Note that this example has exactly the same fibre structure as the last example: $\Omega_y = \pi^{-1}(y) = \{x^+_y, x^-_y\}$, where $x^+_y = (1, y_0, y_0 \cdot y_1, \ldots)$ and $x^-_y = (-1, -y_0, -y_0 \cdot y_1, \ldots)$. Moreover, the same continuous measure disintegration exists: $\{\mu_y\}_{y \in \Sigma}$ with Continuity and consistency follow by the same computation as for the Furstenberg example above. Therefore, we have another example of a factor measure with a continuous measure disintegration, but without fibre mixing conditions.
5.1. Markov factor without continuous measure disintegration. We now show by an example that existence of a continuous measure disintegration is not necessary. In this example, the factor measure $\nu$ is Markov. Let $\{X_n\}_{n \in \mathbb{Z}_+}$ be a stationary Markov chain taking values in $A = \{1, 2, 3, 4\}$ defined by the probability transition matrix: Define the factor map $\pi$ as follows: let $B = \{a, b, c\}$ and put $\pi : A \to B$, by $\pi(1) = \pi(3) = a$, $\pi(2) = b$, $\pi(4) = c$. Then the space $\Sigma \subset B^{\mathbb{Z}_+}$ is a subshift of finite type with forbidden words $\{bb, cc, bc, cb\}$. This example is not lumpable, as $(1, 0, 0, 0)$ is an initial distribution for which the factor process is not Markov: the transition from state $a$ to state $c$ in the output process has probability 0 until the first occurrence of the word $ba$. However, a direct application of the result in [25] shows that the stationary chain $\{X_n\}$ is weakly lumpable with respect to $\pi$, i.e., $\{Y_n = \pi(X_n)\}$ is a Markov process, and one can easily compute the corresponding transition probability matrix $\tilde{P}$. The stationary distribution of $\{X_n\}$ is $p = (\tfrac{1}{3}, \tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{6})$. Hence, $\nu$ is a Markov measure with the probability transition matrix We proceed by showing that no continuous measure disintegration exists.
In this particular case, the map $\pi$ is a finite-to-one factor map, meaning that the fibres have a bounded number of elements.
Indeed, $\pi^{-1}(b) = \{2\}$ and $\pi^{-1}(c) = \{4\}$, while $\pi^{-1}(a) = \{1, 3\}$. If we assume that $y_n = a$ and $m = \min\{m > n : y_m \in \{b, c\}\}$ is finite, then $\pi^{-1}(y)_n = 1$ if $y_m = b$, and $\pi^{-1}(y)_n = 3$ if $y_m = c$. Therefore elements $y \in \Sigma$ are uniquely decodable if $y \neq y_0^{n-1} a_n^\infty$ for any $n \ge 1$, i.e., if $y$ does not end with an infinite string of $a$'s. Otherwise, if $y = y_0^{n-1} a_n^\infty$ for some $n \ge 1$, then the fibre contains exactly two points, corresponding to the two possible tails: an infinite string of 1's or of 3's. This fibre structure makes a continuous measure disintegration of $\mu$ impossible. Suppose $\{\mu_y\}_{y \in \Sigma}$ is a measure disintegration of $\mu$ for the factor map $\pi$. Then $\mu_y$ is supported on $\Omega_y$ for each $y \in \Sigma$. Furthermore, consider the point $z = a_0^\infty$. Then any open neighbourhood of $z$ contains, for some $n > 0$, the cylinder sets $[a_0^n b]$ and $[a_0^n c]$. For each $y \in [a_0^n b]$ we have $\Omega_y \subset [1_0]$, while for each $y' \in [a_0^n c]$ we have $\Omega_{y'} \subset [3_0]$. Hence Since both cylinders $[a_0^n b]$ and $[a_0^n c]$ have positive $\nu$-measure, and $\mathbb{1}_{[1_0]}$ is a continuous function, we conclude that no measure disintegration of $\mu$ can be continuous at $z = a_0^\infty$. One can also use entropy methods to conclude that the factor measure is well behaved.
which in our case is $h_\mu = \log(2)$. For an irreducible SFT the measure of maximal entropy, also known as the Parry measure, is unique and is Markov. Moreover, a finite-to-one factor map between two SFTs sends the measure of maximal entropy to the measure of maximal entropy. Thus, since $\mu$ is the measure of maximal entropy on $\Omega$, so is $\nu = \mu \circ \pi^{-1}$ on $\Sigma$, and hence $\nu$ is also Markov.
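Returning to the fibre structure of the example in Section 5.1, the decoding rule described there ($b \mapsto 2$, $c \mapsto 4$, and an $a$ decodes to 1 or 3 according to whether the next symbol in $\{b, c\}$ is $b$ or $c$) can be sketched on finite words as follows. The function name `decode` is hypothetical, and the sketch only looks at a finite window, so an $a$ with no later $b$ or $c$ inside the window is reported as undetermined.

```python
def decode(y):
    """Decode a finite word over {'a', 'b', 'c'} into symbols of {1, 2, 3, 4}.

    Positions holding an 'a' with no later 'b' or 'c' in the window are
    returned as None, since the window does not determine them.
    """
    x = [None] * len(y)
    nearest = None  # nearest symbol in {b, c} strictly to the right of position i
    for i in range(len(y) - 1, -1, -1):
        if y[i] == "b":
            x[i] = 2
            nearest = "b"
        elif y[i] == "c":
            x[i] = 4
            nearest = "c"
        else:  # y[i] == 'a'
            x[i] = {"b": 1, "c": 3}.get(nearest)
    return x

print(decode("aabaac"))
print(decode("aab")[0], decode("aac")[0])  # same prefix 'aa', different first symbol
print(decode("baa"))
```

The second line of output shows the mechanism behind the discontinuity at $z = a_0^\infty$: words that share an arbitrarily long prefix of $a$'s decode to different symbols at the origin, depending on whether the run of $a$'s is ended by $b$ or by $c$.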

Conclusions and final remarks
In the present paper we have established sufficient conditions (Corollary 3.3) for 1-block factors of Markov measures to be g-measures. The result naturally combines two types of sufficient conditions: lumpability and the existence of a continuous disintegration. We have presented an example showing that these conditions are in fact complementary. We have also demonstrated that the known sufficient conditions for regularity of factors of Markov measures imply existence of a continuous measure disintegration. Note also that Theorem 3.2 does in fact provide necessary and sufficient conditions for the factor measure $\nu$ to be regular. Namely, the factor measure $\nu$ is a g-measure if and only if there exists a continuous normalized function $g : \Sigma \to (0, 1)$ such that $g(y) = \tilde{g}(y)$ for $\nu$-a.a. $y \in \Sigma$, where $\tilde{g}(y)$ is given by (3.1). Equivalently, $\nu$ is regular if and only if there exists a disintegration $\{\mu_y\}$ of $\mu$ such that the right hand side in (3.1) defines a continuous function of $y$. It would be interesting to understand whether the two sets of sufficient conditions for continuity of $\tilde{g}$ identified in the present paper are complete, i.e., exhaust all possibilities.
An important point which we have not addressed is the following: in case the one-block factor of the Markov measure is a g-measure, how 'smooth' is the corresponding g-function?
In all known examples, factors of Markov measures are either g-measures for some Hölder continuous $g$, or not regular (cf. Furstenberg's example). It would be interesting to understand whether this apparent dichotomy can be turned into a rigorous result. We believe that the proposed method, the study of conditional measure disintegrations, can be used to address such questions as well. For example, using the method of Fan-Pollicott, discussed in Section 4.2, one can show that in the fibre-mixing case, the family of non-homogeneous equilibrium states $\{\mu_y\}$ is in fact 'smooth', and the resulting g-function is Hölder continuous. For details on how to deduce properties of the g-function from the properties of the corresponding measure disintegration the reader can consult [40]. We have chosen not to provide the details here because the result is well known and the estimates on the exponential rate of decay of the variations of the g-function are weaker than those obtained by more direct proofs [42]; see also [26,34,35] for more general results.
An advantage of the proposed method lies in the fact that uniqueness of the Markov measure on each fibre immediately implies existence of a continuous measure disintegration, and hence regularity of the factor measure. In comparison, some earlier results had to establish uniqueness of fibre measures first, followed by a separate argument for continuity of the family of conditional measures, e.g., [40].
The approach developed in the present paper can also be applied to the study of regularity properties of renormalized Gibbs random fields [2,3]. The so-called non-overlapping block renormalization transformations [11] can be represented as 1-block factors, i.e., the renormalized field is given by $Y_n = \pi(X_n)$ for all $n \in \mathbb{Z}^d$. Existence of a continuous measure disintegration remains a sufficient condition for regularity of the factor measure, and is implied by the uniqueness of Gibbs measures on fibres. As a result, one obtains a significant simplification of the proofs of regularity of Gibbs factors in several cases, e.g., decimations of the Ising model [11,19] and the Fuzzy Potts model [18,20].

Acknowledgements
The authors acknowledge support from The Dutch Research Council (NWO), grant 613.001.218.