Projective distance and $g$-measures

We introduce a distance in the space of fully-supported probability measures on one-dimensional symbolic spaces. We compare this distance to the $\bar{d}$-distance and we prove that in general they are not comparable. Our projective distance is inspired by Hilbert's projective metric, and in the framework of $g$-measures, it allows us to assess the continuity of the entropy at $g$-measures satisfying uniqueness. It also allows us to relate the speed of convergence and the regularity of sequences of locally finite $g$-functions to the preservation, at the limit, of certain ergodic properties of the associated $g$-measures.


Introduction.
1.1. In [12] Hilbert introduced the so-called projective distance, for which the geodesics are precisely the straight lines. It was later used by G. Birkhoff to prove the existence and uniqueness of positive eigenvectors for positive linear transformations on Banach spaces [1]. Birkhoff's strategy goes as follows: a uniformly positive bounded linear transformation maps the positive cone of a Banach space into itself. This transformation is non-expansive with respect to the projective distance, and if the image cone has finite diameter, then the transformation is a projective contraction. In this case Banach's fixed point theorem ensures the existence and uniqueness of a projective fixed point for the linear transformation, and projective fixed points are nothing but positive eigenvectors. Furthermore, contractiveness ensures that the iterates of the linear transformation on any positive vector converge exponentially fast, in the projective sense, towards the fixed point. Birkhoff's strategy has been successfully employed in the solution of a variety of problems, in particular to prove existence and uniqueness of invariant measures, and the exponential decay of correlations of convenient observables. This has been done for symbolic systems [10,23], for suitable one-dimensional maps [18,19], and for general maps with some degree of hyperbolicity [17,24].
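As a finite-dimensional illustration of Birkhoff's strategy, the following minimal Python sketch works in the positive cone of $\mathbb{R}^2$ and assumes the usual formula for Hilbert's projective distance between positive vectors, $\log\max_i (x_i/y_i) - \log\min_i (x_i/y_i)$. The matrix `M`, the test vectors and all function names are hypothetical choices made for this illustration; they are not taken from [1] or [12].

```python
import math

def hilbert_distance(x, y):
    """Hilbert projective distance between two strictly positive vectors."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))

def apply(matrix, v):
    """Matrix-vector product, the matrix being a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in matrix]

def normalize(v):
    """Rescale a positive vector so that its entries sum to one."""
    total = sum(v)
    return [vi / total for vi in v]

def projective_fixed_point(matrix, v0, tol=1e-12, max_iter=10_000):
    """Iterate v -> Mv (renormalized) until successive iterates are projectively close."""
    v = normalize(v0)
    for _ in range(max_iter):
        w = normalize(apply(matrix, v))
        if hilbert_distance(v, w) < tol:
            return w
        v = w
    return v

# Hypothetical positive matrix and positive vectors.
M = [[2.0, 1.0], [1.0, 3.0]]
x, y = [1.0, 5.0], [4.0, 1.0]

d_before = hilbert_distance(x, y)
d_after = hilbert_distance(apply(M, x), apply(M, y))  # smaller than d_before

v = projective_fixed_point(M, [1.0, 1.0])
Mv = apply(M, v)
eigenvalue = Mv[0] / v[0]  # the projective fixed point is a positive eigenvector
```

In this example the distance between the images is strictly smaller than the distance between the original vectors, and the iteration stops at a positive vector `v` with `M v` proportional to `v`.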
Ornstein's $\bar{d}$-distance was introduced in [26] to give a topological characterization of the Bernoulli processes. This distance generates a topological structure well adapted to the study of important ergodic properties. For instance, $\bar{d}$-limits of sequences of mixing processes are mixing, and the class of Bernoulli processes is $\bar{d}$-closed, as is the class of K-processes. Bressaud and coauthors, in a study of Markov approximations to $g$-measures (chains of complete connection in their nomenclature), found an upper bound for the speed of $\bar{d}$-convergence of the approximations related to the regularity of the $g$-function [3]. In a related work [7], Coelho and Quas studied the $\bar{d}$-continuity of $g$-measures with respect to the uniform distance between $g$-functions.
1.2. In [5] we established a relation between the rate of projective convergence of the Markovian approximations of a one-dimensional Gibbs measure and the decay of correlations of the limiting Gibbs measure. The result extends straightforwardly to the case of $g$-measures defined by sufficiently regular $g$-functions. Our technique relies on a projective comparison of the marginals of the approximating measures. If the potential defining the Gibbs measure is sufficiently regular, then the finite range approximations are sufficiently similar "in the projective sense", and in this case the mixing rate of the Gibbs measure can be upper bounded by a function of the mixing rates of the approximations. Additionally, in this fast approximation regime, the entropy of the approximations converges towards the entropy of the Gibbs measure. Furthermore, since in that case the relative entropy of the limiting Gibbs measure with respect to the approximations goes to zero, Marton's bounds [21,22] ensure the convergence of the approximations in $\bar{d}$-distance. In a recent work [20], Maldonado and Salgado applied our approach to study the approximability of Gibbs measures for two-body interactions in one-dimensional symbolic systems. This technique was also used in our study of the preservation of Gibbsianness under amalgamation of symbols [6].
1.3. Despite its actual and potential applications, our notion of "projective convergence" has not yet been formalized, nor has its relation to $\bar{d}$-convergence or vague convergence been established. The aim of this paper is to fill this gap and to explore to what extent projective convergence, as we define it, is well adapted to the study of particular classes of processes. We consider in particular the class of $g$-measures, leaving for a forthcoming work the study of measures obtained by random substitutions, for which we already have some preliminary results. The rest of the paper is organized as follows. The next section is devoted to the study of some general properties of the projective distance, particularly its relation to the vague distance and the $\bar{d}$-distance. In Section 3 we study the convergence of Markov approximations to a $g$-measure and the continuity of the entropy at $g$-measures satisfying uniqueness, and we establish a criterion for uniqueness based on the speed of convergence and regularity of Markov approximations. Section 4 contains some concluding remarks and perspectives.
1.4. Acknowledgements. This work was supported by the Mexican Government through CONACyT grant CB-2009-01-129072. It was also partially supported by Universidad Autónoma de San Luis Potosí, via grant C14-FAI-04-33.33. We thank Laboratorio Internacional Solomon Lefschetz for financing our academic exchange with Professor Chazottes from Ecole Polytechnique. L. Trejo-Valencia is supported by CONACyT through the Ph.D. Fellowship 332432.

Let $A$ be a finite set, which we also call the alphabet, and let $X := A^{\mathbb{N}}$ be the set of infinite $A$-valued sequences. As usual, the elements of $A$ will be called symbols, and the finite tuples in $A$ words. Given $\boldsymbol{x} = x_1 x_2 \cdots \in A^{\mathbb{N}}$ and natural numbers $1 \le n \le m$, $\boldsymbol{x}_n^m$ denotes the word $x_n x_{n+1} \ldots x_{m-1} x_m$. The left shift $T : X \to X$ is given by $(T\boldsymbol{x})_n = x_{n+1}$ for each $n \in \mathbb{N}$. The pair $(X, T)$ is the full shift on the alphabet $A$.
To a word $\boldsymbol{a} \in A^n$, $n \in \mathbb{N}$, we associate the cylinder set $[\boldsymbol{a}] := \{\boldsymbol{x} \in A^{\mathbb{N}} : \boldsymbol{x}_1^n = \boldsymbol{a}\}$. Cylinder sets are clopen in the standard Tychonoff topology and generate the corresponding Borel $\sigma$-algebra $B(X)$. We denote by $M(X)$ the set of all Borel probability measures on $X$ and by $M_T(X)$ the subset of $T$-invariant probability measures. Both $M(X)$ and $M_T(X)$ are compact convex sets in the vague topology, which can be metrized by a distance $D$. It is known that $M(X)$ as well as $M_T(X)$ are complete and separable in the vague topology. Furthermore, they have the structure of a simplex, which, in the case of $M_T(X)$, implies the uniqueness of the ergodic decomposition [8].
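Given $\mu, \nu \in M(X)$, a coupling between $\mu$ and $\nu$ is a measure $\lambda \in M((A \times A)^{\mathbb{N}})$ whose marginals are $\mu$ and $\nu$, that is, such that for all $n \in \mathbb{N}$ and $\boldsymbol{a} \in A^n$, $\sum_{\boldsymbol{b} \in A^n} \lambda[\boldsymbol{a} \times \boldsymbol{b}] = \mu[\boldsymbol{a}]$ and $\sum_{\boldsymbol{b} \in A^n} \lambda[\boldsymbol{b} \times \boldsymbol{a}] = \nu[\boldsymbol{a}]$. Here $\boldsymbol{a} \times \boldsymbol{b}$ denotes the word $(a_1 b_1)(a_2 b_2) \cdots (a_n b_n) \in (A \times A)^n$, and we denote by $J(\mu, \nu)$ the set of all couplings between $\mu$ and $\nu$. Ornstein's $\bar{d}$-distance is defined in terms of the couplings in $J(\mu, \nu)$ and of the set $\bar{\Delta} = \{ab \in A \times A : a \neq b\}$, the complement of the diagonal. The distance $\bar{d}$ makes $M(X)$ a complete but non-separable topological space. The same holds when $\bar{d}$ is restricted to the subspace of $T$-invariant measures $M_T(X)$ (see [29] for instance). Let $M^+(X) \subset M(X)$ denote the set of fully-supported measures. The function $\rho$ defines a distance on $M^+(X)$, which we call the projective distance.

The space $M^+(X)$ is a complete metric space with respect to $\rho$.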
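To make the projective distance concrete, here is a minimal Python sketch of a truncated version of it. It assumes that $\rho(\mu,\nu) = \sup_{n} \max_{\boldsymbol{a} \in A^n} \frac{1}{n} \left| \log \frac{\mu[\boldsymbol{a}]}{\nu[\boldsymbol{a}]} \right|$, a form consistent with the estimates used later in the text, and it replaces the supremum over all $n$ by a maximum over $n \le$ `max_len`, so it only yields a lower bound of $\rho$ in general. The helper names and the numerical parameters are hypothetical.

```python
import itertools
import math

def bernoulli(p):
    """Cylinder probabilities of the product measure with one-symbol marginal p."""
    def mu(word):
        return math.prod(p[a] for a in word)
    return mu

def markov(initial, transition):
    """Cylinder probabilities of a Markov measure with positive transition matrix."""
    def mu(word):
        prob = initial[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= transition[a][b]
        return prob
    return mu

def rho_truncated(mu, nu, alphabet, max_len):
    """Maximum over words of length n <= max_len of (1/n) |log(mu[a] / nu[a])|."""
    best = 0.0
    for n in range(1, max_len + 1):
        for word in itertools.product(alphabet, repeat=n):
            best = max(best, abs(math.log(mu(word) / nu(word))) / n)
    return best

alphabet = (0, 1)
mu = bernoulli({0: 0.5, 1: 0.5})
nu = bernoulli({0: 0.2, 1: 0.8})
rho_bernoulli = rho_truncated(mu, nu, alphabet, max_len=6)

# A Markov measure compared with the uniform Bernoulli measure.
chain = markov({0: 0.8, 1: 0.2}, {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}})
rho_chain = rho_truncated(chain, mu, alphabet, max_len=6)
```

For two Bernoulli measures the logarithm of the ratio is additive along the word, so under the assumed formula the truncated value already coincides with $\max_{a \in A} |\log (p(a)/q(a))|$; for measures with memory the truncation is a genuine approximation.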
Let us now prove that $M^+(X)$ is complete with respect to the distance $\rho$. For this, let $\{\mu_m\}_{m \in \mathbb{N}}$ be a Cauchy sequence with respect to $\rho$, which is a Cauchy sequence with respect to $D$ as well. Since $D$ makes $M(X)$ a complete space, there exists $\mu \in M(X)$ towards which $\{\mu_m\}_{m \in \mathbb{N}}$ converges. Now, comparing $\mu_m[\boldsymbol{a}]$ with $\mu_{m_0}[\boldsymbol{a}]$ for each $n \in \mathbb{N}$, $\boldsymbol{a} \in A^n$ and $m, m_0 \in \mathbb{N}$, it follows that $\mu$ is the limit of $\{\mu_m\}_{m \in \mathbb{N}}$ in the projective distance.
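As mentioned above, $M(X)$ is separable in the vague topology while it is non-separable with respect to the topology induced by $\bar{d}$. In this respect, regarding the projective distance we have the following.

Theorem 2. The space $M^+(X)$ is non-separable with respect to the projective distance $\rho$.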
Proof. We will exhibit a collection $\{\mu_{\boldsymbol{x}} \in M^+(X) : \boldsymbol{x} \in \{0,1\}^{\mathbb{N}}\}$ such that $\rho(\mu_{\boldsymbol{x}}, \mu_{\boldsymbol{y}}) > 1/2$ whenever $\boldsymbol{x} \neq \boldsymbol{y}$.
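Fix $\boldsymbol{x} \in \{0,1\}^{\mathbb{N}}$, and for each $n \in \mathbb{N}$ and $\boldsymbol{a} \in \{0,1\}^n$ let $q(\boldsymbol{a}) = \max\{1 \le k \le n : \boldsymbol{a}_1^k = \boldsymbol{x}_1^k\} + 1$. Now, fix $\alpha > 1$ and let $\nu_{\boldsymbol{x}} \in M^+(\{0,1\}^{\mathbb{N}})$ be the measure defined in Equation (5) through its marginals, in terms of $q(\boldsymbol{a})$ and $\alpha$. A direct computation proves that the marginals are well normalized. Now, if $\boldsymbol{a} \in \{0,1\}^n$ is such that $q(\boldsymbol{a}) < n$, then $q(\boldsymbol{a}b) = q(\boldsymbol{a})$ for all $b \in \{0,1\}$, which is used to sum $\nu_{\boldsymbol{x}}[\boldsymbol{a}b]$ over $b \in \{0,1\}$. We have proven that the marginals are well normalized and compatible, which ensures that $\nu_{\boldsymbol{x}}$ is well defined.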
Now, consider any surjective map $\pi : A \to \{0,1\}$ and for each $n \in \mathbb{N}$ extend it coordinatewise to $A^n$. We will denote all those coordinatewise extensions by the same letter $\pi$. For each $\boldsymbol{x} \in \{0,1\}^{\mathbb{N}}$ the measure $\mu_{\boldsymbol{x}} \in M^+(X)$ is given, in Equation (6), in terms of $\nu_{\boldsymbol{x}}[\pi(\boldsymbol{a})]$.
This measure is well defined since, for each $n \in \mathbb{N}$, its marginals on $A^n$ are well normalized. Now, for $\boldsymbol{x} \neq \boldsymbol{y}$ we compare $\mu_{\boldsymbol{x}}[\boldsymbol{a}]$ and $\mu_{\boldsymbol{y}}[\boldsymbol{a}]$ through $\nu_{\boldsymbol{x}}[\pi(\boldsymbol{a})]$ and $\nu_{\boldsymbol{y}}[\pi(\boldsymbol{a})]$. In this way we obtain the desired uncountable collection $\{\mu_{\boldsymbol{x}} \in M^+(X) : \boldsymbol{x} \in \{0,1\}^{\mathbb{N}}\}$ such that $\rho(\mu_{\boldsymbol{x}}, \mu_{\boldsymbol{y}}) \ge 1/2$ whenever $\boldsymbol{x} \neq \boldsymbol{y}$.

According to Equation (4), the vague topology is coarser than the projective topology (the one induced by $\rho$). It is well known, and easy to argue, that the $\bar{d}$-topology is finer than the vague topology, and it remains to place the projective topology with respect to the $\bar{d}$-topology. Below we will prove that $\rho$ is not coarser than $\bar{d}$. With this, and a construction based on $g$-measures which we will present in Section 3, we will be able to complete the proof that $\rho$ and $\bar{d}$ are not comparable.
Theorem 3. There exists a sequence $\{\mu_p \in M^+(X)\}_{p \in \mathbb{N}}$ converging in $\bar{d}$-distance, but not in the projective distance.
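Proof. Let $\mu_{\boldsymbol{x}} \in M^+(X)$ be as in the proof of Theorem 2. We will exhibit a sequence $\{\boldsymbol{x}^p\}_{p \in \mathbb{N}}$ in $\{0,1\}^{\mathbb{N}}$ such that $\mu_{\boldsymbol{x}^p}$ converges to $\mu_{\boldsymbol{x}}$ in $\bar{d}$-distance. Consider the measures $\mu_{\boldsymbol{x}^p}$ and $\mu_{\boldsymbol{x}}$ as defined in Equation (6). Let us recall that for each $\boldsymbol{y} \in \{0,1\}^{\mathbb{N}}$, the measure $\mu_{\boldsymbol{y}} \in M(X)$ is induced by a corresponding measure $\nu_{\boldsymbol{y}} \in M(\{0,1\}^{\mathbb{N}})$, defined in Equation (5), via a projection $\pi : A \to \{0,1\}$. Let $\tau : A \to A$ be a permutation satisfying $\tau(a) \in \pi^{-1}(1 - \pi(a))$ for each $a \in A$, and with this, for each $n \in \mathbb{N}$ define a permutation $\tau_p : A^n \to A^n$ built from $\tau$. We will denote all those permutations by the same symbol $\tau_p$. With this we define a measure $\lambda_p$ on $(A \times A)^{\mathbb{N}}$ by means of $\tau_p$, whose marginals are $\mu_{\boldsymbol{x}^p}$ and $\mu_{\boldsymbol{x}}$, from which it follows that $\lambda_p \in J(\mu_{\boldsymbol{x}^p}, \mu_{\boldsymbol{x}})$ is a coupling. By using this coupling to bound $\bar{d}(\mu_{\boldsymbol{x}^p}, \mu_{\boldsymbol{x}})$, we prove that $\mu_{\boldsymbol{x}} = \lim_{p \to \infty} \mu_{\boldsymbol{x}^p}$ in $\bar{d}$-distance.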
Theorem 2 ensures that $\rho(\mu_{\boldsymbol{x}^p}, \mu_{\boldsymbol{x}^{p'}}) > 1/2$ for all $p \neq p'$. The theorem follows by taking $\mu_p := \mu_{\boldsymbol{x}^p}$.

g-measures
3.1. Let us start with a brief reminder of $g$-measures. A $g$-function is any Borel measurable function $g : X \to (0,1)$ satisfying $\sum_{x_1 \in A} g(\boldsymbol{x}) = 1$, and a compatible $g$-measure is any $\mu \in M^+_T(X) := M^+(X) \cap M_T(X)$ satisfying

(7) $\lim_{n \to \infty} \mu(x_1 = a_1 \mid \boldsymbol{x}_2^n = \boldsymbol{a}_2^n) := \lim_{n \to \infty} \frac{\mu[a_1 \boldsymbol{a}_2^n]}{\mu[\boldsymbol{a}_2^n]} = g(\boldsymbol{a})$,

for all $\boldsymbol{a} \in X$. This notion is intended to generalize that of Markov chain and was introduced into ergodic theory by M. Keane in [14]. Its ancestor is the so-called chains with complete connections, studied in probability theory as early as 1935 [25]. This notion is related, and under some conditions equivalent, to the notion of equilibrium state [30,15]. One of the main problems concerning $g$-measures is whether a given $g$-function admits a unique compatible $g$-measure. Existence of compatible $g$-measures requires only the continuity of $g$, while stronger continuity conditions are needed to ensure uniqueness. For instance, Hölder continuity of the $g$-function implies the existence and uniqueness of a compatible $g$-measure for which strong mixing holds. Several criteria have been established to ensure uniqueness, all of them relying on the regularity of the $g$-function. As mentioned in Section 1, several works have considered the $\bar{d}$-continuity of $g$-measures under strong regularity conditions for the limit $g$-function, and have proved in this way that the limit $g$-measure has good ergodic properties (the Bernoullicity of the natural extension [7] or the fast decay of correlations [3]). On the other hand, several examples have been proposed to show that the continuity of the $g$-function is not enough to ensure the uniqueness of the corresponding $g$-measure.
Among those examples we find the already classical Bramson-Kalikow construction [2]. Recently P. Hulse [13] published a construction, inspired by the Ising model with long range interactions, of a $g$-function for which uniqueness fails. For this example, the set of compatible $g$-measures necessarily contains non-ergodic measures.

3.2. Let us start by recalling the notions of variation of a function and of Markov approximation to a measure.
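For $\phi : X \to \mathbb{R}$ and each $\ell \in \mathbb{N}$, the $\ell$-variation of $\phi$ is given by

(8) $\operatorname{var}_\ell \phi := \sup\{ |\phi(\boldsymbol{x}) - \phi(\boldsymbol{y})| : \boldsymbol{x}_1^\ell = \boldsymbol{y}_1^\ell \}$.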
For $\phi$ continuous we necessarily have $\lim_{\ell \to \infty} \operatorname{var}_\ell \phi = 0$. In this case, the speed of convergence of the variation characterizes the regularity of $\phi$. For instance, Hölder continuity corresponds to exponential decay of the variation.
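The decay of the variation can be checked by brute force for functions depending on finitely many coordinates. The following Python sketch assumes the usual definition, $\operatorname{var}_\ell \phi = \sup\{|\phi(\boldsymbol{x}) - \phi(\boldsymbol{y})| : \boldsymbol{x}_1^\ell = \boldsymbol{y}_1^\ell\}$, and a hypothetical test function $\phi(\boldsymbol{x}) = \sum_{k=1}^{10} x_k 2^{-k}$ on the alphabet $\{0,1\}$, for which the variation decreases exponentially; the names `variation`, `phi` and `depth` are illustrative only.

```python
import itertools

def variation(phi, alphabet, ell, depth):
    """ell-variation of a function phi depending only on the first `depth` symbols.

    For each prefix of length min(ell, depth), the oscillation of phi over all
    completions of that prefix is computed; the result is the largest oscillation.
    """
    prefix_len = min(ell, depth)
    worst = 0.0
    for prefix in itertools.product(alphabet, repeat=prefix_len):
        values = [phi(prefix + tail)
                  for tail in itertools.product(alphabet, repeat=depth - prefix_len)]
        worst = max(worst, max(values) - min(values))
    return worst

def phi(x):
    """Hypothetical test function: binary expansion truncated to len(x) digits."""
    return sum(x[k] / 2 ** (k + 1) for k in range(len(x)))

depth = 10
variations = [variation(phi, (0, 1), ell, depth) for ell in range(1, depth + 1)]
```

For this test function the computed values are $2^{-\ell} - 2^{-10}$, so they vanish from $\ell = 10$ on, as expected for a function that is constant on cylinders of length 10.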
Given $\mu \in M(X)$, for each $\ell \in \mathbb{N}$, the canonical $\ell$-step Markov approximation to $\mu$ is the only measure $\mu_\ell \in M(X)$ satisfying the Markov condition of Equation (9) for all $\boldsymbol{a} \in X$ and $n \ge \ell$.
It is well known and easily proved that µ ℓ → µ as ℓ → ∞ in the vague topology. In this respect, concerning the g-measures, we have the following theorem.
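The following Python sketch illustrates the Markov approximation on cylinder probabilities. It assumes that $\mu_\ell$ coincides with $\mu$ on words of length at most $\ell + 1$ and that, for longer words, $\mu_\ell[\boldsymbol{a}_1^n] = \prod_{j=1}^{n-\ell-1} \frac{\mu[\boldsymbol{a}_j^{j+\ell}]}{\mu[\boldsymbol{a}_{j+1}^{j+\ell}]} \times \mu[\boldsymbol{a}_{n-\ell}^n]$; this product form is our reading of the construction, not a formula quoted from the text. The two test measures (a stationary Markov chain and a mixture of two Bernoulli measures) and all names are hypothetical.

```python
import itertools
import math

def markov_approximation(mu, ell):
    """Cylinder probabilities of the assumed ell-step Markov approximation of mu."""
    def mu_ell(word):
        n = len(word)
        if n <= ell + 1:
            return mu(word)
        prob = mu(word[n - ell - 1:])  # last block, of length ell + 1
        for j in range(n - ell - 1):
            # conditional probability of word[j] given the next ell symbols
            prob *= mu(word[j:j + ell + 1]) / mu(word[j + 1:j + ell + 1])
        return prob
    return mu_ell

def markov_measure(pi, P):
    """Stationary Markov measure with stationary vector pi and transition matrix P."""
    def mu(word):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        return prob
    return mu

def bernoulli_mixture(p, q):
    """Equal-weight mixture of two Bernoulli measures (a non-ergodic measure)."""
    def mu(word):
        return 0.5 * math.prod(p[a] for a in word) + 0.5 * math.prod(q[a] for a in word)
    return mu

words = list(itertools.product((0, 1), repeat=5))

# pi = (0.8, 0.2) is stationary for P, so the 1-step approximation reproduces the chain.
chain = markov_measure([0.8, 0.2], [[0.9, 0.1], [0.4, 0.6]])
chain_1 = markov_approximation(chain, 1)
max_gap = max(abs(chain_1(w) - chain(w)) for w in words)

# For a measure with infinite memory the approximation differs, but stays normalized.
mix = bernoulli_mixture([0.9, 0.1], [0.2, 0.8])
mix_2 = markov_approximation(mix, 2)
total_mass = sum(mix_2(w) for w in words)
```

The mixture is chosen because it is stationary but not ergodic, while each of its Markov approximations is a Markov measure with positive transition probabilities.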
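Theorem 4. Let $g : X \to [0,1]$ be a continuous $g$-function and $\mu \in M(X)$ a compatible $g$-measure. For each $\ell \in \mathbb{N}$ let $\mu_\ell \in M(X)$ be the canonical $\ell$-step Markov approximation. Then $\mu_\ell \to \mu$ as $\ell \to \infty$ in the projective distance. Furthermore, $\rho(\mu_\ell, \mu) \le \operatorname{var}_\ell \log \circ g$.

Proof. First note that for all $\boldsymbol{a} \in X$ and $n \le m$, the ratio $\mu[\boldsymbol{a}_1^n] / \mu[\boldsymbol{a}_2^n]$ can be written as an average of ratios of the same kind for words of length $m$, with respect to a probability distribution $p : A^m \to (0,1)$. It follows from this, and taking the limit $m \to \infty$, that

(10) $\min_{\boldsymbol{x} \in [\boldsymbol{a}_1^\ell]} g(\boldsymbol{x}) \le \frac{\mu[\boldsymbol{a}_1^n]}{\mu[\boldsymbol{a}_2^n]} \le \max_{\boldsymbol{x} \in [\boldsymbol{a}_1^\ell]} g(\boldsymbol{x})$

for all $\boldsymbol{a} \in X$ and $\ell \le n$.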
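For $n \le \ell$ we have $\mu_\ell[\boldsymbol{a}_1^n] = \mu[\boldsymbol{a}_1^n]$ for all $\boldsymbol{a} \in X$. On the other hand, for $n > \ell$ and $\boldsymbol{a} \in X$ we write

$\mu[\boldsymbol{a}_1^n] = \prod_{j=1}^{n-\ell-1} \frac{\mu[\boldsymbol{a}_j^n]}{\mu[\boldsymbol{a}_{j+1}^n]} \times \mu[\boldsymbol{a}_{n-\ell}^n]$, $\quad \mu_\ell[\boldsymbol{a}_1^n] = \prod_{j=1}^{n-\ell-1} \frac{\mu[\boldsymbol{a}_j^{j+\ell}]}{\mu[\boldsymbol{a}_{j+1}^{j+\ell}]} \times \mu[\boldsymbol{a}_{n-\ell}^n]$.

Inequalities (10) imply $\frac{1}{n} \left| \log \frac{\mu[\boldsymbol{a}_1^n]}{\mu_\ell[\boldsymbol{a}_1^n]} \right| \le \operatorname{var}_\ell \log \circ g$ for all $\boldsymbol{a} \in X$ and $n \in \mathbb{N}$, from which it follows that $\rho(\mu_\ell, \mu) \le \operatorname{var}_\ell \log \circ g$, and the proof is done.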
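3.3. Let us describe the construction by P. Hulse cited above, which we slightly modify to fit our context. Consider the real map $t \mapsto \psi(t) = e^t (e^t + e^{-t})^{-1}$ and fix sequences of parameters which define, for each $\ell \in \mathbb{N}$, two $g$-functions $g_\ell$ and $g'_\ell$ in terms of $\psi$ and of the averages $\pi(\boldsymbol{x})_\Lambda = \Lambda^{-1} \sum_{k=1}^{\Lambda} \pi(x_k)$, $\Lambda \in \mathbb{N}$. Now, for each $\ell \in \mathbb{N}$, both $g_\ell$ and $g'_\ell$ are constant inside each cylinder of length $\Lambda_\ell$, therefore Walters' criterion (logarithm with summable variations [30]) ensures the existence and uniqueness of $g$-measures $\mu_\ell$ and $\mu'_\ell$ compatible with $g_\ell$ and $g'_\ell$ respectively. Hulse's construction consists in determining sequences of parameters for which the limit $g$-function admits more than one compatible $g$-measure. From Hulse's construction and Theorem 4 the next result readily follows.

Theorem 5. There exists a sequence $\{\mu_\ell \in M^+(X)\}_{\ell \in \mathbb{N}}$ converging in the projective distance, but not in $\bar{d}$-distance.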
Proof. Let $g : X \to [0,1]$ be the $g$-function in Hulse's construction above, and let $M(g)$ be the collection of all the compatible $g$-measures. Since $M(g)$ is not a singleton, it necessarily contains non-ergodic measures, for instance any strict convex combination of two different extremal measures. Let $\mu$ be such a non-ergodic measure. Now, for each $\ell \in \mathbb{N}$, let $\mu_\ell$ be the $\ell$-step Markov approximation to $\mu$, as defined in Equation (9). According to Theorem 4, the sequence $\{\mu_\ell\}_{\ell \in \mathbb{N}}$ converges to $\mu$ in the projective distance. It is known that $\bar{d}$-limits of mixing measures are mixing (see Theorem I.9.17 in [29] for instance). Since $\mu$ is fully-supported, $\mu_\ell$ is a mixing measure for each $\ell \in \mathbb{N}$, but since $\mu$ is not even ergodic, $\{\mu_\ell\}_{\ell \in \mathbb{N}}$ cannot converge to $\mu$ in $\bar{d}$-distance.
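3.4. It is known that the entropy is a $\bar{d}$-continuous functional in the class of ergodic processes (Theorem I.9.16 in [29]), while it is only upper semicontinuous with respect to the vague topology (Theorem I.9.1 in [29]). Concerning the projective distance, we have the following result.

Theorem 6. Assume $g$ admits a unique $g$-measure $\mu$ (in which case this measure is ergodic), and suppose that $\{\mu_p\}_{p \in \mathbb{N}}$ is a sequence of ergodic measures converging to $\mu$ in the projective distance. Then the entropy of $\mu_p$ converges to the entropy of $\mu$ as $p \to \infty$.

Proof. First we prove that the relative entropy of $\mu_p$ with respect to $\mu$ is bounded above by $\rho(\mu_p, \mu)$. Indeed, for each $n \in \mathbb{N}$, $\frac{1}{n} \sum_{\boldsymbol{a} \in A^n} \mu_p[\boldsymbol{a}] \log \frac{\mu_p[\boldsymbol{a}]}{\mu[\boldsymbol{a}]} \le \frac{1}{n} \sum_{\boldsymbol{a} \in A^n} \mu_p[\boldsymbol{a}] \, n \rho(\mu_p, \mu) = \rho(\mu_p, \mu)$, and the claim follows. Now, following the arguments in [4, Section 3.2], we readily deduce an estimate relating the entropy of $\mu_p$ to this relative entropy. Since the topology of the projective distance is finer than the vague topology, $\mu_p$ also converges vaguely to $\mu$, and the upper semicontinuity of the entropy applies. Finally, we use the Variational Principle for $g$-measures (see [16] for a proof). From all the above arguments the theorem follows, and the proof is done.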
3.5. In this paragraph we explore the relationship between the convergence of $g$-functions and the possible convergence, in projective distance, of the corresponding $g$-measures. An analogous result, concerning the $\bar{d}$-distance, was obtained by Coelho and Quas in [7]. Before stating our result, let us fix some notation.
Let $G \subset C^0(X)$ denote the set of $g$-functions, i.e. the set of continuous functions $g : X \to (0,1)$ satisfying $\sum_{a \in A} g(a\boldsymbol{x}) = 1$ for all $\boldsymbol{x} \in X$. Now, for $g \in G$ denote by $M(g) \subset M(X)$ the simplex made of all probability measures compatible with $g$ (or $g$-measures) as defined in Equation (7).
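The normalization condition defining $G$ can be verified numerically for a locally constant example. The Python sketch below uses the alphabet $\{-1, +1\}$ and a hypothetical $g$-function $g(\boldsymbol{x}) = \psi(\beta x_1 m(\boldsymbol{x}))$, where $\psi(t) = e^t (e^t + e^{-t})^{-1}$ and $m(\boldsymbol{x})$ is the average of $x_2, \ldots, x_{\Lambda+1}$. It is only loosely modeled on the long-range constructions discussed above; the parameters `beta` and `window` (playing the role of $\Lambda$) and the function names are assumptions made for illustration.

```python
import itertools
import math

def psi(t):
    """psi(t) = e^t / (e^t + e^-t); note that psi(t) + psi(-t) = 1."""
    return math.exp(t) / (math.exp(t) + math.exp(-t))

def make_g(beta, window):
    """Hypothetical g-function depending on x_1 and on the next `window` symbols."""
    def g(x):
        tail = x[1:window + 1]
        return psi(beta * x[0] * sum(tail) / window)
    return g

def normalization_defect(g, alphabet, window):
    """Largest deviation from 1 of the sum over a of g(a x), over all tails x."""
    worst = 0.0
    for tail in itertools.product(alphabet, repeat=window):
        total = sum(g((a,) + tail) for a in alphabet)
        worst = max(worst, abs(total - 1.0))
    return worst

alphabet = (-1, 1)
g = make_g(beta=0.7, window=4)
defect = normalization_defect(g, alphabet, window=4)
values = [g(word) for word in itertools.product(alphabet, repeat=5)]
```

Since $\psi(t) + \psi(-t) = 1$, flipping the first symbol exchanges the two terms of the sum, which is why the normalization holds for every choice of `beta` and `window`.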
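For $\phi : X \to \mathbb{R}$ and $\ell \in \mathbb{N}$, let us denote $\operatorname{svar}_\ell \phi = \sum_{k=1}^{\ell} \operatorname{var}_k \phi$, where $\operatorname{var}_k \phi$ is defined as in Equation (8). We will say that a locally constant function $\phi : X \to \mathbb{R}$ has range $\ell \in \mathbb{N}$ whenever $\boldsymbol{x}_1^\ell = \boldsymbol{y}_1^\ell$ implies $\phi(\boldsymbol{x}) = \phi(\boldsymbol{y})$. Clearly, for a locally constant function of range $\ell$, $\operatorname{var}_n \phi = 0$ for all $n \ge \ell$. It is not hard to prove that if $g \in G$ is locally constant of range $\ell + 1$, then $M(g)$ contains a unique $\ell$-step Markov measure (see Section A.1 for details). We have the following.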
Theorem 7. Let $\{g_\ell \in G\}_{\ell \in \mathbb{N}}$ be a sequence of locally constant functions converging to $g$ in the sup-norm, and such that for each $\ell \in \mathbb{N}$ the function $g_\ell$ is locally constant of range $\ell + 1$. If $\lim_{\ell \to \infty} \| \log(g/g_\ell) \| \, e^{\operatorname{svar}_\ell \log \circ g_\ell} = 0$, then the sequence $\{\mu_\ell\}_{\ell \in \mathbb{N}}$, where $\mu_\ell$ is the unique measure in $M(g_\ell)$, converges in projective distance. Furthermore, the limit measure $\mu \in M(X)$ is the unique measure in $M(g)$.

Concluding Remarks.
With Theorems 3 and 5 we have established the incomparability of the $\bar{d}$-topology and the projective topology in the set of fully-supported probability measures. It is nevertheless not clear whether this incomparability persists in the restriction to the class of invariant probability measures. It is not hard to verify that the projective distance between two Markov measures can be computed by means of a finite algorithm taking the parameters defining the measures as inputs. One can also argue that the output value varies continuously, or at worst piecewise continuously, with the input parameters. This does not seem to be the case for the $\bar{d}$-distance, which suggests that in the class of Markov measures the projective topology is coarser than the $\bar{d}$-topology.
Theorem 7 establishes a new criterion for uniqueness of $g$-measures based on the speed of convergence of locally constant approximations to the $g$-function. It can be related to a similar criterion ensuring convergence in $\bar{d}$-distance established by Coelho and Quas in [7]. Although in our case we cannot deduce that the limit measure satisfies the Bernoulli property, we can nevertheless ensure that the limit measure inherits the mixing property of the Markov approximations and, thanks to Theorem 6, that the entropy is continuous with respect to the projective distance at the limit measure.
Example 1 is the $g$-measure analog of the one-dimensional Ising model with long range interaction, for which a phase transition has been proved to occur (see [9,11] for details). The analogy suggests that the uniqueness of the associated $g$-measure must break at high values of the parameter $\beta$. This transition should be detectable through a criterion involving the regularity of the $g$-function and the speed of convergence of the Markov approximations.
The projective distance appears to be suited for the study of measures obtained by random substitutions as the one we have characterized in [27]. We can prove that for a certain class of random substitutions, the substitution process is a contraction in the projective distance, and that the unique attractor has the mixing property. The study of this kind of processes and its characterization in terms of the projective distance is the subject of a forthcoming work.
Appendix A.
A.1. An $n \times n$ real matrix $M$ is said to be primitive if $M \ge 0$ (i.e. none of its entries is negative) and for some $k \in \mathbb{N}$, $M^k > 0$ (i.e. all the entries of $M^k$ are positive). The primitivity index of a primitive matrix $M$ is the smallest integer $\ell$ such that $M^\ell > 0$. The Perron-Frobenius Theorem ensures that the spectral radius (i.e. the maximal norm of its eigenvalues) of a primitive matrix $M$ is achieved by a simple positive eigenvalue $\lambda$ with positive right and left eigenvectors $\boldsymbol{v}$ and $\boldsymbol{w}$ respectively.
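The definition of primitivity index translates directly into a finite search. The Python sketch below is a minimal illustration: it multiplies powers of $M$ until all entries are positive, and stops at Wielandt's classical bound $(n-1)^2 + 1$ on the primitivity index of an $n \times n$ primitive matrix, beyond which a nonnegative matrix cannot be primitive. The function names and the example matrices are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_positive(m):
    """True when every entry of the matrix is strictly positive."""
    return all(entry > 0 for row in m for entry in row)

def primitivity_index(m):
    """Smallest k with M^k > 0, or None if the nonnegative matrix M is not primitive."""
    n = len(m)
    if any(entry < 0 for row in m for entry in row):
        return None
    bound = (n - 1) ** 2 + 1  # Wielandt's bound on the primitivity index
    power = m
    for k in range(1, bound + 1):
        if is_positive(power):
            return k
        power = mat_mul(power, m)
    return None

golden = [[0, 1], [1, 1]]                    # adjacency matrix of the golden mean shift
wielandt = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]  # attains the bound for n = 3
swap = [[0, 1], [1, 0]]                       # irreducible but periodic, hence not primitive

indices = (primitivity_index(golden), primitivity_index(wielandt), primitivity_index(swap))
```

For the transition matrix of an $\ell$-step Markov measure with positive conditional probabilities, a search of this kind terminates quickly, since some small power of the matrix is already positive.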