Asymptotic Entropy of Random Walks on Regular Languages over a Finite Alphabet

We prove existence of the asymptotic entropy of random walks on regular languages over a finite alphabet and give formulas for it. Furthermore, we show that the entropy varies real-analytically as a function of the probability measures of constant support which describe the random walk. This setting applies, in particular, to random walks on virtually free groups.


Introduction
Let A be a finite alphabet and let A* be the set of all finite words over the alphabet A, where o denotes the empty word. Consider a transient Markov chain (X_n)_{n∈N_0} on A* with X_0 = o such that the transition probabilities depend only on the last K ∈ N letters of the current word, between two consecutive steps the word length changes by at most K, and in each step only the last K letters of the current word may be modified. Denote by π_n the distribution of X_n. We are interested in whether the sequence (1/n) E[-log π_n(X_n)] converges, and if so, in describing the limit. If it exists, it is called the asymptotic entropy, a notion introduced by Avez [1]. The aim of this paper is to prove existence of the asymptotic entropy, to describe it as the rate of escape w.r.t. the Greenian distance, and to prove its real-analytic behaviour when varying the transition probabilities of constant support.
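Written out as a display (matching the prose definition above), the quantity in question is:

```latex
h \;=\; \lim_{n\to\infty} \frac{1}{n}\,\mathbb{E}\bigl[-\log \pi_n(X_n)\bigr].
```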
We outline some background on this topic. It is well known by Kingman's subadditive ergodic theorem (see Kingman [11]) that the entropy exists for random walks on groups whenever E[-log π_1(X_1)] < ∞. In contrast, existence of the entropy on general structures is not known a priori. In our setting we are not able to apply the subadditive ergodic theorem, since we have neither subadditivity nor a global composition law for words if we restrict the random walk to a proper subset of A*. This forces us to use other techniques, such as generating function techniques. These generating functions are power series with probabilities as coefficients, which describe the characteristic behaviour of the underlying random walks. The technique of our existence proof was motivated by Benjamini and Peres [2], where it is shown that for random walks on groups the entropy equals the rate of escape w.r.t. the Greenian distance; compare also with Blachère, Haïssinsky and Mathieu [3]. In particular, we will also show that the asymptotic entropy h is the rate of escape w.r.t. a distance function in terms of Green functions, which in turn yields that h is also the rate of escape w.r.t. the Greenian distance. Moreover, we prove convergence in probability and convergence in L^1 of the sequence -(1/n) log π_n(X_n) to h, and we show that h can be computed along almost every sample path as the limit inferior of the aforementioned sequence. The question of almost sure convergence of -(1/n) log π_n(X_n) to some constant h, however, remains open. Similar results concerning existence and formulas for the entropy are proved in Gilch and Müller [8] for random walks on directed covers of graphs and in Gilch [7] for random walks on free products of graphs. Furthermore, we give formulas for the entropy which allow numerical computations and also exact calculations in some special cases.
Kaimanovich and Erschler asked whether drift and entropy of random walks vary continuously (or even analytically) when varying the probabilities of the random walk while keeping the support of the single-step transitions constant. In this article we also show that h is real-analytic in terms of the parameters describing the random walk on A*. This fact applies, in particular, to the case of bounded range random walks on virtually free groups. At this point let us mention that several papers concerning continuity and analyticity of drift and entropy have been published recently: e.g., see Ledrappier [13], [14], Haïssinsky, Mathieu and Müller [9], and Gilch [7]. The recent article of Gilch and Ledrappier [5] collects several results about analyticity of drift and entropy of random walks on groups.
The reasoning of our proofs follows a similar argumentation as in [8] and [7]: we will show that the entropy equals the rate of escape w.r.t. some special length function, and we deduce the proposed properties analogously. The plan of the paper is as follows: in Sections 2 and 3 we define the random walk on the regular language and the associated generating functions. Section 4 explains the structure of cones in the present context. In Sections 5 and 6 we prove existence of the asymptotic entropy, while in Section 7 we give explicit formulas for it. Section 8 shows real-analyticity of the entropy.

Notation
Let A be a finite alphabet and denote by o the empty word. A random walk on a regular language is a Markov chain on a subset L ⊆ A* := ⋃_{n≥1} A^n ∪ {o} of all finite words over the alphabet A, whose transition probabilities obey the following rules: (i) Only the last two letters of the current word may be modified.
(ii) Only one letter may be adjoined or deleted at one instant of time. (iii) Adjunction and deletion may only be done at the end of the current word. (iv) Probabilities of modification, adjunction or deletion depend only on the last two letters of the current word.
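A one-step transition obeying rules (i)-(iv) can be sketched as follows; the alphabet, the kernel, and all probabilities are invented for this illustration and are not taken from the paper:

```python
import random

# Illustrative one-step kernel for a random walk on words over A = {"a", "b"}
# with K = 2, obeying rules (i)-(iv): only the last two letters are read, and
# per step one letter is adjoined or deleted at the end, or the suffix is
# modified. All probabilities below are invented for this sketch.
KERNEL = {
    "aa": [("aab", 0.5), ("ab", 0.3), ("a", 0.2)],  # adjoin / modify / delete
    "ab": [("aba", 0.4), ("ba", 0.3), ("a", 0.3)],
    "ba": [("bab", 0.5), ("aa", 0.2), ("b", 0.3)],
    "bb": [("bba", 0.6), ("ab", 0.2), ("b", 0.2)],
}

def step(w, rng):
    """One transition: replace the last two letters of w by a suffix of
    length 1, 2 or 3 drawn from KERNEL; the prefix w[:-2] is untouched."""
    assert len(w) >= 2
    suffixes, weights = zip(*KERNEL[w[-2:]])
    return w[:-2] + rng.choices(suffixes, weights=weights)[0]

rng = random.Random(1)
w = "ab"
for _ in range(50):                 # simulate 50 steps starting from "ab"
    w = step(w, rng) if len(w) >= 2 else w + rng.choice("ab")
```

Each step leaves the prefix w[:-2] intact and changes the word length by at most one, exactly as rules (i)-(iv) demand.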
Compare with Lalley [12] and Gilch [6]. The assumption that transition probabilities depend only on the last two letters of the current word may be weakened to dependence on the last K ≥ 2 letters by blocking words of length at most K into new letters (compare with [12, Section 3.3]). In general, a regular language is a subset of A* whose words are accepted by a finite-state automaton. It is necessary that each modification of a word of the regular language in one single step creates a new word of the regular language. The results below, however, are general enough that w.l.o.g. - for ease and better readability - we may assume that the regular language L consists of the whole set A*. We will use the notation w ∈ L, w ∈ A* respectively, to emphasize at some points that we explicitly mean a word of the language or just a word over the alphabet. Let us note that random walks on virtually free groups constitute a special case of our setting, and our results directly apply. We introduce some notation. For a word w ∈ L and k ∈ N, w[k] denotes the k-th letter of w, and [w] denotes the last two letters of w. The random walk on L is described by the sequence of random variables (X_n)_{n∈N_0}. Initially, we have X_0 := o. If we want to start the random walk at w ∈ L instead of o, we write for short P_w[ · ] := P[ · | X_0 = w]. For two words w_1, w_2 ∈ A*, we write w_1w_2 for the concatenated word. We write p(·, ·) for the single-step transition probabilities and assume that p(w_1, w_2) > 0 if and only if p(w_2, w_1) > 0; we call this property weak symmetry. For w_1, w_2 ∈ L, the n-step transition probabilities are denoted by p^(n)(w_1, w_2). The natural word length of any w ∈ L is denoted by |w|.
Malyshev [15] proved that the rate of escape w.r.t. the natural word length exists under some natural assumptions, that is, there is a non-negative constant ℓ such that lim_{n→∞} |X_n|/n = ℓ almost surely.
Here, ℓ is called the rate of escape. Furthermore, it follows from [15] that ℓ is strictly positive if and only if (X_n)_{n∈N_0} is transient. In [6], explicit formulas for the rate of escape w.r.t. more general length functions are given.
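The law |X_n|/n → ℓ can be checked by simulation in the simplest special case: words over a one-letter alphabet, where |X_n| is a biased walk on N_0. The parameters below are assumptions for illustration; for this chain ℓ = p - (1 - p):

```python
import random

# Rate-of-escape sanity check: words over the one-letter alphabet {a}, so
# |X_n| performs a biased walk on N_0 (up with probability p, down with 1-p,
# forced up at the empty word). p = 0.7 is an invented example; l = 0.4.
p = 0.7
rng = random.Random(42)
n = 200_000
length = 0
for _ in range(n):
    if length == 0 or rng.random() < p:
        length += 1        # adjoin the letter "a"
    else:
        length -= 1        # delete the last letter
estimate = length / n      # approximates lim |X_n| / n
```

Since p > 1/2 the chain is transient, so the estimate is strictly positive, in line with the equivalence quoted from [15].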
Another characteristic number of random walks is the asymptotic entropy. Denote by π_n the distribution of X_n. If there is a non-negative constant h such that the limit h = lim_{n→∞} (1/n) E[-log π_n(X_n)] exists, then h is called the asymptotic entropy. Since we only have a partial composition law for concatenation of two words (if L ⊂ A*) and since we have neither subadditivity nor transitivity of the random walk, we cannot apply - as in the case of random walks on groups - Kingman's subadditive ergodic theorem to show existence of h. It is easy to see that the entropy equals zero if the random walk is recurrent (see Corollary 7.4). Therefore, we assume from now on transience of (X_n)_{n∈N_0}.
Moreover, we assume that the random walk on L is suffix-irreducible, that is, for all w ∈ L with P[X_m = w] > 0 for some m ∈ N and for all ab ∈ A^2 there is some n ∈ N such that P[∃w_1 ∈ A*: X_n = ww_1ab, ∀k < n: |X_k| ≥ |w| | X_0 = w] > 0.
This assumption excludes degenerate cases and guarantees existence of ℓ; compare with [6, end of Section 2.1]. At this point let us mention that lim_{n→∞} -(1/n) log π_n(X_n) is not necessarily deterministic: take two homogeneous trees of different degrees d_1, d_2 ≥ 3 equipped with simple random walk, and identify their roots with one single root which becomes o; then the limit depends on which of the two trees the random walk escapes to infinity in.

Generating Functions
For w_1, w_2 ∈ L and z ∈ C, the Green function is defined as G(w_1, w_2|z) := Σ_{n≥0} p^(n)(w_1, w_2) z^n, the last visit generating function as L(w_1, w_2|z) := Σ_{n≥0} P[X_n = w_2, ∀m ∈ {1, ..., n}: X_m ≠ w_1 | X_0 = w_1] z^n, and the first return generating function as U(w, w|z) := Σ_{n≥1} P[X_n = w, ∀m ∈ {1, ..., n-1}: X_m ≠ w | X_0 = w] z^n. By conditioning on the last visit to w_1, one obtains the important relation G(w_1, w_2|z) = G(w_1, w_1|z) · L(w_1, w_2|z). Denote by R_w the radius of convergence of G(w, w|z). Indeed, since G(w, w|z) = (1 - U(w, w|z))^{-1}, it must be that U(w, w|z) < 1 for all 0 < z < R_w; moreover, U(w, w|0) = 0 and U(w, w|z) is continuous, strictly increasing and strictly convex for 0 < z < R_w, so we must have U(w, w|1) ≤ 1/R_w, which yields (3.1).
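The identity G(w,w|1) = (1 - U(w,w|1))^{-1} can be checked numerically on a toy chain. A sketch in the simplest possible setting (word length over a one-letter alphabet, i.e. a biased walk on N_0; all parameters are invented, not the paper's general setting):

```python
# Numerical check of G(w,w|1) = (1 - U(w,w|1))^{-1} on a toy transient chain:
# word length over a one-letter alphabet, up with p = 0.7, down with q = 0.3,
# forced up at 0. Exact values for this chain: U = q/p = 3/7, G = 7/4.
p, q = 0.7, 0.3
M, N = 200, 2000                          # state truncation, steps summed

def step_dist(dist):
    """One step of the chain applied to a (sub-)probability vector."""
    new = [0.0] * (M + 1)
    new[1] += dist[0]                     # from 0 the chain must move up
    for s in range(1, M):
        new[s + 1] += dist[s] * p
        new[s - 1] += dist[s] * q
    return new                            # negligible mass beyond M is dropped

# G(0,0|1) = sum over n of the n-step return probabilities p^(n)(0,0)
dist = [0.0] * (M + 1)
dist[0] = 1.0
G = 0.0
for _ in range(N):
    G += dist[0]
    dist = step_dist(dist)

# U(0,0|1): make 0 absorbing after the (forced) first step 0 -> 1
dist = [0.0] * (M + 1)
dist[1] = 1.0
U = 0.0
for _ in range(N):
    U += dist[0]                          # newly returned mass
    dist[0] = 0.0
    dist = step_dist(dist)
```

Both truncations (state M, time N) only discard exponentially small mass for a transient chain, so G and U agree with their exact values to high accuracy.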
In the following we introduce further generating functions, which have also been used in [6]. Define, for a, b, c, d, e ∈ A and real z > 0, the functions H(·, ·|z), Ḡ(ab, cd|z) and L̃(ab, cde|z). We write L̃(ab, cde) := L̃(ab, cde|1). These generating functions can be computed in two steps: first, one solves the system of equations (3.2); compare with [12] and [6]. The system (3.2) consists of equations of quadratic order, and therefore the functions H(·, ·|z) are algebraic if the transition probabilities are algebraic. One then obtains the functions Ḡ(ab, cd|z) by solving a linear system of equations, and finally the functions L̃(ab, cde|z). We remark that we implicitly took into account the assumption L = A*; if L ⊂ A*, one has to restrict these definitions and systems of equations to the terms which may occur. Moreover, one can compute the Green functions of the form G(o, abc|z) by solving a further linear system, where w_1, w_2 ∈ L with |w_1|, |w_2| ≤ 3 and 𝟙_3(w_1) = 1 if |w_1| = 3, and 𝟙_3(w_1) = 0 otherwise.
We also define, for ab ∈ A^2, the quantity ξ(ab): this is the probability of starting at a word wab ∈ L, where w ∈ A*, such that the first step goes to a word of length |wab| + 1 with no further future visits to words of length |wab| or smaller. We define a "length function" on L via the generating functions L(o, ·). For n ≥ 5, the terms L(o, x_1 ... x_n) can be rewritten as a product: each path from o to x_1 ... x_n is decomposed according to the last times at which the sets A^3, A^4, ..., A^{n-1} are visited.
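Under the assumption that this decomposition has the standard product form used in [6] (a sketch, consistent with the generating functions L and L̃ above), it reads:

```latex
L(o, x_1 \dots x_n \mid z) \;=\; L(o, x_1x_2x_3 \mid z)\cdot
\prod_{j=4}^{n} \tilde L\bigl(x_{j-2}x_{j-1},\, x_{j-2}x_{j-1}x_j \,\big|\, z\bigr),
```

where the j-th factor accounts for the section of the path after the last visit to a word of length j - 1.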

Cones
In this section we introduce the structure of cones in our setting. A path in L is a sequence of words [w_0, w_1, ..., w_m] in L such that P_{w_{i-1}}[X_1 = w_i] > 0 for all 1 ≤ i ≤ m. For n ∈ N, define L_{≥n} := {w ∈ L | |w| ≥ n}. For any w ∈ L with |w| ≥ 2, we define the cone rooted at w as the set of all words in L that can be reached from w by a path in L_{≥|w|}. By the weak symmetry assumption made above, for w_1, w_2 ∈ L with |w_1| = |w_2|, we have C(w_1) = C(w_2) whenever there is a positive probability path from w_1 to w_2 in L_{≥|w_1|}.
By suffix-irreducibility, for all cd ∈ A^2, each cone C(wab), where w ∈ A* and ab ∈ A^2, has a subcone C(wxcd) ⊆ C(wab) with a suitable choice of x ∈ A* \ {o}. We say that two cones C(w_1 ... w_m) and C(y_1 ... y_n) are isomorphic if C(w_{m-1}w_m) = C(y_{n-1}y_n); that is, two isomorphic cones differ only by their prefixes. In particular, there is a natural 1-to-1 correspondence between paths inside C(w_1 ... w_m) and paths inside C(y_1 ... y_n), where obviously each pair of corresponding paths has the same probability. Since the transition probabilities depend only on the last two letters of the current word, there are only finitely many different cone types up to isomorphism. We identify the different cone types by two-lettered words ab ∈ A^2, and write τ(C(w)) = ab for the cone type of C(w), where ab are the last two letters of w.
For each isomorphism class of cone types we fix some ab representing its cone type. Let J ⊆ A^2 be the set of different cone types. The boundary ∂C(w) of C(w) is given by all words w_0 ∈ C(w) with |w_0| = |w|. An important property is the following: if C(w_1) and C(w_2) are two isomorphic cones with w_0ab ∈ ∂C(w_1), then there is w̃_0 ∈ A* such that w̃_0ab ∈ ∂C(w_2).
Now we make the non-singular covering assumption: each cone C(wa_0b_0), w ∈ A*, a_0b_0 ∈ A^2, contains two proper disjoint subcones, that is, we assume that there are subcones of the form C(ww_1a_1b_1) and C(ww_2a_2b_2) with C(ww_1a_1b_1) ∩ C(ww_2a_2b_2) = ∅. We refer to the remarks at the end of this section if this property does not hold. The next task is to cover (up to a finite complement) any cone C(w) by a finite number of pairwise disjoint subcones C_1, ..., C_{r(w)} such that every cone type appears among these subcones. We now show how to construct this covering. Suppose we are given a cone C(wa_0b_0) with w ∈ A* and a_0b_0 ∈ A^2. Inside this cone we find subcones of the form C(ww_0ab) for each ab ∈ A^2 with suitable w_0 ∈ A* \ {o}. Furthermore, we can choose these subcones in such a way that they are not contained in each other: indeed, since we assume existence of a non-singular covering of C(w) by subcones, one can walk from w inside L_{≥|w|} to words ww_1a_1b_1 and ww_2a_2b_2 whose cones are disjoint. Then we have found a subcone of type τ(C(a_1b_1)), and we search for the other cone types in the subcone C(ww_2a_2b_2). Obviously, a subcone of C(ww_2a_2b_2) does not intersect C(ww_1a_1b_1). Iterating this step leads to subcones of C(w) of all different types which do not intersect each other. After we have found non-intersecting subcones of all types in C(w), we cover this cone by further subcones, not intersecting the subcones chosen above, such that the difference of C(w) and the union of subcones is finite. This is, for instance, done by taking all cones rooted at words v ∈ C(w), where v is at the same distance (that is, minimal length of a path) to ∂C(w) as the subcone of maximal distance to w, and where v is not yet contained in any of the subcones chosen above. See Figure 1.
Let us remark that, for each cone type, we fix such a covering in a way that the covering of C(w) does not depend on the choice of the specific root w on the boundary of C(w): fix a covering for C(ab), ab ∈ A^2; if w = w_0a_1b_1 ∈ L with τ(C(w)) = ab, then we fix the covering of C(w) = C(w_0ab) which inherits the covering from C(ab). This is well-defined since the covering of a cone depends only on the relative location of its subcones in its interior.
We can also cover L (up to a finite set) by a finite number of non-self-containing subcones such that each cone type appears. To this end, we just apply the algorithm explained above and take cones of the form C(w) with |w| ≥ 2. We denote by C^{(0)}_1, ..., C^{(0)}_{n_0} the subcones of this covering of L, which contains all types and whose complement is finite. Now we explain how to proceed if no cone contains two disjoint subcones. This case may, in particular, occur if L is a proper subset of A*. For ab, cd ∈ A^2, observe that cd ∈ C(ab) if and only if ab ∈ C(cd). This implies that C(w) = {v ∈ L | |v| ≥ |w|} and, in particular, that there is just one cone type. We can then cover C(w) by the subcone C(w_1) for any w_1 ∈ L with |w_1| = |w| + 1 and p(w, w_1) > 0. One can show that in this case the random walk converges almost surely to a deterministic infinite word and that the support of the random walk is a proper subset of A*. In order to see this, assume that the random walk tends with positive probability to some infinite words with prefixes wabc and wdef, where w ∈ A*, a, b, c, d, e, f ∈ A with a ≠ d. Since C(wabc) ∩ C(wdef) = ∅, it must be that the random walk enters either C(wabc) or C(wdef) on its way to infinity, due to the assumption of singular covering. That is, the letter a is deterministic, and by induction the whole infinite limiting word is deterministic. We call the random walk expanding if each cone contains two disjoint subcones. The results below do not depend on whether the random walk is expanding or not. At the end, however, we will see that the non-expanding case leads to zero entropy.

Exit Times
In this section we prove a law of large numbers, which will turn out to give the asymptotic entropy later. For this purpose, we define exit times (compare with [6]), for which we derive a law of large numbers. Throughout this section, we use the following notation: e_0 is the first instant of time at which the random walk enters one of the subcones of the covering of L and stays inside it afterwards forever. Inductively, if X_{e_k} = w and C(w) has a covering (determined only by the type of C(w)) consisting of subcones C_1, ..., C_{r(w)} as explained in Section 4, then e_{k+1} is the first instant of time at which the random walk enters one of these subcones and stays inside it afterwards forever. Observe that X_n, n ≥ e_k, has the prefix w_0 if X_{e_k} = w_0ab. Define the relative increment between two exit times as follows: if X_{e_{k-1}} = w_0a_1b_1 and X_{e_k} = w_0va_2b_2 with v ∈ A*, then W_k := a_1b_1va_2b_2. Since we have only finitely many different cone types and the subcones of coverings of any cone C are nested at uniformly bounded distance (w.r.t. minimal path lengths) to ∂C, the random variables W_k can take only finitely many different values.
Obviously, L(x, y) depends on x only through its last two letters.
Proposition 5.1. The process (W_k)_{k≥1} is a positive recurrent Markov chain. Since there are only finitely many different values for W_k, positive recurrence follows from suffix-irreducibility, which implies irreducibility of the process (W_k)_{k≥1}.
The random variables W_k, k ≥ 1, take values in a finite set, which we denote by W_0. Observe that the transition probabilities depend on x only through its last two letters.
Proof. Let y = a_1b_1w_ya_2b_2 ∈ W_0 with w_y ∈ A* (we omit the special case y = a_1a_2b_2, which follows analogously). Then there is ā_1b̄_1 ∈ A^2 with L(ā_1b̄_1, y) > 0 and ξ(a_2b_2) > 0, and by construction of our coverings the claim follows. For the sake of better identification of the cones, we now switch to a more suitable representation of cones and coverings. We identify the different cone types by numbers I := {1, ..., r} ⊂ N. If C(w) is a cone of type i, then the covering of C(w) has n_j subcones of type j. We denote these subcones by C^j_{i,1} = C^j_{i,1}(w), ..., C^j_{i,n_j} = C^j_{i,n_j}(w), or identify them just by j_{i,1}, ..., j_{i,n_j}, which correspond to the cones of type j at different locations inside C(w). We will sometimes omit the root w in the notation of the subcones when it is clear from the context and only the relative position of a subcone in some given cone is important. If τ(C(X_{e_{k-1}})) = i and X_{e_k} ∈ ∂C^j_{i,l}(X_{e_{k-1}}), then we set i_k := j_{i,l}.
At this point we recall the relation between W_k and X_{e_k}: the value of W_k determines the position of X_{e_k} relative to X_{e_{k-1}}, so there is a natural bijection between trajectories of (W_k)_{k∈N} and trajectories of (X_{e_k})_{k∈N}. In particular, the value of W_k determines the value of i_k uniquely. For a better visualization of the values i_k := j_{i,l}, see Figure 2. The pairs (i_k, W_k) take values in a finite set W: in other words, (j_{m,n}, x) ∈ W if x = a_0b_0w_0ab ∈ W_0 with τ(C(a_0b_0)) = m and C(x) is the n-th cone of type j inside C(a_0b_0). Furthermore, define W_π := {(s, t_n) | s, t ∈ I, 1 ≤ n ≤ n(s, t)}.
That is, t n corresponds to the n-th cone of type t in a covering of a cone of type s.
The process ((i_k, W_k))_{k∈N} with state space W is also a positive recurrent Markov chain, since the values of i_k are uniquely determined by the values of W_k and the process (W_k)_{k∈N} is a Markov chain. Moreover, for (i_{k,l}, w_{k-1}), (j_{m,n}, w_k) ∈ W, the transition probabilities q(·, ·) are inherited from those of (W_k)_{k∈N}. In particular, the transition matrix of ((i_k, W_k))_{k∈N} has zero entries. In order to apply [10, Theorem 1.1] later to obtain the analytic behaviour of the entropy, we have to adapt the Markov chain so as to obtain a transition matrix without zeroes.
The process (i_k)_{k∈N} is, in general, not a Markov chain; it is merely a projection of the process (W_k)_{k∈N}.
We define the following projection π for states in W: π((j_{i,l}, w)) := (i, j_l) ∈ W_π. Here, j_l represents the l-th cone of type j in a cone of type i, namely the cone represented by j_{i,l}. We now define the hidden Markov chain (Y_k)_{k∈N} by Y_k := π((i_k, W_k)). In other words, (Y_k)_{k∈N} traces the way to infinity in terms of which subcones are entered successively, without distinguishing which of the hit boundary points are the exit time points X_{e_k}.

Modified Exit Time Process
The aim of this subsection is the construction of a Markov chain related to the exit time process ((i_k, W_k))_{k∈N} such that the transition matrix has strictly positive entries and such that the modified process leads under π to the same hidden Markov chain for almost every trajectory.
Consider two subcones C^j_{i,1} ⊆ C(a_1b_1) and C^j_{k,l} ⊆ C(a_2b_2) belonging to coverings of the bigger cones, with τ(C(a_1b_1)) = i and τ(C(a_2b_2)) = k. Assume that y_0ab ∈ ∂C^j_{k,l}.
Since both cones are isomorphic, there is a unique ȳ_0 = ȳ_0^{[i,j,ab]} ∈ A* such that ȳ_0ab ∈ ∂C^j_{i,1}; see Figure 3. In the following we will use this notation ȳ_0 = ȳ_0^{[i,j,ab]}. For i, j ∈ I and ab ∈ A^2 with τ(C(ab)) = j, we write #{j_{s,t} | s ≠ i} = #{(j_{s,t}, xab) ∈ W | s ∈ I \ {i}, 1 ≤ t ≤ n(s, j)}, which is independent of the specific choice of ab. Let (i_{k,l}, x), (j_{m,n}, y_0ab) ∈ W. We define new transition probabilities q̂(·, ·) on W, obtained by splitting certain steps of q(·, ·); observe that the transitions depend on x only through its last two letters. It is easy to see that these transition probabilities define a Markov chain (inherited from the process ((i_k, W_k))_{k∈N}): each step from (i_{k,l}, x) to (j_{m,n}, y_0ab) either behaves according to q(·, ·) (case m = i and n ≥ 2), or steps from (i_{k,l}, x) to (j_{i,1}, y_0ab) (when seen as a step of the process ((i_k, W_k))_{k∈N}) are split up into different equally likely steps from (i_{k,l}, x) to (j_{m,n}, y_0ab) with m ≠ i or (m = i and n = 1); since q̂(·, ·) depends on its first argument only through i (and not through k and l), it follows from the Markov property of q(·, ·) that q̂(·, ·) also describes a Markov chain. Moreover, the corresponding transition matrix has strictly positive entries. By suffix-irreducibility and Proposition 5.1, the matrix Q̂ = (q̂((i_{k,l}, x), (j_{m,n}, y))) is stochastic and governs a positive recurrent Markov chain ((î_k, x_k))_{k∈N} with invariant probability measure ν. The initial distribution μ̂_1 of (î_1, x_1) is defined accordingly for (i_{m,n}, x) ∈ W. If we equip the process with the invariant probability measure ν as initial distribution, we write ((î_k^{(ν)}, x_k^{(ν)}))_{k∈N}.
Then the process ((î_k, x_k), (î_{k+1}, x_{k+1}))_{k∈N} is again a positive recurrent Markov chain with transition matrix Q̂_2 (arising from Q̂) and invariant probability measure denoted by ν_2.
Proof. We prove the claim by induction on n. First, let j, s ∈ I and t^(1) = j_m with 2 ≤ m ≤ n(s, j), and let a_0b_0, ab ∈ A^2 with τ(C(a_0b_0)) = s and τ(C(ab)) = j. If C_{j,m} is the m-th cone of type j in the covering of C(a_0b_0), then there is a unique x_0 = x_0^{[s,j,m,ab]} ∈ A* with x_0ab ∈ ∂C_{j,m}, and with this notation we obtain the first base case. Now we turn to the case t^(1) = j_1. Once again, if C_{j,1} is the first cone of type j in the covering of C(a_0b_0), then there is a unique x_0 = x_0^{[s,j,1,ab]} ∈ A* with x_0ab ∈ ∂C_{j,1}. We now perform the induction step, where we use the equations from the base case as induction hypothesis. First, consider the case t^(n+1) = j_m with m ≥ 2; then for all a_0b_0, ab ∈ A^2 with τ(C(a_0b_0)) = s^(n+1) =: s and τ(C(ab)) = j, there is a unique x_0 = x_0^{[s,j,m,ab]} ∈ A* with x_0ab ∈ ∂C_{j_{s,m}}(a_0b_0), and the claim follows since we have an underlying Markov chain. Now we turn to the case t^(n+1) = j_1. Once again, if C_{j,1} is the first cone of type j in the covering of C(a_0b_0) (of type s), then there is a unique x_0 = x_0^{[s,j,1,ab]} ∈ A* with x_0ab ∈ ∂C_{j,1}. We distinguish whether t^(n+1) = j_1 arises from î_{n+2} = j_{s,1} or from î_{n+2} = j_{k,l} with k ≠ s; summing over the admissible states (u_{p,q}, w), the corresponding probability equals the sum over terms of the form P[Y_1 = (s^(1), t^(1)), ..., Y_n = (s^(n), t^(n)), î_{n+1} = u_{p,q}, x_{n+1} = w] · ( q̂((u_{p,q}, w), (j_{s,1}, x_0ab)) + Σ_{(t_{k,l}, y) ∈ W: t = j, k ≠ s, [y] = ab} q̂((u_{p,q}, w), (t_{k,l}, y)) ). Finally, summing up yields the claim. This finishes the proof.
The statement of the lemma can be phrased in other words: the process governed by Q̂ can be seen as an exit time process in which one has more subcones to enter (namely, the subcones with indices j_{k,l}, k ≠ i, when currently being in a cone of type i), but the projection π folds the process down to the same hidden Markov chain (Y_k)_{k∈N}, since it does not distinguish whether î_k = j_{i,1} or î_k = j_{m,n}, m ≠ i.
Hence, the Markov chains ((i_k, W_k), (i_{k+1}, W_{k+1}))_{k∈N} and ((î_k, x_k), (î_{k+1}, x_{k+1}))_{k∈N} lead to the same hidden Markov chain in distribution. The important difference is that the transition matrix Q̂ has strictly positive entries, while this need not hold for the transition matrix of the chain ((i_k, W_k), (i_{k+1}, W_{k+1}))_{k∈N}.

Entropy of the Hidden Markov Chain related to the Exit Time Process
In this subsection we derive existence of the asymptotic entropy of the hidden Markov chains introduced above. First, consider the hidden Markov chain (Z_k)_{k∈N}, given by Z_k := π((î_k^{(ν)}, x_k^{(ν)})): this process is stationary and ergodic, since the underlying Markov chain ((î_k^{(ν)}, x_k^{(ν)}))_{k∈N} is stationary and ergodic. Hence, there is a constant H(Z) ≥ 0 such that lim_{n→∞} -(1/n) log P[Z_1 = s_1, ..., Z_n = s_n] = H(Z) for almost every realisation (s_1, s_2, ...) ∈ W_π^N of (Z_k)_{k∈N}; see e.g. Cover and Thomas [4, Theorem 16.8.1]. We now deduce the same property for the process (Y_k)_{k∈N}.
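The Shannon-McMillan-Breiman property invoked here can be illustrated numerically. A minimal sketch for an invented two-state stationary Markov chain (for a plain Markov chain the entropy rate has the closed form H = -Σ_i μ_i Σ_j p_ij log p_ij):

```python
import math
import random

# Invented two-state stationary Markov chain, not taken from the paper.
P  = [[0.9, 0.1],
      [0.4, 0.6]]          # transition matrix
mu = [0.8, 0.2]            # stationary distribution: mu P = mu

# Closed-form entropy rate of the chain.
H = -sum(mu[i] * P[i][j] * math.log(P[i][j])
         for i in range(2) for j in range(2))

# Shannon-McMillan-Breiman: -(1/n) log P[Z_1, ..., Z_n] -> H along a.e. path.
rng = random.Random(0)
n = 200_000
state = 0 if rng.random() < mu[0] else 1
logp = math.log(mu[state])
for _ in range(n - 1):
    nxt = 0 if rng.random() < P[state][0] else 1
    logp += math.log(P[state][nxt])
    state = nxt
estimate = -logp / n
```

The pathwise estimate agrees with the closed-form entropy rate up to the fluctuations expected for this sample size.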
Let j ∈ I. Recall that the covering of L consists of n_0 subcones. Each of these subcones C^{(0)}_i has again a covering containing n_i subcones of type j. Write N_j := Σ_{i=1}^{n_0} n_i, and denote by C^{(1)}_{j,k}, 1 ≤ k ≤ N_j, these different subcones. Furthermore, we write w ∼ yab if w = ycd ∈ ∂C(yab) for y ∈ A* and ab, cd ∈ A^2; that is, w ∼ yab if w lies on the boundary of the same cone as yab (namely the cone C(yab)).
Moreover, we have for all j ∈ I and w_1ab, w_2ab ∈ ⋃_{k=1}^{N_j} ∂C^{(1)}_{j,k} that P[X_{e_1} = w_1ab] > 0 if and only if P[X_{e_1} = w_2ab] > 0. Therefore, there are c, C > 0 such that the corresponding probabilities are bounded against each other, which gives (5.1). Assume now that τ(C(a_1b_1)) = j ∈ I. Observe that ∂C(y_0y_1a_1b_1) = {y_0y_1c_1d_1, ..., y_0y_1c_κd_κ} implies that ∂C^{(1)}_{j,k} has the form {wc_1d_1, ..., wc_κd_κ} for some suitable w ∈ A*. The corresponding probability can be computed accordingly, where the values of s_2, ..., s_{k-1} and t^(1), ..., t^(k-1) are determined by the values of W_j = y_ja_jb_j; an analogous identity holds in the other direction, giving (5.2). Recall that G(o, w) = G(o, o)L(o, w) for all w ∈ L and that ξ(·) can take only finitely many values. Writing X_{e_1} = x_1 ... x_n and j = τ(C(X_{e_1})), we can now conclude; the last equation follows from (5.1) and (5.2). We need these estimates since the first coordinate of Y_1 describes only the cone type of X_{e_1}, but there may be several cones of the same type j = τ(C(X_{e_1})).
Corollary 5.7. We have lim_{k→∞} l(X_{e_k})/k = H(Z) almost surely.
Proof. It suffices to compare l̃(X_{e_k}) with l(X_{e_k}). Assume for a moment that X_{e_k} = x_1 ... x_n and that X_{e_k} is on the boundary of the cone C. Then the probability of walking inside C from x_1 ... x_n ∈ ∂C to any x_1 ... x_{n-2}ab ∈ ∂C is bounded from below by some constant ε_0 > 0, because these probabilities depend only on x_{n-1}x_n and ab ∈ A^2. Taking logarithms, dividing by k and letting k tend to infinity yields the claim.
Now we come to an important law of large numbers. For this purpose, define d(x, y) := |y| - |x| for x, y ∈ L with |x| ≤ |y|, where |·| is the natural word length. Denote by ν_0 the invariant probability measure of (W_k)_{k∈N}, and let λ be the ν_0-mean increment of the word length between two consecutive exit times. Then:
Proposition 5.8. Almost surely, lim_{n→∞} -(1/n) log L(o, X_n|1) = ℓ·λ^{-1}·H(Z).
Proof. Define ê_k := sup{m ∈ N | |X_m| = k}. Transience yields ê_k < ∞ almost surely for all k ∈ N. Define the maximal exit times at time n ∈ N as k(n) := max{k ∈ N | ê_k ≤ n}, t(n) := max{k ∈ N | e_k ≤ n}.
Obviously, k(n) ≥ t(n), and each exit time e_k corresponds to exactly one ê_l with l ≥ k. First, we rewrite the quantity of interest accordingly. Let ε_1 be the minimal occurring positive single-step transition probability. Since the subcones of coverings of bigger cones are nested at bounded distance, we have ê_{k(n)} ≥ e_{t(n)} ≥ e_{k(n)-D} for some suitable D ∈ N. The first quotient on the right-hand side of (5.4) tends to zero, since the corresponding probability is bounded by L(o, X_{e_{t(n)}}) (due to weak symmetry), which in turn yields (n - e_{t(n)})/n → 0 as n → ∞.
By Corollary 5.7, l(X_{e_{t(n)}})/t(n) tends to H(Z). On the other hand, ê_k/k tends almost surely to 1/ℓ, and ê_{k(n)}/n tends to 1 almost surely; see [6, Proposition 2.3]. It remains to prove that the limit lim_{n→∞} k(n)/t(n) exists. Clearly, k(n) and t(n) differ at most by suitable constants D_1 and D_2, so it is sufficient to consider the increments between consecutive exit times. Since d(X_{e_i}, X_{e_{i+1}}) can be computed from W_i and W_{i+1}, we may apply the ergodic theorem for positive recurrent Markov chains to the process ((W_j, W_{j+1}))_{j∈N}, which yields the almost sure convergence of the corresponding ergodic averages. This finishes the proof and gives the proposed formula.
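Read in terms of the ergodic-theorem step in this proof, λ can be interpreted as the almost-sure mean word-length increment per exit time (a sketch of this interpretation, not a verbatim formula from the text):

```latex
\lambda \;=\; \lim_{k\to\infty} \frac{1}{k}\sum_{i=1}^{k} d\bigl(X_{e_i}, X_{e_{i+1}}\bigr)
\;=\; \lim_{k\to\infty}\frac{|X_{e_k}|}{k} \quad \text{almost surely.}
```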

Existence of Entropy
We follow the reasoning of [7] for the proof of existence of the entropy. First, we need the following lemma. Proof. A simple adaptation of the proof of [12, Proposition 8.2] shows that G(v, w|z) has radius of convergence R(v, w) > 1; at this point we need suffix-irreducibility. With the help of this fact we prove the lemma in several steps: (1) There is R_0 > 1 such that L(o, abc|R_0) < ∞ for all abc ∈ A^3: this follows from the inequality G(o, w|z) ≥ L(o, w|z). (2) There is R_1 > 1 such that Ḡ(ab, cd|R_1) < ∞ for all ab, cd ∈ A^2: this follows from the inequality G(ab, cd|z) ≥ Ḡ(ab, cd|z). This finishes the proof.
Lemma 6.2. There are constants D_1, D_2 > 0 such that the stated bounds hold for all m, n ∈ N_0. Proof. Denote by C_̺ the circle with radius ̺ in the complex plane centered at 0. A straightforward computation yields an estimate for each m ∈ N_0. Let w = w_1 ... w_t ∈ L; an application of Fubini's theorem then gives the first bound. Set D_1 := D_0 ∨ max{G(o, w|̺) | w ∈ L, |w| ≤ 2}. Since |X_n| ≤ n, we obtain the constant D_2 by a suitable choice. The following technical lemma will be used in the proof of the next theorem: Lemma 6.3. Let (A_n)_{n∈N}, (a_n)_{n∈N}, (b_n)_{n∈N} be sequences of strictly positive numbers with A_n = a_n + b_n. Assume that lim_{n→∞} -(1/n) log A_n = c ∈ [0, ∞) and that lim_{n→∞} b_n/q^n = 0 for all q ∈ (0, 1). Then lim_{n→∞} -(1/n) log a_n = c.
Proof. A proof can be found in [7,Lemma 3.5].
Lemma 6.4. For n ∈ N, consider the function f_n: L → R defined by f_n(w) := -(1/n) log Σ_{m=0}^{n²} p^(m)(o, w) if p^(n)(o, w) > 0, and f_n(w) := 0 otherwise.
Then there are constants d and D such that d ≤ f_n(w) ≤ D for all n ∈ N and w ∈ L.
Proof. Let w ∈ L and n ∈ N with p^(n)(o, w) > 0. Denote by R the radius of convergence of G(w, w|z). By Inequality (3.1), we get the lower bound. For the upper bound, observe that w ∈ L with p^(n)(o, w) > 0 can be reached from o in n steps with probability at least ε_0^n, where ε_0 := min{p(w_1, w_2) | w_1, w_2 ∈ A*, p(w_1, w_2) > 0} > 0 is independent of w. Thus, the sum Σ_{m=0}^{n²} p^(m)(o, w) has a value greater than or equal to ε_0^n. Hence, f_n(w) ≤ -log ε_0. Now we can finally prove: Theorem 6.5. The asymptotic entropy h of (X_n)_{n∈N_0} exists and equals h = ℓ·λ^{-1}·H(Z).
Proof. Using Proposition 5.8, we can rewrite ℓ·λ^{-1}·H(Z) suitably. The next aim is to prove lim sup_{n→∞} -(1/n) E[log π_n(X_n)] ≤ h. We now apply Lemma 6.3 by setting A_n := Σ_{m≥0} p^(m)(o, X_n) and choosing a_n, b_n accordingly. Corollary 6.6. (1) Almost surely, h = lim inf_{n→∞} -(1/n) log π_n(X_n). (2) Convergence in probability: -(1/n) log π_n(X_n) → h. (3) Convergence in L^1: -(1/n) log π_n(X_n) → h. Proof. The proofs are completely analogous to those in [7, Corollary 3.9, Lemma 3.10], where [7, Lemma 3.10] holds also in the case h = 0.
Corollary 6.7. The entropy is the rate of escape with respect to the Greenian distance, that is, h = lim_{n→∞} -(1/n) log G(o, X_n|1) almost surely. Proof. This follows from the simple fact G(o, X_n|1) = G(o, o|1)·L(o, X_n|1) and Proposition 5.8.

Calculation of the Entropy
In order to compute h = ℓ·λ^{-1}·H(Z), we have to calculate the three factors: while there is a formula for ℓ (given in [6, Theorem 2.4]) and a formula for λ (given in Section 5), the computation of H(Z) is in general a hard task. But there is a simple way to calculate H(Z) numerically, due to the inequalities H(Z_n | Z_{n-1}, ..., Z_1, (î_1, x_1)) ≤ H(Z) ≤ H(Z_n | Z_{n-1}, ..., Z_1) for all n ∈ N; see [4, Theorem 4.5.1]. In particular, it is even shown there that both bounds converge to H(Z). Hence, one can calculate H(Z) numerically up to an arbitrarily small error. Furthermore: Corollary 7.1. If the random walk is expanding, then h > 0. Otherwise, h = 0.
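The conditional-entropy upper bounds can be evaluated by brute force for small hidden Markov chains. A minimal sketch; the hidden chain and the emission matrix are invented for illustration and are not taken from the paper:

```python
import math
from itertools import product

# Invented two-state hidden Markov chain with binary observations.
P  = [[0.9, 0.1], [0.4, 0.6]]   # hidden transition matrix
mu = [0.8, 0.2]                 # stationary distribution (mu P = mu)
E  = [[0.8, 0.2], [0.3, 0.7]]   # E[s][z] = P[Z = z | hidden state s]

def seq_prob(zs):
    """P[Z_1 = zs[0], ..., Z_n = zs[-1]] via the forward algorithm."""
    alpha = [mu[s] * E[s][zs[0]] for s in range(2)]
    for z in zs[1:]:
        alpha = [sum(alpha[s] * P[s][t] for s in range(2)) * E[t][z]
                 for t in range(2)]
    return sum(alpha)

def block_entropy(n):
    """Joint entropy H(Z_1, ..., Z_n) by enumerating all 2^n sequences."""
    return -sum(pr * math.log(pr)
                for zs in product((0, 1), repeat=n)
                for pr in [seq_prob(zs)] if pr > 0)

# H(Z_n | Z_{n-1}, ..., Z_1) = H(Z_1..Z_n) - H(Z_1..Z_{n-1}): a non-increasing
# sequence of upper bounds converging to the entropy rate H(Z).
bounds = [block_entropy(n) - block_entropy(n - 1) for n in range(2, 9)]
```

Each successive bound is at least as tight as the previous one, mirroring the monotone convergence used in the text for the numerical computation of H(Z).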
Proof. In the expanding case, the random walk has at least two possibilities for entering a subcone described by X_{e_3} for every given value of X_{e_2}. Thus, H(Z_2 | Z_1) > 0, which yields h > 0 due to (7.1). On the other hand, if the random walk on L is not expanding, then each cone has a covering consisting of only a single cone. Then the projections Z_n become deterministic, and this implies H(Z) = 0, hence h = 0.

We call ab ∈ A² unambiguous if ∂C(ab) = {ab}. In other words, whenever the random walk enters a subcone of type C(wab), w ∈ A^*, it must enter it through its single boundary point wab. This allows us to "cut" the random walk into pieces and to obtain another formula for the entropy H(Z). For n ∈ N, x_2, . . . , x_n ∈ W_0 and ab ∈ A² define

w(ab, x_2, . . . , x_n, x) := P[W_2 = x_2, . . . , W_n = x, [W_n] = ab | W_1 = ab],
w(ab, x_2, . . . , x_n) := ∑_{y_2,...,y_n ∈ W_0 : y_i ∼ x_i}

where ∼ is the relation introduced in the proof of Proposition 5.6. In particular, ỹ(ab, … ) for almost every realisation (s_1, s_2, . . .) ∈ W_π^N. Observe that τ(W_{n+1}) = ab is equivalent to Y_n = (t_n, α_{t_n,m}) for some cone type t_n ∈ I, where α denotes the cone type of C(ab) and 1 ≤ m ≤ n(t_n, α). For any such trajectory, we define

For any realisation (s_1, s_2, . . .) ∈ W_π^N and n ∈ N, denote by d(n) the maximal index k with N_k ≤ n. Since [W_{N_k+1}] = ab for all k ∈ N we can use the strong Markov property as follows when N_j < n:

Therefore, we can rewrite the following probability:

Obviously, d(n)/n tends almost surely to ν_W(ab). Hence, if we consider only the subsequence where n equals one of the N_k's, we obtain the following almost sure convergence:

This proves the claim.
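The almost-sure convergence d(n)/n → ν_W(ab) used above is an instance of the ergodic theorem; a hedged sketch of this step (our reconstruction, assuming (W_n) is stationary and ergodic and that the regeneration times N_k enumerate exactly the indices k with [W_k] = ab):

```latex
\[
  \frac{d(n)}{n}
  \;=\; \frac{1}{n}\,\#\bigl\{\, k \le n : [W_k] = ab \,\bigr\}
  \;=\; \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{\{[W_k] = ab\}}
  \;\xrightarrow[n\to\infty]{\text{a.s.}}\; \nu_W(ab) .
\]
```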
Making ε arbitrarily small yields the proposed claim.
Finally, we remark that the entropy is zero for recurrent random walks:

Corollary 7.4. If (X_n)_{n∈N_0} is recurrent, then h = 0.
Proof. Clearly, −(1/n) E[log π_n(X_n)] ≥ 0. Assume now that lim sup_{n→∞} −(1/n) E[log π_n(X_n)] = c > 0. Then there is a deterministic sequence (n_k)_{k∈N} such that, for any small ε_1 > 0,

−(1/n_k) E[log π_{n_k}(X_{n_k})] ≥ c − ε_1   (7.2)

for all sufficiently large k. Denote by p_0 the minimal occurring positive single-step transition probability. Then −(1/n_k) log π_{n_k}(X_{n_k}) ≤ −log p_0. Moreover, choose N ∈ N with 1/N < c − ε_1. Then there is some δ > 0 with

P[−(1/n_k) log π_{n_k}(X_{n_k}) ≥ 1/N] ≥ δ for all k ∈ N large enough.
To see this, assume that δ = δ_k depends on k with lim inf_{k→∞} δ_k = 0. If δ_k tends to zero along a subsequence, then E[−(1/n_k) log π_{n_k}(X_{n_k})] ≤ δ_k · (−log p_0) + 1/N → 1/N < c − ε_1, a contradiction to (7.2) and to the choice of N.
Choose now ε > 0 arbitrarily small with ε < δ. In the recurrent case we have ℓ = 0. Then there is some index K ∈ N such that for all k ≥ K:

δ − ε ≤ P[−log π_{n_k}(X_{n_k}) ≥ n_k/N, |X_{n_k}| ≤ ε n_k] ≤ e^{−n_k/N} · |A|^{ε n_k},

which yields the inequality 1/N − ε log |A| ≤ −(1/n_k) log(δ − ε). But this gives a contradiction if we make ε sufficiently small, since the right hand side tends to zero, but the left hand side to 1/N. Thus, lim sup_{n→∞} −(1/n) E[log π_n(X_n)] = 0, yielding h = 0.
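The final contradiction is elementary; spelled out (our reconstruction of the omitted computation), one takes logarithms in the displayed bound and divides by n_k:

```latex
\[
  \delta - \varepsilon \;\le\; e^{-n_k/N}\,|A|^{\varepsilon n_k}
  \quad\Longrightarrow\quad
  \frac{1}{N} - \varepsilon \log |A|
  \;\le\; -\frac{1}{n_k}\,\log(\delta-\varepsilon)
  \;\xrightarrow[k\to\infty]{}\; 0 ,
\]
```

so 1/N ≤ ε log |A| for every small ε > 0, which is impossible.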

Analyticity of Entropy
The random walk on L depends on finitely many parameters, namely the transition probabilities p(w_1, w_2), w_1, w_2 ∈ A^* with |w_1| ≤ 2 and |w_2| ≤ 3. That is, each random walk on L can be defined via a vector p ∈ R^{|B_2 × B_3|}, where B_i := ⋃_{n=1}^{i} A^n ∪ {o}. The support of p is the set of indices in B_2 × B_3 corresponding to non-zero entries of p. Fix now any subset B ⊆ B_2 × B_3 which allows at least one well-defined random walk on A^*, and consider in the following only vectors p with support B which give rise to a well-defined random walk on A^*. We ask whether the entropy mapping p ↦ h = h_p varies real-analytically. The crucial point will be the following lemma:

Lemma 8.1. The transition probabilities q(w_1, w_2), w_1, w_2 ∈ W_0, vary real-analytically w.r.t. p.
Hence, p lies in the interior of the domain of convergence of H(ab, c|1), seen as a multivariate power series in terms of p. This yields real-analyticity of H(ab, c|1) in p. The same holds for ξ(ab) and L(ab, cde), which is proven completely analogously, since L(ab, cde|z) also has radius of convergence greater than 1; see the proof of Lemma 6.1.
Now we can prove: