On uniform convergence in ergodic theorems for a class of skew product transformations

Consider a class of skew product transformations consisting of an ergodic or a periodic transformation on a probability space (M, B, m) in the base and a semigroup of transformations on another probability space (W, F, P) in the fibre. Under suitable mixing conditions for the fibre transformation, we show that ergodicity, weak mixing, and strong mixing are passed on from the base transformation to the skew product (with respect to the product measure). We derive ergodic theorems with respect to the skew product on the product space. The main aim of this paper is to establish uniform convergence with respect to the base variable for the sequence of ergodic averages of a function F on the product of the two probability spaces along the orbits of such a skew product. Assuming a certain growth condition for the coupling function, a strong mixing condition on the fibre transformation, and continuity and integrability conditions for F, we prove uniform convergence in the base and L^p(P)-convergence in the fibre. Under an equicontinuity assumption on F we further show P-almost sure convergence in the fibre. Our work has an application in information theory: it implies convergence of the averages of functions on random fields restricted to parts of stair climbing patterns defined by a direction.


Introduction
The approximation of a line by a planar lattice yields a stair climbing pattern. Consider the averages of a function of a random field along a finite window moving up the stair climbing pattern. Under which conditions does this sequence converge, and what are explicit formulae for the limit? More formally, let $L_{\lambda,t}(z) := (z, [\lambda z + t])$ ($z \in \mathbb{Z}$) be a lattice approximation of the line with slope $\lambda$ and $y$-intercept $t$, and let $m \in \mathbb{N}$ be a fixed window size. Let $P$ be a $\mathbb{Z}^2$-indexed random field with values in a set $\Upsilon$, i.e., a stationary probability measure on $\Omega := \Upsilon^{\mathbb{Z}^2}$, and let $\mathcal{F}$ be the canonical σ-algebra. Consider the averages
$$\frac{1}{n} \sum_{i=0}^{n-1} f\big(\omega(L_{\lambda,t}(i, \dots, i+m-1))\big) \qquad (n \in \mathbb{N}) \tag{1}$$
of a function $f \in L^1(\Omega, \mathcal{F}, P)$. What can we say about $P$-almost sure or $L^1(P)$-convergence of this sequence? Averages of this type are similar to the ones that occur in the context of directional Shannon-McMillan theorems for lattice random fields (cf. [6]).
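The averages in (1) can be made concrete with a small computation. The following is a minimal sketch, not part of the original argument: it assumes that $[x]$ is the floor function, and the configuration `checkerboard` as well as all function names are hypothetical choices made for illustration. Exact rational arithmetic is used so that no rounding affects the integer parts.

```python
from fractions import Fraction
from math import floor


def lattice_point(lam, t, z):
    """Return L_{lam,t}(z) = (z, [lam*z + t]); [x] is taken to be floor(x) (assumption)."""
    return (z, floor(lam * z + t))


def window(lam, t, i, m):
    """Return the m consecutive steps L_{lam,t}(i, ..., i+m-1) of the staircase."""
    return [lattice_point(lam, t, z) for z in range(i, i + m)]


def window_average(f, omega, lam, t, m, n):
    """Average of f over the first n windows of size m, computed exactly."""
    total = Fraction(0)
    for i in range(n):
        values = tuple(omega(v) for v in window(lam, t, i, m))
        total += f(values)
    return total / n


def checkerboard(v):
    """Hypothetical deterministic configuration with values in {0, 1}."""
    return (v[0] + v[1]) % 2


lam, t = Fraction(2, 5), Fraction(1, 3)
steps = window(lam, t, 0, 6)
avg = window_average(lambda values: values[0], checkerboard, lam, t, m=1, n=5)
```

Here `steps` lists the first six points of the pattern for slope 2/5 and intercept 1/3, and `avg` is the average of the configuration values along the first five steps.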
The situation described above can be represented as a special case of a more general setup involving skew product transformations. In the independent component, or base, we have a measure-preserving transformation $\tau$ on a probability space $(M, \mathcal{B}, \mu)$. In the dependent component, or fibre, we have a mixing semigroup of measure-preserving transformations $(\theta_k)_{k \in K}$ on a probability space $(\Omega, \mathcal{F}, P)$, which is linked to the base by a $K$-valued $\mathcal{B}$-measurable function $\kappa$ on $M$. The class of skew products considered in this paper is given by
$$S(t, \omega) := \big(\tau(t), \theta_{\kappa(t)}\, \omega\big). \tag{2}$$
This paper deals with the following two questions: (A) Is $S$ ergodic or mixing with respect to the product measure? (B) Do the ergodic averages along the skew product converge uniformly with respect to the base? Under suitable assumptions, we answer both questions positively.
Ergodicity and other mixing properties of various classes of skew products have been studied by a number of authors, but the above situation does not fit into any of the settings covered by the existing literature. Kakutani [15] introduced a skew product with a Bernoulli shift in the base and an ergodic transformation in the fibre. He showed that the skew product is ergodic if and only if the transformation in the fibre is ergodic. Other mixing properties were investigated, e.g., by Meilijson [18], den Hollander and Keane [8], and Georgii [12]. Adler and Shields (cf. [1] and [2]) considered a translation on the circle for the fibre. Anzai [3] introduced skew products of two translations on the circle and derived a criterion for ergodicity. Furstenberg [10] studied unique ergodicity. Zhang [23] investigated this for a translation on a torus in the fibre. A torus translation in the base can also be combined with the translation on $\mathbb{R}$ by the value of a real function of the argument in the base. Skew products of this type are called real extensions of torus translations, and they were explored by Oren [20], Hellekalek and Larcher [13], [14], and Pask [21].
Our answer to question (A) is summarized in Theorem 2.2. We prove that, under suitable mixing conditions for the transformations in the fibre, ergodicity, weak mixing, and strong mixing are passed on from the transformation in the base to the skew product. As an explicit example we study the case when $P$ is a random field and $(\theta_k)_{k \in K}$ is a group of shift transformations. In this case, the conditions on the fibre transformation can be ensured by assuming tail-triviality for $P$ and a growth condition for ergodic sums of $\kappa$ along $\tau$ (cf. Corollary 2.3).
Our answer to question (B) is given in Theorem 4.9. The proof combines two different approaches. The first approach explores uniform convergence theorems in the spirit of Weyl's classical result for the rotation on the circle. As a small addition to the theorems of Weyl and Oxtoby we show that the ergodic averages of a continuous and uniquely ergodic transformation on a real interval converge uniformly for the class of Riemann-integrable functions (cf. Corollary 4.4). The second approach uses techniques developed for ergodic theorems along subsequences. We extend Blum and Hanson's theorem to the $d$-parameter case, replacing the strict monotonicity condition on the sequence by a growth condition on the coupling function $\kappa$, uniformly in $t$. Combining the two approaches we obtain uniform convergence in the base and $L^p$-convergence in the fibre for functions that are continuous with respect to the base and fulfill an integrability condition in the fibre (cf. Theorem 4.9). We also derive a variation of this theorem for the class of Riemann-integrable functions on a real interval (cf. Corollary 4.10). Finally, we derive a result (cf. Theorem 4.11) about uniform convergence in the base and $P$-almost sure convergence in the fibre, provided the iterates of the function fulfill an equicontinuity condition.
We conclude the paper by returning to our initial questions about the asymptotics of (1). Let $\mathbb{T} := [0, 1)$ be the circle equipped with the Borel σ-algebra $\mathcal{B}$ and the Lebesgue measure $\mu$. For $\lambda \in \mathbb{R}$ define a rotation on $\mathbb{T}$ by $\tau_\lambda(t) := t + \lambda \bmod 1$. For $x \in \mathbb{R}$ let $[x]$ be the integer part of $x$. Define a skew product on the product space $\mathbb{T} \times \Omega$ by
$$S_\lambda(t, \omega) := \big(\tau_\lambda(t), \theta_{(1, [t + \lambda])}\, \omega\big). \tag{3}$$
We will see that the iterates of $S_\lambda$ follow the stair climbing pattern $L_{\lambda,t}$. This allows us to rewrite the sequence (1) as an average of $f$ along the orbit of $S_\lambda$. The convergence of this sequence, for all starting levels $t$ of the stair climbing pattern, is a consequence of the uniform ergodic theorems derived in Section 4 (cf. Corollary 4.12).
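The claim that the iterates of the skew product follow the stair climbing pattern can be checked numerically. The sketch below assumes the coupling function $\kappa(t) = (1, [t + \lambda])$ used later in the proof of Corollary 4.12, the sums $\kappa_n(t) = \kappa(t) + \kappa(\tau_\lambda(t)) + \dots + \kappa(\tau_\lambda^{n-1}(t))$, and that $[x]$ is the floor function. The function names are hypothetical, and rational values of $\lambda$ and $t$ are used only to keep the arithmetic exact.

```python
from fractions import Fraction
from math import floor


def tau(lam, t):
    """Circle rotation tau_lam(t) = t + lam mod 1 on [0, 1)."""
    s = t + lam
    return s - floor(s)


def kappa(lam, t):
    """Coupling function kappa(t) = (1, [t + lam]), with [x] = floor(x) (assumption)."""
    return (1, floor(t + lam))


def kappa_n(lam, t, n):
    """Sum kappa(t) + kappa(tau t) + ... + kappa(tau^{n-1} t) along the base orbit."""
    x, y = 0, 0
    for _ in range(n):
        dx, dy = kappa(lam, t)
        x, y = x + dx, y + dy
        t = tau(lam, t)
    return (x, y)


def staircase(lam, t, n):
    """Lattice approximation L_{lam,t}(n) = (n, [lam*n + t])."""
    return (n, floor(lam * n + t))


lam, t = Fraction(2, 5), Fraction(1, 3)
matches = all(kappa_n(lam, t, n) == staircase(lam, t, n) for n in range(50))
```

The identity behind `matches` is the telescoping relation $[\tau_\lambda^n(t) + \lambda] = [t + (n+1)\lambda] - [t + n\lambda]$, which also holds for negative slopes.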
Outline of the paper: In Section 2 we define mixing properties of semigroups of transformations along sequences and introduce the class of skew products considered in this paper. In Theorem 2.2, we give the result on ergodicity and mixing properties of these skew products. Finally, we have a closer look at the case when the fibre transformation is a shift operator for a random field. In Section 3 we discuss ergodic theorems for skew products with ergodic base transformations (cf. Corollary 3.1) and with periodic base transformations (cf. Corollary 3.3). We illustrate the results with two examples related to the sequence (1). In the last section we focus on the main aim of this paper, the uniform convergence with respect to the base. Depending, among other things, on the regularity of the function with respect to the base variable, we obtain different kinds of convergence in the fibre. Theorem 4.9 states $L^p(P)$-convergence provided the function is continuous with respect to the base variable. Corollary 4.10 is a version of this for Riemann-integrable functions. Theorem 4.11 states $P$-almost sure convergence, provided the functions fulfill a certain equicontinuity condition. Corollary 4.12 brings us back to the original motivation for this paper. It states the convergence of the sequences (1).

Mixing properties of a class of skew products
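Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $(\theta_k)_{k \in \mathbb{N}_0^d}$ be a $d$-parameter semigroup of measure-preserving transformations on $(\Omega, \mathcal{F}, P)$, i.e., each of the transformations preserves the measure $P$, $\theta_0 = \mathrm{Id}$, and
$$\theta_{k+l} = \theta_k \circ \theta_l \qquad \text{for all } k, l \in \mathbb{N}_0^d.$$
The following example will be used frequently. Let $\sigma_1$ and $\sigma_2$ be two commuting measure-preserving transformations on $(\Omega, \mathcal{F}, P)$. Then
$$\theta_{(k_1, k_2)} := \sigma_1^{k_1} \circ \sigma_2^{k_2}$$
defines a two-parameter semigroup $(\theta_k)_{k \in \mathbb{N}_0^2}$ of measure-preserving transformations on $(\Omega, \mathcal{F}, P)$. If $\sigma_1$ and $\sigma_2$ are invertible, it extends to a two-parameter group $(\theta_k)_{k \in \mathbb{Z}^2}$. The construction extends to $d$ parameters in an obvious way.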
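As an illustration of the two-parameter construction, the following sketch builds $\theta_{(k_1,k_2)} = \sigma_1^{k_1} \circ \sigma_2^{k_2}$ from two commuting maps and checks the semigroup property by brute force. The finite state space $\mathbb{Z}_5 \times \mathbb{Z}_7$ with coordinate shifts is a hypothetical toy example chosen for this sketch; both shifts are bijections and hence preserve the uniform measure.

```python
from itertools import product

A, B = 5, 7  # hypothetical finite state space Z_5 x Z_7 with uniform measure


def sigma1(w):
    """Shift the first coordinate by one step."""
    return ((w[0] + 1) % A, w[1])


def sigma2(w):
    """Shift the second coordinate by one step; commutes with sigma1."""
    return (w[0], (w[1] + 1) % B)


def iterate(f, k, w):
    """Apply the map f to the state w exactly k times."""
    for _ in range(k):
        w = f(w)
    return w


def theta(k, w):
    """theta_k := sigma1^{k1} o sigma2^{k2} for k = (k1, k2) in N_0^2."""
    return iterate(sigma1, k[0], iterate(sigma2, k[1], w))


def semigroup_property_holds(max_index=3):
    """Check theta_{k+l} = theta_k o theta_l for all indices up to max_index."""
    indices = list(product(range(max_index + 1), repeat=2))
    states = list(product(range(A), range(B)))
    return all(
        theta((k[0] + l[0], k[1] + l[1]), w) == theta(k, theta(l, w))
        for k in indices for l in indices for w in states
    )
```

Calling `semigroup_property_holds()` verifies the identity on all states for small indices; the check relies on `sigma1` and `sigma2` commuting.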
Let $\tau$ be a measure-preserving transformation of a probability space $(M, \mathcal{B}, \mu)$, and assume that $\kappa$ is a $\mathcal{B}$-measurable function on $M$ with values in $K$. Then
$$S(t, \omega) := \big(\tau(t), \theta_{\kappa(t)}\, \omega\big)$$
defines a skew product on the product space $\bar{\Omega} := M \times \Omega$. In particular, choosing $\kappa \equiv k_0$ for a constant $k_0 \in K$ yields the uncoupled product of $\tau$ and $\theta_{k_0}$. Obviously, $S$ is measurable with respect to the product σ-algebra $\bar{\mathcal{F}} := \mathcal{B} \otimes \mathcal{F}$, and it preserves the product measure $\bar{P} := \mu \otimes P$. It is easy to see that for all $n, m \in \mathbb{N}_0$, for all $t \in M$, and for all $\omega \in \Omega$,
$$S^n(t, \omega) = \big(\tau^n(t), \theta_{\kappa_n(t)}\, \omega\big) \quad \text{and} \quad \kappa_{n+m}(t) = \kappa_n(t) + \kappa_m(\tau^n(t)), \qquad \text{where } \kappa_n(t) := \sum_{i=0}^{n-1} \kappa(\tau^i(t)). \tag{5}$$
Let us now study under which conditions the skew product is ergodic. Furthermore, as suggested by J. Aaronson, we broaden the question to other mixing properties. Answers to these questions will be given in the next lemma. Note that, by a simple projection argument, the ergodicity of $\tau$ is necessary for the ergodicity of $S$. As we know from the case of an uncoupled product, the ergodicity of both transformations does not guarantee the ergodicity of the product. However, it can be shown that the product is ergodic whenever one of the transformations is ergodic and the other one is weakly mixing (cf. [17]). A major ingredient for our lemma are assumptions that bring into play the function $\kappa$. We make use of two conditions:
(C1) $(\theta_k)_{k \in K}$ is weakly mixing along the sequence $(\kappa_n(t))_{n \in \mathbb{N}}$ for $\mu$-almost all $t \in M$.
(C2) $(\theta_k)_{k \in K}$ is strongly mixing and $(\kappa_n(t))_{n \in \mathbb{N}}$ goes to infinity for $\mu$-almost all $t \in M$.
Note that (C2) implies (C1). Condition (C2) can be easily verified for lattice approximations of a line, as discussed in the context of Corollary 4.12. We further need the notion of weakly mixing along a sequence. This has been introduced for transformations by N. Friedman (cf. [9]), and we extend it to d-parameter (semi-)groups.
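Definition 2.1. Let $(k_n)_{n \in \mathbb{N}}$ be a $K$-valued sequence. A (semi-)group $(\theta_k)_{k \in K}$ of measure-preserving transformations on $(\Omega, \mathcal{F}, P)$ is called weakly mixing along $(k_n)_{n \in \mathbb{N}}$ with respect to $P$ if, for all $A, B \in \mathcal{F}$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \big| P(A \cap \theta_{k_i}^{-1} B) - P(A) P(B) \big| = 0.$$
Lemma 2.2. (i) Assume condition (C1). If $\tau$ is ergodic w.r.t. $\mu$ then $S$ is ergodic w.r.t. $\mu \otimes P$.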
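Proof. We give the proof of the first statement here; the remaining proofs are conducted in a similar fashion. Assume condition (C1) and let $\tau$ be ergodic with respect to $\mu$. To prove the ergodicity of $S$ we will show that for all bounded $\mathcal{B} \otimes \mathcal{F}$-measurable functions $F$ and $G$
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \int F \circ S^i \cdot G \, d(\mu \otimes P) = \int F \, d(\mu \otimes P) \int G \, d(\mu \otimes P).$$
It is sufficient to show this for functions which are products of functions on the factors, i.e., $F(t, \omega) = f(t)\Phi(\omega)$ and $G(t, \omega) = g(t)\Psi(\omega)$, where $f$ and $g$ are bounded $\mathcal{B}$-measurable functions on $M$ and $\Phi$ and $\Psi$ are bounded $\mathcal{F}$-measurable functions on $\Omega$. (The general case follows by approximation.) For these functions we have
$$\int F \circ S^i \cdot G \, d(\mu \otimes P) = \int_M f(\tau^i(t))\, g(t) \Big( \int_\Omega \Phi(\theta_{\kappa_i(t)} \omega)\, \Psi(\omega) \, P(d\omega) \Big) \mu(dt),$$
and we obtain
$$\Big| \frac{1}{n} \sum_{i=0}^{n-1} \int F \circ S^i \cdot G \, d(\mu \otimes P) - \int F \, d(\mu \otimes P) \int G \, d(\mu \otimes P) \Big| \le \|f\|_\infty \|g\|_\infty \int_M \frac{1}{n} \sum_{i=0}^{n-1} \Big| \int_\Omega \Phi \circ \theta_{\kappa_i(t)} \cdot \Psi \, dP - \int_\Omega \Phi \, dP \int_\Omega \Psi \, dP \Big| \, \mu(dt) + \Big| \int_\Omega \Phi \, dP \int_\Omega \Psi \, dP \Big| \cdot \Big| \frac{1}{n} \sum_{i=0}^{n-1} \int_M f \circ \tau^i \cdot g \, d\mu - \int_M f \, d\mu \int_M g \, d\mu \Big|.$$
This goes to 0, because the averages in the first expression converge to 0 for $\mu$-almost all $t$ by condition (C1) (so that the integral over $M$ converges to 0 by dominated convergence), and the second expression converges to 0 because of the ergodicity of $\tau$.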
This section concludes with a closer look at the case of shift transformations on a discrete random field with values in a set $\Upsilon$. Let $\Omega := \Upsilon^{\mathbb{Z}^d}$. For any $J \subseteq \mathbb{Z}^d$ let $\mathcal{F}_J$ denote the σ-algebra generated by all projections $\omega \mapsto \omega(j)$ with $j \in J$, and let $\mathcal{F} := \mathcal{F}_{\mathbb{Z}^d}$. Denote the coordinates of an element of $\mathbb{Z}^d$ by upper indices, and let $\|\cdot\|$ be its maximum norm. Consider the shift In this situation we have the following
Corollary 2.3. Let $v_1$ and $v_2$ be linearly independent vectors in $\mathbb{Z}^d$. Assume that $P$ is tail-trivial and that the sequence $(\kappa_n(t))_{n \in \mathbb{N}}$ goes to infinity for $\mu$-almost all $t \in M$. Then, when $\tau$ is ergodic, weakly mixing or strongly mixing with respect to $\mu$, $S$ is ergodic, weakly mixing or strongly mixing with respect to $\mu \otimes P$, respectively.
Proof. We are going to show condition (C2). Define the boxes $V_n = \{v \in \mathbb{Z}^d : \|v\| \le n\}$ ($n \in \mathbb{N}$), and let $B \in \mathcal{F}_J$ for some finite subset $J$ of $\mathbb{Z}^d$. Then there is an $m \in \mathbb{N}$ such that $J \subseteq V_m$. Setting m(n) := κ (1) By the assumptions on $v_1$, $v_2$ and $\kappa$, $m(n)$ goes to infinity. By Proposition 7.9 in [11], tail-triviality is equivalent to short-range correlations, i.e., $\sup_{C \in \mathcal{F}_{\mathbb{Z}^d \setminus V_n}} |P(A \cap C) - P(A)P(C)|$ converges to 0 as $n$ goes to infinity.

Ergodic theorems with skew products
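Applying Birkhoff's ergodic theorem to the skew product $S$ yields, for any $F \in L^1(M \times \Omega, \mathcal{B} \otimes \mathcal{F}, \mu \otimes P)$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} F \circ S^i = E[F \,|\, \mathcal{J}] \tag{8}$$
$\mu \otimes P$-almost surely and in $L^1(\mu \otimes P)$, where $\mathcal{J}$ is the σ-algebra of all $S$-invariant sets in $\mathcal{B} \otimes \mathcal{F}$. We study this limit more closely for two different cases: when the transformation $\tau$ is ergodic and when it is periodic. In the ergodic case, combining (8) and Lemma 2.2 immediately yields the following ergodic theorem for the skew product.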
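Corollary 3.1. Assume that $\tau$ is ergodic with respect to $\mu$ and that condition (C1) is fulfilled. Then for any function $F \in L^1(M \times \Omega, \mathcal{B} \otimes \mathcal{F}, \mu \otimes P)$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} F \circ S^i = \int F \, d(\mu \otimes P)$$
$\mu \otimes P$-almost surely and in $L^1(\mu \otimes P)$.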
Now consider the case that $\tau$ is periodic. We calculate the iterates of the skew product and derive an ergodic theorem with an explicit expression for the limit.
Lemma 3.2. Assume that $\tau$ is periodic with period $q \in \mathbb{N}$. Then for all $j \in \mathbb{Z}$ and all $\nu \in \{0, 1, \dots, q-1\}$,
Proof. (i) follows from the definition of $\kappa$ using the periodicity of $\tau$. (ii) is an immediate consequence of (i) and the semigroup property of $\theta$. (iii) follows from (5), the periodicity of $\tau$, and (ii).
for $\mu$-almost all $t \in M$, and for $P$-almost all $\omega \in \Omega$ and in $L^1(P)$. If $\theta_{\kappa_q(t)}$ is ergodic with respect to $P$ for $\mu$-almost all $t \in M$, then the limit simplifies to the constant $\frac{1}{q} \sum_{\nu=0}^{q-1} E[F(\tau^\nu(t), \cdot)]$.
Proof. Any $n \in \mathbb{N}$ can be represented as $n = mq + \nu$, with $m \in \mathbb{N}$ and $\nu \in \{0, 1, \dots, q-1\}$, and we may break down the ergodic averages to
$$\frac{1}{n} \sum_{i=0}^{n-1} F \circ S^i = \frac{mq}{n} \Big[ \frac{1}{mq} \sum_{i=0}^{mq-1} F \circ S^i + \frac{1}{mq} \sum_{i=mq}^{mq+\nu-1} F \circ S^i \Big].$$
Since the first factor converges to 1, and the second addend within the brackets converges to 0, our question reduces to the study of ergodic limits along the subsequence $(mq)_{m \in \mathbb{N}}$. They take the form The last equality can be seen by applying Lemma 3.2. For $\mu$-almost all $t \in M$, the function $f$ is integrable, and applying Birkhoff's ergodic theorem yields $P$-almost surely and in $L^1(P)$ This implies the first statement of the Corollary. In the ergodic case $\mathcal{J}_t$ is trivial, and, using the invariance of $P$ under $\theta$, the last expression reduces to $E[F(\tau^\nu(t), \cdot)]$.
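The decomposition $n = mq + \nu$ used in the proof can be illustrated in the simplest special case, where the function depends only on the base variable, so that the fibre plays no role. The sketch below is written under that simplifying assumption with a hypothetical test function `f` taking values in $[0, 1)$. It computes the averages along a rational rotation exactly and compares them with the mean over one period.

```python
from fractions import Fraction
from math import floor


def tau(lam, t):
    """Circle rotation t -> t + lam mod 1; periodic with period q when lam = p/q."""
    s = t + lam
    return s - floor(s)


def orbit_average(f, lam, t, n):
    """Exact average (1/n) * sum_{i<n} f(tau^i(t))."""
    total = Fraction(0)
    for _ in range(n):
        total += f(t)
        t = tau(lam, t)
    return total / n


def periodic_mean(f, lam, t, q):
    """Mean over one full period, (1/q) * sum_{nu<q} f(tau^nu(t))."""
    return orbit_average(f, lam, t, q)


def f(t):
    """Hypothetical test function with values in [0, 1) on [0, 1)."""
    return t * t


p, q = 2, 5
lam, t0 = Fraction(p, q), Fraction(1, 3)
limit = periodic_mean(f, lam, t0, q)
```

Along the subsequence $n = mq$ the average equals `limit` exactly. For general $n$ the incomplete last period contributes fewer than $q$ terms, each bounded by 1 in absolute value, so the deviation from `limit` is at most $q/n$.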
We end this section by illustrating the results by two special cases relevant to the skew product tracing a stair climbing pattern introduced in (3). (i) Let λ be irrational. Choose (θ k ) k∈K and κ fulfilling condition (C1). By Lemma 2.2, S is ergodic.
(ii) Let $\lambda$ be rational. There is a unique representation $\lambda = p/q$, where $p \in \mathbb{Z}$, $q \in \mathbb{N}$, and $p$ and $q$ have no common divisor. Then $\tau_\lambda$ is periodic with period $q$. Furthermore, τ λ respects the parti- q , e ν q ). The limit in Corollary 3.3 is of the form $\frac{1}{q} \sum_{\nu=0}^{q-1} E[F(t + \nu\lambda \bmod 1, \cdot) \,|\, \mathcal{J}_t]$. If $P$ is ergodic with respect to $\theta_{\kappa_q(t)}$ then the limit simplifies to $\frac{1}{q} \sum_{\nu=0}^{q-1} E[F(t + \nu\lambda \bmod 1, \cdot)]$.
(i) If τ is ergodic then, for any function F ∈ L 1 (Ω, F , P ), P -almost surely and in L 1 (P ).

Uniform convergence
This section addresses the question of sure convergence with respect to the first parameter. In addition to the assumptions at the beginning of Section 2 we suppose that M is a compact separable metric space endowed with metric d, and B is the Borel σ-algebra on M for the topology induced by d. Recall that the convergence of ergodic averages need not be true everywhere, even if we are in a compact topological space and both the transformation and the function are continuous. Which conditions guarantee sure convergence in the first parameter? We will be asking a little more than this, namely about uniform convergence in t. We are interested in results of the type uniformly in t ∈ M and in L 1 (P ).
We further investigate when (9) takes place P -almost surely, i.e., for P -almost all ω ∈ Ω, Again, we consider two different cases: when τ is ergodic and when it is periodic. In the second case we proceed as in the proof of Corollary 3.3 and obtain the following uniform version.
for all $t \in M$. Then, for $P$-almost all $\omega$ and in $L^1(P)$, where $\mathcal{J}_t$ denotes the σ-algebra of $\theta_{\kappa_q(t)}$-invariant sets in $\mathcal{F}$. If $P$ is ergodic with respect to $\theta_{\kappa_q(t)}$ for all $t \in M$, the limit equals $\frac{1}{q} \sum_{\nu=0}^{q-1} E[F(\tau^\nu(t), \cdot)]$.
The ergodic case is more delicate. We begin with careful investigations of ergodic theorems on the single spaces, which will later be combined to derive a result on the product space. In the base we are dealing with a transformation on a compact and metrizable space. We now recall and refine some of the existing results about uniform convergence in this situation. Motivated by applications in information theory, we put particular emphasis on extending uniform convergence results to the class of Riemann-integrable functions. An example of a function that is Riemann-integrable but not continuous occurs in the proof of a directional Shannon-McMillan theorem for random fields (cf. [6]).
The classical example of an ergodic theorem that gives a statement about uniform convergence is the one by Weyl. In its simplest form, it says that the averages of a continuous function along the orbit of an irrational translation on the circle converge uniformly to the integral of the function. To prove Weyl's theorem, Krengel (cf. Theorem 2.6 in Paragraph 1.2.3 in [17]) uses an Arzelà-Ascoli technique which we will make use of at the end of this section. Under the assumption that $\tau : M \to M$ is continuous and that $f$ is a function on $M$ such that the functions $f_n := \frac{1}{n} \sum_{i=0}^{n-1} f \circ \tau^i$ ($n \in \mathbb{N}$) are equicontinuous on $M$, Krengel's theorem states that the convergence in Birkhoff's ergodic theorem is uniform in $t$. Together with the following Lemma, it yields Weyl's theorem.
Proof. We have to show that for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $n \in \mathbb{N}$ and all $s, t \in M$ with $d(s, t) < \delta$, $\big| \frac{1}{n} \sum_{i=0}^{n-1} \big( f(\tau^i(s)) - f(\tau^i(t)) \big) \big| < \varepsilon$. Fix $\varepsilon > 0$. It suffices to show that there is a $\delta > 0$ such that $|f(\tau^i(s)) - f(\tau^i(t))| < \varepsilon$ for all $i \in \mathbb{N}_0$ and all $s, t \in M$ with $d(s, t) < \delta$. Since $M$ is compact, $f$ must be uniformly continuous, i.e., there is a $\delta > 0$ such that for all $x, y \in M$ with $d(x, y) < \delta$ we have $|f(x) - f(y)| < \varepsilon$. By assumption, $d(\tau(s), \tau(t)) \le c\, d(s, t)$ for all $s, t \in M$, and therefore $d(\tau^i(s), \tau^i(t)) \le c^i d(s, t) \le d(s, t)$ for all $s, t \in M$ and for all $i \in \mathbb{N}$.
We now ask whether the assumption of continuity of the function $f$ in Weyl's theorem can be replaced by a weaker condition. The statement is certainly not true for all measurable functions, as a simple example shows: Fix $t_0 \in \mathbb{T}$. Its orbit under $\tau$ is the set $O := \{\tau^n(t_0) \mid n \in \mathbb{N}_0\}$. For the function $f := 1_O$, the ergodic averages converge to 1 for all $t \in O$, but $\int f \, d\mu = \mu(O) = 0$ since $O$ is countable.
This question of uniform convergence has a connection with unique ergodicity (cf., e.g., Chapter 4.1.e. of [16] or Theorem 6.19 in [22]). A continuous transformation $\tau$ of a compact metrizable space is called uniquely ergodic if it has only one invariant Borel probability measure. It can be shown that this measure must be ergodic, which implies that the ergodic averages of an integrable function converge almost surely to a constant. The Lebesgue measure is the only probability measure on $(\mathbb{T}, \mathcal{B})$ that is invariant under an irrational rotation of the circle, so such a rotation is uniquely ergodic. Oxtoby extended Weyl's theorem to the situation where $\tau$ is a uniquely ergodic transformation on a compact metric space. His theorem states uniform convergence of the ergodic averages of a continuous function. Note that, conversely, uniform convergence does not imply the continuity of the function. (Further conditions for this would be needed, such as topological transitivity of $\tau$ or constancy of the limit.)
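Weyl's theorem can be observed numerically. The following sketch is only a floating-point check on a finite grid of starting points, not a proof of uniformity. The test function $f(t) = \cos(2\pi t)$, the rotation number $\sqrt{2}$, the grid size, and the function names are choices made for this illustration.

```python
import math


def rotation_average(f, lam, t, n):
    """(1/n) * sum_{i<n} f(t + i*lam mod 1), evaluated in floating point."""
    return sum(f((t + i * lam) % 1.0) for i in range(n)) / n


def sup_error(f, integral, lam, n, grid=200):
    """Largest deviation from the integral over a grid of starting points t."""
    return max(
        abs(rotation_average(f, lam, j / grid, n) - integral)
        for j in range(grid)
    )


def f(t):
    """Continuous function on the circle with integral 0."""
    return math.cos(2.0 * math.pi * t)


lam = math.sqrt(2.0)  # irrational rotation number (up to floating point)
errors = {n: sup_error(f, 0.0, lam, n) for n in (20, 200, 2000)}
```

For this particular $f$ the averages form a geometric sum, so their absolute value is bounded by $1 / (n\, |\sin(\pi\lambda)|)$ for every starting point, which is what the decreasing values in `errors` reflect.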

Below Theorem 2.7 in Chapter 1 of [17], Krengel mentions that Weyl's theorem is sometimes spelled out for the class of Riemann-integrable functions. In fact, de Bruijn and Post [7] proved that a function is Riemann-integrable if and only if the convergence is uniform.
This also follows from our next proposition. We ask the following question: Considering uniform convergence of the ergodic averages along a continuous transformation on a compact real interval, can we pass automatically from the class of continuous functions to the class of functions which are integrable in the sense of Riemann? Then the convergence holds as well for any function which is integrable in the sense of Riemann.
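A Riemann-integrable but discontinuous example is the indicator of an interval. The sketch below checks, again only on a finite grid and in floating point, that the orbit averages of $1_{[0, 1/2)}$ along the rotation by $\sqrt{2}$ stay close to $1/2$ for all grid starting points. The interval, the rotation number, and the function names are assumptions made for this illustration.

```python
import math


def indicator_average(a, b, lam, t, n):
    """Fraction of the first n orbit points t + i*lam mod 1 that fall in [a, b)."""
    hits = sum(1 for i in range(n) if a <= (t + i * lam) % 1.0 < b)
    return hits / n


def sup_error(a, b, lam, n, grid=100):
    """Largest deviation from the interval length b - a over a grid of starting points."""
    return max(
        abs(indicator_average(a, b, lam, j / grid, n) - (b - a))
        for j in range(grid)
    )


lam = math.sqrt(2.0)  # irrational rotation number (up to floating point)
err = sup_error(0.0, 0.5, lam, 5000)
```

The value `err` is an empirical stand-in for the supremum over all starting points; it does not replace the argument for general Riemann-integrable functions.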
The proof is carried out using a common sandwich argument (cf., e.g., Chapter 4.1.e. of [16]). Applying the proposition to the situation of Oxtoby's Theorem yields
Now we focus on studying the convergence in the fibre. Fix $t \in M$, and define a function on $\Omega$ by $f(\omega) := F(t, \omega)$. This reduces the ergodic averages of the skew product to $\frac{1}{n} \sum_{i=0}^{n-1} f(\theta_{\kappa_i(t)} \omega)$, which we can view as a sort of ergodic average along the subsequence $(\kappa_i(t))_{i \in \mathbb{N}}$. For $L^p$-convergence of ergodic averages along subsequences, there is the following classical characterization.

Theorem 4.5. (Blum & Hanson)
Let $T$ be a transformation on $(\Omega, \mathcal{F})$. Suppose that $T$ is invertible and that both $T$ and $T^{-1}$ preserve $P$. Then $P$ is strongly mixing with respect to $T$ if and only if for all $p$, $1 \le p < \infty$, every strictly increasing sequence $(m_i)_{i \in \mathbb{N}}$ of integers, and every function $f \in L^p(\Omega, \mathcal{F}, P)$,
$$\lim_{n \to \infty} \Big\| \frac{1}{n} \sum_{i=1}^{n} f \circ T^{m_i} - \int f \, dP \Big\|_p = 0.$$
The key to the proof of Blum and Hanson's theorem is the following
Lemma 4.6. Under the assumptions of Theorem 4.5 and supposing that $P$ is strongly mixing with respect to $T$ we have for all $A \in \mathcal{F}$ and for every strictly increasing sequence $(k_i)_{i \in \mathbb{N}}$,
$$\lim_{n \to \infty} \Big\| \frac{1}{n} \sum_{i=1}^{n} 1_A \circ T^{k_i} - P(A) \Big\|_2 = 0.$$
Our next step is to carry over Blum and Hanson's theorem to the case of a $d$-parameter group of transformations $(\theta_k)_{k \in \mathbb{Z}^d}$ in place of the iterates of one transformation $T$. What we need is a condition on the $\mathbb{Z}^d$-valued sequence $(k_i)_{i \in \mathbb{N}}$ which replaces the strict monotonicity imposed on $(m_i)_{i \in \mathbb{N}}$. With an eye toward later applications on the product space we generalize the result further by showing that the $L^2$-convergence takes place uniformly over a family of functions, indexed by a set $I$. Recall that $\|\cdot\|$ denotes the maximum norm in $\mathbb{Z}^d$.
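Lemma 4.7. Assume that $P$ is strongly mixing with respect to $(\theta_k)_{k \in \mathbb{Z}^d}$, and let $(k_n(t))_{n \in \mathbb{N}}$ ($t \in I$) be a family of sequences with values in $\mathbb{Z}^d$ that fulfill, for all $m \in \mathbb{N}$,
$$\lim_{n \to \infty} \sup_{t \in I} \frac{1}{n^2} \Big| \big\{ (i, j) : 1 \le i, j \le n, \ \|k_i(t) - k_j(t)\| \le m \big\} \Big| = 0. \tag{12}$$
Then for all $A \in \mathcal{F}$,
$$\lim_{n \to \infty} \sup_{t \in I} \Big\| \frac{1}{n} \sum_{i=1}^{n} 1_A \circ \theta_{k_i(t)} - P(A) \Big\|_2 = 0.$$
Proof. For every $A \in \mathcal{F}$ and $t \in I$ we obtain by simple calculations,
$$\Big\| \frac{1}{n} \sum_{i=1}^{n} 1_A \circ \theta_{k_i(t)} - P(A) \Big\|_2^2 = \frac{1}{n^2} \sum_{i,j=1}^{n} \Big( P\big(\theta_{k_i(t)}^{-1} A \cap \theta_{k_j(t)}^{-1} A\big) - P(A)^2 \Big). \tag{13}$$
Fix $\varepsilon > 0$. Due to the mixing condition there is an $m \in \mathbb{N}$ such that $\big| P(\theta_k^{-1} A \cap \theta_l^{-1} A) - P(A)^2 \big| < \varepsilon$ for all $k, l \in \mathbb{Z}^d$ with $\|k - l\| > m$. By assumption (12) there is an $n_0 \in \mathbb{N}$ such that $\sup_{t \in I} \frac{1}{n^2} \big| \{ (i, j) : 1 \le i, j \le n, \ \|k_i(t) - k_j(t)\| \le m \} \big| < \varepsilon$ for all $n \ge n_0$. Applying the last two inequalities to (13), and bounding each summand with $\|k_i(t) - k_j(t)\| \le m$ by 1, yields for all $n \ge n_0$
$$\sup_{t \in I} \frac{1}{n^2} \sum_{i,j=1}^{n} \Big| P\big(\theta_{k_i(t)}^{-1} A \cap \theta_{k_j(t)}^{-1} A\big) - P(A)^2 \Big| \le 2\varepsilon,$$
and the assertion of the lemma follows by letting $\varepsilon$ tend to 0.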
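The role of the growth condition can be seen in a toy case that is not taken from the paper: assume an i.i.d. fair-coin field on $\mathbb{Z}^2$ and the event $A = \{\omega(0) = 1\}$. Then $1_A \circ \theta_{k_i}$ and $1_A \circ \theta_{k_j}$ are independent unless $k_i = k_j$, so the squared $L^2$-error of the averages equals $\frac{1}{4} \cdot \frac{1}{n^2} |\{(i, j) : k_i = k_j\}|$. The sketch below evaluates this expression exactly; the function name and the two example sequences are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def l2_error_squared(ks):
    """Exact value of E[((1/n) * sum_i 1_A o theta_{k_i} - 1/2)^2] for the
    i.i.d. fair-coin field and A = {omega(0) = 1} (hypothetical toy model).
    Only index pairs with k_i == k_j contribute, each with covariance 1/4."""
    n = len(ks)
    coinciding_pairs = sum(c * c for c in Counter(ks).values())
    return Fraction(coinciding_pairs, 4 * n * n)


# A staircase-like sequence with pairwise distinct points, and a constant sequence.
staircase = [(i, (2 * i) // 5) for i in range(1, 101)]
constant = [(0, 0)] * 100
```

For `staircase` the error is $1/(4n)$ and tends to 0, whereas for `constant` it stays at $1/4$. The constant sequence violates the growth condition, since all pairs $(i, j)$ are at distance 0.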
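Theorem 4.8. Assume that $P$ is strongly mixing with respect to $(\theta_k)_{k \in \mathbb{Z}^d}$. Let $(k_n(t))_{n \in \mathbb{N}}$ ($t \in I$) be a family of sequences with values in $\mathbb{Z}^d$, for which, for all $m \in \mathbb{N}$,
$$\lim_{n \to \infty} \sup_{t \in I} \frac{1}{n^2} \Big| \big\{ (i, j) : 1 \le i, j \le n, \ \|k_i(t) - k_j(t)\| \le m \big\} \Big| = 0.$$
Then for $1 \le p < \infty$ and for any $f \in L^p(\Omega, \mathcal{F}, P)$,
$$\lim_{n \to \infty} \sup_{t \in I} \Big\| \frac{1}{n} \sum_{i=1}^{n} f \circ \theta_{k_i(t)} - \int f \, dP \Big\|_p = 0. \tag{14}$$
Proof. As an immediate consequence of the preceding lemma, (14) is true for $p = 2$ for any simple function $g$ on $(\Omega, \mathcal{F})$. By a standard argument (cf., e.g., Lemma 4 in [5]), this convergence holds as well in $L^p(P)$, for $1 \le p < \infty$. Finally, for any function $f$ in $L^p(P)$, decomposition into positive and negative parts, $L^p(P)$-approximation by simple functions, and monotone convergence yield (14).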
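We will now combine the approaches developed separately for the base and the fibre transformation to derive a result about uniform convergence in the base and $L^p$-convergence in the fibre. A crucial ingredient is a condition that regulates the effects of the coupling sequence $(\kappa_n(t))_{n \in \mathbb{N}}$.
Theorem 4.9. Let $\tau : M \to M$ be continuous and uniquely ergodic, and suppose that $P$ is strongly mixing with respect to the group of transformations $(\theta_k)_{k \in \mathbb{Z}^d}$. Let $\kappa : M \to \mathbb{Z}^d$ be $\mathcal{B}$-measurable such that, for all $m \in \mathbb{N}$,
$$\lim_{n \to \infty} \sup_{t \in M} \frac{1}{n^2} \Big| \big\{ (i, j) : 1 \le i, j \le n, \ \|\kappa_i(t) - \kappa_j(t)\| \le m \big\} \Big| = 0. \tag{15}$$
Let $1 \le p < \infty$. Then for every $\mathcal{B} \otimes \mathcal{F}$-measurable function $F$ on $M \times \Omega$ such that $\sup_{t \in M} |F(t, \cdot)|$ is in $L^p(\Omega, \mathcal{F}, P)$ and $F(\cdot, \omega)$ is continuous on $M$ for $P$-almost every $\omega$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} F \circ S^i(t, \cdot) = \int F \, d(\mu \otimes P)$$
in $L^p(P)$ and uniformly in $t \in M$.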
Proof. We first prove the theorem in the case when F is the indicator of a set of the form U × A, where U is the intersection of finitely many metric balls in M or their complements, and A ∈ F. By (5), the expression 1 n then transforms to It may be replaced by 1 without affecting the asymptotic behavior of (18) as can be seen as follows. We may bound Now, we argue as in the second part of the proof of Lemma 4.7, replacing the sequence (k i ) i∈N by (κ i (t)) i∈N , and using assumption (15) instead of (12). This proves that the difference created by the change (19) converges to 0 uniformly with respect to t.
Since the term in the square brackets in the second addend in (18) equals $1_U(\tau^i(t)) P(A) + 1_U(\tau^j(t)) P(A)$, the whole expression (18) simplifies to which can be further reduced to 1/n Since $\tau$ is uniquely ergodic and $\mu(\partial U) = 0$, Corollary 4.1.14 in [16] tells us that the first factor converges to 0 uniformly in $t$, which concludes the first part of the proof. To pass from $L^2$-convergence to general $L^p$, use again a standard argument (for instance, Lemma 4 in [5]).
Now we let $F$ be a general function satisfying the conditions of the theorem. We need to find for every positive $\varepsilon$ a sequence of metric balls $U_i$ in $M$ and $A_i \in \mathcal{F}$, with real numbers $a_i$ such that, for all $t \in M$, $\|F(t, \cdot) - I(t, \cdot)\|_p < \varepsilon$, where $I = \sum_{i=1}^{n} a_i 1_{U_i \times A_i}$. It will then follow that For $\omega \in \Omega$ and $c > 0$, let $\delta(c, \omega)$ be the modulus of continuity for the function $F(\cdot, \omega)$. Define the sets Then the sequence of functions on Ω, , is bounded by $\sup_{t \in M} |F(t, \omega)|^p$, which is integrable, and converges to 0 for every $\omega$. By the dominated convergence theorem, the integral of $F_k$ converges to 0 as $k$ goes to infinity. Choose a $k$ such that $\int F_k(\omega) \, P(d\omega) \le (\varepsilon/2)^p$. Since $M$ is compact, we may find a finite sequence $t_1, \dots, t_r \in M$ such that the balls of radius $1/k$ around these centers cover $M$. We also define a sequence of real numbers $-k - 1 = s_0 < \dots < s_{r'} = k$ such that the difference between any two successive elements is less than $\varepsilon/8$. Now we define a collection of sets $U_{i,j}$ and $A_{i,j}$ indexed by $r \times r'$. We start with $U_{i,j}$ as the ball of radius $1/k$ around $t_i$, and then remove the intersections, so that $U_{i,j}$ is the same for all $j$, and running through $1 \le i \le r$ yields a disjoint cover of $M$. The sets $A_{i,j}$ are defined by Let $a_{i,j} = s_j$. We throw in one additional product set, $U_0 = M$ and $A_0 = D_k(\varepsilon/8)^c \cup M_k^c$ with $a_0 = 0$, and define the simple function $I(t, \omega)$ as indicated above.
Then for any t ∈ M, We already assumed (in defining k) that the second term is smaller than ε/2. For every t, there is a unique pair i, j such that t ∈ U i,j and ω ∈ A i,j . By construction, I(t, ω) = s j , so the integrand in the first term is bounded by This in turn is bounded by 2 p (ε/4) p < (ε/2) p , since ω is not in A 0 and d(t i , t) < 1/k, completing the proof.
Our next goal is to derive a statement about $P$-almost sure convergence rather than $L^1(P)$-convergence in the fibre. Further conditions on $F$ are needed. $P$-almost sure convergence in ergodic theorems involving weights or subsequences is a very subtle question (cf., e.g., [4]). Choosing a function $F$ which is constant in $\omega$ and considering Lemma 4.2 suggests that we need an equicontinuity assumption in $t$. Note also the additional assumption that non-empty open sets in $M$ have positive mass under $\mu$.
Proof. We may assume without loss of generality that E[F |J ] = 0. The general case can be reduced to this by subtracting E[F |J ] on both sides and making use of the invariance of E[F |J ] under S. The first step is to construct a countable dense set M 1 ⊂ M and a set N 1 ⊂ Ω with P (N 1 ) = 0 such that Since M is compact, the conditions on F assure that F ∈ L 1 (Ω, F , P ), and therefore by (8) there is a set M 1 ⊂ M with µ(M 1 ) = 1 such that for any t ∈ M 1 there is a set N (t) ⊂ Ω with P (N (t)) = 0 and (21) holds for all ω ∈ Ω \ N (t). M 1 is dense in M because its complement has measure zero with respect to µ and therefore, by assumption, contains no non-empty open subsets.
Since $M$ is separable we can find a countable dense subset $C \subset M$, and because $M_1$ is dense in $M$, we can approximate any $x \in C$ by a sequence $(a_j(x))_{j \in \mathbb{N}}$ with $a_j(x) \in M_1$ for all $j \in \mathbb{N}$. Then $M_1 := \bigcup_{x \in C} \bigcup_{j \in \mathbb{N}} \{a_j(x)\}$ defines a countable dense subset of $M$, and $N_1 := \bigcup_{t \in M_1} N(t)$ defines a subset of $\Omega$ which fulfills (21). This completes the first step.
For the next step, choose s ∈ M and fix ε > 0. By equicontinuity, there is a set N 0 ⊂ Ω with P (N 0 ) = 0 and a δ > 0 such that for all r, t ∈ M with d(r, t) < δ for all n ∈ N and all ω ∈ Ω \ N 0 .
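Define $N := N_0 \cup N_1$ and fix $\omega \in \Omega \setminus N$. Since $M_1$ is dense in $M$ we can find a $t \in M_1$ with $d(s, t) < \delta$, and by (21) there is an $n_1 \in \mathbb{N}$ such that 1 n Combining the last two inequalities leads to the desired inequality $\big| \frac{1}{n} \sum_{i=0}^{n-1} F \circ S^i(s, \omega) \big| < \varepsilon$ for all $n \ge n_1$ and all $\omega \in \Omega \setminus N$.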
For uniform convergence w.r.t. the first variable, we use a standard compactness argument. $M$ can be covered by a finite number $K$ of $\delta$-neighborhoods, whose centers are denoted by $s_1, \dots, s_K$. Applying the reasoning of the last step to each of $s_1, \dots, s_K$ we find $n_0 \in \mathbb{N}$ such that 1 n for all $n \ge n_0$, $k \in \{1, \dots, K\}$, and $\omega \in \Omega \setminus N$.
For an arbitrary s ∈ M there exists k ∈ {1, ..., K} such that d(s, s k ) < δ, and by (22) we obtain for all n ∈ N and all ω ∈ Ω \ N.
Finally, the convergence (20) follows by the last two inequalities.
We conclude the paper with an application of our results to the question that originally motivated this work. Consider the ergodic averages of a function of a random field restricted to the staircase pattern defined by the approximation of a line by a lattice as in (1). We actually allow the function to be multivariate. In this context, this means that it may depend on a finite number of such steps. What can be said about the asymptotic behavior of the ergodic averages of such a function?
Let $P$ be a two-dimensional random field, that is, a probability measure on $\Omega = \Upsilon^{\mathbb{Z}^2}$ invariant w.r.t. the group of shift transformations $(\vartheta_v)_{v \in \mathbb{Z}^2}$ (see just above Corollary 2.3 for details). For a set $U \subseteq \mathbb{Z}^2$ let $\pi_U(\omega) := \omega(U)$; then $P_U := P \circ \pi_U^{-1}$ defines a probability distribution on $\Upsilon^{|U|}$. Let $[x]$ denote the integer part of a real number $x$. Recall that $L_{\lambda,t}(z) = (z, [\lambda z + t])$ ($z \in \mathbb{Z}$) is an approximation of the line with slope $\lambda$ and $y$-intercept $t$ by elements of the lattice $\mathbb{Z}^2$. For $z_1, z_2 \in \mathbb{Z}$, $L_{\lambda,t}(z_1, \dots, z_2)$ denotes the $z_1$-th to the $z_2$-th step. We will use the short form $P_{\lambda,t,m} := P_{L_{\lambda,t}(0, \dots, m-1)}$.
(i) Let $\lambda$ be rational and represented as $\lambda = p/q$ for $p \in \mathbb{Z}$ and $q \in \mathbb{N}$ with no common divisor. If $P$ is ergodic w.r.t. the shift transformations then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f\big(\omega(L_{\lambda,t}(i, \dots, i+m-1))\big) = \frac{1}{q} \sum_{\nu=0}^{q-1} \int_{\Upsilon^m} f(y) \, P_{\lambda, \tau^\nu(t), m}(dy)$$
in $L^1(P)$, for $P$-almost all $\omega \in \Omega$, and uniformly in $t \in [0, 1]$.
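(ii) Let $\lambda$ be irrational. Assume further that $f\big(\omega(L_{\lambda,t}(0, \dots, m-1))\big)$ is Riemann-integrable with respect to $t \in [0, 1]$, for $P$-almost all $\omega \in \Omega$. If $P$ is strongly mixing w.r.t. the shift transformations then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f\big(\omega(L_{\lambda,t}(i, \dots, i+m-1))\big) = \int_0^1 \int_{\Upsilon^m} f(y) \, P_{\lambda,s,m}(dy) \, ds$$
in $L^1(P)$ and uniformly in $t \in [0, 1]$.
Proof. Let $\tau_\lambda$ be the rotation on the circle defined in Example 3.4, and $S_\lambda$ the skew product defined in (3). Use $\kappa(t) := (1, [t + \lambda])$ and $\kappa_n$ as defined in (5). It is easy to show that $\kappa_n(t) = L_{\lambda,t}(n)$ for all $n \in \mathbb{N}$: For $n = 1$, it follows immediately from plugging in the definition, and it remains to pass from $n$ to $n + 1$ by induction. This is obvious for the first coordinate. The second coordinate of $\kappa(\tau_\lambda^n(t))$ can be written as $[\tau_\lambda^n(t) + \lambda] = [t + (n+1)\lambda] - [t + n\lambda]$. This implies that the iterates of the skew product are of the form $S_\lambda^n(t, \omega) = \big(\tau_\lambda^n(t), \theta_{L_{\lambda,t}(n)}\, \omega\big)$ ($n \in \mathbb{N}$), capturing the lattice approximation of the line. Using the second equality in (5), it also follows that $L_{\lambda,t}(n + u) = L_{\lambda,t}(n) + L_{\lambda,\tau^n(t)}(u)$ for all $n, u \in \mathbb{N}_0$. We thus get $L_{\lambda,t}(i, \dots, i+m-1) = L_{\lambda,t}(i) + L_{\lambda,\tau^i(t)}(0, \dots, m-1)$. Introducing the function $F_\lambda(t, \omega) := f\big(\omega(L_{\lambda,t}(0, \dots, m-1))\big)$ we obtain
$$\frac{1}{n} \sum_{i=0}^{n-1} f\big(\omega(L_{\lambda,t}(i, \dots, i+m-1))\big) = \frac{1}{n} \sum_{i=0}^{n-1} F_\lambda\big(S_\lambda^i(t, \omega)\big). \tag{23}$$
Consider case (i). With the representation $\lambda = p/q$ as above, $\tau_\lambda$ is periodic with period $q$. By Corollary 4.1 and (23), the averages on the left side in (i) converge uniformly in $t$, for $P$-almost all $\omega$ and in $L^1(P)$. Since $P$ is ergodic with respect to $\theta_{\kappa_q(t)}$ for all $t \in M$, the limit equals $\frac{1}{q} \sum_{\nu=0}^{q-1} \int_\Omega F_\lambda\big(\tau_\lambda^\nu(t), \omega\big) \, P(d\omega)$, which simplifies to the expression on the right-hand side of (i).
Consider case (ii). For an irrational $\lambda$, $\tau_\lambda$ is uniquely ergodic. Since $\|\kappa_n(t)\|$ (with $\|\cdot\|$ the maximum norm) is bounded from below by $n$, the sequence tends to infinity as $n$ goes to infinity. Corollary 2.3 with $v_1 = (1, 0)$ and $v_2 = (0, 1)$ implies the ergodicity of $S_\lambda$. It remains to verify condition (15). The latter easily follows from $\|\kappa_i(t) - \kappa_j(t)\| \ge |i - j|$, since $\frac{1}{n^2} \big| \{ (i, j) : 1 \le i, j \le n, \ |i - j| \le m \} \big|$ converges to 0 for all $m \in \mathbb{N}$.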
Now, Corollary 4.10 applied to (23) implies that the averages in (ii) converge uniformly in $t$ and in $L^1(P)$. The limit equals $\int_0^1 \int_\Omega F_\lambda(t, \omega) \, P(d\omega) \, dt$, which simplifies to the expression on the right-hand side of (ii).
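The verification of condition (15) for the staircase sequence can be illustrated by counting close pairs directly. The sketch below assumes $\kappa_n(t) = L_{\lambda,t}(n) = (n, [\lambda n + t])$ with $[x]$ the floor function; the function names and the sample values of $\lambda$, $t$, and $m$ are hypothetical. Since the first coordinates of $\kappa_i(t)$ and $\kappa_j(t)$ differ by $|i - j|$, at most $n(2m + 1)$ pairs can be within maximum-norm distance $m$, so the fraction of close pairs is at most $(2m + 1)/n$.

```python
from fractions import Fraction
from math import floor, sqrt


def staircase(lam, t, n):
    """Lattice approximation L_{lam,t}(n) = (n, [lam*n + t]) with [x] = floor(x)."""
    return (n, floor(lam * n + t))


def max_norm_distance(u, v):
    """Maximum-norm distance between two points of Z^2."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def close_pair_fraction(lam, t, n, m):
    """(1/n^2) * #{(i, j): 1 <= i, j <= n, ||kappa_i(t) - kappa_j(t)|| <= m}."""
    points = [staircase(lam, t, i) for i in range(1, n + 1)]
    close = sum(1 for u in points for v in points if max_norm_distance(u, v) <= m)
    return Fraction(close, n * n)


lam = sqrt(2.0)  # irrational slope (up to floating point)
fractions_by_n = {n: close_pair_fraction(lam, 0.25, n, m=3) for n in (20, 40, 80)}
```

The values in `fractions_by_n` decrease roughly like $1/n$, in line with the bound above, which does not depend on $t$.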
Note that P -almost everywhere convergence in (ii) can be derived as well, but requires additional conditions of the form stated in Theorem 4.11. For m = 1 the limit actually simplifies to the integral of f with respect to the marginal distribution of P in the origin. In particular, it is independent of t, and it is the same in (i) and (ii). The simplest interesting case is m = 2. Let