Inverting the Furstenberg correspondence

Given a sequence of subsets A_n of {0,...,n-1}, the Furstenberg correspondence principle provides a shift-invariant measure on Cantor space that encodes combinatorial information about infinitely many of the A_n's. Here it is shown that this process can be inverted, so that for any such measure there are finite sets whose combinatorial properties approximate it arbitrarily well. Moreover, we obtain an explicit upper bound on how large n has to be to obtain a sufficiently good approximation. As a consequence of the inversion theorem, we show that every computable invariant measure on Cantor space has a computable generic point. We also present a generalization of the correspondence principle and its inverse to countable discrete amenable groups.


Introduction
For each n, let A n be a subset of {0, . . . , n− 1}. The Furstenberg correspondence principle, described more precisely in Section 2, allows one to assign a shift-invariant measure on Cantor space, 2 N , which encodes combinatorial information about infinitely many of the A n 's. This correspondence lies at the heart of Furstenberg's remarkable ergodic-theoretic proof [10,11] of Szemerédi's theorem, and allows one to use facts about shift-invariant measures on Cantor space to draw conclusions about subsets of {0, . . . , n − 1}, for sufficiently large n.
It is natural to ask exactly which shift-invariant measures on 2 N arise from this correspondence. It is not hard to show that any ergodic measure can be obtained in this way; any generic point of the system reflects all the relevant information about the measure, and the desired finite approximations can be read off from such a point. This fact has been noted by a number of authors, including Bergelson. Theorem 3.1 of Section 3 shows that, in fact, every shift-invariant measure on 2 N , ergodic or not, arises in such a way. The proof provides an explicit construction of a sequence of finite combinatorial approximations to any given measure, with, moreover, a uniform upper bound on how large n has to be in order to approximate the measure to a given accuracy.
Section 4 considers some consequences for computable measure theory. There is a precise sense, described in [13,14,18,29], in which a dynamical system can be said to be computable; and similarly for a transformation of such a space, and an element of the underlying space. In particular, 2 N equipped with the left shift is computable, a computable element of 2 N is a computable binary sequence, and a computable measure on 2 N is an algorithm which computes (arbitrarily good rational approximations to) the measure of each basis set in the usual topology.
It is by now well known that many common ergodic-theoretic constructions are not computable. For example, V'yugin [27,28] has shown that one cannot generally compute a bound on the rate of convergence of a sequence of ergodic averages A n f , even when f is computable (see also [2,Section 5] and [1]). Similarly, there is a sense in which ergodic decomposition is not computable [17]. The passage from a sequence of sets (A n ) to one of the measures guaranteed to exist by the Furstenberg correspondence principle is certainly not computable, since it is not even continuous in the data (see footnote 1). The proof of Theorem 3.1 shows, however, that passage in the other direction is fully effective: one can explicitly compute a sequence of combinatorial approximations from the given measure.
This fact has a surprising consequence: every computable shift-invariant measure on 2 N , ergodic or not, has a computable generic point. A number of recent papers [12,13,14] are concerned with identifying conditions under which a computable measure preserving system has such an element. Theorem 4.2 shows that not only is this always the case when the underlying dynamical system is 2 N with the shift, but, moreover, one has explicit rates of convergence that are independent of the measure in question.
The existence of generic points for shift-invariant measures on 2 N was first established by Colebrook [9]. The construction in Section 4 depends on the fact that one can obtain a point of 2 N by piecing together finite specifications, and Sigmund [23] has shown that generic points exist, more generally, for measure-preserving systems satisfying the "specification property." But it does not seem possible to adapt the computability results here to this more general setting; this is discussed in Section 6.
In recent years, the correspondence principle has been more broadly construed as a way of relating combinatorial configurations in a discrete group with measure-preserving systems on which this group acts. In particular, the principle has been generalized to countable discrete amenable groups in [7], [3,Section 4], and, more recently, [4]. It is noted in [4, Section 1] that one can, conversely, pass in the other direction from "space averages" to combinatorial "group averages" in the case where the action of the group on the space is ergodic. Section 5 below again lifts the restriction to ergodic actions, and provides an effective version of the transformation.
I am grateful to Bryna Kra and Henry Towsner for comments and suggestions on an earlier draft; to Manfred Denker and Matthieu Hoyrup for subsequently pointing me to Sigmund's results; and, especially, to an anonymous referee for many helpful comments, corrections, suggestions, and references.

Preliminaries
For each natural number n, it is convenient to identify n with the set {0, . . . , n − 1}, and to identify each subset A of n with the finite binary string of length n whose ith digit is 1 if and only if i is in A. Note that this representation encodes both the set A and the fact that A is to be viewed as a subset of n. If σ is another binary sequence, say that σ occurs at position i of A if for every j less than the length of σ, the jth bit of σ agrees with the bit of A at (i + j) mod n; that is, we let σ wrap around to the beginning of A, if necessary, in doing the comparison. Given A and σ, set $N_A(\sigma) = |\{ i < n : \sigma \text{ occurs at position } i \text{ of } A \}|$ and $D_A(\sigma) = N_A(\sigma)/n$.
Footnote 1. There is, however, always such a measure that is low in the Turing jump of the sequence $(A_n)$. In this sense the Furstenberg correspondence principle is analogous to the Bolzano-Weierstrass principle; see [20,21].
Let $2^N$ denote Cantor space, that is, the space of functions from N to the discrete space {0, 1}, under the product topology. If we view elements ω of $2^N$ as infinite binary sequences, it makes sense to write σ ⊂ ω to denote that the finite sequence σ is an initial segment of ω. The collection of cylinder sets [σ] provides a basis for the topology on $2^N$, where [σ] = {ω | σ ⊂ ω} is the set of infinite sequences extending σ. I will use B to denote the Borel sets in this topology. Let T denote the shift-left map on $2^N$, defined by setting (T ω)(n) = ω(n + 1) for every n. Notice that for every σ, $T^{-1}[\sigma] = [0\sigma] \cup [1\sigma]$, the set of infinite binary strings in which σ occurs in the second position.
Every finite subset A of n gives rise to a measure $\mu_A$ on the Borel subsets of Cantor space defined by setting $\mu_A([\sigma]) = D_A(\sigma)$ and applying the Caratheodory extension theorem. Because occurrences are counted with wraparound, every occurrence of σ in A is preceded by exactly one bit, so $D_A(\sigma) = D_A(0\sigma) + D_A(1\sigma)$. By the observations above, we have $\mu_A(T^{-1}[\sigma]) = \mu_A([0\sigma]) + \mu_A([1\sigma]) = D_A(0\sigma) + D_A(1\sigma) = D_A(\sigma) = \mu_A([\sigma])$, which is to say, $\mu_A$ is invariant under the shift.
Now let $(A_n)$ be a sequence of sets of natural numbers, with $A_n \subseteq n$ for each n. By the compactness of the space of measures on Cantor space in the vague topology, there is a subsequence $(\mu_{A_{n_i}})_{i \in N}$ of $(\mu_{A_n})$ that converges weakly to a measure µ on Cantor space. In particular, for each σ, the sequence $(\mu_{A_{n_i}}([\sigma]))$, which is equal to the sequence $(D_{A_{n_i}}(\sigma))$, converges to µ([σ]). Thus we have:
Theorem 2.1. Let $(A_n)$ be a sequence of sets with $A_n \subseteq n$ for each n. Then there are a shift-invariant measure µ on $2^N$ and a subsequence $(A_{n_i})$ of $(A_n)$ with the property that for every finite binary sequence σ, $\mu([\sigma]) = \lim_{i \to \infty} D_{A_{n_i}}(\sigma)$.
This theorem can be proved more directly by iteratively thinning the sequence $(A_n)$ so that the densities converge for each σ, taking a diagonal subsequence, and then defining µ([σ]) to be the resulting limit.
We can take Theorem 2.1 to be a precise statement of the Furstenberg correspondence principle, though sometimes the phrase is used to refer to one of its consequences. Note that since the size of σ remains fixed as n grows, the limits in question are not changed if we do not take wraparound into account when counting the number of occurrences of σ in A n .
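The counting conventions of this section can be sketched in a few lines of Python. This is only an illustration, under the assumption that N_A(σ) is the number of positions i < n at which σ occurs in A (with wraparound) and that D_A(σ) = N_A(σ)/n; the function names are hypothetical and not part of the text.

```python
from fractions import Fraction

def occurs_at(sigma: str, a: str, i: int) -> bool:
    """True if sigma occurs at position i of a, wrapping around the end of a."""
    n = len(a)
    return all(sigma[j] == a[(i + j) % n] for j in range(len(sigma)))

def count(sigma: str, a: str) -> int:
    """N_A(sigma): the number of positions i < n at which sigma occurs in a."""
    return sum(occurs_at(sigma, a, i) for i in range(len(a)))

def density(sigma: str, a: str) -> Fraction:
    """D_A(sigma) = N_A(sigma) / n, as an exact rational."""
    return Fraction(count(sigma, a), len(a))

# The subset {1, 2, 4, 7, 8} of 10, written as a binary string of length 10.
a = "0110100110"
```

Because of the wraparound, the densities are additive on both sides, for example `density("1", a) == density("10", a) + density("11", a) == density("01", a) + density("11", a)`, which is what makes the induced measure shift-invariant.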

Inverting the correspondence
In this section we complement Theorem 2.1 by showing that, in fact, any shift-invariant measure µ can be obtained as the result of the construction.
Theorem 3.1. For every j and ε > 0 there is an $n = 2^{O(j/\varepsilon)}$ such that for every shift-invariant measure µ on $2^N$ there is an A ⊆ n such that for every σ of length at most j, $|\mu([\sigma]) - D_A(\sigma)| < \varepsilon$. Moreover, there is an m = m(j, ε) such that for every n ≥ m there is such an A ⊆ n.
We can abbreviate the conclusion of the theorem by saying that A gives a (j, ε)-good approximation to µ. The first claim provides an explicit bound on how large A needs to be to provide such an approximation. It would be interesting to know whether this bound can be improved. It is the second claim, however, that provides a natural inverse to Theorem 2.1: if for each j we choose $m_j$ large enough to ensure there are (j, 1/j)-good approximations for any $n \ge m_j$, then for any µ we can build a sequence which contains such approximations between $m_j$ and $m_{j+1}$. Any measure satisfying the conclusion of Theorem 2.1 then has to coincide with µ.
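Proof. Fix j and ε > 0, and let k be an integer much larger than j and 1/ε, to be specified more precisely later on. Then the set {[τ] | length(τ) = k} forms a partition of $2^N$, and if σ has length at most j and i < k − j, then $T^{-i}[\sigma] \cap [\tau]$ is equal to [τ] if σ occurs at position i in τ, and is empty otherwise. Recall that $N_\tau(\sigma)$ denotes the number of times that σ occurs in τ. By the T-invariance of µ, we have
\[ \mu([\sigma]) = \frac{1}{k} \sum_{i<k} \mu(T^{-i}[\sigma]) = \frac{1}{k} \sum_{i<k} \sum_{\tau} \mu(T^{-i}[\sigma] \cap [\tau]) = \sum_{\tau} \mu([\tau]) \frac{N_\tau(\sigma)}{k} + O(j/k) = \sum_{\tau} \mu([\tau]) D_\tau(\sigma) + O(j/k). \]
In other words, if the τ's are sufficiently long, the average of the densities of σ in each τ, weighted by µ([τ]), provides a good approximation to µ([σ]). We now obtain the desired set A by concatenating copies of the τ's in the right proportion; the fact that k is much larger than j will ensure that the occurrences of σ near the border between copies of the τ's will have a negligible contribution to the overall density.
More precisely, let l be much larger than k and 1/ε, and let $\tau_0, \tau_1, \ldots, \tau_{2^k-1}$ be an enumeration of the sequences of length k. Choose natural numbers $a_0, \ldots, a_{2^k-1}$ with $\sum_u a_u = l$ and $|a_u/l - \mu([\tau_u])| < 1/l$ for each u. Let A be the set obtained by concatenating $a_0$ copies of $\tau_0$, followed by $a_1$ copies of $\tau_1$, and so on. Then A has length kl. Accounting for occurrences of σ near the border between such copies, the total number of occurrences of σ in A is given by
\[ N_A(\sigma) = \sum_{u<2^k} a_u N_{\tau_u}(\sigma) + O(jl). \]
Dividing by kl, we have
\[ D_A(\sigma) = \sum_{u<2^k} \frac{a_u}{l} D_{\tau_u}(\sigma) + O(j/k) = \sum_{u<2^k} \mu([\tau_u]) D_{\tau_u}(\sigma) + O(2^k/l) + O(j/k) = \mu([\sigma]) + O(2^k/l) + O(j/k), \]
using the previous expression for µ([σ]). Now we only need to choose k = O(j/ε) large enough to make the second error less than ε/2, and then choose $l = O(2^k/\varepsilon) = 2^{O(j/\varepsilon)}$ to make the first error less than ε/2. The length of A is then $kl = 2^{O(j/\varepsilon)}$. The last claim of the theorem is easily obtained, for example, by concatenating sufficiently many copies of A and truncating as necessary.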
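The concatenation step of the proof can be sketched as follows. This is a minimal illustration rather than the construction with its stated bounds: the parameters k and l are supplied by the caller, the measure is assumed to be given exactly as a function `mu` from binary strings to rationals, and the rounding rule (floors, then the largest fractional parts rounded up) is one assumed way of choosing the multiplicities so that they sum to l. The example measure `mixture`, the average of the point masses at 000... and 111..., is shift-invariant but not ergodic.

```python
from fractions import Fraction
from itertools import product
from math import floor

def density(sigma, a):
    """Cyclic density D_A(sigma) of the pattern sigma in the string a."""
    n = len(a)
    hits = sum(all(sigma[j] == a[(i + j) % n] for j in range(len(sigma)))
               for i in range(n))
    return Fraction(hits, n)

def approximate(mu, k, l):
    """Concatenate a_u copies of each block tau_u of length k, where the a_u
    sum to l and each a_u / l is within 1/l of mu(tau_u)."""
    taus = ["".join(bits) for bits in product("01", repeat=k)]
    weights = [Fraction(mu(t)) for t in taus]
    counts = [floor(w * l) for w in weights]
    # Hand the remaining copies to the blocks with the largest fractional parts.
    remainder = l - sum(counts)
    order = sorted(range(len(taus)),
                   key=lambda u: weights[u] * l - counts[u], reverse=True)
    for u in order[:remainder]:
        counts[u] += 1
    return "".join(t * c for t, c in zip(taus, counts))

def mixture(s):
    """Assumed example: mu = (delta_{000...} + delta_{111...}) / 2."""
    zeros = Fraction(1, 2) if "1" not in s else 0
    ones = Fraction(1, 2) if "0" not in s else 0
    return zeros + ones

A = approximate(mixture, 4, 8)   # sixteen 0s followed by sixteen 1s
```

For this A, every pattern of length at most 2 has density within 1/32 of its measure; the only discrepancies come from the two borders between the block of 0s and the block of 1s, as the proof predicts.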
As is well known, $2^N$ with the left shift is universal for measure-preserving systems with a distinguished set, in the following sense (see, for example, [25, Example 2.2.6]). Let X = (X, C, ν, U ) be a measure-preserving system, and let E be a C-measurable set. Define a function ϕ from X to $2^N$ by setting $\varphi(x)(i) = 1$ if $U^i x \in E$, and $\varphi(x)(i) = 0$ otherwise. In other words, the 1's in ϕ(x) correspond to places where the orbit of x under U lands in E. Then for any x ∈ X, T ϕ(x) = ϕ(U x), and $\varphi^{-1}([1]) = E$. Moreover, $\varphi^{-1}$, as a function on subsets of $2^N$, is a σ-algebra homomorphism from B onto the σ-subalgebra of C generated by E. Define the "push-forward" measure µ on B by setting $\mu(A) = \nu(\varphi^{-1}(A))$ for every A in B. Then µ is a T-invariant measure, which encodes information about the measure ν on intersections of finite shifts of E and their complements: if we let $E^1 = E$ and $E^0 = X \setminus E$, then for every σ, $\mu([\sigma]) = \nu(\bigcap_{i < \mathrm{length}(\sigma)} U^{-i} E^{\sigma(i)})$. This allows us to generalize the statement of Theorem 3.1:
Corollary 3.2. Let X = (X, C, ν, U ) be any measure-preserving system, and let E be any C-measurable set. Then for each j and ε > 0, there exist n and A ⊆ n such that for every σ of length at most j, $|\nu(\bigcap_{i < \mathrm{length}(\sigma)} U^{-i} E^{\sigma(i)}) - D_A(\sigma)| < \varepsilon$, where $E^1 = E$ and $E^0 = X \setminus E$.
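To make the coding map concrete, here is a small sketch for a hypothetical finite system: rotation by 1 on Z/5 with the uniform measure and E = {0, 2}. The system, the set E, and the function names are assumptions chosen only for illustration.

```python
from fractions import Fraction

def phi_prefix(x, U, in_E, n):
    """First n bits of phi(x): bit i is 1 exactly when U^i(x) lies in E."""
    bits = []
    for _ in range(n):
        bits.append("1" if in_E(x) else "0")
        x = U(x)
    return "".join(bits)

# Hypothetical example system: rotation by 1 on Z/5, with E = {0, 2}.
points = range(5)
U = lambda x: (x + 1) % 5
in_E = lambda x: x in {0, 2}

def pushforward(sigma):
    """mu([sigma]) = nu(phi^{-1}[sigma]) for the uniform measure nu on Z/5."""
    hits = sum(phi_prefix(x, U, in_E, len(sigma)) == sigma for x in points)
    return Fraction(hits, len(points))
```

Dropping the first bit of `phi_prefix(x, U, in_E, n + 1)` gives `phi_prefix(U(x), U, in_E, n)`, which is the finite shadow of the identity T ϕ(x) = ϕ(U x).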

Consequences for computable measure theory
We can also consider Theorem 3.1 in computability-theoretic terms. This presupposes some notions from computable analysis and measure theory; I will sketch the necessary background here, and refer the reader to [8,18,29] for details.
Computability theory starts with the notion of a computable function from the natural numbers to natural numbers, or from finite strings of symbols to finite strings of symbols. One then obtains notions of computability with respect to other finitary objects (integers, pairs of numbers, finite graphs, and so on) by fixing encodings of these objects as numbers or strings. Intuitively, a function from a set of finitary objects to another is said to be computable if there is an algorithm, or computer program, that computes it. This notion can be made precise using the Turing machine model of computation and fixing the various encodings, but for practical purposes the intuitive description suffices.
Computable analysis has to take into account the representation of infinitary objects, like the real numbers, which cannot be encoded with a finite amount of data. We define a real number r to be computable if there is a computable function f from N to Q such that for every i, $|r - f(i)| < 2^{-i}$. In other words, r is computable if one can compute arbitrarily good rational approximations to it. Notice that the choice of $i \mapsto 2^{-i}$ as a rate of convergence is somewhat arbitrary, and the definition is unchanged if one replaces $2^{-i}$ with any computable sequence of rationals that decreases to 0; from any such representation we can obtain any other. Notice that we have defined a real number to be computable if it has a computable representation as a Cauchy sequence of rationals with a fixed rate of convergence; a given computable real will have multiple computable representations.
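As a small illustration of this definition, the following sketch gives one computable representation of the square root of 2, together with a conversion to a different rate of convergence. The choice of example and the helper names are assumptions made here, not part of the text.

```python
from fractions import Fraction
from math import isqrt

def sqrt2(i: int) -> Fraction:
    """A rational q with |sqrt(2) - q| < 2**-i, for every i >= 0."""
    # isqrt(2 * 4**i) is the integer part of 2**i * sqrt(2).
    return Fraction(isqrt(2 * 4 ** i), 2 ** i)

def reciprocal_rate(f):
    """From a representation with rate 2**-i, build one with rate 1/(i + 1)."""
    # 2**k > i + 1 when k is the bit length of i + 1, so 2**-k < 1/(i + 1).
    return lambda i: f((i + 1).bit_length())
```

For instance, `sqrt2(10)` is a rational within 1/1024 of the square root of 2, and `reciprocal_rate(sqrt2)(9)` is within 1/10 of it.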
How shall we define a computable function from R to R? The problem is that the inputs to such a function are no longer finite objects. The standard solution is to say that a function F from R to R is computable if there is an algorithm which, on input i, is allowed to ask for rational approximations of the input x to any desired accuracy and, after finitely many such queries, terminates and returns an approximation of F(x) to within $2^{-i}$. The notion can be made precise in terms of a Turing machine with access to an oracle tape that contains a representation of the input, but, once again, for practical purposes, the intuitive description suffices. This model of computation on the real numbers was originally proposed by Grzegorczyk [15], and is an instance of what is generally referred to as "type 2 computability" today [8,29]. Notice that the algorithm computing F is supposed to act appropriately on any representation of a real number x, whether x is computable or not. One can show that any computable function F from R to R is continuous; roughly speaking, this holds because finite approximations to the value of F(x) depend on only a finite amount of the data representing x.
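The oracle model can be illustrated with the squaring function. In the sketch below, the oracle is modeled as a Python function `x_approx` with |x − x_approx(n)| < 2^{-n}; this modeling choice, and all names, are assumptions for illustration. The algorithm makes two queries: one to bound |x|, and one at an accuracy chosen so that the square is correct to within 2^{-i}.

```python
from fractions import Fraction
from math import ceil, isqrt

def square(x_approx, i):
    """Approximate x**2 to within 2**-i using finitely many oracle queries."""
    q0 = x_approx(0)                  # |x - q0| < 1
    bound = abs(q0) + 1               # hence |x| < bound
    # Pick extra with 2**extra > 2 * bound + 1, which bounds |x + q| below.
    extra = ceil(2 * bound + 1).bit_length()
    q = x_approx(i + extra)           # |x - q| < 2**-(i + extra)
    # |x**2 - q**2| = |x - q| * |x + q| < 2**-(i + extra) * (2 * bound + 1) < 2**-i
    return q * q

def sqrt2(n):
    """Example oracle: a representation of the square root of 2."""
    return Fraction(isqrt(2 * 4 ** n), 2 ** n)
```

Only finitely many bits of information about x are consulted before an answer is returned, which is the source of the continuity of computable functions noted above.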
There is nothing special about the real numbers; the method carries over to any system of elements that can be given the structure of a separable metric space. Suppose (X, d) is such a space and A is a countable dense subset of X. Assuming one has finitary representations of the elements of A such that the distances between these elements are computable, then if one replaces R and Q in the preceding discussion by X and A, respectively, one obtains notions of a "computable element of X" and a "computable function on X." We will not need the full generality of these definitions here. Instead, I will focus on how they play out for 2 N and the space of measures on 2 N . An element ω of 2 N is computable if and only if the function from N to {0, 1} which, on input i, returns the ith digit of ω is computable. Similarly, a computable function T from 2 N to 2 N is given by an algorithm which, for every i, computes the ith bit of T ω after querying finitely many bits of ω. For example, if T is the left shift, then T is easily seen to be computable, because in order to output the ith bit of T ω one need only query the (i + 1)st bit of ω.
A measure µ on $2^N$ is said to be computable if there is an algorithm which, on input σ, computes (arbitrarily good rational approximations to) µ([σ]). In other words, µ is computable if there is an algorithm which, on input σ and i, computes a rational approximation of µ([σ]) to within $2^{-i}$. More generally, one can take an arbitrary measure µ to be represented by such a function from S × N to Q, where S is the set of finite binary strings. As in the case of the real numbers, it makes sense to talk about algorithms that carry out computations relative to such a representation.
The following theorem provides a sense in which, from a computational point of view, a measure on $2^N$ is "morally equivalent" to a sequence of (j, ε)-good approximations.
Theorem 4.1. There are a computable function m(j, ε) and an algorithm with the following property: given any representation of a measure µ on $2^N$, the algorithm computes a sequence $(A_n)$ with $A_n \subseteq n$ for each n, such that for every n ≥ m(j, ε), $A_n$ is a (j, ε)-good approximation to µ. Conversely, there is an algorithm which, given a representation of such a function m and sequence $(A_n)$, computes the measure µ.
Proof. Let m(j, ε) be as in the statement of Theorem 3.1. The proof of Theorem 3.1 provides the requisite algorithm, that is, for each n, an explicit description of how to obtain A ⊆ n from finitely many values of µ on cylinder sets. Conversely, given m and $(A_n)$, to compute µ([σ]) to within ε, let j = length(σ), choose n = m(j, ε), and compute $D_{A_n}(\sigma)$.
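The converse direction of the proof amounts to a single lookup, as the following sketch shows. The sequence `A` and the modulus `m` below are assumptions chosen for one particular measure, the average of the point masses at 000... and 111...; `m` is a bound that suffices for this sequence and is not the function m(j, ε) of Theorem 3.1.

```python
from fractions import Fraction
from math import floor

def density(sigma, a):
    """Cyclic density D_A(sigma) of the pattern sigma in the string a."""
    n = len(a)
    hits = sum(all(sigma[j] == a[(i + j) % n] for j in range(len(sigma)))
               for i in range(n))
    return Fraction(hits, n)

def A(n):
    """Assumed approximating sequence: about n/2 zeros followed by ones."""
    return "0" * (n // 2) + "1" * (n - n // 2)

def m(j, eps):
    """Assumed modulus for this sequence: any n > 2j/eps is large enough."""
    return floor(2 * j / eps) + 1

def measure_of_cylinder(sigma, eps):
    """Recover mu([sigma]) to within eps from the sequence A and modulus m."""
    n = m(len(sigma), eps)
    return density(sigma, A(n))

def mu(s):
    """The measure being approximated, used here only for comparison."""
    zeros = Fraction(1, 2) if "1" not in s else 0
    ones = Fraction(1, 2) if "0" not in s else 0
    return zeros + ones
```

For example, `measure_of_cylinder("00", Fraction(1, 10))` evaluates the density of 00 in `A(41)` and returns 19/41, which is within 1/10 of µ([00]) = 1/2.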
If µ is a shift-invariant measure on $2^N$, a point ω of $2^N$ is generic if for every finite binary sequence σ, $\lim_{n \to \infty} \frac{1}{n} \sum_{i<n} 1_{[\sigma]}(T^i \omega) = \mu([\sigma])$. In other words, for every σ, the limiting frequency of occurrences of σ in ω is µ([σ]) (see, for example, [17]). The following theorem shows that given a shift-invariant measure µ on $2^N$, one can compute a generic point, such that the rate of convergence of the limit above is moreover computable (and independent of µ).

Theorem 4.2. There is a computable function m(j, ε) with the following property. Given a representation of a shift-invariant measure µ on $2^N$, one can compute a point ω that is generic for µ, with the additional property that for every σ of length j, every ε > 0, and every n ≥ m(j, ε), $|\mu([\sigma]) - \frac{1}{n} \sum_{i<n} 1_{[\sigma]}(T^i \omega)| < \varepsilon$.
Proof. Given µ, for each j let $A_j$ provide a $(j, 2^{-j})$-good approximation to µ with length bounded as in Theorem 3.1. The idea is to build ω by concatenating copies of $A_1$, then copies of $A_2$, then copies of $A_3$, and so on, choosing enough copies at each stage to ensure that the transitions are smooth. Specifically, construct $\omega = \tau_0 \tau_1 \tau_2 \ldots$ in stages, as follows. First, define $\tau_0$ to be the empty sequence. Now, assuming $\tau_0, \ldots, \tau_l$ are defined, set $m_l = \mathrm{length}(\tau_0 \tau_1 \ldots \tau_l)$, and let $k = m_l + \mathrm{length}(A_{l+2})$. Let $\tau_{l+1}$ be the concatenation of enough copies of $A_{l+1}$ so that $k/\mathrm{length}(\tau_{l+1}) < 2^{-(l+1)}$. Then a routine calculation shows that for every σ of length at most l + 1 and $n \ge m_{l+1}$, $|\mu([\sigma]) - \frac{1}{n} \sum_{i<n} 1_{[\sigma]}(T^i \omega)| < 2^{-l}$. Clearly ω can be computed from µ, and a bound m(j, ε) on $m_{\max(j, \lceil \log_2(\varepsilon^{-1}) \rceil + 1)}$ can be computed outright, independent of µ.
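The staged construction in the proof can be sketched as follows. The function `approx` is an assumption: it stands in for the approximations A_j and is written here for one fixed measure, the average of the point masses at 000... and 111..., for which a block of j·2^j zeros followed by j·2^j ones is a (j, 2^{-j})-good approximation.

```python
from fractions import Fraction

def approx(j):
    """Assumed (j, 2**-j)-good approximation to (delta_0 + delta_1) / 2."""
    half = j * 2 ** j
    return "0" * half + "1" * half

def generic_prefix(approx, stages):
    """Return tau_1 tau_2 ... tau_stages, following the proof of Theorem 4.2."""
    omega = ""
    for l in range(stages):
        k = len(omega) + len(approx(l + 2))       # k = m_l + length(A_{l+2})
        block = approx(l + 1)
        # Enough copies that k / length(tau_{l+1}) < 2**-(l+1).
        copies = (k * 2 ** (l + 1)) // len(block) + 1
        omega += block * copies
    return omega

def frequency(sigma, omega, n):
    """(1/n) * #{i < n : sigma occurs at position i of omega}, no wraparound."""
    hits = sum(omega[i:i + len(sigma)] == sigma for i in range(n))
    return Fraction(hits, n)

omega = generic_prefix(approx, 3)
```

With three stages the prefix has length 4564, and the observed frequencies of all patterns of length at most 2 are already well within the bound 2^{-2} that the proof guarantees at that point.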

An extension to amenable groups
In recent years the correspondence principle has typically been construed more abstractly as a way of relating combinatorial configurations in a discrete group with measure-preserving systems on which this group acts. The principle has been generalized to countable discrete amenable groups in [7], [3,Section 4], and even more broadly in [4]. (See also [26].) The conventional way of passing in the other direction, from "space averages" to "group averages," relies on the pointwise ergodic theorem and works only for ergodic measures. In this section, we provide an effective proof that once again avoids the assumption of ergodicity.
A countable discrete group Γ is said to be amenable if for every finite K ⊂ Γ and ε > 0 there is a finite F ⊂ Γ such that |F ∆ KF| < ε · |F|. Given such a Γ, we can fix a sequence $F_0 \subseteq F_1 \subseteq F_2 \subseteq \ldots \subseteq \Gamma$ such that $\bigcup_i F_i = \Gamma$ and for every finite set K and every ε > 0 there is an i such that $|F_j \,\Delta\, KF_j| < \varepsilon \cdot |F_j|$ for every j ≥ i. Such a sequence is called a Følner sequence.
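For a concrete instance of the Følner condition, the sketch below takes Γ = Z², written additively, with the boxes {−n, ..., n}² as the candidate Følner sequence. The group, the boxes, and the finite set K are assumptions chosen for illustration.

```python
from fractions import Fraction

def box(n):
    """The finite set F_n = {-n, ..., n}^2 in Z^2."""
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}

def translate(K, F):
    """KF = {k + f : k in K, f in F}, with the group Z^2 written additively."""
    return {(k[0] + f[0], k[1] + f[1]) for k in K for f in F}

def folner_ratio(K, F):
    """|F symmetric-difference KF| / |F|."""
    return Fraction(len(F ^ translate(K, F)), len(F))

K = {(1, 0), (0, 1)}
```

Here `folner_ratio(K, box(n))` works out to (4n + 3)/(2n + 1)², which tends to 0 as n grows; for example it is 7/9 for n = 1 and 43/441 for n = 10.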
Here the natural analogue to 2 N is 2 Γ under the product topology. For each γ ∈ Γ, γ gives rise to the action T γ on 2 Γ defined by (T γ ω)(α) = ω(γα). A measure µ on 2 Γ is said to be Γ-invariant if T γ preserves µ for each γ. On the natural numbers, ({0, . . . , n − 1}) n forms a Følner sequence, and it is natural to associate each element of that sequence with the corresponding cyclic subgroup. In general, however, there is no way to associate a group to each element F n of a Følner sequence, nor a way to paste copies of such groups together. As a result, we need a more general framework.
Fix Γ. Given a finite subset F of Γ and a set X, we define a partial action of F on X to consist of a partial function x ↦ γx for each γ in F, satisfying 1x = x for every x, and γ(γ′x) = (γγ′)x whenever γ, γ′, and γγ′ are all in F, and both sides of the equation are defined. Say that the domain of F with respect to this partial action is the intersection of the domains of the γ's, as γ ranges over the elements of F. In other words, an element i ∈ X is in the domain of the partial action if γi is defined for each γ in F.
A pattern, σ, is now a map from some finite subset of Γ to {0, 1}. As above, the standard topology on $2^\Gamma$ is generated by the cylinder sets [σ], where [σ] = {ω | ω(i) = σ(i) for every i ∈ dom(σ)}. Fix a finite subset F of Γ and a partial action of F on some finite set X. If A is a subset of X and i is an element of X, say that σ occurs at position i in A if and only if for every α in the domain of σ, σ(α) = 1 if and only if αi is defined and in A. As in Section 2, define $N_A(\sigma) = |\{ i \in X \mid \sigma \text{ occurs at position } i \text{ in } A \}|$ and $D_A(\sigma) = N_A(\sigma)/|X|$, where the set X and the partial action on X are left implicit in the notation.
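These definitions can be tried out on a small example. The sketch below assumes Γ = Z², written additively, acting partially by translation on a finite grid X; the grid, the patterns, and all names are illustrative assumptions. An undefined translate is represented by `None`.

```python
from fractions import Fraction

def make_partial_action(width, height):
    """X = a width-by-height grid; gamma sends x to x + gamma when that stays in X."""
    X = {(x, y) for x in range(width) for y in range(height)}
    def act(gamma, x):
        y = (x[0] + gamma[0], x[1] + gamma[1])
        return y if y in X else None   # None means gamma x is undefined
    return X, act

def domain(F, X, act):
    """Elements of X at which every gamma in F is defined."""
    return {x for x in X if all(act(g, x) is not None for g in F)}

def occurs_at(sigma, i, A, act):
    """sigma(alpha) = 1 if and only if alpha i is defined and lies in A."""
    for alpha, bit in sigma.items():
        image = act(alpha, i)
        present = image is not None and image in A
        if (bit == 1) != present:
            return False
    return True

def density(sigma, A, X, act):
    """D_A(sigma) = N_A(sigma) / |X|."""
    return Fraction(sum(occurs_at(sigma, i, A, act) for i in X), len(X))

X, act = make_partial_action(3, 3)
A = {(0, 0), (1, 1), (2, 2)}           # the diagonal of the grid
```

With the pattern `{(0, 0): 1, (1, 1): 1}`, the occurrences are at (0, 0) and (1, 1), giving density 2/9; at (2, 2) the translate by (1, 1) is undefined, so the pattern `{(0, 0): 1, (1, 1): 0}` occurs there instead.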
The following theorem provides one formulation of the correspondence principle for amenable groups.
Theorem 5.1. Let Γ be a countable discrete amenable group, with Følner sequence $(F_n)$. Let $(X_n)$ be a sequence of sets, where each $X_n$ is equipped with a partial action of $F_n$ such that $\lim_n |\mathrm{dom}(F_n)|/|X_n| = 1$. Then for any sequence of sets $(A_n)$, where $A_n \subseteq X_n$ for each n, there are a Γ-invariant measure µ on $2^\Gamma$ and a subsequence $(A_{n_i})$ of $(A_n)$ with the property that for every pattern σ, $\mu([\sigma]) = \lim_{i \to \infty} D_{A_{n_i}}(\sigma)$.
Taking Γ = Z and F n = {−(n − 1), . . . , n − 1} for each n yields a version of Theorem 2.1 with Z in place of N. In the formulation in [3,Section 4], for example, the sets X n are taken to be subsets of Γ itself.
Proof. As in the proof of Theorem 2.1, we can iteratively thin the sequence (A n ) and diagonalize so that the limit in question exists for each σ, and then define µ([σ]) accordingly. We only need to show that µ is additive and Γ invariant, at which point we can apply the Caratheodory extension theorem.
To see that µ is additive, let σ be any pattern, γ an element of Γ that is not in the domain of σ, and let $\sigma_0$ and $\sigma_1$ be the patterns extending σ with value 0 and 1, respectively, at γ. It suffices to show that $\mu([\sigma]) = \mu([\sigma_0]) + \mu([\sigma_1])$. But since $(F_n)$ is a Følner sequence, dom(σ) ∪ {γ} ⊆ $F_n$ for sufficiently large n, and the desired conclusion follows from the fact that $\lim_n |\mathrm{dom}(F_n)|/|X_n| = 1$. Similarly, Γ-invariance also follows from the fact that $(F_n)$ is a Følner sequence with this last property.
We have the following inverse:
Theorem 5.2. Let Γ be a countable discrete amenable group with Følner sequence $(F_n)$, and let µ be any Γ-invariant measure on $2^\Gamma$. Then for each j and ε > 0, there exist an n, a finite set X, a partial action of $F_n$ on X, and A ⊆ X such that for every σ with domain $F_j$, $|\mu([\sigma]) - D_A(\sigma)| < \varepsilon$.
Proof. The proof is similar to that of Theorem 3.1, mutatis mutandis. Given j, we can choose k large enough to make $|F_j| \cdot |F_j F_k \,\Delta\, F_k| / |F_k|$ arbitrarily small. Then for any σ with domain $F_j$, if we let τ range over patterns with domain $F_k$, we have
\[ \mu([\sigma]) = \frac{1}{|F_k|} \sum_{i \in F_k} \mu(T_i^{-1}[\sigma]) = \frac{1}{|F_k|} \sum_{i \in F_k} \sum_{\tau} \mu(T_i^{-1}[\sigma] \cap [\tau]). \]
Let $F_j$ act partially on $F_k$ by left multiplication, and view τ as representing a subset of $F_k$. Then as long as i is in the domain of $F_j$, $T_i^{-1}[\sigma] \cap [\tau]$ is equal either to [τ] or the empty set, depending on whether σ occurs at position i in τ. But i fails to be in the domain of $F_j$ only when $\gamma i \notin F_k$ for some γ in $F_j$, and so the set of i that are not in the domain of $F_j$ has cardinality at most $|F_j| \cdot |F_j F_k \,\Delta\, F_k|$. Thus we can continue the calculation above,
\[ \mu([\sigma]) = \sum_{\tau} \mu([\tau]) D_\tau(\sigma) + O(|F_j| \cdot |F_j F_k \,\Delta\, F_k| / |F_k|). \]
Now proceed as in the proof of Theorem 3.1. Let X be a disjoint union of copies of $F_k$, where $F_j$ acts on $F_k$ by left multiplication, insofar as the results of the multiplication land in $F_k$. Let A be a disjoint union of copies of the various τ's living on the various $F_k$'s, where the fraction of occurrences of a given τ approximates µ([τ]), and $|F_j| \cdot |F_j F_k \,\Delta\, F_k| / |F_k|$ is sufficiently small to preserve the quality of the approximation.
As in the proof of Corollary 3.2, the universality of $2^\Gamma$ means that the result can be pulled back to arbitrary Γ-invariant spaces.
Corollary 5.3. Let Γ be a countable discrete amenable group with Følner sequence $(F_n)$. Let X = (X, C, ν, Γ) be a measure-preserving system, and let E be any C-measurable set. Then for each j and ε > 0, there exist a finite set X, a partial action of $F_j$ on X, and A ⊆ X such that for every pattern σ with domain $F_j$, $|\nu(\{x : \text{for every } \gamma \in F_j,\ \gamma x \in E \text{ if and only if } \sigma(\gamma) = 1\}) - D_A(\sigma)| < \varepsilon$.

Final comments
The proof of Theorem 3.1 relies on the fact that one can construct a point of $2^N$ by concatenating finite specifications of its orbit behavior. This is an instance of a more general property that some dynamical systems enjoy, known as the specification property [23]. (A slightly stronger version is presented in [19].) Let (X, T) be a dynamical system, where X is a compact space and T is a continuous map from X to itself. Let µ be a T-invariant probability measure defined on the Borel subsets of X. A point x in X is said to be generic for µ if for every continuous function f from X to R, $\lim_{n \to \infty} \frac{1}{n} \sum_{i<n} f(T^i x) = \int f \, d\mu$. The results of Sigmund [23] show that if (X, T) satisfies the specification property, then there are generic points for any T-invariant measure, whether it is ergodic or not. The existence of generic points for any shift-invariant measure on $2^N$ is a special case of this result.
The notions of computability discussed in Section 4 can be extended to more general compact metric spaces; see, for example, [13,14,18,29]. In analogy to Theorem 4.2, one might expect that systems satisfying a computable version of the specification property will always have computable generic points. But the methods of Sigmund [23] do not seem to translate to the computable setting: the analogue to Theorem 3.1 above is given by Lemma 1 of Sigmund [22], which relies on the pointwise ergodic theorem in an essential way. In contrast, the proof of Theorem 3.1 relies on particular features of 2 N . It seems likely that Sigmund's result is noneffective, which is to say, there are computable dynamical systems satisfying a computable version of the specification property, but lacking any computable generic points. It would therefore be interesting to know the extent to which the methods used here can be generalized.