Expectation, Conditional Expectation and Martingales in Local Fields

We investigate a possible definition of expectation and conditional expectation for random variables with values in a local field such as the p-adic numbers. We define the expectation by analogy with the observation that for real-valued random variables in L 2 the expected value is the orthogonal projection onto the constants. Previous work has shown that the local field version of L ∞ is the appropriate counterpart of L 2, and so the expected value of a local field-valued random variable is defined to be its “projection” in L ∞ onto the constants. Unlike the real case, the resulting projection is not typically a single constant, but rather a ball in the metric on the local field. However, many properties of this expectation operation and the corresponding conditional expectation mirror those familiar from the real-valued case; for example, conditional expectation is, in a suitable sense, a contraction on L ∞ and the tower property holds. We also define the corresponding notion of martingale, show that several standard examples of martingales (for example, sums or products of suitable independent random variables or “harmonic” functions composed with Markov chains) have local field analogues, and obtain versions of the optional sampling and martingale convergence theorems.


Introduction
Expectation and conditional expectation of real-valued random variables (or, more generally, Banach space-valued random variables) and the corresponding notion of martingale are fundamental objects of probability theory. In this paper we investigate whether there are analogous notions for random variables with values in a local field (that is, a locally compact, non-discrete, totally disconnected, topological field), a setting that shares the linear structure which underpins many of the properties of the classical entities.
The best known example of a local field is the field of p-adic numbers for some positive prime p. This field is defined as follows. We can write any non-zero rational number r ∈ Q\{0} uniquely as r = p^s (a/b), with a, b, and s integers, where a and b are not divisible by p. Set |r| = p^{−s}. If we set |0| = 0, then the map | · | has the properties:

(1) |x| ≥ 0, with |x| = 0 if and only if x = 0; |xy| = |x||y|; |x + y| ≤ |x| ∨ |y|.

The map (x, y) → |x − y| defines a metric on Q and we denote the completion of Q in this metric by Q_p. The field operations on Q extend continuously to make Q_p a topological field called the p-adic numbers. The map | · | also extends continuously and the extension has properties (1).
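As a sanity check on properties (1), the p-adic absolute value can be computed exactly for rationals using Python's Fraction type. The helper name pabs and the sample values below are our own illustration, not part of the paper:

```python
from fractions import Fraction

def pabs(r, p):
    """p-adic absolute value: |p^s * (a/b)|_p = p^(-s), with p dividing neither a nor b."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

p = 5
samples = [Fraction(50), Fraction(3, 5), Fraction(-2), Fraction(7, 25), Fraction(0)]
for x in samples:
    # |x| = 0 if and only if x = 0
    assert (pabs(x, p) == 0) == (x == 0)
    for y in samples:
        # |xy| = |x||y|
        assert pabs(x * y, p) == pabs(x, p) * pabs(y, p)
        # |x + y| <= |x| v |y|  (the strong triangle inequality)
        assert pabs(x + y, p) <= max(pabs(x, p), pabs(y, p))

print(pabs(Fraction(50), p))  # 50 = 2 * 5^2, so |50|_5 = 1/25
```

The exact rational arithmetic avoids any floating-point issues, so the three properties are verified literally rather than approximately on these samples.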
The closed unit ball around 0, Z_p = {x ∈ Q_p : |x| ≤ 1}, is the closure in Q_p of the integers Z, and is thus a ring (this is also apparent from (1)), called the p-adic integers. As Z_p = {x ∈ Q_p : |x| < p}, the set Z_p is also open. Any other ball around 0 is of the form {x ∈ Q_p : |x| ≤ p^{−k}} = p^k Z_p for some integer k.
Every local field is either a finite algebraic extension of the p-adic number field for some prime p or a finite algebraic extension of the p-series field; that is, the field of formal Laurent series with coefficients drawn from the finite field with p elements. A locally compact, non-discrete, topological field that is not totally disconnected is necessarily either the real or the complex numbers.
From now on, we let K be a fixed local field. Good general references for the properties of local fields and analysis on them are (Sch84; Tai75; vR78). The following are the properties we need.
There is a real-valued mapping x → |x| on K, called the non-archimedean valuation, with the properties (1). The third of these properties is the ultrametric inequality or the strong triangle inequality. The map (x, y) → |x − y| on K × K is a metric on K which gives the topology of K. A consequence of the strong triangle inequality is that if |x| ≠ |y|, then |x + y| = |x| ∨ |y|. This latter result implies that for every "triangle" {x, y, z} ⊂ K at least two of the lengths |x − y|, |x − z|, |y − z| must be equal, and it is therefore often called the isosceles triangle property.
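The isosceles triangle property can be verified by brute force over a grid of 5-adic rationals; the grid and the helper pabs are our own choices for illustration:

```python
from fractions import Fraction
from itertools import product

def pabs(r, p=5):
    """5-adic absolute value of a rational (|0| = 0)."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

points = [Fraction(n, d) for n in range(-4, 5) for d in (1, 5, 25)]
for x, y, z in product(points, repeat=3):
    sides = sorted([pabs(x - y), pabs(x - z), pabs(y - z)])
    # the two longest sides of every "triangle" coincide
    assert sides[1] == sides[2]
```

Sorting the three side lengths makes the check one comparison: in an ultrametric the maximum distance is always attained at least twice, degenerate triangles included.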
The valuation takes the values {q^k : k ∈ Z} ∪ {0}, where q = p^c for some prime p and positive integer c (so that for K = Q_p we have c = 1). Write D for {x ∈ K : |x| ≤ 1} (so that D = Z_p when K = Q_p). Fix ρ ∈ K so that |ρ| = q^{−1}; then |ρ^k| = q^{−k} for each k ∈ Z (so that for K = Q_p we could take ρ = p). The set D is the unique maximal compact subring of K (the ring of integers of K). Every ball in K is of the form x + ρ^k D for some x ∈ K and k ∈ Z. If B = x + ρ^k D and C = y + ρ^ℓ D are two such balls, then either B and C are disjoint or one contains the other. In particular, if q^{−k} = q^{−ℓ}, then either B ∩ C = ∅ or B = C, depending on whether or not |x − y| > q^{−k}.

We have shown in a sequence of papers (Eva89; Eva91; Eva93; Eva95; Eva01b; Eva01a; Eva02; Eva06) that the natural analogues on K of the centered Gaussian measures on R are the normalized restrictions of Haar measure on the additive group of K to the compact balls ρ^k D, together with the point mass at 0. There is a significant literature on probability on the p-adics and other local fields. The above papers contain numerous references to this work, much of which concerns Markov processes taking values in local fields. There are also extensive surveys of the literature in the books (Khr97; Koc01; KN04).
It is not immediately clear how one should approach defining the expectation of a local field-valued random variable X. Even if X only takes a finite number of values {x_1, x_2, . . . , x_n}, the object Σ_k x_k P{X = x_k} doesn't make any sense, because x_k ∈ K whereas P{X = x_k} ∈ R. However, it is an elementary fact that if T is a real-valued random variable with E[T^2] < ∞, then c ↦ E[(T − c)^2] is uniquely minimized by c = E[T]. Of course, since this observation already uses the notion of expectation, it does not lead to an alternative way of defining the expected value of a real-valued random variable. Fortunately, we can do something similar, but non-circular, in the local field case. We should mention at this point that there is a theory of integration of local field-valued functions against local field-valued measures; this often goes under the title of ultrametric integration or non-Archimedean integration: see, for example, (Khr94; vR78; Sch84).
Fix a probability space (Ω, F, P). By a K-valued random variable, we mean a measurable map from Ω equipped with F into K equipped with its Borel σ-field. Let L ∞ be the space of K-valued random variables X that satisfy ‖X‖∞ := ess sup |X| < ∞. It is clear that L ∞ is a vector space over K. If we identify two random variables as being equal when they are equal almost surely, then the map (X, Y) → ‖X − Y‖∞ defines a metric on L ∞ (or, more correctly, on equivalence classes under the relation of equality almost everywhere), and L ∞ is complete in this metric. Hence L ∞ is an instance of a Banach algebra over K.
It is apparent from the papers on analogues of Gaussian measures cited above that L ∞ is the natural local field counterpart of the real Hilbert space L 2 . In particular, there is a natural notion of orthogonality on L ∞ (albeit one which does not come from an inner product structure).
The expectation of the K-valued random variable X ∈ L ∞ is the subset of K given by

E[X] := {c ∈ K : ‖X − c‖∞ = ε(X)}, where ε(X) := inf_{c′∈K} ‖X − c′‖∞.

We show in Section 2 that E[X] is non-empty. Note that if c′ ∈ E[X] and c″ ∈ K is such that |c″ − c′| ≤ ε(X), then, by the strong triangle inequality, c″ ∈ E[X]. Thus E[X] is a (closed) ball in K (where we take a single point as being a ball).
Observe that we use the same notation for expectation of K-valued and R-valued random variables. This should cause no confusion: we either indicate explicitly whether a random variable has values in K or R, or this will be clear from context. The outline of the rest of the paper is the following. We show in Section 2 that the expected value of a random variable in L ∞ is non-empty, remark on some of the properties of the expectation operator, and motivate the definition of conditional expectation by considering the situation where the conditioning σ-field is finitely generated or, more generally, has an associated regular conditional probability. The appropriate definition of the conditional expectation of X ∈ L ∞ given a sub-σ-field G ⊆ F is not, as one might first imagine, the L ∞ projection of X onto L ∞ (G) (:= the subspace of L ∞ consisting of G-measurable random variables). For this reason, we need to do some preparatory work in Sections 3 and 4 before finally presenting the construction of conditional expectation in Section 5 and describing its elementary properties in Section 6. We establish an analogue of the "tower property" in Section 7 and obtain a counterpart of the fact for classical conditional expectation that conditioning is a contraction on L 2 (both of these results need to be suitably interpreted due to the conditional expectation being typically a set of random variables rather than a single one). We introduce the associated notion of martingale in Section 9 and observe that several of the classical examples of martingales have local field analogues. We develop counterparts of the optional sampling theorem and martingale convergence theorem in Sections 10 and 11, respectively.
We remark that in (Kan03) there is a brief attempt along the lines we have followed to define a conditional expectation and the consequent notion of martingale in the local field context, although there it is an L 2 rather than an L ∞ distance that is minimized and only a few properties of the resulting objects are explored.
Note: We adopt the convention that all equalities and inequalities between random variables should be interpreted as holding P-almost surely.

Expectation
Theorem 2.1. The expectation of a random variable X ∈ L ∞ is non-empty. It is the smallest closed ball in K that contains suppX (the closed support of X).
Proof. By the strong triangle inequality, ‖X − c‖∞ ≤ ‖X‖∞ ∨ |c|, and ‖X − c‖∞ = |c| for |c| > ‖X‖∞. Therefore, the infimum of c ↦ ‖X − c‖∞ over all c ∈ K is the same as the infimum over {c ∈ K : |c| ≤ ‖X‖∞}, and any point c ∈ K at which the infimum is achieved must necessarily satisfy |c| ≤ ‖X‖∞. That is,

ε(X) = inf{‖X − c‖∞ : c ∈ K, |c| ≤ ‖X‖∞}.

Again by the strong triangle inequality, the function c ↦ ‖X − c‖∞ is continuous. Consequently, E[X] is non-empty as the set of points at which a continuous function on a compact set attains its infimum.
As we observed in the Introduction, E[X] is a ball of radius (= diameter) ε(X). If x ∈ suppX were not in E[X] and c is any point in E[X], then, by the strong triangle inequality, |x − c| > ε(X), and hence ‖X − c‖∞ > ε(X), a contradiction; thus suppX ⊆ E[X]. If some strictly smaller closed ball contained suppX, it would be a ball contained in E[X] with diameter r < ε(X). However, if c is any point contained in the smaller ball, then |x − c| ≤ r for all x ∈ suppX, so ‖X − c‖∞ ≤ r < ε(X), contradicting the definition of ε(X).
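Theorem 2.1 is easy to see in action for a finite-valued random variable: since any support point is a center of the smallest ball, ε(X) is just the diameter of suppX. A sketch in Q_5, with support values, the grid, and helper names of our own choosing:

```python
from fractions import Fraction

def pabs(r, p=5):
    """5-adic absolute value of a rational (|0| = 0)."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

support = [Fraction(1), Fraction(6), Fraction(26)]  # values of a finite-valued X in Q_5

def spread(c):
    """||X - c||_inf for finite-valued X: the largest |x - c|_5 over the support."""
    return max(pabs(x - c) for x in support)

diam = max(pabs(x - y) for x in support for y in support)

# every support point is a center of the smallest ball ...
assert all(spread(c) == diam for c in support)

# ... and no point from a wider candidate grid does better, so eps(X) = diam(supp X)
candidates = [Fraction(n, d) for n in range(-10, 11) for d in (1, 5)]
assert min(spread(c) for c in candidates + support) == diam
```

Here |1 − 6|_5 = |6 − 26|_5 = 1/5 and |1 − 26|_5 = 1/25, so ε(X) = 1/5 and each of 1, 6, 26 is a valid center, exactly the ball picture of the theorem.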
Our notion of expectation shares some of the features of both the mean and the variance of a real-valued variable. Any point in the ball E[X] is as good a single summary of the "location" of X as any other, whereas the diameter of E[X] (that is, ε(X)) is a measure of the "spread" of X.

Some properties of E[X] are immediate. It is easily seen that for constants a, b ∈ K we have E[aX + b] = aE[X] + b. Moreover, E[X + Y] ⊆ E[X] + E[Y], with equality when X and Y are independent. This follows from the fact that supp(X + Y) ⊆ suppX + suppY, with equality when X and Y are independent. Also, if X and Y are independent, then ε(X + Y) = ε(X) ∨ ε(Y). These remarks further support our assertion that E[X] combines the properties of the mean and the variance for real-valued random variables.
Define the Hausdorff distance between two subsets A and B of K to be

D_H(A, B) := max{ sup_{a∈A} inf_{b∈B} |a − b|, sup_{b∈B} inf_{a∈A} |a − b| }.

We know from Theorem 2.1 that E[X] and E[Y] are balls with diameters ε(X) and ε(Y), respectively. We have one of the alternatives: E[X] and E[Y] are disjoint, they coincide, or one strictly contains the other. Suppose, for example, that E[X] ⊊ E[Y], so that ε(X) < ε(Y) and there exists y ∈ suppY such that y is not in the unique ball of diameter q^{−1} ε(Y) containing E[X]. Then, by the strong triangle inequality, |x − y| = ε(Y) for all x ∈ suppX, and so ‖X − Y‖∞ ≥ ε(Y) ≥ D_H(E[X], E[Y]) in this case. Similar arguments in the other cases show that

D_H(E[X], E[Y]) ≤ ‖X − Y‖∞.

This is analogous to the continuity of real-valued expectation with respect to the real L^p norms.
Rather than develop more properties of expectation, we move on to the corresponding definition of conditional expectation because, just as in the real case, expectation is the special case of conditional expectation that occurs when the conditioning σ-field is the trivial σ-field {∅, Ω}, and so results for expectation are just special cases of ones for conditional expectation.
In order to motivate the definition of conditional expectation, first consider the special case when the conditioning σ-field G ⊆ F is generated by a finite partition {A_1, A_2, . . . , A_n} of Ω. In line with our definition of E[X], a reasonable definition of E[X | G] would be the set of G-measurable random variables Y such that for each k the common value c_k := Y(ω) for ω ∈ A_k satisfies

ess sup{|X(ω) − c_k| : ω ∈ A_k} = inf_{c∈K} ess sup{|X(ω) − c| : ω ∈ A_k}.

Equivalently, suppose we define ε(X, G) to be the G-measurable, R-valued random variable that takes the value inf_{c∈K} ess sup{|X(ω) − c| : ω ∈ A_k} on the event A_k; then this definition declares E[X | G] to be the set of G-measurable random variables Y such that |X − Y| ≤ ε(X, G). More generally, suppose that G ⊆ F is an arbitrary sub-σ-field and there is an associated regular conditional probability P_G(ω′, dω″) (such a regular conditional probability certainly exists if G is finitely generated). In this case, we expect that E[X | G](ω′) should be the expectation of X with respect to the probability measure P_G(ω′, ·). It is easy to see that if we let ε(X, G) be the G-measurable random variable such that ε(X, G)(ω′) is the infimum over c ∈ K of the essential supremum of |X − c| with respect to P_G(ω′, ·), then this definition of ε(X, G) subsumes our previous one for the finitely generated case, and our putative definition of E[X | G] coincides with the set of G-measurable random variables Y such that |X − Y| ≤ ε(X, G), thereby also extending the definition for the finitely generated case.
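The finitely generated case is concrete enough to compute: on each atom A_k, ε(X, G) is the diameter of the values X takes there, and a Y constant on each atom belongs to E[X | G] precisely when it is within that spread atom by atom. A sketch with a toy sample space and helper names of our own:

```python
from fractions import Fraction

def pabs(r, p=5):
    """5-adic absolute value of a rational (|0| = 0)."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

# a four-point sample space; X maps each point to a value in Q_5
X = {"w1": Fraction(1), "w2": Fraction(6), "w3": Fraction(26), "w4": Fraction(2)}
partition = [["w1", "w2", "w3"], ["w4"]]  # the atoms generating G

def eps_on_atom(atom):
    """inf_c max_{w in atom} |X(w) - c|_5, which equals the diameter of X's values there."""
    vals = [X[w] for w in atom]
    return max(pabs(u - v) for u in vals for v in vals)

def in_cond_expectation(constants):
    """Y constant on each atom lies in E[X | G] iff |X - Y| <= eps(X, G) atom by atom."""
    return all(
        max(pabs(X[w] - c) for w in atom) <= eps_on_atom(atom)
        for atom, c in zip(partition, constants)
    )

print([eps_on_atom(a) for a in partition])  # [Fraction(1, 5), Fraction(0, 1)]

assert in_cond_expectation([Fraction(6), Fraction(2)])       # within the spread on both atoms
assert not in_cond_expectation([Fraction(0), Fraction(2)])   # |1 - 0|_5 = 1 exceeds eps = 1/5
```

On the singleton atom the spread is 0, so Y is forced to equal X there, while on the three-point atom any value within 5-adic distance 1/5 of the values of X works.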
We therefore see that the key to giving a satisfactory general definition of E[X | G] for an arbitrary sub-σ-field G ⊆ F is to find a suitable general definition of ε(X, G). We tackle this problem in the next three sections.


Conditional essential supremum

Lemma 3.2. (i) Suppose that S is a non-negative real-valued random variable and G is a sub-σ-field of F. Then S ≤ ess sup{S | G}.

(ii) Suppose that S and G are as in (i) and T is a G-measurable real-valued random variable with S ≤ T. Then ess sup{S | G} ≤ T.
(iii) Suppose that S′ and S″ are non-negative real-valued random variables and G is a sub-σ-field of F. Then

|ess sup{S′ | G} − ess sup{S″ | G}| ≤ ess sup{|S′ − S″| | G}.

Hence, for each p ≥ 1, ‖ess sup{S′ | G} − ess sup{S″ | G}‖_p ≤ ‖ess sup{|S′ − S″| | G}‖_p.

Corollary 3.3. Suppose that S is a non-negative real-valued random variable and G ⊆ H are sub-σ-fields of F. Then ess sup{S | H} ≤ ess sup{S | G}.

Let {F_n}_{n=0}^∞ be a filtration (that is, a non-decreasing sequence of sub-σ-fields of F). Recall that a random variable T with values in {0, 1, 2, . . .} is a stopping time for the filtration if {T = n} ∈ F_n for all n. Recall also that if T is a stopping time, then the associated σ-field F_T is the collection of events A such that A ∩ {T = n} ∈ F_n for all n.

Lemma 3.4. Suppose that S is a non-negative real-valued random variable, {F_n}_{n=0}^∞ is a filtration, and T is a stopping time. Then ess sup{S | F_T} = ess sup{S | F_n} on the event {T = n} for each n.

Proof. This follows immediately from the definition of the conditional essential supremum and the fact that if U is a non-negative real-valued random variable, then ess sup{U | F_T} = ess sup{U | F_n} on the event {T = n} (see, for example, Proposition II-1-3 of (Nev75)).
Definition 4.1. Given X ∈ L ∞ and a sub-σ-field G of F, set ‖X‖_G := ess sup{|X| | G}.

Notation 4.2. Given A ∈ F, the K-valued random variable 1_A is given by

1_A(ω) := 1_K, if ω ∈ A, and 1_A(ω) := 0_K, otherwise,

where 1_K and 0_K are, respectively, the multiplicative and additive identity elements of K. We continue to use this same notation to also denote the analogously defined real-valued indicator random variable, but this should cause no confusion as the meaning will be clear from the context.
(iii) If X_1, X_2, . . . ∈ L ∞ and A_1, A_2, . . . ∈ G are pairwise disjoint, then ‖Σ_k X_k 1_{A_k}‖_G 1_{A_j} = ‖X_j‖_G 1_{A_j} for every j.

Proof. Part (i) follows immediately from the definition. Part (ii) follows from part (i): since X 1_A = Y 1_A by assumption, ‖X‖_G 1_A = ‖X 1_A‖_G = ‖Y 1_A‖_G = ‖Y‖_G 1_A. Part (iii) follows from parts (i) and (ii): for any of the events A_j, (Σ_k X_k 1_{A_k}) 1_{A_j} = X_j 1_{A_j}, so ‖Σ_k X_k 1_{A_k}‖_G 1_{A_j} = ‖X_j‖_G 1_{A_j}. Part (iv) is an immediate consequence of Lemma 3.2(iii). However, there is also the following alternative, more elementary proof. Note first that ess sup{|X|^r | G} = ‖X‖_G^r for any r > 0, because t ↦ t^r is strictly increasing on [0, ∞). Thus, from Jensen's inequality and the observation that (x + y)^s ≤ x^s + y^s for 0 ≤ s ≤ 1, part (iv) follows.

The following result is immediate from Corollary 3.3.
Lemma 4.4. Suppose that X ∈ L ∞ and G ⊆ H are sub-σ-fields of F. Then ‖X‖_H ≤ ‖X‖_G.
The following result is immediate from Lemma 3.4.
Lemma 4.5. Suppose that X ∈ L ∞, {F_n}_{n=0}^∞ is a filtration of sub-σ-fields of F, and T is a stopping time. Then ‖X‖_{F_T} = ‖X‖_{F_n} on the event {T = n} for each n.

Construction of Conditional Expectation
Definition 5.1. Given X ∈ L ∞ and a sub-σ-field G ⊆ F, set

E[X | G] := {Y ∈ L ∞ (G) : ‖X − Y‖_G ≤ ‖X − Z‖_G for all Z ∈ L ∞ (G)}.

Remark 5.2. Before showing that E[X | G] is non-empty, we comment on a slight subtlety in the definition. One way of thinking of our definition of E[X] as the set of c ∈ K for which ‖X − c‖∞ is minimal is that E[X] is the set of projections of X onto K ≡ L ∞ ({∅, Ω}). A possible definition of E[X | G] might therefore be the analogous set of projections of X onto L ∞ (G); that is, the set of Y ∈ L ∞ (G) that minimize ‖X − Y‖∞. This definition is not equivalent to ours. For example, suppose that Ω consists of the three points {α, β, γ}, F consists of all subsets of Ω, P assigns positive mass to each point of Ω, G = σ{{α, β}, {γ}}, and X is given by X(α) = 1_K, X(β) = 0_K, and X(γ) = 0_K. Consider Y ∈ L ∞ (G), so that Y(α) = Y(β) = c and Y(γ) = d for some c, d ∈ K. In order that Y ∈ E[X | G] according to our definition, c and d must be chosen to minimize both |1_K − c| ∨ |0_K − c| and |0_K − d|. By the strong triangle inequality, |1_K − c| ∨ |0_K − c| is minimized by any c with |c| ≤ 1, the corresponding minimal value being 1. Of course, |0_K − d| is minimized by the unique value d = 0_K. On the other hand, in order that Y be a projection of X onto L ∞ (G), the points c and d must be chosen to minimize |1_K − c| ∨ |0_K − c| ∨ |0_K − d|, and this is accomplished as long as |c| ≤ 1 and |d| ≤ 1. We don't belabor the point in what follows, but several of the natural counterparts of standard results for classical conditional expectation that we show hold for our definition fail to hold for the "projection" definition.
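The counterexample in Remark 5.2 can be checked by brute force over a small grid of 5-adic rationals; the grid and function names below are our own, and the search runs only over the grid rather than all of Q_5:

```python
from fractions import Fraction

def pabs(r, p=5):
    """5-adic absolute value of a rational (|0| = 0)."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

grid = [Fraction(n, d) for n in range(-5, 6) for d in (1, 5)]  # small search grid in Q_5

# X(alpha) = 1, X(beta) = 0, X(gamma) = 0;  Y(alpha) = Y(beta) = c, Y(gamma) = d

def atomwise(c, d):
    """our definition: minimise separately on each atom of G."""
    return (max(pabs(1 - c), pabs(c)), pabs(d))

def projection(c, d):
    """the L_inf projection: minimise the single overall essential supremum."""
    return max(pabs(1 - c), pabs(c), pabs(d))

# on {alpha, beta} the best possible value is 1, attained by every c with |c|_5 <= 1
assert min(max(pabs(1 - c), pabs(c)) for c in grid) == 1
# our definition also forces d = 0 on {gamma} ...
assert min(pabs(d) for d in grid) == 0
# ... while the projection is indifferent: (c, d) = (0, 1) already attains the overall minimum
assert projection(Fraction(0), Fraction(1)) == min(projection(c, d) for c in grid for d in grid)
```

The last two assertions are exactly the gap the remark points out: the atomwise criterion pins down d = 0_K, whereas the projection criterion accepts any d with |d| ≤ 1.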
The following lemma is used below to show that E[X | G] is non-empty.
Lemma 5.3. Suppose that X ∈ L ∞ is not 0_K almost surely, and G is a sub-σ-field of F. Set q^{−N} = ‖X‖∞. Then there exist disjoint events A_0, A_1, . . . ∈ G and random variables Y_0, Y_1, . . . ∈ L ∞ (G) with the following properties:

(1) On the event A_n, ‖X − Z‖_G ≥ q^{−(N+n)} for every Z ∈ L ∞ (G).

(2) On the event A_n, ‖X − Y_n‖_G = q^{−(N+n)}.

(4) On the event ⋃_{k=0}^{n} A_k, Y_p = Y_n for any p > n.

(5) The event ⋃_{k=0}^{∞} A_k has probability one.
Proof. Suppose without loss of generality that ‖X‖∞ = 1, so that N = 0. Set Z_0 := {Z ∈ L ∞ (G) : ‖X − Z‖∞ ≤ 1}. Note that the constant 0_K belongs to Z_0, and so this set is non-empty.
It is clear that ‖X − Y_0‖_G = 1 on the event A_0 and ‖X − Y_0‖_G ≤ q^{−1} on the event Ω \ A_0. Moreover, if there existed V ∈ L ∞ (G) with …, then the random variable W ∈ Z_0 defined by … would yield a contradiction.

Now suppose that A_0, . . . , A_{n−1} and Y_0, . . . , Y_{n−1} have been constructed with the requisite properties. If P(Ω \ ⋃_{k=0}^{n−1} A_k) = 0, then take A_n = ∅ and Y_n = Y_{n−1} (recall that we are interpreting all equalities and inequalities as holding P-a.s.). Otherwise, set Z_n := {Z ∈ L ∞ (G) : ‖X − Z‖_G ≤ q^{−n} on Ω \ ⋃_{k=0}^{n−1} A_k}. Note that Y_{n−1} belongs to Z_n. Put δ_n := inf_{Z∈Z_n} P{‖X − Z‖_G = q^{−n}}. An argument very similar to the above, with Z_n and δ_n replacing Z_0 and δ_0, establishes the existence of A_n and Y_n with the desired properties.
Theorem 5.4. Given X ∈ L ∞ and a sub-σ-algebra G ⊆ F, the conditional expectation E[X | G] is nonempty.

Elementary Properties of Conditional Expectation
Proposition 6.1. Fix a sub-σ-field G ⊆ F.

(i) If X ∈ L ∞ (G) and Y ∈ L ∞, then E[XY | G] = X E[Y | G] and E[X + Y | G] = X + E[Y | G].

(ii) If X, Y ∈ L ∞ and A ∈ G are such that X 1_A = Y 1_A, then E[X | G] 1_A = E[Y | G] 1_A.

(iii) If X_1, X_2, . . . ∈ L ∞ and A_1, A_2, . . . ∈ G are pairwise disjoint, then E[Σ_k X_k 1_{A_k} | G] 1_{A_j} = E[X_j | G] 1_{A_j} for every j.

Proof. Consider part (i). We first show the inclusion E[XY | G] ⊆ X E[Y | G]. Suppose that Z ∈ E[XY | G]. Note that P{Z ≠ 0, X = 0} = 0, and hence XW = Z for some W ∈ L ∞ (G), because otherwise we would have the contradiction …. We need to show that W ∈ E[Y | G]. Consider U ∈ L ∞ (G). By Lemma 4.3(ii) and the assumption that Z ∈ E[XY | G], …, and so, by Lemma 4.3(i)+(ii), ‖Y − W‖_G ≤ ‖Y − U‖_G, as required.
The proof of the claim E[X + Y | G] = X + E[Y | G] is similar but easier, so we omit it.
Parts (ii) and (iii) follow straightforwardly from parts (ii) and (iii) of Lemma 4.3.
Proposition 6.2. Let G be a sub-σ-algebra of F. Suppose that X ∈ L ∞ is independent of G.
Then E[X | G] is the set of random variables Y ∈ L ∞ (G) that take values in E[X].
Proof. Observe for any Z ∈ L ∞ (G) that, by the assumption of independence of X from G, ‖X − Z‖_G(ω) = ‖X − Z(ω)‖∞, which equals ε(X) if Z(ω) ∈ E[X] and is > ε(X) otherwise, and the result follows.
Conditional spread and the tower property

Definition 7.1. Given X ∈ L ∞ and a sub-σ-field G of F, let ε(X, G) denote the common value of ‖X − Y‖_G for Y ∈ E[X | G].

Proof. Suppose that D_H(A, B) < δ for some δ > 0. By definition, for every X ∈ A there is a Y ∈ B with ‖X − Y‖∞ < δ, and similarly with the roles of A and B reversed. If U ∈ A + C, then U = X + W for some X ∈ A and W ∈ C. We know there is Y ∈ B such that ‖X − Y‖∞ < δ; then Y + W ∈ B + C and ‖U − (Y + W)‖∞ = ‖X − Y‖∞ < δ. A similar argument with the roles of A and B reversed shows that D_H(A + C, B + C) < δ.
Theorem 8.3. Suppose that X, Y ∈ L ∞ and G is a sub-σ-field of F. Then D_H(E[X | G], E[Y | G]) ≤ ‖X − Y‖∞.
Proof. By Proposition 6.1, …. Furthermore, on the event N, …, and similarly for Y. The result now follows from Lemma 8.2.

Martingales
Definition 9.1. Let {F_n}_{n=0}^∞ be a filtration of sub-σ-fields of F. A sequence of random variables {X_n}_{n=0}^∞ is a martingale if there exists X ∈ L ∞ such that X_n ∈ E[X | F_n] for all n (in particular, X_n ∈ L ∞ (F_n)).
Remark 9.2. Note that our definition does not imply that X_n ∈ E[X_{n+1} | F_n] for all n. For example, suppose that F_n := {∅, Ω} for all n but X is not almost surely constant. Then we obtain a martingale by taking X_n to be any constant in the ball E[X], but we have X_n ∈ E[X_{n+1} | F_n] for all n only if X_0 = X_1 = X_2 = · · ·.

Many of the usual real-valued examples of martingales have K-valued counterparts.
Example 9.3. Let {Y_n}_{n=0}^∞ be a sequence of independent random variables in L ∞ with 0_K ∈ E[Y_n] for all n. Suppose that Σ_{k=0}^∞ Y_k converges in L ∞ (by the strong triangle inequality and the completeness of L ∞, this is equivalent to lim_{n→∞} ‖Y_n‖∞ = 0). Set F_n := σ{Y_0, Y_1, . . . , Y_n}. Put X_n := Σ_{k=0}^n Y_k and X := Σ_{k=0}^∞ Y_k. It follows from the second claim of Proposition 6.1(i) that X_n ∈ E[X | F_n] for all n, and hence {X_n}_{n=0}^∞ is a martingale.
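Example 9.3 can be simulated with truncated series. Below, as a sketch of our own, we take Y_k uniform on {0, 5^k}, so that 0_K ∈ E[Y_k] and ‖Y_k‖∞ = 5^{−k} → 0, and check the tail bound ‖X − X_n‖∞ ≤ 5^{−(n+1)} that drives the convergence, with a finite truncation standing in for the infinite sum:

```python
import random
from fractions import Fraction

def pabs(r, p=5):
    """5-adic absolute value of a rational (|0| = 0)."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

random.seed(0)
p, depth = 5, 12

for _ in range(100):
    # one sample path: Y_k uniform on {0, 5^k}, so |Y_k|_5 <= 5^(-k)
    Y = [random.choice([Fraction(0), Fraction(p) ** k]) for k in range(depth)]
    X = sum(Y)  # the truncated series stands in for the limit sum_k Y_k
    for n in range(depth - 1):
        X_n = sum(Y[: n + 1])  # the martingale at time n
        # the tail is a sum of terms of absolute value at most 5^(-(n+1))
        assert pabs(X - X_n, p) <= Fraction(p) ** (-(n + 1))
```

Since the tail X − X_n is a sum of terms each of 5-adic size at most 5^{−(n+1)}, the strong triangle inequality gives the asserted bound on every sample path, which is the mechanism behind the convergence of the partial sums.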
Example 9.4. Let {Y_n}_{n=0}^∞ be a sequence of independent random variables in L ∞ with 1_K ∈ E[Y_n] for all n. Suppose that Π_{k=0}^∞ Y_k converges in L ∞ (by the strong triangle inequality and the completeness of L ∞, this is equivalent to lim_{n→∞} ‖Y_n − 1_K‖∞ = 0). Set F_n := σ{Y_0, Y_1, . . . , Y_n}. Put X_n := Π_{k=0}^n Y_k and X := Π_{k=0}^∞ Y_k. It follows from the first claim of Proposition 6.1(i) that X_n ∈ E[X | F_n] for all n, and hence {X_n}_{n=0}^∞ is a martingale.
Example 9.5. Let {Z_n}_{n=0}^∞ be a discrete time Markov chain with countable state space E and transition matrix P. Set F_n := σ{Z_0, Z_1, . . . , Z_n}. Say that f : E → K is harmonic if f is bounded and for all i ∈ E the expectation of f with respect to the probability measure P(i, ·) contains f(i) (that is, if f(i) belongs to the smallest ball containing the set {f(j) : P(i, j) > 0}). Fix N ∈ {0, 1, 2, . . .}. Then {X_n}_{n=0}^∞ := {f(Z_{n∧N})}_{n=0}^∞ is a martingale.
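The "harmonic" condition in Example 9.5 can be tested state by state: in an ultrametric, f(i) lies in the smallest closed ball containing a finite set S if and only if |f(i) − s| ≤ diam S for every s ∈ S. A sketch with a made-up three-state chain (all names and the chain itself are ours):

```python
from fractions import Fraction

def pabs(r, p=5):
    """5-adic absolute value of a rational (|0| = 0)."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    s, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num, s = num // p, s + 1
    while den % p == 0:
        den, s = den // p, s - 1
    return Fraction(p) ** (-s)

def in_smallest_ball(v, values):
    """In an ultrametric, v lies in the smallest closed ball containing `values`
    iff its distance to every member is at most the diameter of `values`."""
    diam = max(pabs(a - b) for a in values for b in values)
    return max(pabs(v - s) for s in values) <= diam

def is_harmonic(f, P):
    """Check the ball condition at every state of the chain."""
    return all(
        in_smallest_ball(f[i], [f[j] for j in P[i] if P[i][j] > 0])
        for i in P
    )

# a made-up chain on {0, 1, 2}: state 0 jumps to 1 or 2, states 1 and 2 are absorbing
P = {0: {1: 0.5, 2: 0.5}, 1: {1: 1.0}, 2: {2: 1.0}}

assert is_harmonic({0: Fraction(26), 1: Fraction(1), 2: Fraction(6)}, P)     # |26-1|_5 = 1/25 <= 1/5
assert not is_harmonic({0: Fraction(2), 1: Fraction(1), 2: Fraction(6)}, P)  # |2-1|_5 = 1 > 1/5
```

Note that, unlike the real-valued case, only the support of each row P(i, ·) matters here, not the actual transition probabilities: the ball condition is insensitive to the weights.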
Optional sampling theorem

Theorem 10.1. Let {F_n}_{n=0}^∞ be a filtration. Suppose that X ∈ L ∞ and {X_n}_{n=0}^∞ is a martingale with X_n ∈ E[X | F_n] for all n. If T is a stopping time, then X_T ∈ E[X | F_T].

Martingale convergence
Theorem 11.1. Let {F_n}_{n=0}^∞ be a filtration. Suppose that X ∈ L ∞ and {X_n}_{n=0}^∞ is a martingale with X_n ∈ E[X | F_n] for all n. If X is in the closure of ⋃_{n=1}^∞ L ∞ (F_n), then lim_{n→∞} ‖X_n − X‖∞ = 0 (in particular, {X_n}_{n=0}^∞ converges to X almost surely).
Proof. Since X is in the closure of ⋃_{n=1}^∞ L ∞ (F_n), for each ε > 0 there exists Y ∈ L ∞ (F_N) for some N such that ‖X − Y‖∞ < ε. Because F_N ⊆ F_n for n > N, Y ∈ L ∞ (F_n) for n ≥ N. By Theorem 8.3, D_H(E[X | F_n], E[Y | F_n]) < ε for n ≥ N. However, E[Y | F_n] consists of the single point Y, and so the Hausdorff distance is simply sup{‖W − Y‖∞ : W ∈ E[X | F_n]}. Thus ‖X_n − X‖∞ ≤ ‖X_n − Y‖∞ ∨ ‖Y − X‖∞ < ε for all n ≥ N, and the result follows.