Wald for non-stopping times: The rewards of impatient prophets

Let $X_1,X_2,\ldots$ be independent identically distributed nonnegative random variables. Wald's identity states that the random sum $S_T:=X_1+\cdots+X_T$ has expectation $E(T)\,E(X_1)$ provided $T$ is a stopping time. We prove here that for any $1<\alpha\leq 2$, if $T$ is an arbitrary nonnegative random variable, then $S_T$ has finite expectation provided that $X_1$ has finite $\alpha$-moment and $T$ has finite $1/(\alpha-1)$-moment. We also prove a variant in which $T$ is assumed to have a finite exponential moment. These moment conditions are sharp in the sense that for any i.i.d.\ sequence $X_i$ violating them, there is a $T$ satisfying the given condition for which $S_T$ (and, in fact, $X_T$) has infinite expectation. An interpretation of this is given in terms of a prophet being more rewarded than a gambler when a certain impatience restriction is imposed.


Introduction
Let $X_1, X_2, \ldots$ be independent identically distributed (i.i.d.) nonnegative random variables, and let $T$ be a nonnegative integer-valued random variable. Write $S_n = \sum_{i=1}^n X_i$ and $X = X_1$. Wald's identity [11] states that if $T$ is a stopping time (which is to say that for each $n$, the event $\{T = n\}$ lies in the $\sigma$-field generated by $X_1, \ldots, X_n$), then
(1) $ES_T = ET \cdot EX$.
In particular, if $X$ and $T$ have finite mean then so does $S_T$. It is natural to ask whether similar conclusions can be obtained if we drop the requirement that $T$ be a stopping time. It is too much to hope that the equality (1) still holds. (For example, suppose that $X_i$ takes values 0, 1 with equal probabilities, and let $T$ be 1 if $X_2 = 0$ and otherwise 2. Then $ES_T = 1 \neq \frac{3}{2} \cdot \frac{1}{2} = ET \cdot EX$.) However, one may still ask when $S_T$ has finite mean. It turns out that finite means of $X$ and $T$ no longer suffice, but stronger moment conditions do. Our main result gives sharp moment conditions for this conclusion to hold. In addition, when the moment conditions fail, with a suitably chosen $T$ we can arrange that even the final summand $X_T$ has infinite mean. Here is the precise statement. (If $T = 0$ we take by convention $X_T = 0$.)

Theorem 1. Let $X_1, X_2, \ldots$ be i.i.d. nonnegative random variables, and write $S_n := \sum_{i=1}^n X_i$ and $X = X_1$. For each $\alpha \in (1, 2]$, the following are equivalent.
(i) $EX^\alpha < \infty$.
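For illustration, the two-coin counterexample above can be verified by exhaustive enumeration; the following sketch uses exactly the distribution and the rule for $T$ described in the text.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of (X1, X2), each Xi in {0, 1}.
# T = 1 if X2 = 0, else T = 2; T "peeks" at the future, so it is not a stopping time.
E_ST = Fraction(0)
E_T = Fraction(0)
E_X = Fraction(1, 2)
for x1, x2 in product((0, 1), repeat=2):
    p = Fraction(1, 4)
    T = 1 if x2 == 0 else 2
    S_T = x1 if T == 1 else x1 + x2
    E_ST += p * S_T
    E_T += p * T

print(E_ST)        # prints 1   (the true expectation of S_T)
print(E_T * E_X)   # prints 3/4 (what Wald's identity would predict)
```

Both sides are finite here; the failure is only of the equality (1), as claimed.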
(ii) For every nonnegative integer-valued random variable $T$ satisfying $ET^{1/(\alpha-1)} < \infty$, we have $ES_T < \infty$.
(iii) For every nonnegative integer-valued random variable $T$ satisfying $ET^{1/(\alpha-1)} < \infty$, we have $EX_T < \infty$.

The special case $\alpha = 2$ of Theorem 1 is particularly natural: then the condition on $X$ in (i) is that it have finite variance, and the condition on $T$ in (ii) and (iii) is that it have finite mean. At the other extreme, as $\alpha \downarrow 1$, (ii) and (iii) require successively higher moments of $T$ to be finite. One may ask what happens when $T$ satisfies an even stronger condition such as a finite exponential moment: what condition must we impose on $X$ in order to conclude $ES_T < \infty$? The following provides an answer, in which, moreover, the independence assumption may be relaxed.

Theorem 2. Let $X_1, X_2, \ldots$ be i.i.d. nonnegative random variables, and write $S_n := \sum_{i=1}^n X_i$ and $X = X_1$. The following are equivalent.
(i) $E[X (\log X)^+] < \infty$.
(ii) For every nonnegative integer-valued random variable $T$ satisfying $Ee^{cT} < \infty$ for some $c > 0$, we have $ES_T < \infty$.
(iii) For every nonnegative integer-valued random variable $T$ satisfying $Ee^{cT} < \infty$ for some $c > 0$, we have $EX_T < \infty$.
Moreover, if $X_1, X_2, \ldots$ are assumed identically distributed but not necessarily independent, then (i) and (ii) are equivalent.
On the other hand, in the following variant of Theorem 1, dropping independence results in a different moment condition for $T$.

Proposition 3. Let $X$ be a nonnegative random variable. For each $\alpha \in (1, 2]$, the following are equivalent.
(i) $EX^\alpha < \infty$.
(ii) For every sequence $X_1, X_2, \ldots$ of (not necessarily independent) random variables each distributed as $X$, and every nonnegative integer-valued random variable $T$ satisfying $ET^{\alpha/(\alpha-1)} < \infty$, we have $ES_T < \infty$.
In order to prove the implications (iii) $\Rightarrow$ (i) of Theorems 1 and 2, we will assume that (i) fails, and construct a suitable $T$ for which $EX_T = \infty$ (and thus also $ES_T = \infty$). This $T$ will be the last time the random sequence is in a certain (time-dependent) deterministic set, i.e.
$T := \max\{n : X_n \in B_n\}$ for a suitable sequence of sets $B_n$. It is interesting to note that, in contrast, no $T$ of the form $\min\{n : X_n \in B_n\}$ could work for this purpose, since such a $T$ is a stopping time, so Wald's identity applies. In the context of Theorem 2, $T$ will take the form $T := \max\{n : X_n \geq f(n)\}$ for a suitable function $f$.
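The contrast between first-visit (stopping) times and last-visit times can already be seen in a three-step toy example, checked exactly below. The fair-coin distribution, the sets $B_n = \{1\}$, and the horizon are our own illustrative choices; capping $T_{\min}$ at the horizon keeps it a stopping time.

```python
from fractions import Fraction
from itertools import product

# Xi in {0, 1} with equal probability; B_n = {1} for every n, horizon 3.
# T_min = first n with X_n = 1 (a stopping time; capped at 3 so it is finite),
# T_max = last n with X_n = 1 (not a stopping time; 0 if there is no such n).
E_X = Fraction(1, 2)
E = {"S_Tmin": Fraction(0), "Tmin": Fraction(0),
     "S_Tmax": Fraction(0), "Tmax": Fraction(0)}
for xs in product((0, 1), repeat=3):
    p = Fraction(1, 8)
    hits = [n for n, x in enumerate(xs, start=1) if x == 1]
    t_min = hits[0] if hits else 3
    t_max = hits[-1] if hits else 0
    E["S_Tmin"] += p * sum(xs[:t_min])
    E["Tmin"] += p * t_min
    E["S_Tmax"] += p * sum(xs[:t_max])
    E["Tmax"] += p * t_max

print(E["S_Tmin"] == E["Tmin"] * E_X)  # True: Wald holds for the stopping time
print(E["S_Tmax"] == E["Tmax"] * E_X)  # False: Wald fails for the last-visit time
```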
The results here bear an interesting relation to so-called prophet inequalities; see [4] for a survey. A central prophet inequality (see [8]) states that if $X_1, X_2, \ldots$ are independent (not necessarily identically distributed) nonnegative random variables then
(2) $\sup_{U \in \mathcal{U}} EX_U \leq 2 \sup_{S \in \mathcal{S}} EX_S$,
where $\mathcal{U}$ denotes the set of all positive integer-valued random variables and $\mathcal{S}$ denotes the set of all stopping times. The left side is of course equal to $E \sup_i X_i$. The factor 2 is sharp. The interpretation is that a prophet and a gambler are presented sequentially with the values $X_1, X_2, \ldots$, and each can stop at any time $k$ and then receive payment $X_k$. The prophet sees the entire sequence in advance and so can obtain the left side of (2) in expectation, while the gambler can only achieve the supremum on the right. Thus (2) states that the prophet's advantage is at most a factor of 2.
The inequality (2) is uninteresting when $(X_i)$ is an infinite i.i.d. sequence, but for example applying it to $X_i 1[i \leq n]$ (where $n$ is fixed) yields
$\sup\{EX_U : U \in \mathcal{U},\ U \leq n\} \leq 2 \sup\{EX_S : S \in \mathcal{S},\ S \leq n\}$
(and the factor of 2 is again sharp). How does this result change if we replace the condition that $U$ and $S$ are bounded by $n$ with a moment restriction? It turns out that the prophet's advantage can become infinite, in the following sense. Let $X_1, X_2, \ldots$ be any i.i.d. nonnegative random variables with mean 1 and infinite variance. By Theorem 1, there exists an integer-valued random variable $T$ so that $\mu := ET < \infty$ but $EX_T = \infty$. Then we have
$\sup\{EX_U : U \in \mathcal{U},\ EU \leq \mu\} = \infty \quad \text{and} \quad \sup\{EX_S : S \in \mathcal{S},\ ES \leq \mu\} \leq \mu.$
Here the first claim follows by taking $U = T$, and the second claim follows from Wald's identity.
Interpreting impatience as the requirement that the time at which we stop have mean at most $\mu$, we see that this impatience hurts the gambler much more than the prophet.
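The bounded form of (2) can be checked exactly in a small example: the gambler's optimal value is computed by backward induction (accept $X_k$ iff it beats the value of continuing), and the prophet's value $E\max_{i \leq n} X_i$ is computed from the CDF. The uniform distribution on $\{1,2,3\}$ and the horizon $n = 5$ are our own illustrative choices.

```python
from fractions import Fraction

# X uniform on {1, 2, 3}; horizon n. Gambler's optimal value by backward induction:
# v with one observation left is EX, and one step earlier v = E max(X, v).
support = [Fraction(v) for v in (1, 2, 3)]
p = Fraction(1, 3)
n = 5

v = sum(support) * p  # value with one observation left: EX = 2
for _ in range(n - 1):
    v = sum(p * max(x, v) for x in support)
gambler = v

# Prophet's value: E max(X_1, ..., X_n), via P(max <= m) = P(X <= m)^n.
cdf = {m: Fraction(sum(1 for x in support if x <= m), 3) for m in (0, 1, 2, 3)}
prophet = sum(Fraction(m) * (cdf[m] ** n - cdf[m - 1] ** n) for m in (1, 2, 3))

print(prophet <= 2 * gambler)  # True: the bounded prophet inequality (2)
print(gambler <= prophet)      # True: the prophet can only do better
```

Here the two values are close ($227/81$ versus $232/81$); the factor 2 is attained only by carefully chosen non-identical distributions.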
Our proof of the implication (i) ⇒ (ii) in Theorem 1 will rely on a concentration inequality which is due to Hsu and Robbins [5] for the important special case α = 2, and a generalization due to Katz [7] for α < 2. For expository reasons, we include a proof of the Hsu-Robbins inequality, which is simpler than the original proof, and is an adaptation of that given in [2]. Thus, we give a complete proof from first principles of Theorem 1 in the case α = 2. Erdős [3] proved a converse of the Hsu-Robbins result; we will also obtain this converse in the case of nonnegative random variables as a corollary of our results.
Throughout the article we will write $X = X_1$ and $S_n := \sum_{i=1}^n X_i$. If $T = 0$ then we take $X_T = 0$ and $S_T = 0$.

The case of exponential tails
In this section we give the proof of Theorem 2, which is relatively straightforward. We start with a simple lemma relating X T and S T for T of the form that we will use for our counterexamples. The same lemma will be used in the proof of Theorem 1.
Lemma 4. Let $X_1, X_2, \ldots$ be i.i.d. nonnegative random variables. Let $T$ be defined by $T = \max\{k : X_k \in B_k\}$ for some sequence of sets $B_k$ for which this set is a.s. finite, and where we take $T = 0$ and $X_T = 0$ when the set is empty. Then
$ES_T = EX \cdot E[(T-1)^+] + EX_T.$

Proof. Observe that $1[T = k]$ and $S_{k-1}$ are independent for every $k \geq 1$. Therefore,
$ES_T = \sum_{k \geq 1} E(S_{k-1} 1[T=k]) + \sum_{k \geq 1} E(X_k 1[T=k]) = EX \sum_{k \geq 1} (k-1) P(T=k) + EX_T = EX \cdot E[(T-1)^+] + EX_T.$

Proof of Theorem 2. We first prove that (i) and (ii) are equivalent, assuming only that the $X_i$ are identically distributed (not necessarily independent).
Assume that (i) holds, i.e. $E[X(\log X)^+] < \infty$, and that $T$ is a nonnegative integer-valued random variable satisfying $Ee^{cT} < \infty$. Observe that $X_k \leq e^{ck} + X_k 1[X_k > e^{ck}]$, so
$S_T \leq \sum_{k=1}^T e^{ck} + \sum_{k=1}^\infty X_k 1[X_k > e^{ck}].$
The first sum equals $e^c (e^{cT} - 1)/(e^c - 1)$, which has finite expectation. The expectation of the second sum is at most
$\sum_{k=1}^\infty E(X 1[X > e^{ck}]) = E\Big[X \sum_{k=1}^\infty 1[X > e^{ck}]\Big] \leq \frac{1}{c}\, E[X (\log X)^+] < \infty.$
Hence $ES_T < \infty$ as required, giving (ii).

Now assume that (i) fails, i.e. $E[X(\log X)^+] = \infty$, but (ii) holds (still without assuming independence of the $X_i$). Taking $T \equiv 1$ in (ii) shows that $EX < \infty$. Now let
(4) $T := \max\{k \geq 1 : X_k \geq e^k\},$
where $T$ is taken to be 0 if the set above is empty and $\infty$ if it is unbounded. Then
$P(T \geq k) \leq \sum_{j \geq k} P(X_j \geq e^j) \leq \sum_{j \geq k} e^{-j}\, EX$
by Markov's inequality. The last sum is $(EX) e^{1-k}/(e-1)$, and hence $Ee^{cT} < \infty$ for suitable $c > 0$ (and in particular $T$ is a.s. finite). On the other hand, since every $k$ with $X_k \geq e^k$ satisfies $k \leq T$,
$ES_T \geq \sum_{k \geq 1} E(X_k 1[X_k \geq e^k]) = \sum_{k \geq 1} E(X 1[X \geq e^k]) = E\big[X \lfloor \log X \rfloor\, 1[X \geq e]\big],$
which is infinite (it differs from $E[X(\log X)^+]$ by at most $EX < \infty$), contradicting (ii).

Now assume that the $X_i$ are i.i.d. We have already established that (i) and (ii) are equivalent, and (ii) immediately implies (iii) since $S_T \geq X_T$. It therefore suffices to show that (iii) implies (i). Suppose (i) fails and (iii) holds. Taking $T \equiv 1$ in (iii) shows that $EX < \infty$. Now take the same $T$ as in (4). As argued above, $ES_T = \infty$ and $Ee^{cT} < \infty$ for some $c > 0$ (so $ET < \infty$). Hence (iii) gives $EX_T < \infty$. But this contradicts Lemma 4.
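The tail estimate for the $T$ of (4) can be checked numerically. The choice $X \sim \text{Exponential}(1)$ below is purely illustrative (it has $E[X(\log X)^+] < \infty$, so this $T$ is not a counterexample); the computation only verifies the Markov-union bound $P(T \geq k) \leq (EX)\, e^{1-k}/(e-1)$.

```python
import math

# X_k i.i.d. Exponential(1): EX = 1 and P(X >= t) = exp(-t).
# T = max{k >= 1 : X_k >= e^k}; by independence,
# P(T >= k) = 1 - prod_{j >= k} (1 - P(X >= e^j)).
EX = 1.0

def tail_T(k: int, terms: int = 60) -> float:
    prob_none = 1.0
    for j in range(k, k + terms):  # later factors are 1 to machine precision
        prob_none *= 1.0 - math.exp(-math.exp(j))
    return 1.0 - prob_none

for k in range(1, 8):
    bound = EX * math.exp(1 - k) / (math.e - 1)  # the bound from the proof
    assert tail_T(k) <= bound
print("tail bound verified for k = 1..7")
```

The true tail decays like $e^{-e^k}$ here, far faster than the bound $e^{-k}$, which is all the proof needs.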
Remark. Conditions (i) and (iii) cannot be equivalent if the i.i.d. condition is dropped, since if $X_1 = X_2 = X_3 = \cdots$, then $X_T = X_1$ for every $T \geq 1$, and so (iii) just corresponds to $X$ having a finite first moment.
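The identity $ES_T = EX \cdot E[(T-1)^+] + EX_T$ for last-visit times (Lemma 4) can be verified by exhaustive enumeration in a finite-horizon toy case; the fair-coin distribution, the sets $B_k = \{1\}$, and the horizon 3 are our own choices (a finite horizon makes $T$ automatically finite).

```python
from fractions import Fraction
from itertools import product

# Xi in {0, 1} equally likely, B_k = {1}, horizon 3:
# T = max{k : X_k = 1}, with T = 0 (and X_T = S_T = 0) if there is no such k.
E_X = Fraction(1, 2)
E_ST = E_XT = E_Tm1 = Fraction(0)
for xs in product((0, 1), repeat=3):
    p = Fraction(1, 8)
    hits = [k for k, x in enumerate(xs, start=1) if x == 1]
    T = hits[-1] if hits else 0
    E_ST += p * sum(xs[:T])
    E_XT += p * (xs[T - 1] if T >= 1 else 0)
    E_Tm1 += p * max(T - 1, 0)

print(E_ST == E_X * E_Tm1 + E_XT)  # True: the identity of Lemma 4
```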

The case α = 2 and the Hsu-Robbins Theorem
In this section we prove Theorem 1 in the important special case $\alpha = 2$ (so $1/(\alpha - 1) = 1$). We will use the following result of Hsu and Robbins [5]. For expository purposes we include a proof of this result, which is simpler than the original proof, and is based on an argument from [2].

Theorem 5 (Hsu-Robbins). Let $X_1, X_2, \ldots$ be i.i.d. random variables with $EX = \mu$ and $EX^2 < \infty$, and write $S_n = \sum_{i=1}^n X_i$. Then for every $\epsilon > 0$,
$\sum_{n=1}^\infty P(|S_n - n\mu| \geq n\epsilon) < \infty.$
The first term on the right is summable in n, and the second term is summable by the assumption of finite variance. Applying the same argument to −S n completes the proof.
We will also need a simple fact of real analysis, a converse to Hölder's inequality, which we state in probabilistic form. See, e.g., Lemma 6.7 in [9] for a related statement. The proof method is known as the "gliding hump"; see [10] and the references therein.

Lemma 6. Let $p, q > 1$ satisfy $1/p + 1/q = 1$. Assume that a nonnegative random variable $X$ satisfies $EXg(X) < \infty$ for every nonnegative function $g$ that satisfies $Eg^q(X) < \infty$. Then $EX^p < \infty$.
Proof. Assume $EX^p = \infty$. Letting $\psi_k := P(\lfloor X \rfloor = k)$, we have $\sum_{k=1}^\infty \psi_k k^p = E\lfloor X \rfloor^p = \infty$, so we can choose integers $0 = a_0 < a_1 < a_2 < \cdots$ such that for each $\ell \geq 1$,
$b_\ell := \sum_{k=a_{\ell-1}}^{a_\ell - 1} \psi_k k^p \geq 1.$
Denote the interval $[a_{\ell-1}, a_\ell)$ by $I_\ell$ and let $g$ be defined on $[0, \infty)$ by
$g(x) := \frac{\lfloor x \rfloor^{p-1}}{\ell\, b_\ell^{1/q}} \quad \text{for } x \in I_\ell.$
Since $(p-1)q = p$, we obtain
$Eg^q(X) = \sum_{\ell \geq 1} \frac{1}{\ell^q b_\ell} \sum_{k \in I_\ell} \psi_k k^p = \sum_{\ell \geq 1} \frac{1}{\ell^q} < \infty.$
On the other hand,
$EXg(X) \geq \sum_{\ell \geq 1} \frac{1}{\ell\, b_\ell^{1/q}} \sum_{k \in I_\ell} \psi_k k^p = \sum_{\ell \geq 1} \frac{b_\ell^{1/p}}{\ell} = \infty.$
We can now proceed with the main proof.
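As a brief aside, the gliding-hump construction can be watched in action on a finite truncation. We take $p = q = 2$ and the illustrative distribution $P(\lfloor X \rfloor = k) = c/k^3$ (our choice), for which $EX < \infty$ but $EX^2 = \infty$; each block then contributes exactly $1/\ell^2$ to $Eg^2(X)$ and at least $1/\ell$ to $EXg(X)$.

```python
import math

# psi_k = c / k^3, so sum_k psi_k k^2 = c * sum 1/k diverges while EX is finite.
# Blocks I_l are grown until their "mass" b_l = sum_{k in I_l} psi_k k^2 reaches 1;
# on I_l we set g(x) = floor(x) / (l * sqrt(b_l)).
c = 1.0 / sum(1.0 / k**3 for k in range(1, 10**6))  # ~ 1/zeta(3)

def psi(k: int) -> float:
    return c / k**3

a_prev, Eg2, EXg = 1, 0.0, 0.0
for l in range(1, 9):
    b, k = 0.0, a_prev
    while b < 1.0:                  # grow block l until b_l >= 1
        b += psi(k) * k**2
        k += 1
    Eg2 += b / (l**2 * b)           # block's contribution to E g(X)^2 (= 1/l^2)
    EXg += b / (l * math.sqrt(b))   # block's contribution to E X g(X) (= sqrt(b_l)/l)
    a_prev = k

# Over the first 8 blocks: E g^2 stays below pi^2/6, E X g exceeds the harmonic sum.
assert Eg2 < math.pi**2 / 6
assert EXg >= sum(1.0 / l for l in range(1, 9)) - 1e-9
print(round(Eg2, 4), round(EXg, 4))
```

The first partial sum is bounded uniformly in the number of blocks, while the second grows without bound, exactly as in the proof.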
Proof of Theorem 1, case $\alpha = 2$. We will first show that (i) and (ii) are equivalent. Assume (i) holds, i.e. $EX^2 < \infty$, and let $T$ satisfy $ET < \infty$. We may assume without loss of generality that $EX = 1$. By the nonnegativity of the $X_i$, we have
(6) $P(S_T \geq 2n) \leq P(T \geq n) + P(S_n \geq 2n)$.
Since $ET < \infty$, the first term on the right is summable in $n$. Since $EX^2 < \infty$ and $EX = 1$, Theorem 5 with $\epsilon = 1$ implies that the second term is also summable. We conclude that $ES_T < \infty$.

Now assume (ii). To show that $X$ has finite second moment, using Lemma 6 with $p = q = 2$, we need only show that for any nonnegative function $g$ satisfying $Eg^2(X) < \infty$, we have $EXg(X) < \infty$. Given such a $g$, consider the integer-valued random variable
(7) $T_g := \max\{k \geq 1 : g(X_k) \geq k\},$
where $T_g$ is taken to be 0 if the set is empty or $\infty$ if it is unbounded. We have
$ET_g = \sum_{k \geq 1} P(T_g \geq k) \leq \sum_{k \geq 1} \sum_{j \geq k} P(g(X_j) \geq j) = \sum_{j \geq 1} j\, P(g(X) \geq j) \leq Eg^2(X).$
Since $Eg^2(X) < \infty$, the last expression is finite, and hence $ET_g < \infty$.
Thus, by assumption (ii), we have $ES_{T_g} < \infty$. However, since every $k$ with $g(X_k) \geq k$ satisfies $k \leq T_g$,
$ES_{T_g} \geq \sum_{k \geq 1} E(X_k 1[g(X_k) \geq k]) = \sum_{k \geq 1} E(X 1[g(X) \geq k]) = E\big(X \lfloor g(X) \rfloor\big),$
so that $EX\lfloor g(X) \rfloor < \infty$, which easily yields $EXg(X) < \infty$ as required. Clearly (ii) implies (iii). Finally, we proceed as in the proof of Theorem 2 to show that (iii) implies (i). Suppose (i) fails and (iii) holds. Taking $T \equiv 1$ in (iii) shows that $EX < \infty$. Since $EX^2 = \infty$, Lemma 6 implies the existence of a $g$ with $Eg^2(X) < \infty$ but $EXg(X) = \infty$. Let $T_g$ be defined as in (7) above, for this $g$. The argument above shows that $ES_{T_g} = \infty$ while $ET_g < \infty$, and so the assumption (iii) gives $EX_{T_g} < \infty$. However, this contradicts Lemma 4.
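The chain of bounds $ET_g \leq \sum_{j \geq 1} j\, P(g(X) \geq j) \leq Eg^2(X)$ used above can be checked numerically for a toy case; the choices $g(x) = x$ and $P(X = k) = 2^{-k}$, $k \geq 1$, are ours (here $Eg^2(X) = EX^2 = 6 < \infty$, so $T_g$ indeed has finite mean).

```python
# P(X = k) = 2^{-k} for k >= 1, and g(x) = x, so P(g(X) >= j) = 2^{1-j}.
def p_tail(j: int) -> float:
    return 2.0 ** (1 - j) if j >= 1 else 1.0

# T_g = max{k >= 1 : X_k >= k}; by independence of the X_k,
# P(T_g >= k) = 1 - prod_{j >= k} (1 - P(X >= j)).
def tail_Tg(k: int, terms: int = 200) -> float:
    prob_none = 1.0
    for j in range(k, k + terms):
        prob_none *= 1.0 - p_tail(j)
    return 1.0 - prob_none

E_Tg = sum(tail_Tg(k) for k in range(1, 200))
union_bound = sum(j * p_tail(j) for j in range(1, 200))   # -> 4 in the limit
Eg2 = sum(k**2 * 2.0**-k for k in range(1, 200))          # E X^2 -> 6

assert E_Tg <= union_bound <= Eg2
print(round(E_Tg, 3), round(union_bound, 3), round(Eg2, 3))
```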
We also obtain the following converse of the Hsu-Robbins Theorem, due to Erdős.

Corollary 7. Let $X_1, X_2, \ldots$ be i.i.d. nonnegative random variables with finite mean $\mu$. Write $S_n = \sum_{i=1}^n X_i$ and $X = X_1$. If, for all $\epsilon > 0$, $\sum_{n=1}^\infty P(|S_n - n\mu| \geq n\epsilon) < \infty$, then $X$ has finite variance.
Proof. Without loss of generality, we can assume that $\mu = 1$. By Theorem 1 with $\alpha = 2$, it suffices to show that $ES_T < \infty$ for all $T$ with finite mean. However, this is immediate from (6): the first term on the right is summable since $T$ has finite mean, and the second term is summable by the assumption of the corollary with $\epsilon = 1$.

The case of α < 2
The proof of Theorem 1 in the general case follows very closely the proof for $\alpha = 2$. We need the following replacement for Theorem 5, due to Katz [7], whose proof we do not give here. A converse of the results in [7] appears in [1]. We will also use the general case of Lemma 6.

Theorem 8 (Katz). Let $\alpha \in (1, 2]$, and let $X_1, X_2, \ldots$ be i.i.d. random variables with $EX = \mu$ and $E|X|^\alpha < \infty$. Then for every $\epsilon > 0$,
$\sum_{n=1}^\infty n^{\alpha - 2}\, P(|S_n - n\mu| \geq n\epsilon) < \infty.$
Next we show that (ii) implies (i). To show that $X$ has a finite $\alpha$-moment, using Lemma 6, it suffices to show that for any nonnegative function $g$ satisfying $Eg^{\alpha/(\alpha-1)}(X) < \infty$, we have $EXg(X) < \infty$. Given such a $g$, consider as before the integer-valued random variable
$T_g := \max\{k \geq 1 : g(X_k) \geq k\},$
where $T_g$ is taken to be 0 if the set is empty or $\infty$ if it is unbounded.

The dependent case
Proof of Proposition 3. Assume (i) holds. If $ET^{\alpha/(\alpha-1)} < \infty$ and $X_1, X_2, \ldots$ are as in (ii), then since $X_k \leq k^{1/(\alpha-1)} + X_k 1[X_k > k^{1/(\alpha-1)}]$, we can write
$S_T \leq \sum_{k=1}^T k^{1/(\alpha-1)} + \sum_{k=1}^\infty X_k 1[X_k > k^{1/(\alpha-1)}].$
The first sum is at most $T \cdot T^{1/(\alpha-1)} = T^{\alpha/(\alpha-1)}$, which has finite expectation. The expectation of the second sum is at most
$\sum_{k=1}^\infty E(X 1[X^{\alpha-1} > k]) = E\Big[X \sum_{k=1}^\infty 1[X^{\alpha-1} > k]\Big] \leq E[X \cdot X^{\alpha-1}] = EX^\alpha < \infty.$
Hence $ES_T < \infty$, as claimed in (ii).

Now assume (ii) holds. To show that $X$ has a finite $\alpha$-moment, using Lemma 6, it is enough to show that for any nonnegative $g$ satisfying $Eg^{\alpha/(\alpha-1)}(X) < \infty$, we have $EXg(X) < \infty$. It is easily seen that it suffices to consider only $g$ that are integer-valued. Given such a $g$, let $T$ be $g(X)$ and let all the $X_i$ be equal to $X$. Then $ET^{\alpha/(\alpha-1)} = Eg^{\alpha/(\alpha-1)}(X) < \infty$. By (ii), $ES_T < \infty$. However, by construction $S_T = Xg(X)$, concluding the proof.
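The fully dependent construction at the end of the proof (all $X_i$ equal to $X$, $T = g(X)$) collapses the random sum to $Xg(X)$, which can be seen directly in a toy case; $X$ uniform on $\{1, 2, 3\}$ and $g(x) = x$ are our own illustrative choices.

```python
from fractions import Fraction

# All X_i equal to X and T = g(X) with g(x) = x, so S_T = X_1 + ... + X_T = X * g(X).
# X uniform on {1, 2, 3}; note ET^{a/(a-1)} is trivially finite for bounded X.
support = [1, 2, 3]
p = Fraction(1, 3)

E_ST = sum(p * sum(x for _ in range(x)) for x in support)  # literally sum T copies of X
E_Xg = sum(p * x * x for x in support)                     # E[X g(X)] = E[X^2]

print(E_ST == E_Xg)  # True: the random sum collapses to X g(X)
```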