SHARP TAIL INEQUALITIES FOR NONNEGATIVE SUBMARTINGALES AND THEIR STRONG DIFFERENTIAL SUBORDINATES

Let f = (f_n)_{n≥0} be a nonnegative submartingale starting from x and let g = (g_n)_{n≥0} be a sequence starting from y and satisfying |dg_n| ≤ |df_n| and |E(dg_n | F_{n−1})| ≤ |E(df_n | F_{n−1})| for n ≥ 1. We determine the best universal constant U(x, y) such that P(g* ≥ 0) ≤ ||f||_1 + U(x, y). As an application, we deduce a sharp weak type (1,1) inequality for the one-sided maximal function of g and determine, for any t ∈ [0, 1] and β ∈ ℝ, the number inf ||f||_1, where the infimum is taken over all pairs (f, g) as above for which P(g* ≥ β) ≥ t. The estimates above yield analogous statements for stochastic integrals in which the integrator is a nonnegative submartingale. The results extend some earlier work of Burkholder and Choi in the martingale setting.


Introduction
The purpose of this paper is to study some new sharp estimates for submartingales and their differential subordinates. Let us start by introducing the necessary background and notation. In what follows, (Ω, F, P) is a non-atomic probability space, filtered by a nondecreasing family (F_n)_{n≥0} of sub-σ-fields of F. Let f = (f_n)_{n≥0} be an adapted sequence of integrable variables. Then df = (df_n)_{n≥0}, the difference sequence of f, is given by df_0 = f_0 and df_n = f_n − f_{n−1} for n ≥ 1. Assume that g = (g_n)_{n≥0} is another adapted integrable sequence, satisfying

|dg_n| ≤ |df_n| and |E(dg_n | F_{n−1})| ≤ |E(df_n | F_{n−1})|, n ≥ 1. (1.1)

Following Burkholder [5], we say that g is strongly differentially subordinate to f if |g_0| ≤ |f_0| and the condition (1.1) holds. For example, this is the case when g is a transform of f by a predictable sequence v = (v_n)_{n≥0} bounded in absolute value by 1. That is, we have dg_n = v_n df_n for n ≥ 0, and by predictability we mean that for each n the variable v_n is measurable with respect to F_{(n−1)∨0}. Let us also mention that if f is a martingale, then strong differential subordination is equivalent to saying that g is a martingale satisfying |dg_n| ≤ |df_n| for all n ≥ 0. Let |g|* = sup_n |g_n| and g* = sup_n g_n denote the maximal function and the one-sided maximal function of g, respectively. We will also use the notation ||f||_p = sup_n ||f_n||_p for the p-th norm of the sequence f, p ≥ 1. The problem of a sharp comparison of the sizes of f and g under various assumptions on f has been studied extensively by many authors. The literature on the subject is very rich; we refer the interested reader to the papers [3]-[9], [11]-[16] and the references therein for more information, and to [1], [2], [10] for applications to Riesz systems and the Beurling-Ahlfors transform. We only mention here a few classical estimates related to the problem investigated in this paper. In the martingale setting, Burkholder [3] proved the following weak-type inequality.

Theorem 1.1. Suppose that f is a martingale and g is strongly differentially subordinate to f. Then for any λ > 0,

λ P(|g|* ≥ λ) ≤ 2 ||f||_1, (1.2)

and the constant 2 is the best possible.

A natural question about the optimal constant above for nonnegative submartingales f was answered by Burkholder in [5].
Theorem 1.2. Suppose that f is a nonnegative submartingale and g is strongly differentially subordinate to f. Then for any λ > 0,

λ P(|g|* ≥ λ) ≤ 3 ||f||_1, (1.3)

and the constant 3 is the best possible.
The two results above have been extended and generalized in many directions, see e.g. [3], [6], [9], [11], [12], [15] and [16]. We take the line of research related to the following question, raised by Burkholder in [3]. Suppose that g, a strong differential subordinate to f , has at least probability t of exceeding β; how small can || f || 1 be? In the particular case when t = 1 and f is a martingale, the answer is the following (cf. [3]).
Theorem 1.3. Suppose that f is a martingale starting from x and g is strongly differentially subordinate to f. If g satisfies the one-sided bound P(g* ≥ β) = 1 for some β ∈ ℝ, then ||f||_1 is bounded from below by an explicit function of x and β, and the expression on the right is the best possible.
This result was generalized by Choi [9] to the case when t ∈ [0, 1] is arbitrary. Precisely, we have the following.
Theorem 1.4. Suppose that f is a martingale starting from x and g is strongly differentially subordinate to f. If g satisfies the one-sided bound P(g* ≥ β) ≥ t, where t ∈ [0, 1] is a fixed number, then ||f||_1 is bounded from below by an explicit function of x, β and t. Again, the bound on the right is the best possible.
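Before turning to the main results, we note that the basic objects above are easy to experiment with numerically. The following sketch (our own illustration, not part of the paper's argument) builds a simple ±1 random-walk martingale f, transforms it by a predictable sign sequence v, and checks the subordination condition from (1.1) pathwise:

```python
import random

def transform_pair(n_steps=200, seed=0):
    """Build a simple +/-1 random-walk martingale f and a transform g of f
    by a predictable sign sequence v (each v_n depends only on the past),
    so that g is strongly differentially subordinate to f."""
    rng = random.Random(seed)
    f, g = [0.0], [0.0]                   # f_0 = g_0 = 0, hence |g_0| <= |f_0|
    for _ in range(n_steps):
        v = 1.0 if f[-1] >= 0 else -1.0   # predictable: uses f_0,...,f_{n-1} only
        df = rng.choice([-1.0, 1.0])      # martingale difference
        f.append(f[-1] + df)
        g.append(g[-1] + v * df)          # dg_n = v_n df_n
    return f, g

f, g = transform_pair()
# Pathwise differential subordination: |dg_n| <= |df_n| for every n >= 1.
assert all(abs(g[n] - g[n-1]) <= abs(f[n] - f[n-1]) for n in range(1, len(f)))
```

Since |v_n| = 1 here, the subordination holds with equality in each step; any predictable sequence with values in [−1, 1] would do.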

Main results
Our contribution will be, among other things, to establish a submartingale version of the theorems above. First, we study a more general problem and provide a sharp upper bound for the tail of g*, which depends not only on ||f||_1 and f_0, but also on the starting point of g. Throughout, the function U : [0, ∞) × ℝ → ℝ is given by (3.1) below.
Theorem 2.1. Let f be a nonnegative submartingale starting from x ≥ 0 and let g be a sequence starting from y ∈ ℝ such that the condition (1.1) is satisfied. Then

P(g* ≥ 0) ≤ ||f||_1 + U(x, y), (2.1)

and the inequality is sharp.
This will be proved in Sections 3 and 4 below. As an application, we will obtain in Section 5 the following extension of Theorem 1.4. Throughout the paper, the function L : [0, ∞) × ℝ × [0, 1] → ℝ is given by (5.4) below.
Theorem 2.2. Let f be a nonnegative submartingale starting from x ≥ 0 and let g start from y ∈ ℝ. Suppose that (1.1) holds. If g satisfies the one-sided estimate

P(g* ≥ β) ≥ t, (2.2)

where t ∈ [0, 1] is a fixed number, then

||f||_1 ≥ L(x, y − β, t), (2.3)

and the bound is the best possible. In particular, if g is strongly differentially subordinate to f, then we have the sharp inequality

||f||_1 ≥ L(x, x − β, t). (2.4)

Finally, Theorem 2.1 leads to another interesting variation of the inequality (1.3), to be proved in Section 5.

Theorem 2.3. Assume that f is a nonnegative submartingale and g is strongly differentially subordinate to f. Then for any λ > 0 we have

λ P(g* ≥ λ) ≤ (8/3) ||f||_1, (2.5)

and the constant 8/3 is the best possible.
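The gap between the two-sided constant 3 of Theorem 1.2 and the one-sided constant 8/3 can be probed by simulation. The sketch below is a sanity check only, built from one admissible pair of our own choosing (f_n = |S_n| for a simple random walk S, which is a nonnegative submartingale, and g a ±1-transform of f); it illustrates that neither bound is violated, not that either is attained:

```python
import random

def simulate(n_steps=60, n_paths=5000, lam=4.0, seed=1):
    """Empirically compare the one-sided and two-sided weak-type ratios
    lam*P(g* >= lam)/||f||_1 and lam*P(|g|* >= lam)/||f||_1 for the pair
    f_n = |S_n|, dg_n = v_n df_n with deterministic signs v_n = (-1)^n."""
    rng = random.Random(seed)
    hit_one = hit_two = 0
    ef = 0.0
    for _ in range(n_paths):
        s, f_prev, g = 0, 0.0, 0.0
        g_max = g_absmax = 0.0
        v = 1.0
        for _ in range(n_steps):
            s += rng.choice([-1, 1])
            df = abs(s) - f_prev          # difference of f_n = |S_n|
            g += v * df                   # dg_n = v_n df_n, |v_n| = 1
            f_prev = abs(s)
            v = -v                        # predictable (deterministic) signs
            g_max = max(g_max, g)
            g_absmax = max(g_absmax, abs(g))
        hit_one += g_max >= lam
        hit_two += g_absmax >= lam
        ef += f_prev
    norm_f = ef / n_paths                 # approximates ||f||_1 = sup_n E f_n
    return lam * hit_one / n_paths / norm_f, lam * hit_two / n_paths / norm_f

one_sided, two_sided = simulate()
assert one_sided <= 8.0 / 3.0 and two_sided <= 3.0   # Theorems 2.3 and 1.2
assert one_sided <= two_sided                        # g* <= |g|* pathwise
```

Note that g* ≤ |g|* pathwise, so the one-sided ratio can never exceed the two-sided one; this is the elementary part of the comparison discussed below.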
Comparing this to Theorem 1.2, we see that the constant decreases in the one-sided setting. This is not surprising: a careful study of Burkholder's example in [5] shows that the extremal pair (f, g) in (1.3) is symmetric in the sense that P(g* ≥ λ) = P((−g)* ≥ λ) = 1/2. In other words, half of the tail of |g|* comes from dropping to −λ; however, in (2.5) we do not take this part into account. We conclude this section with the observation that the results above yield some new and interesting sharp estimates for stochastic integrals in which the integrator is a nonnegative submartingale.
To be more precise, suppose that (Ω, F, P) is complete and is equipped with a right-continuous filtration (F_t)_{t≥0}. Assume that X = (X_t)_{t≥0} is an adapted nonnegative cadlag submartingale and let H = (H_t)_{t≥0} be a predictable process taking values in [−1, 1]. Let Y = (Y_t)_{t≥0} denote the Itô integral of H with respect to X, that is,

Y_t = H_0 X_0 + ∫_{0+}^{t} H_s dX_s, t ≥ 0.

Then standard approximation arguments (see [5]) yield the following.

Theorem 2.4. (i) If Y satisfies the one-sided bound

P(sup_{t≥0} Y_t ≥ β) ≥ p,

where p ∈ [0, 1] is a fixed number, then ||X||_1 ≥ L(X_0, Y_0 − β, p) and the bound is the best possible.

(ii) For any λ > 0 we have λ P(sup_{t≥0} Y_t ≥ λ) ≤ (8/3) ||X||_1 and the constant 8/3 is the best possible.
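The approximation behind Theorem 2.4 replaces the Itô integral by left-endpoint sums, and the discretized pair then satisfies the discrete subordination (1.1). A minimal discretization sketch (our own choices: X_t = |B_t| as the nonnegative submartingale and an arbitrary [−1, 1]-valued rule for H, neither taken from the paper) reads:

```python
import math
import random

def ito_left_sum(n_steps=1000, T=1.0, seed=2):
    """Approximate Y = integral of H dX by left-endpoint Riemann sums on a
    grid, with X_t = |B_t| (a nonnegative submartingale) and a predictable
    integrand H taking values in [-1, 1]."""
    rng = random.Random(seed)
    dt = T / n_steps
    b, x_prev, y = 0.0, 0.0, 0.0
    path_y = [y]                          # Y_0 = H_0 X_0 = 0 here
    for _ in range(n_steps):
        h = math.tanh(-y)                 # H evaluated at the left endpoint,
                                          # hence predictable; |h| <= 1
        b += rng.gauss(0.0, math.sqrt(dt))
        x = abs(b)
        y += h * (x - x_prev)             # |dY_j| <= |dX_j| since |h| <= 1
        x_prev = x
        path_y.append(y)
    return path_y

ys = ito_left_sum()
```

Each increment satisfies |dY_j| ≤ |dX_j| with H predictable, so the discrete weak-type bounds apply along the grid; the continuous-time statement is then obtained by refining the grid.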

A special function
Consider the following subsets D_0, D_1, D_2 and D_3 of [0, ∞) × ℝ, and let U : [0, ∞) × ℝ → ℝ be given by (3.1). We also introduce the auxiliary functions φ, ψ : [0, ∞) × ℝ → ℝ. Later on, we will need the following properties of these objects.

Lemma 3.1. (i) The function U is continuous on its domain and of class C^1 on the set E.

(ii) There is an absolute constant A such that for all x ≥ 0 and y ∈ ℝ we have |U(x, y)| ≤ A(1 + x + |y|) and |φ(x, y)| + |ψ(x, y)| ≤ A.

(iii) For any fixed x ≥ 0, the function y ↦ U(x, y) is nondecreasing; moreover, the expressions φ + ψ and φ − ψ are nonpositive.

Proof. (i) This is straightforward and reduces to a tedious verification that the function U and its partial derivatives match appropriately at the common boundaries of the sets D_0, . . . , D_3. We omit the details.

(ii) This follows immediately from the formulas for U, φ and ψ above (in fact A = 1 suffices, but we will not need this).

(iii) Observe that ψ ≥ 0, which gives that the function y ↦ U(x, y) is nondecreasing for a fixed x. It suffices to note that the remaining expressions are evidently nonpositive.

Lemma 3.2. (i) For any x ≥ 0, y ∈ ℝ and h, k ∈ ℝ satisfying |h| ≥ |k| and x + h ≥ 0, we have

U(x + h, y + k) ≤ U(x, y) + φ(x, y)h + ψ(x, y)k. (3.4)

Proof. There is a well-known procedure to establish such an estimate (see e.g. [4]): fix x ≥ 0, y ∈ ℝ, a ∈ [−1, 1] and consider the function G = G_{x,y,a} : [−x, ∞) → ℝ given by G(t) = U(x + t, y + at). Then the condition (3.4) is equivalent to saying that G is concave. Since U is of class C^1 on the set E (by part (i) of Lemma 3.1), the concavity is a consequence of the following two conditions, which will be proved below:

(a) G″(t) ≤ 0 for those t for which (x + t, y + at) lies in the interior of one of the sets D_i;

(b) G′(t−) ≥ G′(t+) for those t for which (x + t, y + at) lies on the common boundary of two of the sets D_i.

By the translation property G_{x,y,a}(t + s) = G_{x+t, y+at, a}(s), valid for all t ≥ −x and s ≥ −t − x, it suffices to establish (a) and (b) for t = 0. Let us verify the first condition.
A direct computation shows that G″(0) is nonpositive: this follows from |a| ≤ 1. As for (b), if (x, y) lies on the common boundary of two of the sets D_i, then after some straightforward computations one obtains the required inequality between the one-sided derivatives. On the other hand, if (x, y) ∈ ∂D_3 ∩ ∂D_0 and x > 2, then the same one-sided inequality holds, and we are done.
In particular, by (3.3) and the proof of the above lemma, for any y the function t ↦ U(t, y + t), t ≥ 0, is nonincreasing.
Now we turn to the proof of Theorem 2.1.

Proof of (2.1). We will prove that for any nonnegative integer n we have

P(g_n ≥ 0) − E f_n ≤ U(x, y). (3.5)

This will yield the claim: to see this, fix ε > 0 and introduce the stopping time τ = inf{n : g_n ≥ −ε}. Note that {g* ≥ 0} ⊆ ⋃_n {g_{τ∧n} + ε ≥ 0}. Obviously, the family ({g_{τ∧n} + ε ≥ 0})_{n≥0} is nondecreasing. In addition, the modified pair (f′, g′) = (f_n, g_{τ∧n} + ε)_{n≥0} still satisfies the domination relation (1.1): this follows from the identity dg′_n = 1_{{τ≥n}} dg_n, valid for all n ≥ 1. Hence, applying (3.5) to this pair yields

P(g_{τ∧n} + ε ≥ 0) − E f_n ≤ U(x, y + ε),

and, in consequence, P(g* ≥ 0) ≤ lim_{n→∞} P(g_{τ∧n} + ε ≥ 0) ≤ ||f||_1 + U(x, y + ε). It suffices to let ε → 0 to get (2.1).
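The identity dg′_n = 1_{τ≥n} dg_n used in this reduction can be checked mechanically; the following sketch (with an arbitrary toy path of our own) verifies it for the stopped sequence g′_n = g_{τ∧n} + ε:

```python
def stopped_differences(g, eps):
    """For a path (g_0, g_1, ...) and tau = inf{n : g_n >= -eps} (capped at
    the last index in this finite sketch), return g'_n = g_{tau ^ n} + eps
    and verify that dg'_n = 1_{tau >= n} dg_n for every n >= 1."""
    tau = next((n for n, value in enumerate(g) if value >= -eps), len(g) - 1)
    gp = [g[min(tau, n)] + eps for n in range(len(g))]
    for n in range(1, len(g)):
        expected = (g[n] - g[n-1]) if tau >= n else 0.0
        assert abs((gp[n] - gp[n-1]) - expected) < 1e-12
    return gp

# Toy path starting at g_0 = -3 which first reaches [-eps, infinity) at n = 4.
path = [-3.0, -2.0, -2.5, -1.5, -0.4, 0.2, -1.0]
gp = stopped_differences(path, eps=0.5)
```

After the stopping time the differences of g′ vanish, so the subordination (1.1) is trivially preserved there, and before it they coincide with those of g.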
Thus it remains to establish (3.5). The key observation is that the sequence (U(f_n, g_n))_{n=0}^∞ is an (F_n)-supermartingale. Indeed, by (3.4), applied to x = f_n, y = g_n, h = df_{n+1} and k = dg_{n+1}, we get

U(f_{n+1}, g_{n+1}) ≤ U(f_n, g_n) + φ(f_n, g_n) df_{n+1} + ψ(f_n, g_n) dg_{n+1}.

By part (ii) of Lemma 3.1, both sides above are integrable. Apply the conditional expectation with respect to F_n to obtain

E[U(f_{n+1}, g_{n+1}) | F_n] ≤ U(f_n, g_n) + φ(f_n, g_n) E(df_{n+1} | F_n) + ψ(f_n, g_n) E(dg_{n+1} | F_n) ≤ U(f_n, g_n),

where in the last passage we have used (1.1), the submartingale property of f and part (iii) of Lemma 3.1; this gives the supermartingale property. Now use the majorization (3.2) to get

P(g_n ≥ 0) − E f_n = E V(f_n, g_n) ≤ E U(f_n, g_n) ≤ E U(f_0, g_0) = U(x, y),

which completes the proof.
Before we proceed, let us mention here how we have constructed the function U. Note that the assertion of Theorem 2.1 (see also (3.5)) can be rephrased as

U(x, y) = sup { E V(f_n, g_n) }.

Here V(x, y) = 1_{{y≥0}} − x and the supremum is taken over all n and all pairs (f, g) starting from (x, y), such that f is a nonnegative submartingale and (1.1) is satisfied. Repeating the arguments from [3] and [4], we are led to the corresponding boundary value problem. Namely, U is the least function on [0, ∞) × ℝ which majorizes V on the whole domain and satisfies the following condition: for any y ∈ ℝ, the functions t ↦ U(t, y + t) and t ↦ U(t, y − t) are concave and nonincreasing on [0, ∞). Some experimentation with these two assumptions leads to (3.1). For example, to get that U(x, y) = 1 − x on D_0, we use the following argument. First, note that if U_1, U_2 are any solutions of the above boundary value problem, then so is their minimum min{U_1, U_2}. Applying this to U_1 = U and U_2 given by U_2(x, y) = 1 − x, we obtain U(x, y) ≤ 1 − x for all x ≥ 0, y ∈ ℝ (this bound can also be derived directly from (3.8)). Therefore, equality must hold for y ≥ 0, since V(x, y) = 1 − x for these y. If 0 < x + y < x, then consider the half line H of slope −1 passing through (x, y). Let u(t) = U(t, x + y − t), t ≥ 0, be the restriction of U to H. Then u is concave, u(t) = 1 − t for small t and u(t) ≥ 1_{{x+y−t≥0}} − t for all t. This implies that u(t) = 1 − t for all t and hence U(x, y) = 1 − x if x + y > 0. Finally, the continuity of U gives U(x, y) = 1 − x on the whole of D_0. Similar reasoning yields the explicit formula for U on the remaining D_i's.
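The closure of the class of solutions under pointwise minima, used in the argument above, amounts to the following routine verification (recorded here in our own notation):

```latex
% W := \min\{U_1, U_2\} is again a solution of the boundary value problem:
\[
  W(x,y) \;=\; \min\{U_1(x,y),\,U_2(x,y)\} \;\ge\; V(x,y),
\]
% since U_1 \ge V and U_2 \ge V pointwise, and
\[
  t \;\longmapsto\; W(t,\,y \pm t) \;=\; \min_{i=1,2}\, U_i(t,\,y \pm t)
  \quad \text{is concave and nonincreasing on } [0,\infty),
\]
% because a pointwise minimum of concave (resp. nonincreasing) functions
% of t is again concave (resp. nonincreasing).
```

In particular, once U_2(x, y) = 1 − x is checked to be a solution, the least solution U satisfies U ≤ 1 − x everywhere, as claimed.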

Sharpness of (2.1)
Let δ > 0 be a fixed small number, to be specified later. Consider a Markov family (f_n, g_n) on [0, ∞) × ℝ, with the transition probabilities described as follows. Under P(· | (f_0, g_0) = (x, y)), the sequence f is a nonnegative submartingale and g satisfies dg_n = ±df_n for n ≥ 1 (so (1.1) is satisfied). In fact, the steps described in (i)-(v) are martingale moves in the sense that

E_{x,y}[(f_{n+1}, g_{n+1}) | (f_n, g_n) = (x′, y′)] = (x′, y′),

provided the conditioning event has nonzero probability and (x′, y′) belongs to one of the sets from (i)-(v). Using this Markov process we will show that the bound (2.1) is optimal. How did we obtain the transition function? A natural idea which comes to one's mind is to search for a pair (f, g) for which both estimates in (3.7) become equalities, or "almost" equalities. This implies that equality must also hold in (3.6); in other words, the Markov process must move according to the following rule. Assume that (f_0, g_0) = (x, y), x ≠ 0, and let I be the line segment with endpoints given by the possible values of (f_1, g_1). Then U is linear, or "almost" linear, when restricted to I. One easily checks that the steps described in (i)-(v) satisfy this condition. Set

P^δ(x, y) = P_{x,y}(g* ≥ 0), M^δ(x, y) = lim_{n→∞} E_{x,y} f_n. (4.1)

Usually we will skip the upper index and write P, M instead of P^δ, M^δ, but it should be kept in mind that these functions do depend on δ. We will prove that if this parameter is sufficiently small, then

P(x, y) − M(x, y) is arbitrarily close to U(x, y). (4.2)

This will clearly yield the claim. It is convenient to split the remaining part of the proof into a few steps. To get the first statement above, note that by (iii) and the Markov property we obtain an identity, which can be rewritten in the form (4.5). Similarly, using (iv) and the Markov property, we get a second relation, where in the last passage we have used 2°.
Plugging this into (4.5) yields (4.6). Analogous argumentation, with the use of (iii), (vi) and the Markov property, leads to the equation (4.7). Adding (4.6) to (4.7) gives (4.8), with a remainder term which can be bounded in absolute value by 2/x². Now we use (4.8) several times: if N is the largest integer such that x + 3Nδ < 2, then we obtain (4.9), where |c| < 2/x². Now we will study the limit behavior of the terms on the right as δ → 0. First, note that for any k = 1, 2, . . . , N the corresponding factor satisfies an estimate with |d(k)| ≤ 1/x²; consequently, the product can be rewritten with an error term d̄ satisfying |d̄| ≤ 1/x². Since N = O(1/δ) for small δ, we conclude that the product in (4.10) converges to exp(−(1/3) ∫_x^2 t^{−1} dt) = (x/2)^{1/3} as δ → 0. The next step is to show that the expression in the square brackets in (4.9) converges to 2 as δ tends to 0. First observe that B(x + 3Nδ) converges to 1: in the first passage we use the definition of B, in the second we exploit (vi), and the latter is a consequence of 1° and 3°. To show that A(x + 3Nδ) converges to 1, use (4.5), with x replaced by x + 3Nδ. Note that, by the definition of N, the point under P lies in D_3, and arbitrarily close to the line y = −x, if δ is sufficiently small. Thus, by (v) and 3°, this probability can be made arbitrarily close to 1, provided δ is small enough. Summarizing, letting δ → 0 in (4.9) yields the first limit in (4.4). To get the second one, multiply both sides of (4.7) by 1/2, subtract it from (4.6) and proceed as previously: arguing as before, C and D satisfy the same system of equations as A and B, that is, (4.6) and (4.7). The only difference in the further considerations is that C(x + 3Nδ) → 2 and D(x + 3Nδ) → 0 as δ → 0. The final step is to combine (4.3) and (4.11). We get the desired estimate in view of 1° and 3°. Since U(x, y) = P(x, y) − M(x, y), (4.2) follows. The other states are checked similarly. The proof of the sharpness is complete.
We conclude this section with an observation which follows immediately from the above considerations. It will be needed later in the proof of Theorem 2.2.

Applications
We start with the following auxiliary fact.

Proof of (2.3) and (2.4). Clearly, it suffices to prove the inequality for β = 0, replacing (f, g) by (f, g − β) if necessary. Let C > 0 be an arbitrary constant. An application of (2.1) to the sequences Cf, Cg yields P(g* ≥ 0) − C||f||_1 ≤ U(Cx, Cy), which, by (2.2), leads to the bound

||f||_1 ≥ (t − U(Cx, Cy))/C.

If one maximizes the right-hand side over C, one gets precisely L(x, y, t). This follows from a straightforward but lengthy analysis of the derivative, with the aid of the previous lemma. We omit the details. To get (2.4), note that for fixed x and t, the function L(x, ·, t) is nonincreasing. This follows immediately from (5.5) and the fact that U(x, ·) is nondecreasing, which we have already exploited.

Sharpness of (2.3). As previously, we may restrict ourselves to β = 0. Fix x ≥ 0, y ∈ ℝ and t ∈ [0, 1]. If x + y ≥ 0, then the examples studied in 1° and 3° in the preceding section give equality in (2.3). Hence we may and do assume that x + y < 0. Consider three cases.

(i) Suppose that t ≤ 2x/(x − y). Take C > 0 such that (Cx, Cy) ∈ D_3 and take the Markov pair (f, g), with (f_0, g_0) = (Cx, Cy), from the previous section. Then the pair (f/C, g/C) gives equality in (2.3) (see 5°).

(ii) Let 2x/(x − y) < t < 1 and take 0 < ε < 1 − t. Recall the function Q defined in Remark 4.1. First we will show that Q(Cx, Cy) = t + ε for some C = C(ε, t) > 0. Otherwise, we would have a contradiction with (5.6), Remark 4.1 and the equality Q(0, 0) = 1. Now fix δ > 0 and consider the Markov pair (f, g), starting from (Cx, Cy), studied in the previous section. If δ is taken sufficiently small, then the following two conditions are satisfied: first, by (4.2), we have P(g* ≥ 0) − ||f||_1 ≥ U(Cx, Cy) − ε; second, by the definition of Q, P(g* ≥ 0) = P(Cx, Cy) ∈ (t, t + 2ε). In other words, for this choice of δ, the pair (f/C, g/C) starts from (x, y), satisfies (1.1), we have P((g/C)* ≥ 0) ≥ t, and

||f/C||_1 ≤ (t − U(Cx, Cy) + 3ε)/C ≤ L(x, y, t) + 3ε/C.
To get the claim, it suffices to note that ε was arbitrary and that (5.7) holds.
(iii) Finally, let t = 1. The analysis is similar to the one presented in the case 3° in the previous section; the details are left to the reader. We turn to the proof of the weak type inequality from Theorem 2.3.