Variational estimates for martingale paraproducts

We show that bilinear variational estimates of Do, Muscalu, and Thiele (arXiv:1009.5187) remain valid for a pair of general martingales with respect to the same filtration. Our result can also be viewed as an off-diagonal generalization of the Burkholder--Davis--Gundy inequality for martingale rough paths by Chevyrev and Friz (arXiv:1704.08053).


Introduction
If f = (f_n)_{n=0}^∞ is a discrete-time real-valued martingale with respect to a filtration F = (F_n)_{n=0}^∞, then Lépingle's variational inequality [Lép76] states

$$ \Big\| \sup_{m\in\mathbb{N}}\ \sup_{n_0<n_1<\cdots<n_m} \Big( \sum_{k=1}^{m} |f_{n_k}-f_{n_{k-1}}|^{\varrho} \Big)^{1/\varrho} \Big\|_{L^p} \le C_{p,\varrho}\, \|f\|_{L^p} \quad (1.1) $$

for any exponents p ∈ (1, ∞) and ϱ ∈ (2, ∞). Here for any random variable h we write ‖h‖_{L^p} := (E|h|^p)^{1/p} and for a martingale f we set

$$ \|f\|_{L^p} := \sup_{n\ge 0} \|f_n\|_{L^p}. \quad (1.2) $$

Inequality (1.1) fails at the endpoint ϱ = 2, where the corresponding substitute is Bourgain's jump inequality [Bou89],

$$ \Big\| \sup_{m\in\mathbb{N}}\ \sup_{n_0<n_1<\cdots<n_m} \operatorname{card}\big\{ k\in\{1,\dots,m\} : |f_{n_k}-f_{n_{k-1}}| \ge \lambda \big\}^{1/2} \Big\|_{L^p} \le C_p\, \lambda^{-1} \|f\|_{L^p} \quad (1.3) $$

for any exponent p ∈ (1, ∞) and any threshold λ ∈ (0, ∞). The supremum of cardinalities on the left-hand side of (1.3) is usually called the λ-jump counting function and denoted N_λ(f), so that (1.3) can be rewritten more elegantly as ‖N_λ(f)^{1/2}‖_{L^p} ≤ C_p λ^{-1} ‖f‖_{L^p}. Inequalities (1.1) and (1.3) provide quantitative refinements of the martingale convergence theorem, at least for martingales that are bounded in the L^p-norm. The reader can consult the paper [MSZ18] for their elegant proofs and Banach-space-valued generalizations.
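For a finite path, the λ-jump counting function N_λ(f) from (1.3) can be computed exactly by a small dynamic program: it is the largest number of pairwise non-overlapping index pairs a < b with |f_b − f_a| ≥ λ, since the counted increments in any chain n_0 < ⋯ < n_m are exactly such a family and any such family extends back to a chain. The following sketch is our illustration (the function name is ours, not from the paper):

```python
def jump_count(f, lam):
    """Compute N_lambda(f) for a finite path f: the largest number of
    non-overlapping index pairs a < b with |f[b] - f[a]| >= lam, i.e. the
    supremum of cardinalities in (1.3) restricted to this path."""
    n = len(f)
    # g[i] = max number of qualifying pairs with both endpoints <= i
    g = [0] * n
    for i in range(1, n):
        g[i] = g[i - 1]
        for a in range(i):
            if abs(f[i] - f[a]) >= lam:
                g[i] = max(g[i], g[a] + 1)
    return g[-1] if n else 0

# A path oscillating between 0 and 1 makes four jumps of size 1:
print(jump_count([0, 1, 0, 1, 0], 1))  # 4
```

Note that a greedy scan that resets its anchor after each jump is not sufficient here: for f = (0, 0.9, −0.2) and λ = 1 the only qualifying increment is between the last two entries, which the dynamic program finds.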

If df_n := f_n − f_{n−1} for n ≥ 1 denotes the martingale differences, then the truncated paraproducts can be written, quite elegantly, as

$$ \Pi_{n',n''}(f,g) = \sum_{n' < i < j \le n''} df_i\, dg_j. \quad (1.4) $$

Note that g ↦ Π(f, g) can be seen as a particular case of Burkholder's martingale transform [Bur66]. He took f to be an arbitrary process adapted to the filtration F and bounded in the L^∞-norm, ‖h‖_{L^∞} := ess sup|h|, and showed

$$ \|\Pi(f,g)\|_{L^q} \le C_q\, \|f\|_{L^\infty}\, \|g\|_{L^q} $$

for any q ∈ (1, ∞). We, on the other hand, additionally assume that f is a martingale, possibly unbounded. The word "paraproduct" is preferred here because the martingales f and g can then be treated symmetrically, thanks to a simple summation-by-parts identity. Estimates for martingale paraproducts outside Burkholder's range were first studied by Bañuelos and Bennett [BB88] (albeit in continuous time and with respect to the Brownian filtration only) and by Chao and Long [CL92]. Inequalities on the L^p-spaces in the largest possible open range of exponents follow from [CL92, Theorem 7]:

$$ \|\Pi(f,g)\|_{L^r} \le C_{p,q}\, \|f\|_{L^p}\, \|g\|_{L^q} \quad (1.5) $$

for exponents satisfying

$$ p, q \in (1,\infty), \qquad \frac1r = \frac1p + \frac1q. \quad (1.6) $$

Indeed, [CL92, Theorem 7] deals with a maximal estimate, namely

$$ \Big\| \sup_{n\ge 0} |\Pi_n(f,g)| \Big\|_{L^r} \le C_{p,q}\, \|f\|_{L^p}\, \|g\|_{L^q}, \quad (1.7) $$

which is stronger than (1.5) when r ≤ 1. If r > 1, then Π(f, g) is again a martingale adapted to F, so in particular it also satisfies (1.1). However, one still cannot relax the condition ϱ > 2 for general f and g.
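For finite paths the truncated paraproduct (1.4) is a finite double sum, and the symmetry mentioned above is an exact algebraic identity: Π_{n',n''}(f, g) + Π_{n',n''}(g, f) plus the diagonal term Σ_{n'<i≤n''} df_i dg_i equals (f_{n''} − f_{n'})(g_{n''} − g_{n'}). A quick numerical check (our illustration, not code from the paper):

```python
def trunc_paraproduct(f, g, n1, n2):
    """Pi_{n1,n2}(f, g) = sum_{n1 < i < j <= n2} df_i * dg_j, as in (1.4)."""
    df = [f[n] - f[n - 1] for n in range(1, len(f))]  # df[n-1] corresponds to df_n
    dg = [g[n] - g[n - 1] for n in range(1, len(g))]
    return sum(df[i - 1] * dg[j - 1]
               for j in range(n1 + 1, n2 + 1)
               for i in range(n1 + 1, j))

f = [0, 1, 3, 2]
g = [0, 2, 1, 4]
lhs = (trunc_paraproduct(f, g, 0, 3) + trunc_paraproduct(g, f, 0, 3)
       + sum((f[i] - f[i - 1]) * (g[i] - g[i - 1]) for i in range(1, 4)))
print(trunc_paraproduct(f, g, 0, 3), lhs)  # 8 8
assert lhs == (f[3] - f[0]) * (g[3] - g[0])  # summation by parts
```

The identity holds for arbitrary (not necessarily martingale) paths, which is why it allows f and g to be treated symmetrically.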
It is somewhat surprising that there exists a variant of Lépingle's inequality for truncated martingale paraproducts that allows one to go below ϱ = 2. This is the main result of our paper, and it generalizes Theorem 1.2 of the paper [DMT12] by Do, Muscalu, and Thiele.
Indeed, Do, Muscalu, and Thiele [DMT12] considered variants of Theorem 1.1 for either dyadic martingales or Littlewood–Paley-type convolutions. They motivated their result by an application to bilinear iterated Fourier integrals in the sequel paper [DMT17]. The main purpose of this note is to generalize their result to arbitrary martingales; this is not automatic, since [DMT12] repeatedly relies on doubling conditions to both raise and lower the exponents p and q. In our approach we adapt many ideas from [DMT12], but we also use some fundamental martingale inequalities that have only recently become available in the literature. Consequently, we are even able to give a somewhat shorter proof.
1.1. Continuous-time martingales. One benefit of having Theorem 1.1 formulated for general discrete-time martingales is that estimates (1.8) and (1.9) transfer immediately to continuous-time martingales X = (X_t)_{t≥0} and Y = (Y_t)_{t≥0}. It is standard in stochastic calculus to assume that X and Y almost surely have càdlàg paths and that their filtration F = (F_t)_{t≥0} satisfies "the usual hypotheses" from Protter's book [Pro05], i.e. that F_0 is complete and that F is right-continuous. We fix exponents p, q, r satisfying (1.6) and additionally assume that

$$ \|X\|_{L^p} := \sup_{t \ge 0} \|X_t\|_{L^p} < \infty \quad\text{and}\quad \|Y\|_{L^q} := \sup_{t \ge 0} \|Y_t\|_{L^q} < \infty. \quad (1.10) $$

Under more restrictive conditions on X and Y, such as ‖X‖_{L^∞} < ∞ and ‖Y‖_{L^2} < ∞, the papers [BB88] and [KŠ18] proceed by defining the paraproduct as the process Π(X, Y) = (Π_t(X, Y))_{t≥0} given by the stochastic integral

$$ \Pi_t(X, Y) := \int_{(0,t]} (X_{s-} - X_0)\, dY_s. \quad (1.11) $$

Here X_{s−} stands for the left limit lim_{u→s−} X_u. The above integral is understood in the Itô sense and yields another process with almost surely càdlàg paths. In order to extend the definition of Π(X, Y) to martingales satisfying (1.10) only, and to enable the application of Theorem 1.1, we prefer to construct the martingale paraproduct as a limit of certain discrete-time paraproducts, namely of the Riemann sums of (1.11).
for each ε > 0 and each t > 0. We say that Π(X, Y) is the paraproduct of the martingales X and Y.

(b) Truncated paraproducts are now defined as the random variables

$$ \Pi_{t',t''}(X,Y) := \Pi_{t''}(X,Y) - \Pi_{t'}(X,Y) - (X_{t'} - X_0)(Y_{t''} - Y_{t'}) $$

for any 0 ≤ t' < t'' < ∞. We then have the continuous-time analogues (1.12) and (1.13) of estimates (1.8) and (1.9): the variational estimate (1.12) holds for any ϱ ∈ (1, ∞), and the jump estimate (1.13) holds for any λ ∈ (0, ∞).
1.2. Connection with rough paths. One can view the triple H_n := (f_n, g_n, Π_n(f, g)) as a process with values in a 3-dimensional Heisenberg group H ≅ R³ with the group operation

$$ (x, y, z) \cdot (x', y', z') = (x + x',\ y + y',\ z + z' + x y'). $$
Then the truncated martingale paraproducts Π_{n',n''} are precisely the z-coordinates of the increments of this process. More precisely, for any times n' ≤ n'' we have (normalizing f_0 = 0)

$$ H_{n'}^{-1} \cdot H_{n''} = \big( f_{n''} - f_{n'},\ g_{n''} - g_{n'},\ \Pi_{n',n''}(f,g) \big). $$

This corresponds to Chen's relation [FH14, (2.1)] in the theory of rough paths.
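The group law and Chen's relation can be checked mechanically for finite paths normalized to start at 0. The following sketch is ours (helper names are hypothetical): it builds H_n = (f_n, g_n, Π_{0,n}(f, g)) and verifies that the z-coordinate of H_{n'}^{-1} · H_{n''} is the truncated paraproduct.

```python
def heis_mul(a, b):
    """(x, y, z) . (x', y', z') = (x + x', y + y', z + z' + x*y')."""
    x, y, z = a
    xp, yp, zp = b
    return (x + xp, y + yp, z + zp + x * yp)

def heis_inv(a):
    """Inverse for the above group law, so that a^{-1} . a = (0, 0, 0)."""
    x, y, z = a
    return (-x, -y, -z + x * y)

def trunc_paraproduct(f, g, n1, n2):
    """Pi_{n1,n2}(f, g) = sum_{n1 < i < j <= n2} df_i * dg_j."""
    return sum((f[i] - f[i - 1]) * (g[j] - g[j - 1])
               for j in range(n1 + 1, n2 + 1) for i in range(n1 + 1, j))

f = [0, 1, 3, 2]          # paths normalized so that f[0] = g[0] = 0
g = [0, 2, 1, 4]
H = [(f[n], g[n], trunc_paraproduct(f, g, 0, n)) for n in range(4)]
inc = heis_mul(heis_inv(H[1]), H[3])   # "increment" of H between times 1 and 3
print(inc)  # (1, 2, 6): the z-coordinate equals Pi_{1,3}(f, g)
assert inc[2] == trunc_paraproduct(f, g, 1, 3)
```

The x- and y-coordinates of the increment are just f_{n''} − f_{n'} and g_{n''} − g_{n'}, while the twisted term x·y' in the group law produces exactly the cross term that truncates the paraproduct.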
On the Heisenberg group we consider the homogeneous box norm ‖(x, y, z)‖ := max(|x|, |y|, |z|^{1/2}) and the corresponding left-invariant distance function d(h', h'') := ‖(h')^{-1} · h''‖. Either using the estimate (1.13) together with [MSZ18, Lemma 2.17], or combining (1.8) and (1.1), one can also obtain the variational estimate

$$ \Big\| \sup_{m}\ \sup_{n_0<\cdots<n_m} \Big( \sum_{k=1}^{m} d(H_{n_{k-1}}, H_{n_k})^{\varrho} \Big)^{1/\varrho} \Big\|_{L^r} \lesssim_{p,q,\varrho} \|f\|_{L^p}\, \|g\|_{L^q} \quad (1.14) $$

for any ϱ > 2. The estimate (1.14) for continuous martingales is a special case of a result of Friz and Victoir [FV06, Theorem 14], and for general càdlàg martingales it is a special case of a result of Chevyrev and Friz [CF19, Theorem 4.7] with F(x) = x^r. Indeed, in our notation these results can be stated as (1.14) with f, g replaced by Sf, Sg on the right-hand side, where S denotes the martingale square function from (2.1). Hence the estimate (1.13) can be seen as an off-diagonal and endpoint version of the cited results.

Some known martingale inequalities
We begin this section with a few words on the notation. Then we review several known martingale inequalities that will be needed in subsequent sections. Some of them we could not find formulated elsewhere with exactly the same assumptions. However, the proofs of those inequalities are still quite straightforward using the results available in the existing literature and we include them for completeness.
For any two quantities A, B ∈ [0, ∞] we will write A B when there exists an unimportant constant C ∈ [0, ∞) such that A ≤ CB. Furthermore, we will write A ∼ B if both A B and B A hold. Dependencies of the implicit constants on some parameters will be denoted in the subscripts of and ∼. For real numbers a and b we will write a ∧ b := min{a, b}, a ∨ b := max{a, b}.
We have already encountered the L^p-quasinorms h ↦ ‖h‖_{L^p} in the introductory section, both for finite p and for p = ∞. Recall that for a martingale f = (f_n)_{n=0}^∞ the quantity ‖f‖_{L^p} is defined by (1.2). Any nonnegative random variable w gives rise to the weighted L^p-quasinorms, given for p ∈ (0, ∞) by

$$ \|h\|_{L^p(w)} := \big( \mathbb{E}(|h|^p w) \big)^{1/p}. $$

On the other hand, the weak L^p-quasinorm is defined as

$$ \|h\|_{L^{p,\infty}} := \sup_{\alpha\in(0,\infty)} \alpha\, \mathbb{P}(|h| \ge \alpha)^{1/p} $$

for any p ∈ (0, ∞). Any sequence of random variables h = (h^{(k)})_{k=1}^∞ can be regarded as a vector-valued random element, and for p ∈ (0, ∞] and q ∈ (0, ∞) we define the mixed L^p(ℓ^q)-quasinorm

$$ \|h\|_{L^p(\ell^q)} := \Big\| \Big( \sum_{k=1}^{\infty} |h^{(k)}|^q \Big)^{1/q} \Big\|_{L^p}. $$

Finally, p' will always denote the conjugate exponent of p ∈ [1, ∞], i.e. the unique number p' ∈ [1, ∞] such that 1/p + 1/p' = 1.
For a martingale f = (f_n)_{n=0}^∞ with respect to F we define the maximal function and the square function,

$$ Mf := \sup_{n\ge 0} |f_n|, \qquad Sf := \Big( \sum_{n\ge 0} |df_n|^2 \Big)^{1/2}, \quad (2.1) $$

with the convention df_0 := f_0. In different terminology, these are the limits of the maximum process of f and of the quadratic variation of f, respectively. If we start merely from a random variable h, then we automatically assign to it the martingale (h_n)_{n=0}^∞ defined by h_n := E(h | F_n), so Mh and Sh still make sense.
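For a finite path, the maximal and square functions of (2.1) are elementary to compute. The following sketch is our illustration; it uses the convention df_0 := f_0 stated above, so that Sf also dominates |f_0|.

```python
def max_function(f):
    """Mf = sup_n |f_n| over the finite path."""
    return max(abs(x) for x in f)

def square_function(f):
    """Sf = (|f_0|^2 + sum_{n>=1} |f_n - f_{n-1}|^2)^(1/2)."""
    return (f[0] ** 2 + sum((f[n] - f[n - 1]) ** 2
                            for n in range(1, len(f)))) ** 0.5

print(max_function([3, 4, 4, 0]))                   # 4
print(round(square_function([3, 4, 4, 0]) ** 2, 6))  # 26.0 = 9 + 1 + 0 + 16
```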
The well-known Burkholder–Davis–Gundy inequality states that

$$ \|Mf\|_{L^p} \sim_p \|Sf\|_{L^p} \quad (2.2) $$

for every p ∈ [1, ∞). Indeed, the case p > 1 is due to Burkholder [Bur66], while the case p = 1 was shown by Davis [Dav70]. A weighted version of the latter case was established by Osękowski [Osę17]:

$$ \mathbb{E}\big( (Mf)\, w \big) \lesssim \mathbb{E}\big( (Sf)\,(Mw) \big), \quad (2.3) $$

where w is a nonnegative integrable random variable, interpreted as a weight, and Mw is the maximal function of the associated martingale (E(w | F_n))_{n=0}^∞. The implicit constant in (2.3) is absolute, and Osękowski could take 16(√2 + 1). Inequality (2.3) can also be viewed as a probabilistic analogue of a weighted estimate of Fefferman and Stein [FS71].
Moreover, Doob's maximal inequality reads

$$ \|Mf\|_{L^p} \lesssim_p \|f\|_{L^p} \quad (2.4) $$

for p ∈ (1, ∞]. It also has a weighted version, formulated for instance as part of Theorem 3.2.3 in the book [HvNVW16]:

$$ \mathbb{E}\big( (Mf)^p\, w \big) \lesssim_p \mathbb{E}\big( |f_\infty|^p\, Mw \big) \quad (2.5) $$

for p ∈ (1, ∞]. In (2.5) we assume, for convenience, that (f_n)_{n=0}^∞ eventually becomes a constant sequence, so that f_∞ := lim_{n→∞} f_n trivially makes sense with respect to every possible mode of convergence.
Suppose that T_0 ≤ T_1 ≤ T_2 ≤ ⋯ is a sequence of stopping times with respect to the filtration F, taking values in N_0, and assume that each T_k is bounded. These stopping times will be used for the purpose of a certain "localization." Boundedness of each individual T_k is a convenient assumption for the application of the optional sampling theorem; see e.g. [GS01]. For every k ∈ N and every n ∈ N_0 we note that (n ∨ T_{k−1}) ∧ T_k is also a stopping time with respect to F and define

$$ \mathcal{F}^{(k)}_n := \mathcal{F}_{(n \vee T_{k-1}) \wedge T_k}. \quad (2.6) $$

That way, each F^{(k)} := (F^{(k)}_n)_{n=0}^∞ becomes a filtration of the original probability space, and each of these sequences of σ-algebras becomes constant for sufficiently large indices n.
Lemma 2.1. Let (T_k)_{k=0}^∞ be an increasing sequence of bounded stopping times, let (F^{(k)})_{k=1}^∞ be the sequence of filtrations defined by (2.6), and for each k ∈ N let f^{(k)} = (f^{(k)}_n)_{n=0}^∞ be a martingale with respect to F^{(k)} that eventually becomes a constant sequence. For any p, q ∈ (1, ∞) we have

$$ \big\| (M f^{(k)})_{k=1}^{\infty} \big\|_{L^p(\ell^q)} \lesssim_{p,q} \big\| (f^{(k)}_\infty)_{k=1}^{\infty} \big\|_{L^p(\ell^q)}. \quad (2.7) $$

Lemma 2.1 can be viewed as an ℓ^q-valued extension of Doob's maximal inequality (2.4). The proof of (2.7) is based on (2.5) and already exists as the proof of [HvNVW16, Theorem 3.2.7]. However, the working assumption in [HvNVW16] is that the f^{(k)} are arbitrary martingales with respect to the same filtration, which is not the case here. For this reason, and for the sake of completeness, we prefer to repeat the short argument rather than just invoke the result from [HvNVW16].
Proof of Lemma 2.1. The case p ≥ q is handled first. Let r ∈ (1, ∞] denote the conjugate exponent of p/q. To an arbitrary random variable w ≥ 0 satisfying ‖w‖_{L^r} = 1 we associate the martingales (w_n)_{n=0}^∞ and w^{(k)} = (w^{(k)}_n)_{n=0}^∞ given by

$$ w_n := \mathbb{E}(w \mid \mathcal{F}_n), \qquad w^{(k)}_n := \mathbb{E}(w \mid \mathcal{F}^{(k)}_n). \quad (2.8) $$

Since p/q ≥ 1, by duality it suffices to bound E(Σ_k (Mf^{(k)})^q w). By (2.5), applied with the exponent q to each martingale f^{(k)} and its weight w, together with the pointwise bound Mw^{(k)} ≤ Mw, this is at most a constant depending on q times

$$ \mathbb{E}\Big( \sum_k |f^{(k)}_\infty|^q\, Mw \Big) \le \Big\| \sum_k |f^{(k)}_\infty|^q \Big\|_{L^{p/q}} \|Mw\|_{L^r}. $$

Applying (2.4) to ‖Mw‖_{L^r} and recalling the freedom that we had in choosing w, we obtain

$$ \Big\| \sum_k (M f^{(k)})^q \Big\|_{L^{p/q}} \lesssim_{p,q} \Big\| \sum_k |f^{(k)}_\infty|^q \Big\|_{L^{p/q}}, $$

which transforms into (2.7) after taking the q-th root of both sides.

Turning to the case p ≤ q, we take some r ∈ (1, p) and denote a := p/r ∈ (1, ∞), b := q/r ∈ (1, ∞). Write

$$ \big\| (M f^{(k)})_k \big\|_{L^p(\ell^q)}^r = \big\| \big((M f^{(k)})^r\big)_k \big\|_{L^a(\ell^b)}. \quad (2.9) $$

We are going to dualize the mixed L^a(ℓ^b)-norm above, and for this we take a dual sequence of nonnegative random variables of unit L^{a'}(ℓ^{b'})-norm. Using (2.5) for each fixed k followed by Hölder's inequality, then applying the already established case of (2.7) (with p, q replaced by a', b'), and using duality, we end up with the corresponding bound for the right-hand side of (2.9). Finally, recall the computation (2.9) and take the r-th root of both sides.
Now let f = (f_n)_{n=0}^∞ be a single martingale with respect to F. For every k ∈ N and every n ∈ N_0 we denote, for the rest of the paper,

$$ f^{(k)}_n := f_{(n \vee T_{k-1}) \wedge T_k} - f_{T_{k-1}}, \quad (2.10) $$

i.e. f^{(k)} is the piece of f between the stopping times T_{k−1} and T_k. That way, for each k ∈ N we have now defined a particular martingale f^{(k)} = (f^{(k)}_n)_{n=0}^∞ with respect to the filtration F^{(k)} given by (2.6). It is "interesting" only at times between T_{k−1} and T_k. Consequently, the sequence (f^{(k)}_n)_{n=0}^∞ eventually becomes constant and, in particular, the limit f^{(k)}_∞ := lim_{n→∞} f^{(k)}_n exists (in every possible sense) and simply equals f_{T_k} − f_{T_{k−1}}. Many classical inequalities for the martingale f have vector-valued extensions in terms of its "localized pieces" f^{(k)}. Our next goal is to formulate and prove a couple of these, as they will be needed in the next section.
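With deterministic times in place of stopping times, the localized pieces (2.10) are easy to tabulate; in particular one can check that f^{(k)}_∞ = f_{T_k} − f_{T_{k−1}} and that these limits telescope. A small illustration (ours; a deterministic T is a toy stand-in for bounded stopping times):

```python
def piece(f, T, k, n):
    """f^{(k)}_n = f_{(n v T_{k-1}) ^ T_k} - f_{T_{k-1}}, as in (2.10)."""
    return f[min(max(n, T[k - 1]), T[k])] - f[T[k - 1]]

f = [0, 1, 3, 2, 5]
T = [0, 2, 4]                 # T_0 <= T_1 <= T_2, all bounded
last = len(f) - 1             # the pieces are constant from this index on
print(piece(f, T, 1, last), piece(f, T, 2, last))  # 3 2
# f^{(k)}_infty = f_{T_k} - f_{T_{k-1}}, and the limits telescope:
assert sum(piece(f, T, k, last) for k in (1, 2)) == f[T[-1]] - f[T[0]]
```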
Lemma 2.2. Let (T k ) ∞ k=0 be an increasing sequence of bounded stopping times and let f be a martingale, both with respect to F. Moreover, let (f (k) ) ∞ k=1 be a sequence of martingales defined by (2.10).
(a) For any p ∈ (1, ∞) we have

$$ \Big\| \Big( \sum_{k=1}^{\infty} (M f^{(k)})^2 \Big)^{1/2} \Big\|_{L^p} \lesssim_p \|f\|_{L^p}. \quad (2.11) $$

(b) For any p ∈ (1, ∞) we have

$$ \Big\| \Big( \sum_{k=1}^{\infty} (S f^{(k)})^2 \Big)^{1/2} \Big\|_{L^p} \lesssim_p \|f\|_{L^p}. \quad (2.12) $$

Proof of Lemma 2.2. (a) Since the stopping times T_k are bounded, using the optional sampling theorem (see Section 12.4 of the book [GS01]) and applying (2.2) and (2.4) to the "optionally sampled" martingale (f_{T_n})_{n=0}^∞ we get

$$ \Big\| \Big( \sum_{k=1}^{\infty} |f^{(k)}_\infty|^2 \Big)^{1/2} \Big\|_{L^p} \lesssim_p \|f\|_{L^p}. $$

Combining this with estimate (2.7) from Lemma 2.1, specialized to q = 2, establishes (2.11).
(b) Estimate (2.12) is immediate. We only need to observe (Σ_k (Sf^{(k)})²)^{1/2} ≤ Sf, and then apply (2.2) and (2.4).

2.1. Multilinear interpolation. We will repeatedly use a multilinear version of the Marcinkiewicz interpolation theorem. We caution the reader that many such results exist in the literature, and not every version would be adequate for our purposes. We refer to [GLLZ12, Corollary 1.1], of which the result below is a special case, although it also follows e.g. from the result of [Jan88] on abstract interpolation spaces.

A vector-valued estimate for martingale paraproducts
The main ingredient in the proof of Theorem 1.1 is the following proposition.
Proposition 3.1. Let (T_k)_{k=0}^∞ be an increasing sequence of bounded stopping times and let f and g be martingales, all with respect to the same filtration F. Moreover, let (f^{(k)})_{k=1}^∞ and (g^{(k)})_{k=1}^∞ be the sequences of martingales defined from f and g, respectively, via (2.10). Then for any exponents p, q, r satisfying (1.6) we have the estimate

$$ \Big\| \sum_{k=1}^{\infty} M\big(\Pi(f^{(k)}, g^{(k)})\big) \Big\|_{L^r} \lesssim_{p,q} \|f\|_{L^p}\, \|g\|_{L^q}. \quad (3.1) $$

Note that, for each k ∈ N, the paraproduct Π(f^{(k)}, g^{(k)}) is a martingale with respect to the filtration F^{(k)} given by (2.6). The sequence (Π_n(f^{(k)}, g^{(k)}))_{n=0}^∞ eventually becomes constant, so that Π_∞(f^{(k)}, g^{(k)}) makes sense. A crucial observation, following from (1.4) and needed later, is

$$ \Pi_\infty(f^{(k)}, g^{(k)}) = \Pi_{T_{k-1}, T_k}(f, g), \quad (3.2) $$

and these are precisely the truncated paraproducts appearing on the left-hand sides of estimates (1.8) and (1.9) if we replace each n_k with T_k.
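The identity (3.2) can be sanity-checked numerically with deterministic times: the full paraproduct of the localized pieces reproduces the truncated paraproduct of the original paths. (Our illustration, with a deterministic T standing in for the stopping times.)

```python
def trunc_paraproduct(f, g, n1, n2):
    """Pi_{n1,n2}(f, g) = sum_{n1 < i < j <= n2} df_i * dg_j."""
    return sum((f[i] - f[i - 1]) * (g[j] - g[j - 1])
               for j in range(n1 + 1, n2 + 1) for i in range(n1 + 1, j))

def piece(h, T, k):
    """The localized path (h^{(k)}_n)_n from (2.10)."""
    return [h[min(max(n, T[k - 1]), T[k])] - h[T[k - 1]] for n in range(len(h))]

f = [0, 1, 3, 2, 5]
g = [0, 2, 1, 4, 1]
T = [0, 2, 4]
k = 2
fk, gk = piece(f, T, k), piece(g, T, k)
# Pi_infty of the pieces vs. the truncated paraproduct Pi_{T_{k-1},T_k}(f, g):
lhs = trunc_paraproduct(fk, gk, 0, len(f) - 1)
rhs = trunc_paraproduct(f, g, T[k - 1], T[k])
print(lhs, rhs)  # 3 3
assert lhs == rhs
```

The point is that the differences of the localized pieces vanish outside (T_{k−1}, T_k] and agree with df_n, dg_n inside it, so the double sum (1.4) localizes exactly.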
Proof of Proposition 3.1. Let us first discuss the case r ≥ 1 of estimate (3.1). We begin by proving the ℓ¹-valued estimate

$$ \Big\| \sum_{k=1}^{\infty} M\big(\Pi(f^{(k)}, g^{(k)})\big) \Big\|_{L^r} \lesssim_r \Big\| \sum_{k=1}^{\infty} S\big(\Pi(f^{(k)}, g^{(k)})\big) \Big\|_{L^r}. \quad (3.3) $$

We will reduce it to the weighted estimate (2.3) for the martingales Π(f^{(k)}, g^{(k)}). Take an arbitrary nonnegative random variable w satisfying ‖w‖_{L^{r'}} = 1 and define (w_n)_{n=0}^∞ and w^{(k)} = (w^{(k)}_n)_{n=0}^∞ as in (2.8). We have

$$ \mathbb{E}\Big( \sum_k M\big(\Pi(f^{(k)}, g^{(k)})\big)\, w \Big) = \sum_k \mathbb{E}\Big( M\big(\Pi(f^{(k)}, g^{(k)})\big)\, w \Big), $$

and, by (2.3) applied to the martingale Π(f^{(k)}, g^{(k)}) for each fixed k, this is at most a constant times

$$ \sum_k \mathbb{E}\Big( S\big(\Pi(f^{(k)}, g^{(k)})\big)\, M w^{(k)} \Big) \le \mathbb{E}\Big( \sum_k S\big(\Pi(f^{(k)}, g^{(k)})\big)\, M w \Big). $$

Using Doob's inequality (2.4) for the martingale (w_n)_{n=0}^∞, and recalling the freedom that we had in choosing w, we establish (3.3) by dualization.
In order to complete the proof of (3.1) in the case r ≥ 1, observe that, by the definition of the paraproduct, the expression on the right-hand side of (3.3) is controlled using the pointwise bound

$$ S\big(\Pi(f^{(k)}, g^{(k)})\big) \le (M f^{(k)})\,(S g^{(k)}), $$

which, by the Cauchy–Schwarz inequality in k and Hölder's inequality, is in turn bounded by

$$ \Big\| \Big( \sum_k (M f^{(k)})^2 \Big)^{1/2} \Big\|_{L^p} \Big\| \Big( \sum_k (S g^{(k)})^2 \Big)^{1/2} \Big\|_{L^q} \lesssim_{p,q} \|f\|_{L^p}\, \|g\|_{L^q}. $$

In the last inequality we used (2.11) and (2.12) for the martingales f and g, respectively.
We will now prove the weak-type estimate

$$ \Big\| \sum_{k=1}^{\infty} M\big(\Pi(f^{(k)}, g^{(k)})\big) \Big\|_{L^{r,\infty}} \lesssim_p \|f\|_{L^p}\, \|g\|_{L^1} \quad (3.4) $$

for any p ∈ (1, ∞) and r ∈ (1/2, 1) such that 1/p + 1 = 1/r. This will conclude the proof of (3.1) for r < 1 by real interpolation with the previously established cases (Theorem 2.3). By the homogeneity of (3.4) we may normalize: assume ‖f‖_{L^p} = 1 and ‖g‖_{L^1} = 1. Fix a number ν > 0 and perform Gundy's decomposition [Gun68] of the martingale g at height α = ν^r; see its formulation as Theorem 3.4.1 in the book [HvNVW16]. It splits g as g_n = g^{good}_n + g^{bad}_n + g^{harmless}_n, where g^{good} = (g^{good}_n)_{n=0}^∞, g^{bad} = (g^{bad}_n)_{n=0}^∞, and g^{harmless} = (g^{harmless}_n)_{n=0}^∞ are martingales with respect to F satisfying the properties (3.5)–(3.7). Construct the martingales g^{good,(k)}, g^{bad,(k)}, and g^{harmless,(k)} for the given sequence of stopping times via formula (2.10). Using the previously established case r = 1 of estimate (3.1) together with (3.5) we control the contribution of g^{good}; next, (3.6) handles the contribution of g^{bad}; finally, by Hölder's inequality, Doob's inequality (2.4), and (3.7) we control the contribution of g^{harmless}. Combining the three estimates finishes the proof of (3.4).

Proof of variational and jump inequalities
Proof of Theorem 1.1. In the process of proving estimates (1.8) and (1.9) we can constrain the numbers n 0 , n 1 , . . . , n m to a finite interval of integers {0, 1, 2, . . . , n max }.
Then we only need to take care that the constants we obtain do not depend on n_max. Afterwards we will be able to let n_max → ∞ and use the monotone convergence theorem, recovering Theorem 1.1 in its full generality. Let us begin with a stopping-time argument enabling us to apply Proposition 3.1. We are given two martingales, f = (f_n)_{n=0}^∞ and g = (g_n)_{n=0}^∞, with respect to the filtration F. Fix λ > 0 and recursively define an increasing sequence of stopping times (S_k)_{k=0}^∞ by setting S_0 := 0 and letting S_k be the first moment after S_{k−1} at which the fluctuations of f, of g, or of the truncated paraproduct since time S_{k−1} reach a fixed fraction of λ, with the convention min ∅ = ∞. Then for each k ∈ N_0 set T_k := S_k ∧ n_max. Denote by N_λ(f, g) the supremum of cardinalities on the left-hand side of (1.9), so that the desired estimate (1.9) becomes

$$ \| N_\lambda(f, g)^{1/2} \|_{L^r} \lesssim_{p,q} \lambda^{-1}\, \|f\|_{L^p}\, \|g\|_{L^q}. $$

On the other hand, denote Ñ_λ(f, g) := sup{k ∈ N_0 : S_k ≤ n_max}.

Let us show that

$$ N_\lambda(f, g) \le \widetilde{N}_\lambda(f, g) \quad (4.1) $$

(recall that Ñ_λ(f, g) counts the stopping times S_k up to n_max), and for this it is sufficient to show that each interval of integers (n', n''] ⊆ (0, n_max] such that

$$ \Big| \sum_{n' < i < j \le n''} df_i\, dg_j \Big| \ge \lambda \quad (4.2) $$

has to contain at least one of the stopping times (S_k)_{k=1}^∞. If that were not the case, then we could choose an index k ∈ N such that S_{k−1} ≤ n' < n'' < S_k, where we allow S_k to be infinite. Let us use the identity

$$ \sum_{n' < i < j \le n''} df_i\, dg_j = \Pi_{S_{k-1}, n''}(f, g) - \Pi_{S_{k-1}, n'}(f, g) - (f_{n'} - f_{S_{k-1}})(g_{n''} - g_{n'}) $$

and the fact that S_k is strictly larger than both n' and n'', which implies that each of the three terms on the right-hand side is strictly less than λ/3 in absolute value. That way we arrive at a contradiction with (4.2), completing the proof of (4.1).
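The three-term splitting used above is a purely algebraic identity, Π_{n',n''} = Π_{m,n''} − Π_{m,n'} − (f_{n'} − f_m)(g_{n''} − g_{n'}) for any m ≤ n' ≤ n''. A quick numerical check (our illustration):

```python
def trunc_paraproduct(f, g, n1, n2):
    """Pi_{n1,n2}(f, g) = sum_{n1 < i < j <= n2} df_i * dg_j."""
    return sum((f[i] - f[i - 1]) * (g[j] - g[j - 1])
               for j in range(n1 + 1, n2 + 1) for i in range(n1 + 1, j))

f = [0, 1, 3, 2, 5]
g = [0, 2, 1, 4, 1]
m, n1, n2 = 1, 2, 4     # m plays the role of S_{k-1}, with m <= n1 <= n2
lhs = trunc_paraproduct(f, g, n1, n2)
rhs = (trunc_paraproduct(f, g, m, n2) - trunc_paraproduct(f, g, m, n1)
       - (f[n1] - f[m]) * (g[n2] - g[n1]))
print(lhs, rhs)  # 3 3
assert lhs == rhs
```

Splitting the index set {m < i < j ≤ n''} according to whether i ≤ n' or i > n' yields exactly the cross term (f_{n'} − f_m)(g_{n''} − g_{n'}), which is why each of the three terms can be controlled separately by λ/3.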
We plan to apply Proposition 3.1 with the above sequence of stopping times (T_k)_{k=0}^∞. By the definitions of S_k and T_k, each of the Ñ_λ(f, g) stopping times contributes a fluctuation of size comparable to λ; in the first of the resulting terms we use (3.2), while the second term is bounded by products of the maximal functions Mf^{(k)} and Mg^{(k)}. Altogether, by the Cauchy–Schwarz inequality the latter contribution is controlled using (Σ_k (Mf^{(k)})²)^{1/2} and (Σ_k (Mg^{(k)})²)^{1/2}, so using Hölder's inequality, (3.1), and (2.11) we obtain the desired estimate for Ñ_λ(f, g). Recalling (4.1) we complete the proof of the jump inequality (1.9). By [MSZ18, Lemma 2.17] the jump estimate (1.9) immediately implies the weak-type L^p × L^q → L^{r,∞} analogue of (1.8). The strong-type ϱ-variational estimate (1.8) then follows by real interpolation for multisublinear operators (Theorem 2.3).

Continuous-time martingales
Proof of Corollary 1.2. (a) In the particular case ‖X‖_{L^∞} < ∞ and ‖Y‖_{L^2} < ∞ we already know that the Riemann sums S(X, Y; Σ_n) converge u.c.p. (uniformly on compact time intervals, in probability) as n → ∞ to the stochastic process given by (1.11). This is the content of Theorem 21 in Chapter II of the book [Pro05].
In the general case, for any δ > 0 we find càdlàg martingales X' = (X'_t)_{t≥0} and Y' = (Y'_t)_{t≥0} with respect to F, satisfying the more restrictive conditions above and approximating X and Y within δ in the norms (1.10); by bilinearity this yields the four-term decomposition (5.1) of the difference of Riemann sums. From the first part of the proof we know

$$ \lim_{m,n\to\infty} \mathbb{P}\Big( \sup_{s\in[0,t]} |S_s(X', Y'; \Sigma_m) - S_s(X', Y'; \Sigma_n)| > \varepsilon \Big) = 0 \quad (5.2) $$

for each ε > 0 and each t > 0. By sampling arbitrary continuous-time martingales X̃ and Ỹ at the times t ∧ τ^{(n)}_j we obtain discrete-time martingales such that (S_{t∧τ^{(n)}_j}(X̃, Ỹ; Σ_n))_{j=0}^{l_n} is their paraproduct. Thus, estimate (1.7) applies and, together with Doob's inequality for Ỹ, easily gives a bound with a constant independent of the partition Σ_n. Applying this to each of the four terms in (5.1), using the Markov–Chebyshev inequality, applying (5.2), and finally letting δ → 0+, we obtain

$$ \limsup_{m,n\to\infty} \mathbb{P}\Big( \sup_{s\in[0,t]} |S_s(X, Y; \Sigma_m) - S_s(X, Y; \Sigma_n)| > \varepsilon \Big) = 0 $$

for all ε, t > 0. Thus, S(X, Y; Σ_n) converge u.c.p. as n → ∞ to some stochastic process, which we denote by Π(X, Y). Note that Π(X, Y) still has càdlàg paths a.s., since this property is preserved under taking u.c.p. limits. It is standard to conclude that Π(X, Y) does not depend on the choice of (Σ_n)_{n=0}^∞.

(b) We explain how (1.8) implies (1.12); one uses (1.9) to prove (1.13) very similarly. It is sufficient to establish a variant of (1.12) in which the numbers t_0, t_1, ..., t_m are only taken from a fixed finite set Σ of nonnegative rational numbers, but with a constant that does not depend on Σ. Afterwards, we can let those sets Σ exhaust [0, ∞) ∩ Q, invoking the monotone convergence theorem. At the very end one recalls that Π(X, Y) almost surely has càdlàg paths, so that Π_{t',t''}(X, Y) is almost surely right-continuous in t' and t''.
Starting with a finite set Σ, we take an increasing sequence (Σ_n)_{n=0}^∞ of finite subsets of [0, ∞) such that each Σ_n contains Σ and their meshes tend to 0; write explicitly Σ_n = {a^{(n)}_0 < a^{(n)}_1 < ⋯ < a^{(n)}_{l_n}}. From part (a) applied to the deterministic partitions Σ_n we know that the associated approximations of the truncated paraproducts Π_{t_{k−1},t_k}(X, Y) converge in probability. Repeatedly passing to almost surely convergent subsequences, we may assume that this convergence already holds almost surely for each of the finitely many choices of the numbers t_0 < t_1 < ⋯ < t_m from Σ and for each 1 ≤ k ≤ m. It remains to apply estimate (1.8) to the discrete-time martingales (X_{a^{(n)}_j})_{j=0}^{l_n} and (Y_{a^{(n)}_j})_{j=0}^{l_n} for each fixed n ∈ N, recognizing the above limits as their truncated paraproducts, and then to use Fatou's lemma as n → ∞ to obtain control of the left-hand side of (1.12).