Coupling Polynomial Stratonovich Integrals: the two-dimensional Brownian case

We show how to build an immersion coupling of a two-dimensional Brownian motion $(W_1, W_2)$ along with $\binom{n}{2} + n= \tfrac12n(n+1)$ integrals of the form $\int W_1^iW_2^j \circ dW_2$, where $j=1,\ldots,n$ and $i=0, \ldots, n-j$ for some fixed $n$. The resulting construction is applied to the study of couplings of certain hypoelliptic diffusions (driven by two-dimensional Brownian motion using polynomial vector fields). This work follows up previous studies concerning coupling of Brownian stochastic areas and time integrals (Ben Arous, Cranston and Kendall (1995), Kendall and Price (2004), Kendall (2007), Kendall (2009), Kendall (2013), Banerjee and Kendall (2015), Banerjee, Gordina and Mariano (2016)), and is part of an ongoing research programme aimed at gaining a better understanding of when it is possible to couple not only diffusions but also multiple selected integral functionals of the diffusions.


Introduction
A coupling of two probability measures $\mu_1$ and $\mu_2$, defined on respective measurable spaces $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$, is a joint law $\mu$ defined on the product space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$ whose marginals are $\mu_1$ and $\mu_2$. A coupling of Markov processes $X$ and $\tilde X$ is an immersion coupling when the joint process $\{(X(t+s), \tilde X(t+s)) : s \ge 0\}$ conditioned on $\mathcal{F}_t$ is again a coupling of the laws of $X$ and $\tilde X$, but now starting from $(X(t), \tilde X(t))$. This is also called a co-adapted coupling [19], or faithful coupling [29], and is very closely related to the near-equivalent notion of a Markovian coupling [7], which additionally constrains the joint process $(X(t), \tilde X(t))$ to be Markovian with respect to the filtration $(\mathcal{F}_t)_{t \ge 0}$. Immersion couplings are typically very much easier to describe than general couplings, since they may be specified in causal ways using, for example, stochastic calculus. In the following we consider couplings of smooth elliptic diffusions with $d$-dimensional state space $\mathbb{R}^d$. Specifically, for $d \ge 2$ and $1 \le k \le d$, consider the following Stratonovich stochastic differential equation on $\mathbb{R}^d$:
$$dX(t) = V_0(X(t))\,dt + \sum_{i=1}^{k} V_i(X(t)) \circ dW_i(t), \qquad X(0) = x. \tag{1}$$
Here $x \in \mathbb{R}^d$ is the initial state, $V_0$ (the drift) and $V_1, \ldots, V_k$ are smooth vector fields, and $(W_1, \ldots, W_k)$ is a standard Brownian motion on $\mathbb{R}^k$. We will consider couplings of two copies $X$ and $\tilde X$ of this diffusion, starting from arbitrary distinct initial states $x, \tilde x \in \mathbb{R}^d$. Interest focusses on the coupling time $T$, defined as $T = \inf\{t \ge 0 : X(s) = \tilde X(s) \text{ for all } s \ge t\}$.
The coupling is said to be successful if almost surely $T < \infty$ (where "almost surely" refers to the coupling measure $\mu$). A major motivation to study couplings arises from Aldous' coupling inequality (see [1]):
$$\|\mu_t - \tilde\mu_t\|_{TV} \le \mu[T > t], \tag{2}$$
where $\mu_t$ and $\tilde\mu_t$ denote the laws of $X(t)$ and $\tilde X(t)$ respectively and $\|\cdot\|_{TV}$ denotes the total variation distance between probability measures given by
$$\|\nu - \tilde\nu\|_{TV} = \sup\{|\nu(A) - \tilde\nu(A)| : A \text{ measurable}\}.$$
Using inequality (2), construction of a coupling of $X$ and $\tilde X$ automatically bounds the total variation distance between the laws of the diffusions at time $t$. A maximal coupling is one for which the inequality (2) is actually an equality for all $t$. These have been shown to exist under very general conditions [14,28,13,30,11]. However, in most cases the task of explicitly constructing such a maximal coupling is extremely hard, if not impossible. This provides strong motivation for considering immersion couplings which, although not maximal in most cases [4,23], are easier to describe and can provide helpful bounds via (2). Immersion couplings have been extensively studied for elliptic diffusions (which is to say diffusions given by (1) when $k = d$ and $\{V_1(x), \ldots, V_d(x)\}$ form a basis for $\mathbb{R}^d$ at each $x \in \mathbb{R}^d$). The simplest example of such a coupling is the reflection coupling of Euclidean Brownian motions starting from two different points: the second Brownian path is obtained from the first by reflecting the first path on the hyperplane bisecting the line joining the starting points until the first path (equivalently, the second, reflected, path) hits this hyperplane. This coupling turns out to be maximal as well as Markovian! [24] extended this reflection construction to produce successful Markovian couplings for elliptic diffusions on $\mathbb{R}^d$ with bounded and Lipschitz drift and diffusion coefficients when the diffusion matrix does not vary too much in space (see also [8]).
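To make the coupling inequality (2) and the notion of a maximal coupling concrete, the following sketch treats the simplest static analogue: two laws on a three-point set rather than the laws of two diffusions at time $t$. The helper names `tv_distance` and `maximal_coupling` and the example laws `p`, `q` are illustrative assumptions, not objects from this article. Every coupling has mismatch probability at least the total variation distance; the construction below attains that bound.

```python
from fractions import Fraction

def tv_distance(p, q):
    """Total variation distance sup_A |p(A) - q(A)| for two laws on a finite set."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def maximal_coupling(p, q):
    """Joint law with marginals p and q putting as much mass as possible on the diagonal."""
    n = len(p)
    overlap = [min(a, b) for a, b in zip(p, q)]
    d = 1 - sum(overlap)                     # equals tv_distance(p, q)
    joint = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        joint[i][i] = overlap[i]
    if d > 0:
        # Spread the leftover mass as a product of the normalised excesses.
        for i in range(n):
            for j in range(n):
                joint[i][j] += (p[i] - overlap[i]) * (q[j] - overlap[j]) / d
    return joint

# Hypothetical example laws on {0, 1, 2}.
p = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
joint = maximal_coupling(p, q)
# Probability that the two coupled variables differ.
mismatch = sum(joint[i][j] for i in range(3) for j in range(3) if i != j)
print(tv_distance(p, q), mismatch)
```

For diffusions the analogous object must match the total variation distance simultaneously for all $t$, which is why explicit maximal couplings are so much harder to build than this finite example suggests.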
However, in general the construction is not symmetric between the two coupled processes, and this method is not easily applicable when the diffusion matrix varies appreciably over space. A more geometric approach, depending symmetrically on the two coupled diffusions, is provided by the Kendall-Cranston coupling [18,9]. Consider the positive-definite diffusion matrix $\sigma(x)$ formed with columns $V_1(x), \ldots, V_d(x)$. As $x$ varies over $\mathbb{R}^d$, this furnishes $\mathbb{R}^d$ with a Riemannian metric $g$ given by $g(x) = (\sigma(x)\sigma(x)^\top)^{-1}$. With this intrinsic diffusion metric, $\mathbb{R}^d$ becomes a Riemannian manifold and the diffusion can be recognized as a Brownian motion with drift on this manifold. One then uses an appropriate generalization of reflection involving parallel transport along geodesics to obtain reflection couplings. These couplings are successful when (for example) the Ricci curvature of the manifold is non-negative and the drift vector field satisfies appropriate regularity conditions. We note here that the Kendall-Cranston coupling also works for elliptic diffusions whose state space is any smooth manifold, and can be applied to even more general situations [32].
The above techniques fail for diffusions which are not elliptic, as there is no natural Riemannian metric intrinsic to such diffusions. However, an important class of non-elliptic diffusions has attracted attention in recent times: namely, the hypoelliptic diffusions. These are diffusions $(X(t) : t \ge 0)$ such that $X(t)$ has a smooth density with respect to Lebesgue measure for each $t > 0$. They arise naturally in a variety of contexts: for example, modelling the motion of a particle following Newton's equations under a potential, white noise random forcing and linear friction (the kinetic Fokker-Planck diffusion [31]), describing stochastic oscillators (the Kolmogorov diffusion [26]), quantum mechanics and rough paths theory (Brownian motion on the Heisenberg group [27,12]) and modelling of macromolecular systems [15]. All these examples place a premium on gaining a good understanding of the behaviour of hypoelliptic diffusions. In particular, the construction of successful couplings for these diffusions immediately implies, via Aldous' inequality (2), that the total variation distance between the laws of two such diffusions started from distinct points converges to zero as time goes to infinity. Furthermore, estimates on the coupling time distribution deliver bounds on the convergence rate. This, in turn, yields estimates of the rate of convergence to stationarity, when a stationary measure exists. Moreover, these couplings can also be used to furnish gradient estimates for harmonic functions corresponding to the generators of the diffusions via purely probabilistic means [10,9,2].
At the time of writing, couplings of hypoelliptic diffusions have only been studied for rather specific examples. Hypoelliptic diffusions can be viewed as "high dimensional processes driven by low dimensional Brownian motions", which suggests that the goal of producing successful Markovian couplings of such diffusions may be best achieved by learning how to produce Markovian couplings of the driving Brownian motion together with a (typically finite) collection of path functionals. These couplings, sometimes described as exotic couplings, were first studied in [6]. This described successful Markovian couplings for the Kolmogorov diffusion of order one (given by a Brownian motion $B$ along with its running time integral $\int_0^t B(s)\,ds$) and Brownian motion on the Heisenberg group (a two-dimensional Brownian motion $(B_1, B_2)$ together with its Lévy stochastic area $\int_0^t B_1(s)\,dB_2(s) - \int_0^t B_2(s)\,dB_1(s)$). [22] showed how to generate successful Markovian couplings for the Kolmogorov diffusion of any finite order $n$ (a Brownian motion $B$ along with its $n-1$ iterated time integrals $\int\cdots\int_{0 \le s_1 \le \cdots \le s_i \le t} B(s_1)\,ds_1\,ds_2 \ldots ds_i$ for $1 \le i \le n-1$). Later [19,20] described a construction of a successful Markovian coupling of Brownian motion on the step-two free nilpotent Lie group of any underlying finite dimension $n$ (corresponding to an $n$-dimensional Brownian motion $(B_1, \ldots, B_n)$ together with the $\binom{n}{2}$ stochastic areas $\int_0^t B_i(s)\,dB_j(s) - \int_0^t B_j(s)\,dB_i(s)$ for $1 \le i < j \le n$, or, using vector notation, $B$ together with the alternating vector-product $\int B \wedge dB$). [21] described how to couple scalar Brownian motion together with local time, and used this to couple a rather degenerate diffusion arising in stochastic control theory. Even in these rather simple examples, the coupling constructions turn out to be quite complicated.
Simpler cases use careful combinations of reflection coupling and synchronous coupling (making Brownian increments agree). However, [19,20] show that one also needs to use more varieties of coupling (for example what might be called "rotation couplings") when coupling all stochastic areas for Brownian motion in dimension 3 or greater.
In this article, we provide constructions of immersion (in fact, Markovian) couplings for a considerable range of diffusions of the form given in (1) with k = 2, based on polynomial vector fields V i . This is a significant step beyond [19,20] in the development of the programme of understanding coupling for hypoelliptic diffusions, albeit limited here to the case of an underlying two-dimensional Brownian motion. Before going into the detailed description of the problem, we define the parabolic Hörmander condition which will be a crucial assumption in the coupling construction.
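Consider the following sets of vector fields:
$$\mathcal{V}_0 = \{V_i : 1 \le i \le k\}, \qquad \mathcal{V}_{j+1} = \mathcal{V}_j \cup \{[U, V_i] : U \in \mathcal{V}_j,\ 0 \le i \le k\},$$
where $[U, V]$ denotes the Lie bracket of the vector fields $U$ and $V$. Set $\mathcal{V}_j(x) = \operatorname{span}\{V(x) : V \in \mathcal{V}_j\}$. We will make the following assumption: (PHC) The vector fields $V_0, V_1, \ldots, V_k$ satisfy the parabolic Hörmander condition, i.e., $\bigcup_{j \ge 0} \mathcal{V}_j(x) = \mathbb{R}^d$ for each $x \in \mathbb{R}^d$.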
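As a small numerical illustration of how (PHC) is checked, the sketch below computes one Lie bracket by central differences for a hypothetical pair of fields on $\mathbb{R}^3$ with zero drift, namely $V_1 = \partial_1$ and $V_2 = \partial_2 + x_1 \partial_3$ (the fields driving $dX_3 = X_1 \circ dW_2$). The function names `lie_bracket` and `det3`, the sign convention $[U, V] = DV\,U - DU\,V$, and the evaluation point are assumptions made for this example only.

```python
def lie_bracket(U, V, x, h=1e-4):
    """Approximate [U, V](x) = DV(x) U(x) - DU(x) V(x) by central differences."""
    def directional(F, direction):
        # Derivative of the vector field F at x along `direction`.
        xp = [xi + h * di for xi, di in zip(x, direction)]
        xm = [xi - h * di for xi, di in zip(x, direction)]
        return [(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))]
    dV_U = directional(V, U(x))
    dU_V = directional(U, V(x))
    return [a - b for a, b in zip(dV_U, dU_V)]

def det3(rows):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical example fields: V1 = d/dx1, V2 = d/dx2 + x1 d/dx3.
V1 = lambda x: [1.0, 0.0, 0.0]
V2 = lambda x: [0.0, 1.0, x[0]]

x = [0.3, -1.2, 2.0]
V12 = lie_bracket(V1, V2, x)                  # expected to be close to d/dx3
spanning_det = det3([V1(x), V2(x), V12])      # non-zero means the three vectors span R^3
print(V12, spanning_det)
```

Here $V_1(x)$ and $V_2(x)$ alone span only a plane, so the diffusion is not elliptic; the single bracket supplies the missing direction, which is the mechanism that (PHC) formalises.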
Subject to suitable regularity conditions, (PHC) is a necessary assumption if we want to construct successful couplings from arbitrary pairs of starting points. To see this, consider the distribution of sub-spaces $\{D(x) = \bigcup_{j \ge 0} \mathcal{V}_j(x),\ x \in \mathbb{R}^d\}$ generated by $\bigcup_{j \ge 0} \mathcal{V}_j$ (that is, the smoothly varying subspace of the tangent space spanned by these vector fields). [25] showed that if $D$ is of "locally finite type" (in particular, if the vector fields $V_0, V_1, \ldots, V_k$ are real analytic), then it has the maximal integral manifold property, i.e., for each point $x \in \mathbb{R}^d$, there exists an immersed submanifold $S$ (called an integral manifold) containing $x$ with the property that its tangent bundle coincides with the distribution $D$. Moreover, $S$ can be chosen so that any other integral manifold which intersects $S$ must be an open submanifold of $S$ (note that $S$ need not be complete in $\mathbb{R}^d$). In this case, $\mathbb{R}^d$ splits into disjoint maximal integral manifolds. It follows from support theorems (see, for example, [16]) that if a diffusion starts from a point inside one maximal integral manifold then almost surely it must stay in this manifold for all time. Thus, under regularity conditions such as real analyticity, if (PHC) does not hold, then there must be at least two disjoint maximal integral manifolds. Consequently, two copies of the diffusion started from points in different maximal integral manifolds will almost surely never meet. In order to make progress towards answering the general question of whether it is possible to construct successful immersion couplings of a diffusion satisfying (PHC) from arbitrary pairs of distinct starting points, this article considers a simplification. It will be convenient to write points of $\mathbb{R}^d$ as $w = (w_1, w_2, w_3) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}^{d-2}$ (in a mild abuse of notation, $w_3$ denotes a $(d-2)$-dimensional vector). We assume that our diffusions satisfy (1) when the drift vector field $V_0 = 0$, the driving Brownian motion is two-dimensional (i.e.
k = 2) and the driving vector fields V 1 and V 2 are polynomial functions of the driving Brownian motion. Specifically, for d ≥ 3 and for each w = (w 1 , w 2 , w 3 ) ∈ R × R × R d−2 , suppose that X can be written as where (W 1 , W 2 ) is a two-dimensional standard Brownian motion and X 3 can be written in vector format as satisfying the Stratonovich differential equation Here σ 1 , σ 2 are ((d − 2)-dimensional) vector-valued polynomials: Lemma 1 below describes exactly when the system (4) satisfies (PHC).
Several important examples of hypoelliptic diffusions fall in this category, including Brownian motion on the Heisenberg group [27,6]. The problem of immersion coupling for diffusions in the form of (4) makes a useful next step in the bigger programme of coupling hypoelliptic diffusions for the following reasons. Firstly, the zero drift condition helps to simplify (PHC) and give a clearer exposition, although we believe that the methods developed here can be used even when the drift is non-zero but satisfies certain growth conditions. Secondly, the polynomial form of the driving vector fields ensures that $X_3$ can be written using linear combinations of monomial Stratonovich integrals of the form $(\int W_1^i W_2^j \circ dW_2 : i + j \le n)$ and thus the problem reduces to successfully coupling the driving Brownian motions along with these integrals. Moreover, these polynomial vector fields can be used to approximate a large class of real analytic and nilpotent vector fields and we hope that our technique will extend to more general diffusions driven by such vector fields. Thirdly, as proved in Lemma 1 below, (PHC) for this class of diffusions simplifies to a non-singularity condition for a matrix formed by the vectors $a^{l,m}_i$. Finally, as described in [19] in the simpler context of coupling stochastic areas, successful Markovian coupling strategies can be achieved using only reflection/synchronous coupling of Brownian motions when the driving Brownian motion is two-dimensional (for example, Brownian motion on the Heisenberg group), but for higher dimensional analogues it is necessary to employ rotation couplings (which is to say, control strategies using orthogonal matrices), and this complicates the coupling strategy considerably.
As we will see, the restriction to a two-dimensional driving Brownian motion in the case of (4) similarly allows for a rather explicit coupling construction using only synchronous coupling of $W_2$ at all times together with judicious switching between synchronous and reflection phases for $W_1$. However, we anticipate that one of the challenges of dealing with higher-dimensional Brownian motions will be to handle the complexity entailed by no longer being able to keep one coordinate synchronously coupled and in agreement for all time. We plan to address the complexities of the higher dimensional case in a subsequent article.
In the remainder of this section, we will show that, in order to successfully couple two copies X and X of our diffusion (3) started from distinct points, it suffices successfully to couple simultaneously the driving Brownian motions along with integrals of the form where (x 1 , x 2 ) ∈ R 2 and ∂ i denotes the partial derivative with respect to the i th coordinate Computing the Stratonovich differential of Ψ 1 (w 1 +W 1 (t), w 2 + W 2 (t)), and then integrating this differential, amounts to establishing an integration-by-parts relation between certain Stratonovich integrals with respect to W 1 and other Stratonovich integrals with respect to W 2 , holding up to addition of a function of W 1 and W 2 whose coupling follows directly from coupling of (W 1 , W 2 ): Hence X 3 can be expressed as the sum of a function of W 1 and W 2 and a Stratonovich integral with respect to W 2 alone: 2 matrix formed by arranging in a row the n(n−1) Proof. Let ∂ 1 , ∂ 2 , . . . , ∂ d represent the standard basis vectors of R d . The diffusion X can be expressed in the form where w = (w 1 , w 2 , w 3 ) and the vector fields V 1 , V 2 are given by . . ]. Denote by n 1 (I), n 2 (I) the number of occurrences of 1, 2 respectively in the word I. We claim that if I is of length 2 or more and i N −1 = 1, i N = 2 then We prove this by induction on the length of the word I. For length 2, (7) follows from the definition of φ in (5): For any N > 2 assume that the induction hypothesis holds for words of length less than N .
and examine the case of I * = (i 0 , I). By the induction hypothesis, Observe that σ 1 , σ 2 depend only on x 1 , x 2 , so that the form of V I implies that (7) in that case also. Note that the coefficients of ∂ 1 and ∂ 2 are zero in V i1,...,iN−2,1,2 . Now observe that V 1 (x) and V 2 (x) are linearly independent for each x ∈ R d . Furthermore, V i1,...,iN−2,2,1 = −V i1,...,iN−2,1,2 while V i1,...,iN−2,1,1 = V i1,...,iN−2,2,2 = 0, so the coefficients of ∂ 1 and ∂ 2 for V I are zero whenever the word I has length greater than or equal to 2. Thus, for (PHC) to hold, the subspace spanned by {V I : length(I) ≥ 2} must have dimension d − 2. By (7) and the definition of Σ(x 1 , x 2 ), this is equivalent to requiring that Σ(x 1 , x 2 ) has rank d − 2. But φ is a vector polynomial, so Σ( which is non-singular. From the continuity of the determinant, we conclude that Σ * (y 1 , y 2 ), and hence Σ(y 1 , y 2 ), has rank d − 2 for all y in an open neighborhood of x. Conversely, if Σ(x 1 , x 2 ) has rank less than d − 2 for some x ∈ R d , then there exist constants c 1 , . . . , c d , not all zero, such that ) is a vector polynomial in x 1 , x 2 of degree n, Hence, Thus, Σ(y 1 , y 2 ) has rank less than d − 2 for all y ∈ R d . From the connectedness of R d , Σ(x 1 , x 2 ) has rank d − 2 for some x ∈ R d if and only if Σ(w 1 , w 2 ) has rank d − 2, completing the proof of the lemma.
Remark 2. It follows from the reasoning in the proof of Lemma 1 that $V_I = 0$ whenever the length of the word $I$ strictly exceeds $n$ (the maximal degree of the polynomial coefficients $\sigma_1, \sigma_2$), regardless of whether (PHC) is satisfied or not. From this observation it follows that any diffusion of the form implied by (4) is nilpotent [5]. Nilpotent diffusions serve as the starting point for many analyses of hypoelliptic diffusions owing to their simplicity.
Consider the task of immersion coupling two copies $X$ and $\tilde X$ of the diffusion starting from $w$ and $\tilde w$ respectively, using driving Brownian motions $(W_1, W_2)$ and $(\tilde W_1, \tilde W_2)$. Reflection coupling can be used to bring together the driving Brownian motions first. Thus there is no loss of generality in assuming that the Brownian starting points agree: $(w_1, w_2) = (\tilde w_1, \tilde w_2)$. Referring to the representation (6), it suffices to couple the two diffusions with starting points differing only in the third, vectorial, coordinates $w_3, \tilde w_3$. (This is because coupling of the summands $\Psi_1(w_1 + W_1(t), w_2 + W_2(t))$ and $\Psi_1(w_1 + \tilde W_1(t), w_2 + \tilde W_2(t))$ in (6) is immediately implied by coupling of $(W_1, W_2)$ and $(\tilde W_1, \tilde W_2)$.) Denote by $I(t)$ the vector formed by
$$\left( \int_0^t \frac{W_1(s)^{l+1}\, W_2(s)^m}{(l+1)!\, m!} \circ dW_2(s) \; : \; 1 \le l + 1 + m \le n \right),$$
and similarly $\tilde I(t)$. Decomposing the polynomials given by the integrands $\varphi(w_1 + W_1(s), w_2 + W_2(s))$ and $\varphi(w_1 + \tilde W_1(s), w_2 + \tilde W_2(s))$ according to whether or not monomials involve $W_1$ (respectively $\tilde W_1$), the last $d - 2$ coordinates of $X^*, \tilde X^*$ can be written in vector form as $X^*_3, \tilde X^*_3$ where
$$X^*_3(t) = w_3 + P(w_2, w_2 + W_2(t)) + \Sigma(w_1, w_2)\, I(t), \qquad \tilde X^*_3(t) = \tilde w_3 + P(w_2, w_2 + \tilde W_2(t)) + \Sigma(w_1, w_2)\, \tilde I(t),$$
where $P$ is a polynomial that arises from Stratonovich integration (with respect to $W_2$) of monomials in $w_2 + W_2$ alone. By Lemma 1, (PHC) implies that $\Sigma(w_1, w_2)$ has rank $d - 2$, hence $w_3, \tilde w_3$ both lie in the space spanned by the columns of $\Sigma(w_1, w_2)$.
Thus, there are z * 3 , z * 3 ∈ R n(n−1)/2 such that . It follows that if we can successfully couple from arbitrary pairs of starting points (w 1 , w 2 , z * Theorem 3. Consider the diffusion X(t) = (X 1 (t), X 2 (t), X 3 (t)) ∈ R d (for d ≥ 3, considering X 3 as a (d − 2)-dimensional process) defined by the following stochastic differential equation: where (W 1 , W 2 ) is a two-dimensional standard Brownian motion and σ 1 , σ 2 are polynomial vector fields such that (PHC) holds. Then there exists a successful Markovian coupling of two copies of the above diffusion starting from any pair of distinct points.
Theorem 3 summarizes the qualitative content of (and is a direct consequence of) Theorem 10 stated and proved in Section 4 below, but omits the tail estimate on the coupling time distribution. It is stated here as a separate theorem in order to highlight how the Brownian integral couplings constructed in the subsequent sections connect to the general theme of coupling hypoelliptic diffusions.

Technical preliminaries
To facilitate inductive arguments in the following proofs, we fix a total ordering $\preceq$ of the discrete simplex $\Delta_n = \{(a, b) \in \mathbb{Z}^2 : 0 \le a, b, \ a + b \le n\}$ (for some fixed $n \ge 1$). We achieve this by specifying a function $f : \Delta_n \to \mathbb{Z}$ and defining the order by
$$(a, b) \preceq (c, d) \quad \text{if and only if} \quad f(a, b) \le f(c, d).$$
We choose $f(a, b) = 2na + (2n + 1)b$: totality of $\preceq$ follows since $f$ takes values in the totally ordered set $\mathbb{Z}$; antisymmetry holds because $f(a, b) = f(c, d)$ if and only if $a = c$ and $b = d$ (indeed $f(a, b) = 2n(a + b) + b$ with $0 \le b \le n < 2n$, so $f(a, b)$ determines both $a + b$ and $b$); transitivity is immediate. Note that the $\preceq$-maximal element of $\Delta_n$ is $(0, n)$. We remark that $\preceq$ can be replaced by any other total ordering extending the partial ordering induced by considering $a + b$.
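The ordering can be enumerated directly. The short sketch below lists the discrete simplex for $n = 3$ sorted by $f(a, b) = 2na + (2n + 1)b$ and records the total degree $a + b$ along the way; the helper names `simplex`, `f` and `ordered_simplex` are ours and purely illustrative.

```python
def simplex(n):
    """Discrete simplex {(a, b) : a, b >= 0, a + b <= n}."""
    return [(a, b) for a in range(n + 1) for b in range(n + 1 - a)]

def f(a, b, n):
    """Ranking function f(a, b) = 2na + (2n + 1)b from the text."""
    return 2 * n * a + (2 * n + 1) * b

def ordered_simplex(n):
    """Elements of the simplex sorted by increasing f."""
    return sorted(simplex(n), key=lambda ab: f(ab[0], ab[1], n))

n = 3
order = ordered_simplex(n)
ranks = [f(a, b, n) for a, b in order]      # strictly increasing if f is injective
degrees = [a + b for a, b in order]         # non-decreasing if the order extends a + b
print(order)
```

For $n = 3$ the ranks are all distinct and the degrees never decrease along the list, so the order induced by $f$ is total and extends the partial order given by $a + b$, with $(0, 3)$ appearing last.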
From here onwards, to save cumbersome notation, (W 1 , W 2 ) will denote a two-dimensional Brownian motion starting from a general point (W 1 (0), W 2 (0)) ∈ R 2 . Let (a, b) ∈ ∆ n be the index representing the Brownian Stratonovich integral . We shall refer to such integrals as monomial Stratonovich integrals. Consider the -ordered collection of Brownian integrals (deeming W 1 to have precedence over all In the following, it is only necessary to consider c ≥ 1; in the case c = 0, I (c,d) reduces to a monomial in W 2 and so W 2 and its coupled counterpart will take equal values for all time in our coupling construction. Notice also the following: if Scaling arguments play a major rôle in the study of these couplings. The following lemma records a simple but crucial fact about scaling for Stratonovich integrals of Brownian motions. Consider the scaling transform S r , defined for any scalar r by Lemma 4. The following distributional equality holds: Proof. This is a direct consequence of linearity of Stratonovich integration taken together with the Brownian scaling property Note that is a total ordering extension of the partial order on ∆ 1 given by the scaling degree, deg((a, b)) = deg(I (a,b) ) = a + b + 1. Monomial Stratonovich integrals of lower degree evolve in time faster than those of higher degree; this is a key reason why our inductive arguments will work.
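The scaling degree $a + b + 1$ can be seen pathwise on a discretised path: multiplying both Brownian coordinates by $r$ multiplies a trapezoidal approximation of $\int W_1^a W_2^b \circ dW_2$ by exactly $r^{a+b+1}$ (the accompanying Brownian time change does not alter the value of the integral over the whole path). The sketch below is an illustration under our own assumptions: the helper names, the step size, the seed, and the choice $(a, b) = (2, 1)$, $r = 3$ are all hypothetical.

```python
import random

def brownian_path(steps, dt, rng):
    """Random-walk approximation of a Brownian path started at 0."""
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path

def stratonovich_monomial(w1, w2, a, b):
    """Trapezoidal approximation of the Stratonovich integral of W1^a W2^b against dW2."""
    total = 0.0
    for k in range(len(w2) - 1):
        left = w1[k] ** a * w2[k] ** b
        right = w1[k + 1] ** a * w2[k + 1] ** b
        total += 0.5 * (left + right) * (w2[k + 1] - w2[k])
    return total

rng = random.Random(0)
w1 = brownian_path(2000, 1e-3, rng)
w2 = brownian_path(2000, 1e-3, rng)
a, b, r = 2, 1, 3.0
original = stratonovich_monomial(w1, w2, a, b)
# Scale the path values by r; the integral picks up the factor r^(a + b + 1).
rescaled = stratonovich_monomial([r * v for v in w1], [r * v for v in w2], a, b)
print(rescaled, r ** (a + b + 1) * original)
```

Since time scales like $r^2$ under Brownian scaling while the integral scales like $r^{a+b+1}$, integrals of low degree change by a smaller factor over a rescaled cycle than those of high degree, in line with the remark above.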
The following two technical lemmas complete the list of technical preliminaries.
Lemma 5. Let B t be a standard Brownian motion adapted to a filtration (F t : t ≥ 0). Let Y t be a random process and let τ be a stopping time, both adapted to the same filtration.

Proof. Consider the stopping time
The Burkholder-Davis-Gundy (BDG) inequality (see for example [17, p. 163]), respectively the monotonicity of the Lebesgue integral, implies that, for any M, T > 0, there exists a constant $C'' > 0$ not depending on M, T such that Under the hypothesis of (i) it follows that, for arbitrary M ≥ 1 and T ≥ ε, where the last inequality follows from the hypothesis of (i) together with a Markov inequality argument. Similarly, The first assertion of (i) now follows by optimization. To be explicit, set T = √ εx and M = ε −1/8 √ x in (13) and use ε ∈ (0, 1) (second inequality) followed by x ≥ ε 1/4 (third inequality) to obtain The second assertion of (i) follows similarly: set T = √ εx 1/4 and M = ε −1/4 x 1/4 in (14), and use ε ∈ (0, 1) (second inequality) followed by x ≥ ε 1/2 (third inequality) to obtain The proof of (ii) follows along similar lines (using M = √ εx and T = ε −1/2 √ x for the first assertion and M = √ εx 1/4 and T = ε −1/4 x 1/4 for the second assertion).
Lemma 6. Let X i , τ i be non-negative random variables adapted to a given filtration (F i : i ≥ 1) and satisfying for some α, β > 0 and x, t ≥ 1, where C α , C β are positive constants that do not depend on i. Then for any γ < α ∧ β there is ε 0 > 0 (depending on α, β and γ) such that for some constant C ′ β depending only on β, and t ≥ 1.
For any ε > 0, we can write for any t ≥ 1, where the second inequality is obtained using (16), and the third by using (17) together with the specific choice of ε 0 . We have used max{1, C β } in place of C β to account for the situation when where we have used the Markov inequality to obtain the second inequality above and (17) together with the choice of ε 0 for the third inequality. The above estimates can be used with (18) to show which proves the lemma.

Coupling BM(R 2 ) and a single monomial Stratonovich integral
In this section, we construct couplings of $(W_1, W_2, I_{(a,b)})$ and $(\tilde W_1, \tilde W_2, \tilde I_{(a,b)})$ for $(a, b) \in \Delta_n$, $a \ge 1$, $b \ge 0$. The cases $a + b = 1$ (which implies $a = 1$, $b = 0$) and $a + b > 1$ differ in complexity, so we first consider the simpler case $a + b = 1$ (Lemma 7). This case is significantly easier to describe, and corresponds to the case of Brownian motion on the Heisenberg group already treated in [6] and [19] as noted in Remark 8 below; however the present technique carries through to the case $a + b > 1$ (Lemma 9). Thus, the construction given in the simplest non-trivial case (Lemma 7) is a good model for the general approach. Lemma 9 deals with coupling just one monomial Stratonovich integral of more general form, but this is an essential component of the inductive argument that will be required to establish coupling for a finite set of monomial Stratonovich integrals in Section 4. We will use some further notation, namely

Case of simplest non-trivial monomial Stratonovich integral
The next lemma establishes a coupling result based on a driving 2-dimensional Brownian motion $(W_1, W_2)$ plus the single monomial stochastic integral $I_{(1,0)}$.

Lemma 7.
For any $\gamma < \tfrac13$, there exists a successful Markovian coupling $P_\gamma$ of $(W_1, W_2, I_{(1,0)})$ and $(\tilde W_1, \tilde W_2, \tilde I_{(1,0)})$ whose coupling time has the power-law tail bound (19).
Proof. We first outline the general proof strategy. At all times $W_2$ and $\tilde W_2$ will be synchronously coupled; hence $W_2(t) = \tilde W_2(t)$ for all $t \ge 0$. Brownian scaling as given in Lemma 4 can be used to re-scale to a unit difference between the two stochastic integrals, thus reducing all cases to the case of starting points $(w_1, w_2, i)$ and $(w_1, w_2, i - 1)$ for $w_1, w_2, i \in \mathbb{R}$. The coupling decomposes naturally into disjoint cycles. Each cycle consists of a patterned alternation between phases of reflection and synchronous coupling for $W_1$ and $\tilde W_1$, so that the distance between the coupled processes $(W_1, W_2, I_{(1,0)})$ and $(\tilde W_1, \tilde W_2, \tilde I_{(1,0)})$ at the end of the cycle is roughly a fixed proportion of the distance between them at the start of the cycle. At the end of each cycle, the next cycle is constructed by applying the same coupling strategy as the previous cycle after appropriately re-scaling the coupled processes via Lemma 4, so that there is unit re-scaled distance between them at the start of the next cycle. Lemma 6 is then used to show that the end-points of these cycles have an accumulation point which corresponds to a finite coupling time. As the coupling strategy within each cycle is the same (modulo re-scaling), it is sufficient to describe in detail only the construction of the first cycle. Note that iterated cycles and re-scaling to achieve successful coupling have been used to couple Kolmogorov diffusions by Ben Arous et al. [6], Kendall and Price [22], Banerjee and Kendall [3].
A: Description of the first cycle As noted before, the scaling argument represented by Lemma 4 shows there is no loss of generality in assuming that |∆I (1,0) (0)| = 1. Choose and fix a constant R > 1. The estimates derived for the first cycle will be uniform with respect to R > 1 and an optimal choice of R will be made at the end of the proof. In the proof, C, C 1 , C 2 , . . . will denote generic positive constants whose values will not depend on R, w 1 , w 2 , i and whose value might change from line to line. The first cycle consists of three phases whose end-points are defined by the following stopping times: Phase 1: Using Brownian scaling, independent Brownian increments, and eigenvalues of the Laplacian with Dirichlet boundary conditions on [−1, 1], together with W 1 (0) = W 1 (0), it follows that Now consider the increment of ∆I (a,b) over the time interval [0, T 1 ]. Since the second Brownian coordinates satisfy W 2 = W 2 throughout the entire coupling, and the first Brownian coordinates W 1 , W 1 are reflection coupled hence independent of W 2 = W 2 , we may re-write the Stratonovich integral for the increment as an Itô integral: On the other hand, where the last inequality follows from (20) using . By a Markov inequality argument, it now follows for any x > 0 that But we have assumed that |∆I (1,0) (0)| = 1, so for any x ≥ 2 ). Since ∆W 1 (T 1 ) = R −1 by construction of T 1 , it follows that T 2 − T 1 has the same distribution as the hitting time of a one-dimensional Brownian motion on the level −R sgn(∆W 1 (T 1 ))∆I (1,0) (T 1 ). Thus, for x ≥ 2 and t > 0, we can assert that where the last inequality is a consequence of (21) and a hitting time estimate for Brownian motions derived from the reflection principle. Taking x = t 1/6 in the above expression and recalling that R > 1, The above expression gives a useful bound on the probability P [T 2 − T 1 > t] only when t ≥ (5CR) 3 . 
We therefore adjust the above bound (using C as a new generic positive constant): Note that ∆I (1,0) (T 2 ) = 0 follows from the definition of T 2 .
Phase 3: Using reflection coupling, and conditioning on the past at time T 2 , we may view T 3 − T 2 as the hitting time of level 1 2R by a standard Brownian motion. Employing the reflection principle for Brownian motion Moreover, for x > R −2 , H > R −2 , and once again re-writing the Stratonovich integral of ∆I (1,0) as an Itô integral, (23), Tchebychev and BDG inequalities) Moreover, the combined effect of the estimates in (20), (22) and (23) can be summarized as The estimates (24) and (25) give bounds on the difference of the integrals I (1,0) and I (1,0) at the end of the first cycle and the time taken to complete the first cycle respectively.
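The three phases of the first cycle can be mimicked by a crude Euler scheme for the differences $\Delta W_1 = W_1 - \tilde W_1$ and $\Delta I_{(1,0)} = I_{(1,0)} - \tilde I_{(1,0)}$. The sketch below reflects our reading of the phases (reflection until $|\Delta W_1|$ reaches $R^{-1}$, synchronous coupling until $\Delta I_{(1,0)}$ hits zero, reflection until $\Delta W_1$ returns to zero) and is not the authors' code; the function name `first_cycle`, the step size, the step cap and the seed are assumptions. Phase 2 is sampled exactly, using the standard fact that the hitting time of level $L$ by a Brownian motion has the law of $L^2 / Z^2$ for standard normal $Z$.

```python
import random

def first_cycle(R, dt=1e-4, max_steps=500_000, seed=0):
    """One reflection / synchronous / reflection cycle for dW1 = W1 - W1~ and dI = I - I~,
    started from dW1 = 0, dI = 1. Returns None if a phase exceeds max_steps Euler steps."""
    rng = random.Random(seed)
    sd = dt ** 0.5
    d_w1, d_i, t = 0.0, 1.0, 0.0

    # Phase 1: reflection coupling of W1 (dW1 moves like 2B) until |dW1| >= 1/R.
    for _ in range(max_steps):
        d_i += d_w1 * rng.gauss(0.0, sd)     # increment of dI driven by the shared W2
        d_w1 += 2.0 * rng.gauss(0.0, sd)
        t += dt
        if abs(d_w1) >= 1.0 / R:
            break
    else:
        return None
    t1, d_w1_t1, d_i_t1 = t, d_w1, d_i

    # Phase 2: synchronous coupling of W1 freezes dW1, so dI is linear in W2 and
    # reaches 0 when W2 has moved by -dI/dW1; sample that hitting time exactly.
    level = abs(d_i / d_w1)
    z = rng.gauss(0.0, 1.0)
    while z == 0.0:
        z = rng.gauss(0.0, 1.0)
    t += (level / z) ** 2
    d_i = 0.0
    t2 = t

    # Phase 3: reflection coupling again until dW1 first changes sign on the time grid.
    sign = 1.0 if d_w1 > 0 else -1.0
    for _ in range(max_steps):
        d_i += d_w1 * rng.gauss(0.0, sd)
        d_w1 += 2.0 * rng.gauss(0.0, sd)
        t += dt
        if d_w1 * sign <= 0.0:
            break
    else:
        return None
    return {"T1": t1, "T2": t2, "T3": t, "dW1_T1": d_w1_t1,
            "dI_T1": d_i_t1, "dW1_T3": d_w1, "dI_T3": d_i}

cycle = first_cycle(R=4.0)
print(cycle)
```

Because of the time discretisation, $\Delta W_1$ slightly overshoots its targets at $T_1$ and $T_3$, so the output is only indicative. The quantity of interest is `dI_T3`, the residual difference of the integrals at the end of the cycle, which the estimate (24) controls in the continuous-time construction.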
B: Describing subsequent cycles and successful coupling For t ≥ T 3 , define further stopping times T k , k > 3, such that for any k ≥ 1, is the time of completion of the j th phase of the first cycle constructed above for the re-scaled processes The concatenation of these cycles does in fact lead to a successful coupling. The proof of this follows from two facts: (i) lim k→∞ ∆I (1,0) (T 3k ) = 0, meaning that the coupled processes, observed at the end-points of the cycles, come arbitrarily close as the number of cycles becomes large, and (ii) lim k→∞ T 3k < ∞ almost surely, meaning that the end points of these cycles have a finite accumulation point T ∞ , so that the concatenation completes in finite time.
Continuity of Brownian motion and stochastic integrals then implies successful coupling at time T ∞ .
To prove that the coupling is successful in finite time almost surely and that the coupling time has a power law tail given by (19), apply Lemma 4 to (24) and (25): (X * k , τ * k /R 3 ) satisfies the hypotheses of (X k , τ k ) of Lemma 6 with α = 2/5 and β = 1/3. Thus, by Lemma 6, for any Now observe the following product collapses because of the definitions expressed by (26): Thus, for any 0 < γ < 1/3, the above coupling construction with R = max{R 0 , R γ } gives the required successful coupling satisfying (19).
Remark 8. Recall the Brownian motion in the Heisenberg group started at (w 1 , w 2 , i), defined as the R 3 valued process given by where (W 1 , W 2 ) is a two-dimensional Brownian motion started at (w 1 , w 2 ). Lemma 7 is of independent interest as it gives a successful Markovian coupling of Brownian motions on the Heisenberg group started at (w 1 , w 2 , i) and (w 1 , w 2 , i) with explicit bounds on the tail probabilities of the coupling time. To see this, note that by the Itô formula, From this, we obtain Thus, the successful coupling construction given in Lemma 7 for (W 1 , W 2 , I (1,0) ) and ( W 1 , W 2 , I (1,0) ), started from (w 1 , w 2 , i/2) and (w 1 , w 2 , i/2) respectively, is also a successful coupling of the corresponding Brownian motions on the Heisenberg group started from (w 1 , w 2 , i) and (w 1 , w 2 , i).
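The Itô-formula step in Remark 8 can be sketched as follows. This is a reconstruction under stated assumptions rather than the paper's own display: it assumes that the third coordinate of the Heisenberg Brownian motion is $i + \int_0^t W_1\,dW_2 - \int_0^t W_2\,dW_1$ (no factor $\tfrac12$, which is what the starting values $i$ and $i/2$ suggest), and that $I_{(1,0)}(t) = i/2 + \int_0^t W_1\,dW_2$.

```latex
\begin{align*}
W_1(t)W_2(t) &= w_1 w_2 + \int_0^t W_1\,dW_2 + \int_0^t W_2\,dW_1
  && \text{(It\^o formula; $W_1$, $W_2$ independent)},\\
i + \int_0^t W_1\,dW_2 - \int_0^t W_2\,dW_1
  &= i + 2\int_0^t W_1\,dW_2 - W_1(t)W_2(t) + w_1 w_2 \\
  &= 2\,I_{(1,0)}(t) - W_1(t)W_2(t) + w_1 w_2 .
\end{align*}
```

Under these assumptions the third coordinate is a fixed function of $(W_1, W_2, I_{(1,0)})$ and the common starting point $(w_1, w_2)$, so once the two copies of $(W_1, W_2, I_{(1,0)})$ agree, the two Heisenberg Brownian motions agree as well.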
Couplings of Brownian motions on the Heisenberg group have appeared in several papers in recent times: [6] and [19] have constructed successful Markovian couplings of Brownian motions for the Heisenberg group. Kendall [20, Theorem 3.1] established some coupling time distribution asymptotics for the coupling constructed in [19], under some limiting operation on the starting points. But our result gives explicit bounds on the tail probabilities of the coupling time for each t and each pair of starting points (w 1 , w 2 , i) and (w 1 , w 2 , i) (in fact, this coupling can be extended to general pairs of distinct starting points (w 1 , w 2 , i) and ( w 1 , w 2 , i) and associated bounds can be derived

Case of general monomial Stratonovich integral
The next lemma generalizes the previous coupling construction, establishing a coupling result based on a driving 2-dimensional Brownian motion plus a single monomial stochastic integral: (W 1 , W 2 , I (a,b) ) for a single fixed (a, b) ∈ ∆ n with a ≥ 1, b ≥ 0 and a + b > 1. Recall from Section 2 that f (k, l) = 2nk + (2n + 1)l.
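A small sketch may help fix the role of the exponent function f. At the end of the first phase of the construction below, |W 1 | ≥ R^(2n) and W 2 = RW 1 , so |W 2 | ≥ R^(2n+1) and, for R > 1, |W 1 ^k W 2 ^l| ≥ R^f(k,l). The code checks the inequality f(a − 1, b) ≥ 2n + 1 for a, b ≥ 1, which is used later in the proof. The index set {a ≥ 1, b ≥ 1, a + b ≤ n} is an assumption made for the illustration (the definition of ∆ n is given in Section 2), and the function names are hypothetical.

```python
def f(k, l, n):
    """Exponent function recalled from Section 2: f(k, l) = 2nk + (2n + 1)l."""
    return 2 * n * k + (2 * n + 1) * l

def lower_bound_holds(n):
    """Check f(a - 1, b) >= 2n + 1 over a >= 1, b >= 1, a + b <= n.

    The range of (a, b) is an assumption made for this illustration.
    """
    return all(
        f(a - 1, b, n) >= 2 * n + 1
        for a in range(1, n + 1)
        for b in range(1, n - a + 1)
    )

for n in range(2, 7):
    print(n, lower_bound_holds(n))
```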
Lemma 9. For any (a, b) ∈ ∆ n with a ≥ 1, b ≥ 0, a+b > 1, there exists R 0 > 1 such that for each R ≥ R 0 , we can obtain a successful Markovian coupling construction P R of (W 1 , W 2 , I (a,b) ) and ( W 1 , W 2 , I (a,b) ) starting from (w, Rw, i) and (w, Rw, i) respectively, with coupling time T R,(a,b) , such that: (i) There are positive constants γ, C not depending on R such that, for large t, In the interval [0, T R,(a,b) ] we identify the active region S R,(a,b) , (a,b) ) are unions of countable sequences of disjoint random closed intervals, where for each sequence of intervals the left-end-points of the intervals form an increasing sequence. Writing the total length of S R,(a,b) by |S R,(a,b) |, the following holds for large t, (ii) There are positive constants α, C not depending on R such that for large t, For convenience, we will prove the inequalities (28), (29) and (30) for t ≥ 1.
Proof. As before, W 2 and W 2 will be synchronously coupled at all times so we may take W 2 = W 2 . Brownian scaling (Lemma 4) can be applied to ensure the monomial stochastic integrals differ by 1: so it suffices to consider starting points (w, Rw, i) and (w, Rw, i − 1) for w, i ∈ R. Let γ, δ, C, C 1 , C 2 . . . be generic positive constants not depending on R, w, i, (but often depending on a and b) whose values might change from line to line. The proof uses some martingale estimates, so we will use the decomposition of the Stratonovich integral I (a,b) in Itô integral form: In contrast with the case of Lemma 7, here the Stratonovich integral has a drift component if b ≥ 1.
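The Itô form of the Stratonovich integral referred to above is, by the standard conversion rule, the Itô integral of W 1 ^a W 2 ^b against W 2 plus the drift (b/2) ∫ W 1 ^a W 2 ^(b−1) ds, which vanishes when b = 0. The following simulation is a rough numerical check of that decomposition on a single discretised path started at the origin; it is a sketch using an Euler grid and a trapezoidal rule for the Stratonovich integral, with illustrative names and parameters, and is not part of the proof.

```python
import math
import random

def simulate_monomial_integral(a, b, n_steps=50000, horizon=1.0, seed=0):
    """Return (Stratonovich, Ito, drift) approximations on one simulated path.

    Stratonovich: trapezoidal sums of W1^a W2^b against increments of W2.
    Ito: left-endpoint sums. Drift: (b/2) * Riemann sum of W1^a W2^(b-1).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sd = math.sqrt(dt)
    w1 = w2 = 0.0
    strat = ito = drift = 0.0
    for _ in range(n_steps):
        dw1 = rng.gauss(0.0, sd)
        dw2 = rng.gauss(0.0, sd)
        left = w1 ** a * w2 ** b
        right = (w1 + dw1) ** a * (w2 + dw2) ** b
        strat += 0.5 * (left + right) * dw2
        ito += left * dw2
        if b >= 1:
            drift += 0.5 * b * w1 ** a * w2 ** (b - 1) * dt
        w1 += dw1
        w2 += dw2
    return strat, ito, drift

strat, ito, drift = simulate_monomial_integral(1, 1)
print("Stratonovich:", strat, "Ito + drift:", ito + drift)
```

The two printed numbers should agree up to a discretisation error that shrinks as the grid is refined.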
As in the previous lemma, the coupling decomposes into disjoint cycles, and the successive cycles are connected via scaling. We describe the first cycle and then discuss the total effect of this and subsequent cycles on finiteness and moment estimates for the coupling time.
A: Description of the first cycle
The first cycle consists of five phases. The coupling strategy alternates between synchronous coupling and reflection coupling of W 1 and W̃ 1 between the phases. We will first describe each phase in terms of an arbitrary value of the tuning parameter R ≥ R 0 > 1. The estimates derived for the first cycle will hold uniformly with respect to R > 1, and the appropriate lower bound R 0 for R will arise in the course of the proof and be specified at the end of the coupling construction. The end-points of the five phases are defined by the following stopping times. Initially W 1 (0) = W̃ 1 (0) (and W 2 = W̃ 2 throughout).
1: θ 1 = inf{t ≥ 0 : W 2 (t) = RW 1 (t) and |W 1 (t)| ≥ R 2n }, synchronous till W 1 hits R −1 W 2 and |W 1 | ≥ R 2n , and note W 1 (θ 1 ) = W̃ 1 (θ 1 );
reflection till ∆W 1 hits 0, and note W 1 (λ 1 ) = W̃ 1 (λ 1 );
5: β 1 = inf{t ≥ λ 1 : W 2 (t) = RW 1 (t)}, synchronous till RW 1 − W 2 hits 0, and note W 1 (β 1 ) = W̃ 1 (β 1 ).
Note that the first and last phases both use synchronous coupling. However, we do not amalgamate these across cycles, since at the β k times we have RW 1 = W 2 as well as W = W̃ .
We first estimate the tail probability of θ 1 as follows. If t ≥ 1 then To see this, note that θ 1 is obtained by starting a planar Brownian motion located at distance √(1 + R 2 ) w along the line W 2 = RW 1 from the origin, and running it till it hits the line W 2 = RW 1 at a distance at least R 2n from the origin. By Brownian scaling and rotational invariance of planar Brownian motion, θ 1 is stochastically dominated by R 4n+2 θ ′ 1 , where Here, if √ 2 were replaced by √(1 + R −2 ) then the stochastic domination would become an equality (recall, R > 1). If L(t) denotes the local time of W * 1 at 0 at time t and ζ(t) denotes the inverse local time, then W * 2 (ζ(t)) = C(t), where C is a Cauchy process starting at √(1 + R 2 ) w/R 2n+1 . If L(θ ′ 1 ) > s, then the continuity of L implies θ ′ 1 > ζ(s). The range of ζ is a subset of the set of times where the monotone function L increases (namely, the times where W * 1 = 0), so θ ′ 1 > ζ(s) yields √ 2 > |W * 2 (ζ(s))| from the definition of θ ′ 1 . Hence, for t ≥ 1, and u = C −1 2 log t for a certain positive constant C 2 , By the Lévy transform, the local time process (L(s) : s ≥ 0) has the distribution of the running supremum of Brownian motion, so To bound the second probability in (33), recall that the Cauchy process C is a pure jump Lévy process. Consequently, the increments (C(j) − C(j − 1) : 1 ≤ j ≤ ⌊u⌋) are i.i.d. with a common Cauchy distribution. If sup s≤u |C(s)| ≤ √ 2 holds then |C(j) − C(j − 1)| ≤ 2 √ 2 for 1 ≤ j ≤ ⌊u⌋, and therefore (using positive constants C 1 , C 2 not depending on w) Applying these bounds to (33), The required bound (32) follows by taking u = C −1 2 log t and choosing a suitable C bearing in mind that t ≥ 1.
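The two ingredients of the bound on θ ′ 1 can be evaluated numerically. The sketch below is illustrative only: it assumes that the increments of C over unit intervals of local time are standard Cauchy (the true scale depends on the normalisation of local time), and it takes C 2 = −log p, where p is the probability that one such increment has modulus at most 2√2. With u = log t / C 2 the sum of the two terms is of order (log t)/√t. All names are hypothetical.

```python
import math

def local_time_tail(u, t):
    """P[L(t) <= u]: L(t) has the law of the running supremum of Brownian
    motion at time t, i.e. of |B_t|, so this equals erf(u / sqrt(2 t))."""
    return math.erf(u / math.sqrt(2.0 * t))

def cauchy_increment_prob(half_width=math.sqrt(2.0)):
    """p = P[|increment| <= 2 * half_width] for a standard Cauchy increment
    (unit scale is an assumption made for this sketch)."""
    return (2.0 / math.pi) * math.atan(2.0 * half_width)

def cauchy_confinement_bound(u):
    """Bound p ** floor(u) on P[sup_{s <= u} |C(s)| <= sqrt(2)], using the
    independence of the floor(u) unit-interval increments."""
    return cauchy_increment_prob() ** math.floor(u)

def theta_prime_tail_bound(t):
    """Sum of the two terms with u = log(t) / C2 and C2 = -log p."""
    c2 = -math.log(cauchy_increment_prob())
    u = math.log(t) / c2
    return local_time_tail(u, t) + cauchy_confinement_bound(u)

for t in (1e2, 1e4, 1e6):
    print(t, theta_prime_tail_bound(t), math.log(t) / math.sqrt(t))
```

For small t the bound is uninformative (close to 1); it becomes useful only once log t is small compared with √t.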

Phase 2:
This phase employs reflection coupling between W 1 and W 1 , and runs from time θ 1 till the stopping time Thus, for t ≥ 1, applying successively reflection coupling and Brownian scaling, Consider the telescoping sum (for s ≥ 0), Since As W 1 and W 1 are reflection coupled in [θ 1 , τ 1 ], therefore (37) Writing W 1 (s) = (W 1 (s) − W 1 (θ 1 )) + W 1 (θ 1 ) and using R > 1 and |W 1 (θ 1 )| ≥ R 2n as well as (37), If b = 0 then the right-hand side of (36) simplifies, and for x > 2 a−1 it is immediate that If b ≥ 1, we can use (36) with (38) to obtain Now introduce the requirement that x ≥ 2 a+b−1 , so that ( Applying this together with |W 2 (θ 1 )| ≥ R 2n+1 , and then applying a Markov inequality argument, followed by an application of the BDG inequality [17, p. 163] after conditioning on σ{(W 1 (s), W 2 (s)) : s ≤ θ 1 }, From (34), since f (a − 1, b) ≥ 2n + 1 for a, b ≥ 1, Using this estimate in (40), and using a new constant C, we obtain the following when b ≥ 1, when x ≥ 2 a+b−1 , Note that (39) yields an upper bound of 0 when b = 0 (and x > 2 a−1 ). Using (42) in (35), for whatever b, and writing x = 2 a+b−1 M for future convenience of exposition, if M > 1 then We now rewrite (43) and (34) to match the first assertion in part (i) of Lemma 5 (after conditioning on σ{(W 1 (s), W 2 (s)) : s ≤ θ 1 }). For s > θ 1 , we set t = s − θ 1 , To match the indices in part (i) of Lemma 5, set α = 2/b, β = 2 if b ≥ 1, and choose any β > 0 if b = 0. Then (43) is equivalent to the following, holding when M > 1: Note that ε < 1 (since R > 1), so the above implies the weaker inequality, if M > 1 then On the other hand (34) becomes Noting e −C2t ≤ 1/(C 2 t) 2 for t > 0, and then re-scaling time and using n ≥ 1, a + b > 1, we obtain We can now apply the first assertion in part (i) of Lemma 5 to deduce the following. 
For z > ε 1/4 , Writing Y s in full, this amounts to the following: when x > a2 a+b−1 R −1/4 , and taking A similar procedure leads to a bound concerning Here we need only argue for the case b ≥ 1, as the time integral does not appear for I (a,0) . Referring to (43), but using Choosing B, ε and τ as before, and again conditioning on σ{(W 1 (s), W 2 (s)) : s ≤ θ 1 }, but now To match the indices in part (i) of Lemma 5, set α = 2/(b−1) (for b > 1), and take any β > α. When M > 1, Applying the second assertion in part (i) of Lemma 5, and using γ ′′ = 1 2 (1/(1 ∨ (b − 1))), Applying the inequalities (44) and (46) to the Itô representation of I (a,b) given in (31), we conclude that for any Phase 3: Now, we address the time interval [τ 1 , η 1 ]. In this phase, starting at time τ 1 , synchronous coupling is employed to the driving Brownian motions till W 2 (t + τ 1 ) − W 2 (τ 1 ) hits the level −a −1 sgn(∆W 1 (τ 1 ))(sgn(W 1 (θ 1 ))) a+b−1 . Applying the reflection principle to (W 1 (t+τ 1 )−W 1 (τ 1 ) : t ≥ 0), we can deduce the following estimate related to hitting times of Brownian motion: Consider the fluctuations of ∆I (a,b) on this interval. Using (35), it suffices to address the integrals As this is a synchronous coupling phase, Combining this with the facts that W 1 (θ 1 ) = W 1 (θ 1 ) and We will show that the first term above is small with high probability. If a = 1, or more generally if k = a, then the first term is identically zero. If a ≥ 2 and k ≤ a − 1 then Fix x ≥ 1/R 2n . Recall that |W 1 (θ 1 )| ≥ R 2n , and note firstly that for x ≥ 1/R 2n , by the reflection coupling implications summarized in (37), and secondly by a Tchebychev inequality argument and Doob's L 2 -maximal inequality where the last inequality follows by taking T = (xR 2n ) 4/3 .
Similarly, for x ≥ 2, where the last inequality follows from the computations performed to obtain (52). A similar estimate for P sup τ1≤t≤η1 | A 1 (t)| > x holds by replacing W 1 with W 1 in the above calculations.
To derive an analogous estimate for P sup τ1≤t≤η1 |A 2 (t)| > x , first observe that where the first equality is because conditional on σ{(W 1 (s), W 2 (s)) : s ≤ θ 1 }, W 2 − W 2 (θ 1 ) is independent of τ 1 − θ 1 and the last inequality follows from (41). Using this observation along with the Tchebychev inequality, we obtain where the last inequality follows by taking T = (xR 2n ) 4/3 . Using (54) and recalling |W 2 (θ 1 )| ≥ R 2n+1 , From the above estimates, we can argue the following in case x ≥ 2 2(a+b−1) /R 2n : for some γ > 0 (in fact γ = 1/3) that does not depend on R (the last three probabilities appearing after the first inequality above can be taken to be zero if a + b − 1 − j = 0). By applying the above argument to each term on the right hand side of (51), we obtain (50) are subject to estimates of the same form, based on P sup τ1≤t≤η1 | A 1 (t) − 1| > x and P sup τ1≤t≤η1 |A 2 (t) − 1| > x respectively in place of P sup τ1≤t≤η1 |A 1 (t) − 1| > x , but otherwise using the same arguments. 
Hence (50) and the above estimates yield the following for x ≥ (a + b − 1)2 2(a+b−1) /R 2n : (56) The above holds for all 1 ≤ k ≤ a; consequently (35) implies that, for x ≥ a(a+b−1)2 2(a+b−1) /R 2n , Now P [η 1 − τ 1 > t] ≤ Ct −1/2 ; so the first assertion in part (ii) of Lemma 5 implies there is γ ′ > 0, not depending on R, such that for x ≥ a(a + b − 1)2 2(a+b−1) /R 2n But it follows from the definition of η 1 that Together with the above inequality this yields, for x ≥ a(a + b − 1)2 2(a+b−1) /R 2n , To estimate the integral η1 τ1 ∆W 1 (s)W 1 (s) a−k W 1 (s) k−1 W 2 (s) b−1 ds for b ≥ 1, we can once more use the synchronous coupling of W 1 , W 1 on [τ 1 , η 1 ] to show that for any t ∈ [τ 1 , η 1 ], For b ≥ 1 we may use (56) to show, for x ≥ 2 2(a+b−1) , Using the above and the fact that P [η 1 − τ 1 > t] ≤ Ct −1/2 in the second assertion in part (ii) of Lemma 5, we obtain γ, δ > 0 not depending on R such that Recalling the expression of I (a,b) in terms of the Itô integral and the time integral given in (31), we obtain from (58) and (59), Phase 4: The next phase occurs in the time interval [η 1 , λ 1 ]. In this phase, after time η 1 the Brownian motions W 1 and W 1 are subjected to reflection coupling till they meet. Applying the reflection principle, and using the fact that |∆W (η 1 )| = 1/(|W (θ 1 )| a+b−1 R b ) together with other consequences of the definitions of the stopping times θ 1 and τ 1 , we see that when t > 0 Once again (35) can be applied, so it suffices to consider the integrals For η 1 ≤ t ≤ λ 1 we can write Recalling that |∆W 1 (η 1 )| = |∆W 1 (τ 1 )| = 1 |W1(θ1)| a+b−1 R b , and bearing in mind that W 1 and W 1 are reflection coupled on [η 1 , λ 1 ], when x ≥ 1 it follows that where the last equality follows from the optional stopping theorem. 
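The telescoping decomposition used throughout Phases 2 to 4 rests on the algebraic identity x^a − y^a = (x − y) Σ x^(a−k) y^(k−1), summed over k = 1, …, a. Taking x = W 1 (s) and y = W̃ 1 (s) produces the integrands ∆W 1 (s) W 1 (s)^(a−k) W̃ 1 (s)^(k−1) that appear above. The indexing is inferred from those integrands, since the displayed sum itself is not reproduced here. A minimal exact check, with hypothetical function names:

```python
from fractions import Fraction

def telescoping_terms(x, y, a):
    """Terms (x - y) * x^(a-k) * y^(k-1), k = 1..a, which sum to x^a - y^a."""
    return [(x - y) * x ** (a - k) * y ** (k - 1) for k in range(1, a + 1)]

x, y = Fraction(3, 2), Fraction(-1, 3)   # stand-ins for W1(s) and its coupled copy
for a in range(1, 6):
    assert sum(telescoping_terms(x, y, a)) == x ** a - y ** a
print("telescoping identity verified for a = 1..5")
```

Each term carries one factor of ∆W 1 , which is why control of ∆W 1 during a phase translates into control of ∆I (a,b) .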
Fixing x ≥ 2, we can employ (61) and a Tchebychev inequality argument to show The bound P[|W 1 (η 1 ) − W 1 (θ 1 )| > xR 2n /4] ≤ C/(xR 2n ) 2/3 follows from the calculations leading to (52), where, in fact, we obtained the following bound when x ≥ 1/R 2n : Taking T = (xR 2n ) 4/3 , we obtain the following when x ≥ 2: Similar estimates for P sup η1≤t≤λ1 W2(θ1) > x follow by replacing W 1 by W̃ 1 and W 2 respectively in the above calculations (in the latter case, we use (54)). Using these estimates along with (64) and (62), we obtain for x ≥ 2 a+b , where the last step follows as R > 1. From (61), (65) and the first assertion in part (i) of Lemma 5, it follows that there are δ, γ > 0 not depending on R such that for x ≥ 2 a+b /R δ . Arguing as above, using (61), and (65) but with b − 1 replacing b, and appealing to the second assertion in part (i) of Lemma 5, if b ≥ 1 then Phase 5: The final phase concerns the interval [λ 1 , β 1 ], in which the Brownian motions (W 1 , W 2 ) and ( W̃ 1 , W̃ 2 ) are coupled synchronously till the time β 1 when (W 1 , W 2 ) = ( W̃ 1 , W̃ 2 ) hits the line We claim there is a positive constant C not depending on R such that To see this, observe that β 1 − λ 1 depends on how far away the Brownian motion (W 1 , W 2 ) is from the line u 2 = Ru 1 at time λ 1 . As W 2 (θ 1 ) = RW 1 (θ 1 ), this distance, in turn, depends on the size of the total duration λ 1 − θ 1 of the previous three phases. Indeed, for any α, x > 0 (to be chosen later), To estimate the first probability above, note that an application of the strong Markov property at time θ 1 allows us to deduce where the last inequality follows from Doob's submartingale inequality applied to the radial part of two-dimensional Brownian motion.
The second probability is controlled by conditioning on the past event [|(W 1 , W 2 )(λ 1 ) − (W 1 , W 2 )(θ 1 )| ≤ x] and using the strong Markov property to argue that the hitting time on the line u 2 = Ru 1 by the Brownian motion ((W 1 , W 2 )(t) − (W 1 , W 2 )(λ 1 ) : t ≥ λ 1 ) is stochastically dominated by the hitting time on zero by a one dimensional Brownian motion starting from x. Therefore, From (34), (48) and (61), we deduce that Putting these bounds together, it follows that The target inequality (67) is obtained by taking α = 1/3 and x = t 1/3 in the above bound. From (47), (60) and (66), we see that there exist positive constants C 1 , C 2 , δ, γ not depending on R, w, i such that B: Describing subsequent cycles and successful coupling The above account gives a description of the five phases that constitute the first cycle. Subsequent cycles are defined similarly as follows: For t ≥ β 1 , we apply scaling using Lemma 4 with r = ∆I (a,b) (β 1 ) −1/(a+b+1) and define further stopping times θ 2 , . . . , β 2 corresponding to θ 1 , . . . , β 1 for the scaled process, and continue in this fashion to obtain successive cycles. As in the proof of Lemma 7, in order to show that constructing these cycles leads to a successful coupling we need to show that lim k→∞ ∆I (a,b) (β k ) = 0, and lim k→∞ β k < ∞ almost surely. This would imply that the end points of these cycles have an accumulation point and thus that the coupling is successful in finite time. We now demonstrate that these facts follow from the estimates obtained above, via Lemma 6.
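The rescaling between cycles can be sketched as follows, assuming that Lemma 4 is the usual Brownian scaling: replacing W(t) by rW(t/r²) multiplies the monomial integral I (a,b) by r^(a+b+1). Choosing r = |∆I (a,b) (β 1 )|^(−1/(a+b+1)) therefore renormalises the discrepancy to size 1, and a duration D of the rescaled cycle corresponds to the duration D/r² = D |∆I (a,b) (β 1 )|^(2/(a+b+1)) in the original time scale. The helper names below are illustrative only.

```python
def scaling_factor(delta_i, a, b):
    """r = |Delta I|^(-1/(a+b+1)), the factor quoted in the text."""
    return abs(delta_i) ** (-1.0 / (a + b + 1))

def rescaled_discrepancy(delta_i, r, a, b):
    """Under W -> r W(. / r^2), the integral I_(a,b) scales by r^(a+b+1)."""
    return r ** (a + b + 1) * delta_i

def original_duration(rescaled_duration, r):
    """Time runs r^2 times faster for the rescaled process."""
    return rescaled_duration / r ** 2

a, b = 2, 1
delta_i = 1e-3                    # hypothetical discrepancy at the end of a cycle
r = scaling_factor(delta_i, a, b)
print(r, rescaled_discrepancy(delta_i, r, a, b), original_duration(1.0, r))
```

Since the discrepancy shrinks from cycle to cycle, the factors r grow and the original-time durations of later cycles shrink, which is what makes a finite accumulation point of the β k plausible.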
For k ≥ 1, if |∆I (a,b) (β k−1 )| = 0, then the coupling is successful. If the coupling is not where δ is as used in (69) and we adopt the convention that β 0 = 0. Taking τ k = 1 for k ≥ 1, we see that X k , τ k satisfy the hypotheses of Lemma 6, and thus we obtain almost surely. In particular this implies that almost surely Choose and fix any R ≥ R ′ 0 . From (32), (34), (48), (61) and (67), we have α > 0 such that where δ is the same as that used in (69). By (69), observe that The following holds: Thus, for any γ ′ < α ∧ γ(a+b+1)/2, using Lemma 6 with (X * i , τ * i /R 4n+2 ) in place of (X i , τ i ), we obtain R ′′ 0 ≥ R ′ 0 such that for every R ≥ R ′′ 0 , This shows that the coupling construction represented by P R yields an almost surely successful coupling with coupling time given by T R,(a,b) = lim k→∞ β k . The constant R 0 claimed in the lemma can be taken to be R ′′ 0 . From the coupling construction, we see that the active region S R,(a,b) referred to in the lemma can be written as The estimate on the tail probabilities of |S R,(a,b) |, claimed in the statement of the lemma, follows from Lemma 6 using an argument similar to that given above, after re-scaling by considering |λ k − θ k | / |∆I (a,b) (β k−1 )| 2/(a+b+1) for τ * k (in fact, it follows from (68) that the tail estimate holds for any γ < 1/2).
Assertion (ii) claimed in the lemma follows first from observing that and then from applying Lemma 6 with ( X * k , M * k ) in place of (X k , τ k ), where The tail estimates for M * k needed to apply Lemma 6 are derived by recalling |W 1 (θ 1 )| ≥ R 2n and applying scaling to deduce for x ≥ 1 where the last step follows from (63).

Simultaneously coupling multiple monomial Stratonovich integrals
This section describes the construction of a successful coupling based on a driving 2-dimensional Brownian motion (W 1 , W 2 ) and the complete finite set of monomial stochastic integrals up to a given scaling degree n, given by (I (a,b) : a ≥ 1, b ≥ 0, a + b ≤ n). The construction uses an inductive strategy: coupling first at the level of monomial stochastic integrals I (k,l) for all (k, l) ≺ (a, b) and then coupling I (a,b) while ensuring that the lower order integrals do not deviate too far from coupled agreement. Recall X (a,b) = W 1 , I (c,d) ; (c, d) ⪯ (a, b), c ≥ 1 .
We will abbreviate the complete set of monomial stochastic integrals (up to I (0,n) ) as X; the processes X̃ (a,b) and X̃ are defined in a similar manner.
The main theorem of this article states the existence of this successful coupling and estimates the rate at which it happens. In the following, we will need a simple norm on quantities such as X; we use Euclidean norm viewing X as a vector in the Euclidean space of appropriate dimension.
Theorem 10. For any pair of starting points X(0) and X(0) there exists a successful Markovian coupling construction P of X and X, with coupling time T satisfying the following rate estimate: There are positive constants C, γ such that if t ≥ 1 then Proof. As before, C, γ will denote generic positive constants whose values will change from line to line. The constant R > 1 is a tuning parameter for the coupling construction: its value will be specified later.
By a combination of reflection coupling and then synchronous coupling, we may assume that the starting points satisfy (W 1 , W 2 )(0) = ( W 1 , W 2 )(0) and W 2 (0) = RW 1 (0). We will write this as (X(0), X(0)) ∈ R where At the end of the proof we will check that the rate of coupling is not affected by the time taken to arrange for this.
The main body of the proof is based on induction on the number of ⪯-ordered monomial Stratonovich integrals to be coupled. Induction hypothesis: Define ∆X (a,b) = X (a,b) − X̃ (a,b) . For any (a, b) ∈ ∆ n , there exists a successful Markovian coupling between the arrays of monomial Stratonovich integrals X (a,b) and X̃ (a,b) , and between W and W̃ , with coupling time T (a,b) such that for all t ≥ 1
sup{P[T (a,b) > t] : |∆X (a,b) (0)| ≤ 1, (X(0), X̃(0)) ∈ R} ≤ Ct −γ
for positive constants C, γ.
Lemma 7 establishes the inductive hypothesis in the initial case of (a, b) = (1, 0), since then (W 1 , W 2 )(0) = ( W 1 , W 2 )(0) and |∆I (1,0) of (a, b). The inductive step of the proof is as follows: suppose the induction hypothesis is true for (a − , b − ); then it is required to show that the hypothesis is also true for (a, b). The key to this is to conduct a careful analysis of the cycles described informally above. By scaling arguments, it is sufficient to do this for the first cycle, and then to show how scaling arguments can be used to establish suitable convergence over the whole sequence of cycles.
If a = 0, then from the definition of X (a,b) , X (a,b) = X (a − ,b − ) (as remarked in Section 2) and there is nothing to prove. Therefore, we assume a ≥ 1.  I (a,b) ). The three phases of the first cycle have end-points given by the following stopping times.
Phase 1: At the end of this phase ∆X (a,b) (σ 1 ) = (0, ∆I (a,b) (σ 1 )). By the induction hypothesis We need a tail bound on P |∆I (a,b) (σ 1 )| > x for x ≥ 2. Using (71), x ≥ 2 and t ≥ 1, Since x ≥ 2 and |∆I (a,b) (0)| ≤ 1, the second probability satisfies Using the Itô representation of I (a,b) (Equation (31)), By the BDG inequality, while a further application of the Cauchy Schwarz inequality yields Using the Tchebychev inequality and the above two bounds together with Equation (73), if x ≥ 2 and t ≥ 1 then Combining inequalities (72) and (74), and choosing t = Phase 2: During this phase, the driving Brownian motions are synchronously coupled till (W 1 , W 2 ) hits the line y = Rx. This is done to get to the starting configuration of the coupled processes in Lemma 9. Between σ 1 and σ 2 , the two Brownian motions are coalesced and synchronously coupled and hence ∆X (a,b) (σ 2 ) = (0, ∆I (a,b) (σ 1 )).
To get a bound on the tail of the distribution of σ 2 − σ 1 , we rewrite it as follows, using (71): for t ≥ 1, and arbitrary α ∈ (0, 1). The second term above can be estimated in terms of the distance of (W 1 , W 2 ) from the line y = Rx at time σ 1 , in fact following the lines of the proof of (67): where x, α > 0 will be chosen appropriately to optimize the bounds. To estimate the first probability in (76), note that To control the second probability in (76), condition on the event [|(W 1 , W 2 )(σ 1 )−(W 1 , W 2 )(0)| ≤ x] and use the strong Markov property to argue that the hitting time on the line y = Rx by the Brownian motion ((W 1 , W 2 )(t) − (W 1 , W 2 )(σ 1 ) : t ≥ σ 1 ) is stochastically dominated by the hitting time on zero by a one-dimensional Brownian motion starting from x. Therefore, Using the above estimates in (76), we obtain Using this and (71), and choosing suitable values of x and α, we obtain γ > 0 such that
Phase 3: In this phase, Lemma 9 is used to couple (W 1 , W 2 , I (a,b) ) with ( W̃ 1 , W̃ 2 , Ĩ (a,b) ) while controlling the difference between the lower order integrals of the coupled processes. Note that at time σ 3 the array ∆X (a,b) (σ 3 ) of monomial Stratonovich integrals is obtained by appending ∆I (a,b) will be the discrepancy between the coupled sets of integrals at the end of the third phase and thus, it is necessary to control its size. We do this by controlling the size (in an appropriate sense) of each individual integral appearing in ∆X (a − ,b − ) (σ 3 ) and showing the coupling strategy of Lemma 9 does not make this size large. Fix any (k, l) ⪯ (a − , b − ).
Note that, as ∆I (k,l) (σ 2 ) = ∆I (k,l) (σ 1 ) = 0, scaling yields the following distributional equality (where the second equality simply involves rewriting the Stratonovich integral as the sum of an Itô integral and a time integral): where U has the same distribution as |∆I (a,b) (σ 1 )| (k+l+1)/(a+b+1) , and (B 1 , B 2 ) and ( B 1 , B 2 ) are two-dimensional Brownian motions starting respectively from and T R,(a,b) is the coupling time for the coupling construction of given in Lemma 9. Furthermore, U is independent of ((B 1 (t) − B 1 (0), B 2 (t) − B 2 (0)) : t ≥ 0) and Define stopping times θ j , τ j , η j , λ j , β j , j ≥ 1 in the time interval [0, T R,(a,b) ] as in the proof of Lemma 9. As the Brownian motions move together on the intervals [β j−1 , θ j ] and [λ j , β j ], the monomial Stratonovich integral ∆I (k,l) does not change on these intervals.
Together with (75) this implies that when t ≥ 1 Thus, from (71), (77) and (90), when t ≥ 1 B: Describing subsequent cycles and successful coupling After completion of the first cycle, at time σ 3 , we re-scale X (a,b) and X (a,b) according to Lemma 4 by a (random) scaling S R1 such that |S R1 ∆X (a,b) (σ 3 )| = 1 .
Define σ 4 , σ 5 , σ 6 (for the original process) corresponding to σ 1 , σ 2 , σ 3 for the coupled process after scaling exactly as before, and so on. At each stopping time σ 3k , k ≥ 1, we denote by S R k the (random) scaling that renormalizes at 1 the norm of the difference of the re-scaled processes. For any r ≥ 1, t ≥ 0, with inequalities reversed if r ≤ 1.
We achieve this by estimating the tail of the distribution of R −1 1 . Note that if R −1 1 ≤ 1, then from the above relations Thus, for x ≥ 1, if R 1/(a+b+1) > x then where the last inequality is a consequence of (89).
Finally, to show that the coupling is successful and to verify the induction hypothesis for (a, b), it is necessary to show that lim k→∞ σ 3k is almost surely finite and that its law has a power law tail. This follows by applying Lemma 6 to the sum on the right hand side of the expression with (X 2 k , τ k ) in place of (X j , τ j ) in the lemma, where X k = R 1/(a+b+1) R k , defined for k ≥ 1, and , defined for k ≥ 2, and σ 0 = 0. As the law of τ k has the same tail as that of σ 3 , it follows from (91) and (92) that if t ≥ 1 then for sufficiently large R. This establishes the induction hypothesis, and so completes the construction of a successful coupling when the starting points of the coupled Brownian motions satisfy (W 1 , W 2 )(0) = ( W̃ 1 , W̃ 2 )(0) and W 2 (0) = RW 1 (0). The argument is completed by showing how to construct the coupling from arbitrary starting points X(0) and X̃(0) satisfying |X(0)| ≤ 1, | X̃(0)| ≤ 1. To do this, define the stopping times
σ −1 = inf{t ≥ 0 : using reflection coupling, (W 1 , W 2 )(t) = ( W̃ 1 , W̃ 2 )(t)} ,
σ 0 = inf{t ≥ σ −1 : using synchronous coupling, W 2 (t) = RW 1 (t)} ,
T = inf{t ≥ σ 0 : coupling strategy constructed above, X(t) = X̃(t)} .

Conclusion
In this article, we have constructed a successful Markovian coupling for the two-dimensional Brownian motion along with a finite collection of its monomial Stratonovich integrals. In the context provided by Theorem 3, this is a further step in the direction of extending Markovian coupling techniques beyond the realm of specific examples towards a more general context. Our method shares some features with an iterative coupling scheme employed in [22] for coupling iterated Kolmogorov diffusions, though the inductive strategy described in the current paper seems to be more robust as one can build iterations within iterations into the coupling, exploiting the inductive approach described here. A natural next step for the general program of coupling hypoelliptic diffusions would be to couple diffusions driven by nilpotent vector fields which do not just depend on the driving Brownian motion but the entire diffusion. The Baker-Campbell-Hausdorff formula can be employed to show that such a coupling can be achieved if one can construct successful Markovian couplings for Brownian motion on the free Carnot group of finite order [5]. The geometry of the Carnot group seems to lend itself particularly to our inductive approach: the Lie algebra U of the Carnot group has a graded structure given by U = U 1 ⊕ U 2 · · · ⊕ U N and there are dilation operators δ t that act by multiplication by t i on the elements of U i while preserving the graded structure. A possible strategy for constructing the coupling in this case would be to use the graded structure in the induction hypothesis and to use the dilation operator to implement the scaling strategy used repeatedly in the above arguments. We will investigate this in future work. The current article also provides quantitative bounds on the distribution of the coupling time. These can be used to obtain estimates on the total variation distance between the laws of the diffusions. 
Employed in conjunction with the scaling property (Lemma 4), this would lead to gradient estimates for heat kernels and harmonic functions corresponding to the generator of the diffusion [9,10,2]. We note here that it was shown in recent work [3,2] that optimal bounds on total variation distance and good gradient estimates, especially in the case of hypoelliptic diffusions, require non-immersion couplings. However, so far, it has been possible to provide explicit constructions of these couplings only in rather special examples: (generalized) Kolmogorov diffusions [3] and Brownian motion on the Heisenberg group [2]. An important challenge is to find robust non-immersion coupling constructions applicable to a wider framework and then to compare their performance with analogous immersion or Markovian couplings.
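The passage from coupling-time bounds to total variation estimates mentioned above is the classical coupling inequality. As a sketch, writing $T$ for the coupling time of the copies $X$ and $\widetilde X$, using the convention $\|\mu - \nu\|_{TV} = \sup_A |\mu(A) - \nu(A)|$, and assuming the rate estimate takes the power-law form $\mathbb{P}[T > t] \le C t^{-\gamma}$ for $t \ge 1$ (the form appearing in the induction hypothesis):

```latex
\[
\bigl\| \mathcal{L}(X(t)) - \mathcal{L}(\widetilde X(t)) \bigr\|_{TV}
  \;\le\; \mathbb{P}\bigl[ X(t) \neq \widetilde X(t) \bigr]
  \;\le\; \mathbb{P}[T > t]
  \;\le\; C\, t^{-\gamma}, \qquad t \ge 1 .
\]
```

The first inequality holds for any coupling; the second uses the fact that the two copies agree from time $T$ onwards.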