Coupling Iterated Kolmogorov Diffusions

The Kolmogorov (1934) diffusion is the two-dimensional diffusion generated by real Brownian motion $B$ and its time integral $\int B\,\mathrm{d}t$. In this paper we construct successful co-adapted couplings for iterated Kolmogorov diffusions defined by adding iterated time integrals $\iint B\,\mathrm{d}s\,\mathrm{d}t, \ldots$ as further components to the original Kolmogorov diffusion. A Laplace-transform argument shows it is not possible successfully to couple all iterated time integrals at once; however we give an explicit construction of a successful co-adapted coupling method for $(B, \int B\,\mathrm{d}t, \iint B\,\mathrm{d}s\,\mathrm{d}t)$, and a more implicit construction of a successful co-adapted coupling method which works for finite sets of iterated time integrals.


Introduction
The Kolmogorov (1934) diffusion is the two-dimensional diffusion generated by real Brownian motion $B$ and its time integral $\int B\,\mathrm{d}t$. Analytic studies of distribution and winding rate about $(0, 0)$ have been carried out by McKean (1963). More recent workers (Lachal 1997; Khoshnevisan and Shi 1998; Groeneboom, Jongbloed, and Wellner 1999; Chen and Li 2003) have considered growth asymptotics, distribution under conditioning, and small ball probabilities. Ben Arous et al. (1995) showed that $(B, \int B\,\mathrm{d}t)$ can be successfully coupled co-adaptedly, meaning that for any two different starting points $(a_1, a_2)$ and $(b_1, b_2)$ it is possible to construct random processes $(A, \int A\,\mathrm{d}t)$ and $(B, \int B\,\mathrm{d}t)$ begun at $(a_1, a_2)$ and $(b_1, b_2)$ respectively, adapted to the same filtration and such that $A$ and $B$ are real Brownian motions with respect to this filtration, which couple successfully in the sense that $A_T = B_T$ and $a_2 + \int_0^T A\,\mathrm{d}t = b_2 + \int_0^T B\,\mathrm{d}t$ for some random but finite time $T$. The iterated Kolmogorov diffusion is obtained by adding (perhaps a finite number of) further iterated time integrals $\iint B\,\mathrm{d}s\,\mathrm{d}t, \ldots$ as components, and the object of this note is to study its coupling properties.
There are many different kinds of coupling: co-adapted or Markovian coupling, as described above; co-adapted time-changed coupling, which relaxes the filtration requirements to permit random time-changes (an example is to be found in Kendall 1994); non-adapted coupling, which lifts the filtration requirement; and finally shift-coupling, which relaxes the coupling requirement to permit coupling up to a random time-shift (Aldous and Thorisson 1993). Elementary martingale arguments show a diffusion cannot be successfully coupled if there exist non-trivial bounded functions which are parabolic (space-time harmonic) with respect to the diffusion; more generally a diffusion cannot be successfully shift-coupled if there exist non-trivial bounded functions which are harmonic. The converse statements are also true: absence of non-constant parabolic functions means there exist successful non-adapted couplings (Griffeath 1975; Goldstein 1979), and absence of non-constant harmonic functions means there exist successful shift-couplings (Aldous and Thorisson 1993).
Co-adapted couplings are generally less powerful than non-adapted couplings, but can provide significant links to mathematical notions such as curvature. For example Kendall (1986) describes a co-adapted coupling construction for Brownian motion on Cartan-Hadamard manifolds of negative curvature bounded above away from zero, and shows that there is no successful co-adapted coupling. If it could be shown in this case that successful non-adapted coupling implied existence of a successful co-adapted coupling, then one could use the link with parabolic functions to deduce that all such manifolds must support non-constant bounded parabolic functions; this question from Riemannian geometry is currently open! Furthermore, it is typically much easier to construct co-adapted couplings when they do exist; a matter of major significance when using coupling to explore convergence of a Markov chain to equilibrium (when using Markov chains as components of approximate counting algorithms as expounded in Jerrum 2003; or when implementing Coupling from the Past as in Propp and Wilson 1996). Burdzy and Kendall (2000) explore the difference between non-adapted and co-adapted couplings; see also Hayes and Vigoda (2003), who describe a non-adapted variation on an adapted coupling which provides better bounds for mixing in a particular graph algorithm.
In this paper we extend the results of Ben Arous et al. (1995) for $(B, \int B\,\mathrm{d}t)$, giving an explicit construction for a successful co-adapted coupling at the level of the twice-iterated time integral (Theorem 3.5). We also give an implicit construction for successful co-adapted couplings for higher-order iterated Kolmogorov diffusions (Theorem 5.14), and note that it is impossible successfully to couple all iterated time integrals simultaneously (Theorem 6.1).

Acknowledgements: this work was supported by the EPSRC through an earmarked studentship for CJP. We are grateful to Dr Jon Warren and to Dr Sigurd Assing for helpful discussions. Part of this work was carried out while the first author was visiting the Institute for Mathematical Sciences, National University of Singapore in 2004. The visit was supported by the Institute. Finally, we are grateful to a referee, who asked an astute question which led to a substantial strengthening of the results of this paper.

Parabolic functions and harmonic functions
We begin by sketching some general mathematical considerations. It is possible to derive information about the existence or otherwise of couplings from analytic considerations, albeit in a rather non-constructive fashion. The existence of a successful non-adapted coupling is known to be equivalent to the nonexistence of non-constant parabolic functions (Griffeath 1975; Goldstein 1979). The same is true if we replace "non-adapted coupling" and "parabolic" by "shift-coupling" and "harmonic" (Aldous and Thorisson 1993). (These papers consider the discrete-time case: the technical issue of moving to continuous time is dealt with for example in Thorisson 2000.) In general it is known that manifolds which are (for example) unimodular solvable Lie groups will not carry non-constant bounded harmonic functions (Lyons and Sullivan 1984; Kaȋmanovich 1986; Leeb 1993). The iterated Kolmogorov diffusion can be viewed as a Brownian motion on a nilpotent Lie group, so we can deduce the existence of successful shift-couplings for the iterated Kolmogorov diffusion.
We are concerned here with successful couplings rather than successful shift-couplings, corresponding to parabolic functions rather than harmonic functions. However Cranston and Wang (2000, §3, Remark 3) show that a parabolic Harnack inequality holds for left-invariant diffusions on unimodular Lie groups (and therefore successful shift-couplings exist for such diffusions if and only if successful non-adapted couplings exist). It suffices to indicate how the iterated Kolmogorov diffusion can be viewed as such a diffusion. We outline the required steps.
First observe that there is a homomorphism of the semigroup of paths $B$ under concatenation into the quotient group which identifies paths with the same time-length and the same endpoints and iterated time integrals up to order $n$. The resulting group is graded by a degree defined inductively by time-integration, and is nilpotent with this grading. It is a Lie group, since it can be coordinatized smoothly by $t$, $B_t$, and $n$ iterated integrals of the form $\int_0^t B_s\,\mathrm{d}s$, $\int_0^t \int_0^s B_u\,\mathrm{d}u\,\mathrm{d}s$, and so on; the evolution of $B$ generates the required left-invariant diffusion. Nilpotent Lie groups are unimodular (Corwin and Greenleaf 1990), so the Cranston and Wang (2000) work applies.
Thus at a rather abstract and indirect level we know it is possible to construct successful non-adapted couplings for the iterated Kolmogorov diffusion. However in the following we will show how to construct successful co-adapted couplings; while our general construction ( §5) is not completely explicit, nevertheless it is much more direct than the above, as well as possessing the useful co-adapted property.

Explicit co-adapted coupling for the twice-iterated Kolmogorov diffusion
We now describe a constructive approach to successful co-adapted coupling of Brownian motion and its first two iterated integrals: $B$, $\int B\,\mathrm{d}t$ and $\iint B\,\mathrm{d}s\,\mathrm{d}t$. In later sections we will show how to deal with higher-order iterated integrals.
We use the conventional probabilistic language of "event $A_n$ happens eventually in $n$" to mean that almost surely $A_n$ occurs for all but finitely many $n$ (in measure-theoretic terms this corresponds to the assertion that the event $\bigcup_n \bigcap_{m \ge n} A_m$ has null complement).

Case of first integral
Coupling of the first two iterated integrals is based on the Ben Arous et al. (1995) coupling construction for $(B, \int B\,\mathrm{d}t)$; we begin with a brief description of this in order to establish notation.
Figure 1: The coupling control $J$ switches between values $+1$ ("synchronous coupling") and $-1$ ("reflection coupling"). In the figure, switches to fixed periods of $J = +1$ are triggered by successive crossings of $\pm 1$ by $W$.
Co-adapted couplings are built on two co-adapted Brownian motions $A$ and $B$ begun at different locations $A(0)$ and $B(0)$: we shall suppose they are related by a stochastic integral $B = B(0) + \int J\,\mathrm{d}A$, where $J$ is a piecewise-constant $\pm 1$-valued adapted random function. The coupling is defined by specifying $J$: the difference $W = B - A$ satisfies $W = W(0) + \int (J - 1)\,\mathrm{d}A$, so that $W$ is constant on intervals where $J = 1$ (holding intervals), and evolves as Brownian motion run at rate 4 on intervals where $J = -1$ (intervals in which $W$ is run at full rate). The coupling is illustrated in Figure 1. So our coupling problem is reduced to a stochastic control problem: how should one choose adapted $J$ so as to control $W$ and $V = V(0) + \int W\,\mathrm{d}t$ to hit zero simultaneously?
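The dependence of $W = B - A$ on the control can be checked on a discretized path. The following is a minimal sketch, assuming a time grid on which $J$ is held constant over each step and the increments of $A$ are supplied directly; the function name `coupled_difference` and the sample increments are illustrative and do not come from the paper.

```python
def coupled_difference(dA, J, w0):
    """Return the path of W = B - A on a time grid.

    dA : increments of the driving Brownian motion A over each step
    J  : control values (+1 or -1), one per step, constant over the step
    w0 : initial difference W(0) = B(0) - A(0)

    Since dB = J dA, the difference satisfies dW = (J - 1) dA: it is frozen
    while J = +1 and moves by -2 dA while J = -1 (variance rate 4).
    """
    if len(dA) != len(J):
        raise ValueError("dA and J must have the same length")
    w = [w0]
    for da, j in zip(dA, J):
        if j not in (1, -1):
            raise ValueError("J must take values in {+1, -1}")
        db = j * da                  # increment of B under the control
        w.append(w[-1] + db - da)    # increment of W = B - A
    return w


# Hypothetical increments (exact binary fractions, purely for illustration).
path = coupled_difference(dA=[0.5, -0.25, 0.125, 0.5], J=[1, -1, -1, 1], w0=1.0)
print(path)
```

On steps with $J = +1$ the printed path does not move, while on steps with $J = -1$ it moves by twice the increment of $A$ with the opposite sign.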
We start by noting that the trajectory $(W, V)$ breaks up into half-cycles according to successive alternate visits to the positive and negative rays of the axis $V = 0$. (We can assume $V(0) = 0$ without loss of generality; we can manipulate $W$ and $V$ to this end using an initial phase of controls!) We adopt a control strategy as follows: if the $n$th half-cycle begins at $W = \pm a_n$ for $a_n > 0$ then we compute a level $b_n$ depending on $a_n$, with $b_n \le a_n \le \kappa b_n$ for some fixed $\kappa > 1$, and run this half-cycle of $W$ at full rate ($J = -1$) until $W$ hits $\mp b_n$ or the half-cycle ends. If $W$ hits $\mp b_n$ before the end of the half-cycle then we start a holding interval ($J = 1$) until $V$ hits zero, so concluding the half-cycle. Set $a_{n+1}$ to be the absolute value of $W$ at the end of the half-cycle. We will call the holding interval the Fall of the half-cycle and will refer to the initial component as the Brownian component or BrC. The construction is illustrated in Figure 2.
With appropriate choices for the $a_n$ and $b_n$, it can be shown that this control forces $(W, V)$ almost surely to converge to $(0, 0)$ in finite time. To see this, note the following. By the reflection principle applied to a Brownian motion $B$ begun at 0,

Simple dynamical arguments allow us to control the duration of the Fall:

which we can combine with the following (for $x_n > 0$):

We now use a Borel-Cantelli argument to deduce that for all sufficiently large $n$,

so long as
$$\sum_n \frac{a_n}{\sqrt{t_n}} < \infty,$$
(Bear in mind, we have stipulated that $b_n \le a_n \le \kappa b_n$.) Now this convergence is ensured by setting $\sqrt{t_n} = x_n = a_n n^{1+\alpha}$ for some $\alpha > 0$, in which case we obtain
$$\text{Duration of half-cycle } n \;\le\; \left(1 + \kappa \frac{x_n}{a_n}\right) t_n \;\le\; \left(1 + \kappa n^{1+\alpha}\right) a_n^2\, n^{2+2\alpha}. \tag{7}$$
If we arrange for $a_n \le \kappa/n^{2+\beta}$ then the sum of this over $n$ converges, since we can choose $\alpha < 2\beta/3$. Thus we have proved the following, which is a trivial generalization of Ben Arous et al. (1995, Theorem 2.1):

Figure 2: Illustration of two half-cycles for the case $b_n = a_n/2$, $\kappa = 2$, labelling Fall and BrC for first half-cycle.

Theorem 3.1 Suppose the evolution of $(W, \int W\,\mathrm{d}t)$ is divided into half-cycles as described above: if the $n$th half-cycle begins at $W = \pm a_n$, then it is run at full rate till $W$ hits $\mp b_n$ and then allowed to fall to the conclusion of the half-cycle. (The fall phase is omitted if the half-cycle concludes before $W$ hits $\mp b_n$.) Our control consists of choosing the $b_n$; so long as $a_n/\kappa \le b_n \le \min\{a_n, 1/n^{2+\beta}\}$ for all sufficiently large $n$ for some constants $\kappa$ and $\beta > 0$, then $(W, \int W\,\mathrm{d}t)$ converges to $(0, 0)$ in finite time.
Remark 3.2 By definition of $a_n$ we know $a_n \le b_{n-1} \le 1/(n-1)^{2+\beta}$, so it is feasible to choose $b_n$ such that $a_n/\kappa \le b_n \le \min\{a_n, 1/n^{2+\beta}\}$ for all large $n$.
Remark 3.3 Note that $a_n$ is determined by the location of $W$ at the end of half-cycle $n - 1$.

Remark 3.4
We can assume the initial conditions $W_0 = 1$, $V_0 = 0$ (otherwise we can run the diffusion at full rate till $V$ hits zero, as can be shown to happen almost surely, then re-scale accordingly). It then suffices to set $b_n = \min\{a_n, 1/(n+1)^{2+\beta}\}$. However this is not the only option; for example Ben Arous et al. (1995) use $b_n = a_n/2$. Note, in either case we find $a_{n+1} \le b_n \le a_n \le \kappa b_n$ for $\kappa = 2$.
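The summability requirement behind Theorem 3.1 can be explored numerically. The sketch below assumes that the half-cycle duration bound reads $(1 + \kappa n^{1+\alpha})\, a_n^2\, n^{2+2\alpha}$ and takes $a_n$ at its largest permitted value $\kappa/n^{2+\beta}$; the function names and the default parameter values are illustrative choices, not taken from the paper.

```python
def half_cycle_bound(n, kappa=2.0, alpha=0.1, beta=0.5):
    """Assumed upper bound on the duration of half-cycle n, with a_n set to
    its largest permitted value kappa / n**(2 + beta)."""
    a_n = kappa / n ** (2 + beta)
    return (1 + kappa * n ** (1 + alpha)) * a_n ** 2 * n ** (2 + 2 * alpha)


def decay_exponent(alpha, beta):
    """Leading power of n in the bound; the series converges when this is
    below -1, which is the same condition as alpha < 2 * beta / 3."""
    return 3 * alpha - 2 * beta - 1


def partial_sum(n_max, **params):
    """Sum of the assumed bounds over half-cycles 1, ..., n_max."""
    return sum(half_cycle_bound(n, **params) for n in range(1, n_max + 1))


print(decay_exponent(0.1, 0.5))
print(partial_sum(5000), partial_sum(10000), partial_sum(20000))
```

With these illustrative parameters the successive partial sums change by shrinking amounts, in line with the requirement $\alpha < 2\beta/3$.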

Controlling two iterated integrals
Inspection of the above control strategy reveals some flexibility which was not exploited by Ben Arous et al. (1995); in the $n$th half-cycle there is a time $T_n$ at which $W$ first hits 0, and we may then hold $W = 0$ constant (by setting $J = 1$) and so delay for a time $C_n$, without altering either $W$ or $V = \int W\,\mathrm{d}t$. We may choose $C_n$ as we wish without jeopardizing convergence of $(W, V)$ to $(0, 0)$. This flexibility allows us to consider controlling $U = \iint W\,\mathrm{d}s\,\mathrm{d}t$ as follows: we hold at $T_n$ for a duration long enough to force $U$, $V$ to have the same sign:

The event $[V(T_n) = 0]$ turns out to be of probability zero, since it can only happen if $T_n$ occurs at the very start of the half-cycle, which in turn happens only if $W$ hits zero exactly at the end of the previous half-cycle; that this is a null event will be a weak consequence of the lower bound at Inequality (16) below. The construction is illustrated in Figure 3. Suppose $a_n \le \kappa/n^{2+\beta}$ as in the previous subsection. If we can show that $\sum_n C_n < \infty$ then this strategy results in $(W, V)$ tending to $(0, 0)$ in finite time $\zeta$, with $U$ hitting zero in infinitely many half-cycles accumulating at $\zeta$ and therefore also converging to 0 at $\zeta$.

Figure 3: Two consecutive half-cycles for the case $b_n = a_n/2$, together with a graph of $U$ against time. The disks signify time points at which there is an option to hold the diffusion to allow $U$ to change sign if required.

To fix notation, let us suppose $W$ is positive at the start of the half-cycle in question. This ensures $V > 0$ at time $T_n$. So the issue at hand is to control

Consider $-U$ at time $T_n$. At the start of the half-cycle we know $V = 0$ and $W = a_n$, so subsequent contributions make $-U$ more negative and need not detain us. At the start of the previous half-cycle $U$ will be non-negative. Consequently an upper bound for $-U$ at time $T_n$ is given by

thus (given the work of §3.1) we may suppose that, eventually in $n$, at time $T_n$ the quantity $-U$ is bounded above by

Now apply a Borel-Cantelli argument and the reflection principle to show that eventually in $n$ the Brownian component takes time at least $a_n^2/(4n^{2+2\alpha})$ in travelling from $a_n$ to $a_n/2$. We deduce almost surely for all sufficiently large $n$ at time $T_n$ it must be the case that $V = \int W\,\mathrm{d}t$ exceeds
$$(a_n/2) \times (\text{time to move from } a_n \text{ to } a_n/2) \;\ge\; \frac{a_n^3}{8 n^{2+2\alpha}}.$$
Thus $C_n$ is bounded above, eventually in $n$, by

This leads to the crux of the argument; we need an eventual upper bound on the ratio $a_{n-1}/a_n$. First note that the lower bound of Inequality (10), applied to the $(n-1)$st cycle, shows that eventually in $n$

So it suffices to obtain a suitable lower bound on $a_n$, the value of $W$ at the end of half-cycle $n - 1$, in terms of $-\int_{T_{n-1}} W\,\mathrm{d}t$ and holding eventually in $n$. Moreover we may ignore the Fall component of half-cycle $n - 1$ so long as the lower bound is smaller than $b_{n-1}$, and treat $W$ over the whole of this half-cycle as a Brownian motion of rate 4. We now introduce a discontinuous time-change based on the continuous (but

for a new standard Brownian motion $B$. The time-changed process $Z$ is illustrated in Figure 4. Note that $Z$ must be non-negative.
A nonlinear transformation of scale $\widetilde{Z} = Z^{3/2}/3$ produces a Bessel($\tfrac{4}{3}$) process $\widetilde{Z}$ in time intervals throughout which $Z$ (equivalently $\widetilde{Z}$) is positive: by Itô's formula the stochastic differential equation

holds in intervals for which $Z > 0$.

Figure 4: The effect of the time-change is to delete the loops extending into $W > 0$, and to continue deletion till $V = \int W\,\mathrm{d}t$ re-attains its minimum, thus generating discontinuities (some of which are indicated in the figure by small dots on the $V$-axis) in the time-changed process $Z$, which follows the red/dark trajectory.

Now observe that the zero-set $\{u : Z(u) = 0\}$ is almost surely a null-set. For certainly the Brownian zero-set $\{t : W(t) = 0\}$ is almost surely null, and $\{V(t) : W(t) = 0\}$ is then almost surely null by Sard's lemma, since $V$ is a $C^1$ function with derivative $W$ (indeed this is an easy exercise in this simple context).
Because {u : Z(u) = 0} is almost surely a null-set, it follows that the stochastic integral (using Equation (13) to construct B when Z > 0) defines a Brownian motion. Furthermore we can apply limiting arguments to write where H is the non-decreasing pure jump process which is constant away from Z = 0.
Using the theory of the Skorokhod construction (El Karoui and Chaleyat-Maurel 1978), we now derive the comparison

where $L$ is local time of $B$ at 0, so that $B + L$ is reflected Brownian motion. ($H$ is discontinuous, but the argument of the Skorokhod construction applies so long as $H$ is non-decreasing.) Consequently

and so we can deduce from a Borel-Cantelli argument and the lower bound of Inequality (12) that the following holds eventually in $n$:

for $\alpha > 0$. Thus eventually in $n$
$$C_n \;\le\; 32 \left(1 + (n-1)^{1+\alpha}\right)^2 (n-1)^{9+9\alpha}\, n^{2+2\alpha} \times a_{n-1}^2$$
and so we can arrange for $\sum_n C_n < \infty$ so long as we choose $a_n \le 1/n^{7+\beta}$ for $\beta > 0$.
We state this as a theorem:

Theorem 3.5 The modified strategy described at the head of this subsection (hold at each $T_n$ till $\iint W\,\mathrm{d}s\,\mathrm{d}t$ is zero or has the same sign as $\int W\,\mathrm{d}t$) produces convergence of $(W, \int W\,\mathrm{d}t, \iint W\,\mathrm{d}s\,\mathrm{d}t)$ to $(0, 0, 0)$ in finite time so long as the conditions of Theorem 3.1 are augmented by the following: half-cycle $n$ begins at $W = \pm a_n$ for $a_n \le 1/n^{7+\beta}$ for $\beta > 0$ and for all sufficiently large $n$. (This can be arranged by choosing $b_n$ in the range $a_n/\kappa \le b_n \le \min\{a_n, 1/n^{7+\beta}\}$ for constants $\kappa$ and $\beta > 0$.)

Remark 3.6
The choice $b_n = a_n/2$ of Ben Arous et al. (1995) will suffice.
Remark 3.7 It is possible to obtain a modest gain on the above by recognizing that the hitting time of Brownian motion on $-1$ has the same distribution as the inverse square of a standard normal random variable. This argument permits replacement of Equation (10) by

Remark 3.8 The elementary comparison approach above can of course be replaced by arguments employing the exact computations of McKean (1963).

Remark 3.9 The method described here (control coupling of higher-order iterated integrals by judicious waits at $W = 0$) appears to deliver effective control of just one higher-order iterated integral in addition to $W$, $V = \int W\,\mathrm{d}t$. Attempts to control more than one higher-order iterated integral seem to lead to problems of propagation of over-correction from one half-cycle to the next. We therefore turn to a rather different, less explicit, approach in the remainder of the paper.

Reduction to non-iterated time integrals
Before considering the problem of coupling more than two iterated time integrals, we first reformulate the coupling problem in terms of integrals of the form $\int \frac{t^m}{m!} B(t)\,\mathrm{d}t$ rather than the less amenable iterated time integrals above. We begin with some notation. Suppose $W$ is defined as the difference between two co-adapted coupled Brownian motions, as in §3.1. Then we set $W = W^{(0)} = B - A$ and define the first $N$ iterated time integrals inductively by
$$W^{(n)}(t) = W^{(n)}(0) + \int_0^t W^{(n-1)}(s)\,\mathrm{d}s, \qquad n = 1, \ldots, N.$$
(Note that we have allowed for arbitrary initial conditions $W^{(n)}(0)$.) If the initial conditions all vanish, then we find by exchange of integrals that
$$W^{(n)}(t) = \int_0^t \frac{(t-s)^{n-1}}{(n-1)!}\, W(s)\,\mathrm{d}s.$$
Binomial expansion leads to the following:

Lemma 4.1 Suppose $W^{(1)}(0) = \ldots = W^{(N)}(0) = 0$, and $J$ is a given adapted control, and $\zeta$ is a given stopping time. Then

If the iterated time integrals have non-zero initial values then we can reduce to the case of zero initial values by supposing $W$ is deterministically extended backwards in time to time $-1$, with corresponding generalization of Equation (17). By a simple argument using orthogonal Legendre polynomials $P_n$ on $[-1, 1]$, we can choose $W|_{[-1,0]}$ to produce $W^{(0)}(-1) = W^{(1)}(-1) = W^{(2)}(-1) = \ldots = W^{(N)}(-1) = 0$, and $W^{(0)}(0) = a_0$, $W^{(1)}(0) = a_1$, $W^{(2)}(0) = a_2$, $\ldots$, $W^{(N)}(0) = a_N$ as required. For if

Expanding the Legendre polynomials and adapting the argument leading to Lemma 4.1, we can find $a_1, \ldots, a_N$ in terms of $b_0, b_1, \ldots, b_{N-1}$ by solving a triangular linear system of equations. Finally, $b_N$ and $b_{N+1}$ may be fixed by the requirement that $W(0) = a_0 = \sum_{n=0}^{N+1} b_n P_n(1)$ and $0 = W(-1) = \sum_{n=0}^{N+1} b_n P_n(0)$ (note that Legendre polynomials do not vanish at their end-points, and are odd or even functions according to whether their order is odd or even!).
This allows us to use Lemma 4.1 to deduce the required reduction:

Lemma 4.2 It is possible to use adapted controls $J$ to ensure

(Here the constants $b_n$ depend on the initial conditions $W^{(n)}(0)$ for the iterated integrals.)

Coupling finitely many iterated integrals
To motivate the control strategy required to couple more than two iterated integrals, we consider a discrete analogue to our problem which is in fact a limiting case. Suppose we choose only to switch between $J = \pm 1$ at instants when $W$ switches over between two constant levels (as illustrated in Figure 1). To aid explanation we temporarily entertain the fiction that the Brownian motions conspire to produce instantaneous switching as soon as $J$ is switched to $-1$. Then it is a matter of simple integration to compute the effect on integrals of the form $\int \frac{t^m}{m!}\,\mathrm{d}W$: if we hold $J = +1$ at successive levels $\pm 1$, beginning at $+1$, making switches from $\pm 1$ to $\mp 1$ at times $0 = T_0 < T_1 < \ldots < T_r$, then (under the instantaneous switching approximation)

In §5.1 below we show that particular patterns of switching times produce zero effect on integrals up to a fixed order, at least for the discrete analogue. This permits us to eliminate a whole finite sequence of the integrals. (Of course in practice, because switching is not instantaneous, the use of such patterns creates further contributions to the integrals which then must be dealt with in turn!) It is algebraically convenient to formulate the required patterns using a sequence $S_0, S_1, \ldots, S_{2^N-1}$ of values of $\pm 1$ defined recursively in a manner reminiscent of the theory of experimental design. We set

Here is the pattern formed by the first sixteen $S_n$ values:

+ - - + - + + - - + + - + - - +
We will be considering perturbations and re-scalings of the deterministic control which applies control J = −1 throughout the time interval [m, m + 1) till a switch has occurred to level S m , and then applies J = 1 for the remainder of the time interval. The discrete analogue can be viewed as a limiting case under homogeneous (not Brownian!) scaling of space and time. See Figure 5 for an illustration.
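The sixteen signs displayed above can be regenerated mechanically. The following is a minimal sketch, assuming the recursion is the doubling rule in which the next $2^n$ entries are the negations of the first $2^n$ (an assumption on our part, chosen because it reproduces the displayed pattern); the function names `sign_sequence` and `as_pattern` are illustrative.

```python
def sign_sequence(N):
    """Return S_0, ..., S_{2^N - 1} as a list of +1 / -1.

    Assumed recursion (doubling rule): start from S_0 = +1 and, given the
    first 2^n entries, append their negations to obtain the next 2^n.
    """
    signs = [1]
    for _ in range(N):
        signs = signs + [-s for s in signs]
    return signs


def as_pattern(signs):
    """Render a list of signs as a string of '+' and '-' separated by spaces."""
    return " ".join("+" if s > 0 else "-" for s in signs)


print(as_pattern(sign_sequence(4)))
# prints: + - - + - + + - - + + - + - - +
```

Under this rule the deterministic control for order $N$ switches between the levels $\pm 1$ according to `sign_sequence(N)`, one entry per unit time interval.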

Algebraic properties of the sign sequence
We now prove some simple properties of the sign sequence $S_0, S_1, \ldots, S_{2^N-1}$.

Proof:
Since $b(0) = 0$ this holds for $S_0 = 1$. The recursive definition (19) shows that if the lemma holds for the first $2^n$ entries in the sequence of $S_n$ then it will also hold for the next $2^n$ entries. The result follows by induction.
The proofs of the next three corollaries are immediate from the recursive definition of the S m 's.

Corollary 5.3 $S_{2m+1} = -S_{2m}$
Notice the analogue of Corollary 5.3 does not hold between $S_{2m+2}$ and $S_{2m+1}$!

Corollary 5.4

These results imply the vanishing of certain sums of low-order powers:

Proof: Use induction on the level $k$. If $k = 0$ then Equation (20)

Suppose Equation (20) holds for all levels below level $k$ and suppose $k < N$. Using the recursive construction (19),

using the binomial expansion and cancelling the $(m+1)^k$ terms. Now we can apply the inductive hypothesis to dispose of terms involving $(m+1)^{k-u}$ for $u > 1$:

Thus Equation (20) follows by applying the inductive hypothesis to the right-hand side.
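The vanishing of low-order power sums can be confirmed in exact integer arithmetic. The sketch below assumes that $S_m = (-1)^{b(m)}$ with $b(m)$ the number of ones in the binary expansion of $m$, and that Equation (20) asserts $\sum_{m=0}^{2^N-1} S_m (m+1)^k = 0$ for $k < N$; both readings are our assumptions, and the names `sign` and `power_sum` are illustrative.

```python
def sign(m):
    """S_m = (-1)^(number of ones in the binary expansion of m) (assumed form)."""
    return -1 if bin(m).count("1") % 2 else 1


def power_sum(N, k):
    """Sum of S_m * (m + 1)**k over m = 0, ..., 2^N - 1, in exact integers."""
    return sum(sign(m) * (m + 1) ** k for m in range(2 ** N))


N = 5
# Orders k = 0, ..., N - 1 should all give zero; order k = N should not.
print([power_sum(N, k) for k in range(N + 1)])

# Corollary 5.3: consecutive entries starting at an even index have opposite signs.
print(all(sign(2 * m + 1) == -sign(2 * m) for m in range(2 ** (N - 1))))
```

Working with Python integers avoids any rounding, so a zero here is an exact cancellation rather than a numerical coincidence.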
Remark 5.6 An alternative approach uses generating functions, applying the recursive construction of Equation (19)

We can now compute the discrete analogue of

Proof:
Equation (21) follows from Equation (20) by binomial expansion: for example we use $m = (m + 1) - 1$ to deduce

Note that Equation (21) for $S_0, S_1, \ldots, S_{2^N-1}$ is equivalent to Equation (18) with appropriate definitions of the switching times $T_0, T_1, \ldots, T_r$. We now introduce notation for these deterministic times, as they will be basic to our coupling construction.
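The effect of the sign pattern on the weighted time integrals can be checked exactly under the instantaneous switching approximation. The sketch below assumes $W$ is the piecewise-constant path equal to $S_m$ on $[m, m+1)$ and evaluates $\int_0^{2^N} \frac{t^k}{k!} W(t)\,\mathrm{d}t$ with rational arithmetic; the closed form $S_m = (-1)^{b(m)}$ and the name `discrete_integral` are our assumptions for illustration.

```python
from fractions import Fraction
from math import factorial


def sign(m):
    """S_m = (-1)^(number of ones in the binary expansion of m) (assumed form)."""
    return -1 if bin(m).count("1") % 2 else 1


def discrete_integral(N, k):
    """Exact integral of t**k / k! * W(t) over [0, 2^N], where W is the
    piecewise-constant path equal to S_m on [m, m + 1)."""
    total = Fraction(0)
    for m in range(2 ** N):
        # Integral of t**k / k! over [m, m + 1].
        piece = Fraction((m + 1) ** (k + 1) - m ** (k + 1), factorial(k + 1))
        total += sign(m) * piece
    return total


N = 4
print([discrete_integral(N, k) for k in range(N + 1)])
```

For $k < N$ each value is exactly zero, which is the sense in which the pattern has no effect on integrals up to a fixed order; the first non-vanishing order is $k = N$.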

Application to the coupling problem
We can now summarize our control strategy for successful coupling. By global analysis (specifically, the inverse function theorem), as long as initial conditions for integrals of order up to $\int \frac{t^{N-1}}{(N-1)!} B\,\mathrm{d}t$ are sufficiently small we can obtain a perturbation $T^0_0 = 2^N < T^0_1 < \ldots < T^0_{r(N)}$ of the switching times $T_1, \ldots, T_{r(N)}$ (Definition 5.8) which will dispose of these initial conditions by time $2^N$. (As will become apparent, scaling arguments can be deployed to deal with larger initial conditions.) Since the coupled Brownian motions cannot actually produce instantaneous transitions between $\pm 1$, the switching activity will have introduced further non-zero contributions to the integrals by time $2^N$. So long as these contributions are in turn sufficiently small, we can dispose of them in turn by administering a new control based on switching times $T^1_0 = 2^N < T^1_1 < \ldots < T^1_{r(N)}$ which form a small perturbation of the scaled control obtained from $T_1, \ldots, T_{r(N)}$ by re-scaling both space and time homogeneously by a factor $1/2$ (not by Brownian scaling!) and shifting forwards in time by $2^N$. Thus switching now occurs between levels $\pm 1/2$. A key reason for the success of the coupling is that in terms of Brownian scaling there is now twice as much effective time in which to carry out each switch! This means that the probability of all switches completing within their assigned times will increase rapidly to 1.
We can now continue this procedure, disposing of further non-zero contributions by appending further controls using perturbations based on smaller delays and smaller levels. A Borel-Cantelli argument shows there is a positive lower bound on the probability of this infinite sequence completing before a finite time, and moreover of the size of the integrals decreasing to zero.
If this fails (because at some stage no small enough perturbation is available, or because a switch fails to complete before its successor is due) then we simply restart the procedure, re-scaling time to ensure existence of the perturbation required initially. Continuing in this manner allows us to deduce that almost surely coupling is eventually successful.
All depends on analyzing the behaviour of perturbations of deterministic controls of the form of Definition 5.8. Consider the map whose coordinates correspond to analogues of time integrals of order less than $N$:

so $F_m(t_0; t, u)$ describes the contribution to $\int \frac{t^m}{m!} W\,\mathrm{d}t$ made by instantaneous switching between levels $\pm 1$ happening at times $t_1, \ldots, t_N, u_{N+1}, \ldots, u_{r(N)}$, starting at level 1 at time $t_0$. (Recalling Corollary 5.9,

We compute the Jacobian for $F(t_0; t, u)$ with respect to the arguments $t_1, \ldots, t_N$:

This is proportional to a Vandermonde determinant and in fact evaluates to

which is non-zero so long as the $t_i$'s are distinct. This and the inverse function theorem allow us to assert the following fact:

Lemma 5.10 The polynomial (hence smooth) map

is invertible in a neighbourhood of the initial sequence of switching times $T_0, T_1, \ldots, T_N$ corresponding to switching between levels $S_m, \ldots$ at times $m, \ldots$. In particular there is $\kappa > 0$ and $\varepsilon' > 0$ such that for all $\varepsilon < \varepsilon'$, if $|W^{(m+1)}(0)| < \varepsilon$ for $m = 0, 1, \ldots, N - 1$, then there is a $\kappa\varepsilon$-perturbation $(t_1, \ldots, t_N)$ of $(T_1, \ldots, T_N)$ with $0 < t_1 < \ldots < t_N < T_{N+1}$ (hence generating a valid switching strategy) which is such that

for $m = 0, 1, \ldots, N - 1$.
Note further that from Equation (22) and the binomial theorem we have a translation symmetry:

while Equation (22) directly yields a scaling property:

We need just one more lemma, concerning the behaviour of Brownian motion, before we can state and prove the main coupling result for this section.

Proof:
This follows easily from the reflection principle and elementary Gaussian integral estimates:

Theorem 5.14 There is a successful co-adapted coupling for Brownian motion and its first $N$ iterated time integrals.

Proof:
By the work of §4 this is reduced to the problem of finding an adapted control $J = \pm 1$ which delivers $W$ such that at a particular stopping time $\zeta$ we have $W(\zeta) = W^{(0)}(\zeta) = 0$ and $W^{(m+1)}(0) + \int_0^\zeta \frac{t^m}{m!} W(t)\,\mathrm{d}t = 0$ for $m = 0, 1, \ldots, N - 1$. Without loss of generality we assume $W(0) = 2$ (for otherwise we can run the control $J = -1$ till this occurs!).
Using Lemma 5.10 and Lemma 5.12, for fixed $\varepsilon > 0$ we can choose $C$ large enough to solve for $(t^0_1, \ldots, t^0_N)$ in

for $m = 0, \ldots, N - 1$: carry out this switching strategy over the time period $[0, 2^N C)$ to eliminate the initial conditions. We now apply the following algorithm iteratively starting at step $k = 1$, and continuing, to reduce the further contributions to the integrals made during previous switching strategies.
(taking into account that previous steps will have eliminated for $m = 0, 1, \ldots, N - 1$. in terms of bounds on $|W|$, and (b) the probability that a switch begun at $T^k_{r-1}$ fails to complete by the time the next switch $T^k_r$ is due to start. Set $Z_\alpha^{-1} = \sup_{k=0,1,\ldots} (1 + k)^{1+\alpha}\, 2^{-k/2}$, for some fixed $\alpha > 1$, and recall from Lemma 5.10 that $\varepsilon > 0$ is the bound on initial conditions required if $\kappa\varepsilon$-perturbation switching controls are guaranteed to exist. Let $D_k$ be the event that both for $m = 0, \ldots, N - 1$, and also So long as $D_0, D_1, \ldots, D_{k-1}$ have been satisfied, we know $W(T^k_0) = 2^{1-k}$, and so (bearing in mind the effects of the switching strategy) $D_k$ holds only if $W(T^k_s) = 2^{-k}$ for $s = 1, \ldots, r(N)$; moreover, the condition ensures we will be able to determine the solution $(t^k_1, \ldots, t^k_N)$ in Equation (25). Now the event $D_k$ is contained in the union of $R(n)$ events of the form and $W$ makes a down-crossing from , and the down-crossing may be replaced by a down-crossing from $2^{-k}$ to $-2^{-k}$, or an up-crossing from $-2^{-k}$ to $2^{-k}$ (but this does not decrease the probability of the event concerned!).
By Brownian scaling any one such event has probability bounded above by the probability of the following event: and $W$ makes a down-crossing from

Consequently by Lemma 5.13 we deduce that $\sum_k P[D_k] < \infty$, and moreover we can use the Markov property and the density of Brownian paths to deduce there is a positive chance $p > 0$ that $\bigcap_k F_k$ occurs. If this happens then coupling succeeds at time $\lim_{k\to\infty} T^k_0$: otherwise we can start the strategy anew. We can therefore assert that almost surely success will occur eventually.

Impossibility of coupling all iterated integrals
Is it possible to arrange successful coupling for all iterated integrals at a single stopping time ζ using some adapted control J?
Summation of the coupling statements produces a statement about Laplace transforms of the path, which allows us to demonstrate that coupling of all iterated integrals is possible only in trivial cases.
Theorem 6.1 Suppose that the initial conditions for the iterated stochastic integrals are feasible, in the sense that they could have been produced by integration of a continuous path starting at some previous time (without loss of generality, time −1). Consider an adapted control J producing coupling for all iterated integrals at a stopping time ζ. This can be produced only if W = B − A is actually identically zero over [0, ζ].

Proof:
Suppose the Brownian paths and all iterated integrals couple at $\zeta$, so $W^{(n)}(\zeta) = 0$ for all $n$. We show that in this case $W \equiv 0$ must hold over the interval $[0, \zeta]$.
By hypothesis, we may convert into statements about integrals over $[-1, \zeta]$ (with a suitable extension of $W$) using powers of $t$. We can write

Remark 6.2 This argument is essentially non-stochastic, based only on the continuity of the path which is the difference of the two coupled processes, and so holds for any coupling, whether co-adapted or not.

Remark 6.3
More generally, this argument extends immediately to cover for example the case when the sequence of initial conditions $W^{(m)}(0)$ is $L^2$ summable (use an $L^2$ path over $[-1, 0]$!).

Conclusion
We conclude by noting that the successful coupling strategies of §3 and §5 are both in essence very simple, involving switching between synchronous ($J = 1$) and mirror ($J = -1$) coupling. It would be interesting to construct a successful coupling strategy which optimized, for example, a specific exponential moment of the coupling time; one expects there would be a whole family of such couplings parametrized by the coefficient in the exponential moment, and that the coupling strategies themselves would have some kind of geometric flavour. The results of this paper can be viewed as introducing a new notion to coupling theory: that of an "exotic coupling", a co-adapted coupling for a diffusion (in this case real Brownian motion) which successfully couples not only the diffusion itself but also a number of path functionals of the diffusion. It is striking that exotic coupling is feasible at all; the method of proof for the general case (§5) is very suggestive for how to address more general situations. Ben Arous et al. (1995) also showed the existence of an exotic coupling for planar Brownian motion using the path functional given by the Lévy stochastic area, and it would be interesting to see how far the Ben Arous et al. (1995) result could be extended to higher dimensional Brownian motion; this would be a useful next step towards the natural bold conjecture which we now present:

Conjecture 7.1 Hypoelliptic diffusions with smooth coefficients can be coupled co-adaptively with positive chance of success from any two starting points.
It would of course be of great interest to obtain specific applications of these couplings, perhaps for example in Coupling from the Past constructions.
Finally we remark that Price (1996) gives some results concerning exotic coupling using single functionals of the form $\int f(t) B\,\mathrm{d}t$.