Brownian forgery of statistical dependences

The balance held by Brownian motion between temporal regularity and randomness is embodied in a remarkable way by Lévy's forgery of continuous functions. Here we describe how this property can be extended to forge arbitrary dependences between two statistical systems, and then establish a new Brownian independence test based on fluctuating random paths. We also argue that this result allows revisiting the theory of Brownian covariance from a physical perspective and opens the possibility of engineering nonlinear correlation measures from more general functional integrals.


I. INTRODUCTION AND OVERVIEW
The modern theory of Brownian motion provides an exceptionally successful example of how physical models can have far-reaching consequences beyond their initial field of development. Since its introduction as a model of particle diffusion, Brownian motion has indeed enabled the description of a variety of phenomena in cell biology, neuroscience, engineering, and finance [1]. Its mathematical formulation, based on the Wiener measure, also represents a fundamental prototype of continuous-time stochastic process and serves as a powerful tool in probability and statistics [2,3]. Following a similar vein, we develop in this note a new way of applying Brownian motion to the characterization of statistical independence.
Our connection between Brownian motion and independence is motivated by recent developments in statistics, more specifically the unexpected coincidence of two different-looking dependence measures: distance covariance, which characterizes independence fully thanks to its built-in sensitivity to all possible relationships between two random variables [4], and Brownian covariance, a version of covariance that involves nonlinearities randomly generated by a Brownian process [5]. Their equivalence provides a realization of the aforementioned connection, albeit in a somewhat indirect way that conceals its naturalness. Our goal is to make explicit how Brownian motion can reveal statistical independence by relying directly on the geometry of its sample paths.
The brute force method to establish the dependence or independence of two real-valued random variables X and Y consists in examining all potential relations between them. More formally, it is sufficient to measure the covariances cov[f(X), g(Y)] associated with transformations f, g that are bounded and continuous (see, e.g., Theorem 10.1 in Ref. [6]). The question pursued here is whether using sample paths of Brownian motion in place of bounded continuous functions also allows one to characterize independence, and we shall demonstrate that the answer is yes. In a nutshell, the statistical fluctuations of Brownian paths B, B′ enable the stochastic covariance index cov[B(X), B′(Y)] to probe arbitrary dependences between the random variables X and Y.
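As a toy illustration (not from the paper), a pair of variables can be uncorrelated yet dependent, and the dependence surfaces once nonlinear bounded continuous transformations are allowed; the choices f(x) = tanh(x²) and g(y) = tanh(y) below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x**2  # fully determined by x, yet linearly uncorrelated with it

def cov(u, v):
    """Sample covariance of two equal-length arrays."""
    return np.mean(u * v) - np.mean(u) * np.mean(v)

# Plain covariance misses the relation (E[X^3] = 0 for Gaussian X)...
linear = cov(x, y)
# ...but a bounded continuous pair f, g reveals it.
nonlinear = cov(np.tanh(x**2), np.tanh(y))

print(linear, nonlinear)  # linear ~ 0, nonlinear clearly positive
```

This is exactly the exhaustive-search principle quoted above, restricted to a single hand-picked pair (f, g); the point of the paper is that generic Brownian paths can play the role of f and g automatically.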
Our strategy to realize this idea consists in establishing that, given any pair f, g of bounded continuous functions and any level of accuracy, the covariance cov[f(X), g(Y)] can be approximated generically by cov[B(X), B′(Y)]. Crucially, the notion of genericity used here refers to the fact that the probability of picking paths B, B′ fulfilling this approximation is nonzero, which ensures that an appropriate selection of stochastic covariance can be achieved by finite sampling of Brownian motion. This core result of the paper will be referred to as the forgery of statistical dependences, in analogy with Lévy's classical forgery theorem [2].
Actually Lévy's remarkable theorem, which states that any continuous function can be approximated on a finite interval by generic Brownian paths, provides an obvious starting point for our analysis. Indeed, it stands to reason that if the paths B and B′ approach the functions f and g, respectively, then cov[B(X), B′(Y)] should approach cov[f(X), g(Y)] as well. A technical difficulty, however, lies with the restriction to finite intervals, since the random variables X and Y may be unbounded. Although it turns out that intervals cannot be extended as such without ruining genericity, we shall first describe a suitable extension of Lévy's forgery that holds on infinite domains. Our forgery of statistical dependences will then follow.
From a practical standpoint, using Brownian motion to establish independence turns out to be advantageous. Indeed, exploring all bounded continuous transformations exhaustively is realistically impossible. (This practical difficulty motivates the use of reproducing kernel Hilbert spaces, see, e.g., Ref. [7] for a review.) Generating all possible realizations of Brownian motion obviously poses the same problem, but this unwieldy task can be bypassed by averaging directly over sample paths. In this way, and quite amazingly, the measurement of an uncountable infinity of covariance indices can be replaced by a single functional integral. We shall discuss how this idea leads back to the concept of Brownian covariance and how the forgery of statistical dependences clarifies the way it does characterize independence, without reference to the equivalence with distance covariance. Brownian covariance represents a very promising tool for modern data analysis [8,9] but appears to be still scarcely used in applications (with seminal exceptions for nonlinear time series [10] or brain connectomics [11]). Our approach based on random paths is both physically grounded and mathematically rigorous, so we believe that it may help further disseminate this method and establish it as a standard tool of statistics.

II. MAIN RESULTS
Here we motivate and describe our main results, with sufficient precision to provide a self-contained presentation of the ideas introduced above while avoiding technical details. Their mathematical proofs are then developed in the dedicated Section III. We also use here assumptions that are slightly stronger than is necessary, and some generalizations are relegated to Appendix A.

A. Extension of Levy's forgery
Imagine recording the movement of a free Brownian particle in a very large number of trials. In essence, Lévy's forgery ensures that one of these traces will follow closely a predefined test trajectory, at least for some time. To formulate this more precisely, let us focus for definiteness on standard Brownian motion B, whose initial value is set to B(0) = 0 and whose variance at time t is normalized to ⟨B(t)²⟩ = |t|. We fix a real-valued continuous function f with f(0) = 0 (the test trajectory) and consider the uniform approximation event

U_{f,δ,T} = { |B(t) − f(t)| < δ for all |t| ≤ T } , (II.1)

whereby a Brownian path B fits f tightly up to a constant distance δ > 0 on the time interval [−T, T]. Lévy's forgery theorem states that this event is generic, i.e., it occurs with nonzero probability P(U_{f,δ,T}) > 0 (see Chapter 1, Theorem 38 in Ref. [2]). This result requires both the randomness and the continuity of Brownian motion; neither deterministic processes nor white noises satisfy this property. In all trials though, the particle will eventually drift away to infinity and thus deviate from any bounded test trajectory. Indeed, let us further assume that the function f is bounded and examine what happens when T → ∞. If the limit event U_{f,δ,∞} = ∩_{T>0} U_{f,δ,T} occurs, the path B must be bounded too since the particle is forever trapped in a finite-size neighborhood of the test trajectory. However Brownian motion is almost surely (a.s.) unbounded at long times [2], so that P(U_{f,δ,∞}) = 0. Hence Lévy's forgery theorem does not extend to infinite time domains.
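The genericity of the uniform approximation event is easy to probe numerically. Below is a minimal Monte Carlo sketch (the test trajectory, tolerance δ, horizon T, and grid are our own illustrative choices, restricted to positive times for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 200, 4000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

# Standard Brownian paths with B(0) = 0 and variance <B(t)^2> = t.
increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

f = 0.5 * t      # test trajectory with f(0) = 0
delta = 1.0      # uniform tolerance

# Uniform approximation event: sup over [0, T] of |B - f| stays below delta.
hits = np.all(np.abs(B - f) < delta, axis=1)
p = hits.mean()
print(p)  # a sizable nonzero fraction: the event is generic
```

Tightening δ or lengthening T makes the fraction smaller, but Lévy's theorem guarantees it never vanishes for continuous f with f(0) = 0.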
To accommodate this asymptotic behavior, we should thus allow the particle to diverge from the test trajectory, at least in a controlled way. Let us recall that the escape to infinity is a.s. slower for Brownian motion than for any movement at constant velocity (which is one way to state the law of large numbers [2]). This suggests adjoining to event (II.1) the loose approximation event

E_{f,v,T} = { |B(t) − f(t)| < v|t| for all |t| ≥ T } , (II.2)

whereby the particle is confined to a neighborhood of the test trajectory that expands at finite speed v > 0.

FIG. 1. Extended forgery of continuous functions. This example depicts a test trajectory (smooth curve), its allowed neighborhood (shaded area), and two sample paths, one (solid random walk) illustrating the generic event (II.3) and the other (dotted random walk) the fact that arbitrary paths have low chances to enter the expanding neighborhoods through the bottlenecks.
Asymptotic forgery theorem. Let f be bounded and continuous, and v, T > 0. Then P(E_{f,v,T}) > 0.
An elegant, albeit slightly abstract, proof rests on a short/long time duality between the classes of events (II.1) and (II.2), which maps Lévy's forgery and this asymptotic version onto each other (see Section III A). For a more concrete approach, let us focus on the large T limit that will be used to study statistical dependences. Since the path B(t) and the neighborhood size v|t| both diverge, the bounded term f(t) can be neglected in Eq. (II.2), and the event E_{f,v,T} thus merely requires not to outrun deterministic particles moving at speed v. The asymptotic forgery thus reduces to the law of large numbers, which ensures that P(E_{f,v,T}) is close to one for all T ≫ 1. This probability decreases continuously as T is lowered [since the defining condition in Eq. (II.2) becomes stricter] but does not drop to zero until T = 0 is reached. This line of reasoning can be completed and also generalized to allow slower expansions (see the strong asymptotic forgery theorem, Appendix A).
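The large-T reduction to the law of large numbers can be checked in a quick simulation (horizon, grid, and speed v below are arbitrary illustrative choices; the test trajectory is taken to be zero and only positive times are sampled):

```python
import numpy as np

rng = np.random.default_rng(2)
t_max, n_steps, n_paths, v = 400.0, 4000, 2000, 0.5
dt = t_max / n_steps
t = np.linspace(dt, t_max, n_steps)

# Forward-time Brownian paths on a discrete grid.
increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
B = np.cumsum(increments, axis=1)

def p_confined(T):
    """Fraction of paths with |B(t)| < v*t for all T <= t <= t_max."""
    mask = t >= T
    return np.all(np.abs(B[:, mask]) < v * t[mask], axis=1).mean()

print(p_confined(10.0), p_confined(100.0))
# The probability grows toward 1 as T increases, as the law of
# large numbers suggests, while remaining nonzero for small T.
```

Note that the event for small T is contained in the event for large T, so the estimated probability is automatically monotone in T.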
We now combine Lévy's forgery and the asymptotic version to obtain an extension valid at all timescales. Specifically, let us examine the joint approximation event

J_{f,δ,T} = U_{f,δ,T} ∩ E_{f,v,T} with v = δ/T . (II.3)

In words, the particle is constrained to follow closely the test trajectory for some time but is allowed afterwards to deviate slowly from it (Fig. 1). The choice v = δ/T ensures that the uniform and expanding neighborhoods connect continuously at t = ±T.

Extended forgery theorem. Let f be bounded and continuous with f(0) = 0, and δ, T > 0. Then P(J_{f,δ,T}) > 0.
This result relies on the suitable integration of a "local" version of the theorem (see Section III B), but it can also be understood rather intuitively as follows. Imagine for a moment that the events (II.1) and (II.2) were independent. Their joint probability would merely be equal to the product of their marginal probabilities, which are positive by Lévy's forgery and the asymptotic forgery, and genericity would then follow. Actually they do interact because the associated neighborhoods are connected through the narrow bottlenecks at t = ±T (Fig. 1), but this should only increase their joint probability, i.e.,

P(J_{f,δ,T}) ≥ P(U_{f,δ,T}) P(E_{f,v,T}) . (II.4)

The reason lies in the temporal continuity of Brownian motion. A particle staying in the uniform neighborhood while |t| ≤ T necessarily passes through the bottlenecks, and is thus more likely to remain within the expanding neighborhood than arbitrary particles, which have low chances to even meet the bottlenecks (Fig. 1).
In other words, the proportion P(J_{f,δ,T})/P(U_{f,δ,T}) of sample paths B ∈ E_{f,v,T} among all those sample paths B ∈ U_{f,δ,T} should be larger than the unconstrained probability P(E_{f,v,T}), hence the bound (II.4).

B. Forgery of statistical dependences
We now turn to the analysis of statistical relations using Brownian motion. Let us fix two random variables X, Y and a pair of bounded test trajectories f, g. Consider the covariance approximation event

C_{X,Y,f,g,ε} = { |cov[B(X), B′(Y)] − cov[f(X), g(Y)]| < ε } , (II.5)

whereby the stochastic covariance cov[B(X), B′(Y)], built by picking independently two sample paths B, B′, coincides with the test covariance cov[f(X), g(Y)] up to a small error ε > 0 (Fig. 2). We argue that this event is generic too. The first step is to ensure that the set (II.5) is measurable so that its probability is meaningful. Physically, this technical issue is rooted once again in the escape of Brownian particles to infinity. The stochastic covariance can be expressed as a difference of two averages ⟨B(X) B′(Y)⟩ and ⟨B(X)⟩⟨B′(Y)⟩ (computed at fixed sample paths) involving the coordinates B(t), B′(t′) at random moments t = X, t′ = Y. If long times and thus large coordinates are sampled too often, the two terms may diverge and lead to an ill-defined covariance, i.e., ∞ − ∞. To avoid this situation, we should therefore assume that asymptotic values of X and Y are unlikely enough. Actually we shall adopt hereafter the sufficient condition that X, Y are L², i.e., they have finite mean and variance (see Section III C for a proof of measurability).

FIG. 2. Forgery of statistical dependences. The probability density of the stochastic covariance (black histogram) and its standard deviation B(X, Y) are shown for simulated dependent random variables [n = 10⁴ black dots, Insert (a)], as well as a test covariance (plain arrow) and a sample (dotted arrow) falling within the allowed error (shaded area). Insert (b) shows the associated functions. The case of independent variables is superimposed for comparative purposes.
Forgery theorem of statistical dependences. Let X, Y be L² random variables, f, g be bounded and continuous, and ε > 0. Then P(C_{X,Y,f,g,ε}) > 0.
The idea is that one way of realizing the event (II.5) is to pick sample paths B ∈ J_{f,δ,T} and B′ ∈ J_{g,δ,T} that fit the test trajectories f, g tightly (δ ≪ 1) over a long time period (T ≫ 1), see Fig. 2. The covariance error is then controlled as

|cov[B(X), B′(Y)] − cov[f(X), g(Y)]| = O(δ) + O(v) , (II.6)

with v = δ/T. To understand Eq. (II.6), imagine first that the random times were bounded with |X|, |Y| ≤ T. Then B(X), B′(Y) differ from f(X), g(Y) by less than δ for all times X, Y [Eq. (II.1)], so the covariance error must be O(δ) at most. Now for unbounded random times the distance between sample paths and test trajectories may exceed δ and must actually diverge at long times, which could have led to an infinitely large covariance error if not for the fact that the occurrence of |X| ≥ T or |Y| ≥ T is very unlikely. So the fit divergence v|t| [see Eq. (II.2)] is counterbalanced within averages by the fast decay of the long-time probability [e.g., P(|X| ≥ |t|) ≤ ⟨X²⟩/t² by Chebyshev's inequality]. The contribution of |X| ≥ T or |Y| ≥ T to the covariance error is thus finite and scales as O(v), which leads to Eq. (II.6).
This forgery theorem allows us to probe enough possible relationships to establish statistical dependence or independence, as we explain now. Consider the two probability densities of stochastic covariance shown in Fig. 2, which were generated using simulations differing only by the presence or absence of coupling between X and Y. The distribution appears significantly wider for the dependent variables, so this suggests that width is the key indicator of a relation. Actually, for the independent variables the narrow peak observed reflects an underlying Dirac delta function (its nonzero width in Fig. 2 is due to finite sampling errors in the covariance estimates). Indeed the vanishing of all stochastic covariances is a necessary condition of independence. The impossibility of sampling nonzero values also turns out to be sufficient.
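The width criterion is easy to reproduce in simulation. The sketch below (sample sizes, coupling, and the shortcut of evaluating one-sided paths at |X|, |Y| are our own illustrative simplifications) compares the spread of the stochastic covariance for a dependent and an independent pair:

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_pairs, n_steps, t_max = 500, 200, 400, 8.0
times = np.linspace(0.0, t_max, n_steps)

def brownian_on(times, rng):
    """Sample a Brownian path on a time grid; return a linear interpolator B(t)."""
    dt = times[1] - times[0]
    steps = rng.standard_normal(len(times) - 1) * np.sqrt(dt)
    path = np.concatenate([[0.0], np.cumsum(steps)])
    return lambda s, path=path: np.interp(s, times, path)

x = rng.standard_normal(n)
y_dep = x + 0.1 * rng.standard_normal(n)   # coupled to x
y_ind = rng.standard_normal(n)             # independent of x

def stochastic_covs(xs, ys):
    """Sample cov[B(X), B'(Y)] over independently drawn path pairs B, B'.
    One-sided paths evaluated at |X|, |Y| suffice for this illustration."""
    out = []
    for _ in range(n_pairs):
        B, Bp = brownian_on(times, rng), brownian_on(times, rng)
        bx, by = B(np.abs(xs)), Bp(np.abs(ys))
        out.append(np.mean(bx * by) - np.mean(bx) * np.mean(by))
    return np.array(out)

s_dep = stochastic_covs(x, y_dep).std()
s_ind = stochastic_covs(x, y_ind).std()
print(s_dep, s_ind)  # the dependent case yields a much wider distribution
```

For the independent pair the population stochastic covariance vanishes for every path pair, so the residual width is pure finite-sampling noise, exactly as described for the narrow peak of Fig. 2.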
To prove sufficiency, we show that the hypothesis cov[B(X), B′(Y)] = 0 a.s. implies that all test covariances vanish, which is equivalent to the independence of X and Y (Theorem 10.1 of Ref. [6]). This can be understood concretely using the following thought experiment. Imagine that cov[f(X), g(Y)] ≠ 0 for some pair of test trajectories f, g and let us fix, say, ε = |cov[f(X), g(Y)]|/4 (as in Fig. 2). We then generate sequentially samples of stochastic covariance until the approximation event C_{X,Y,f,g,ε} [Eq. (II.5), shaded area in Fig. 2] occurs. The forgery of statistical dependences ensures that this sequence stops eventually, and our choice of ε, that the last covariance sample is nonzero. However this contradicts our hypothesis, which imposes that all trials result in vanishing covariance. (See also Section III E for a set-theoretic argument.)

C. Brownian covariance revisited
The forgery of statistical dependences provides an alternative approach to the theory of Brownian covariance, hereafter denoted B(X, Y). This dependence index emerges naturally in our context as the root mean square (r.m.s.) of the stochastic covariance,

B(X, Y)² = ⟨ cov[B(X), B′(Y)]² ⟩ , (II.7)

or equivalently its standard deviation, since the mean ⟨cov[B(X), B′(Y)]⟩ vanishes identically by the symmetry B ←→ −B. This is only a slight reformulation of the definition in Ref. [5]. For L² random variables X, Y, the quadratic Gaussian functional integrals over the sample paths B, B′ can be computed analytically and the result reduces to distance covariance, so B(X, Y) inherits all its properties [5]. Alternatively, we argue here that the central results of the theory follow in a natural manner from Eq. (II.7).
The first key property is that X, Y are independent iff B(X, Y) = 0. This mirrors directly the Brownian independence test since the r.m.s. (II.7) measures precisely the deviations of stochastic covariance from zero. This argument thus replaces the formal manipulations on the regularized singular integrals that underlie the theory of distance covariance [4]. Furthermore the forgery of statistical dependences clarifies how this works physically: Brownian motion fluctuates enough to make the functional integral (II.7) probe all the possible test covariances between X and Y.
The second key aspect is the straightforward sample estimation of B(X, Y) using a parameter-free, algebraic formula. This is an important practical advantage over other dependency measures such as, e.g., mutual information [12]. Instead of relying on the sample formula of distance covariance [4,5], Eq. (II.7) prompts us to estimate the stochastic covariance (or rather, its square) before averaging over the Brownian paths. So, given n joint samples X_i, Y_i (i = 1, . . . , n) of X, Y and an expression for their sample covariance cov_n, the functional integral

B_n(X, Y)² = ⟨ cov_n[B(X), B′(Y)]² ⟩ (II.8)

determines an estimator B_n(X, Y) of Brownian covariance. If X, Y are L² random variables, this procedure allows one to build the estimation theory of Brownian (and thus, distance) covariance from that of the elementary covariance. For instance, the rather intricate algebraic formula for the unbiased sample distance covariance [13,14] is recovered by using, quite naturally from Eqs. (II.7) and (II.8), an unbiased estimator cov_n[·, ·]² of the covariance squared. (See Section III F for the explicit expressions of these estimators and a derivation of this statement.) The unbiasedness property ⟨cov_n[·, ·]²⟩ = cov[·, ·]² is then automatically transferred to the corresponding estimator (II.8), i.e.,

⟨B_n(X, Y)²⟩ = B(X, Y)² , (II.9)

because the L² convergence hypothesis is sufficient to ensure that averaging over the samples X_i, Y_i commutes with the functional integration over the Brownian paths B, B′. It is noteworthy that our construction of Brownian covariance and its estimator can be generalized by formally replacing the Brownian paths B, B′ by other stochastic processes or fields (in which case we may consider multivariate variables X, Y). This determines a simple rule to engineer a wide array of dependence measures via functional integrals, and opens the question of which processes allow one to characterize independence.
Our approach relied on the ability to probe generically all possible test covariances but, critically, the class of processes satisfying a forgery of statistical dependences might be relatively restricted. On the other hand, the original theory of Brownian covariance does extend to multidimensional Brownian fields or fractional Brownian motion [5], which are not continuous or Markovian (two properties central for the forgery theorems). So the forgery of statistical dependences provides a new and elegant tool to establish independence, but it may only represent a particular case of a more general theory of functional integral-based correlation measures.

III. MATHEMATICAL ANALYSIS
We now proceed with a more detailed examination of the results presented in Section II. The most technical parts of the proofs are relegated to Appendix B.

A. Asymptotic forgery and duality
We sketched in Section II A a derivation of the asymptotic forgery theorem using the law of large numbers. A generalization can actually be developed fully (see Appendix A). However, this particular case enjoys a concise proof based on a symmetry argument.
The short/long time duality. We start by establishing the duality relation

P(E_{f,v,T}) = P(U_{f_d,v,1/T}) , (III.1)

where the dual f_d of the function f is given by

f_d(t) = |t| f(1/t) for t ≠ 0 , f_d(0) = 0 . (III.2)

Proof. By the time-inversion symmetry of Brownian motion [2], replacing B(t) by |t| B(1/t) in the right-hand side of Eq. (II.2) determines a new event with the same probability as E_{f,v,T}. Explicitly, substituting s = 1/t in the defining condition of Eq. (II.2), this event is

{ |B(s) − f_d(s)| < v for all 0 < |s| ≤ 1/T } , (III.3)

which coincides with U_{f_d,v,1/T} since the condition at s = 0 holds trivially [B(0) = f_d(0) = 0]. This proves Eq. (III.1).

Proof of the asymptotic forgery theorem. The theorem naturally follows from this duality and Lévy's forgery. The explicit formula (III.2) establishes that f_d is continuous [for t ≠ 0 this corresponds to the continuity of f, and for t = 0 to its boundedness sup_{t∈R} |f(t)| ≤ M, since then |t f(1/t)| ≤ M|t| → 0 as t → 0]. Lévy's forgery theorem thus applies and shows that the right-hand side of Eq. (III.1) is nonzero.

B. Local extended forgery
We now describe an analytical derivation of the extended forgery theorem that formalizes the intuitive argument given in Section II A. The ensuing bound for the probability of the joint event (II.3) does not quite reach that in Eq. (II.4) but is sufficient to ensure genericity.
Restriction to positive-time events. As a preliminary, it will be useful to consider the positive- and negative-time events U±_{f,δ,T} and E±_{f,v,T}, which correspond to (II.1) and (II.2) except that their defining conditions are enforced only for 0 ≤ ±t ≤ T and ±t ≥ T, respectively. These events are generic too because they are less constrained; by monotonicity of the probability measure P(·), Lévy's forgery and the asymptotic forgery yield

P(U±_{f,δ,T}) > 0 , (III.4)
P(E±_{f,v,T}) > 0 . (III.5)
We are going to focus below on the derivation of

P(U+_{f,δ,T} ∩ E+_{f,v,T}) > 0 . (III.6)

This is sufficient to prove the extended forgery because the backward-time (t ≤ 0) part of Brownian motion is merely an independent copy of its forward-time (t ≥ 0) part. Hence a similar result necessarily holds for the negative-time events and, in turn,

P(J_{f,δ,T}) = P(U+_{f,δ,T} ∩ E+_{f,v,T}) P(U−_{f,δ,T} ∩ E−_{f,v,T}) > 0 . (III.7)

The integral formula. The first step in the demonstration of Eq. (III.6) relies on the following explicit expression for the joint probability as a functional integral. Let us introduce the family of auxiliary events

A^x_{f,δ,T} = { |x + B(t) − f(T + t)| < v (T + t) for all t ≥ 0 } (III.8)

and denote their probability by

p_{f,δ,T}(x) = P(A^x_{f,δ,T}) (III.9)

for all x ∈ R. Then

P(U+_{f,δ,T} ∩ E+_{f,v,T}) = ⟨ 1_{U+_{f,δ,T}} p_{f,δ,T}(B(T)) ⟩ . (III.10)

The first factor in this expectation value denotes the indicator function of the event U+_{f,δ,T} and enforces the constraint that all considered sample paths B must lie within the uniform neighborhood (II.1) for 0 ≤ t ≤ T. In the second factor the function (III.9) is evaluated at the position of the random path B(t) at t = T. As we explain below this function is Borel measurable, so p_{f,δ,T}(B(T)) represents a proper random variable and the expectation value is well defined.
Proof. The formula (III.10) is a direct consequence of the two following statements, each involving the conditional probability P[E+_{f,v,T} | B(T)]:

P(U+_{f,δ,T} ∩ E+_{f,v,T}) = ⟨ 1_{U+_{f,δ,T}} P[E+_{f,v,T} | B(T)] ⟩ , (III.11)
P[E+_{f,v,T} | B(T)] = p_{f,δ,T}(B(T)) a.s. (III.12)

The first identity follows by conditioning on the path up to time T [the event U+_{f,δ,T} being determined by this portion of the path] combined with the Markov property, and the second from the independence and stationarity of Brownian increments after time T. The next step is to show that the integrand in the right-hand side of Eq. (III.10) cannot vanish identically, which provides a "local" version of the extended forgery. The full theorem will follow by integration.
Extended forgery theorem (local version). There exists a subinterval J of the bottleneck f(T) − δ < x < f(T) + δ at t = T such that

P(U+_{f,δ,T} ∩ {B(T) ∈ J}) ≥ P(U+_{g,ϵ,T}) (III.13)

for some continuous function g satisfying g(0) = 0 and some distance parameter ϵ > 0, and

p_{f,δ,T}(x) > P(E+_{f,v,T}) (III.14)

for all x ∈ J. By Eqs. (III.4) and (III.5), both lower bounds are positive. The two parts of the theorem are closely related to Lévy's forgery and the asymptotic forgery, and we treat them separately.
Proof of (III.13). This property actually holds for arbitrary subintervals J, which we shall parameterize as

J = (x₀ − ϵ, x₀ + ϵ) , (III.15)

with x₀ in the bottleneck and ϵ > 0 small enough that J lies within it. Pick a continuous function g with g(0) = 0, g(T) = x₀, and sup_{0≤t≤T} |g(t) − f(t)| ≤ δ − ϵ. Any sample path B ∈ U+_{g,ϵ,T} then satisfies |B(t) − f(t)| ≤ |B(t) − g(t)| + |g(t) − f(t)| < δ for all 0 ≤ t ≤ T, so that B ∈ U+_{f,δ,T}, and |B(T) − x₀| = |B(T) − g(T)| < ϵ as g(T) = x₀, so that B(T) ∈ J. The bound (III.13) follows from these two inclusions by the monotonicity of P(·).
Two properties of the function p_{f,δ,T}. For the second part it is useful to interject here the following simple statements about the function (III.9): (i) p_{f,δ,T} vanishes identically outside the bottleneck, (ii) p_{f,δ,T} is continuous within the bottleneck (and therefore, it is Borel measurable as well).
The first claim rests on the observation that the set (III.8) is empty whenever |x − f(T)| ≥ δ (since its defining condition at t = 0 cannot be satisfied, as B(0) = 0), so we find p_{f,δ,T}(x) = P(∅) = 0.
The second claim may appear quite clear as well, since the set (III.8) should vary continuously with x, but this intuition is not quite right. For a more accurate statement, let us fix a point x in the bottleneck and consider an arbitrary sequence x_n converging to it. Then the limit event of A^{x_n}_{f,δ,T} (assuming it exists) coincides with A^x_{f,δ,T} modulo a zero-probability set, so by continuity of P(·)

lim_{n→∞} p_{f,δ,T}(x_n) = p_{f,δ,T}(x) . (III.16)

The full proof appears rather technical, so we only sketch the key ideas here and relegate the details to Appendix B 3. The limit event imposes that sample paths lie within the expanding neighborhood associated with (III.8) but are allowed to reach its boundary [essentially because the large n limit of the strict inequalities (III.8) at x = x_n yields a nonstrict inequality]. However this hitting event a.s. never happens because the typical roughness of Brownian motion forbids meeting another trajectory without crossing it (see Appendix B 4 for this "boundary-crossing law", which generalizes Lemma 1 on page 283 of Ref. [3] to time-dependent levels).
Proof of (III.14). The key observation here is that

P(E+_{f,v,T}) = ⟨ p_{f,δ,T}(B(T)) ⟩ = ∫ p_{f,δ,T}(x) e^{−x²/2T} dx/√(2πT) , (III.17)

which is merely the integral formula (III.10) with the constraint B ∈ U+_{f,δ,T} removed [equivalently, this is Eq. (B.5) with S = R] and the expectation over the Gaussian variable B(T) made explicit. If p_{f,δ,T}(x) ≤ P(E+_{f,v,T}) everywhere, then for almost all x this inequality must saturate to an equality by Eq. (III.17), and thus p_{f,δ,T}(x) > 0 by Eq. (III.5). This contradicts property (i) of p_{f,δ,T}, so Eq. (III.14) must hold for at least one point x in the bottleneck, and thus also on some subinterval J by the continuity property (ii).
Proof of the extended forgery theorem. Finally we combine the local version of the theorem with the integral formula (III.10) to obtain

P(U+_{f,δ,T} ∩ E+_{f,v,T}) ≥ P(U+_{g,ϵ,T}) P(E+_{f,v,T}) > 0 . (III.18)

Indeed, further constraining the paths to B(T) ∈ J allows the right-hand side of (III.10) to be bounded from below:

⟨ 1_{U+_{f,δ,T}} p_{f,δ,T}(B(T)) ⟩ ≥ ⟨ 1_{U+_{f,δ,T}} 1_{B(T)∈J} p_{f,δ,T}(B(T)) ⟩
≥ P(U+_{f,δ,T} ∩ {B(T) ∈ J}) P(E+_{f,v,T}) by (III.14),
≥ P(U+_{g,ϵ,T}) P(E+_{f,v,T}) by (III.13).
In passing through the first inequality we also used the identity ⟨1_M⟩ = P(M) for any event M. The theorem (III.6) follows from the inequality (III.18) together with Eqs. (III.4) and (III.5).
It is noteworthy that we stated and proved the extended forgery under the assumption that v = δ/T for convenience, but in the above arguments this restriction was actually artificial so the theorem holds for arbitrary parameters δ, v, T > 0.

C. Measurability of the stochastic covariance index
The premise of all the forgery theorems is that the associated approximation events are measurable. For the sets (II.1), (II.2) enforcing algebraic conditions on a time domain, this is a general property of separable stochastic processes such as Brownian motion [3]. Indeed, the principles of probability theory only allow constraints on at most a countable infinity of time points but the notion of separability enables considering continuous time domains as well. In a nutshell, Brownian motion is separable because conditions such as a ≤ B(t) ≤ b imposed on a dense grid of rational time points t = k/n extend automatically to nonrational times by sample path continuity.
The story is different for the set (II.5) because it involves the functional

cov[B(X), B′(Y)] = ⟨B(X) B′(Y)⟩ − ⟨B(X)⟩⟨B′(Y)⟩ (III.19)

of the sample paths B, B′, which depends in particular on the distribution of the random times X, Y. Here we provide convergence conditions that ensure the measurability of this functional and thus of the covariance approximation event (II.5).
Sufficient conditions for measurability. The expectation values ⟨B(X)⟩, ⟨B′(Y)⟩, and ⟨B(X) B′(Y)⟩ are measurable functionals of the sample paths B, B′ whenever ⟨|X|^{1/2}⟩, ⟨|Y|^{1/2}⟩, and ⟨|XY|^{1/2}⟩ are finite, respectively. When these three conditions hold, the stochastic covariance (III.19) is thus measurable too.
It is noteworthy that these conditions can be replaced by the stronger constraint that X, Y are L¹, i.e., ⟨|X|⟩ and ⟨|Y|⟩ are finite [since the Cauchy-Schwarz inequality implies, e.g., ⟨|XY|^{1/2}⟩ ≤ ⟨|X|⟩^{1/2} ⟨|Y|⟩^{1/2}]. In Section II B we used the even stronger assumption that X, Y are L² to simplify and shorten our formulation of results.
Proof. This is a direct consequence of the Fubini-Tonelli theorem [3]. Let us focus on the functional B ↦ ⟨B(X)⟩ [the argument for ⟨B′(Y)⟩ and ⟨B(X) B′(Y)⟩ is completely similar]. In this setup, the theorem ensures its measurability if the iterated expectation value of B(X), computed first by averaging over sample paths B and then over random times X, is absolutely convergent. Since B(t) is a Gaussian random variable with zero mean and variance |t|, we find explicitly

⟨⟨ |B(X)| ⟩_B⟩_X = √(2/π) ⟨|X|^{1/2}⟩ . (III.20)

Thus the finiteness condition on ⟨|X|^{1/2}⟩ indeed fulfills the requirement of the Fubini-Tonelli theorem. Technically this application also needs the extra prerequisite that B(X) be a random variable, i.e., that the evaluation map (B, X) ↦ B(X) be jointly measurable (and not just at fixed path B or fixed time X). This is a property of continuous stochastic processes such as Brownian motion [3].
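The Gaussian identity used here, ⟨|B(t)|⟩ = √(2|t|/π), is quickly confirmed by Monte Carlo (t = 2.5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
t = 2.5
# |B(t)| for Gaussian B(t) with zero mean and variance t:
samples = np.abs(np.sqrt(t) * rng.standard_normal(1_000_000))
print(samples.mean(), np.sqrt(2 * t / np.pi))  # both ≈ 1.26
```

Averaging this identity over the random time X is precisely what produces the half-moment √(2/π) ⟨|X|^{1/2}⟩ in the proof above.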

D. Nested forgery of statistical dependences
We showed that the forgery of statistical dependences is induced by the extended forgery using the somewhat rough estimate (II.6) of the covariance error. We now provide an exact upper bound and a complete proof of the theorem.
Error bound. For arbitrary sample paths B ∈ J_{f,δ,T} and B′ ∈ J_{g,δ,T}, we have

|cov[B(X), B′(Y)] − cov[f(X), g(Y)]| ≤ ε_b(δ, v) , (III.22)

where v = δ/T and the error bound ε_b(δ, v) is given by a polynomial in δ and v, vanishing at δ = v = 0, whose coefficients involve the bounds M_f = sup_t |f(t)|, M_g = sup_t |g(t)| and the moments ⟨|X|⟩, ⟨|Y|⟩, ⟨|XY|⟩ (III.23). The derivation of this estimate relies on a relatively straightforward application of a series of inequalities and is relegated to Appendix B 5. Its significance rests on the fact that, when the polynomial coefficients in (III.23) are finite, the covariance error can be made arbitrarily small by taking δ, v → 0. Hence we obtain the following key intermediate result.
The nested forgery lemma. Assume that M_f, M_g, ⟨|X|⟩, ⟨|Y|⟩, and ⟨|XY|⟩ are finite, and let ε > 0. Then there exist δ, T > 0 such that

J_{f,δ,T} ∩ J′_{g,δ,T} ⊆ C_{X,Y,f,g,ε} , (III.24)

where the prime on the event J′_{g,δ,T} indicates that it applies to the process B′.
Proof of the forgery theorem of statistical dependences. This is a direct corollary. The L² hypothesis and the boundedness assumption on f, g ensure that the lemma (III.24) applies. Using the monotonicity of P(·) and the independence of B, B′ then yields

P(C_{X,Y,f,g,ε}) ≥ P(J_{f,δ,T}) P(J′_{g,δ,T}) . (III.25)

It is now tempting to invoke the extended forgery theorem, but here the conditions f(0) = g(0) = 0 were not imposed. This is however not an issue thanks to the translation invariance of covariance, i.e., cov[f(X), g(Y)] = cov[f(X) − f(0), g(Y) − g(0)], which allows us to replace f, g by the shifted functions f − f(0), g − g(0) vanishing at the origin. Thus

P(C_{X,Y,f,g,ε}) ≥ P(J_{f−f(0),δ,T}) P(J′_{g−g(0),δ,T}) > 0 , (III.26)

where we used Eq. (III.25) and the extended forgery theorem. Note that the L² assumption used in Section II B was slightly stronger than needed, as made clear by the nested forgery lemma.

E. Brownian independence test and dichotomy
To prove that the assertion cov[B(X), B′(Y)] = 0 a.s. implies cov[f(X), g(Y)] = 0, we used in Section II B a discrete sampling method for which it is straightforward that the covariance approximation event (II.5) must occur simultaneously with the zero covariance event

Z_{X,Y} = { cov[B(X), B′(Y)] = 0 } . (III.27)

Their compatibility, which was indeed central in our derivation, is actually a general property of probability theory, and we use it here to provide an alternative, set-theoretic argument.
The dichotomy lemma. For arbitrary functions f, g and parameter ε, we have either

|cov[f(X), g(Y)]| < ε or C_{X,Y,f,g,ε} ∩ Z_{X,Y} = ∅ . (III.28)

Second proof of sufficiency for the Brownian independence test. By the forgery of statistical dependences the event C_{X,Y,f,g,ε} is generic for f, g bounded and continuous and ε > 0, and by hypothesis the event Z_{X,Y} occurs a.s. Since the intersection of a generic event and an almost sure event is never empty [if it were empty, the generic event would be a subset of a zero-probability event (i.e., the complement of the almost sure event), which is forbidden by monotonicity of P(·)], the second possibility in Eq. (III.28) is ruled out, so |cov[f(X), g(Y)]| < ε must hold true. Since ε > 0 was arbitrary, we conclude that cov[f(X), g(Y)] = 0.

F. Unbiased estimation of Brownian covariance
We now exemplify how an explicit estimator of Brownian covariance can be derived from the functional integral (II.8). Our construction enforces unbiasedness at finite sample size and allows us to recover the unbiased sample formula of distance covariance [13,14], which we review first.
The unbiased estimator. Let us introduce the distance matrix a_ij = |X_i − X_j| between the samples X_i of X as well as its "U-centered" version

A_ij = a_ij − (1/(n−2)) Σ_k a_ik − (1/(n−2)) Σ_k a_kj + (1/((n−1)(n−2))) Σ_{k,l} a_kl for i ≠ j, A_ii = 0 , (III.29)

where 1 ≤ i, j, k, l ≤ n. The analogous matrices for the corresponding samples of Y are denoted b_ij and B_ij, respectively. With these notations and assuming n ≥ 4, the unbiased estimator reads

W_n = (1/(4 n(n−3))) Σ_{i≠j} A_ij B_ij . (III.30)

This expression differs from the formula given in Refs. [13,14] by a trivial factor 1/4 due to our use of the standard normalization for Brownian motion. The fact that Eq. (III.30) provides an unbiased estimation of Brownian covariance [see Eq. (II.9)] will follow directly from our derivation based on the functional integral (II.8), which starts naturally from an unbiased estimation of the covariance squared.
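The U-centering (III.29) and the sample formula (III.30) translate directly into code. The following sketch (function names are ours) implements both; a convenient sanity check is the centering property Σ_{j≠i} A_ij = 0 used at the end of this section.

```python
import numpy as np

def u_center(d):
    """U-centering, Eq. (III.29), of a symmetric distance matrix d with zero diagonal."""
    n = d.shape[0]
    row = d.sum(axis=1)  # equals the column sums by symmetry
    tot = d.sum()
    D = d - row[:, None] / (n - 2) - row[None, :] / (n - 2) + tot / ((n - 1) * (n - 2))
    np.fill_diagonal(D, 0.0)  # the definition only concerns i != j
    return D

def brownian_cov_unbiased(x, y):
    """Unbiased sample Brownian covariance, Eq. (III.30): the unbiased
    distance covariance of Refs. [13,14] divided by 4."""
    n = len(x)
    assert n >= 4
    A = u_center(np.abs(x[:, None] - x[None, :]))
    B = u_center(np.abs(y[:, None] - y[None, :]))
    return (A * B).sum() / (4 * n * (n - 3))
```

Note that for y = x the sum becomes a sum of squares, so the estimate is automatically nonnegative in that case.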
Estimation of covariance squared. It is convenient to introduce the new random variables x = B(X), y = B′(Y) and their samples x_i = B(X_i), y_i = B′(Y_i), all being defined at fixed sample paths B, B′. An unbiased estimator for cov(x, y) itself is well known from elementary statistics, but it can be checked by developing its square that the estimation of cov(x, y)² is then hampered by systematic errors of order 1/n. Here, given n ≥ 4, we shall rather define

cov_n(x, y)² = ((n−4)!/n!) Σ′_{i,j,k,l} x_i x_j (y_i y_j − y_i y_k − y_j y_k + y_k y_l) , (III.31)

where the primed sum is taken over all distinct indices 1 ≤ i, j, k, l ≤ n. This expression differs from the aforementioned development by O(1/n) and is indeed free from finite sampling biases.
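The absence of finite-sampling bias can be verified exhaustively on a small discrete distribution, where the expectation value of the estimator is computable exactly by enumerating every joint sample. The distribution and sample size below are arbitrary choices for this sketch.

```python
import itertools
import math

# (x, y) takes four values with the probabilities below; all 4^5 joint
# samples of size n = 5 are enumerated, so the expectation is exact.
support = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
probs = [0.1, 0.2, 0.3, 0.4]
n = 5

ex = sum(p * x for (x, _), p in zip(support, probs))
ey = sum(p * y for (_, y), p in zip(support, probs))
exy = sum(p * x * y for (x, y), p in zip(support, probs))
cov_sq = (exy - ex * ey) ** 2  # exact cov(x, y)^2

def cov_n_sq(xs, ys):
    """Estimator (III.31): primed sum over all distinct indices i, j, k, l."""
    s = sum(xs[i] * xs[j] * (ys[i] * ys[j] - ys[i] * ys[k] - ys[j] * ys[k] + ys[k] * ys[l])
            for i, j, k, l in itertools.permutations(range(n), 4))
    return math.factorial(n - 4) / math.factorial(n) * s

# exact expectation of the estimator over all joint samples
expectation = 0.0
for draw in itertools.product(range(len(support)), repeat=n):
    weight = math.prod(probs[d] for d in draw)
    xs = [support[d][0] for d in draw]
    ys = [support[d][1] for d in draw]
    expectation += weight * cov_n_sq(xs, ys)
```

Up to floating-point rounding, the computed expectation coincides with cov(x, y)², as claimed.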
This can be proven by averaging Eq. (III.31) over the n joint samples x_i, y_i. Indeed, using the identities ⟨x_i x_j y_i y_j⟩ = ⟨xy⟩², ⟨x_i x_j y_i y_k⟩ = ⟨xy⟩⟨x⟩⟨y⟩, and ⟨x_i x_j y_k y_l⟩ = ⟨x⟩²⟨y⟩², valid for distinct indices, and the fact that the primed sum contains n!/(n − 4)! terms, we find

⟨cov_n(x, y)²⟩ = ⟨xy⟩² − 2⟨xy⟩⟨x⟩⟨y⟩ + ⟨x⟩²⟨y⟩² = cov(x, y)² . (III.32)

Averaging now over the Brownian paths by means of the covariance ⟨B(X_i)B(X_j)⟩ = (|X_i| + |X_j| − a_ij)/2 amounts to the substitution rules x_i x_j → (|X_i| + |X_j| − a_ij)/2 and y_i y_j → (|Y_i| + |Y_j| − b_ij)/2, respectively. This substitution rule can actually be simplified further to x_i x_j → −a_ij/2 because the terms involving |X_i| cancel out thanks to the algebraic identity

Σ′_{j,k,l} (y_i y_j − y_i y_k − y_j y_k + y_k y_l) = 0 , (III.34)

where the primed sum runs over all distinct j, k, l different from i. Similar cancellations also allow us to use y_i y_j → −b_ij/2. We thus obtain the unbiased estimator

W_n = ((n−4)!/(4 n!)) Σ′_{i,j,k,l} a_ij (b_ij − b_ik − b_jk + b_kl) . (III.35)

The equivalence with Eq. (III.30) is not obvious at first sight. To make contact with it, let us fix i, j and consider the sum of the second factor over k, l, all indices being distinct. This contribution is the sum of the following four terms:

Σ′_{k,l} b_ij = (n−2)(n−3) b_ij ,
−Σ′_{k,l} b_ik = −(n−3) (Σ_k b_ik − b_ij) ,
−Σ′_{k,l} b_jk = −(n−3) (Σ_k b_jk − b_ij) ,
Σ′_{k,l} b_kl = Σ_{k,l} b_kl − 2 Σ_k b_ik − 2 Σ_k b_jk + 2 b_ij ,

the sums in the right-hand sides now running over unconstrained indices 1 ≤ k, l ≤ n. The total thus reduces to (n − 1)(n − 2) B_ij, so Eq. (III.35) can be rewritten as Eq. (III.30) with a_ij in place of A_ij. Finally we recover Eq. (III.30) because a_ij can be replaced by its U-centering A_ij. Indeed the extra terms cancel in the sum thanks to the centering property Σ_{j≠i} B_ij = 0 (for all i fixed), which can be checked from the definition (III.29).

Appendix A: FORGERY THEOREMS

The various forgery theorems obtained by extending Levy's original forgery relied on approximations involving neighborhoods expanding at long times with a constant speed v. However it turns out that this assumption can be weakened. We now describe a stronger version of the asymptotic forgery theorem and then briefly discuss some implications.
Slowly expanding neighborhoods. The key observation is that, in our discussion of the asymptotic forgery, the law of large numbers can be replaced almost verbatim by the more precise law of the iterated logarithm, which establishes that Brownian motion diverges a.s. at long times no faster than √(2|t| log log |t|) [2].
Thus we naturally generalize the definition (II.2) using neighborhoods that expand strictly faster than almost all Brownian paths. Explicitly, we let

E_{f,φ,T} = { |B(t) − f(t)| < φ(|t|), ∀ |t| ≥ T } , (A.1)

where the growth function φ(t) is positive and continuous on t > 0, and satisfies the divergence condition

lim_{t→∞} √(2t log log t) / φ(t) = 0 . (A.2)

The particular example φ(t) = vt leads back to Eq. (II.2). The asymptotic behavior (A.2), together with the law of the iterated logarithm, indeed ensures that the expansion described by φ(t) is a.s. faster than Brownian motion, i.e.,

P( ⋃_{T>0} E_{0,φ,T} ) = 1 . (A.3)

To see this, write

lim sup_{|t|→∞} |B(t)|/φ(|t|) = [ lim sup_{|t|→∞} |B(t)|/√(2|t| log log |t|) ] [ lim_{|t|→∞} √(2|t| log log |t|)/φ(|t|) ] .

The law of the iterated logarithm states precisely that the first factor in the right-hand side is a.s. equal to one, and the second factor tends to zero by definition (A.2). Therefore we obtain lim sup_{|t|→∞} |B(t)|/φ(|t|) = 0 a.s., which is equivalent to Eq. (A.3).
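As a quick numerical sanity check (a sketch with arbitrary sample points), one can verify that the linear example φ(t) = vt indeed satisfies the divergence condition (A.2): the ratio √(2t log log t)/(vt) decays steadily over several decades of t.

```python
import math

def ratio(t, v=1.0):
    """sqrt(2 t log log t) / phi(t) for phi(t) = v t; condition (A.2)
    requires this ratio to tend to zero as t grows."""
    return math.sqrt(2.0 * t * math.log(math.log(t))) / (v * t)

# evaluate over six decades, t = 10, 100, ..., 10^6
vals = [ratio(10.0 ** k) for k in range(1, 7)]
```

The sequence is monotonically decreasing and already below 10⁻² at t = 10⁶.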
Strong asymptotic forgery theorem. Let f be bounded and continuous, φ be a growth function as specified above, and T > 0. Then P(E_{f,φ,T}) > 0.
Proof. We revisit here the strategy of proof sketched in the main text. Let us start by demonstrating that

lim_{T→∞} P(E_{f,φ,T}) = 1 . (A.6)

The event (A.1) increases monotonically when T is raised (since the defining condition becomes less strict), so lim_{T→∞} P(E_{f,φ,T}) is equal to P(⋃_{T>0} E_{f,φ,T}) by continuity of the measure P(·). Therefore we need to show that the limit event ⋃_{T>0} E_{f,φ,T} occurs a.s. This is the set-theoretic version of the statement that, for almost all sample paths B, we can find some T > 0 ensuring the inequality |B(t) − f(t)|/φ(|t|) < 1, ∀ |t| ≥ T. In turn this is a particular case of the slightly stronger assertion

lim sup_{|t|→∞} |B(t) − f(t)|/φ(|t|) = 0 a.s. , (A.7)

which follows from the boundedness of f and Eq. (A.3). From Eq. (A.6) we can fix T* > 0 such that P(E_{f,φ,T*}) > 0, which proves the theorem for all T ≥ T* by monotonicity. It thus remains to prove the theorem for 0 < T < T*.
To that aim let us split E_{f,φ,T} at |t| = T* and identify it as a joint event,

E_{f,φ,T} = U_{f,φ,T,T*} ∩ E_{f,φ,T*} , (A.8)

where U_{f,φ,T,T*} imposes the condition |B(t) − f(t)| < φ(|t|) on the intermediate times T ≤ |t| ≤ T*. Our strategy is to show first that these two events are generic, and then apply the techniques of Section III B to derive the genericity of their intersection. The case of E_{f,φ,T*} is taken care of by Eq. (A.7). For the event U_{f,φ,T,T*} in (A.8), observe that U_{f,φ,T,T*} ⊃ U_{f,δ,T*} whenever δ is chosen below the minimum value of the continuous function φ(|t|) over the compact time domain T ≤ |t| ≤ T*, since then |B(t) − f(t)| < δ ≤ φ(|t|). If we pick a δ > 0 in this way (this is possible since φ is positive), we then obtain P(U_{f,φ,T,T*}) ≥ P(U_{f,δ,T*}) > 0 by monotonicity of P(·) and Levy's forgery theorem. The last step is to ensure that the intersection is generic too. This is analogous to the extended forgery theorem (III.7) and can be proven along the exact same lines.
Consequences for the other forgery theorems. Strong versions of the subsequent forgery theorems can be obtained by replacing the event (II.2) with its generalization (A.1). The end result is to weaken the convergence assumptions required for the random variables X, Y and therefore widen the applicability of the forgery of statistical dependences and the Brownian independence test. We gather the results here without repeating the derivations.
The strong extended forgery theorem states that if f is bounded and continuous with f(0) = 0, φ is a growth function as above, and δ, T > 0, then P(U_{f,δ,T} ∩ E_{f,φ,T}) > 0. A covariance error inequality similar to Eq. (III.22) then holds for arbitrary sample paths B ∈ U_{f,δ,T} ∩ E_{f,φ,T} and B′ ∈ U_{g,δ,T} ∩ E_{g,φ,T}, with a bound involving the growth function φ instead of (III.23). As a consequence, the forgery theorem of statistical dependences and the Brownian independence test both hold under the weaker convergence conditions that there exists a growth function φ for which ⟨φ(|X|)⟩, ⟨φ(|Y|)⟩, and ⟨φ(|X|)φ(|Y|)⟩ are finite.

Eq. (III.11) relies on the weak Markov property and is actually valid for general Markovian processes. Denoting by U_{f,δ,T} the σ-algebra generated by the event U⁺_{f,δ,T}, we can write

P(E⁺_{f,v,T} ∩ U⁺_{f,δ,T}) = ⟨1_{U⁺_{f,δ,T}} P[E⁺_{f,v,T} | U_{f,δ,T}]⟩ , (B.1)

merely by definition of the conditional probability P[E⁺_{f,v,T} | U_{f,δ,T}] [3]. Furthermore, U_{f,δ,T} is contained in the σ-algebra F_T generated by the process B(t) with 0 ≤ t ≤ T, since U⁺_{f,δ,T} only involves conditions on this time interval. From this observation,

P[E⁺_{f,v,T} | U_{f,δ,T}] = ⟨P[E⁺_{f,v,T} | F_T] | U_{f,δ,T}⟩
                         = ⟨P[E⁺_{f,v,T} | B(T)] | U_{f,δ,T}⟩ , (B.2)

where ⟨ · | U_{f,δ,T}⟩ denotes the expectation value conditional on U_{f,δ,T}. In the first line we applied the tower property of conditioning that follows from U_{f,δ,T} ⊂ F_T.
In the second line we used the weak Markov property: events involving conditions for t ≥ T (such as E⁺_{f,v,T}) only depend on their past via the (σ-algebra generated by the) random variable B(T).
Combining Eqs. (B.1) and (B.2), we therefore find that the left-hand side of Eq. (III.11) is equal to

⟨1_{U⁺_{f,δ,T}} ⟨P[E⁺_{f,v,T} | B(T)] | U_{f,δ,T}⟩⟩ .

The conditional expectation on U_{f,δ,T} can be dropped in this expression, again by mere definition of conditioning on U_{f,δ,T}, so we recover the right-hand side of (III.11).

2. Representation (III.12) of p_{f,δ,T}
The defining property of the conditional probability P[E⁺_{f,v,T} | B(T)] reads

⟨1_{B(T)∈S} P[E⁺_{f,v,T} | B(T)]⟩ = P(E⁺_{f,v,T} ∩ {B(T) ∈ S})

for all Borel sets S ⊂ R, which only characterizes it modulo a zero-probability set. To prove the statement (III.12), it thus suffices to check that p_{f,δ,T}(B(T)) satisfies the same property. This is a consequence of the fact that Brownian motion has independent increments themselves distributed as Brownian motions [3]; in the corresponding computation we used the identity ⟨1_{B′ ∈ A^x_{f,δ,T}}⟩ = P(A^x_{f,δ,T}) and the definition (III.9) in passing through the second equality. This ends the demonstration of Eq. (III.12).
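The defining property above can be made concrete on a finite probability space, where conditional probabilities are elementary ratios. The following sketch (the dice example is ours) verifies ⟨1_{Z∈S} P[E | Z]⟩ = P(E ∩ {Z ∈ S}) exactly with rational arithmetic.

```python
from fractions import Fraction

# Finite sample space: two fair dice. Z = first die, E = {sum equals 7}.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p = Fraction(1, 36)

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(p for w in omega if event(w))

def cond_prob_E_given_Z(z):
    """P(E | Z = z): conditional probability of {sum = 7} given the first die."""
    num = prob(lambda w: w[0] == z and w[0] + w[1] == 7)
    den = prob(lambda w: w[0] == z)
    return num / den

# Defining property: <1_{Z in S} P(E | Z)> = P(E and {Z in S}) for any S.
S = {1, 2}
lhs = sum(p * cond_prob_E_given_Z(w[0]) for w in omega if w[0] in S)
rhs = prob(lambda w: w[0] in S and w[0] + w[1] == 7)
```

Both sides evaluate to the same rational number, as the defining property demands.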
3. Continuity property (III.16) of p_{f,δ,T}

We shall prove that the limit of p_{f,δ,T}(x_n) = P(A^{x_n}_{f,δ,T}) exists and equals p_{f,δ,T}(x) = P(A^x_{f,δ,T}) whenever x_n → x with x inside the bottleneck interval at t = T.
We start from the difference formula

|p_{f,δ,T}(x_n) − p_{f,δ,T}(x)| ≤ P(A^x_{f,δ,T} \ A^{x_n}_{f,δ,T}) + P(A^{x_n}_{f,δ,T} \ A^x_{f,δ,T}) (B.6)

and argue that each of the two terms in the right-hand side converges to zero. Since sequences of probabilities are nonnegative, it is actually sufficient to show that

lim sup_{n→∞} P(A^x_{f,δ,T} \ A^{x_n}_{f,δ,T}) ≤ P(A^x_{f,δ,T} \ A^{x_n}_{f,δ,T} i.o.) (B.7)

and

lim sup_{n→∞} P(A^{x_n}_{f,δ,T} \ A^x_{f,δ,T}) ≤ P(A^{x_n}_{f,δ,T} \ A^x_{f,δ,T} i.o.) (B.8)

vanish. Here i.o. stands for "infinitely often" and means that an infinite number of events in the sequence occur, and indeed allows evaluating superior limits [3].
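The difference formula itself is elementary set algebra, |P(A) − P(B)| = |P(A \ B) − P(B \ A)| ≤ P(A \ B) + P(B \ A), and can be checked mechanically on a finite uniform space (a sketch with arbitrary random events):

```python
import random

def difference_formula_holds(A, B, n_omega):
    """Check |P(A) - P(B)| <= P(A - B) + P(B - A) on a uniform space of
    size n_omega, with '-' denoting set difference."""
    P = lambda E: len(E) / n_omega
    return abs(P(A) - P(B)) <= P(A - B) + P(B - A) + 1e-12

random.seed(3)
results = []
for _ in range(200):
    A = {w for w in range(100) if random.random() < 0.5}
    B = {w for w in range(100) if random.random() < 0.5}
    results.append(difference_formula_holds(A, B, 100))
```

The inequality holds for every pair of events, as it must.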
To proceed with the proof, let us introduce the zero-probability set Z of paths that are not continuous or do not satisfy the law of large numbers lim_{t→∞} B(t)/t = 0, and the boundary-hitting event

B^x_{f,δ,T} = { |B(t) + x − f(T + t)| ≤ δ + vt, ∀ t ≥ 0, with equality at some time } , (B.9)

which is closely related to (III.8) but imposes that sample paths actually reach the boundary of the associated neighborhood. This event turns out to also have probability zero, because a Brownian path hitting the boundary must a.s. also cross it and thus leave the neighborhood. For completeness we provide a derivation of this boundary-crossing law in Appendix B 4.
With these definitions at hand, we claim that

{A^x_{f,δ,T} \ A^{x_n}_{f,δ,T} i.o.} ⊂ Z (B.10)

and

{A^{x_n}_{f,δ,T} \ A^x_{f,δ,T} i.o.} ⊂ Z ∪ B^x_{f,δ,T} . (B.11)

They imply directly that the two limits (B.7) and (B.8) vanish by monotonicity and subadditivity of P(·), because P(Z) = P(B^x_{f,δ,T}) = 0. Therefore the right-hand side of Eq. (B.6) does indeed tend to zero as n → ∞.
It remains to derive these two inclusions.
First inclusion (B.10). Let us consider an arbitrary sample path B satisfying B ∈ A^x_{f,δ,T} \ A^{x_n}_{f,δ,T} i.o. Explicitly, this means that we can build a diverging sequence n_k of indices such that B ∈ A^x_{f,δ,T} and B ∉ A^{x_{n_k}}_{f,δ,T}, i.e.,

|B(t_k) + x_{n_k} − f(T + t_k)| ≥ δ + v t_k (B.12)

for some t_k ≥ 0. At this point two cases emerge. Both will lead to the conclusion that B ∈ Z, which corresponds to the sought inclusion (B.10).
(i) The set of times t_k is bounded. In that situation, the Bolzano-Weierstrass theorem provides a subsequence t_{k_m} converging to some finite time τ. Assuming that B is continuous, let us evaluate Eq. (B.12) along k = k_m and take m → ∞. Using x_{n_{k_m}} → x and the continuity of B and f, we find |B(τ) + x − f(T + τ)| ≥ δ + vτ, which contradicts the fact that B ∈ A^x_{f,δ,T}. Therefore B cannot be continuous at τ, hence B ∈ Z.
(ii) The set of times t_k is unbounded. We can then divide both sides of Eq. (B.12) by t_k and let k → ∞. Since t_k → ∞ and both x_n and f are bounded, it follows that lim sup_{k→∞} |B(t_k)|/t_k ≥ v > 0. This shows that the sample path B cannot satisfy the law of large numbers, hence B ∈ Z.
Second inclusion (B.11). Similarly, let us now consider B ∈ A^{x_n}_{f,δ,T} \ A^x_{f,δ,T} i.o., meaning that there exists a diverging sequence n_k such that

|B(t) + x_{n_k} − f(T + t)| < δ + vt (B.13)

for all t ≥ 0, and a set of times t_k ≥ 0 such that

|B(t_k) + x − f(T + t_k)| ≥ δ + v t_k . (B.14)

Taking k → ∞ in Eq. (B.13) yields the nonstrict inequality |B(t) + x − f(T + t)| ≤ δ + vt. Repeating our analysis of Eq. (B.12) on Eq. (B.14) also shows that equality must hold at some time τ, i.e., B ∈ B^x_{f,δ,T}, or else B ∈ Z, so we obtain the sought inclusion (B.11).
4. Boundary-crossing law

Let us introduce the boundary curves

γ±(t) = f(T + t) − x ± (δ + vt) . (B.15)

They inherit the continuity of f, and since x lies within the bottleneck interval at t = T they also satisfy the condition γ−(0) < 0 < γ+(0). The boundary-crossing law that we shall derive states that a Brownian path touching one of these curves must a.s. also cross it. This provides the sought estimate of P(B^x_{f,δ,T}). Indeed, observing that the boundary-hitting event (B.9) implies either B(t) ≤ γ+(t) with equality at some time, or else B(t) ≥ γ−(t) with equality at some time, we find that P(B^x_{f,δ,T}) is bounded by the probabilities of these two hitting events.

In what follows we fix τ < ∞ and we will eventually let τ → 0. The hitting (without crossing) event M_∞ = 0 implies that either B(t) reaches γ+(t) before t = τ, which corresponds to the event M_τ = 0, or else the increment B′(t′) = B(τ + t′) − B(τ) reaches γ+(τ + t′) − B(τ) for t′ = t − τ > 0, which corresponds to the hitting event for the shifted boundary. The limit event M_τ = 0 i.o. (as τ → 0) means explicitly that B(t) hits γ+(t) for arbitrarily small times t > 0, which implies that lim_{t→0} B(t) = γ+(0) > 0 = B(0). Thus B cannot be continuous at t = 0, so the probability that M_τ = 0 occurs i.o. is zero. The second term vanishes for all 0 < τ < ∞. Noting that the increment process B′ (and thus W) is independent of B(τ) and applying the Fubini-Tonelli theorem, we can write explicitly