Inference on local causality and tests of non-causality in time series

The study of causal relationships in a stochastic process $(Y_t, Z_t)_{t\in\mathbb{Z}}$ is of particular interest in finance and economics. A widely used approach relies on the notion of Granger causality, which, for first-order Markov processes, is based on the joint distribution function of $(Y_{t+1}, Z_t)$ given $Y_t$. The measures of Granger causality proposed so far are global, in the sense that if the relationship between $Y_{t+1}$ and $Z_t$ changes with the value taken by $Y_t$, this may not be captured. To circumvent this limitation, this paper proposes local causality measures based on the conditional copula of $(Y_{t+1}, Z_t)$ given $Y_t = x$. Exploiting results of [5] on the asymptotic behavior of two kernel-based conditional copula estimators under α-mixing, the asymptotic normality of nonparametric estimators of these local measures is deduced and asymptotically valid confidence intervals are built; tests of local non-causality are also developed. The suitability of the proposed methods is investigated through simulations, and their usefulness is illustrated on the time series of Standard & Poor's 500 prices and trading volumes. MSC 2010 subject classifications: Primary 62G99; secondary 62M99.


Introduction
T. Bouezmarni et al.
The concept of causality as originally introduced by [27] and [15] is helpful for studying the dynamic relationships in multivariate time series. This notion is defined in terms of the predictability, at horizon one, of a random variable (or random vector) $Y$ from its own past and the past of another random variable (or vector) $Z$. Specifically, assume that data are available for the process $(Y_t, Z_t)_{t\in\mathbb{Z}}$, and let $\mathbf{Y}^t$ and $\mathbf{Z}^t$ denote the histories of $Y$ and $Z$, respectively, up to time $t$. According to [15], causality from $Z$ to $Y$ one period ahead is defined as follows: $Z$ is said to cause $Y$ if $\mathbf{Z}^t$ helps predict $Y_{t+1}$ given $\mathbf{Y}^t$.
Many works have considered testing the null hypothesis of non-causality. For example, [4] investigated tests for multivariate ARMA models. Because Granger non-causality is a form of conditional independence, tests can be deduced from standard conditional independence tests; see [9], for instance. Tests based on copulas have been proposed by [6]. When the hypothesis of non-causality is rejected, one may be interested in measuring the strength of the causal relationship. The first causality measures were proposed by [11] and [12] using mean-squared forecast errors, and by [14] using the Kullback-Leibler information. Causality measures for the short and long run under parametric models were investigated by [23] and [7]. Motivated mainly by the fact that these measures are sensitive to model misspecification, nonparametric measures were proposed by [24] and [16] using the Kullback-Leibler information and nonparametric copula density estimators. Recently, [28] investigated causality measures at multiple horizons for exchange rates and commodity prices.
It is worth mentioning that all of the above-cited papers focus on characterizing the global relationship between $Y_{t+1}$ and $Z_t$, conditional on $Y_t$. However, if the nature of the link between $Y_{t+1}$ and $Z_t$ changes with the value taken by $Y_t$, global measures may not capture it. One possible remedy is to compute the partial correlation coefficient. However, such an approach implicitly assumes linear relationships, and the resulting measure depends on the marginal distributions. In other words, this strategy suffers from the same well-documented drawbacks as the classical Pearson correlation coefficient.
To circumvent these limitations, this paper proposes marginal-free local causality indices for measuring the strength of the relationship in the pair $(Y_{t+1}, Z_t)$ given a particular value taken by $Y_t$. To simplify the presentation, the focus is on first-order Markov models. For these models, one considers the dependence structure of $(Y_{t+1}, Z_t)$ given $Y_t = x\in\mathbb{R}$, as captured by its bivariate conditional copula. This approach allows for the definition of nonparametric measures of local causality that avoid the drawbacks of partial correlations. Specifically, let $\{(Y_t, Z_t)\}_{t\in\mathbb{Z}}$ be a stationary process and define the local causality distribution function

$$H_x^{Z\to Y}(y,z) = P(Y_{t+1}\le y,\, Z_t\le z \mid Y_t = x). \qquad (1)$$

Assume that the conditional marginal distributions $F_{1x}(y) = P(Y_{t+1}\le y\mid Y_t = x)$ and $F_{2x}(z) = P(Z_t\le z\mid Y_t = x)$ are continuous. Then Sklar's theorem guarantees the existence of a unique copula $C_x^{Z\to Y}$ such that

$$H_x^{Z\to Y}(y,z) = C_x^{Z\to Y}\{F_{1x}(y), F_{2x}(z)\}.$$

The bivariate function $C_x^{Z\to Y}$ will be called the local causality copula; it is the dependence structure of $(Y_{t+1}, Z_t)$ given $Y_t = x$.

Two estimators
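Consider a stationary process $\{(Y_t, Z_t)\}_{t\in\mathbb{Z}}$ that is α-mixing, i.e. its α-mixing coefficients satisfy $\alpha(r)\to 0$ as $r\to\infty$. Recall that the latter are defined by $\alpha(0) = 1/2$ and, for each lag $r\in\mathbb{N}$,

$$\alpha(r) = \sup_{s\in\mathbb{Z}}\ \sup_{A\in\mathcal{F}_{-\infty}^{s},\, B\in\mathcal{F}_{s+r}^{\infty}} |P(A\cap B) - P(A)P(B)|,$$

where $\mathcal{F}_a^b$ is the σ-field generated by $\{(Y_t, Z_t)\}_{a\le t\le b}$. In the sequel, one assumes a realization $(Y_1, Z_1), \ldots$ …

A first estimator of the local causality copula arises upon noting that $C_x^{Z\to Y}$ can be extracted from $H_x^{Z\to Y}$ via

$$C_x^{Z\to Y}(u,v) = H_x^{Z\to Y}\{F_{1x}^{-1}(u), F_{2x}^{-1}(v)\}, \qquad (u,v)\in[0,1]^2.$$

An estimator of the joint conditional distribution $H_x^{Z\to Y}$ defined in (1) then provides a plug-in estimator of $C_x^{Z\to Y}$. In this paper, one considers estimators based on the local linear weights $\omega_i(x,h)$ defined for each $i\in\{1,\ldots,n\}$ by … (2). Here $h = h_n > 0$ is a bandwidth parameter that tends to zero as the sample size $n\to\infty$, $K$ is a symmetric density and, for $\ell\in\{0,1,2\}$, $S_{n,\ell}(x,h)$ is given by … The local linear estimator of $H_x^{Z\to Y}$ is then defined by

$$\widehat H_{xh}^{Z\to Y}(y,z) = \sum_{i=1}^n \omega_i(x,h)\,\mathbb{I}(Y_{i+1}\le y,\, Z_i\le z).$$

The reader is referred to [8] for details on how to derive this kind of estimator. A natural plug-in estimator of $C_x^{Z\to Y}$ is then given by

$$\widehat C_{xh}^{Z\to Y}(u,v) = \widehat H_{xh}^{Z\to Y}\{\widehat F_{1xh}^{-1}(u), \widehat F_{2xh}^{-1}(v)\},$$

where $\widehat F_{1xh}^{-1}$ and $\widehat F_{2xh}^{-1}$ are the left-continuous generalized inverses of

$$\widehat F_{1xh}(y) = \widehat H_{xh}^{Z\to Y}(y,\infty) \quad\text{and}\quad \widehat F_{2xh}(z) = \widehat H_{xh}^{Z\to Y}(\infty, z). \qquad (5)$$

As noted by [26] and [13] in the i.i.d. case, the plug-in estimator $\widehat C_{xh}^{Z\to Y}$ may be severely biased, especially when the conditional marginal distributions strongly depend on the covariate. For that reason, a second estimator that aims at removing this possible effect of the covariate on the margins is proposed. To this end, let $h_1, h_2 > 0$ be bandwidth parameters that may differ from $h$, and define for each $i\in\{1,\ldots,n\}$ the pair of pseudo-observations

$$(\widetilde U_i, \widetilde V_i) = \big(\widehat F_{1Y_ih_1}(Y_{i+1}),\, \widehat F_{2Y_ih_2}(Z_i)\big), \qquad (6)$$

where $\widehat F_{1Y_ih_1}$ and $\widehat F_{2Y_ih_2}$ follow from the definition in (5). Their associated empirical conditional joint distribution is then

$$\widetilde G_{xh}(u,v) = \sum_{i=1}^n \omega_i(x,h)\,\mathbb{I}(\widetilde U_i\le u,\, \widetilde V_i\le v),$$

and the second estimator is $\widetilde C_{xh}^{Z\to Y}(u,v) = \widetilde G_{xh}\{\widetilde G_{1xh}^{-1}(u), \widetilde G_{2xh}^{-1}(v)\}$. Here $\widetilde G_{1xh}^{-1}$ and $\widetilde G_{2xh}^{-1}$ are the left-continuous generalized inverses of $\widetilde G_{1xh}(u) = \widetilde G_{xh}(u,1)$ and $\widetilde G_{2xh}(v) = \widetilde G_{xh}(1,v)$.

Remark 2.1. Although this is not pursued further in this paper, copula-based local causality could be extended beyond horizon one.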
In other words, one could consider the local causality copula that arises from the conditional distribution of …; see [19], for instance, for more details on multivariate local linear weights.
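The two estimators can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The weights use a standard local linear form (see [8]), which is assumed rather than copied from (2). The Epanechnikov kernel, the bandwidths and the simulated autoregression are hypothetical choices for the example. So are the function names `local_linear_weights`, `weighted_quantile`, `plug_in_copula` and `pseudo_observations`. The second estimator reuses the plug-in routine on the pseudo-observations (6), because their empirical conditional joint distribution has the same weighted form as the estimator of $H_x^{Z\to Y}$.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) on [-1, 1]."""
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)


def local_linear_weights(x, covariate, h, kernel=epanechnikov):
    """Standard local linear weights at x (assumed form); they sum to one."""
    d = (covariate - x) / h
    k = kernel(d)
    s1, s2 = np.sum(k * d), np.sum(k * d**2)
    raw = k * (s2 - d * s1)
    denom = raw.sum()  # equals S0*S2 - S1^2 >= 0, up to a common factor
    if denom > 1e-14:
        return raw / denom
    # Guard added for this sketch: fewer than two distinct points in the window.
    if k.sum() <= 0.0:
        raise ValueError("no observation in the kernel window; increase h")
    return k / k.sum()  # fall back to Nadaraya-Watson weights


def weighted_quantile(values, weights, u):
    """inf{y : sum_i w_i 1{values_i <= y} >= u}; weights may be negative."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    idx = np.flatnonzero(cum >= u - 1e-12)[0]  # first sorted point reaching u
    return values[order][idx]


def plug_in_copula(u, v, first, second, w):
    """H(F1^{-1}(u), F2^{-1}(v)) for the weighted empirical law of (first, second)."""
    q1 = weighted_quantile(first, w, u)
    q2 = weighted_quantile(second, w, v)
    return float(np.sum(w * ((first <= q1) & (second <= q2))))


def pseudo_observations(y_next, z, y_cur, h1, h2):
    """(U_i, V_i) = (F_{1,Y_i,h1}(Y_{i+1}), F_{2,Y_i,h2}(Z_i)), margins smoothed at Y_i."""
    U = np.empty(len(y_cur))
    V = np.empty(len(y_cur))
    for i, yi in enumerate(y_cur):
        U[i] = np.sum(local_linear_weights(yi, y_cur, h1) * (y_next <= y_next[i]))
        V[i] = np.sum(local_linear_weights(yi, y_cur, h2) * (z <= z[i]))
    return U, V


# Toy data (hypothetical): Y_{t+1} = 0.5 Y_t + 0.5 Z_t + noise, Z_t i.i.d. N(0, 1).
rng = np.random.default_rng(1)
n = 500
z_all = rng.standard_normal(n + 1)
y_all = np.zeros(n + 1)
for t in range(n):
    y_all[t + 1] = 0.5 * y_all[t] + 0.5 * z_all[t] + rng.standard_normal()
y_cur, y_next, z = y_all[:-1], y_all[1:], z_all[:-1]

x, h = 0.0, 0.5
w = local_linear_weights(x, y_cur, h)
c_hat = plug_in_copula(0.5, 0.5, y_next, z, w)          # first (plug-in) estimator
U, V = pseudo_observations(y_next, z, y_cur, 0.5, 0.5)
c_tilde = plug_in_copula(0.5, 0.5, U, V, w)             # second estimator
print(round(c_hat, 3), round(c_tilde, 3))
```

Local linear weights can be negative, so $\widehat F_{1xh}$ need not be monotone. The generalized inverse is therefore taken as the first sorted point at which the weighted cumulative sum reaches $u$, which matches the definition $\inf\{y : F(y)\ge u\}$.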

Weak convergence
In their paper, [5] obtained the weak convergence of two local linear estimators of conditional copulas from a realization of a general three-dimensional α-mixing process $(\xi_{1t}, \xi_{2t}, \xi_{3t})_{t\in\mathbb{Z}}$, where $\xi_{3t}$ is the conditioning variable, i.e. the covariate. The setup here is therefore a special case of [5] with $\xi_{1t} = Y_{t+1}$, $\xi_{2t} = Z_t$ and $\xi_{3t} = Y_t$. Before describing the results, let $f_Y$ be the density associated with the stationary process $\{Y_t\}_{t\in\mathbb{Z}}$ and define … Moreover, let $\alpha_x^{Z\to Y}$ be a Gaussian process on $[0,1]^2$ such that … for κ defined in Assumption (N) and $\ddot H_x^{Z\to Y}$ … As a special case of Proposition 2 in [5], and for a fixed value of $x\in\mathbb{R}$, one can then conclude the following. Under Assumptions (S), (C), (L), (H) and (N) described in Appendix A.1, the process $\sqrt{nh}\,(\widehat C_{xh}^{Z\to Y} - C_x^{Z\to Y})$ converges … where $C_x^{[1]}(u,v) = \partial C_x^{Z\to Y}(u,v)/\partial u$ and $C_x^{[2]}(u,v) = \partial C_x^{Z\to Y}(u,v)/\partial v$. Before stating the result on the weak convergence of $\sqrt{nh}\,(\widetilde C_{xh}^{Z\to Y} - C_x^{Z\to Y})$, … and whose covariance function is the same as that of $\alpha_x^{Z\to Y}$. Then one can invoke Proposition 3 of [5] and conclude that, as long as Assumptions (S), (C), (L), (H′) and (N′) in Appendix A.2 hold, and for a fixed value of $x\in\mathbb{R}$, $\sqrt{nh}\,(\widetilde C_{xh}^{Z\to Y} - C_x^{Z\to Y})$ converges … As noted by [5] in the general case, the asymptotic behavior of the processes based on $\widehat C_{xh}^{Z\to Y}$ and $\widetilde C_{xh}^{Z\to Y}$ under α-mixing is the same as under serial independence. In other words, their respective limits are the same as those identified by [26]. Thus, the impact of time dependence is asymptotically negligible. This is essentially a consequence of Assumption (S) on the α-mixing coefficients, combined with a kernel function that smooths the covariate space in a neighborhood of $x$ shrinking as $n\to\infty$. Observe that when $\widehat C_{xh}^{Z\to Y}$ and $\widetilde C_{xh}^{Z\to Y}$ are computed with the same bandwidth $h$, the two limit processes have the same covariance structure, given implicitly by … Hence, unlike for $\widehat C_{xh}^{Z\to Y}$, the asymptotic bias of $\widetilde C_{xh}^{Z\to Y}$ does not include the partial derivatives of $F_{1x}$ and $F_{2x}$ with respect to $x$.

Theoretical measures of local causality
Measuring the strength of the causal relationship from $Z$ to $Y$ in a stationary process $\{(Y_t, Z_t)\}_{t\in\mathbb{Z}}$ can be done from the information provided by the local causality copula. Specifically, consider a general functional $\Lambda:\ell^\infty([0,1]^2)\to\mathbb{R}$ satisfying $\Lambda(\Pi) = 0$, $\Lambda(M) = 1$ and $\Lambda(W) = -1$. Here $\Pi(u,v) = uv$, $M(u,v) = \min(u,v)$ and $W(u,v) = \max(u+v-1, 0)$ are, respectively, the independence, perfect positive dependence and perfect negative dependence copulas. A measure of local causality from $Z$ to $Y$ at $x$ based on $\Lambda$ is then

$$\theta_{\Lambda,x}^{Z\to Y} = \Lambda(C_x^{Z\to Y}). \qquad (8)$$

This measure is marginal-free, since it depends on the conditional distribution of $(Y_{t+1}, Z_t)$ given $Y_t = x$ only through its copula.

Nonparametric estimators
The estimation of the local causality index $\theta_{\Lambda,x}^{Z\to Y}$ defined in Equation (8) … i.e. the operator inside the brackets is computed with respect to the first two arguments of $\gamma$. $\Lambda$ is then applied to the resulting function $(u', v')\mapsto\Lambda\{\gamma(\cdot,\cdot,u',v')\}\in\ell^\infty([0,1]^2)$.

The Kendall and Spearman measures of causality
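It is well known (see [22] and [17] for details) that Spearman's rho and Kendall's tau of a continuous random pair with unique copula $C$ can be expressed as

$$\rho_S = 12\int_0^1\!\!\int_0^1 C(u,v)\,\mathrm{d}u\,\mathrm{d}v - 3 \quad\text{and}\quad \tau = 4\int_0^1\!\!\int_0^1 C(u,v)\,\mathrm{d}C(u,v) - 1.$$

Thus, these measures are marginal-free and can be expressed as functionals of $C$ via $\rho_S = \Lambda_\rho(C)$ and $\tau = \Lambda_\tau(C)$, where for $\delta\in\ell^\infty([0,1]^2)$,

$$\Lambda_\rho(\delta) = 12\int_0^1\!\!\int_0^1 \delta(u,v)\,\mathrm{d}u\,\mathrm{d}v - 3 \quad\text{and}\quad \Lambda_\tau(\delta) = 4\int_0^1\!\!\int_0^1 \delta(u,v)\,\mathrm{d}\delta(u,v) - 1.$$

Local causality indices based on the Spearman and Kendall measures are then $\rho_x^{Z\to Y} = \Lambda_\rho(C_x^{Z\to Y})$ and $\tau_x^{Z\to Y} = \Lambda_\tau(C_x^{Z\to Y})$. … The conclusions of Proposition 3.1 apply to the Spearman and Kendall functionals $\Lambda_\rho$ and $\Lambda_\tau$, since both are Hadamard differentiable, with respective derivatives

$$\Lambda'_{\rho,g}(\delta) = 12\int_0^1\!\!\int_0^1 \delta(u,v)\,\mathrm{d}u\,\mathrm{d}v \qquad (9)$$

and, from [26],

$$\Lambda'_{\tau,g}(\delta) = 8\int_0^1\!\!\int_0^1 \delta(u,v)\,\mathrm{d}g(u,v). \qquad (10)$$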
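The sketch below gives a quick numerical sanity check of the normalisation $\Lambda(\Pi) = 0$, $\Lambda(M) = 1$ and $\Lambda(W) = -1$ for the Spearman and Kendall functionals, evaluated on a grid. It is an illustration only: the grid size `m` and the finite-difference step `eps` are arbitrary choices. $\Lambda_\tau$ is computed through the integration-by-parts identity $\tau = 1 - 4\int\!\!\int \partial_u C\,\partial_v C\,\mathrm{d}u\,\mathrm{d}v$, which holds for any copula.

```python
import numpy as np


def spearman_functional(C, m=400):
    """Lambda_rho(C) = 12 * int int C(u, v) du dv - 3 (midpoint rule, m x m grid)."""
    g = (np.arange(m) + 0.5) / m
    U, V = np.meshgrid(g, g, indexing="ij")
    return 12.0 * C(U, V).mean() - 3.0


def kendall_functional(C, m=400, eps=1e-6):
    """Lambda_tau(C) via tau = 1 - 4 int int dC/du * dC/dv du dv (central differences)."""
    g = (np.arange(m) + 0.5) / m
    U, V = np.meshgrid(g, g, indexing="ij")
    Cu = (C(U + eps, V) - C(U - eps, V)) / (2.0 * eps)
    Cv = (C(U, V + eps) - C(U, V - eps)) / (2.0 * eps)
    return 1.0 - 4.0 * (Cu * Cv).mean()


def Pi(u, v):
    return u * v                      # independence copula


def M(u, v):
    return np.minimum(u, v)           # perfect positive dependence


def W(u, v):
    return np.maximum(u + v - 1.0, 0.0)  # perfect negative dependence


for name, C in [("Pi", Pi), ("M", M), ("W", W)]:
    print(name, round(spearman_functional(C), 4), round(kendall_functional(C), 4))
```

For $M$ and $W$, the partial derivatives jump along a line, so the grid value of $\Lambda_\tau$ is off by roughly $1/m$. The Spearman value converges much faster because its integrand is continuous.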

Estimation of asymptotic biases/variances and bandwidth selection
In order to develop valid inference methods for a given local causality measure $\theta_{\Lambda,x}^{Z\to Y}$, the results stated in Proposition 3.1 cannot be used directly. The asymptotic variance $\sigma^2_{\Lambda,x}$ as well as the asymptotic biases $\mu_{\Lambda,x}$ and $\widetilde\mu_{\Lambda,x}$ must first be estimated. Another important issue is the choice of an appropriate bandwidth with respect to some optimality criterion. These topics are treated in this section.

Estimation of the asymptotic variances
First observe that the limit process of $\sqrt{nh}\,(\widehat C_{xh}^{Z\to Y} - C_x^{Z\to Y})$ can be seen as the weak limit of … where $\omega_i(x,h)$ is defined in (2) and, for each $i\in\{1,\ldots,n\}$, … in Proposition 3.1 (i) arises as the limit in law of …; the last equality is a consequence of the linearity of Hadamard derivatives (see [25]). However, the conditional marginal distributions $F_{1x}$, $F_{2x}$ and the partial derivatives $C_x^{[1]}$, $C_x^{[2]}$ are unknown. One therefore considers instead the version of $L_{xi}$ given by … In the above expression, $\widehat C_x^{[1]}$ and $\widehat C_x^{[2]}$ are uniformly consistent estimators of the partial derivatives of $C_x^{Z\to Y}$, in the sense that, for any ε > 0, … converge in probability to zero. (11) A version of $\lambda_{xh}$ is therefore given by … Based on the idea of [8] for the estimation of a conditional variance, the proposed estimator $\widehat\sigma^2_{\Lambda,x}$ of $\sigma^2_{\Lambda,x}$ is given by … Its consistency is stated next.

Proposition 4.1. Assume that Λ is Hadamard differentiable with derivative at $g$ given by $\Lambda'_g$ and that Assumptions (S), (C), (L), (H) and (N) hold. Moreover, suppose that $\widehat C_x^{[1]}$ and $\widehat C_x^{[2]}$ satisfy (11) and that there exists a constant $D > 0$ such that, as $n\to\infty$, … (12) Then $\widehat\sigma^2_{\Lambda,x}$ converges in probability to $\sigma^2_{\Lambda,x}$.

An alternative estimator of $\sigma^2_{\Lambda,x}$ can be based on the approximate version of the limit process built from the pseudo-observations. Specifically, let …

Estimation of the asymptotic biases
The proposed estimator of the asymptotic bias $\mu_{\Lambda,x}$ follows an idea similar to that of [21], based on local polynomial regression. Specifically, for some integer $p\ge 2$ and some bandwidth $h_B > 0$, consider … Then an estimator of … Roughly speaking, the rationale for the above estimator is as follows. The theory of local polynomial smoothing allows one to deduce that $\widehat\beta_{\Lambda,j}$ consistently estimates the $j$-th derivative of the mapping … For any $z$ in a neighbourhood of $x$, one then has … The consistency of $\widehat\mu_{\Lambda,x}$ as an estimator of $\mu_{\Lambda,x}$ is established next.
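The local polynomial step can be sketched as a weighted least-squares fit. The sketch is generic: the response is any mapping evaluated at the covariate values, because the exact mapping used for $\widehat\mu_{\Lambda,x}$ is not recoverable from this section. The Epanechnikov weights and the function name are assumptions. The regressors are the unscaled powers $(Y_i - x)^j$, so $j!\,\widehat\beta_j$ estimates the $j$-th derivative. If the regressors are instead scaled by $h_B$, the coefficient must also be divided by $h_B^j$.

```python
import numpy as np
from math import factorial


def local_polynomial_derivatives(x, covariate, response, h_b, p=3):
    """Degree-p local polynomial fit at x with Epanechnikov weights.

    Regresses response on (covariate - x)^j, j = 0..p, by weighted least
    squares and returns j! * beta_j, an estimate of the j-th derivative at x.
    """
    d = covariate - x
    k = 0.75 * np.clip(1.0 - (d / h_b) ** 2, 0.0, None)
    X = np.vander(d, N=p + 1, increasing=True)  # columns 1, d, ..., d^p
    sw = np.sqrt(k)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], response * sw, rcond=None)
    return np.array([factorial(j) * beta[j] for j in range(p + 1)])


# Noiseless check: m(t) = 1 + 2t + 3t^2 - t^3, so m''(t) = 6 - 6t.
t = np.linspace(-1.0, 1.0, 201)
m = 1.0 + 2.0 * t + 3.0 * t**2 - t**3
derivs = local_polynomial_derivatives(0.2, t, m, h_b=0.5, p=3)
print(derivs)  # approximately [m(0.2), m'(0.2), m''(0.2), m'''(0.2)]
```

With $p = 3$ (the value retained in Section 6) a cubic is reproduced exactly, which is why the noiseless check recovers all four derivatives.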

Proposition 4.2. Under the conditions of Proposition 4.1, and if, in addition, …
The asymptotic bias $\widetilde\mu_{\Lambda,x}$ of the estimator $\widetilde\theta_{\Lambda,xh}^{Z\to Y}$ based on $\widetilde C_{xh}^{Z\to Y}$ can be estimated similarly, by using $\widetilde\lambda_{xi}$ instead of $\widehat\lambda_{xi}$; the resulting estimator is used in the sequel. Its consistency can be established as in the proof of Proposition 4.2, again with Assumptions (H) and (N) replaced by Assumptions (H′) and (N′). Note that Assumption ($H_B$) is no longer required, since the marginal distributions have been uniformized.

A note on the estimation of Hadamard derivatives
When the functional Λ is linear (or affine, like the Spearman functional), its Hadamard derivative is free of $g$, i.e. $\Lambda'_g = \Lambda'$ for all $g$; see Equation (9). Otherwise, $\Lambda'_g$ needs to be estimated. In that case, if one can find an … then the conclusions of Proposition 4.1 and Proposition 4.2 remain valid. This situation occurs for the Kendall functional. In the light of (10), a natural plug-in estimator of its derivative is given by

$$\widehat\Lambda'_{\tau}(\delta) = 8\int_0^1\!\!\int_0^1 \delta(u,v)\,\mathrm{d}\widehat C_{xh}^{Z\to Y}(u,v).$$

When working with the second estimator $\widetilde\tau_{xh}^{Z\to Y}$, one uses $\widetilde C_{xh}^{Z\to Y}$ instead of $\widehat C_{xh}^{Z\to Y}$ in the above expression.

Data-driven algorithms for the selection of optimal bandwidths
The computation of the estimator $\widehat\theta_{\Lambda,xh}^{Z\to Y}$ of the local causality index $\theta_{\Lambda,x}^{Z\to Y}$ requires the choice of the bandwidth $h$. From Proposition 3.1 (i), one deduces that … is approximately Normal with variance $\sigma^2_{\Lambda,x}/(nh)$ and mean … A strategy described, for instance, by [10] and [18] is to select the bandwidth that minimises the asymptotic mean-squared error, which in the current context is … It is then easy to see that the minimum of $\mathrm{AMSE}_{\Lambda,x}$ is achieved when … (13). Since $\sigma^2_{\Lambda,x}$ and $\Lambda'_{C_x^{Z\to Y}}(B_x)$ are unknown, one can first estimate them with pilot bandwidths. The estimators of subsection 4.1 and subsection 4.2, namely $\widehat\sigma^2_{\Lambda,x}$ and $\widehat\beta_{\Lambda,2}/h_B^2$, serve this purpose. These values are then plugged into the expression for $\widehat h_{\Lambda,x}$ in (13). The procedure can then be repeated recursively until some convergence criterion is reached; see [10], for instance. Specifically, let cst be a constant whose value remains bounded as $n$ increases and fix $p\ge 2$. Then proceed as follows: (1)-(3) … $\widehat\sigma^2_{\Lambda,x}$ and $\widehat\beta_{\Lambda,2}$; (4) repeat Steps 1-3 with $h_0 = \widehat h_{\Lambda,x}$ until a convergence criterion is met or a maximum number of iterations is reached.
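The following is a sketch under an assumption not stated explicitly in the surviving text. Suppose the approximate mean is $h^2 b$, with $b = \Lambda'_{C_x^{Z\to Y}}(B_x)$, and the variance is $\sigma^2_{\Lambda,x}/(nh)$. The AMSE is then $h^4 b^2 + \sigma^2_{\Lambda,x}/(nh)$. Setting its derivative $4h^3b^2 - \sigma^2_{\Lambda,x}/(nh^2)$ to zero gives $h = \{\sigma^2_{\Lambda,x}/(4nb^2)\}^{1/5}$. The code implements this closed form and the recursive updating of the algorithm. `estimate_sigma2` and `estimate_b` are hypothetical callables standing for the estimators of subsections 4.1 and 4.2. The default tolerance mirrors the criterion $5\times 10^{-4}$ used in the simulations.

```python
def amse(h, sigma2, b, n):
    """Assumed AMSE: squared bias (h^2 b)^2 plus variance sigma2 / (n h)."""
    return (h**2 * b) ** 2 + sigma2 / (n * h)


def amse_optimal_h(sigma2, b, n):
    """Minimiser of amse(): h = (sigma2 / (4 n b^2))^(1/5); requires b != 0."""
    return (sigma2 / (4.0 * n * b**2)) ** 0.2


def iterate_bandwidth(h0, n, estimate_sigma2, estimate_b, tol=5e-4, max_iter=50):
    """Re-estimate sigma2 and b at the current bandwidth, update h, repeat.

    estimate_sigma2 and estimate_b are hypothetical callables h -> estimate.
    """
    for _ in range(max_iter):
        h_new = amse_optimal_h(estimate_sigma2(h0), estimate_b(h0), n)
        if abs(h_new - h0) < tol:
            return h_new
        h0 = h_new
    return h0


# Toy usage with constant "estimates" (sigma2 = 1, b = 2, n = 1000).
h_star = iterate_bandwidth(1.0, 1000, lambda h: 1.0, lambda h: 2.0)
print(h_star, amse_optimal_h(1.0, 2.0, 1000))
```

Because $h^4b^2 + c/h$ is strictly convex for $h > 0$, the closed form is the unique minimiser. In practice the iteration only needs to track how the pilot estimates change with $h$.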
It can readily be verified that the above algorithm ensures that $h_0$ and $\widehat h_{\Lambda,x}$ satisfy Assumption (N), while $h_B$ satisfies Assumption ($N_B$). In practice, cst is set to the interquartile range of $Y_1,\ldots,Y_n$, as suggested, for instance, by [13]. The procedure is more involved for the estimator $\widetilde\theta_{\Lambda,xh}^{Z\to Y}$, since one first needs to select the bandwidths $h_1, h_2$ required for the pseudo-observations in Equation (6). To this end, one first selects $h_1$ and $h_2$ optimally, following the approach suggested by [13] for estimating the conditional marginal distributions $\widehat F_{1xh_1}$ and $\widehat F_{2xh_2}$. Then, conditionally on these choices of $h_1$ and $h_2$, the pseudo-observations in (6) are computed. The previously described algorithm is then run using the estimators of the asymptotic variance and bias of $\widetilde\theta_{\Lambda,xh}^{Z\to Y}$.

Confidence intervals
A confidence interval of level $1-\alpha$ for $\theta_{\Lambda,x}^{Z\to Y}$ based on the asymptotic normality of $\widehat\theta_{\Lambda,xh}^{Z\to Y}$ established in Proposition 3.1 (i) is … Here $\Phi^{-1}$ is the inverse cumulative distribution function of the standard Normal and $z_\alpha = \Phi^{-1}(1-\alpha)$. Proposition 4.1 and Proposition 4.2 on the consistency of $\widehat\sigma_{\Lambda,x}$ and $\widehat\mu_{\Lambda,x}$ ensure that this interval is asymptotically of level $1-\alpha$. … The coverage probabilities of $\widehat{\mathrm{CI}}_{\alpha,\Lambda,x}$ and $\widetilde{\mathrm{CI}}_{\alpha,\Lambda,x}$ for small and moderate sample sizes are investigated in Section 6.
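The exact display of the interval did not survive extraction, so both its form and the function name below are assumptions. The sketch is a bias-corrected Wald interval of the usual form implied by $\sqrt{nh}\,(\widehat\theta - \theta)\approx N(\mu, \sigma^2)$, namely $\widehat\theta - \widehat\mu/\sqrt{nh} \pm z_{\alpha/2}\,\widehat\sigma/\sqrt{nh}$, where $z_{\alpha/2}$ follows the notation above.

```python
import math
from statistics import NormalDist


def bias_corrected_ci(theta_hat, mu_hat, sigma_hat, n, h, level=0.95):
    """theta_hat - mu_hat/sqrt(nh) -/+ z_{alpha/2} * sigma_hat/sqrt(nh) (assumed form)."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2} = Phi^{-1}(1 - alpha/2)
    scale = math.sqrt(n * h)
    center = theta_hat - mu_hat / scale            # subtract the estimated bias
    half_width = z * sigma_hat / scale
    return center - half_width, center + half_width


# Hypothetical inputs: nh = 100, so the bias shift is mu_hat / 10.
lo, hi = bias_corrected_ci(theta_hat=0.30, mu_hat=1.0, sigma_hat=1.0, n=1000, h=0.1)
print(lo, hi)
```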

Testing for local non-causality
Saying that there is no local causality from $Z$ to $Y$ at $x$ means that the local causality copula $C_x^{Z\to Y}$ is the independence copula Π. In that case, the limit covariance structure of … The following result is a special case of Proposition 3.1 and is stated without proof.

Proposition 5.1. … is asymptotically Normal with mean $\mu_{\Lambda,x}$ and variance …

Similarly, as long as (S), (C), (L), (H′) and (N′) hold,
$\sqrt{nh}\,\widetilde\theta_{\Lambda,xh}^{Z\to Y}$ is asymptotically Normal with mean $\widetilde\mu_{\Lambda,x}$ and variance $\sigma^2_{\Lambda,\Pi,x}$. Proposition 5.1 can be exploited to test the null hypothesis of local non-causality from $Z$ to $Y$ at $x$. In that case, the null and alternative hypotheses are … are known values in the expression for $\sigma^2_{\Lambda,\Pi,x}$, an estimator of the asymptotic variance under local non-causality is … Tests based on the statistics $\widehat\theta_{\Lambda,xh}^{Z\to Y}$ and $\widetilde\theta_{\Lambda,xh}^{Z\to Y}$ then reject the null hypothesis of local non-causality whenever … For the Spearman measure of local causality, one has from (9) that … while for the Kendall measure, (10) entails … These values correspond to the well-known asymptotic variances of Spearman's rho and Kendall's tau under independence.
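The test can be sketched in the same spirit. The statistic $T = (\sqrt{nh}\,\widehat\theta - \widehat\mu)/\widehat\sigma_{\Pi}$, with a two-sided Normal p-value, is an assumed form. It is consistent with the asymptotic Normal law under local non-causality described above. The null variance is passed in as an argument because its exact expression is not recoverable here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def local_noncausality_test(theta_hat, mu_hat, sigma2_null, n, h, alpha=0.05):
    """Two-sided test of H0 (assumed statistic form):
    T = (sqrt(nh) * theta_hat - mu_hat) / sqrt(sigma2_null), approx. N(0, 1) under H0."""
    t_stat = (math.sqrt(n * h) * theta_hat - mu_hat) / math.sqrt(sigma2_null)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, p_value < alpha


# Hypothetical numbers: nh = 100, no bias correction, unit null variance.
print(local_noncausality_test(theta_hat=0.30, mu_hat=0.0, sigma2_null=1.0, n=1000, h=0.1))
```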

Preliminaries
The goal of this section is to investigate the performance of the methods introduced in this work when local causality is measured with the Spearman and Kendall functionals. As suggested by [13], modified versions of Kendall's measure have been employed, namely … and similarly for $\widetilde\tau_{xh}^{Z\to Y}$. These versions are asymptotically equivalent to those defined from the functional $\Lambda_\tau$. The results reported below have been obtained using the local linear weights in (2) based on the Epanechnikov kernel $K(t) = \frac{3}{4}(1-t^2)\,\mathbb{I}(|t|\le 1)$. The latter satisfies Assumption (L), and one can show that $a_1 = 1/10$ and $a_2 = 3/5$. The partial derivatives $C_x^{[1]}$ and $C_x^{[2]}$ must be estimated in order to estimate the asymptotic biases and variance. Their estimation is based on the finite-difference estimators … and … The latter are uniformly consistent and fulfill condition (12) needed in Proposition 4.1 and Proposition 4.2. For the estimation of the biases, numerous experiments suggest that taking $p = 3$ provides the most accurate results. Finally, the bandwidth selection uses the convergence criterion $|h_0 - \widehat h_{\Lambda,x}| < 5\times 10^{-4}$.
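The kernel constants quoted above can be checked numerically. Their definitions are not shown in this section, so the sketch assumes $a_1 = \frac12\int t^2K(t)\,\mathrm{d}t$ and $a_2 = \int K^2(t)\,\mathrm{d}t$. With these definitions the Epanechnikov kernel gives exactly 1/10 and 3/5. The block also includes a generic central finite-difference estimator of the partial derivatives of a copula. The step `delta` and the boundary clipping are hypothetical choices, not the estimators used in the paper.

```python
import numpy as np


def epanechnikov(t):
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)


# Midpoint rule on [-1, 1]; the integrands are polynomials there.
m = 200_000
dt = 2.0 / m
t = -1.0 + (np.arange(m) + 0.5) * dt
a1 = 0.5 * np.sum(t**2 * epanechnikov(t)) * dt   # assumed definition of a_1
a2 = np.sum(epanechnikov(t) ** 2) * dt            # assumed definition of a_2
print(a1, a2)                                     # close to 0.1 and 0.6


def partial_u(C, u, v, delta):
    """Central difference of C in u, clipping u -/+ delta to [0, 1]."""
    lo, hi = max(u - delta, 0.0), min(u + delta, 1.0)
    return (C(hi, v) - C(lo, v)) / (hi - lo)


def partial_v(C, u, v, delta):
    """Central difference of C in v, clipping v -/+ delta to [0, 1]."""
    lo, hi = max(v - delta, 0.0), min(v + delta, 1.0)
    return (C(u, hi) - C(u, lo)) / (hi - lo)
```

In an application, `C` would be an estimated copula such as $\widehat C_{xh}^{Z\to Y}$ and `delta` would shrink with $n$. For the independence copula the differences are exact, which makes it a convenient unit test.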

Accuracy of the local causality estimators
The performance of the nonparametric local causality estimators $\widehat\rho_{xh}^{Z\to Y}$, $\widetilde\rho_{xh}^{Z\to Y}$, $\widehat\tau_{xh}^{Z\to Y}$ and $\widetilde\tau_{xh}^{Z\to Y}$ is studied here in terms of their bias and mean-squared error. To this end, time series have been simulated from the stationary first-order vector autoregressive model given by … With this particular choice of parameters, $(Y_t, Z_t, Y_{t-1}, Z_{t-1})$ is centered Normal. Its covariance matrix depends on $\theta_1, \theta_2, \theta_3$ in such a way that the local causality copula $C_x^{Z\to Y}$ is the Normal copula with parameter … This model satisfies Assumptions (S), (H), (H′) and (C). In Figure 1, the estimated bias and mean-squared error of $\widehat\rho_{xh}^{Z\to Y}$ … as a function of $h$ (not presented here) are, however, very similar, in accordance with the fact that they are asymptotically equal. The curves in the right panels show that the data-driven bandwidth selection rule is quite accurate, in the sense that it minimizes the mean-squared error. All of the above comments also apply to $\widehat\tau_{xh}^{Z\to Y}$ and $\widetilde\tau_{xh}^{Z\to Y}$, as can be seen in Figure 2.
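For a Gaussian VAR(1), the conditional law of $(Y_{t+1}, Z_t)$ given $Y_t = x$ is bivariate Normal. Its correlation does not depend on $x$: it is the partial correlation of $(Y_{t+1}, Z_t)$ given $Y_t$. This is why the local causality copula is a Normal copula here. The sketch below computes that partial correlation from a VAR coefficient matrix `A` and an innovation covariance `Sigma`. The parameterisation in terms of $\theta_1, \theta_2, \theta_3$ did not survive extraction, so the matrices in the example are hypothetical.

```python
import numpy as np


def stationary_cov(A, Sigma):
    """Gamma_0 solving Gamma_0 = A Gamma_0 A^T + Sigma (row-major vectorisation)."""
    k = A.shape[0]
    vec = np.linalg.solve(np.eye(k * k) - np.kron(A, A), Sigma.reshape(-1))
    return vec.reshape(k, k)


def local_causality_normal_parameter(A, Sigma):
    """Partial correlation of (Y_{t+1}, Z_t) given Y_t for X_t = (Y_t, Z_t) = A X_{t-1} + e_t."""
    G0 = stationary_cov(A, Sigma)
    G1 = A @ G0  # Cov(X_{t+1}, X_t)
    var_y1, var_z, var_y = G0[0, 0], G0[1, 1], G0[0, 0]
    cov_y1z, cov_y1y, cov_zy = G1[0, 1], G1[0, 0], G0[1, 0]
    num = cov_y1z - cov_y1y * cov_zy / var_y
    den = np.sqrt((var_y1 - cov_y1y**2 / var_y) * (var_z - cov_zy**2 / var_y))
    return num / den


Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
A_no_cause = np.array([[0.5, 0.0], [0.3, 0.4]])   # Z does not enter the Y equation
A_cause = np.array([[0.5, 0.4], [0.0, 0.4]])      # Z_t feeds into Y_{t+1}
print(local_causality_normal_parameter(A_no_cause, Sigma))
print(local_causality_normal_parameter(A_cause, Sigma))
```

When `A[0, 1] = 0`, $Z_t$ does not enter the equation for $Y_{t+1}$, and the parameter is zero up to rounding. In that case local non-causality holds at every $x$.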

Coverage probability of interval estimations
The aim of this subsection is to evaluate the finite-sample coverage probabilities of interval estimates of $\rho_x^{Z\to Y}$ and $\tau_x^{Z\to Y}$, as described in subsection 5.1. To this end, a general D-Vine structure for bivariate processes suggested by [2] is used; this construction allows for the choice of the local causality copula $C_x^{Z\to Y}$. For the upcoming simulations, $C_x^{Z\to Y}$ is the Normal copula parameterized so that $\tau_x^{Z\to Y}\in\{0, .2, .4\}$. The algorithm of [2] requires the choice of a vector $(C_1, C_2, C_3, C_4)$ of four copulas, where $C_1$ is the copula of $(Y_t, Z_t)$, … The results in Table 1 concern the estimated coverage probabilities of 95% interval estimates based on $\widehat\rho_{xh}^{Z\to Y}$ and $\widetilde\rho_{xh}^{Z\to Y}$. Those in Table 2 concern $\widehat\tau_{xh}^{Z\to Y}$ and $\widetilde\tau_{xh}^{Z\to Y}$. The copulas $(C_1, C_2, C_3, C_4)$ are Normal with Kendall's tau $(\tau_1, \tau_2, \tau_3, \tau_4)\in\{(.3, .1, .1, .1), (.05, .05, .05, .05), (.5, .3, .3, .05), (.5, .3, .75, .05)\}$. These scenarios generate various kinds of serial dependence structures. The results have been obtained with the bandwidth set to $\widehat h_{\Lambda,x}/2$ instead of the optimal bandwidth $\widehat h_{\Lambda,x}$. Many numerical experiments suggested that this choice generally yields better coverage probabilities. This indicates that, for interval estimation, the bias matters most; as seen in Figures 1-2, the bias is minimized for values of $h$ smaller than $\widehat h_{\Lambda,x}$.
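The Normal copula with correlation $r$ has $\tau = (2/\pi)\arcsin r$ and $\rho_S = (6/\pi)\arcsin(r/2)$. The short sketch below uses these relations to translate the targets $\tau_x^{Z\to Y}\in\{0, .2, .4\}$ into copula parameters and Spearman values. The function names are hypothetical.

```python
import math


def normal_copula_r_from_tau(tau):
    """Normal copula: tau = (2/pi) arcsin(r), hence r = sin(pi tau / 2)."""
    return math.sin(math.pi * tau / 2.0)


def normal_copula_spearman(r):
    """Normal copula: rho_S = (6/pi) arcsin(r / 2)."""
    return 6.0 / math.pi * math.asin(r / 2.0)


for tau in (0.0, 0.2, 0.4):
    r = normal_copula_r_from_tau(tau)
    print(f"tau={tau:.1f}  r={r:.4f}  rho_S={normal_copula_spearman(r):.4f}")
```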
First of all, the coverage probabilities tend to be closer to their 95% confidence level as n increases; it is a simple consequence of the fact that the intervals are based on the asymptotic normality of the estimators. In Table 1, it can be seen that the estimated probabilities based on ρ xh tend to be closer to their nominal level than those based on ρ xh when n ∈ {250, 500}, while the results are similar when n = 1000. This is particularly true when (τ 1 , τ 2 , τ 3 , τ 4 ) = (.5, .3, .75, .05) and τ Z→Y x ∈ {.0, .2}. Similar observations can be drawn from Table 2.

Power of the tests of local non-causality
Consider testing the null hypothesis $H_0$ of local non-causality from $Z$ to $Y$ at $x$, i.e. the conditional independence between $Y_t$ and $Z_{t-1}$ given $Y_{t-1} = x$. To this end, one considers again the D-Vine structure for $(Y_t, Z_t)_{t\in\mathbb{Z}}$ described in Subsection 6.3. Here, the local causality copula $C_x^{Z\to Y}$ is taken to be either the Normal or the Clayton copula; the latter is defined for $\theta > 0$ by $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$. … Table 3. Under the null hypothesis of non-causality, i.e. when $\tau_x^{Z\to Y} = 0$, the four tests are slightly too liberal when $n = 250$. Nevertheless, the estimated rejection probabilities of $H_0$ approach the 5% nominal level as $n$ increases. As expected, the power of the four tests under departures from $H_0$ increases with the sample size and is larger when $\tau_x^{Z\to Y}$ …
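The following is a minimal sketch for drawing from the Clayton copula by conditional inversion. It uses the standard relation $\tau = \theta/(\theta+2)$ to match a target Kendall's tau. It only illustrates how $C_x^{Z\to Y}$ could be sampled; it is not the D-Vine algorithm of [2], and all names are hypothetical.

```python
import numpy as np


def clayton_theta_from_tau(tau):
    """Clayton copula: tau = theta / (theta + 2), i.e. theta = 2 tau / (1 - tau)."""
    return 2.0 * tau / (1.0 - tau)


def clayton_cdf(u, v, theta):
    """C_theta(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def clayton_sample(n, theta, rng):
    """Conditional inversion: U ~ U(0,1), then V solves dC/du(U, V) = W with W ~ U(0,1)."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v


def kendall_tau(x, y):
    """Sample Kendall's tau by brute force, O(n^2); assumes no ties."""
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((sx * sy).sum() / (n * (n - 1)))


rng = np.random.default_rng(123)
theta = clayton_theta_from_tau(0.4)          # theta = 4/3
u, v = clayton_sample(1500, theta, rng)
print(theta, kendall_tau(u, v))              # sample tau should be near 0.4
```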

Illustration on financial data
The following illustration is based on the bivariate time series of 1 512 daily observations, taken between January 2010 and January 2016, of the compounded changes in prices (returns) and in trading volume of the Standard and Poor's 500 (S&P500) Index. The relationship between these two series has been extensively studied, both theoretically and empirically. In view of the stationarity tests reported in [6], one works instead with the first difference in logarithmic returns ($Z$) and the first difference in logarithmic volume ($Y$). Put differently, $Z$ and $Y$ are, respectively, the log of the ratio of two consecutive recorded values of stock return and stock volume. $Z_t$ and $Y_t$ are therefore indicators of the growth rate of, respectively, stock return and stock volume from period $t-1$ to period $t$, and the upcoming conclusions have to be interpreted accordingly. The causality from $Z$ to $Y$ is then investigated from the sample $(Y_2, Z_1, Y_1), \ldots, (Y_{1511}, Z_{1510}, Y_{1510})$. For these data, the value of the partial correlation coefficient of $(Y_{t+1}, Z_t)$ given $Y_t$ is $-0.024\times 10^{-4}$, leading to the conclusion of global non-causality ($p = .36$). However, such a conclusion can be misleading when the relationship between $Y_{t+1}$ and $Z_t$ changes according to the value taken by $Y_t$. This is exactly what happens here. For example, for the subsample with $Y_t > 0$, the partial correlation coefficient is 0.072, which this time is significantly different from zero ($p = .039$). On the other hand, the subsample with $Y_t < 0$ yields a partial correlation of $-0.095$, which also departs significantly from zero ($p < .01$).
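The preprocessing and the global and subsample partial correlations can be sketched as follows. The function names are hypothetical. The usage example runs on synthetic series, since the index data are not provided here, and it does not reproduce the values or p-values quoted above. The partial correlation is computed in the usual way, as the correlation between the least-squares residuals of $Y_{t+1}$ and $Z_t$ on $(1, Y_t)$. No claim is made that this matches the authors' code.

```python
import numpy as np


def log_growth(series):
    """First difference of the logarithm: log(x_t / x_{t-1}); series must be positive."""
    return np.diff(np.log(np.asarray(series, dtype=float)))


def partial_corr(a, b, c):
    """Correlation between the residuals of a and b after OLS regression on (1, c)."""
    X = np.column_stack([np.ones_like(c), c])
    ra = a - X @ np.linalg.lstsq(X, a, rcond=None)[0]
    rb = b - X @ np.linalg.lstsq(X, b, rcond=None)[0]
    return float(np.corrcoef(ra, rb)[0, 1])


def global_and_subsample_partial_corr(y, z):
    """Partial correlation of (Y_{t+1}, Z_t) given Y_t: overall and for Y_t > 0, Y_t < 0."""
    y_next, z_cur, y_cur = y[1:], z[:-1], y[:-1]
    out = {"all": partial_corr(y_next, z_cur, y_cur)}
    for label, mask in (("Y_t > 0", y_cur > 0), ("Y_t < 0", y_cur < 0)):
        out[label] = partial_corr(y_next[mask], z_cur[mask], y_cur[mask])
    return out


# Synthetic usage only: a positive "volume" series and a placeholder Z series.
rng = np.random.default_rng(0)
volume = 1e9 * np.exp(np.cumsum(0.05 * rng.standard_normal(400)))
y = log_growth(volume)
z = rng.standard_normal(len(y))
res = global_and_subsample_partial_corr(y, z)
print(res)
```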
To take into account the level of $Y_t$, one can rely on the local causality indices introduced in Section 3. Figure 3 reports the values of $\widehat\rho_{xh}^{Z\to Y}$, $\widetilde\rho_{xh}^{Z\to Y}$, $\widehat\tau_{xh}^{Z\to Y}$ and $\widetilde\tau_{xh}^{Z\to Y}$ as a function of $x$, together with 95% pointwise confidence intervals. The values of $x$ considered range between the 10th and 90th percentiles of $Y$. It is clear from these figures that the level of $Y_t$ influences all four local causality indices.
To interpret the values reported in Figure 3, note that a positive value of an index (e.g., at $x\approx 0.01$) has the following meaning. When the first difference in logarithmic volume at period $t$ equals $x$, large (resp. small) values of the return's growth rate $Z_t$ tend to be followed by large (resp. small) values of the volume's growth rate $Y_{t+1}$. Conversely, when a local causality index is negative (e.g., at $x\approx -0.01$), large (resp. small) values of $Y_{t+1}$ given $Y_t = x$ generally occur after small (resp. large) values of $Z_t$.
From Figure 3, it can be seen that the causal relationship from $Z$ to $Y$ appears to be stronger for negative $Y_t$. This corresponds to a ratio of the volume at period $t$ to the volume at period $t-1$ below 1, i.e. a decrease in stock volume from period $t-1$ to period $t$. This suggests that, in these cases, the return's growth rate at period $t$ is more likely to influence the volume's growth rate at period $t+1$ than when $Y_t$ is positive, i.e. when stock volume increases from period $t-1$ to period $t$. To complement the above analysis, the tests of the null hypothesis of local non-causality have been performed for selected values of $x$; the p-values of the tests are reported in Table 4. First note that the four tests agree on the acceptance or rejection of $H_0$ at the 5% level. Hence, for …; a correction for testing at multiple values of $x$ leads to a stronger statement. All of the reported p-values that were significant at the prescribed pointwise 5% level are also jointly significant at a family-wise error rate of 5%. The only exception is the test based on $\tau_{xh}^{Z\to Y}$ at $x = 0$, which is no longer significant after the correction for multiple testing. These results are in accordance with the conclusions of the tests of non-causality based on the partial correlation coefficient. They also agree with the curves in Figure 3: when $x < 0.005$, the 95% confidence bands all lie below zero, indicating a negative causal relationship. On the other hand, these confidence bands are above zero for $x > 0.009$, suggesting a positive causal relationship.
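The section does not state which multiple-testing correction was applied. Holm's step-down method is one standard procedure that controls the family-wise error rate, and it is sketched below for illustration only. The p-values in the example are made up rather than taken from Table 4.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values; reject H0_i at FWER level a if adjusted[i] <= a."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1); monotonicity is enforced.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Made-up p-values at four hypothetical values of x.
print(holm_adjust([0.001, 0.030, 0.012, 0.200]))
```

In this made-up example the p-value 0.030 is significant pointwise at 5%, but its adjusted value exceeds 0.05. That is the kind of change described above for the test at $x = 0$.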

A.3. Assumptions for the consistency of μ
Since … is a linear functional and the limit process is Gaussian, Lemma 3.9.8 in [25] entails that $\widehat\Theta_{\Lambda,xh}^{Z\to Y}$ converges in distribution to a normal random variable $\widehat\Theta_{\Lambda,x}^{Z\to Y}$ with mean … Next, upon noting that the variance of $\widehat\Theta_{\Lambda,x}^{Z\to Y}$ can be written … (ii) By the same arguments as for $\widehat\Theta_{\Lambda,xh}^{Z\to Y}$, one can write … where $\widetilde\Theta_{\Lambda,x}^{Z\to Y}$ is a normal random variable with variance $\sigma^2_{\Lambda,x}$ and mean …

B.2. Proof of Proposition 4.1
Consider the version of $L_{xi}$ given by … and … The condition in (12) on the estimators $\widehat C_x^{[1]}$ and $\widehat C_x^{[2]}$ of the partial derivatives of $C_x^{Z\to Y}$ … Next, introduce the linear functional $\Upsilon:\ell^\infty([0,1]^2)\to\ell^\infty([0,1]^2)$ defined by $\Upsilon(\delta)(u,v) = \delta(u,v) - C_x^{[1]}(u,v)\,\delta(u,1) - C_x^{[2]}(u,v)\,\delta(1,v)$, and let … Now observe that, for the term inside the brackets, … an application of the Continuous Mapping Theorem entails that, as $n\to\infty$, … Using the fact that $S_{n,0}(\ldots$ …, see for instance [20], …

B.3. Proof of Proposition 4.2
For …, where $L^*_{xh}$ is as in Equation (14). Also define the diagonal matrix $\mathbf{W}$ with $i$-th diagonal element … as well as the matrix $\mathbf{X}\in\mathbb{R}^{n\times(p+1)}$ whose entries are given by … From p. 59 in [8], one has, for $[\mathbf{A}]_3$ the third row of a matrix $\mathbf{A}$, that … where it is understood that a functional applied to a vector is taken componentwise. As a first step, consider … The following lemma is helpful to derive the limiting behaviour of $\widehat\beta_{2,\Lambda}$. In the sequel, let $\mathbf{Y}(y,z) = (\mathbb{I}(Y_2\le y, Z_1\le z), \ldots, \mathbb{I}(Y_{n+1}\le y, Z_n\le z))^\top$.

C.2. Proof of Lemma C.1
For $\mathbf{u}, \mathbf{v}\in[0,1]^2$, consider the versions of $Z^{(j)}_{kxn}$ and … given by … where it is understood that $|\cdot|$ is the Manhattan distance. First, for any fixed $\mathbf{u}\in[0,1]^2$, a consequence of Theorem 6 in [19] is that $U_{xn}(\mathbf{u})$ is asymptotically normal, and thus asymptotically tight in $\mathbb{R}$. It then remains to show that (19) holds. To this end, let $\kappa_\gamma = (nh_B)^{1/2+\gamma}$ for some $\gamma\in(0,1/2)$ and define the product space $T_{\kappa_\gamma} = I_{\kappa_\gamma}\times I_{\kappa_\gamma}$, where $I_{\kappa_\gamma} = \{0, 1/\kappa_\gamma, 2/\kappa_\gamma, \ldots, 1\}$. Next, for any $u\in[0,1]$, define $\underline{u}_{\kappa_\gamma} = \max\{\zeta\in I_{\kappa_\gamma}:\zeta\le u\}$ and $\overline{u}_{\kappa_\gamma} = \min\{\zeta\in I_{\kappa_\gamma}:\zeta > u\}$. For any $\mathbf{u} = (u_1,u_2)\in[0,1]^2$, let $\underline{\mathbf{u}}_{\kappa_\gamma} = (\underline{u}_{1\kappa_\gamma}, \underline{u}_{2\kappa_\gamma})$ and $\overline{\mathbf{u}}_{\kappa_\gamma} = (\overline{u}_{1\kappa_\gamma}, \overline{u}_{2\kappa_\gamma})$. Since, under Assumption (L), $K$ is nonnegative and vanishes outside $[-1,1]$, one has $|K(z)z^j|\le K(z)$ for any $z\in\mathbb{R}$. One can then write … In order to bound the second term on the right-hand side of (20), note the following. In view of Assumption ($H_B$), a Taylor expansion of order $p+1$ allows one to write, uniformly in $y\in\mathbb{R}$, … where $\xi_{w,x}$ lies between $w$ and $x$ for any $w\in J_x$. Under Assumption ($H_B$), there exists $\eta > 0$ such that for $j\in\{1,\ldots,p+1\}$, … Also, from Assumption (L) and routine computations, one has, as $n\to\infty$, … Hence, Hoeffding's inequality combined with (21) and (22) yields … where the last equality is a consequence of the definition of $\kappa_\gamma$ and Assumption ($N_B$). Hence, from (20) and (23), … Since, for $\delta > 2/\kappa_\gamma$, $|\mathbf{u}-\mathbf{v}| < \delta$ implies $|\underline{\mathbf{u}}_{\kappa_\gamma} - \underline{\mathbf{v}}_{\kappa_\gamma}| < 2\delta$, … To this end, Lemma 2 of [1] will be used (see also Theorem 3 and the remarks on page 1665 in [3]).
The first step is to show that there exist $\eta > 0$ and $\beta > 1$ such that, for any ε > 0 and $n$ sufficiently large, … holds for any rectangle $A\subseteq[0,1]^2$ whose corner points are all distinct and lie in $T_{\kappa_\gamma}$. Here $\chi_{h_B}(A) = \mu(A) + \widetilde\chi_{h_B}(A)$, with μ being Lebesgue measure and … From the proof of Lemma 2 in [5] (see Equation (12) therein), … where … From the definition of μ and computations similar to those used for the derivation of (23), one can show the following for $n$ sufficiently large and any $A\subseteq[0,1]^2$ whose corner points are all distinct and lie in $T_{\kappa_\gamma}$: … Recall that $\kappa_\gamma = (nh_B)^{1/2+\gamma}$ and that Assumption ($N_B$) ensures $nh_B > 1$ for large $n$. Together with (26), this gives, for any $\beta\in(1, 1+a^{-1})$, …

Since $a > 6$, taking any $\gamma\in(0,1/2)$ entails that, for $n$ sufficiently large, … Next, one obtains from the left-hand side of (26) that … Therefore, … Consider the first term on the right-hand side of the previous inequality. As $a > 6$, taking $\gamma\in(0,1/8)$ and $\beta\in(1, 1+a^{-1})$ yields $(1+2\gamma)\beta < 2$; this implies that, for $n$ sufficiently large, $\ldots_n(\beta, A)\le 2$. One can then invoke (25) and (27) to deduce that there exist $\beta > 1$ and $\eta > 0$ such that, for $n$ sufficiently large, … Equation (24) then follows from an application of Markov's inequality. Since $\chi_{h_B}$ is a finite measure on $[0,1]^2$, Lemma 2 of [1] applies. Combined with computations similar to those at the end of the proof of Proposition 1 in [5], it shows that the first term on the right-hand side of (24) is asymptotically negligible as $\delta\to 0$. This in turn entails (19), which completes the proof.