Estimation of a bivariate conditional copula when a variable is subject to random right censoring

This paper is concerned with studying the dependence structure of a random pair $(Y_1, Y_2)$ conditionally upon a covariate $X$ when the variable $Y_1$ is subject to random right censoring. The dependence structure is described by a conditional copula, defined as the function $C_x : [0,1]^2 \to [0,1]$ such that $P(Y_1 \le y_1, Y_2 \le y_2 \mid X = x) = C_x\{P(Y_1 \le y_1 \mid X = x), P(Y_2 \le y_2 \mid X = x)\}$. We propose a procedure to estimate the conditional copula when $Y_1$ is censored, and we establish the asymptotic properties of the proposed estimator. Its finite-sample behavior is then investigated in a numerical study. The methodology is illustrated through a real data example featuring patients with malignant melanoma.


Introduction
Copulas have become a popular tool to model dependence. Recently, many works in this field have been concerned with capturing the influence of a covariate $X \in \mathbb{R}$ on the dependence structure of a vector of interest $(Y_1, Y_2) \in \mathbb{R}^2$. An example is given in Gijbels et al. (2011), where a copula function is used to illustrate how the relationship between the life expectancy of men ($Y_1$) and women ($Y_2$) varies with the gross domestic product ($X$). To describe this copula function, consider the conditional joint distribution of $(Y_1, Y_2)$ given $X = x$, for a real number $x$, given by $F_x(y_1, y_2) = P(Y_1 \le y_1, Y_2 \le y_2 \mid X = x)$. The conditional marginal distributions of $Y_1$ and $Y_2$ given $X = x$ are obtained from $F_x$ via $F_{1x}(y) = \lim_{w \to \infty} F_x(y, w)$ and $F_{2x}(y) = \lim_{w \to \infty} F_x(w, y)$. If $F_{1x}$ and $F_{2x}$ are continuous, then Sklar's theorem ensures that there exists a unique copula $C_x : [0,1]^2 \to [0,1]$ such that $F_x(y_1, y_2) = C_x\{F_{1x}(y_1), F_{2x}(y_2)\}$. Conversely, the copula associated with the bivariate conditional distribution $F_x$ can be extracted from the formula
$$C_x(u_1, u_2) = F_x\{F_{1x}^{-1}(u_1), F_{2x}^{-1}(u_2)\}, \tag{1}$$
where $F_{jx}^{-1}$ denotes the generalized inverse of $F_{jx}$, $j = 1, 2$. The bivariate function $C_x$ is called the conditional copula and contains all the dependence features of $(Y_1, Y_2)$ given a fixed value taken by the covariate. For that reason, it is an important task to be able to estimate $C_x$.
The topic of modeling and estimating conditional copula models has recently gained momentum since the pioneering work of Patton (2006). For example, models specifying a functional connection between the covariate and a parametric copula were studied in Jondeau & Rockinger (2006) and Patton (2006). A nonparametric estimation procedure for this functional connection was proposed in Acar et al. (2011), while Abegaz et al. (2012) considered an extension of this method to the case of unknown conditional marginal distributions. Assuming the availability of an i.i.d. sample, a nonparametric approach has been investigated in Veraverbeke et al. (2011) and Gijbels et al. (2011), and a bootstrap method suitable for this estimation procedure was developed in Omelka et al. (2013).
However, all of the previously mentioned estimation strategies rely on full knowledge of the random variables $(Y_{11}, Y_{21}, X_1), \ldots, (Y_{1n}, Y_{2n}, X_n)$ and therefore prove unsatisfactory when the data are incomplete. Among other mechanisms, right censoring is a source of incompleteness that frequently appears in medical studies and clinical trials. It occurs, for instance, in a dataset, to be detailed later, featuring patients with malignant melanoma followed after surgery for a skin tumor.
For the unconditional case, parametric and semiparametric estimation of copula functions was studied by Shih & Louis (1995). Nonparametric estimation of the distribution function for censored data was considered by Dabrowska (1988) and Akritas & Van Keilegom (2003), among others. Nonparametric copula estimation was investigated recently by Gribkova & Lopez (2015) under different censoring scenarios; they also proposed a goodness-of-fit procedure for copulas when the data are right censored.
The purpose of this work is to propose a methodology designed to estimate the conditional copula when a variable is subject to random right censoring. In that case, the true event time is not recorded for one component; instead, a smaller time is observed. To be more specific, assume $Y_2$ is completely observed, and that the observed random variables are $T_i = \min(Y_{1i}, C_i)$ and $\delta_i = I(Y_{1i} \le C_i)$, $i = 1, \ldots, n$, where $C_1, \ldots, C_n$ are independent and non-negative censoring variables. Hereafter, the conditional distribution function of $C_i$ given $X_i = x_i$ will be denoted $G_{x_i}$, and $H_{x_i}(t, y) = P\{T_i \le t, Y_{2i} \le y \mid X_i = x_i\}$ is the joint distribution function of the observed pair $(T_i, Y_{2i})$, with marginals $H_{1x_i}$ and $H_{2x_i}$.
This paper is organized as follows. In Section 2, we propose an estimator for the conditional copula in the presence of censoring. This estimator relies on a nonparametric estimator for the joint conditional distribution, which is another original contribution of this paper. In Section 3, we investigate the asymptotic properties of these estimators by providing an asymptotic i.i.d. representation for the conditional distribution estimator, and by identifying the weak limit of properly rescaled versions of these estimators. A simulation study showing the performance of the conditional copula estimation procedure is presented in Section 4. In Section 5, we apply this methodology to the melanoma dataset to illustrate the influence of tumour thickness on the relationship between the survival time and the age of a patient after surgery. All the assumptions and conditions required for the theoretical validity of the results presented in Section 3 are provided in Section 6. The proofs are given in the Appendix.

2. An inverse-conditional-probability-of-censoring estimator for $C_x$
Let $(Y_{11}, Y_{21}, C_1, x_1), \ldots, (Y_{1n}, Y_{2n}, C_n, x_n)$ be a random sample of independent and identically distributed vectors. In this work, for the theoretical developments, we consider the fixed design, i.e., the case where $X$ is not random. The results can be extended from a fixed design to a random design by using arguments quite similar to those in the appendix, simply replacing $x_i$ by $X_i$ and using $O_P$ and $o_P$ instead of $O$ and $o$ in the assumptions stated in the appendix.
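As previously mentioned, the estimation of $C_x$ from i.i.d. observations has been considered by Veraverbeke et al. (2011) and Gijbels et al. (2011). Specifically, from independent (and fully observed) random variables $(Y_{11}, Y_{21}, x_1), \ldots, (Y_{1n}, Y_{2n}, x_n)$, consider the estimator of the joint conditional distribution $F_x$ given by
$$F_{xh}(y_1, y_2) = \sum_{i=1}^{n} w_{ni}(x, h)\, I(Y_{1i} \le y_1, Y_{2i} \le y_2),$$
where $w_{n1}(x, h), \ldots, w_{nn}(x, h)$ are non-negative kernel-based weight functions that smooth the covariate and $h = h_n$ is a bandwidth parameter that typically depends on the sample size. Popular choices for these functions are the Nadaraya-Watson and the local linear weights, given respectively by
$$w_{ni}(x, h) = \frac{K\left(\frac{x_i - x}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{x_j - x}{h}\right)} \quad \text{and} \quad w_{ni}(x, h) = \frac{\frac{1}{nh} K\left(\frac{x_i - x}{h}\right) \left\{ S_{n,2} - \frac{x_i - x}{h} S_{n,1} \right\}}{S_{n,0} S_{n,2} - S_{n,1}^2},$$
where $K$ is a symmetric and continuously differentiable kernel density function on $[-1, 1]$ and, for $j \in \{0, 1, 2\}$,
$$S_{n,j} = \frac{1}{nh} \sum_{i=1}^{n} \left(\frac{x_i - x}{h}\right)^{j} K\left(\frac{x_i - x}{h}\right).$$
The conditional empirical marginal distributions extracted from $F_{xh}$ are simply $F_{1xh}(y) = \lim_{w \to \infty} F_{xh}(y, w)$ and $F_{2xh}(y) = \lim_{w \to \infty} F_{xh}(w, y)$.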
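The weighting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`nw_weights`, `ll_weights`, `cond_cdf`) and the toy data are hypothetical, the triweight kernel is one admissible choice of $K$, and the local linear weights follow the usual moment-based form with $S_{n,0}$, $S_{n,1}$, $S_{n,2}$.

```python
def triweight(u):
    """Triweight kernel, a symmetric density supported on [-1, 1]."""
    return 35.0 * (1.0 - u * u) ** 3 / 32.0 if abs(u) <= 1.0 else 0.0


def nw_weights(xs, x, h, kernel=triweight):
    """Nadaraya-Watson weights: kernel values normalised to sum to one."""
    k = [kernel((xi - x) / h) for xi in xs]
    total = sum(k)
    if total <= 0.0:
        raise ValueError("no design point within one bandwidth of x")
    return [ki / total for ki in k]


def ll_weights(xs, x, h, kernel=triweight):
    """Local linear weights built from the moments S_{n,0}, S_{n,1}, S_{n,2}."""
    n = len(xs)
    u = [(xi - x) / h for xi in xs]
    k = [kernel(ui) for ui in u]
    s = [sum(ki * ui ** j for ki, ui in zip(k, u)) / (n * h) for j in range(3)]
    denom = s[0] * s[2] - s[1] ** 2
    if denom <= 0.0:
        raise ValueError("degenerate design around x")
    return [ki * (s[2] - ui * s[1]) / (n * h * denom) for ki, ui in zip(k, u)]


def cond_cdf(y1s, y2s, weights, y1, y2):
    """Weighted empirical joint distribution function evaluated at (y1, y2)."""
    return sum(w for w, a, b in zip(weights, y1s, y2s) if a <= y1 and b <= y2)


# Hypothetical toy data: covariate values and a fully observed pair (Y1, Y2).
xs = [0.10, 0.30, 0.45, 0.50, 0.62, 0.80, 0.95]
y1s = [1.2, 0.7, 2.5, 1.9, 0.4, 3.1, 2.2]
y2s = [0.5, 1.1, 0.9, 2.0, 1.4, 0.3, 1.8]
w_nw = nw_weights(xs, 0.5, 0.4)
w_ll = ll_weights(xs, 0.5, 0.4)
f_hat = cond_cdf(y1s, y2s, w_nw, 2.0, 1.5)
```

Both weight systems sum to one by construction. The local linear weights additionally satisfy $\sum_i w_{ni}(x,h)(x_i - x) = 0$, which is why they can take negative values.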
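From representation (1), a natural plug-in estimator of $C_x$ is given by
$$C_{xh}(u_1, u_2) = F_{xh}\{F_{1xh}^{-1}(u_1), F_{2xh}^{-1}(u_2)\},$$
where for $j = 1, 2$, $F_{jxh}^{-1}(u) = \inf\{y \in \mathbb{R} : F_{jxh}(y) \ge u\}$ is the left-continuous generalized inverse of $F_{jxh}$.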
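As a rough illustration of the plug-in idea for complete data, the following sketch evaluates a weighted empirical copula. The helper names and the toy data are hypothetical, and the weights are assumed to be non-negative and to sum to one.

```python
def weighted_quantile(values, weights, u):
    """Generalized inverse inf{y : sum_i w_i 1(y_i <= y) >= u}, for u in (0, 1]."""
    acc = 0.0
    pairs = sorted(zip(values, weights))
    for v, w in pairs:
        acc += w
        if acc >= u - 1e-12:  # small tolerance for floating-point accumulation
            return v
    return pairs[-1][0]


def cond_copula(y1s, y2s, weights, u1, u2):
    """Plug-in estimate: joint weighted CDF evaluated at the marginal quantiles."""
    q1 = weighted_quantile(y1s, weights, u1)
    q2 = weighted_quantile(y2s, weights, u2)
    return sum(w for w, a, b in zip(weights, y1s, y2s) if a <= q1 and b <= q2)


# Toy check with equal weights: perfectly concordant and discordant samples.
w = [0.25, 0.25, 0.25, 0.25]
up = cond_copula([1, 2, 3, 4], [10, 20, 30, 40], w, 0.5, 0.5)
down = cond_copula([1, 2, 3, 4], [40, 30, 20, 10], w, 0.5, 0.5)
```

In the concordant case the estimate equals $\min(u_1, u_2)$, and in the discordant case it equals $\max(u_1 + u_2 - 1, 0)$, matching the Fréchet-Hoeffding bounds.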
The goal of this section is to propose an estimator for the conditional copula that takes into account the presence of censoring on the variable $Y_1$. To do this, we need an estimator for the conditional distribution function $F_x$. In the unconditional context (i.e., without a covariate), the nonparametric estimation of the bivariate distribution of $(Y_1, Y_2)$ in the presence of censoring has been studied by many authors; see for example Dabrowska (1988), Akritas (1994) and Akritas & Van Keilegom (2003). However, to the best of our knowledge, the nonparametric estimation of $F_x$ has never been addressed and hence is an original contribution of the present paper.
To build our estimator for F x , we use an idea similar to the one originally presented in Robins & Rotnitzky (1992). To compensate for the presence of censoring, each uncensored observation receives an extra weight equal to its inverse probability of failure. This idea is motivated by the fact that As G x is unknown, we simply replace it with the conditional Kaplan-Meier estimator for the censoring variable C. This estimator is given by: where T (1) ≤ . . . ≤ T (n) are the ordered T i , and δ [i] and w n[i] (x, h) are respectively the corresponding δ i and w ni (x, h). Here, g = g n is an auxiliary bandwidth parameter that may differ from h. The resulting estimator for the conditional distribution function F x is then given by Although F (rc) xh depends on the bandwidth g, the latter is omitted for notational simplicity. This estimator can be seen as a conditional bivariate analogue of the inverse-probability-of-censoring estimator proposed in Robins & Rotnitzky (1992). Notice that when no censoring occurs, F (rc) xh (t, y).
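The construction can be illustrated with a short sketch. It rests on assumptions that are not taken from the text: the censoring distribution is estimated with the usual weighted product-limit (Beran-type) form, the inverse weight is evaluated at the left limit just before $T_i$, and the observed times are distinct. All function names are hypothetical.

```python
def censoring_survival_left_limits(ts, deltas, weights):
    """Weighted product-limit estimate of P(C >= T_i | X = x), one value per i.

    Assumes distinct observed times and non-negative weights summing to one.
    """
    order = sorted(range(len(ts)), key=lambda i: ts[i])
    left = [1.0] * len(ts)
    surv = 1.0       # current estimate of 1 - G(t)
    at_risk = 1.0    # weight mass of the observations not yet passed
    for i in order:
        left[i] = surv
        if deltas[i] == 0 and at_risk > 1e-12:
            # a censored time counts as an event for the censoring distribution
            surv *= 1.0 - weights[i] / at_risk
        at_risk -= weights[i]
    return left


def ipcw_cond_cdf(ts, y2s, deltas, w_h, w_g, t, y):
    """Inverse-probability-of-censoring weighted joint CDF at (t, y).

    w_h smooths the covariate for the CDF, w_g for the censoring estimate.
    """
    left = censoring_survival_left_limits(ts, deltas, w_g)
    total = 0.0
    for ti, yi, di, wi, si in zip(ts, y2s, deltas, w_h, left):
        if di == 1 and ti <= t and yi <= y and si > 0.0:
            total += wi / si
    return total


# Toy example: the second observation is censored, equal weights.
ts, y2s, deltas = [1.0, 2.0, 3.0, 4.0], [10, 20, 30, 40], [1, 0, 1, 1]
w = [0.25] * 4
inf = float("inf")
total_mass = ipcw_cond_cdf(ts, y2s, deltas, w, w, inf, inf)
mass_up_to_3 = ipcw_cond_cdf(ts, y2s, deltas, w, w, 3.0, inf)
```

With equal weights the reweighted masses (0.25, 0.375, 0.375) coincide with the jumps of the ordinary Kaplan-Meier estimator of the survival time. When every `delta` equals one, the function reduces to the weighted empirical distribution function.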
When g = h, we can show that F Following Equation (1), a plug-in estimator for the conditional copula could be defined as However, this estimator does not properly take advantage of the fact that Y 2 is completely observed. Instead of using F (rc) 2xh , consider the estimator Note that this expression does not, in general, coincide with F (rc) 2xh . Nevertheless, F 2xh uses all of the available knowledge related to F 2x . Then, one proceeds to the estimation of C x with When all the survival times are observed, it follows that C (rc) xh is equal to C xh . Moreover, upon setting all the weight functions w ni (x, ·) equal to n −1 , we retrieve an estimator very similar to the one proposed in Gribkova & Lopez (2015) to estimate the (unconditional) copula. The only difference is in the estimation of the second marginal distribution.
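The idea of combining a censoring-corrected joint distribution with a second margin estimated from the complete $Y_2$ observations can be sketched as follows. This is an illustration only: the inverse-probability-of-censoring masses `ipcw_mass` are assumed to have been computed beforehand (zero for censored observations), and all names and the toy values are hypothetical.

```python
def weighted_quantile(values, masses, u):
    """Generalized inverse of a weighted step function, for u in (0, 1]."""
    acc = 0.0
    pairs = sorted(zip(values, masses))
    for v, m in pairs:
        acc += m
        if acc >= u - 1e-12:  # tolerance for floating-point accumulation
            return v
    return pairs[-1][0]


def censored_cond_copula(ts, y2s, ipcw_mass, weights, u1, u2):
    """Copula estimate under censoring of the first component.

    The first quantile inverts the censoring-corrected margin of T, while the
    second uses the plain kernel weights because Y2 is fully observed.
    """
    q1 = weighted_quantile(ts, ipcw_mass, u1)
    q2 = weighted_quantile(y2s, weights, u2)
    return sum(m for m, t, y in zip(ipcw_mass, ts, y2s) if t <= q1 and y <= q2)


# Toy values: the observation at time 2.0 is censored and carries no mass.
ts = [1.0, 2.0, 3.0, 4.0]
y2s = [10, 20, 30, 40]
ipcw_mass = [0.25, 0.0, 0.375, 0.375]
weights = [0.25] * 4
c_low = censored_cond_copula(ts, y2s, ipcw_mass, weights, 0.25, 0.5)
c_mid = censored_cond_copula(ts, y2s, ipcw_mass, weights, 0.625, 1.0)
```

When no observation is censored, `ipcw_mass` equals `weights` and the function reduces to the complete-data plug-in estimate.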

Main theoretical results
The aim of this section is to investigate the large sample behaviour of the processes For any distribution function L, let τ L be the right endpoint of its support, i.e inf{t : L(t) = 1}, and write τ x = min{τ F 1x , τ Gx }. It is a well known problem in life time analysis that the tail support of the distribution of a random variable may not be identifiable due to right censoring (see Stute (1994)). This occurs when the support of the censoring variable is included in the support of the variable of interest, i.e when τ Cx < τ Fx . As a consequence, we cannot hope to infer on the conditional distribution beyond τ x . Nevertheless, we next establish the asymptotic behavior of F (rc) xh over any closed subset included in [0, τ x ] × R.
Hereafter, the sub-distribution function of the uncensored observations will be denoted by H u To identify the asymptotic i.i.d representation for F (rc) xh , we introduce the following random functions . The required assumptions for the next theorem can be found in Section 6.
Theorem 3.1. Suppose that nh 5 log(n) = O(1), max(g, h) → 0, ng 5 log(n) = O(1) and h g = O(1). Assume Conditions W 1 -W 5 are satisfied, and suppose that In some way, Theorem 3.1 decomposes the random function F (rc) xh into two components. The first component can be associated to the estimation of a conditional distribution function provided that the conditional probability of censoring is known. In other words, it appears as a mildly modified and properly re-scaled version of the random function presented in Equation (3). From this perspective, the second component appears as a consequence of estimating the conditional probability of censoring.
We note that when no censoring occurs, then J Next, as y goes to infinity, one has F (rc) 1xh is equal to the conditional Kaplan-Meier estimator introduced in Beran (1981). Hence, the random function √ nh{F (rc) 1xh − F 1x } reduces to the conditional Kaplan-Meier process studied in Van Keilegom & Veraverbeke (1997). As expected, it is shown, in Appendix D.2, that which coincides with the asymptotic i.i.d representation given in Van Keilegom & Veraverbeke (1997).

Weak convergence of F (rc) xh
In view of Theorem 3.1, the large sample behavior of F (rc) xh will essentially depend on the conditions imposed on the weight functions and on the bandwidth parameters h and g. To establish its weak limit, we consider the mean-zero Gaussian processes J (1) x and J (2) x with covariance function and In the latter, the constants K 2 -K 3 are given in Assumptions W 2 -W 3 . On the one hand, the deterministic function b (1) x will appear as the asymptotic bias of the process $\sum_{i=1}^{n} w_{ni}(x, h) J^{(1)}_{ix}$. Therefore, it would have been present even if G xg was replaced with G x in the definition of F (rc) xh . On the other hand, b (2) x will emerge as the bias related to the estimation of the conditional probability of censoring.
Corollary 3.2. Suppose that the assumptions of Theorem 3.1 are met. For x over T t .
(b) If g = h, and in addition if √ nh 5 → K 6 for some K 6 > 0, then xh converges weakly to a Gaussian process having the representation J x := J (1)
Remark 3.3. In view of Part (a) of Corollary 3.2, the impact of estimating the conditional probability of censoring is negligible, provided that the bandwidth g is asymptotically larger than h. However, it is still required that ng 5 < ∞, which means that g must be at most of order n −1/5 . Also, because h/g → 0, we have h = o(n −1/5 ). Therefore, choosing g larger than h excludes the optimal bandwidth order for h in terms of mean squared error.
When the probability of censoring is 0, the term J (2) x is not present in the weak limit of F (rc) xh . Therefore, the asymptotic covariance function of F which corresponds to the asymptotic variance of the process √ nh(F xh − F x ) in the context of complete i.i.d data (see e.g. Veraverbeke et al. (2011)). Moreover, the bias reduces to which matches the asymptotic bias of the process √ nh(F xh − F x ).
Remark 3.4. As mentioned in Remark 3.3, using two different bandwidth parameters in the estimation of the conditional distribution excludes the theoretically optimal order for h. Nevertheless, this implies that the bias related to the estimation of F x when G x is known, namely b (1) x , becomes negligible. Hence, in some cases, one might obtain a bias reduction at the cost of excluding the optimal order for h. Note, however, that the same dilemma traditionally occurs in nonparametric density estimation, where one must decide whether or not to under-smooth.

Weak convergence of B (rc) xh
The next result states the weak limit of the conditional copula estimator under random censoring.
Proposition 3.5. Suppose that the assumptions in Theorem 3.1 are satisfied, and assume Condition (D), given in the appendix, regarding the partial derivatives of the conditional copula, is satisfied. For any 0 < t < τ x and by xh converges weakly in l ∞ ( T t ) to a gaussian process with the following representation: where α (rc) xh converges weakly in l ∞ ( T t ) to a gaussian process with the following representation As pointed out in Section 3.2, when all the survival times are completely observed, the term J (2) x reduces to 0. In this case, the covariance structure of the limit process α

Simulation study
The nonparametric estimation of the conditional copula involves a choice for the weight functions w n1 (x, h), . . . , w nn (x, h) that fulfills the required assumptions listed in Section 6.2. It is shown in Omelka et al. (2013) that the requirements W 1 -W 5 are satisfied, among others, by the Nadaraya-Watson and local linear weights, given in Section 2.
The simulation results reported here have been obtained using the local linear weights with the triweight kernel $K(y) = 35(1 - y^2)^3 I(|y| \le 1)/32$. When negative weights occur, they are set to zero and the remaining weights are rescaled so that they sum to one. As pointed out in Omelka et al. (2013), this modification is asymptotically negligible. Finally, note that all the numerical experiments were also run using the Nadaraya-Watson weights. As the results were very similar, they are not presented here.
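The kernel and the correction for negative weights are simple enough to sketch directly. The function names below are hypothetical, and the raw weights in the example are made up for illustration.

```python
def triweight(u):
    """Triweight kernel K(y) = 35 (1 - y^2)^3 / 32 on [-1, 1], zero elsewhere."""
    return 35.0 * (1.0 - u * u) ** 3 / 32.0 if abs(u) <= 1.0 else 0.0


def clip_and_rescale(weights):
    """Set negative weights to zero, then rescale the rest to sum to one."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total <= 0.0:
        raise ValueError("no positive weight left after clipping")
    return [w / total for w in clipped]


# Illustrative raw local linear weights, one of which is negative.
raw = [-0.1, 0.6, 0.5]
fixed = clip_and_rescale(raw)

# Midpoint-rule check that the kernel integrates to one over [-1, 1].
steps = 2000
integral = sum(triweight(-1.0 + (i + 0.5) * 2.0 / steps) for i in range(steps)) * 2.0 / steps
```

The numerical integral confirms that the constant 35/32 normalises the kernel to a density.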
The primary aim of this section is to evaluate the performance of the proposed conditional copula estimator with respect to the percentage of censoring, the influence of the covariate on the dependence, and the sample size. This performance is evaluated by considering the average squared bias (ASB) and the average variance (AV). To be specific, if Ĉ x is some estimator of C x , then The latter have been estimated from 1 000 replicates under each of the scenarios considered, for x = 0.5, with n = 250 and n = 1 000, and K = 15. Also, the nonparametric estimation of C x requires a choice for either one or two bandwidth parameters. Indeed, an interesting aspect of Proposition 3.5 is that the limiting distribution of the copula process B (rc) xh differs between the cases g = h and h/g → 0. The secondary aim of this section is to evaluate the impact of using one or two bandwidth parameters in the estimation of C x . In the following, we denote by C (rc,1) xh and C (rc,2) xh the estimators resulting from the choices g = h and g ≠ h, respectively. Upon The covariate is generated from the standard normal distribution and the conditional copula is estimated at x = 0.5. The copula which joins the marginals is either a normal copula C N ρ or a Clayton copula C CL γ . These are defined for −1 < ρ < 1 and θ > 0 by where ϕ is the bivariate standard Normal density with correlation ρ and Φ is the standard Normal distribution. Their parameters will be set to vary with X in the following way. In fact, since the copula parameters are sometimes hard to interpret, it is convenient to quantify the dependence in a bivariate random vector using its corresponding value of Kendall's tau. For any copula C, its associated Kendall's tau can be written as $4 \int_{[0,1]^2} C(u_1, u_2)\, dC(u_1, u_2) - 1$. For a given conditional copula C x , the conditional Kendall's tau is simply From the relationships between Kendall's tau and the parameters of the Normal and Clayton copulas, this can be done by setting
As discussed at the beginning of Section 3.1, the tail support of the distribution of a random variable may not be identifiable due to right censoring when the support of the censoring variable is included in the support of the variable of interest, i.e., when τ Cx < τ Fx . To evaluate the impact on the estimation of C x , the cases where τ Cx = τ Fx and τ Cx < τ Fx are examined separately in the following two sections.
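The link between Kendall's tau and the copula parameters can be made concrete with the standard closed forms: tau equals $(2/\pi)\arcsin$ of the correlation for the Normal copula, and the parameter divided by the parameter plus two for the Clayton copula. The sketch below only inverts these two relations. The function names are hypothetical, and the way tau varies with the covariate in the study is not reproduced here.

```python
import math


def normal_corr_from_tau(tau):
    """Correlation of a Normal copula with Kendall's tau equal to tau."""
    return math.sin(math.pi * tau / 2.0)


def clayton_param_from_tau(tau):
    """Clayton parameter with Kendall's tau equal to tau, for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("the Clayton copula requires 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)


def clayton_tau(par):
    """Kendall's tau of a Clayton copula with parameter par > 0."""
    return par / (par + 2.0)


def clayton_cdf(u1, u2, par):
    """Clayton copula C(u1, u2) = (u1^-par + u2^-par - 1)^(-1/par)."""
    return (u1 ** -par + u2 ** -par - 1.0) ** (-1.0 / par)


par = clayton_param_from_tau(0.5)
rho = normal_corr_from_tau(0.5)
```

Fixing a target value of Kendall's tau at each covariate value and converting it with these functions makes the two copula families comparable at the same level of dependence.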

τ Cx = τ Fx
Here, we have considered the case where τ Cx = τ Fx = ∞. To do this, the marginal distributions of Y 1i and Y 2i are generated from the exponential distribution with mean given by The censoring variable C i is also chosen to be exponential, but with mean c{1 + Φ(x i ) + Φ(x i ) 2 }. Hence, the probability of censoring conditional on X = x, denoted θ hereafter, is simply a/(a + c). The results are reported for a = 5 and θ ∈ {.2, .4, .6} in Tables D.1 and D.2.
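The stated censoring probability can be checked with a small simulation. The sketch assumes that the mean of $Y_1$ is $a\{1 + \Phi(x) + \Phi(x)^2\}$, which is inferred from the formula $a/(a+c)$ rather than stated explicitly, and that $C$ and $Y_1$ are independent given the covariate. The function names are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    """Standard Normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def censoring_scale(a, theta):
    """Solve a / (a + c) = theta for the censoring scale c."""
    return a * (1.0 - theta) / theta


def simulated_censoring_rate(a, c, n, seed=0):
    """Fraction of draws with C < Y1 under the assumed exponential design."""
    rng = random.Random(seed)
    censored = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = std_normal_cdf(x)
        scale = 1.0 + p + p * p
        y1 = rng.expovariate(1.0 / (a * scale))    # survival time, mean a * scale
        cens = rng.expovariate(1.0 / (c * scale))  # censoring time, mean c * scale
        censored += cens < y1
    return censored / n


c = censoring_scale(5.0, 0.2)
rate = simulated_censoring_rate(5.0, c, 20000)
```

Because both means share the same covariate-dependent factor, the factor cancels and the censoring probability does not depend on the covariate value.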

τ Cx < τ Fx
Here, we have considered τ Fx = ∞ and τ Cx < ∞. In that case, the marginal distributions of Y 1i and Y 2i are generated from the exponential distribution with mean λ x i . The censoring variable C i is generated from the uniform distribution over [0, const × λ x i ]. We can show that in this case, the probability of censoring is given by $\{1 - \exp(-\text{const})\}/\text{const}$. The constant is chosen so that θ ∈ {0.2, 0.4}. We have also covered the scenario θ = 0, which corresponds to the situation where all the survival times are observed. The results are reported in Tables D.3 and D.4.
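Under this design, and assuming the censoring time is independent of the survival time given the covariate, a direct calculation gives $P(C < Y_1) = E\{\exp(-C/\lambda)\} = \{1 - \exp(-\text{const})\}/\text{const}$, which does not depend on $\lambda$. The sketch below solves this relation numerically for the constant by bisection. The function names are hypothetical.

```python
import math


def uniform_censoring_prob(k):
    """Censoring probability (1 - exp(-k)) / k for uniform censoring on [0, k * mean]."""
    return -math.expm1(-k) / k


def solve_constant(theta, lo=1e-8, hi=1e3):
    """Bisection for the constant k with uniform_censoring_prob(k) = theta.

    Relies on the probability being strictly decreasing in k, from 1 towards 0.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if uniform_censoring_prob(mid) > theta:
            lo = mid  # too much censoring: widen the censoring support
        else:
            hi = mid
    return 0.5 * (lo + hi)


k20 = solve_constant(0.2)
k40 = solve_constant(0.4)
```

A smaller constant shortens the censoring support and therefore raises the censoring percentage, so the constant for 40% censoring is smaller than the one for 20%.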

Comments on the simulations results
From the results obtained it can be seen that, globally, when the association between Y 1 and Y 2 increases, the bias increases whereas the variance decreases slightly. Another interesting finding is that there is no significant difference between the results obtained with a single bandwidth (g = h) and with two bandwidths (g ≠ h), except for a large sample size (n = 1000). In this case, using two bandwidths reduces the bias substantially without increasing the variance of the resulting estimator. The results for the Normal and Clayton copulas are quite similar. Also, as expected, increasing the percentage of censoring degrades the performance of the copula estimator both in terms of bias and variance. The opposite is observed regarding the effect of the sample size. A larger bandwidth is needed when censoring increases, and a smaller one when the sample size increases. Notice that we obtain much more accurate information on the conditional copula when τ Cx = τ Fx . When τ Cx < τ Fx , one needs a large sample size to get accurate estimates; otherwise the results should be interpreted with care, especially when the percentage of censoring is high. Finally, as for any kernel-based estimator, a large bandwidth typically leads to a larger bias and a smaller variance. This becomes clear with a large sample size (see the results for n = 1000).

Illustrative example
In this section we consider a dataset that was used in Andersen et al. (1993). These data contain information on 205 patients with malignant melanoma who were followed for a period of up to 15 years. The main variable of interest is Y 1 : the survival time after surgery for skin tumour. Other measured quantities include Y 2 : the age of the patient and X : the tumour thickness in mm. 134 patients were alive at the end of the follow-up period and 14 patients died of causes unrelated to melanoma. These patients are censored (status = 0) at their last observed duration time or death time. All the remaining patients died from melanoma and so they are uncensored (status = 1). The typical objective of such studies is to assess the effect of risk factors (like age and tumour thickness) on survival time. This is usually done by constructing a regression model with Y 1 as response and Y 2 and X as covariates. Before attempting to model the relations between these variables, it may be helpful to measure the strength of the relationship between them using model-free tools.
Kendall's tau is a popular coefficient that measures the concordance-discordance between two random variables. This coefficient lies in [−1, 1] and is equal to zero for independent random variables. In contrast to the well-known Pearson correlation coefficient, Kendall's tau does not require knowledge of the parametric form of the marginal distributions. For more details, see Nelsen (2006). A conditional version of this coefficient was suggested by Gijbels et al. (2011). In terms of copula, the population version of the conditional Kendall's tau of (Y 1 , Y 2 ) given X = x is $4 \int_{[0,1]^2} C_x(u_1, u_2)\, dC_x(u_1, u_2) - 1$. A natural way to estimate this coefficient is to replace the unknown quantity C x in the above expression by its nonparametric estimator Ĉ x given by (4). This can be expressed as Except in the case when H 1x (t) = 1, the truncation in the integral above is needed because Ĉ x is inconsistent outside [0, H 1x (t)] × [0, 1]; see Proposition 3.5 above. Unfortunately, the quantity H 1x (t) is unknown and there is no obvious way to estimate it without imposing some restrictive assumptions on the data generating process. In practice one may consider (6) without the truncation, but then the results should be interpreted with care.
Figure 1 (a) shows the scatter plot of the observed survival times on the y axis and age values on the x axis, using different symbols for censored/uncensored observations and different colors for tumour thickness. From this figure it can be seen that there is a relationship between time and age: the survival time tends to decrease with increasing age. This tendency is not very strong, as the estimated unconditional (global) Kendall's tau is only −0.13. Figure 1 (b) shows the estimated conditional Kendall's tau between time and age given thickness. The dashed curve corresponds to the estimator, say T(Ĉ ic x ), obtained by ignoring censoring, i.e., by treating all observed times as exact, and the solid curve is the estimator T(Ĉ x ) obtained using our method, which takes censoring into account. For both estimators a bandwidth h = 0.93 was used. We can see that while the estimated conditional Kendall's tau coefficients remain negative, their magnitude changes with thickness. When the latter increases, the absolute value of T(Ĉ x ) slightly increases to reach its maximum value of 0.242 when tumour thickness is 2 mm, and then it starts decreasing rapidly to reach 0. So unlike the global Kendall's tau, which measures only the "average" association between time and age, T(Ĉ x ) gives us a more precise picture of this association, accounting for the effect of tumour thickness. From the figure it seems that, except for large tumour thickness, the "uncorrected" estimator T(Ĉ ic x ) underestimates the strength of association between time and age.
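For intuition about how a conditional Kendall's tau can be computed from kernel weights, the sketch below uses a weighted concordance form for complete data. This corresponds to treating all observed times as exact, so it is not the censoring-corrected, truncated estimator discussed above. The function name and the toy data are hypothetical.

```python
def weighted_kendall_tau(y1s, y2s, weights):
    """Weighted Kendall's tau for complete data.

    Sums w_i * w_j over strictly concordant ordered pairs, normalises by
    1 - sum(w_i^2) to remove the diagonal, and maps the result to [-1, 1].
    """
    n = len(y1s)
    concordant = 0.0
    for i in range(n):
        for j in range(n):
            if y1s[i] < y1s[j] and y2s[i] < y2s[j]:
                concordant += weights[i] * weights[j]
    norm = 1.0 - sum(w * w for w in weights)
    return 4.0 * concordant / norm - 1.0


w = [0.25] * 4
tau_up = weighted_kendall_tau([1, 2, 3, 4], [10, 20, 30, 40], w)
tau_down = weighted_kendall_tau([1, 2, 3, 4], [40, 30, 20, 10], w)
```

Evaluating such a function with weights centred at successive covariate values yields a curve of tau against the covariate, which is the kind of display described for Figure 1 (b).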

(sub-)distribution functions
Smoothness conditions over F x , H x , H u x , H c 1x and G x are needed in the proof of Theorem 3.1. We formulate the conditions for a general (sub-)distribution function L x .
x (t, y) = ∂ 2 ∂t∂y L x (t, y) and L (2,2) x (t, y) = ∂ 2 ∂y 2 L x (t, y) exist and are continuous over V (x) × T t ; x (t, y) = ∂ 2 ∂y∂x L x (t, y) exist and are con-
The following assumption is needed to guarantee the weak convergence of B (rc) xh .
(D). The partial derivatives C

Weight functions
Assumptions W 1 -W 5 below are required to establish the asymptotic behavior of √ nh{F (rc) xh − F x } stated in Section 3.
Van Keilegom & Veraverbeke (1996). Uniform strong convergence results for the conditional Kaplan-Meier estimator and its quantiles.

Appendix A. Proofs of main results
In this section, all the expectations of the form E{f (T i , Y 2i , C i )} have to be understood as taken conditional upon X = x i . Formally, for any 1 ≤ i ≤ n, whenever the left-hand side of the integral exists.
We start by observing that F (rc) where for any (t, y) ∈ T t , (v, v ) ∈ R 2 and for any function G : R → [0, 1): To provide an i.i.d representation for we apply the ideas of van der Vaart & Wellner (2007). To this end, we introduce the operator E(·) defined over the set of random variables of the form δ 1 f (t, y, T 1 , Y 21 , G), . . . , δ n f (t, y, T n , Y 2n , G) such that whenever the right-hand side of the integral exists. Observe that when the function G is fixed (non random), it follows that Then, we consider the following decomposition : The process Z xh can be viewed as a process indexed by the family of functions from R 2 × {0, 1} → R given by Hence, each function f ∈ F may be formally identified by a triplet (t, y, G). The introduction of the process Z xh is motivated by the fact that A xh (t, h) = Z xh (t, y, G xg ) − Z xh (t, y, G x ). While the -enlargement in the definition of the class G t might appear overdone, it is however required to guaranty that G xg asymptotically fits into G t .
Finally, we equip the index set F with a semimetric ρ Fx defined for f = (t, y, G) and f = (t , y , G ) as Moreover, F := 1 1−t is an envelope function for F.
In fact, as Assumptions W 1 -W 4 -W 6 and C 1 are satisfied, we conclude from Appendix C.1 that the process Z xh := Z xh −EZ xh indexed by (F, ρ Fx ) is asymptotically ρ Fx -equicontinuous. This implies that for any η > 0 and η > 0, there exists δ > 0 such that For a particular choice of weight system w n1 , . . . , w nn , it is shown in Van Keilegom & Veraverbeke (1996) that as long as conditions (C 1 ) and (C 3 ) are satisfied for H x , G x and H c x , then there exist constants C 1 , C 2 > 0 that depends on the weight functions such that whenver > max{C 1 g 2 , C 2 1 √ ng }, we have for some constants C 3 and C 4 that rely on t and the weights. It can be shown for instance by using Lemma 3 in Omelka et al. (2013) that this result still holds for general w n1 , . . . , w nn , at the cost of perhaps enlarging the constants provided this weight system satisfy assumptions W 1 -W 5 . Hence, we obtain In view of Equation (A.2), we conclude that for any > 0, G xg ∈ G with probability 1. Moreover, as ρ Fx {(t, y, G xg ), (t, y, G x )} = sup t∈[0,t] |G xg (t) − G x (t)|, we deduce that for sufficiently large n: which concludes the proof.
Note that the random variable D xh (t, y) = E{A xh (t, y)} can be rewritten as The asymptotic representation of D xh will follow from the representation of hg −1 Λ x (G xg ) and the asymptotic negligibility of the two terms D xh − hg −1 Λ x (G xg ) and D xh − D xh .
For the asymptotic representation of hg −1 Λ x (G xg ), let It can be shown for instance by using Lemma 3 of Omelka et al. (2013) that the result stated in Theorem 2.1 in Van Keilegom & Veraverbeke (1997) still holds as long as conditions (C 1 )-(C 5 ) are satisfied for G x and G u x , provided assumptions W 1 -W 5 on the weight functions are fulfilled. Hence, we conclude that, uniformly in t ∈ [0, t], hg −1 G xg = √ nh n i=1 w ni (x, g)g ix + o a.s (1) as long as ng 5 log(n) < ∞. As the map Λ x (·) is linear and continuous, the continuous mapping theorem implies that Furthermore, from switching the order of integration and further computations, we show that Λ x (g ix )(t, y) = J (2) ix (t, y).
To show the asymptotic negligibility of D xh − hg −1 Λ x (G xg ), we observe that for any (t, y) ∈ T t , Condition (C 6 ) together with W 5 allows the Taylor expansion where z i lies between x i and x. Hence, from Assumptions W 2 , W 3 , W 6 and (C 6 ), we obtain that In view of the previous discussion, we use Equation (A.2) to obtain that uniformly in (t, y) ∈ T t , D xh − hg −1 Λ x (G xg ) = o a.s (1)O(1).

Finally, notice that
Hence, using Equation (A.2), we deduce that the latter is o a.s (1).
We start by showing that the sequence ix is asymptotically gaussian. Then, regarding the assumptions over the bandwidth h and g, we discuss the asymptotic representation of Let's prove that the process J (1) ix converges to the gaussian process J (1) x over T t . First, the tightness of the sequence J (1) xh can be checked using similar arguments as in Appendix A.1.1. Second, from direct computation, where last equality follows from a Taylor expansion of the function z → 1−G z around z = x together with the fact that from Assumptions C 1 and C 3 , G x ,Ġ x andG x are uniformly continuous over T t . We calculate the first term. In fact, where last equality follows from a Taylor expansion of the function z → F z around z = x together with the fact that F x ,Ḟ x andF x are uniformly continuous over where the constants K 2 and K 3 are defined via Assumptions W 2 and W 3 . As √ nhh 2 → K, Equation (A.4) reduces to Next, for the second and third terms of Equation (A.3) , we denote Next, as Assumption W 4 is satisfied, we use a similar strategy to obtain

In view of the tightness of J
(1) xh , the fact that

Cov{J
(1) and using Theorem 2.11.1 of van der Vaart & Wellner (1996), we conclude that J (1) xh converges weakly to a gaussian process whose representation matches the one of J (1) From the definition of g ix and of Λ x (·) in Appendix A.1.2, we ca re-write From Van Keilegom & Veraverbeke (1997), using Assumptions W 2 -W 3 together with C 1 and C 3 , be obtain that Hence, integrating by parts leads to: Furthermore, provided Assumption W 4 is satisfied, Van Keilegom & Veraverbeke (1997) also gives us Then, From the fact that   x (t , y )}. Since √ nhg 2 < ∞, and h g = O(1), we deduce from Equations (A.6) and (A.7) that the sequence n i=1 w ni (x, g)J (2) ix is asymptotically tight on T t .
We obtain Finally, the asymptotic representation of E( A xh ) will be given.
First, a weak consequence of Corollary 3.2 is that uniformly in (t, y) ∈ T t : Second, it follows from assumption W 1 that, for any sufficiently small > 0, Now we recall the definition of Z xh in Lemma AppendixC.1, and we note that , it follows from Equations (A.2) and (B.1) that The negligibility of A xh − E(A xh ) and A xh − E(A xh ) is then ensured by Lemma Appendix C.1 and identical arguments as the ones use in the end of Appendix A.1.1.

Appendix B.2. Asymptotic representation of E( A xh )
For any small ε > 0 such that H 1x (t + ε) < 1, one obtains from Equation (B.1) that uniformly in u ∈ [0, H 1x (t)] and with probability 1, F , and we define y xh and y x analogously. Recall, from Appendix A.1.2, that Hence, from the mean value theorem, we have for some t xh between t xh and t x . Further, since the g ix 's are bounded random variables, and since h g = O(1), we can show that

Appendix B.3. Asymptotic representation of E(A xh )
From the definition of E(·), we can write Conditions C 1 , C 3 and W 2 -W 3 allow us to mimic the proof of Theorem 1 in Veraverbeke et al. (2011) to obtain, uniformly in (u, v) ∈ T t , that Substituting v by 1 in the previous equation with the asymptotic negligibility

Appendix C. Auxiliary Lemma
The following lemma is required in Appendix A.1.1 to establish the i.i.d. representation for F (rc) xh . Lemma Appendix C.1. Recall the definition of Z xh , F and ρ Fx at the beginning of Appendix A.1.1. Suppose Assumptions W 1 , W 4 and W 6 are satisfied, and that the maps z → F 1z and z → F 2z are uniformly continuous for all z in a neighborhood of x. Then, the process Z xh := Z xh − EZ xh indexed by (F, ρ Fx ) is asymptotically ρ Fx -equicontinuous.
Proof. From Theorem 2.11.1 of van der Vaart & Wellner (1996), we can conclude that Z xh is tight if the following requirements hold: the covering number of the set F with respect to the random semimetric In the latter, ‖·‖ F stands for the supremum norm over F.
From Assumption W 1 , it follows that the latter is o(1). Hence, for any η > 0, one can find N > 0 such that for all n ≥ N : max 1≤i≤n Z hi (f ) < η, proving that requirement R 1 is fulfilled.
(R 2 ): Assume without loss of generality that t ≤ t′ and y ≤ y′. It is useful to note that when From the last equation, one directly obtains From Assumption W 4 , we obtain that $nh \sum_{i=1}^{n} w_{ni}(x, h)^2 = O(1)$. Moreover, as Assumption W 6 holds together with the uniform continuity of the maps z → F 1z and z → F 2z , we deduce that the latter display is bounded by O(1){2δ n + o(1)}. Hence, requirement R 2 is fulfilled as δ n → 0.
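The bound nh Σ w ni (x, h) 2 = O(1) used for requirement R 2 can be checked numerically. The following is a minimal sketch, not the construction used in the paper: it assumes Nadaraya-Watson weights with an Epanechnikov kernel and a hypothetical equally spaced design on [0, 1]. The function names `nw_weights`, `scaled_sum_sq_weights` and `equispaced_design` are introduced here for illustration only.

```python
import numpy as np

def nw_weights(x_obs, x, h):
    """Nadaraya-Watson weights at x (assumed weight scheme, Epanechnikov kernel)."""
    u = (np.asarray(x_obs, dtype=float) - x) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observation within bandwidth h of x")
    return k / total  # weights are nonnegative and sum to one

def scaled_sum_sq_weights(x_obs, x, h):
    """Return n * h * sum_i w_ni(x, h)^2, the quantity that should stay bounded."""
    w = nw_weights(x_obs, x, h)
    return len(w) * h * float(np.sum(w**2))

def equispaced_design(n):
    """Hypothetical covariate design: n equally spaced points in (0, 1)."""
    return (np.arange(n) + 0.5) / n

if __name__ == "__main__":
    for n in (500, 2000, 8000):
        h = n ** (-0.2)  # illustrative bandwidth rate, not taken from the paper
        print(n, scaled_sum_sq_weights(equispaced_design(n), 0.5, h))
```

Under these assumptions the printed values should stabilise as n grows, since for kernel weights of this type the scaled sum is approximately the integral of the squared kernel divided by the design density at x.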
(R 3 ): To show the last requirement, the goal is to apply Lemma 2.11.6 of van der Vaart & Wellner (1996). To do this, three conditions must be verified. First, we rewrite Hence, the process Z n is measure-like with respect to the random measure µ ni (see van der Vaart & Wellner (1996), Section 2.11). Second, as Assumption W 4 holds, n Third, it is required to show that the class F satisfies the uniform entropy condition (2.11.5) of van der Vaart & Wellner (1996). To do this, let and F 2 be the class of monotone and bounded functions over [0, 1/(1−t)]. Now we observe that F ⊂ F 1 F 2 = {f = f 1 f 2 , f 1 ∈ F 1 , f 2 ∈ F 2 }. As F 1 is a VC-class and F 2 is a VC-hull class for sets, with envelope functions respectively F 1 = 1 and F 2 = 1/(1−t), an application of Lemma 2.6.20 of van der Vaart & Wellner (1996) allows us to conclude that F 1 F 2 is a VC-hull class for sets with envelope function F 1 × F 2 = F. Therefore, the uniform entropy condition is fulfilled. As a result, the conclusion of Lemma 2.11.6 of van der Vaart & Wellner (1996) applies, which proves requirement R 3 .

Appendix D. Auxiliary results
The following Proposition establishes that F Then, for any t ∈ R, F 1xh (T (k) ). To do this, we use the following induction argument.
Hence, the basis step is verified.
Induction step: Assuming that the equality F (rc) 1xh (t) holds for t = T (0) up to t = T (k) , we show that the equality also holds for t = T (k+1) . From direct computations, , where the latter equation follows from the induction hypothesis. If δ [k+1] = 0, we use the induction hypothesis to obtain
The next lemma shows that, upon setting y = ∞, the i.i.d. representation for F (rc) xh found in Theorem 3.1 is the same as the one obtained in Van Keilegom & Veraverbeke (1997) in the case where the bandwidths g and h required for F (rc) xh are equal. Lemma Appendix D.2. The following identity holds: In order to show the result, we first deal with the terms that contain the T i 's in the sum lim y→∞ {J (1) ix (t, y)}. In view of Equation (D.1), we have Therefore, Then, from the identity H u Proceeding similarly, we obtain that Then, adding these terms leads to which concludes the proof.
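The induction in Appendix D runs over the ordered observations T (k) and their censoring indicators δ [k] , which is the structure of a kernel-weighted product-limit estimator. As an illustration only, the sketch below computes a conditional product-limit estimate of the type studied by Van Keilegom & Veraverbeke (1997), commonly called the Beran estimator. It is not claimed to coincide with F (rc) 1xh : the Nadaraya-Watson weights, the Epanechnikov kernel, the absence of ties and the names `nw_weights` and `beran_cdf` are all assumptions made here.

```python
import numpy as np

def nw_weights(x_obs, x, h):
    """Nadaraya-Watson weights at x (assumed weight scheme, Epanechnikov kernel)."""
    u = (np.asarray(x_obs, dtype=float) - x) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observation within bandwidth h of x")
    return k / total

def beran_cdf(t, times, delta, x_obs, x, h):
    """Kernel-weighted product-limit estimate of P(Y <= t | X = x).

    times : observed (possibly censored) times, assumed free of ties
    delta : 1 if the time is uncensored, 0 if it is right censored
    """
    order = np.argsort(times, kind="stable")
    times_s = np.asarray(times, dtype=float)[order]
    delta_s = np.asarray(delta)[order]
    w = nw_weights(x_obs, x, h)[order]
    # Weighted "at risk" mass just before each ordered time.
    at_risk = 1.0 - np.concatenate(([0.0], np.cumsum(w)[:-1]))
    surv = 1.0
    for ti, di, wi, ri in zip(times_s, delta_s, w, at_risk):
        if ti > t:
            break
        if di == 1 and wi > 0.0 and ri > 0.0:
            # Only uncensored times contribute a factor; clamp against rounding.
            surv *= max(0.0, 1.0 - min(1.0, wi / ri))
    return 1.0 - surv
```

When all weights are equal, the product reduces to the usual Kaplan-Meier estimator, which gives a simple sanity check of the sketch.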