Multidimensional hazard estimation under generalized censoring

This paper focuses on the problem of estimating the cumulative hazard function of a distribution on d-dimensional Euclidean space when the data points are subject to censoring by an arbitrary adapted random set. A problem involving observability of the estimator proposed in [8] and [9] is resolved and a functional central limit theorem is proven for the revised estimator. Several examples and applications are discussed, and the validity of bootstrap methods is established in each case.


Introduction
In two recent papers, [8] and [9], the problem of survival analysis on a general complete separable metric space was approached from the point of view of set-indexed martingales (cf. [6]). The goal of these papers was to build a mathematical structure that would handle data on general spaces (including, but not limited to d-dimensional Euclidean space) subject to very general types of censoring mechanisms. In particular, the cumulative hazard function was defined on a class of sets, and a corresponding Nelson-Aalen-type estimator was proposed.
In [8], the censoring mechanism in a set-indexed framework was defined by introducing the concept of a stopping set, a random set with a particular kind of measurability. The stopping sets acted as windows, since we could observe the data points only through them, and they permit the consideration of more sophisticated censoring schemes than the usual multivariate generalizations of right-censoring; however, the stopping sets were still restricted to certain shapes.
The theory of stopping sets was expanded to more general random sets called anti-clouds in [9]. The name is inspired by the example of aerial photography, where pictures are taken from high in the air and clouds may interfere with the observation of whatever the subject of interest is. Anti-clouds, then, are the complement of the clouds: the regions where we can actually observe the data points. The difference between stopping sets and anti-clouds is, quite simply, that anti-clouds have virtually no restriction on the shape they can take; obviously, this makes for much more realistic censoring schemes.
The measurability requirement imposed on these random sets allows one to apply the theory of set-indexed martingales as developed in [6] to produce a set-indexed Nelson-Aalen estimator for the cumulative hazard on a general complete separable metric space in the presence of generalized censoring. In the case of censoring by stopping sets, consistency and asymptotic unbiasedness of the estimator, as well as asymptotic normality of its finite-dimensional distributions, are proven in [8]. The Nelson-Aalen estimator in the presence of censoring by clouds presented more of a challenge, due to the general geometric nature of the clouds; nevertheless, the estimator was shown to be consistent and asymptotically unbiased in [9].
Unfortunately, there were still some gaps in the general theory. Aside from the lack of a functional central limit theorem for the general estimator of [9], there remained a critical problem limiting the applicability of the estimator. As will be seen subsequently, a process needed to construct the Nelson-Aalen estimator may not be observable under some common data structures. In practice, this could render the estimator unusable much of the time.
Our first goal is to address the observability problem and produce a working estimator for the cumulative hazard, while still preserving the general censoring model of [9]. It happens that the root of the observability problem lies in the measurability requirement of the anti-clouds. By asking for a slightly weaker measurability condition, it is possible for us to achieve our objective. We call the new kind of random set that results from this modification a *-anti-cloud. If the complements of these sets, the *-clouds, act as the censoring in the survival model, then we are always able to construct a Nelson-Aalen estimate that can be observed.
Our second goal is to prove a functional central limit theorem for the Nelson-Aalen estimator under generalized censoring on Euclidean space. In this case, we can also establish the validity of bootstrap procedures.
It should be pointed out that various estimators for the cumulative hazard have been proposed for bivariate data under right censoring. However, for the most part they have been designed en route to producing a Kaplan-Meier-type estimator for the survival function. Unfortunately, in higher dimensions the relationship between the survival and the hazard functions is not as straightforward as it is in one dimension. In two dimensions, for example, the survival function is determined by the hazard and both marginal distributions. For a good exposition of the representation of a bivariate survival function $S$ in terms of the hazard and the marginals, we refer the reader to [5] or [10]. Usually, one-dimensional Kaplan-Meier estimates are used for the marginal survival functions; what will change is the way the hazard is estimated. As Kalbfleisch and Prentice explain in [10], the simplest of this type of estimator is due to Bickel, who uses the ratio of the number of failure points that were uncensored in both components at $(t_1, t_2)$ to the number of pairs at risk at that point. The Dabrowska estimator and the Prentice-Cai estimator represent attempts at improving this approach, by considering not only the ratio of the number of double failure times to the at-risk set, but also the ratio of the number of failures on one coordinate when the other is still alive to the at-risk set. Lastly, Pons proposed a martingale-based estimator of the cumulative hazard of two right-censored survival times in [11], which she then used to produce a test for independence of the two survival times by comparing it to the product of the usual one-dimensional Nelson-Aalen estimates of both marginal cumulative hazards. However, her model required independence of the components of the survival time.
All of the preceding estimators depend strongly on the structure of Euclidean space and on the censoring mechanism, which is generally assumed to be right censoring of each component of the data point separately. The advantage of the martingale methodology used here is that it provides a unified and versatile approach to hazard estimation; in particular, it is applicable to both Euclidean and non-Euclidean spaces, and may be applied to very general censoring mechanisms.
In this paper, for clarity we will restrict our attention to d-dimensional Euclidean space. All of our applications and examples illustrating the problem of observability are for two-dimensional data sets, and our theorems on the asymptotic behaviour of the Nelson-Aalen estimator are stated for $\mathbb{R}^d_+$-valued observations. However, even in this familiar framework our results are new, since the censoring mechanism is completely general. Details on the extension of the model to non-Euclidean spaces are available in [2]. Furthermore, while this paper focusses on the theoretical aspects of the estimation problem, the practical utility of our approach is illustrated in [3], where the set-indexed hazard estimator is used to analyze a bivariate medical data set involving subjects with cardio-vascular risk factors.
We will proceed as follows. In §2 we provide the mathematical framework for our model. We develop the notion of the *-anti-cloud and show that certain martingale properties are preserved under filtering by this more general censoring mechanism. In §3, the observability problem of the original Nelson-Aalen estimator is discussed. The results of §2 allow us to redefine the Nelson-Aalen estimator of the cumulative hazard under censoring by *-clouds. We present several examples to illustrate how our new estimator can be used to circumvent the observability problem through a transition from the anti-cloud model to the *-anti-cloud structure. The price we pay is a possible loss of previously uncensored data that will now need to be discarded. In §4 we show that the Nelson-Aalen estimator satisfies a functional central limit theorem. These results are applied in §5 to show how the Nelson-Aalen estimator can be used to develop various tests involving the dependence structure of the underlying distribution. The validity of bootstrap methods is established in each case.
Most of these results can be found in additional detail in [2] (the doctoral thesis of the first author), where estimation of the distribution and survival functions is also considered.

The set-up
Because we restrict our attention to Euclidean space, the framework we will be using is a particular case of that used in [2,6,8] and [9]. Let $T = [0, r] = \prod_{i=1}^{d} [0, r_i]$ denote a compact rectangle in $\mathbb{R}^d_+$ and let $\mathcal{B}$ denote the Borel sets of $T$. Without loss of generality, we will usually assume that $T = [0, 1]^d$. For $D$ an arbitrary subset of $T$, let $T_D$ be a countable dense subset of $D$. We will use '⊂' to indicate strict inclusion; moreover, $\overline{D}$ denotes the closure and $D^\circ$ denotes the interior of $D$. The usual partial order on $\mathbb{R}^d_+$ will be denoted by '≤': $s = (s_1, \dots, s_d) \le (t_1, \dots, t_d) = t \Leftrightarrow s_i \le t_i$, $i = 1, \dots, d$. If $s_i < t_i$ for all $i = 1, \dots, d$, we write $s \ll t$. The 'past' of $t \in T$ is $A_t := \{s \in T : s \le t\}$, the 'future' of $t$ is $E_t := \{s \in T : s \ge t\}$, and the 'wide past' of $t$ is $D_t := \{s \in T : t \not\ll s\} = (E_t^\circ)^c$.
Definition 2.1. Let $(\Omega, \mathcal{F}, P)$ be any complete probability space. A filtration on $T$ is a class $\{\mathcal{F}_t : t \in T\}$ of complete sub-σ-fields of $\mathcal{F}$ such that
• $\mathcal{F}_t = \bigcap_i \mathcal{F}_{t_i}$ for any decreasing sequence $(t_i)$ in $T$ such that $t_i \downarrow t$ (continuity from above).
We will refer to $(\Omega, \mathcal{F}, P) = (\Omega, \mathcal{F}, P; \mathcal{F}_t, t \in T)$ as a filtered probability space.
We can think of $\mathcal{F}_t$ as the history at $t$ (or the information available in the past of $t$). The 'wide' history (the information available in the wide past of $t$) is defined by $\mathcal{F}^*_t = \bigvee_{s \in D_t} \mathcal{F}_s$. All processes will be indexed either by $T$ or by $\mathcal{B}$. We generally use the notation $X(t)$ for a $T$-indexed process, and $X_A$ for a $\mathcal{B}$-indexed process. Clearly, any $\mathcal{B}$-indexed process $X$ can be identified with a $T$-indexed process: $X(t) := X_{A_t}$. Conversely, any $T$-indexed process $Y$ has a unique extension to an additive process on the class of left-open right-closed rectangles $\mathcal{C} := \{(s, t] = \prod_{i=1}^{d} (s_i, t_i] : s \le t \in T\}$ via the usual inclusion-exclusion formula. We say that $Y$ is increasing if each sample path of $Y$ is continuous from above and satisfies $Y_C(\omega) \ge 0$ for all $C \in \mathcal{C}$ and every $\omega \in \Omega$. If $Y$ is increasing then $Y$ can be uniquely extended to $\mathcal{B}$ as a measure-valued process; the same is true if $Y$ is the difference of increasing processes.
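The inclusion-exclusion extension from points to rectangles can be illustrated with a short Python sketch. It is not part of the original development: the function names are hypothetical, and the process is represented as a deterministic function of a point in $T$ (one sample path), which is an assumption made purely for illustration.

```python
from itertools import product


def leq(s, t):
    """Partial order on R^d_+: s <= t iff s_i <= t_i for every coordinate."""
    return all(si <= ti for si, ti in zip(s, t))


def strictly_below(s, t):
    """The relation s << t: s_i < t_i for every coordinate."""
    return all(si < ti for si, ti in zip(s, t))


def rectangle_increment(path, s, t):
    """Increment of a point-indexed path over the rectangle (s, t].

    Sums path(corner) over the 2^d corners of the rectangle, with sign
    (-1)^(number of coordinates taken from the lower corner s).
    """
    d = len(s)
    total = 0.0
    for choice in product((0, 1), repeat=d):
        # choice[i] == 1 selects t_i, choice[i] == 0 selects s_i
        corner = tuple(t[i] if c else s[i] for i, c in enumerate(choice))
        sign = (-1) ** (d - sum(choice))
        total += sign * path(corner)
    return total
```

For $d = 2$ this reduces to $Y(t_1, t_2) - Y(s_1, t_2) - Y(t_1, s_2) + Y(s_1, s_2)$; with the path $Y(t) = t_1 t_2$ the increment over $(s, t]$ is the area of the rectangle.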
• Given a filtered probability space, a $T$-indexed stochastic process $Y =$ adapted and for any $C = (s, t] \in \mathcal{C}$, $E[M_C \mid \mathcal{F}^*_s] = 0$. If the process $M$ is not adapted, it will be called a pseudo-strong martingale.
• A process $\tilde{X}$ is called a *-compensator of the process $X$ if it is increasing and the difference $X - \tilde{X}$ is a pseudo-strong martingale.
Before moving on to adapted random sets, we should note that although we are restricting ourselves to a compact set $T$, all of the definitions and developments throughout this work can easily be expanded to $\mathbb{R}^d_+$ using the structure found in Definition 2.1 of [8].

Clouds and * -clouds
We begin with the definition of an adapted random set, a notion that was first introduced in [9]. Recall that a closed set $D \subseteq T$ is a domain if $D = \overline{D^\circ}$. Let $\mathcal{K}$ be the class of domains $D$ in $T$ whose boundaries $\partial D$ have Lebesgue measure 0, and let $\mathcal{L}$ be the class of open sets that are complements of sets in $\mathcal{K}$.
Definition 2.3. A random set $\eta : \Omega \to \mathcal{B}$ is an adapted random set if for any $t \in T$, $\{\omega : t \in \eta(\omega)\} \in \mathcal{F}_t$.
• An adapted random set $\rho$ taking its values in $\mathcal{L}$ is a cloud.
• An adapted random set $\xi$ taking its values in $\mathcal{K}$ is an anti-cloud.
We now define a new class of random sets, the so-called *-adapted sets:
Definition 2.4. A random set $\eta : \Omega \to \mathcal{B}$ is a *-adapted random set if for any $t \in T$, $\{\omega : t \in \eta(\omega)\} \in \mathcal{F}^*_t$.
• A *-adapted random set $\rho$ taking its values in $\mathcal{L}$ is a *-cloud.
• A *-adapted random set $\xi$ taking its values in $\mathcal{K}$ is a *-anti-cloud.
The following theorems were proven in [9] for anti-clouds. We give the statements for *-anti-clouds; the proofs are analogous to their anti-cloud counterparts. For details, see [2]. We recall from the definition of an increasing process that if $X$ is increasing, it can be extended as a measure to $\mathcal{B}$. Therefore, $X_\xi(\omega) := X_{\xi(\omega)}(\omega)$ is well-defined for any anti-cloud or *-anti-cloud.
Theorem 2.5. Let $X$ be an increasing process (or the difference of two increasing processes) and suppose that $\xi$ is a *-anti-cloud. Then for any $A \in \mathcal{B}$, both $X_{A \cap \xi}$ and $X_{A \cap \partial\xi}$ are random variables.
Now we deal with the filtered process $X^\xi_A := X_{\xi \cap A}$.
Theorem 2.6. Let $\xi$ be a *-anti-cloud and $X = Y - W$, where $Y$ and $W$ are increasing processes such that $Y_{\partial\xi} = W_{\partial\xi} = 0$ a.s.
1. If $X$ is a (pseudo-)strong martingale, then $X^\xi$ is a pseudo-strong martingale.
2. $X^\xi$ will not generally be adapted, even if $X$ is.

The Nelson-Aalen estimator
The model is the same as the one in [8] and [9]. Assume that $(\Omega, \mathcal{F}, P)$ is a complete filtered probability space. Let $Y : \Omega \to T$ be a $T$-valued random variable, and $\mu(B) = P\{Y \in B\}$ its distribution. The survival function associated with $Y$ is $S(t) = \mu(E_t)$. We assume that $\mu$ is absolutely continuous with respect to Lebesgue measure and denote by $\mu'$ the Radon-Nikodym derivative of $\mu$.
where $\mathcal{P}_0$ is the class of $P$-null sets. This filtration is in fact continuous from above, as was proved in [7]. $N$ is increasing, and it was proved in [8] that it has a *-compensator:
is a *-compensator of the process $N$ with respect to its minimal filtration, where
Suppose we have a $T$-valued random variable $Y$ whose associated single jump process $N$ is adapted to a filtration $\mathcal{F}$, as well as an $\mathcal{F}$-*-cloud $\rho$ with corresponding *-anti-cloud $\xi = \rho^c$. The filtered jump process $N^\xi_\cdot := I_{\{Y \in \xi \cap \cdot\}}$ corresponds to observing occurrences of $Y$ only on the complement of the *-cloud; we then say that $N$ has been filtered by the *-cloud $\rho$.
Example 3.3. This example shows that our framework includes the usual bivariate censoring model, as presented in [11]. We assume that $T = [0, 1]^2$ (or any bounded rectangle in $\mathbb{R}^2$
Finally, the counting process of censored times is
In what follows, if $\xi : \Omega \to \mathcal{K}$, let $\mathcal{F}^\xi$ denote the minimal filtration with respect to which $\xi$ is a *-anti-cloud.
Definition 3.4. Let $Y$ be a $T$-valued random variable and let $\mathcal{F}^Y$ be the minimal filtration generated by its associated jump process $N$. Let $\mathcal{F}$ be a filtration such that $\mathcal{F}^Y_t \subseteq \mathcal{F}_t$ for all $t \in T$ and let $\xi$ be an $\mathcal{F}$-*-anti-cloud. $\xi$ is
1. weakly independent of $Y$ if the *-compensator of $N$ with respect to $\mathcal{F}$ is the same as the *-compensator with respect to
The following lemma is an immediate consequence of Theorem 2.6.
Lemma 3.5. Suppose that $\xi$ is a *-anti-cloud, that $Y$ and $\xi$ are weakly independent and that the filtration
Now we are able to define the Nelson-Aalen estimator of the integrated hazard function using filtered data. We will use definitions analogous to those in [9].
Henceforth, we assume that we have a sequence of i.i.d. $T$-valued random variables $(Y_i)$ with the same distribution as $Y$, as well as a sequence $(\xi_i)$ of *-anti-clouds independent of the $Y_i$'s. Define the following processes: for $A \in \mathcal{B}$ and $t \in T$,
By independence and Lemma 3.5, $N^{(n)}$ is a *-compensator for $N^{(n)}_\xi$, and the $\mathcal{B}$-indexed process
is a pseudo-strong martingale with respect to $\mathcal{F}$, the minimal filtration generated by the sequences $(Y_i)$ and $(\xi_i)$. Since
regarding $M^{(n)}$ as noise, we come to a set-indexed version of the Nelson-Aalen estimator for $H_A$:
We observe that
The following is analogous to Proposition 4.5 of [9]; for details, see [2].
Proposition 3.6. $\hat{H}^{(n)} - H$ is a pseudo-strong martingale.
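As an illustration only, the following Python sketch shows how such an estimator could be evaluated on the sets $A_t$. It assumes that the estimator takes the usual Nelson-Aalen form, in which each uncensored observation $Y_i \in A_t \cap \xi_i$ contributes $1/Z_n(Y_i)$, with at-risk process $Z_n(u) = \sum_j I_{\{Y_j \in E_u\}} I_{\{u \in \xi_j\}}$; this matches the indicator functions $q_i$ and $r_i$ that appear in the bootstrap discussion later in the paper. The function names and the representation of each anti-cloud as a membership predicate are hypothetical choices, not the authors' implementation.

```python
def at_risk(t, ys, clouds):
    """Z_n(t): number of i with Y_i in E_t (t <= Y_i) and t in xi_i.

    ys     : list of observation points (tuples of equal length)
    clouds : list of predicates; clouds[i](u) is True iff u lies in xi_i
    """
    return sum(
        1
        for y, xi in zip(ys, clouds)
        if all(tj <= yj for tj, yj in zip(t, y)) and xi(t)
    )


def nelson_aalen(t, ys, clouds):
    """Estimate of the cumulative hazard of A_t from filtered data.

    Sums 1 / Z_n(Y_i) over observations that are uncensored (Y_i in xi_i)
    and lie in the past A_t of t.
    """
    total = 0.0
    for y, xi in zip(ys, clouds):
        uncensored = xi(y)
        in_past = all(yj <= tj for yj, tj in zip(y, t))
        if uncensored and in_past:
            z = at_risk(y, ys, clouds)  # z >= 1, since Y_i itself is at risk
            total += 1.0 / z
    return total
```

With no censoring (every predicate returns True) and the three points (0.2, 0.2), (0.5, 0.5), (0.8, 0.8), the sketch gives 1/3 + 1/2 + 1 at t = (1, 1), as the one-dimensional Nelson-Aalen estimator would for totally ordered data.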

The observability problem
The Nelson-Aalen estimator defined in (3) is identical to that introduced in [9], with the exception that in [9] it was assumed that $\xi$ is an anti-cloud. We will now explain the reason for incorporating the more general censoring mechanism (*-clouds) into the survival model.
and we have to settle for somewhat more limited information. Specifically, if $\xi$ is a fully observable anti-cloud, typically we may only be able to observe
We need the variable $I_{\{Y \in E_t\}} I_{\{t \in \xi\}}$ to be observable in order to construct the estimator for the integrated hazard function. But the problem is that $I_{\{Y \in E_t\}} I_{\{t \in \xi\}}$ is not $\mathcal{H}_t$-measurable, and so the estimator cannot be used. Indeed, suppose $t \in \xi$
This situation is illustrated in Figure 1.
To correct this, we can define $\xi^*(\omega) = \{t :$
recalling that $T_{D_t}$ stands for a countable dense subset of $D_t$. Figure 2 pictures the *-anti-cloud $\xi^*$ corresponding to the anti-cloud $\xi$ in Figure 1.
This means that $I_{\{Y \in E_t\}} I_{\{t \in \xi^*\}}$ is $\mathcal{H}_t$-measurable and hence observable, and now we are able to calculate the estimator. Of course, there are instances where it is not possible to observe $I_{\{t \in \xi\}}$ whenever the observation $Y$ happens before time $t$; in other words, when the anti-cloud is not fully observable and $Y \in A_t$. In this case, the information available up to time $t$ consists of
This example demonstrates that an estimator can always be constructed, but in some sense it would be a worst-case scenario: we observe that the move from $\xi$ to $\xi^*$ as defined above may entail artificially censoring observed values of $Y$ that lie in $\xi \setminus \xi^*$. Therefore, in practice, we should examine each information structure closely to determine whether the Nelson-Aalen estimator can be constructed using the anti-cloud $\xi$ (cf. Example 3.10), and if not, how best to define an appropriate anti-cloud that censors as few observations as possible. This will be illustrated in Example 3.9.
Example 3.8. We now consider a generalization of Example 3.3. Again, assume
$t_i$ as in Example 3.3 for $i = 1, 2$; but instead of having a single two-dimensional censoring time, we will consider a finite sequence of
Let $s = (s_1, s_2)$ and $u = (u_1, u_2)$ and suppose we can observe
An illustration of this kind of situation can be seen in Figure 3. This scenario could arise in a laboratory, where test animals are under continuous observation during the day, but not at night.
, as well as $\mathcal{F}$
Then it is easy to see that $(\rho_i)$ is a sequence of *-clouds, since the $\kappa_{j,i}$ are stopping times. This kind of data structure can arise when, for example, $Y_1$ is the age at the start of pregnancy and $Y_2$ is the age at onset of a disease such as tuberculosis. At $\kappa_1$, it is found that the test used for diagnosing the disease is dangerous for a pregnant woman, so once the pregnancy starts it becomes impossible to diagnose that disease for the next nine months. At time $\kappa_2$, though, a new safe test becomes available and the times of onset of the disease are no longer censored. See Figure 4.
Figure: The regions $\rho$ and $\rho^*$.
In this case, we would censor the pairs
needed to obtain the Nelson-Aalen estimator, as illustrated in Figure 4. Note that $(\rho^*_i)$ is also a sequence of *-clouds, since all the endpoints of the intervals in the definition of each $\rho^*_i$ are stopping times with respect to both $\mathcal{F}^{(1)}_{t_1}$ and $\mathcal{F}^{(2)}_{t_2}$; hence
The censored region defined by $\rho^*$ is clearly much smaller than the *-clouds of Example 3.7, and would result in less lost data.
There are examples that fit the original censoring model of [9] when $T = [0, 1]^2$. Suppose that $\xi$ is an anti-cloud which is a lower layer ($L$ is a lower layer if $t \in L$ implies that $A_t \subseteq L$) and that we can observe
and the anti-cloud $\xi$ itself may be used in (3). An application of this model to the analysis of a medical data set is given in [3].

A functional central limit theorem
In this section, for clarity of exposition we will assume that $T = [0, 1]^2$, but all results can be extended to bounded rectangles in $\mathbb{R}^d$. Our objective is to prove a functional central limit theorem for the $T$-indexed process
We will use the functional delta method, for which [13] is an excellent reference. Throughout this section, we will make extensive use of the following notation. If $X$ is an arbitrary set, then the Banach space $\ell^\infty(X)$ is the set of all
The next lemma will be used in the proof of the functional central limit theorem. It is a two-dimensional version of Lemma 3.9.17 in [13], and since the proof is similar, it will be omitted.
$[0, 1]^2$ such that $\int |dA| < \infty$, and the derivatives are given by
where $\int A \, d\beta$ is defined via the two-dimensional integration by parts formula found in Theorem 8.8 of [4] if $\beta$ is not of bounded variation.
Before moving on to the central limit theorem, we have to make some assumptions that will allow us to apply the delta method to our processes.
Assumption 4.2. $P(t \in \xi)$ is continuous in $t \in T$ and there exists $\varepsilon > 0$ such that $P(t \in \xi) > \varepsilon$ for every $t \in T$. For all $s, t \in T$, $P(s \in \xi) - P(s, t \in \xi) \le K |s - t|$, where $K$ is a constant and $|\cdot|$ denotes the Euclidean norm.
The preceding assumption is quite natural: any point has a positive probability of being uncensored, and given that a point is uncensored, it is likely that nearby points are uncensored as well.

The first term in $G_t$ is defined by integration by parts.
Before proceeding with the proof, we make a few observations and prove a lemma that will be required. The statement of the CLT is very similar to Example 3.9.19 in [13]. One of the differences, obviously, is that we are working in two dimensions instead of one. However, the major difference resides in our censoring mechanism and the information we are provided with; a consequence of this is that $Z_n$, the survivor function process, will not be in $D[0, 1]^2$. This lack of sample path regularity necessitates Assumptions 4.2 and 4.3 above, which yield the following lemma and its corollary:
Then we have that
for some constant $K^*$, where the first inequality follows from the independence of the processes $V$ and $W$ and the observation at the beginning of this proof; the second is a consequence of Assumption 4.3. But by a nearly identical argument to that in Example 2.11.14 in [13] (we use two-dimensional blocks instead of closed intervals), this implies that the sequence of processes defined by $D_n(t) := n^{-1/2}[Z_n(t) - E(Z_n(t))]$ converges in distribution to a tight Gaussian process $\Delta$ on $\ell^\infty[0, 1]^2$. Continuity of $\Delta$ is a consequence of the fact that $E|D_n(s) - D_n(t)|^2 \le K_1 |s - t|$ for some constant $K_1 < \infty$ (cf. [13], p. 41): from Assumptions 4.2 and 4.3, we have that
We are now ready to proceed with the proof of our main result.
Proof of Theorem 4.5. Since the $\xi_i$'s are i.i.d., we have that the sequence of processes $(C_n)$ defined by $C_n(t) := n^{-1/2}[N^{(n)}_\xi(t) - E(N^{(n)}_\xi(t))]$ converges in distribution to a tight Gaussian process $\Gamma$ on $D[0, 1]^2$. This follows simply from the CLT for empirical processes, since we are working with a subdistribution. Combined with Lemma 4.6, this means that
Recalling (3), we note that the estimator depends on the pair $(Z_n/n, N^{(n)}_\xi/n)$ through the maps
It is a consequence of Lemma 4.1 that the map (5)
Now we can apply the delta method to conclude that
is again Gaussian. As in Lemma 4.1, the first term in the limiting process has to be defined by integration by parts, since $\Gamma$ may not be of bounded variation. To complete the proof, we observe that $dP(Y \in A_u, Y \in \xi) = P(u \in \xi)\,\mu(du) = S(u) P(u \in \xi) h(u)\,du$.
The following proposition will allow us to identify the covariance structure of the limiting Gaussian process G.

Proof. For any Borel set
Using Corollary 4.7, we can show that (6) converges in probability to 0 with exactly the same argument as in the proof of Theorem 5.1 in [8].
is the normalized sum of $n$ i.i.d. processes, as noted in [8]. This comes up short of giving us a functional CLT since we need to prove tightness. It is for this reason that we believe the use of the delta method is a more elegant approach to this particular problem. Furthermore, as will be seen in the next section, the delta method justifies the use of bootstrapping.
The next proposition is identical to its counterpart in [9] (Corollary 4.7).
Now that we have Propositions 4.8 and 4.10, we are in position to give the covariance structure for the limiting process in Theorem 4.5.
Lemma 4.11. The covariance structure for the process $G$ in Theorem 4.5 is given, for $C, D \in \mathcal{C}$, by
Proof. From Proposition 4.8, we have that the only term that contributes to the covariance is (7), and so (9) follows by an application of Corollary 4.10.

Applications
For all our applications, we will assume that $T = [0, 1]^2$, and that $\mathcal{F}_t = \mathcal{F}_{(t_1, t_2)}$ is trivial if either $t_1 = 0$ or $t_2 = 0$. We recall that
and by triviality of $\mathcal{F}_t$ on the axes, $\mathcal{F}^*_{(t_1, 0)} = \mathcal{F}$

Test of independence
We now concern ourselves with the construction of a test of independence of the components $(Y_1, Y_2)$ of the random vector $Y \in T$. We will follow closely the development of the ideas presented in [11].
As noted in [8] and [11], when Y 1 and Y 2 are independent, the hazard is the product of the marginal hazards. Therefore, we have to be able to estimate the marginal hazards in order to obtain a test of independence. We will review the four examples given in §3.2 in order to illustrate how this is done.
• Example 3.9: Referring to Figure 4, taken separately, $Y_1$ is not censored at all, since it is always possible to determine whether the individual is pregnant or not, but $Y_2$ is censored on the interval $(\kappa_{1,i}, \kappa_{2,i} + c)$.
Likewise, $Y_2$ is censored on the right by the $\mathcal{F}^{(2)}$-stopping time
In each of these examples, we end up with the same type of one- and two-dimensional structure: the pair $(Y_1, Y_2)$ is filtered on an $\mathcal{F}$-*-cloud, and each $Y_j$ is filtered by an $\mathcal{F}^{(j)}$-*-cloud $\xi_j^c$, a union of random open intervals with endpoints that are $\mathcal{F}^{(j)}$-stopping times for $j = 1, 2$. We would like to note that in classical one-dimensional problems, the intervals are usually taken to be left-open and right-closed in order to ensure predictability, but since we are assuming that $P(Y_{\partial\xi} = 0) = 1$, the endpoints of the intervals can actually be ignored. Now we are in position to define the analogous one-dimensional processes
as well as the processes $C_{n,j}(t_j) = n^{-1/2}[N^{(n)}_{j,\xi_j}(t_j) - E(N^{(n)}_{j,\xi_j}(t_j))]$ and $D_{n,j}(t_j) = n^{-1/2}[Z_{n,j}(t_j) - E(Z_{n,j}(t_j))]$ for $j = 1, 2$. We will also make use of the one-dimensional analogues of the process $M^{(n)}$.
The cumulative marginal hazards are estimated by
, $j = 1, 2$.
where the vector $(G, G_1, G_2)$ is jointly Gaussian with continuous sample paths.
Proof. We start very similarly to the proof of Lemma 4.1 in [11]. Recall that $D_n(t) := n^{-1/2}[Z_n(t) - E(Z_n(t))]$ and $C_n(t) := n^{-1/2}[N^{(n)}_\xi(t) - E(N^{(n)}_\xi(t))]$. We already saw in the proof of Theorem 4.5 that $(D_n, C_n)$ converges to $(\Delta, \Gamma)$ on $\ell^\infty[0, 1]^2 \times D[0, 1]^2$. We also know that $(D_{n,j}, C_{n,j})$ converges to a joint Gaussian process $(\Delta_j, \Gamma_j)$, $j = 1, 2$, on $\ell^\infty[0, 1] \times D[0, 1]$, since the same arguments for the convergence of $(D_n, C_n)$ still apply. Therefore, the joint sequence $(D_n, D_{n,1}, D_{n,2}, C_n, C_{n,1}, C_{n,2})$ is tight in the product of topologies of the space
This, coupled with the fact that its finite-dimensional distributions converge to those of a Gaussian process with continuous sample paths, gives the convergence of the sequence.
Next, note that both the estimators for the cumulative marginal hazards depend on the pairs $(Z_{n,j}/n, N^{(n)}_{j,\xi_j}/n)$, $j = 1, 2$, through the map (5), where the integral is defined on $\mathbb{R}$ instead of $\mathbb{R}^2$. Using the same arguments as in Theorem 4.5, we can show that $n^{1/2}(\hat{H}$
for every $\tau_j$ such that $S_j(\tau_j) > 0$, where $G_j$ are mean-zero Gaussian processes for $j = 1, 2$. As a consequence of this fact and the joint convergence of $(D_n, D_{n,1}, D_{n,2}, C_n, C_{n,1}, C_{n,2})$, we get the desired result.
As was the case in [11], in order to construct our test of independence between $(Y_{1,i})$ and $(Y_{2,i})$, $i = 1, 2, \dots$, we will take the difference between the two-dimensional Nelson-Aalen estimator and the estimator under the hypothesis of independence. Define $V_n := n^{1/2}(\hat{H}^{(n)} - \hat{H}^{(n)}_1 \hat{H}^{(n)}_2)$ on $[0, \tau]$, where $\tau = (\tau_1, \tau_2)$. We could calculate the covariance structure of $V_n$, but since the statistic of interest will be seen to be $\sup_{t \in [0, \tau]} V_n(t)$, we prefer to use a bootstrap test based on $V_n$ and our efforts will now turn towards that goal.
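A minimal Python sketch of the statistic may help fix ideas. It evaluates $V_n(t) = n^{1/2}(\hat{H}^{(n)}(t) - \hat{H}^{(n)}_1(t_1)\hat{H}^{(n)}_2(t_2))$ on a finite grid and returns the largest absolute value. The sketch makes several assumptions that are not taken from the paper: the estimators are computed in the usual Nelson-Aalen form (sum of reciprocals of the at-risk counts at uncensored points), each anti-cloud is a membership predicate, the marginal anti-clouds are supplied by the user, and the supremum over $[0, \tau]$ is approximated by a maximum over the grid. All function names are hypothetical.

```python
import math


def leq(s, t):
    """Coordinatewise partial order s <= t."""
    return all(a <= b for a, b in zip(s, t))


def nelson_aalen(t, ys, clouds):
    """Nelson-Aalen-type estimate of the cumulative hazard of A_t.

    ys are points (tuples, possibly of length 1); clouds[i](u) is True
    iff u belongs to the i-th anti-cloud.
    """
    total = 0.0
    for y, xi in zip(ys, clouds):
        if xi(y) and leq(y, t):
            # at-risk count at y: observations in E_y whose anti-cloud contains y
            z = sum(1 for v, eta in zip(ys, clouds) if leq(y, v) and eta(y))
            total += 1.0 / z
    return total


def independence_statistic(ys, clouds, clouds1, clouds2, grid):
    """Grid approximation of sup_t |V_n(t)| for bivariate data ys."""
    n = len(ys)
    y1 = [(y[0],) for y in ys]  # first components, as 1-tuples
    y2 = [(y[1],) for y in ys]  # second components, as 1-tuples
    sup = 0.0
    for t in grid:
        joint = nelson_aalen(t, ys, clouds)
        prod = (nelson_aalen((t[0],), y1, clouds1)
                * nelson_aalen((t[1],), y2, clouds2))
        sup = max(sup, abs(math.sqrt(n) * (joint - prod)))
    return sup
```

Here `clouds1` and `clouds2` take 1-tuples as arguments, so that the same `nelson_aalen` routine serves for the joint and the marginal estimators.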
We shall use the delta method for the bootstrap, and for that we need to verify that the classes of functions needed are Donsker classes. Let $X_1, \dots, X_n$ be a sample of random elements in a measurable space $(\mathcal{X}, \mathcal{A})$ with distribution $P$, let $\mathcal{G}$ be a collection of measurable functions $g : \mathcal{X} \to \mathbb{R}$ and let $P_n$ be the empirical measure of the $X_i$, which induces a map from $\mathcal{G}$ to $\mathbb{R}$ defined by $P_n g := n^{-1} \sum_{i=1}^{n} g(X_i)$. Ivanoff and Merzbach observed in [9] that an anti-cloud is measurable as a random closed set as a consequence of part 2 of their Corollary 3.10; it is easy to see that the same is true of *-anti-clouds. Now assume that we can fully observe the pair $(Y_i, \xi_i)$; that is, suppose we can observe the point $Y_i$ even if it lies inside the censored region and, conversely, we are able to observe the whole shape of the anti-cloud $\xi_i$ regardless of the location of $Y_i$. Then we can take $(Y_i, \xi_i)$ to be the random elements $X_i$ in the previous definition. Let $P$ be the joint distribution of the pair $(Y_i, \xi_i)$, let $P_n$ denote its empirical measure, $\hat{P}_n$ the bootstrap empirical distribution and $\hat{G}_n$ the bootstrap empirical process defined by $\hat{G}_n := \sqrt{n}(\hat{P}_n - P_n)$. For a more complete description of these elements, as well as for the exact statement of the delta method for bootstrap in probability (which we will use a few lines ahead), refer to Sections 3.6 and 3.9 of [13], respectively. Let $\mathcal{G}_1 = \{g^1_t : t \in T\}$ and $\mathcal{G}_2 = \{g^2_t : t \in T\}$ be defined, respectively, by
Then both $\mathcal{G}_1$ and $\mathcal{G}_2$ are Donsker classes, since both were shown to be convergent in distribution to tight Gaussian limits on $\ell^\infty[0, 1]^2$ in the proof of Theorem 4.5. We can apply Theorem 2.10.6 in [13] to find that the pair $(\mathcal{G}_1, \mathcal{G}_2)$ is also a Donsker class, since each one is uniformly bounded. It is also clear that $\mathcal{G}_1$ and $\mathcal{G}_2$ have finite envelope functions because both classes consist exclusively of indicator functions.
Hence, by Theorem 3.6.1 in [13], the conditions (3.9.9) of [13] are met. Furthermore, we have shown that the map (5) is Hadamard-differentiable tangentially to $C[0, 1]^2 \times D[0, 1]^2$ on a certain domain. Then we only need to apply Theorem 3.9.11 in [13], the delta method for bootstrap in probability, to get the conditional convergence given $(Y_i, \xi_i)$ of the bootstrapped Nelson-Aalen estimator. Similar arguments lead to the same conclusion for the one-dimensional bootstrapped Nelson-Aalen estimators.
The above is summarized in the next lemma, which is an analogue of Lemma 5.1 for the bootstrapped estimators.
We can deduce from Lemma 5.1 that on $[0, \tau]$, $V_n$ converges in distribution to a mean-zero Gaussian process; indeed, we can rewrite $V_n = n^{1/2}(\hat{H}^{(n)} - \hat{H}^{(n)}_1 \hat{H}^{(n)}_2)$ as
(10) converges in distribution to the sup of the absolute value of that same Gaussian process. Then we can bootstrap from $(q_i, r_i)$ jointly and find $c_\alpha$ so that $(1 - \alpha)100\%$ of the absolute values of (10) fall below $c_\alpha$, which would give us

Test of hazard rate order
Let X 1 , . . . , X n and Y 1 , . . . , Y m be independent random samples from bivariate distributions F and G respectively, and suppose that there is a common censoring mechanism in the form of a sequence of independent * -clouds. We are interested in testing whether the distributions are equal on a particular set or, more generally, on a suitable class of sets, against the alternative that there is a difference in the hazard rates.
We start with the case where we look at the hazards on a fixed set $A$. The null hypothesis is $H_0 : F = G$, and we can test it either against a one-sided alternative such as $H^F_A < H^G_A$, or the two-sided alternative $H^F_A \ne H^G_A$. Tests using a one-sided alternative lead us to tests of hazard rate order such as
It is natural to consider the difference between the Nelson-Aalen estimators $\hat{H}^F$ and $\hat{H}^G$ of the integrated hazards, which leads to the test statistic
where $N = n + m$. By Remark 4.9, we still have that both $U$
converge to independent mean-zero Gaussian limits $U^F_A$ and $U^G_A$ respectively. We can write
Under the null hypothesis of equality of the cumulative hazards on the set $A$, and assuming $\frac{n}{n+m} \to \lambda$ as $n, m \to \infty$, we get that
The limit variable has the same distribution as $U^F_A$ under the null hypothesis. Then we have a test of asymptotic level $\alpha$ for a two-sided alternative if we reject the null hypothesis whenever $|W^{(N)}$
A test with a one-sided alternative would be handled similarly; we just have to remove the absolute values.
We can also consider testing for the hazard rate over a whole class of sets, as long as the class is Donsker, in order to preserve the Gaussian limits. For sets $A_z$ indexed by $z \in [0, \tau]$, the test statistics are built from $W^{(N)}_{A_z}$, and the critical values are chosen so that $w^{(N)} \to w^F = \inf\{t : P(\sup_{z \in [0,\tau]} |U^F_{A_z}| > t) \le \alpha\}$ and $w^{(N)} \to w^F = \inf\{t : P(\sup_{z \in [0,\tau]} U^F_{A_z} > t) \le \alpha\}$ for two-sided and one-sided alternatives, respectively.
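In practice, a quantile of this form would typically be approximated from simulated or bootstrapped paths evaluated on a finite grid of $z$ values. The following is a minimal sketch under that assumption; the function name `sup_critical_value`, the array layout and the grid discretization are illustrative choices and not part of the procedure described above.

```python
import numpy as np


def sup_critical_value(paths, alpha=0.05, two_sided=True):
    """Empirical version of inf{t : P(sup_z U_z > t) <= alpha}.

    paths: array of shape (n_draws, n_grid); each row is one simulated or
    bootstrapped path on a finite grid of z values (an assumption of this
    sketch, standing in for the supremum over z in [0, tau]).
    """
    paths = np.asarray(paths, dtype=float)
    # Supremum over the grid, with or without absolute values.
    sups = np.abs(paths).max(axis=1) if two_sided else paths.max(axis=1)
    ordered = np.sort(sups)
    n_draws = len(ordered)
    # At most floor(alpha * n_draws) of the draws may exceed the returned value.
    allowed_above = int(np.floor(alpha * n_draws))
    index = max(n_draws - allowed_above - 1, 0)
    return float(ordered[index])
```

Setting `two_sided=False` drops the absolute values, which matches the one-sided definition of $w^F$.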
Assuming we can fully observe the pairs $(X_i, \xi_i)$, $(Y_j, \xi'_j)$ for $i = 1, \dots, n$ and $j = 1, \dots, m$, we determine the appropriate critical value $w^{(N)}$ by bootstrapping from the pooled sample $S^{(N)} = (X_1, \xi_1), \dots, (X_n, \xi_n), (Y_1, \xi'_1), \dots, (Y_m, \xi'_m)$, where each of these observations is chosen with probability $1/N$, as was done in Section 3.7.2 in [13]. Define $J = \lambda F + (1 - \lambda) G$, and note that under the null hypothesis, $H^J = H^F$. We assign the first $n$ elements from the resampling to $F$, and the rest to $G$. Let $\ddot H^{(n,N)F}$ and $\ddot H^{(m,N)G}$ be the Nelson-Aalen estimators for the pooled sample assigned to $F$ and $G$ respectively, and $\ddot H^{(N)}$ the estimator for the complete pooled sample. The bootstrapped version of the test statistic is defined from these estimators.

Once again, as we did following the statement of Lemma 5.2, we can see that the bootstrapped version of our test statistic only uses the functions $q_i = I_{\{X_i \in A_t\}} I_{\{X_i \in \xi_i\}}$, $r_i = I_{\{X_i \in E_t\}} I_{\{t \in \xi_i\}}$, $q'_j = I_{\{Y_j \in A_t\}} I_{\{Y_j \in \xi'_j\}}$ and $r'_j = I_{\{Y_j \in E_t\}} I_{\{t \in \xi'_j\}}$. We can therefore safely drop the assumption of complete observability, since even without it we can still construct the appropriate estimators. Thus, the test can be performed by bootstrapping directly from the recorded values $q_i, r_i, q'_j, r'_j$ and following the procedure described above.
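The resampling scheme can be sketched as follows. This is a minimal illustration, assuming that each observation is stored as one row of recorded values (for instance a pair of indicators for a fixed set) and that the test statistic is supplied as a function of the two resampled groups. The names `pooled_bootstrap_critical_value` and `ratio_difference` are hypothetical, and `ratio_difference` is only a placeholder statistic, not the Nelson-Aalen-based statistic of the text.

```python
import numpy as np


def pooled_bootstrap_critical_value(records_f, records_g, statistic,
                                    alpha=0.05, n_boot=2000,
                                    two_sided=True, seed=0):
    """Bootstrap a critical value by resampling from the pooled sample."""
    rng = np.random.default_rng(seed)
    records_f = np.asarray(records_f)
    records_g = np.asarray(records_g)
    n, m = len(records_f), len(records_g)
    pooled = np.concatenate([records_f, records_g], axis=0)
    N = n + m
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Each pooled record is selected with probability 1/N.
        resampled = pooled[rng.integers(0, N, size=N)]
        # The first n resampled records play the role of F, the rest of G.
        value = statistic(resampled[:n], resampled[n:])
        draws[b] = abs(value) if two_sided else value
    # (1 - alpha) quantile of the bootstrapped statistic.
    return float(np.quantile(draws, 1.0 - alpha))


def ratio_difference(rec_f, rec_g):
    """Placeholder statistic (an assumption of this sketch): scaled difference
    of the ratios sum(q) / sum(r) in the two groups, with columns (q, r)."""
    n, m = len(rec_f), len(rec_g)
    rate_f = rec_f[:, 0].sum() / max(rec_f[:, 1].sum(), 1)
    rate_g = rec_g[:, 0].sum() / max(rec_g[:, 1].sum(), 1)
    return np.sqrt(n * m / (n + m)) * (rate_f - rate_g)
```

Because only the recorded rows enter the resampling, the same code applies whether or not the pairs $(X_i, \xi_i)$ are fully observed. Replacing `ratio_difference` with a function that computes the actual test statistic from the resampled records would give the procedure described above.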

The hazard of a copula
Suppose we have a sequence $(X_i, Y_i)$ of i.i.d. bivariate random vectors with continuous distribution $J$ and continuous, strictly increasing marginals $F$ and $G$, and let $H$ denote the integrated hazard function. Our goal is to estimate the hazard of the copula $C$ associated with $J$.
As usual, each pair $(X_i, Y_i)$ will be censored by an $\mathcal{F}$-*-cloud $\xi^c_i$. Let $C(p, q) = J(F^{-1}(p), G^{-1}(q))$ be the copula function for $J$, and let $\bar C$ be the survival function for the copula: $\bar C(p, q) = 1 - p - q + C(p, q)$. It is straightforward to verify that the integrated hazard of the copula, $H^C$, satisfies

$$H^C(p, q) = H(F^{-1}(p), G^{-1}(q)), \qquad 0 \le p, q \le 1.$$
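As a quick check of the survival-function formula, written here with $\bar C$ for the survival function of the copula, let $(U, V)$ have distribution $C$ with uniform marginals. Inclusion-exclusion gives the general formula, and the independence copula $C(p, q) = pq$ (an illustrative choice, not an example taken from the text) gives a closed form:

```latex
\begin{align*}
\bar C(p, q) &= P(U > p,\ V > q)
  = 1 - P(U \le p) - P(V \le q) + P(U \le p,\ V \le q)
  = 1 - p - q + C(p, q), \\
\bar C(p, q) &= 1 - p - q + pq = (1 - p)(1 - q)
  \quad \text{when } C(p, q) = pq .
\end{align*}
```

The second line is the product of the marginal survival probabilities, as expected for independent uniform variables.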
In other words, calculating the cumulative hazard of the copula at $(p, q)$ is the same as calculating the cumulative hazard of the original distribution at the point $(F^{-1}(p), G^{-1}(q))$. We would like to estimate the cumulative hazard of the copula, and the equation above suggests that we could do so by looking at the Nelson-Aalen estimator for the integrated hazard of the distribution $J$ evaluated at the appropriate quantiles of $F$ and $G$. If $F$ and $G$ are unknown, we would replace $(F^{-1}(p), G^{-1}(q))$ with the quantiles of the respective Kaplan-Meier estimates of the marginals $F$ and $G$ (see Section IV.3.1 in [1]).

Now, suppose for the moment that $F$ and $G$ are known. Define the pseudo-observations $(X^\#_i, Y^\#_i) := (F(X_i), G(Y_i))$, so that the distribution of $(X^\#_i, Y^\#_i)$ is the copula $C$. The observable region then becomes $\xi^\#_i = \{(F(x), G(y)) : (x, y) \in \xi_i\}$, and the new filtration (now defined on $[0, 1]^2$) is $\mathcal{F}^\#(p, q) = \mathcal{F}(F^{-1}(p), G^{-1}(q))$. Using Equation (3), we can write down the Nelson-Aalen estimator for the pseudo-observations, and hence we have an empirical analogue of (11). More generally, for $D \subseteq [0, 1]^2$, let $(F^{-1}, G^{-1})(D) := \{(F^{-1}(p), G^{-1}(q)) : (p, q) \in D\}$, in which case we can estimate $H^C_D$ with $H^{(n)C}_D = \hat H^{(n)}((F^{-1}, G^{-1})(D))$. The asymptotic normality of $\sqrt{n}\,(H^{(n)C}_D - H^C_D)$ follows from Remark 4.9.

If $F$ and $G$ are unknown, the next step is to estimate the quantiles $F^{-1}$ and $G^{-1}$. We will do so by taking the appropriate quantiles of the Kaplan-Meier estimators $\hat F$, $\hat G$ of $F$ and $G$ respectively; then our estimator will be $\hat H^{(n)C} = \hat H^{(n)}(\hat F^{-1}, \hat G^{-1})$, where $\hat F^{-1}(p) = \inf\{s : \hat F(s) \ge p\}$ and $\hat G^{-1}(q)$ is defined in a similar manner. It is important to note that $A_{(\hat F^{-1}(p), \hat G^{-1}(q))}$ is still a *-anti-cloud: indeed, since $\hat F(s)$ and $\hat G(t)$ are adapted to $\mathcal{F}^{(1)}(s)$ and $\mathcal{F}^{(2)}(t)$ respectively,