Semiparametric inference of competing risks data with additive hazards and missing cause of failure under MCAR or MAR assumptions

In this paper, we consider a semiparametric model for lifetime data with competing risks and missing causes of death. We assume that an additive hazards model holds for each cause-specific hazard rate function and that a random right censoring occurs. Our goal is to estimate the regression parameters as well as the functional parameters such as the baseline and cause-specific cumulative hazard rate functions / cumulative incidence functions. We first introduce preliminary estimators of the unknown (Euclidean and functional) parameters when cause of death indicators are missing completely at random (MCAR). These estimators are obtained using the observations with known cause of failure. The advantage of considering the MCAR model is that the information given by the observed lifetimes with unknown failure cause can be used to improve the preliminary estimates in order to attain an asymptotic optimality criterion. This is the main purpose of our work. However, since it is often more realistic to consider a missing at random (MAR) mechanism, we also derive estimators of the regression and functional parameters under the MAR model. We study the large sample properties of our estimators through martingales and empirical process techniques. We also provide a simulation study to compare the behavior of our three types of estimators under the different mechanisms of missingness. It is shown that our improved estimators under MCAR assumption are quite robust if only the MAR assumption holds. Finally, three illustrations on real datasets are also given.

Abstract: In this paper, we consider a semiparametric model for lifetime data with competing risks and missing causes of death. We assume that an additive hazards model holds for each cause-specific hazard rate function and that a random right censoring occurs. Our goal is to estimate the regression parameters as well as the functional parameters such as the baseline and cause-specific cumulative hazard rate functions / cumulative incidence functions.
We first introduce preliminary estimators of the unknown (Euclidean and functional) parameters when cause of death indicators are missing completely at random (MCAR). These estimators are obtained using the observations with known cause of failure. The advantage of considering the MCAR model is that the information given by the observed lifetimes with unknown failure cause can be used to improve the preliminary estimates in order to attain an asymptotic optimality criterion. This is the main purpose of our work. However, since it is often more realistic to consider a missing at random (MAR) mechanism, we also derive estimators of the regression and functional parameters under the MAR model. We study the large sample properties of our estimators through martingales and empirical process techniques. We also provide a simulation study to compare the behavior of our three types of estimators under the different mechanisms

Introduction
The competing risks models are useful in Survival Analysis and Reliability in order to take into account the different causes of death of a patient or the of failure of a device. The observations usually include the (possibly censored) lifetime and the indicator of cause of death. Sometimes, a vector of (time dependent) covariates is also available. Based on such data, one can carry out a statistical inference assuming a parametric, semi-parametric or nonparametric model. One can find an extensive review of these models and their statistical inference in, for example, Crowder (2001) or Andersen et al. (1993).
In practice, it may happen that the cause of death or failure is missing for some individuals. Many papers have considered the problem of missing information in competing risks models. We refer to Miyakawa (1984), Usher and Hodgson (1988), Lin et al. (1993), Schabe (1994), Goetghebeur  Many authors have developed methods for accurate modeling of the missingness mechanism. Some of these works are based on parametric models. When a latent variable represents the missingness mechanism, an EM-type algorithm can be used to estimate the model parameters. In Craiu and Duchesne (2004), such a procedure is proposed when the missingness mechanism depends both on the failure cause and the failure time. Recently, Craiu and Reiser (2007) considered a very complete parametric model including dependence with the failure causes.
The special case of a possibly censored single failure cause differs from the competing risks model by the fact that, in the former case, the censoring is not an event of interest and is (usually) supposed to be independent of the lifetime. However, when the censoring information is missing, we are close to the competing risks situation where failure causes are possibly missing. Some specific inferential methods have been derived for various models with or without covariates and several missingness mechanisms, see e.g. Gijbels et al. (1993), McKeague and Subramanian (1998), van der Laan and McKeague (1998), Zhou and Sun (2003), Subramanian (2004) and Song et al. (2010).
Other authors have developed estimation procedures in a semi or nonparametric framework for two or more failure causes with missing indicators (see e.g. Myakawa, 1984, Dinse, 1986, Lo, 1991, Schabe, 1994. In particular, the case of a proportional hazards model for the cause-specific hazard rate functions has been studied by Goetghebeur and Ryan (1995), Lu and Tsiatis (2001) and, more recently, by Lee et al. (2011). On the other hand, Gao and Tsiatis (2005) have considered a linear transformation competing risks model whereas Bakoyannis et al. (2010) have focused on the well-known Fine and Gray (2009) model.
As far as we know, Lu and Liang (2008) is the only paper which considers an additive risk model for competing risks data with missing indicator. The missing indicator situation is an interesting alternative to the proportional hazards model. Lu and Liang (2008) have assumed such a model on only one of the cause-specific hazard rate functions, namely the cause of primary interest. Using estimating equations based on inverse probability weighted (IPW) and double robust (DR) techniques, they are able to estimate the regression parameters. In their model, as in Bakoyannis et al. (2010), Lee et al. (2011) and Gao and Tsiatis (2005), the mechanism of missingness may depend on the failure time, which corresponds to the missing at random (MAR) assumption.
Our aim in this paper is to consider an additive risk model for competing risks data when indicators of cause of death are missing at random. Our model differs from the one of Lu and Liang (2008) in the sense that we assume an additive risk model for each cause-specific hazard rate function and not just for the cause of interest. We do not assume independence between the causes of death. It has to be noted that, unlike in Goetghebeur and Ryan (1995), these functions are not supposed to be linked in our model. Furthermore, we are interested in estimating not only the regression parameters but also the cumulative incidence functions and, finally, the overall survival function. Note that Lu and Liang (2008) do not consider the problem of estimating the cumulative incidence function and by providing an estimator of the survival functions associated to the cause-specific hazard rate functions of interest implicitly assume independence between the various causes of death. We believe that obtaining estimation of the cumulative incidence functions is also of interest due to the importance of these functions in a competing risks model. As far as we know, the only paper which considers estimation of the cumulative function when indicators are missing is Bakoyannis et al. (2010). But, as noted earlier, their work is under a different model, since they assume a Fine and Gray model.
In this paper, we first consider the case of the most restrictive mechanism of missing information, namely the missing completely at random (MCAR) mechanism. We show that the advantage of this type of missingness mechanism lies in the fact that it doesn't affect the additive shape of the cause-specific hazard rate functions. Thus, one can estimate the regression parameters following the idea of estimating equations introduced by Lin and Ying (1994). We also show that the observations with unknown failure cause are usable to estimate the sum of the previous parameters. Then, and we think that this is the main contribution of our approach, we develop a method that allows us to take into account this information in an optimal way in order to improve our preliminary estimators of the regression parameters. The improved estimators are built in order to attain an asymptotic efficiency criterion. Such an improvement does not appear to be possible with any other missingness mechanism. We are also able to derive improved estimators of the cumulative hazard rate and the cumulative incidence functions.
We also consider the case of the MAR mechanism. We show that the Double Robust technique, introduced by Robins et al. (1994) and later used by Lu and Liang (2008) in a context similar to ours, can be used to derive estimates of the Euclidean and functional parameters of our model. Our work in this part extends the results of Lu and Liang (2008) in the same manner as described earlier: incorporating additive risks for each cause-specific hazard rate function, not making any assumption of independence between the causes, estimating the cause-specific cumulative hazard rate functions, etc... The paper is organized as follows. In Section 2, the model and the MCAR assumptions are introduced and seen as a specific case of a nonhomogeneous Markov process. In Section 3, first estimators under MCAR of the Euclidean parameters are obtained using estimating equations. Then, we introduce our improved estimators, still under MCAR, which have minimum asymptotic variance. In Section 4, we show that our parameter estimators are consistent and asymptotically Gaussian. Section 5 deals with the estimation of the functional parameters under MCAR assumption. Initial and improved estimators are introduced and we prove their consistency and their asymptotic Gaussian behavior. Consistent estimators of the asymptotic variances are also provided in each section. Note that our proofs of the asymptotic results in Sections 4 and 5 are based on the martingale theory and are, thus, very different from the methods commonly used (see, e.g. Gao andTsiatis, 2005, andLiang, 2008) in the literature on this topic. Section 6 is devoted to the consideration of the MAR model. We introduce estimators of the Euclidean and functional parameters under this model and obtain their large sample behaviors. Finally, a Monte Carlo study is performed in Section 7 in order to assess the behavior of our estimators for finite sample sizes and to compare their properties under the MCAR and MAR models. In addition, three examples using real data sets illustrate our estimation methods.

Framework under MCAR assumption
Consider a population in which each individual is liable to die from any of p ≥ 2 causes. The causes are not necessarily independent but we assume that each death is due to a single cause. Let us denote by T the individual lifetime and d ∈ {1, . . . , p} its cause of death. Suppose that our interest focuses on the effect of a time-varying covariate vector Z(·) of dimension k. More precisely, let Z(t) = {Z(u); u ≤ t} be the history up to time t of this covariate process and assume that an additive hazard model holds on the cause-specific hazard rate imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 L. Bordes et al./SP inference with competing risks and missing indicators 6 function, that is: 1) for j = 1, . . . , p where λ 0j (·) is the baseline jth cause-specific hazard rate function and β j ∈ R k is the vector of regression parameters associated to the jth cause.
Suppose also that the lifetime T is right-censored by a random variable (r.v.) C and write where I(·) is the set indicator function. Let λ C (·) denote the hazard rate function of the r.v. C and assume that conditionally on Z, the r.v. C is independent of (T, d).
Of course, the future cause of death is not known if C is observed instead of T . But, in some situations, it may happen that d is also not known even if T is observed. Let R denote the missingness indicator, i.e. R = 1 if the cause is known and R = 0 otherwise. Thus, we are in a situation where the available observation for an individual is where D = δRd reveals the failure cause d when the failure time is uncensored (δ = 1) and R = 1 and is equal to zero otherwise. In the following, we assume that the missing mechanism is such that: where α is an unknown parameter, and P (R = 0|X, Z, d, δ = 0) = P (R = 0|δ = 0) = 1. This is the Missing Completely At Random (MCAR) assumption.
One can see the observation of the vector (X, δ, D), conditionally on Z, as the realization of a (p + 3)-states nonhomogeneous Markov process (see Fig. 1) with space set {0, 1, . . . , p, m, c}: state 0 is the initial state; state i, for i = 1, . . . , p, corresponds to the observation of the lifetime T with known cause of death i; state m to the observation of the lifetime with missing cause; state c to a censored observation. Except 0, all the states are absorbing states. From the assumptions on M and the independence between (T, d) and C, conditionally on Z, one can easily get that the instantaneous transition rates of this Markov process are, conditionally on Z: In order to simplify the notation in the following derivations, let us denote by p + 1 the index corresponding to a missing cause (previously denoted by m). It is important to note that, up to a multiplicative constant (α or 1 − α), the additive form of the instantaneous rates is preserved (except for the transition 0 → c). It is obvious for λ ′ j (t|Z) with 1 ≤ j ≤ p but also for λ ′ p+1 (·|Z) since where λ 0p+1 (·) = p j=1 λ 0j (·) and β p+1 = p j=1 β j . This will help us to improve the estimation of the regression parameters β 1 , . . . , β p . Now, let us suppose that we observe a sample (X i , δ i , D i , Z i (X i )) 1≤i≤n of (X, δ, D, Z(X)). Let τ < +∞ be the upper bound of the interval of study which means that individuals are only observed on the time interval [0, τ ]. Let N ij (·), for j ∈ {1, . . . , p + 1}, be the counting processes defined by: Finally write Y i (t) = 1(X i ≥ t) the individual risk process, for i = 1, . . . , n.
From Andersen et al. (1993) or Fleming and Harrington (1991), we know that the processes M ij (·), for j = 1, . . . , p + 1, defined by for t ≥ 0, are zero mean martingales with respect to the filtration (F t ) t≥0 defined by

Estimators
The finite dimensional parameters of our model are: the probability α of knowing the cause of death; and the regression parameters β 1 , . . . , β p of each causespecific hazard rate functions. Recall that α = P (R = 1|δ = 1). Thus, the maximum likelihood estimator of α is nothing but the proportion of lifetimes with known cause of death among the uncensored lifetimes, that is: Extending an approach proposed by Lin and Ying (1994) in case of a single cause of death, one can estimate β j , for j = 1, . . . , p, by the solutionβ j of the estimating equation .
In the sequel we writeβ the estimator of the regression parameters without taking into account the subjects with unknown cause of death (WMC abbreviates "Without taking into account the Missing Causes"). Note that it is a vector of dimension kp. Now, since it has been seen that the cause-specific hazard rate function λ ′ p+1 (·) associated with a missing cause has an additive form too, one can also estimate β p+1 by the solutionβ p+1 of the estimating equation U p+1 (β,α, τ ) = 0 where Closed-form expressions of these estimators are available and given below (see Equation (4.9)).
At this stage, we are in a situation where each parameter β j , for j = 1, . . . , p, has its own estimatorβ j . But we also have an estimatorβ p+1 of their sum β p+1 = β 1 + · · · + β p . This estimator only uses the information from the transitions 0 → p + 1 and hence is not equal toβ 1 + · · · +β p . It is, of course, of interest to use it in order to improve the estimation of the first parameters β j , for j = 1, . . . , p. To this end, we propose to find the linear transformation of our estimator (β T 1 , . . . ,β T p ,β T p+1 ) T which will lead to an estimator of (β T 1 , . . . , β T p ) T with minimum asymptotic variance. More precisely, let H be the family of all the block matrices where the H ij , for i = 1, . . . , p and j = 1, . . . , p + 1, are k × k real valued matrices, such that for all vectors β 1 , . . . , β p in R k and β p+1 = β 1 + · · · + β p . Writê whereΣβ ,∞ is an estimator of the asymptotic variance-covariance matrix of an estimator of (β T 1 , . . . , β T p ) with minimal asymptotic variance is given by: Thus, the optimal estimators of the regression parameters arẽ

4)
imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date:  and are called T -optimal estimators in the sequel. Let us denote bỹ this vector of dimension kp of T -optimal estimators. Such a way to improve estimators has been considered in Balakrishnan et al. (2010). Note that the constraints on H, given by (3.3), are linear and that they act separately on the rows of H. Indeed, denoting by I k and 0 k the identity matrix of order k and the null matrix of order k respectively, these constraints may be written On the other hand, we havê and H i• is the ith row block of H. Thus, it is sufficient to solve separately the following problems (P i ), for i = 1, . . . , p:

Solving problems (P i )
Since problems (P 1 ), . . . , (P p ) are identical, we only give the method to solve (P 1 ). Let us introduce some temporary notations. One can rewrite the functionq 1 (H) to be minimized like: Now, the constraints on H are .
These constraints can be rewritten CL = d where C is a (pk 2 ) × (k 2 (p + 1)) matrix and d a pk 2 -column vector. Thus, the Lagrange function for this optimization problem with linear constraints is given by where λ is a Lagrange multiplier vector. The optimal parametersL andλ necessarily satisfy the first-order conditions It has to be noted thatQ is invertible wheneverΣβ ,∞ is. Finally, havingL we obtainĤ (j) for j = 1, . . . , p + 1.

Additional notations and assumptions, preliminary results
Let imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 If z is a column vector in R k , let us write Finally, let us denote by S l (·), for l = 0, 1, 2, the processes defined by for 0 ≤ s ≤ τ and for b in R k the process S 3 (·; b) defined by From now on we make the following assumptions. Moreover, the function s 0 (·) is bounded below by a positive real number. A4. With the notations a(u) = s 2 (u) − s ⊗2 1 (u)/s 0 (u), the matrix A(τ ) is positive definite and the real number θ(τ ) is strictly positive. The matrix is also positive definite. Note that, from previous assumptions, it was already true asymptotically. A5. For all b ∈ R k , let S 4 (·; b) be the process defined by imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 There exist functions s 3 (·; b) and s 4 (·; b) such that, for all b ∈ R k , A6. The following functions are integrable on [0, τ ]: , s 3 (·; β j ) and s 4 (·; β j ), for j = 1, . . . , p.
Now, let us introduce, for all b ∈ R k , the processes U j (b, α j , ·) defined, for j = 1, . . . , p + 1 and t ∈ [0, τ ], by: Note that values of these processes at s = τ appear in the estimating equations of Section 3.1.
It is easily seen that these processes can be rewritten as for j = 1, . . . , p + 1. Hence, they are local square integrable (F t ) t≥0 -martingales as sum of stochastic integrals of predictable and bounded processes with respect to local square integrable martingales. Now, let us introduce two technical results useful in the following section. Let us note that all the functional convergence results of this paper are considered in the Skorohod space of cadlag functions D[0, τ ].  where α * j = 1 − α for 1 ≤ j ≤ p and α * p+1 = −α, converges weakly in D k(p+1)+1 [0, τ ], as n tends to infinity, to a zero mean multivariate Gaussian martingale U ∞ (·) with covariance matrix defined for all t ∈ [0, τ ] by where, for j = 1, . . . , p + 1, and A(·) as well as θ(·) are defined in (4.1).
Proof. As in Andersen and Gill (1982), the main idea is to apply Rebolledo's Theorem (see Rebolledo, 1980, or Andersen et al., 1993. We only derive here the limit of the predictable variation process associated to U n (·). This will give us the asymptotic variance-covariance matrix function Σ U∞ (·) of U n (·).
On one hand, straightforward calculations show that we have, for j = 1, . . . , p+ 1, which, by Assumptions A2-A6, converges in probability, as n tends to infinity, to Θ j (t) given in the Theorem. On the other hand, since the martingales M ij and M i ′ j ′ are orthogonal for all 1 ≤ i, i ′ ≤ n and 1 ≤ j = j ′ ≤ p + 1, we have: whenever j is different from j ′ and for all t ∈ [0, τ ]. This justifies the null terms in the asymptotic covariance matrix Σ U∞ (t).
imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date:  Now, thanks again to the orthogonality between martingales with different indices, it is easy to show that, for 1 ≤ l ≤ p + 1, we have which converges in probability to when n tends to infinity. Finally, with the same kind of arguments we have when n tends to infinity.
Remark. The second result of this lemma is straightforward and doesn't require the first step. It arises from an easy application of the central limit theorem. However the first result will be useful in the next section.
As said in the previous remark, the asymptotic normality ofα is straightforward.

Large sample behavior ofβ
asymptotically Gaussian, with zero mean and positive definite covariance matrix From their definition, the estimatorsβ j , for j = 1, . . . , p, are such that U j (β j ,α j , τ ) = 0. Thus, one can write : With the notationÂ

It follows thatβ
thanks to Assumption A4 which insures thatÂ(τ ) is invertible. Now, using Assumptions A3 and A4, Proposition 4.1 and Lemma 4.2, it is easily seen that the right-hand side of (4.5) converges to zero when n tends to infinity. Thusβ is consistent.
Furthermore, after some straightforward calculations on equation (4.5), one can write, for j = 1, . . . , p + 1: Using again Assumptions A3 and A4 (which ensure thatÂ −1 (τ ) converges in probability to A −1 (τ )), Proposition 4.1 and Lemma 4.2 as well as the consistency ofβ, one can prove that the second term of the right-hand side of this last equality is an o P (1). Thus we can write: Hence, from (4.3), (4.6) and (4.7) we finally obtain where Σ 1 (τ ) is given in Theorem 4.3. This and Proposition 4.1 complete the proof. Note that because Σ U∞ (τ ) is positive definite, the matrix Σβ ,∞ is positive definite too.
Moreover, we have seen in the proof of Lemma 4.2 that Thanks to Assumptions A3 and A4, the integral in the right-hand side of the last equation converges in probability to θ(τ ). This and (4.4) give the convergence in probability ofθ(τ ) to θ(τ ).
On the other hand, one can write, for j = 1, . . . , p: (4.10) With the notation A ⊗2 = AA T when A is also a matrix, it is easily seen that which converges to zero in probability when n tends to infinity, by Assumptions A2 and A3. This and Lenglart inequality yield the convergence in probability to zero of the first term of (4.10). Moreover, we have seen in the proof of Proposition 4.1 that the second term of (4.10) is equal to U j (β j , α, τ )/ √ n and converges in probability to Θ j (τ ), when n tends to infinity. These two last convergences prove thatΘ j (τ ) converges to Θ j (τ ), for j = 1, . . . , p + 1. Thus we get the consistency of the estimatorΣ U∞ (τ ) and finally the one ofΣβ ,∞ .
From the above and the continuous dependency ofĤ onΣβ ,∞ we deduce thatĤ converges in probability to the matrix H opt in H which minimizes trace(HΣβ ,∞ H T ). We recall that the existence of such an optimal matrix is ensured because Σβ ,∞ is positive definite. SinceĤβ = β * , we get from the above and Theorem 4.3 that √ n(β T opt − β * ) =Ĥ √ n(β − β) converges to a zero mean Gaussian distributed random variable with covariance matrix H opt Σβ ,∞ (H opt ) T which is optimal in the sense defined earlier.

Statistical inference on the functional parameters under MCAR
Even though it is not the model considered at the beginning of this paper, we will first consider the fully nonparametric case; that is, a model without covariates where only functional parameters have to be estimated. Of course, this model is also of interest for applications. Then we will come back to our semiparametric model and will see how to estimate its functional parameters.

Inference in the nonparameteric model
In this case, no parametric form is assumed on the cause-specific hazard rate functions λ j (·), for j = 1, . . . , p and we also do not take into account any covariate. Let Λ j (·), for j = 1, . . . , p, denote the cause-specific cumulative hazard rate functions defined by: With the assumption on M and the hypothesis of independence between (T, d) and C, the instantaneous transition rates of the Markov process with graph given in Figure 1 are: Contrary to the model of Goetghebeur and Ryan (1995), when there is no covariate in the data, the model is still of interest because it allows different failure rates for failure causes.
B2. There exists a function s 0 (·), defined on [0, τ ], and bounded away from 0, such that These assumptions are nothing but Assumptions A2 and A3 of Section 4 adapted to this new model. It is well known that Assumptions B1 and B2 are fulfilled whenever τ is such that S(τ )Ḡ(τ ) > 0, where S(·) andḠ(·) are respectively the survival functions of T and C.
One can get a consistent estimator of the covariance function Σ L (·) of the limit process L(·). To this end, let us define, for j = 1, . . . , p + 1: Define also: Finally, an empirical estimator of Σ L (·) is given by:
To this end, let us define H ′ as the set of p × (p + 1) real valued matrices such that Ha = a * for all a * = (a 1 , . . . , a p ) T ∈ R p and a = (a * T , as a new estimator of Λ * (·). The next theorem proves that the later estimator is asymptotically normal and T -optimal.
Moreover, let us assume that the matrix Σ L (t) is invertible, for all t ∈]0, τ ]. If, for all t ∈ [0, τ ], the matrixĤ(t) is the unique solution of (5.5), then the process √ n(Λ(·) − Λ * (·)) converges weakly in D p [0, τ ] to a centered Gaussian Proof. The proof is omitted since it follows the lines of the proof of Theorem 4.4.
Let us denote by L the column vector in R p(p+1) defined by L = (H 1 , . . . , H p ) where H i is the ith row of H ∈ H ′ . The link between L = (l i ) 1≤i≤p(p+1) and H = (h i,j ) 1≤i≤p;1≤j≤p+1 is therefore h i,j = l (i−1)(p+1)+j . One can see that the linear constraints on H may be written on L as CL = d where C and d are known. Indeed where the matrix C and the vector d are obvious. Moreover, letQ(t) be the p × p block diagonal matrix defined bŷ Thus, in order to find our optimal estimator, we have to solve the following optimization problems Following the method of Section 3.2, the solution of (P t ) is: Remark. SinceΣ L (t) is piecewise constant, it is sufficient to calculate the matrixĤ(t) at points t ∈ [0, τ ] where the counting process N ·· jumps, that is at points X i ∈ [0, τ ] such that δ i = 1.

Estimation of the cumulative incidence functions and the survival function of T
Our aim in this section is to introduce estimators of the survival function S(·) of the lifetime T as well as estimators of the cumulative incidence functions F j (·) defined, for all t and j = 1, . . . , p, by Let us recall that Λ · (·) = p j=1 Λ j (·) is the cumulative hazard rate function of the survival time T . It is well known that one can write the survival function in terms of the cumulative hazard rate function: where π denotes the product integral (see Gill & Johansen, 1990  On the other hand, it is also well-known that one can write, for t ∈ [0, τ ] and j = 1, . . . , p and that an estimator of this cumulative incidence function is given by the Aalen-Johansen estimator (see Andersen et al., 1993) The asymptotic behavior of these estimators is given in the following theorem.
is the jth component of the limit process L ′ (·) of Theorem 5.2 and Proof. This result is easily obtained from Theorem 5.2 and the functional δ-method (see e.g. van der Vaart & Wellner, 1996, for details on this method). Indeed, from the above, one can write . . .

With explanatory variables
Now, let us come back to the semiparametric model of equation (2.1) with explanatory variable Z(·). As we have done before, we will first introduce estimators of the cause-specific hazard rate functions which only use the lifetimes Without Missing Causes (WMC). Then, as previously, we will introduce the T -optimal estimators of these functions. First note that, even if we will only use the quantities with index j = 1, . . . , p in this paragraph, we will also introduce here the terms with index p + 1. These latter will be used in the following paragraph where we will get the T -optimal estimators which also use the lifetimes with unknown cause of failure. In the sequel, Λ 0p+1 (·) and Λ p+1 (·|Z) denote respectively p j=1 Λ 0j (·) and p j=1 Λ j (·|Z).

Improved estimators of the cause-specific cumulative hazard rate functions: the T -optimal estimators
The estimators of the cause-specific (baseline) hazard rate functions introduced in the previous paragraph can be improved thanks to the following two arguments: • replace in equation (5.7) and (5.8) the estimatorβ W M C by the improved estimatorβ T opt of the regression parameters; • apply the idea of Section 5.1.2, i.e. use the estimation of the sum of the cause-specific (baseline) hazard rate functions in order to obtain optimal estimators of each quantity.
Finally, let us define the block matrix Σ 6 (t), for all t ∈ [0, τ ], by: The large sample behavior of the above estimators of the cause-specific cumulative hazard rate functions is given in next Theorem.
Recall that estimators of the quantities which appears in the limiting covariance matrix have been introduced in previous sections (see in particular Section 4.4 and Section 5.2.1). Now, T -optimal estimators of the (baseline) cause-specific cumulative hazard rate functions can be obtained following the line of Section 5.1.2. For example, imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 from the estimatorΛ 0j (·), for j = 1, . . . , p + 1, given in equation (5.10), one can obtain T -optimal estimators of the baseline cause-specific cumulative hazard rate functions. Indeed, writeΛ . . .
whereΣΛ 0 (t) is the estimator of the asymptotic covariance matrix of the esti-matorΛ 0 (t). Then the T -optimal estimator of Λ 0 (·) = (Λ 01 (·), · · · , Λ 0p (·)) T is given by:Λ T opt 0 (·) =Ĥ(·)Λ 0 (·). (5.11) Finally, it has to be noted that, as in Section 5.1.3, it is also possible to get estimators of the survival function and the cumulative incidence functions as well as to get their large sample behavior (like in Theorem 5.3). But this is omitted here since it is very similar to what has been done in Section 5.1.3. Let us just note that, from a practical point of view, if for some t ∈ [0, τ ] the determinant of the matrixΣΛ 0 (t) is too small, then we setΛ T opt 0 (t) = (Λ 01 (t), . . . ,Λ 0p (t)) T .

Semiparametric inference under the MAR Assumption
Following the request of the Associate Editor and the referees, we have considered the possibility of weakening the assumption on the missingness mechanism. Indeed, one knows that the MCAR assumption can be too strong in some situations and it could be of interest to consider the case where the probability that the cause is missing depends on the time X as in the MAR case. The aim of this section is to consider such a situation. In this Section, we show that under the MAR assumption, one can obtain estimators of the Euclidean parameters (the regression parameters) as well as the functional parameters (the baseline cause-specific cumulative hazard rate function and, as a consequence, of the Cumulative Incidence Function and the survival function). Our approach is based on the double robust technique already used by Lu and Liang (2008) but it generalizes their work in the following ways. We do not assume independence between the causes of death; we assume an additive form on each cause-specific hazard rate function and, finally, we obtain estimators of the cumulative incidence functions as well as the survival function of the lifetime T (all causes confounded). As mentioned earlier in Section 1, there is, however, a disadvantage in considering such a more general situation since it won't be possible to improve the estimators as done in the previous section.

MAR Assumption
In this section we still assume that an additive hazard model holds on the causespecific hazard rate function, that is: , t ≥ 0, (6.1) for j = 1, . . . , p, where λ 0j (·) is the baseline jth cause-specific hazard rate function, Z(t) = {Z(u); u ≤ t} is the history up to time t of this covariate process and β j ∈ R k is the vector of regression parameters associated to the jth cause.
But we only assume a MAR assumption (Rubin, 1976) on the mechanism of missing information on the cause of death (and not a MCAR assumption as in previous sections). More precisely, we suppose that: where G is an auxiliary covariate collected for each individual and where O denotes (X, Z, G) to shorten notation. Thus, given δ = 1 and O, the probability that the cause is missing depends only on O (which is observable) and not on the cause of death (not always observed). It is also assumed that The above assumption of independence between R and d, given δ = 1 and O, allows us to write, for j = 1, . . . , p: Assuming that these probabilities don't depend on the auxiliary covariate G, let us denote them h j (Q), for j = 1, . . . , p, where Q = (X, Z).
In the following, we will assume that we observe a sample of (X, δ, D, Z(X), G), where we still write D = δRd. one can get estimatorsγ andζ of γ and ζ respectively by maximization of the likelihoods

Parametric model on the MAR mechanism
If their corresponding parametric models are correctly specifiedγ andζ are well-known to be consistent estimators of the true values of the parameters γ and ζ.

The double robust estimators of β j and Λ 0j (·)
Once estimators of γ and ζ are obtained, the question is how to estimate the regression parameters β j and the cause-specific cumulative hazard rate functions Λ 0j (·), for j = 1 . . . , p. One possible approach is to construct estimating equations based on the double robust technique introduced by Robins et al. (1994) and later used by Lu and Liang (2008) in a context similar to the present. To this end, let us introduce two new families of counting processes: for i = 1, . . . , n, j = 1, . . . , p and all t > 0. Note that, due to missingness of cause of death, the counting processÑ ij (·) are not always observable, but will always appear in the sequel multiplied by R i (itself given by the observation of δ i and D i ), which ensures that the product is observable. Estimators of the β j and the Λ 0j (·), for j = 1, . . . , p, will be obtained as solutions of the following estimating equations: Let γ ∞ and ζ ∞ be respectively the limit of the maximum likelihood estimatorŝ γ andζ introduced in previous subsection. As long as at least one of the two imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 models used on π(O) and on the h j (Q), for j = 1, . . . , p is well specified, one can show that we have, for all t > 0: This result is obtained thanks to consideration of the conditional expectation: , (6.5) for j = 1, . . . , p.
Remark. In the case where there is no covariate, one can fix the estimatorsβ j , for j = 1, . . . , p, to zero in (6.5) to obtain nonparametric estimatorsΛ j of the p cause-specific cumulative hazard functions.
Let us denote byβ the Double Robust estimator of the baseline cause-specific hazard rate functions.

Large sample behavior
Let us introduce the following notations: where ∇ u f (·) denotes the gradient function of a function f (·) computed at u, and finally where β 0 j and Λ 0 0j (·) are the true values of β j and Λ 0j (·). Note that S γ,i and S ζ,i are the score functions for individual i associated to the models on π(O) and the h j (Q), for j = 1, . . . , p, respectively. The matrices I γ and I ζ are nothing but the corresponding Fisher Information matrices. Theorem 6.1. Under Assumptions A2-A4, A6 and if at least one of the two parametric models assumed on the MAR mechanism is well specified, then the r.v. √ n β 1 − β 1 , . . . ,β p − β p T converges in distribution in R pk to a mean zero Gaussian r.v. with covariance block matrix: Proof. Since it is an adaptation of the proof of Theorem 1 of Lu and Liang (2008), only the skeleton of the proof is given. Fix j between 1 and p. The imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 solution of (6.2) in terms of Λ 0j (·), for all β j in R k is: , for all t > 0. Replacing Λ 0j (·) byΛ 0j (·, β j ) in (6.3) one has now to solve the following equation in terms of β j : that we will briefly write U j (β j ,γ,ζ) = 0.
A Taylor expansion of U j (β 0 j ,γ,ζ) around γ ∞ and ζ ∞ and rather standard arguments in empirical process and maximum likelihood areas enable us to obtain after some cumbersome calculus: On the other hand, since by definition ofβ j one has U j (β j ,γ,ζ) = 0, we can write: and the result of the theorem follows from an application of the Central Limit Theorem.
Note that the block (j, j ′ ), for 1 ≤ j, j ′ ≤ p, of the asymptotic covariance matrix given in the last theorem can be consistently estimated bŷ andB γ,j ,B ζ,j ,Î γ ,Î ζ ,Ŝ γ,i andŜ ζ,i are obtained from B γ,j , B ζ,j , I γ , I ζ , S γ,i and S ζ,i by replacement of γ ∞ and ζ ∞ by their maximum likelihood estimators, replacement of s 0 (·) and s 1 (·) by S 0 (·) and S 1 (·) and replacement of the expectations by their empirical counterparts.
Let us now consider the asymptotic behavior of the estimator of the baseline cause-specific cumulative hazard rate function Λ 0j (·). To this end, let us write: and Under Assumptions A2-A4, A6 and if at least one of the two parametric models assumed on the MAR mechanism is well specified, then the multivariate process converges weakly in D p [0, τ ], when n → +∞, to a zero mean Gaussian process with block covariance matrix functions given by E ψ 1,j (s)ψ T 1,j ′ (t) , for (s, t) ∈ [0, τ ] 2 and 1 ≤ j, j ′ ≤ p.
As in the case of the previous theorem, the blocks of the above asymptotic covariance matrix function can also be consistently estimated by whereψ i,j (·) is obtained from ψ i,j (·) with replacement of the unknown parameters by their estimators and the expectations by their empirical counterparts.
Proof of the Theorem. Again, the proof is on the same lines as in Lu and Liang (2008). We only mention here that using empirical processes arguments and a Taylor expansion around γ ∞ and ζ ∞ , one can write, for j = 1, . . . , p and t ∈ [0, τ ]: and the functional central limit theorem completes the proof.
imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 We are now in a position to consider the estimation of the cause-specific cumulative hazard rate function, given the covariate Z. Recall that the additive hazard model considered in this paper assumes that we have, for all t in [0, τ ]: Thus, an estimator of the conditional cumulative hazard rate function is given by: whereΛ 0j (·) andβ j are the double robust estimators introduced in Section 6.3. From the above, one can easily write, for all t in [0, τ ] and j = 1, . . . , p: Theorem 6.3. Under Assumptions A2-A4, A6 and if at least one of the two parametric models assumed on the MAR mechanism is well specified, then the multivariate process converges weakly in D p [0, τ ], when n → +∞, to a zero mean Gaussian process with block covariance matrix function given by The proof is omitted since the result immediately follows from equation (6.8) and the central limit theorem.
Again, the blocks of the above asymptotic covariance matrix function can be consistently estimated by: imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date:  We end this section by mentioning that it is also possible to get estimators of the survival function and the cumulative incidence functions as well as to obtain their large sample behavior (as in Theorem 5.3). To keep this paper shorter, we omit the details which require just an application of the approach of Section 5.1.3: one need only use the Gaussian process which appears in the limit of Theorem 6.3 as the process L ′ (·) in Theorem 5.3.

Numerical study
In this section we first present a simulation study to compare the properties of our estimators for different sample sizes, their robustness to model misspecification and their efficiency. Then we illustrate the use of our estimators through an application on three different real datasets.
The simulation work has been implemented with the R statistical software (see R Development Core Team, 2010).

Simulation design
We carried out Monte Carlo simulations with only two causes of death and a time-independent covariate Z of dimension 1 with continuous distribution. More precisely, let T 1 (resp. T 2 ) denote the continuous latent, or potential, survival time associated with risk/cause 1 (resp. 2). Hence, the individual lifetime T introduced in Section 2 is given by T = min(T 1 , T 2 ) and d = 1 (resp. d = 2) if T = T 1 (resp. T = T 2 ). Note that since the covariate Z is time-independent, the notation Z is useless here and won't be used in the sequel.
We assumed that the bivariate survival function of (T 1 , T 2 ), conditionally on Z = z, is given by where λ 3 is such that 0 ≤ λ 3 < min(λ 1 , λ 2 ). One can see that, conditionally on Z = z, the r.v. T 1 and T 2 are dependent if λ 3 > 0 and that the distribution of T j is exponential with rate λ j + β j z for j = 1, 2. It is also easy to check that the assumption of an additive hazard rate model is fulfilled in this case since, conditionally on Z = z, the cause-specific hazard rate function of T j is λ j (t|z) = λ j + λ 3 t + β j z, for t ≥ 0, for j = 1, 2. Finally, we assumed that the covariate Z is uniformly distributed on (0, τ Z ) and that the censoring time has an exponential distribution with rate λ C . Our computer program first simulates the covariate Z according to the uniform distribution, then simulates T 1 , given Z, with an exponential distribution and finally simulates T 2 , given (T 1 , Z), by numerical inversion of its conditional cumulative distribution function.
Then we considered separately two mechanisms of causes missingness: • the MCAR model with P (R = 1|X, Z, d, δ = 1) = α = 0.8 which leads empirically to 12,6% of missing cause of failure, 20.2% of failure from cause 1, and 29.0% of failure from cause 2; • the MAR model with a logistic regression model for the conditional probability π(X, Z), i.e.
Three sample sizes have been considered in our simulation study: n = 100, n = 400 and n = 1000. This simulation design was replicated 10 000 times.
Note finally that, under the MAR assumption, the estimates are obtained with a logistic regression model assigned to the conditional probabilities h j (x, z) = P (d = j|δ = 1, R = 0, X = x, Z = z), for j = 1, 2. More precisely: and h 2 (x, z) = 1 − h 1 (x, z).

Regression parameter estimators
Let us first study the behavior of the three different estimators of the regression parameters introduced in previous sections: • the WMC estimatorβ W M C given in (3.2) which generalizes an approach introduced by Lin and Ying (1994) and doesn't take into account the data with missing cause; • our T -optimal estimatorβ T opt introduced in (3.5) which uses in an optimal way the information carried by the missing causes in order to get an estimator of asymptotic minimal variance; • The DR estimatorβ DR introduced in (6.6) and obtained through a generalization of the Double Robust estimators of Lu and Liang (2008).
Recall that the two first estimators (WMC and T -optimal) are obtained under the MCAR assumption whereas the DR estimator is based on the MAR assumption.

Comparison of the three estimators under the MCAR and MAR models
imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 From the 10 000 times replication of the simulation design described in Section 7.1.1 we derived, for each estimator, the Monte-Carlo estimates of the following quantities: • the bias (Bias); • the variance (Emp. Var.); • the mean of the estimates of the theoretical variance given in the previous sections (Var. Mean); • the coverage probability of the 95% confidence intervals (Coverage %). Table 1 lists the results when the MCAR model was used for simulations whereas Table 2 gives the results for the MAR model.  Table 1 shows that, under the MCAR model, the T -optimal estimator has generally lower standard-errors than the others. Moreover it has the smallest bias in the case of a small sample size. For all the estimators, the larger is the sample size the better behaves the estimator. One can also see that, for all the estimators, the coverage probabilities are almost always close to the nominal value of 95%, even if for n = 100 the coverage probabilities are rather large for the DR estimator.
It is interesting to note in Table 2 that the DR estimator outperforms the two other estimators in terms of bias. The WMC estimator has a bad behavior with a significant bias and with a coverage percentage which deteriorates when the sample size increases. Contrary to the WMC estimator, the T -optimal estimator proves to be robust under the MAR assumption; its standard-error and its coverage probability appear to be competitive with respect to the ones of the DR estimator.
Efficiency with respect to the parametric model imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 Table 2 Simulation results for the three regression parameter estimators (WMC, T -optimal and DR) under MAR assumption. Monte Carlo estimates of the bias (Bias), the variance (Emp. Var.), the mean of estimated variance (Var. Mean) and the coverage percentage (Coverage %). Since the model used for simulations is fully parametric, one can get maximum likelihood (ML) estimators of the unknown parameters. Indeed, it is easy to see that under the MCAR assumption the likelihood function is given by where λ j (X i |Z i ) = λ j +β j Z j +λ 3 X i , for j = 1, 2, and S(X i |Z i ) = S T1,T2 (X i , X i |Z i ) is the conditional survival function of the lifetime T , given Z i . On the other hand, under the MAR assumption the likelihood function is given by where π(X i , Z i ) = exp(γ 0 + γ 1 X i + γ 2 Z i )/(1 + exp(γ 0 + γ 1 X i + γ 2 Z i )). Hence, we are able to obtain the ML estimator of the regression parameters (β 1 , β 2 ) by maximizing L (MCAR) n (resp. L (MAR) n ) under the MCAR (resp. MAR) assumption. To study the behavior of the ML estimator, we used the same criteria as for the three other estimators (WMC, T -optimal and DR). Thus, based on the 10 000 replications of the above simulation design, we obtained empirical estimates of its bias, variance, mean of the estimates of the asymptotic variance and coverage probability of the 95% confidence intervals. Note that the asymptotic variances are estimated by inverting the Hessian matrix of the log-likelihood function at its maximum.
We first provide a study of the behavior of the ML estimator under the MCAR and MAR assumptions. The results are summarized in Table 3. Irrespective of imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 the type of missingness, the ML estimator behaves correctly in case of moderate or large sample size but it has the weakness to provide conservative confidence intervals when the sample size is small. Let us now study the relative efficiency of our estimators (WMC, T -optimal or DR) with respect to the ML estimator. The relative efficiency is defined as the ratio of the empirical Mean Square Error (MSE) of one of our estimators with the empirical MSE of the ML estimator, both based on the same 10 000 simulated samples. These relative efficiencies are listed in Table 4. It is worth noting that the T -optimal and DR estimators have the same performance under MCAR while the WMC is less efficient. In case of MAR mechanism, the T -optimal estimator seems to have the better behavior. Finally, relative efficiencies of the WMC estimator are very large which confirms that this estimator should not be used in this case. Let us now study the behavior of the three different estimators of the baseline cause-specific cumulative hazard functions introduced in previous sections: • the estimatorsΛ 0j (·), for j = 1, . . . , p, given in (5.7), which use data Without Missing Causes and called WMC estimators in the sequel; • our T -optimal estimators of the baseline cause-specific cumulative hazard rate functions given in (5.11); • The DR estimators introduced in (6.7) and obtained through a generalization of the Double Robust estimators of Lu and Liang (2008).
Recall that, here also, the two first estimators (WMC and T -optimal) are obtained under the MCAR assumption whereas the DR estimator is based on the MAR assumption.
We have used the criterion of the Mean Integrated Square Error (MISE) in order to compare the different estimators of the baseline cause-specific cumulative hazard rate functions. More precisely, for cause j = 1, 2 and for each type of estimator (WMC, T -optimal and DR), based on N simulated samples we obtained estimatesΛ (1) 0j (·), . . . ,Λ (N ) 0j (·) of the baseline jth-cause-specific cumulative hazard rate function Λ j (·). An empirical estimate of the MISE of this estimator is then given by: where τ k is the ⌊0.95n⌋th order statistic of the observed durations in the kth simulated sample. Thus τ k is nothing but an estimation of the 0.95-quantile of the distribution of X. Table 5 lists the empirical MISE, based on N = 1000 samples, of the three types of estimators (WMC, T -optimal and DR) of the two baseline cause-specific cumulative hazard rate functions Λ 01 (·) and Λ 02 (·), under the MCAR model and also under the MAR model. One can see that the MISE is almost always lower for DR estimator than for the WMC and the T -optimal estimators. As expected, the difference in terms of MISE between the estimators is larger under MAR assumption than under the MCAR assumption where estimates of the MISE are similar. Under MAR assumption, the WMC has a bad behavior, even when the sample size increases. Again the T -optimal estimator outperforms the WMC estimator under the MCAR and MAR assumptions and, as for the regression parameter estimation, it reveals some robustness properties when the MCAR assumption is violated. It is not so for the WMC estimator.
imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 With the aim to apply our estimators, we have artificially created two new datasets with some causes of death missing. The first one is obtained from the WMC dataset after the application of a MCAR mechanism with α = 0.4 and is called "MCAR dataset". The second one is also obtained from the WMC dataset but after the use of MAR mechanism with probability of missingness given by the logistic regression model logit(π(X, Z)) = −1 + 0.01X + Z. Let us call it the "MAR dataset". The different percentages of events for the three datasets are listed in Table 6.
Except for the WMC dataset where only the WMC estimator is used, we have applied our three estimators (WMC, T -optimal and DR) on these datasets. Table 7 lists the estimates of the regression parameters, their standard-errors and the 95% confidence intervals. Our results agree with those of Pintilie (2006). We find that, for all datasets and all estimation methods, an age larger than 30 is a risk factor, whatever the event of interest. The age factor increases more significantly the specific risk of death without second malignancy than the specific risk of incidence of second malignancy. For the MAR dataset, the T -optimal estimates are close to those obtained through the DR method, while the WMC estimates are rather far from results obtained on the initial WMC sample. This last point confirms the fact that the use of the WMC estimator is risky under MAR assumption. We plotted in Figure 2 the different estimates we obtained, thanks of the WMC, T-optimal and DR methods under MCAR and MAR assumptions, of the two baseline cumulative risk functions associated respectively to the incidence of second malignancy and death without malignancy. The estimates of the cumulative incidence functions are presented in the book of Pintilie (2006). One can note for example that, the probability of a second malignancy for the younger subjects remains very low up to 15 years then it joins the value of the probability of death without second malignancy at about 25 years. This can explain the shape of the estimates of the baseline specific cumulative hazard functions presented in Figure 2. The estimate of the incidence of the second malignancy remains very low for a long period and then quickly increases and catches up or exceeds the estimate for the death without malignancy. Table 7 WMC, MCAR and MAR datasets on Hodgkin's disease (Pintilie, 2006). WMC, T -optimal and DR estimates of the regression parameters, their estimated standard error (within parenthesis) and a 95% confidence interval for the incidence of second malignancy (β 1 ) and death without malignancy (β 1 ).  Hodgkin's disease dataset (Pintilie, 2006). Estimates of the baseline incidence of second malignancy (solid) and death without malignancy (dotted) specific cumulative hazard functions using the WMC method (red), the T -optimal method (green) and the DR method (blue) under MCAR (left) and MAR (right) assumptions.

Time
patients had unknown cause of death. We use the same data than Cummings et al. (1986), Goetghebeur and Ryan (1995) and Lu and Tsiatis (2001) and, thus, we take into account two binary covariates: the presence of at least four positive nodes and the oestrogen receptor status (ER-negative) of their primary tumor, that were considered as significantly associated with overall survival. A cause-specific survival analysis study using the additive hazards model is carried out for these data. Of the 169 eligible patients, 79 had died at the time of analysis and cause-of-death information was incomplete. Of these 79, cause of death was available for 61, of whom 44 were classified as dying from their disease and 17 from other causes. The remaining 18 patients had unknown cause of death. The estimates (WMC, T -optimal and DR) for death due to breast cancer are shown in Table 8. It appears that, with the WMC and T -optimal methods, a negative oestrogen receptor status significantly decreases the risk of death due to cancer with very similar estimates. We do no found this result with the DR method although the regression parameter estimate is not too far. The three methods lead to the conclusion that a high number of positive nodes is not associated with death due to breast cancer. The three estimates (thanks to WMC, T -optimal and DR methods) of the baseline cause-specific cumulative hazard rate function associated to death due to cancer are shown in Figure 3. Time is the number of days from randomization.

Hard drive dataset
In Section 5.1 we have considered the nonparametric case (i.e. whitout covariate) and have obtained WMC estimators and T -optimal estimators of the functions imsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 Table 8 Breast Cancer dataset. WMC, T -optimal and DR estimation of the regression parameters for breast cancer data. For breast cancer cause of death are given WMC, T -optimal and DR estimates of the regression parameters, their estimated standard error (within parenthesis) and a 95% confidence interval.  Estimates of the baseline cause-specific cumulative hazard rate function for the risk of death due to breast cancer using the WMC method (red), the T -optimal method (green) and the DR method (blue). Time is days from randomization. Some of the causes of failure (such as "defective head") are related to components, but others (like "particle contamination") are not. In this dataset, there are three major causes of failures which, without going into details, are denoted as causes 1, 2 and 3. We assume that these causes act in series. Some of the failures were masked and a number of them were analyzed for complete resoluimsart-ejs ver. 2013/03/06 file: Bordes_Dauxois_Joly_R2.tex date: January 9, 2014 tion in a defect isolation laboratory. The observed masked groups were {1, 2, 3} and {1, 3}. Considering causes 1 and 3 as a single cause of failure, we obtain a new dataset with two competing modes of failure: mode 1 (corresponding to causes 1, 3 or masked group {1, 3} in the original data set) and mode 2 (corresponding to the cause 2 in the original data set). The failure cause is missing in this new dataset when none of the three original causes of failure is known (corresponding to the masked group {1, 2, 3} in the original data set). Finally, this new dataset contains 119 failures of type 1, 19 failures of type 2, 34 failures with missing cause and 9828 censored observations.

Method
Assuming that the MCAR assumption holds, the probability α that the cause of failure is missing is empirically estimated byα = 0.802. We plotted in Figure 4 the WMC and T-optimal estimates of the two cause-specific cumulative hazard rate functions as well as the 95% pointwise confidence intervals of the T -optimal estimators. We can see that our T -optimal estimators are slightly more regular (with smaller size jumps) than the WMC estimators; however the T -optimal estimators may not be fully legitimate (they can be locally decreasing). However, the latter drawback disappears when the sample size increases because of uniform convergence of our estimators.  (Flehinger et al. 2002). Estimates of the Cause-specific cumulative hazard functions for the two failure causes: WMC estimates (green lines); T -optimal estimates (red lines) with their 95% pointwise confidence intervals (blue lines).

Second cause
Time Cause-specific cumulative hazard erable improvement of an earlier version of the paper. Thanks also to Professor Syed Kirmani who checked the english writing of this paper. The authors are grateful to the Eastern Cooperative Oncology Group (ECOG) for providing the data on the breast cancer.