Modeling Data With Semicompeting Risks : An Application to Chronic Kidney Disease in Colombia

In this paper, the structure of semicompeting risks data, de ned by Fine, Jiang & Chappell (2001), is studied. Two events are of interest: a nonterminal and a terminal event, the last one, can censor the non-terminal event, but not vice versa. Due to the possible dependence between the times until the occurrence of such events, two approaches are evaluated: modelling the bivariate survival function through Archimedean copulas and a shared frailty model. A simulation is conducted to examine its performance and both approaches are applied to a real data set of patients with chronic kidney disease (CKD).

En este trabajo se estudia la estructura de datos con riesgos semicompetitivos denida por Fine et al. (2001).En esta estructura existen dos eventos de interés; uno intermedio y otro terminal, este último puede censurar el evento intermedio, pero no viceversa.Dada la posible dependencia, entre los tiempos hasta la ocurrencia de tales eventos, dos tipos de enfoques son evaluados: uno, modelando la función de supervivencia bivariada a través de cópulas Arquimedianas y el outro por medio de un modelo con fragilidad 1. Introduction Currently, chronic kidney disease (CKD) is considered a worldwide problem and a great increase of deaths caused by CKD has been observed from 1990 to 2010 (Martín-Cleary & Ortiz 2014).In Colombia, its prevalence has increased (Gómez 2006) and, according to Fondo Colombiano de Enfermedades de Alto Costo (Cuenta del Alto Costo 2014),975.479people had CKD in 2013.In December 1993, the social security system was reformulated (the law 100 was enacted) and this change increased considerably the coverage of patients with renal disease.CKD is a high cost disease and the excessive cost is a major concern for healthcare systems.Therefore, there is great interest in studying CKD and its progression, mortality and response to treatments, such as hemodialysis and peritoneal dialysis.In this work, we are concerned with the analysis of a dataset with end-stage renal disease (ESRD), stages IV and V, of CKD patients, followed from 2009 to 2011, from ve cities in Colombia: Manizales, Monteria, Rionegro, Sincelejo and Tunja.The data contains information about 1253 patients who had started dialysis treatment.The patients were periodically examined and tests were performed to obtain the levels of calcium, inorganic phosphorus, serum albumin, serum creatinine, among other blood tests.There is a major interest in studying survival times of those patients, but it is also of interest to study the disease's progression.Researchers involved in this study were particularly interested in studying the time until a progression of the disease occurs.This progression can be considered as a worsening in the patient's health, determined by the levels of some substances in the blood and by the clinical evaluation of the patient.Specically, in this study progression was dened by high levels of inorganic phosphorus (level over 5,6 mg/dl).Therefore, in this particular study, there are two main events of interest: time until progression and time until death.
Common survival analysis techniques are suitable when the interest is to study only the time until the occurrence of an event.When observations are subject to more than one type of event (e.g, dierent causes of death) and the occurrence of an event prevents the occurrence of the other, there is a competing risks structure.Our CKD data, nevertheless, cannot be viewed as competing risks data since some patients may experience the progression and also death.In our example, the main interest is to study two events, one of which (terminal event) prevents the occurrence of the other (non-terminal event), but not vice versa.This structure is known as Semicompeting Risks, dened by Fine et al. (2001).In cancer studies, typically the non-terminal event is the relapse of cancer and sometimes the nonterminal event is simply called progression.
Recently, increased attention has been given to semicompeting risks data, and there are two main diculties in addition to the existence of censored data: (i) the hypothesis that we have a pair of random variables with the restriction that one of them is always smaller than the other one (progression necessarily occurs before death) or one of them is not observed and (ii) the time until the non-terminal event and time until the terminal event cannot be considered as independent random variables.This possible dependence must be taken into account into statistical analysis.According to Hsieh, Wang & Ding (2008), if the relationship between the two events is completely unspecied, the marginal distribution of the time to a non-terminal event is not identiable due to possible dependent censoring.In this context, while the univariate Kaplan-Meier (K-M) estimator can be used to estimate the survival function of the terminal event, it cannot be used for the nonterminal event and even a descriptive analysis for semicompeting risks data must be done using appropriate methodologies.It can be found in the literature dierent generalizations to the Kaplan-Meier estimator for bivariate situations (see, for example, Campbell & Foldes 1980, Campbell 1981, Hanley & Parnes 1983, Tsai, Leurgans & Crowley 1986, Dabrowska 1988, Prentice & Cai 1992).Although we are interested in a bivariate survival function, semicompeting risks structure is dierent and it is not straightforward to adapt those estimators for our context.Therefore, the approach of Lakhal & Abdous (2008) using Archimedean copulas is a natural choice since it was proposed for semicompeting risks data.It also allows one to study explicitly the dependence between the time until the terminal and non-terminal events.Moreover, Archimedean copulas are quite referenced in the literature for bivariate survival models and they are associated with frailty multiplicative models (Romeo 2005).
In this work, we are focused on the statistical analysis of CKD data set using appropriate methods for semicompeting risks data.In order to carry a complete data analysis, we combined dierent new methodologies already available in the literature.Initially, we are concerned with methodologies for an appropriate descriptive analysis and we consider the estimation of the bivariate survival function and the non-terminal event survival function through Archimedean copulas, as suggested by Lakhal & Abdous (2008).Estimators for the survival functions are obtained for some Archimedean copulas (see Nelsen 2006, p.116) and we discuss those estimators in Section 2. In the sequence, we are interested in a regression model for both the time until progression and time until death.We focus on a three-state process, known as Illness-Death process, initially studied by Fix & Neyman (1951) and proposed by Xu, Kalbeisch & Tai (2010) for semicompeting risks data.In this case, the inclusion of covariates and a possible dependence between the two times (terminal and non-terminal events) is taken into account by a shared frailty and the model is presented in Section 3. We also present a short simulation study comparing the performance of this model and a naive regression model where the dependence is not included.Finally, in Section 4 we present the results of the analysis of CKD data and in Section 5 we discuss the methods and results.

Archimedean Copulas
In this section, we describe the methodology used for a proper descriptive analysis of semicompeting risks data.We briey introduce the concept of copulas and discuss the four copulas considered in this work.We then discuss estimation procedures for plotting graphics of the marginal survival functions of both events of interest.
Let X be the time to the non-terminal event and let Y be the time to the terminal event.For semicompeting risks data we have the restriction X < Y , thus, the bivariate survival function is dened in the upper wedge.Furthermore, the assumption of independence of X and Y is not reasonable and, therefore, we cannot use K-M estimator for the survival function of the non-terminal event X.It is then necessary to incorporate the dependence between X and Y in the estimator.In the bivariate case, let (X, Y ) be continuous random variables with marginal survival functions (S X (x), S Y (y)).The bivariate survival function based on a copula C α (•) for some α ∈ A is given by The copula is a natural function and an ecient way to construct multivariate distributions for random variables, since it couples the marginal distribution and captures their dependence.Archimedean copulas are an important family of copulas associated with bivariate survival models (Romeo 2005) and are dened by where φ α (•), known as the generator of C α (•, •), is a continuous strictly decreasing convex function from [0, 1] to [0, ∞) such that φ α (1) = 0 and φ −1 α (•) denotes the inverse of φ α (•) (Nelsen 2006).The parameter α is associated with the intensity of the dependence between the variables.Usually, this dependence is quantied by Kendall's tau measure of association, discussed in more details in Section 2.2.
Although the approach of Lakhal & Abdous (2008) for estimating survival and association in the semicompeting risks works for any Archimedean copula, we have implemented the methodology for the most referenced families.The Clayton family (which we call Family A) is associated to the multiplicative frailty gamma model and the dependence is (strongly) concentrated in the extreme lower left.The Frank family (family C) admits both positive and negative dependence and the Gumbel family (family D) has positive dependence strongly concentrated in the extremes (tail dependence).We have also implemented the family B corresponding to the family 4.2.2 dened on page 116 of Nelsen (2006) due to the simple expression of the copula.Table 1 shows the function C α (u, v) that denes each family and the corresponding generating function.

Censoring and Notation
We assume that our data is subject to right-censoring and therefore we may not observe X and Y for all subjects.The observable quantities are described as follow.Let Z = min(X, Y ) and δ x = I {X<Y } , where I {X<Y } is given by I {X<Y } = 1 if X < Y and I {X<Y } = 0 otherwise.Let C be a right-censoring variable independent of (X, Y ) and dene the random variables R = min(Y, C), S = min(Z, C), δ y = I {Y <C} , δ z = I {Z<C} and δ xz = δ x δ z .Thus, the data set consists of n independent replications of It is important to notice that, due to the right-censoring, for semicompeting risks four situations are possible: Case 1. Subject was observed to have progression at x i and death at y i , then, Case 2. Subject died at time y i without progression, D i = (R i = y i , S i = y i , δ yi = 1, δ zi = 1, δ xzi = 0); Case 3. Subject was observed to have progression at x i and right censoring at y i , so Case 4. Both events progression and death were not observed, and hence,

Measures of Association
As mentioned before, the parameter α in (1) and ( 2) is associated with the intensity of the dependence and its interpretation depends on the specic copula considered.A common association measure used in this context is the Kendall's tau coecient, dened by ) and (X 2 , Y 2 ) are two independent replications of (X, Y ) of any random variables.For Archimedean copulas, Kendall's tau coecient can be easily computed, see Nelsen (2006) for more details.
In semicompeting risks, when right censoring is observed, (X i , Y i ) and (X j , Y j ) can be compared only when the event . This leads to some diculties when working with Kendall's coecient and it is then useful to dene the conditional Kendall's tau (Oakes 1989): Another important association measure is a local counterpart of the Kendall's tau, which is a local measure of dependence, dened by Clayton (1978) as the cross-ratio function, given by where Oakes (1989) showed that θ * α (x, y) = θ(S XY (x, y)), then for the Archimedean copulas we have the result where k = S XY (x, y).
Table 2 shows the expressions for Kendall's tau coecient and the corresponding cross-ratio function for each of the four copulas considered in this work.
The cross-ration function and the conditional tau coecient are related and it can be shown (Patiño 2012) that In the next section, expression (3) will be used as motivation for an estimating equation for α.

Estimation
We are interested in estimating the marginal survival functions S X (x) and S Y (y) and also the bivariate survival S XY (x, y) based on our available data.Some conditional functions in which the copula parameter α is always involved may also be of interest and will be discussed later.To construct (1), we need the nonterminal and terminal survival function S X (x) and S Y (y), respectively, as well as the parameter α of the assumed copula.As mentioned before, the traditional K-M method (Kaplan & Meier 1958) can be used to estimate S Y (y).However, for the non-terminal survival function S X (x) it is not appropriate due to the possible dependence between X and Y .In this context, various estimators have been proposed: non parametric estimator by Fine et al. (2001), Pseudo Self-Consistent estimator by Jiang, Fine, Kosorok & Chappell (2005) and the copula-graphic estimator by Zheng & Klein (1982), which was adapted by Rivest & Wells (2001) for dependent censoring and by Lakhal & Abdous (2008) for semicompeting risks data.
We describe here the method proposed by Lakhal & Abdous (2008), know it as copula-graphic estimator for semicompeting risks.The authors rst consider the estimation of the parameter α.Motivated by expression (3), it is possible to show that This result motivates the following estimating equation: where w(•, •) is a random weight function associated to the pair (i, j).Fine et al. (2001) suggest for a > 0 and b > 0 constants.Note that equation (4) depends on the bivariate survival function, which it is what we want to estimate in (1).Nevertheless, a non parametric estimator may be used as a rst approximation and Lin & Ying (1993) suggested the following where ĜC (•) is the K-M estimator of censoring survival function C.
For each copula considered in this work, a dierent estimating equation is obtained and numerical procedures may be needed.For the Clayton copula, a solution for the equation may be obtained analytically.The estimation equations or solution for the four copulas considered are shown in Table 3.
After estimating α, we use the copula-graphic estimator adapted by Lakhal & Abdous (2008) for the survival function associated to the non-terminal event.
The idea is to construct an estimator for S X (t) with similar properties to the K-M estimator, i.e., that jumps when an non-terminal event is observed.In order to do so, we need to consider Z = min(X, Y ) and it is important to notice that If t 0 is an instant when a non-terminal event (X) was observed, then it is also an instant that Z was observed.Therefore, the Kaplan-Meier estimator of S Z (t), which we denote by ŜZ (t), should jump at t 0 and where ŜCG X (t 0 ) is the copula-graphic estimator.Assuming continuous X and Y , the two events cannot occur at the same time and For an arbitrary t, summing this expression for all t 0 < t, we have and the copula-graphic estimator is given by Finally, we can nd the semiparametric estimator of the bivariate survival function (1) with the marginals (5) for S X (x) and K-M Estimator for S Y (y).Large sample properties of the resulting estimator have been established (weak convergence to a a Gaussian process) using martigale representation (Rivest & Wells 2001).Furthermore, from de the bivariate survival function, some other useful quantities can be obtained (Lakhal & Abdous 2008), such as the probability that a patient survives at y given that he has not progression at time x, given by and the survival probability at y given that the patient has a progression at time x and is alive at time t > x: These estimators can be easily obtained for the four families in Table 1; for more details see Patiño (2012).
Regarding computational implementation, only the estimation of the dependence parameter is computationally intensive.As already mentioned, for the Clayton copula there is no iterative method because an analytical solution is available; for the other copulas, there is a sum over n 2 terms in the equations and this may be the computational lengthiest step in estimation for large n, but this does not represent a real problem in applications.

Illness-Death Model
Generally in applied survival analysis, it is of interest to determinate the eect of covariates on the risks occurrence of the event of interest.For this purpose we use the Illness-Death model with shared frailty, proposed by Xu et al. (2010).
In this model, the semicompeting risks data can be described by a three-state process, the illness and death process, which is a process widely discussed in the literature (see, Fix & Neyman 1951, Fiocco, Putter & van Houwelingen 2008).The state occupied by the individual at time t = 0 is called on study (state 0) and it is assumed the patient remains in this state until the intermediate event or the death occurs.When the intermediate event (not terminal) occurs, the individual condition is called relapse or progression (state 1) and the death is the nal condition or absorbing state (state 2).Notice that it is not possible a transition from the intermediate state to the initial state.The main interest is to model the transition intensities functions, i.e., the hazard rates associated with the transitions between states (See Figure 1).
Modeling Data with Semicompeting Risks 9 and the survival probability at y given that the patient has a progression at time x and is alive at time t > x: These estimators can be easily obtained for the four families in Table 1; for more details see Patiño (2012).
Regarding computational implementation, only the estimation of the dependence parameter is computationally intensive.As already mentioned, for the Clayton copula there is no iterative method because an analytical solution is available; for the other copulas, there is a sum over n 2 terms in the equations and this may be the computational lengthiest step in estimation for large n, but this does not represent a real problem in applications.

Illness-Death Model
Generally in applied survival analysis, it is of interest to determinate the eect of covariates on the risks occurrence of the event of interest.For this purpose we use the Illness-Death model with shared frailty, proposed by Xu et al. (2010).
In this model, the semicompeting risks data can be described by a three-state process, the illness and death process, which is a process widely discussed in the literature (see, Fix & Neyman 1951, Fiocco, Putter & van Houwelingen 2008).The state occupied by the individual at time t = 0 is called on study (state 0) and it is assumed the patient remains in this state until the intermediate event or the death occurs.When the intermediate event (not terminal) occurs, the individual condition is called relapse or progression (state 1) and the death is the nal condition or absorbing state (state 2).Notice that it is not possible a transition from the intermediate state to the initial state.The main interest is to model the transition intensities functions, i.e., the hazard rates associated with the transitions between states (see Figure 1).The model proposed by Xu et al. (2010) is based on the hazard ratios functions of transitions between the states 0 (under study), 1 (relapse/progression) and 2 (death) dened by Notice that the hazards rates λ 1 (x) and λ 2 (y) are not equal to the usual hazard rates dened in survival analysis; they correspond to the specic cause risk function used in the context of competing risks (Klein & Moeschberger 2003).
As mentioning before, in semicompeting risks, it is not appropriate to assume independence between the event X and Y .Therefore, in order to introduce a possible dependence of the transition times between the states progression and death, we consider the use of a shared frailty model, denoting the frailty by a random variable γ.The authors propose to model the conditional transitions hazard rates given γ and include this random variable multiplicatively.Since each transition hazard rate must be non-negative, we must have γ > 0 and it is convenient to assume γ ∼ Γ(1/θ; 1/θ) such that E(γ) = 1 and the variance is θ (which is unknown and must be estimated).The gamma distribution for the random eects is convenient for semi parametric proportional hazards models mainly due to mathematical convenience (for details see Wienke 2011) and it is, therefore, a natural choice for a rst approach.However, models with dierent frailty distributions should be considered in future research.
We are mainly interested in estimating the coecients β 1 , β 2 and β 3 , which contains information on the eects of covariates on the intensity functions conditional on frailty.Xu et al. (2010) do not make any assumption for the baseline functions λ 01 (x), λ 02 (y) and λ 03 (y) and non-parametric estimators are considered.

Estimation
Maximum likelihood can be used to obtain the estimators of the model parameters in (7).From this model, notice that the probability density function (and survival function) depends on the baseline transition intensity functions λ 01 (k), λ 02 (k) and λ 03 (k), which are unknown and must be estimated.Following the idea of several non-parametric estimators of cumulative hazard functions (see Breslow (1972), Breslow (1975)), we can approximate the cumulative hazard function by a function that jumps when a failure or event is observed.Assuming that we observe m intermediate events, estimation of λ 01 (x) reduces to the problem of estimation m parameters.Denoting by t r1 , t r2 , • • • , t rm the instants in which a non-terminal event was observed, t d1 , t d2 , • • • , t df the times when a terminal event without progression was observed and t rd1 , t rd2 , • • • , t rdg the times in which a terminal event occurred after the progression, we can write With this notation, the vector of parameters is For the estimation of η, it is necessary to obtain the joint survival function and the corresponding density probability function of (X, Y ) to determine the contribution of an individual in each case on the likelihood L(η).In semicompeting risks structure, it is not straightforward to compute the bivariate probability density function of (X, Y ) from the model specication (7) because we have the restriction that X < Y and X may not be observed.
In order to compute the bivariate density function, we assume conditional independence of X and Y given the frailty.Due to relation beetwen the hazard, density and survival functions h(•) = f (•)/s(•) from ( 7) we can write where y) exp{β 2 z}, otherwise (non-terminal event not observed) and x ≤ y, In expression ( 8), notice that the situation in which the non-terminal event is not observed depends on x, which is not appropriate.Xu et al. (2010) assume that this situation correspond to X = ∞ and compute Expressions (8) for x < y and ( 9) are the probability densities functions needed for the likelihood construction.Assuming that our data is subject to right censoring, we must compute the survival distributions for censored observations.As discussed before, due to censoring, we have four dierent possibilities and the contribution for the likelihood is dierent for observations in each case: for observations in case 1, the contribution to the likelihood is the density f XY (x, y), x < y; in case 2, the contribution is given by expression (9), i.e., f ∞ (y|z); for case 3, we must compute S Y |X=x (y)f X (x); and, nally, for case 4, we need the survival function S XY (x, y) (with x = y).Table 4 presents the expressions of the contribution for the likelihood of each case.
Finally, the estimators are solution of score equation U (η) = ∂ log(L(η)) ∂η = 0.The log likelihood function and the components of score vector are presented in the Appendix A. Xu et al. (2010) specify some conditions to establish asymptotic properties of the estimators.They discuss identiability of the model and asymptotic convergence of estimators.The numerical approach proposed by the authors can be summarized as follows: 0j (k), j = 1, 2, 3 be initial estimates.We can set θ (0) = 0, β (0) j = 0 and for Λ (0) 0j (k), j = 1, 2, 3 we use the Nelson-Aalen type estimate of the respective cumulative hazard functions.
For the estimation of parameters of this model, we use the R package nqslv for the numeric solutions of the score equations.For estimation of θ, β 1 , β 2 and β 3 we use a convergence criteria = 0.001 and for λ 01 , λ 02 , λ 03 we consider a less restrictive criteria δ = 0.01.We did not have convergence problems and the procedure is not much inuenced by initial values (for regression parameters, initial values were obtained by a Cox model).

Simulation Study
A simulation study was conducted mainly in order to compare the illness-death frailty model with a simple regression model that does not consider the dependence of the observed events.
Considering the model ( 7), we simulated the frailty with mean 1 and variance θ from a gamma distribution with shape and scale parameters 1/θ.We added a random variable Z ∼ Normal (0,1) representing a covariate.We generated n observations T 1i and T 2i from the Weibull distribution for both non-terminal and terminal events: T 1i ∼ Weibull with hazard function λ 1i = γ i e α+β1xi and T 2i ∼ Weibull with hazard function Since our data is subject to right-censoring, we added a proportion p c of censored observations.As it is usual in real data applications, we considered both type I and random censoring: we assume that the study ends at time τ (type I censoring) and, if p c is not achieved, we added a random censoring with C ∼ Unif(0, τ ).
We report results from 500 replications for n = 100, n = 500 and n = 800 with β 1 = 1, β 2 = 1, frailty variance given by θ = 0, 5, θ = 1 and θ = 2 and, nally, xed proportions of censoring p c equal to 40%, 60%, 70%.We chose a follow up time τ = 1095 days motivated by our real data application.All computer codes were developed for R software (for both simulation and application) and are available upon request.
Results of mean and standard error of estimated parameters from 500 replications are summarized in Figures 2 and 3.The horizontal line corresponds to true value of parameters β 1 and β 2 .In general, it is possible to observe that for most situations the illness death model was less biased than the naive Weibull model, and it is interesting to notice that, as the variance of frailty increases, the bias also increases.Therefore, one can conclude that the illness death model performs very favourably in terms of bias, although the variance of estimators (and hence the mean squared error) may be greater than the corresponding variance of the naive Weibull model for small samples.This is expected, since there are more parameters to be estimated in the frailty model considered.

Kidney Disease Data Analysis
As previously discussed, we are interested in analysing a dataset with 1253 end-stage renal disease (ESRD), stages IV and V, of CKD patients, followed from 2009 to 2011 by Fresenius Medical Care in Colombia and Laboratory Medical Echavarria.Patients included in the analysis started dialysis treatment and were periodically examined.Tests were performed to obtain the levels of calcium, inorganic phosphorus, serum albumin, serum creatinine, among other blood tests.It is of interest to study survival time of patients from beginning of dialysis treatment, but it is also of interest to study the disease's progression.In this rst approach, it was considered that a patient experiences progression when their level of phosphorus in the blood is greater than 5.6 mg/dl.
All patients are from ve cities in Colombia: Manizales (280 patients), Monteria (370 patients), Rionegro (131), Sincelejo (334) and Tunja (138) and 61.2% are males.Regarding age, people between 51 and 65 years represent 34% of the data and those over 66 years represent 30.6%.An initial descriptive analysis of data was done using copula graphic estimators.We computed the copula graphic estimator using the four copulas considered in this work.Table 5 shows the estimates of the copula parameter and Kendall's tau coecient.An important aspect in application is the choice of the copula for the copula graphic estimator.We suggest a scatter plot of time to progression and time to death of uncensored observations as a rst guide and also to use several copulas and compare the resulting plots.In our example, the estimated dependence is positive and small for most copulas; only family B copula had a dierent result.Although the estimated dependence for Family B copula is dierent form other copulas, copula graphic estimates for the survival function are very similar, as shown in Figure 4.
From the K-M plot of the survival distribution of patients and the copula graphic estimator of the survival function of the progression, we can conclude that after two years we still have around 60% of patients alive and also 60% of patients without progression of the disease.It is important to notice that we have only two years of follow up, and the graphics show that a longer follow up time would be valuable for a better understanding of disease progression.Family A -Clayton copula 0,586 (0,142) 0,227 (0,042) Family B 1,559 (0,022) -0,282 (0,018) Family C -Frank copula 1,387 (0,295) 0,151 (0,031) Family D -Gumbel copula 1,141 (0,038) 0,124 (0,029) For a multivariante analysis, we tted the model described in Section 3 and four covariates were included: treatment (peritoneal dialysis/hemodialysis), sex (male/female), age(≤ 30 years/ ≥ 31 and ≤ 50 years/ ≥ 51 and ≤ 65 years/ ≥ 66 years) and city (Manizales/ Monteria/ Rionegro/ Sincelejo/ Tunja).The results of estimated coecients shown in Table 6 and Table 7.The treatment was signicant for time to progession and time to death without progression.Patients treated with hemodialysis have a higher risk of progession, however a lower risk of death when compared to patients having peritoneal disease.This result seems consistent with the fact that there was a smaller proportion of patients with death without progression (67.2 %) than with progression (80.2 %) and death after recurrence (80.6 %) in the group treated with hemodialysis.
Age is an important covariate to be included (it is signicant), and the results are as expected.It is interesting to notice that age group 4 has a lower risk of progression, however a high risk of death.Regarding the city, Monteria and Rionegro showed signicant progression and death after progression eect, which show us that the local where the patient is treated is an important factor that must be included in the analysis.The covariate sex is not signicant in any transition rate and the estimated dependence is positive, but not strong (τ = 0.05).
It is important to notice that our approach was the use of hypothesis testing for covariate selection, however it would be useful to implement a discrimination criteria to help the decision of the best model, as pointed out by Lakhal & Abdous (2008).Also goodness-of-t methods need to be developed to both assess the overall quality of t in the regression model and assist the selection of a parametric family of Archimedean copulas (although in our application the conclusions were similar for dierent copulas).

Discussion
In this work, the semicompeting risks structure approach was used to analyse chronic kidney disease data.We focused on two events: the progression of disease (considered as a non-terminal or intermediate event) and death (terminal event).Although this is a preliminary analysis, the results will provide useful information for decision-making purposes and planning politics for patient care centers.
The rst diculty that arises in the analysis of semicompeting risks data is to obtain estimators of the survival function for a simple descriptive analysis or data visualization.Given the possible dependence between the two events, the Kaplan-Meier estimator is not suitable for estimating the marginal survival function observed when the intermediate event occurs.The approach proposed by Lakhal & Abdous (2008) using the copula-graphic estimator in this situation seems to have good results.The use of the Archimedean copula family proved to be quite convenient for estimating the bivariate survival function.In general, for an initial descriptive analysis data, the approach with copula is a very good alternative.The selection of a particular copula of the Archimedean family can indeed inuence the estimates and, although the dierences among dierent copulas observed in the application are not signicant, it would be interesting to develop a methodology to evaluate somehow which is the most convenient copula for a given application.
When the interest is focused on eects of covariates and a regression model is needed, the model proposed by Xu et al. (2010) is a very interesting alternative.The estimation by maximum likelihood and the iterative process is quite fast, obtaining convergence in few iterations.In this case, the baseline hazard rate functions are not specied and an usual approximation was used for estimation (the cumulative basal hazard function is approximated by a step function that jumps when an event is observed) which depends on the size of the sample.A parametric approach could also be developed in order to decrease the number of parameters to be estimated.Also, asymptotic properties of estimators have been established, however there are not any results for small sample sizes, so we do not recommend the use of this methodologies when the sample size is not large.
In general, the results obtained are consistent with expected by researchers and hemodialysis results to be a good alternative in many situations.Our data had patients form ve dierent cities and Manizales is known to have a wide trajectory in the treatment of renal patients.Our results showed that Manizales is the city with higher survival probabilities (for both death without progression and death after progression).Rionegro and Sincelejo renal units are newer, and our results found less favourable conditions.Regarding the fragility parameter (i.e., the variance of the random eect), the estimative obtained from the general model, θ= 0.109 (0.222), indicates a weak association between progression and death.It is also an expected result considering that the observation time of patients with this disease was short.This is an ongoing study and a study with longer follow up time is needed.Researches suggest to continue this study to evaluate the survival of patients with CKD with a follow-up longer than two years and including clinic covariates like co-morbidity, type of primary disease and manifestations or not of extrarenal complications.-0.193 -0.197 -0.220 -0.248 -0.244 -0.245 -0.244 -0.241

Bias
-Death model for Semicompeting risks data.The model proposed byXu et al. (2010) is based on the hazard ratios functions of transitions between the states 0 (under study), 1 (relapse/progression) and 2 (death) dened by Revista Colombiana de Estadística.En edición (2019) 123

Figure 4 :
Figure 4: Estimated marginal survival functions for disease progression (left) and death (right).

Table 2 :
Association measures of four Archimedean copulas families.

Table 3 :
The estimation equations four Archimedean copulas families.

Table 4 :
Contribution to the likelihood construction for each possible situation for right censored data.

Table 6 :
Estimates of Illness-death model for kidney disease analysis (death).

Table 7 :
Estimates of Illness-death model for kidney disease analysis (progression).