Conditional MLE for the proportional hazards model with left-truncated and interval-censored data

https://doi.org/10.1016/j.spl.2015.02.015

Abstract

We consider the conditional maximum likelihood estimator (cMLE) for the proportional hazards model with left-truncated and interval-censored data. We show that when the covariates are discrete the cMLE is the MLE, and that under some regularity conditions the cMLE of the regression parameter is asymptotically normal and efficient.

Introduction

In epidemiology and individual follow-up studies, the lifetime is often subject to censoring and truncation. Left truncation may occur if the time origin of the lifetime precedes the time origin of the study. Interval censoring arises when the lifetime cannot be observed exactly, but can only be determined to lie in an interval obtained from a sequence of examination times. For example, in Acquired Immune Deficiency Syndrome (AIDS) cohort studies, we are interested in the incubation time (denoted by $T$) of AIDS. For transfusion-associated AIDS, the HIV (human immunodeficiency virus) infection time (denoted by $T_s$) can be quite accurately determined. The recruitment starts at $\tau_0$ and an individual is selected only when he (or she) is HIV positive and has not developed AIDS. Let $V=\tau_0-T_s$ if $T_s<\tau_0$ and $V=0$ if $T_s\ge\tau_0$. Hence, the individuals who have developed AIDS by time $\tau_0$ (i.e., $T<V$) are excluded from the study, resulting in left truncation of the survival data. Furthermore, the calendar date of onset of AIDS is usually known only up to an interval. Thus, the sample consists of left-truncated and interval-censored survival times.

The most common approach to modeling covariate effects on survival time is Cox’s (1972) proportional hazards model. For left-truncated and right-censored data, there is a counting process martingale associated with the Cox model (Andersen et al., 1993), and one can obtain consistent estimates of the regression coefficients using a partial likelihood that does not require estimation of the baseline hazard. However, with interval-censored data, there are no obvious martingales associated with the Cox model, nor does partial likelihood work as simply. In the literature, there exist many studies of the Cox model with interval-censored data. Finkelstein (1986) considered a parametric method, where the baseline distribution is fitted simultaneously with the regression coefficients. Huang and Wellner (1995) and Huang (1996) showed that the maximum likelihood estimator (MLE) is asymptotically normal and efficient for case 2 (LTIC2, i.e., two censoring variables) and case 1 interval-censored data (LTIC1, i.e., current status data), respectively. Based on Gibbs sampling, Satten (1996) considered a marginal likelihood approach. Satten et al. (1998) proposed a semi-parametric approach where a parametric form of the baseline distribution is assumed. Pan (2000) used a multiple imputation procedure to fill in failure times. Goetghebeur and Ryan (2000) proposed an EM algorithm based on piecewise constant event times, which leads to an M-step that involves maximizing a standard Cox partial likelihood and an E-step that takes a particularly simple form. Betensky et al. (2002) discussed the use of local likelihood to jointly estimate the regression coefficient and the baseline function. Based on estimating equations, Heller (2011) proposed an inverse probability weight to select event time pairs where the ordering is unambiguous.

When truncation is present, Alioum and Commenges (1996) considered hypothesis testing for the Cox model with LTIC data. Pan and Chappell (2002) considered the estimation of the parameters in the Cox model with LTIC2 data. They showed that the estimates of the regression coefficients from the joint likelihood of the regression coefficients and the baseline survival function work well for the Cox model with LTIC2 data, but the baseline survival function can be under-estimated. For the Cox model with LTIC1 data, Kim (2003) established asymptotic properties of the likelihood estimates. For arbitrarily censored and truncated data, Huber and Vonta (2004) considered a frailty model to take into account a possible heterogeneity among the population. Shen (2014) extended the approaches of Pan (2000) and Heller (2011) to accommodate left-truncation.

In this article, we consider an experiment with the mixed case interval-censored model (Schick and Yu, 2000), where the number of inspection times is random and a subject is evaluated at random successive inspection times. Let $N$ denote the random number of examination times. Given $N=k$, let $X=\{X_{k,j}: k=1,2,\ldots,\ j=1,\ldots,k\}$ be a random array of intervisit times. Define $U_{k,1}=V+X_{k,1}$, $U_{k,2}=X_{k,2}+U_{k,1}$, $U_{k,3}=X_{k,3}+U_{k,2}$, $\ldots$, $U_{k,k}=U_{k,k-1}+X_{k,k}$, such that $U=\{U_{k,j}: k=1,2,\ldots,\ j=1,\ldots,k\}$ is an array of random variables with $U_{k,1}<\cdots<U_{k,k}$. On the event $N=k$, if $T$ falls in the interval $[U_{k,l},U_{k,l+1}]$ $(l=1,\ldots,k-1)$, then let $[L,R]=[U_{k,l},U_{k,l+1}]$ and $\gamma=1$. If $T<U_{k,1}$, then let $[L,R]=[V,U_{k,1}]$ and $\delta=1$, i.e., $T$ is subject to left censoring. If $T>U_{k,k}$, then let $[L,R]=[U_{k,k},\infty]$ and $1-\gamma-\delta=1$, i.e., $T$ is subject to right censoring. For left-truncated and interval-censored (LTIC) data, one observes nothing if $T<V$, and observes $(L,R,V,\delta,\gamma)$ if $T\ge V$. In Section 2, we consider the conditional maximum likelihood estimator (cMLE) for the Cox model with LTIC data. We show that when the covariates are discrete the cMLE is the MLE. Under some regularity conditions, we establish asymptotic properties of the cMLE following the results of Huang and Wellner (1995). In Section 3, a simulation study is conducted to investigate the finite-sample performance of the cMLE.
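The observation scheme described above can be illustrated with a minimal sketch. The function name `observe_ltic` and its argument names are hypothetical and do not come from the article; the sketch only encodes the stated rules for forming $(L,R,\delta,\gamma)$ from a lifetime, a truncation time and a sequence of intervisit times.

```python
import math

def observe_ltic(t, v, gaps):
    """Return (L, R, delta, gamma) for one subject, or None if truncated.

    t    : latent lifetime T
    v    : left-truncation time V
    gaps : positive intervisit times X_{k,1}, ..., X_{k,k}
    """
    if t < v:
        return None  # left truncation: the subject is never observed

    # Inspection times U_{k,1} = V + X_{k,1}, U_{k,j} = U_{k,j-1} + X_{k,j}
    u = []
    current = v
    for x in gaps:
        current += x
        u.append(current)

    if t < u[0]:
        return (v, u[0], 1, 0)      # left censored: delta = 1
    for lo, hi in zip(u, u[1:]):
        if lo <= t <= hi:
            return (lo, hi, 0, 1)   # interval censored: gamma = 1
    # T beyond the last inspection (ties have probability zero for
    # continuous T): right censored, 1 - gamma - delta = 1
    return (u[-1], math.inf, 0, 0)
```

For instance, with $V=1$ and three unit intervisit times, a lifetime of 2.5 is recorded as the interval $[2,3]$ with $\gamma=1$, while a lifetime of 0.5 is truncated and yields no observation.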

The conditional MLE

We assume that $T$, $L$, $R$ and $V$ are all continuous and that, for each individual, data are available on a $p\times 1$ column covariate vector $Z$. Suppose the supports of $T$ and $V$ do not depend on $Z$, and let $a_F$ and $b_F$ denote the left and right endpoints of the support of $T$. Similarly, define $a_G$ and $b_G$ for $V$. Throughout this article we assume that $a_G=a_F=0$ and $b_G\le b_F$. Under this assumption, $F(t)=P(T\le t)$ and $G(t)=P(V\le t)$ are identifiable (Woodroofe, 1985). Let $(L_1,R_1,V_1,\delta_1,\gamma_1,Z_1),\ldots,(L_n,R_n,V_n,\delta_n,\gamma_n,Z_n)$ denote the LTIC data,
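For orientation, a conditional likelihood for LTIC data under the proportional hazards model is commonly written in the form below. This is a sketch under assumptions, not a quotation from the article: the symbol $\Lambda$ for the baseline cumulative hazard and the notation $S(t\mid z)$ are introduced here for illustration, and the exact expression used by the author may differ.

```latex
\[
L_c(\beta,\Lambda)
  =\prod_{i=1}^{n}\frac{S(L_i\mid Z_i)-S(R_i\mid Z_i)}{S(V_i\mid Z_i)},
\qquad
S(t\mid z)=\exp\{-\Lambda(t)\,e^{\beta^{T}z}\},\quad S(\infty\mid z)=0 .
\]
```

Each factor is the probability that $T$ lies in $[L_i,R_i]$ given $T\ge V_i$. For a left-censored subject ($L_i=V_i$) it reduces to $1-S(R_i\mid Z_i)/S(V_i\mid Z_i)$, and for a right-censored subject ($R_i=\infty$) it reduces to $S(L_i\mid Z_i)/S(V_i\mid Z_i)$.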

Simulation results

In this section, we conduct simulation studies to evaluate the finite-sample performance of the cMLE $\hat\beta_n$. The lifetime follows the proportional hazards model with λ(t)=t and β=(β1=2,β2=3)T. The resulting T has the survivorship function P(T>t|Z1,Z2)=exp(et2Z13Z2), where $Z_1$ is an ordinal variable with $P(Z_1=i)=0.25$ for $i=1,2,3,4$ and $Z_2$ is a Bernoulli random variable with probability 0.5. The truncation time $V$ is exponentially distributed with mean $\theta_g$, i.e., $G(v)=1-\exp(-v/\theta_g)$ for $v>0$. The values of $\theta_g$ are
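A design of this kind can be sketched as follows. Several ingredients are assumptions made only for illustration: the baseline cumulative hazard is taken to be $\Lambda_0(t)=t$, so that $T$ given the covariates is exponential with rate $e^{\beta_1 Z_1+\beta_2 Z_2}$, and the coefficient values and the value of $\theta_g$ in the usage line are illustrative placeholders rather than the settings of the article. The function name `simulate_truncated_sample` is hypothetical.

```python
import math
import random

def simulate_truncated_sample(n, beta, theta_g, rng):
    """Draw n subjects (T, V, Z1, Z2) that survive left truncation (T >= V)."""
    sample = []
    while len(sample) < n:
        z1 = rng.choice([1, 2, 3, 4])        # P(Z1 = i) = 0.25
        z2 = 1 if rng.random() < 0.5 else 0  # Bernoulli(0.5)
        # Assumption: Lambda_0(t) = t, so T | Z is exponential with this rate
        rate = math.exp(beta[0] * z1 + beta[1] * z2)
        t = rng.expovariate(rate)
        v = rng.expovariate(1.0 / theta_g)   # exponential with mean theta_g
        if t >= v:                           # truncated subjects are discarded
            sample.append((t, v, z1, z2))
    return sample

# Illustrative values only (not the article's settings)
rng = random.Random(0)
data = simulate_truncated_sample(200, beta=(0.5, -0.5), theta_g=0.2, rng=rng)
```

Subjects with $T<V$ are discarded and redrawn, which mimics the fact that truncated individuals leave no record in the observed sample.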

Applications

The CDC AIDS Blood Transfusion Data described in Kalbfleisch and Lawless (1989) were retrospectively ascertained for all transfusion-associated AIDS cases in which the diagnosis of AIDS occurred prior to the end of the study, $\tau_e$, which was June 30, 1991. The data are subject to left truncation: since HIV was unknown prior to 1982, any cases of transfusion-related AIDS before July 1, 1982 ($\tau_0$) were not included in the data. To introduce interval censoring, we artificially generate

Conclusion

We consider conditional maximum likelihood estimation for the proportional hazards model with LTIC data. When the covariates are discrete, we show that the cMLE is the MLE. We establish the asymptotic properties of the cMLE of the regression parameters. In some situations, the distribution function of the truncation variable $V$ can be parameterized as $G(x;\theta)$. Further research is required to develop a more efficient estimator for this case.

Acknowledgments

The author would like to thank the associate editor and referees for their helpful and valuable comments and suggestions.

References (22)

  • A. Alioum et al., A proportional hazards model for arbitrarily censored and truncated data, Biometrics (1996)
  • P.K. Andersen et al., Statistical Models Based on Counting Processes (1993)
  • R.A. Betensky et al., A local likelihood proportional hazards model for interval censored data, Stat. Med. (2002)
  • D. Cox, Regression models and life tables (with Discussion), J. R. Stat. Soc. Ser. B (1972)
  • D.M. Finkelstein, A proportional hazards model for interval-censored failure time data, Biometrics (1986)
  • E. Goetghebeur et al., Semiparametric regression analysis of interval censored data, Biometrics (2000)
  • G. Heller, Proportional hazards regression with interval censored data using an inverse probability weight, Lifetime Data Anal. (2011)
  • J. Huang, Efficient estimation for the proportional hazards model with interval censoring, Ann. Statist. (1996)
  • J. Huang et al., Efficient estimation for the proportional hazards model with case 2 interval censoring, Tech. Rept. (1995)
  • C. Huber et al., Frailty models for arbitrarily censored and truncated data, Lifetime Data Anal. (2004)
  • J.D. Kalbfleisch et al., Inference based on retrospective ascertainment: an analysis of the data on transfusion-related AIDS, J. Amer. Statist. Assoc. (1989)