1 Introduction

This paper discusses regression analysis of current status data, a type of failure time data that arise when each study subject is observed only once (Jewell and van der Laan 1995; Sun 2006). In this case, the failure time of interest is known only to be either smaller or greater than the observation time; that is, it is either left- or right-censored (Kalbfleisch and Prentice 2002). One setting that usually yields current status data is the cross-sectional study, in which the failure time of interest can be the time to the first marriage or birth, the time under unemployment, or the time to a certain disease. Other fields that often produce current status data include demographic studies, epidemiological surveys, the social sciences and tumorigenicity experiments (Huang 1996; Lin et al. 1998; Sun 2006). Current status data are sometimes also referred to as case I interval-censored data.

Many procedures have been developed in the literature for regression analysis of current status data under various regression models (Chen et al. 2009; Finkelstein and Schoenfeld 1989; Huang 1996; Jewell and van der Laan 1995; Lin et al. 1998; Rossini and Tsiatis 1996; Sun 1999). For example, Huang (1996), Rossini and Tsiatis (1996), and Lin et al. (1998) considered the problem under the proportional hazards model, the proportional odds model and the additive hazards model, respectively, and Chen et al. (2009) discussed the regression problem for multivariate situations. A common feature of the methods mentioned above is that they apply only to independent current status data, meaning that the failure time of interest and the observation time are independent given covariates. Otherwise the data are usually referred to as dependent current status data, and some methods have been established for them as well. Among others, Zhang et al. (2005) and Chen et al. (2012) investigated regression analysis of dependent current status data arising from the additive hazards model and the linear transformation model, respectively, by using the latent variable approach.

For all of the statistical methods described above, a common assumption is that all study subjects will experience, or are susceptible to, the failure event of interest. That is, there exists a finite failure time variable associated with each subject that may or may not be observed exactly. In reality, however, there may exist situations where the study population consists of both a susceptible subpopulation and a nonsusceptible subpopulation, and only individuals in the susceptible subpopulation will experience the failure event of interest (Farewell 1982; Liu and Shen 2009; Lu and Ying 2004). Common examples are tumorigenicity or human cancer studies in which some study subjects may never develop the type of tumor under study; one such tumorigenicity study is described in detail below. For the analysis of such data, a cure model is often used, and in this case we usually say that a subject is cured if it is not susceptible to the failure event such as a disease. In practice, a subject's cured status is known if he or she is observed to experience the failure event or his or her failure time is known; for the subjects whose failure times have not been observed, the cured status is unknown.

In addition to the references mentioned above, many other authors have discussed situations with the presence of a cured subgroup and/or dependent censoring (Fang et al. 2005; Lam and Xue 2005; Li et al. 2007; Ma 2009; Othus et al. 2007). In particular, Li et al. (2007) and Othus et al. (2007) considered the fitting of cure models to right-censored failure time data in the presence of dependent censoring. Also, some authors have pointed out that ignoring the existence of a cured subgroup could yield misleading inference, such as overestimation of the survival of non-cured subjects (Rondeau et al. 2011), and that ignoring dependent censoring could cause serious estimation bias (Huang and Wolfe 2002). Moreover, it may sometimes be of great biomedical interest to estimate the probability of cure. Of course, in practice, it is important, and may not be easy, to identify the existence of a cured subgroup. One common approach is to examine the survival curve: a long and stable non-zero plateau at the tail may be taken as evidence of such a subgroup (Rondeau et al. 2011; Chen 2013). More discussion on this will be given in the application below.

In the following, we will present a maximum likelihood estimation procedure for regression analysis of dependent current status data in the presence of a cured subgroup. Before presenting the procedure, we will first introduce some notation and models and describe the likelihood function in Sect. 2. Here we will focus on the data arising from the proportional hazards model and employ the latent variable approach to describe the dependent censoring. A sieve maximum likelihood estimation procedure will then be described in Sect. 3 and in the procedure, Bernstein polynomials will be used to approximate the unknown functions. For the determination of the proposed estimators, an EM algorithm will be developed and the asymptotic properties of the estimators will be established. Section 4 will present some results obtained from an extensive simulation study conducted to assess the performance of the proposed methodology, and an illustrative example will be provided in Sect. 5. The article will conclude with some discussion and remarks in Sect. 6.

2 Assumptions, models and likelihood function

Consider a failure time study in which there may exist a cured subgroup. Let T denote the failure time of interest, suppose that each subject is observed only once at time C, and let X denote a vector of covariates. That is, T is either left- or right-censored and we only have current status data. To describe the possible cured subgroup, define the cure indicator variable \(U = 0\) if the subject is cured, or nonsusceptible, and \(U = 1\) otherwise, and suppose that we can write T as

$$\begin{aligned} T \, = \, U \, T^* \, + \, ( 1 - U ) \, \infty \, , \end{aligned}$$

where \(T^*<\infty \) denotes the failure time of a susceptible subject. Covariates may also have some effects on U, and for this we will assume that U follows the logistic model

$$\begin{aligned} P ( U = 1 | Z ) = \frac{\exp (Z ' \alpha )}{1 + \exp ( Z ' \alpha )} \end{aligned}$$
(1)

(Farewell 1982). Here Z denotes the vector of covariates that may have effects on U, which may be the same as, a part of, or different from X, and \(\alpha \) denotes a vector of regression parameters.
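For illustration, the incidence part (1) is straightforward to code; the short Python sketch below (the function name and layout are ours, not from the paper) evaluates \(\pi (Z) = P(U=1|Z)\) for a covariate matrix.

```python
import numpy as np

def cure_incidence(Z, alpha):
    """P(U = 1 | Z) under the logistic model (1).

    Z     : (n, q) covariate matrix (add a column of ones for an intercept).
    alpha : (q,) vector of regression parameters.
    """
    eta = Z @ alpha
    # exp(eta) / (1 + exp(eta)) written in a numerically stable form
    return 1.0 / (1.0 + np.exp(-eta))
```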

Following Chen et al. (2002), we will write \(\pi (Z) = P ( U = 1 | Z )\) as a function of Z. Note that under the assumptions above, the survival function of the entire population can be written as

$$\begin{aligned} S(t)= & {} 1-\pi (Z)+\pi (Z)S(t|U=1) \, , \end{aligned}$$

where \(S(t|U=1)\) denotes the survival function of the susceptible subpopulation. It follows that

$$\begin{aligned} S(t)= & {} 1 - \pi (Z) + \pi (Z)S(t|U=1) \ge (1-\pi (Z)) S(t|U=1) + \pi (Z)S(t|U=1)\\= & {} S(t|U=1) \, . \end{aligned}$$

In other words, survival estimation that omits the presence of the cured subpopulation would yield an estimator \(\hat{S}(t)\) that may overestimate the survival function of the susceptible subpopulation, \(S (t|U=1)\).

In the following we will suppose that \(T^*\) and C may be related; in other words, we have dependent current status data. To describe the covariate effects and the relationship between \(T^*\) and C, we will assume that there exists a latent variable b with mean zero and variance \(\sigma ^2\) such that, given X and b, the cumulative hazard functions of \(T^*\) and C have the forms

$$\begin{aligned} \Lambda _{T} (t) \, \exp (X ' \beta + b ) \end{aligned}$$
(2)

and

$$\begin{aligned} \Lambda _{C} (t) \, \exp (X ' \gamma + b ) \, , \end{aligned}$$
(3)

respectively. In the above, \(\Lambda _{T} (t)\) and \(\Lambda _{C} (t)\) denote unknown cumulative baseline hazard functions, and \(\beta \) and \(\gamma \) are vectors of regression parameters. That is, both \(T^*\) and C follow the proportional hazards frailty model (Kalbfleisch and Prentice 2002).

Suppose that the study consists of n independent subjects and that there exists a noninformative censoring time \(C^*\), such as the administrative stopping time, that is independent of both T and C. Define the observed censoring time \(\tilde{C} = \min ( C , C^* )\), the observed censoring indicator \(\delta = I ( T \le \tilde{C} )\), and the administrative censoring indicator \(\rho = I ( C \le C^* )\). For \(i = 1 ,\ldots , n\), let \(T_i\), \(\tilde{C}_i\), \(b_i\), \(\delta _i\), \(\rho _i\), \(X_i\) and \(Z_i\) be defined as T, \(\tilde{C}\), b, \(\delta \), \(\rho \), X and Z, respectively, but associated with subject i. Assume that the observed data have the form \(\{ \, O_i =( \tilde{C}_i , \delta _i , \rho _i , X_i , Z_i) , \; i = 1 ,\ldots , n \, \}.\)

To derive the likelihood function, let \(\lambda _{C} (t)\) denote the first derivative of \(\Lambda _{C} (t)\) and \(f ( b_i ;\sigma ^2)\) the density function of the \(b_i\)'s, which is assumed to be known up to \(\sigma ^2\). Denote the survival functions of \(T^*\) and C by \(S_t(t)\) and \(S_c(t)\), respectively, and let the density function of C be \(f_c(t)\). For the derivation of the likelihood function, note the following facts. A subject who has failed before the observed censoring time \(\tilde{C}_i\) (\(\delta _i=1\)) must be non-cured and thus gives the likelihood contribution \(\pi (Z_i)(1-S_t(\tilde{C}_i))\). If the failure has not been observed by the observed censoring time \(\tilde{C}_i\) (\(\delta _i=0\)), the corresponding subject could be either non-cured but surviving beyond time \(\tilde{C}_i\), or cured with survival probability 1 at all times; in this case, the likelihood contribution would be \(\pi (Z_i)S_t(\tilde{C}_i)+1-\pi (Z_i)\). It follows that the full likelihood function of \(\theta = (\alpha , \beta , \gamma , \sigma ^2 , \Lambda _T(\cdot ) , \Lambda _C (\cdot ))\) has the form \(L_f ( \theta ) = \prod _{i=1}^n L_{f} (\theta |O_i)\), where

$$\begin{aligned} L_{f} (\theta |O_i)= & {} \int _{-\infty }^\infty L_{f} (\theta |O_i, b_i)f( b_i ;\sigma ^2) d b_i\\= & {} \int _{-\infty }^\infty \left\{ \pi (Z_i)(1-S_t(\tilde{C}_i))\right\} ^{\delta _i} \left\{ 1-\pi (Z_i)\right. \\&\left. +\, \pi (Z_i)S_t(\tilde{C}_i) \right\} ^{1-\delta _i} \left\{ f_c(\tilde{C}_i)\right\} ^{\rho _i} \left\{ S_c(\tilde{C}_i)\right\} ^{1-\rho _i} f( b_i ;\sigma ^2) d b_i\\= & {} \int _{-\infty }^\infty \left\{ \pi (Z_i)\left[ 1 - \exp ( - \Lambda _{T}(\tilde{C}_i) \exp (X_i ' \beta + b_i))\right] \right\} ^{\delta _i}\\&\times \left\{ 1 - \pi (Z_i) + \pi (Z_i) \exp ( - \Lambda _{T}(\tilde{C}_i) \exp (X_i ' \beta + b_i)) \right\} ^{1-\delta _i}\\&\times \exp ( - \Lambda _{C}(\tilde{C}_i) \exp (X_i ' \gamma +b_i )) (\exp ( X_i' \gamma + b_i) \lambda _{C} (\tilde{C}_i))^{\rho _i} f( b_i ;\sigma ^2) \, d b_i \, . \end{aligned}$$
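To make the structure of this integral concrete, the following Python sketch (our illustration only, taking b normal as in the simulation study below) approximates a single contribution \(L_f(\theta |O_i)\) by Monte Carlo; Lambda_T, Lambda_C and lam_C are assumed callables for the baseline quantities.

```python
import numpy as np

rng = np.random.default_rng(1)

def lik_contrib(c, delta, rho, xb, xg, pi_z,
                Lambda_T, Lambda_C, lam_C, sigma2, B=2000):
    """Monte Carlo approximation of L_f(theta | O_i).

    c      : observed censoring time (C_i tilde)
    delta  : failure indicator delta_i;  rho : administrative indicator rho_i
    xb, xg : linear predictors X_i'beta and X_i'gamma
    pi_z   : pi(Z_i) from model (1)
    """
    b = rng.normal(0.0, np.sqrt(sigma2), size=B)   # draws of the frailty b_i
    S_t = np.exp(-Lambda_T(c) * np.exp(xb + b))    # survival of T* given b
    S_c = np.exp(-Lambda_C(c) * np.exp(xg + b))    # survival of C given b
    f_c = S_c * np.exp(xg + b) * lam_C(c)          # density of C given b
    if delta == 1:
        term_T = pi_z * (1.0 - S_t)                # observed failure: non-cured
    else:
        term_T = 1.0 - pi_z + pi_z * S_t           # cured, or surviving non-cured
    term_C = f_c if rho == 1 else S_c
    return np.mean(term_T * term_C)                # averages out f(b; sigma^2)
```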

In the next section, we will discuss estimation of the parameter \(\theta \).

3 Inference procedure

3.1 Sieve maximum likelihood estimation

To estimate the unknown parameter \(\theta \), it would be natural to maximize the likelihood function \(L_f ( \theta )\). On the other hand, it is well known that this may not be straightforward due to the infinite dimensionality of \(\Lambda _{T} (t)\) and \(\Lambda _{C} (t)\). For this, following Huang and Rossini (1997) and others, we propose to approximate \(\Lambda _{T} (t)\) and \(\Lambda _{C} (t)\) by Bernstein polynomials over \(I = [ a , b ]\), where a and b denote the lower and upper bounds of the observation times \(\tilde{C}_i\). Denote the parameter space of \(\theta \) by \(\Theta \) and define

$$\begin{aligned} \Theta _n \, = \, \Big \{\theta _n=(\alpha ,\beta ,\gamma ,\sigma ^2,\varphi _n): \varphi _n = ( \Lambda _{1n} , \Lambda _{2n})\Big \} = {\mathcal {B}}\times {\mathcal {M}}_n^1\times {\mathcal {M}}_n^2 \, , \end{aligned}$$

where \({\mathcal {B}} = \left\{ (\alpha ',\beta ',\gamma ',\sigma ^2)' \in R^{2p+q+1}: \Vert \alpha \Vert +\Vert \beta \Vert +\Vert \gamma \Vert +|\sigma ^2|\le M\right\} \),

$$\begin{aligned} {\mathcal {M}}_{n}^1= & {} \Big \{\Lambda _{1n}(t)=\sum _{j=0}^{m}\exp (\xi _j) B_j(t,m,a,b), \max _{0\le j\le m}\exp (\xi _j)\le M_n\Big \}, \\ {\mathcal {M}}_{n}^2= & {} \Big \{\Lambda _{2n}(t)=\sum _{j=0}^{m}\exp (\eta _j) B_j(t,m,a,b), \max _{0\le j\le m}\exp (\eta _j)\le M_n\Big \} \, . \end{aligned}$$

In the above, p denotes the dimension of \(\beta \) and \(\gamma \), q the dimension of \(\alpha \), the \(\xi _j\)'s and \(\eta _j\)'s are unknown parameters to be estimated, and

$$\begin{aligned} B_j(t,m,a,b) = \genfrac(){0.0pt}0{m}{j}\left( {\frac{t-a}{b-a}}\right) ^j \left( 1-{\frac{t-a}{b-a}}\right) ^{m-j} \end{aligned}$$

with m representing the degree of the Bernstein polynomials, which is usually chosen as \(m = O( n^\nu )\) for some \(0<\nu <1/2\), and \(M_n=O(n^{c})\) for some \(0<c<1/2\). Then it can be shown that \(\Theta _n=\mathcal {B}\times {\mathcal {M}}^1_n\times {\mathcal {M}}_n^2\) can be used as a sieve space of \(\Theta \) (Lorentz 1953), and thus it is natural to define the estimator \(\hat{\theta } = (\hat{\alpha }, \hat{\beta }, \hat{\gamma }, \hat{\sigma }^2, \hat{\Lambda }_{1n} (\cdot ), \hat{\Lambda }_{2n} (\cdot ) )\) as the value of \(\theta \) that maximizes the log likelihood function \(l ( \theta )\) over \(\Theta _n\).
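For implementation, the sieve elements of \({\mathcal {M}}_n^1\) and \({\mathcal {M}}_n^2\) are straightforward to evaluate; below is a short Python sketch (ours, under the paper's parameterization) of the Bernstein basis and the resulting sieve cumulative baseline hazard.

```python
import numpy as np
from scipy.special import comb

def bernstein_basis(t, m, a, b):
    """B_j(t, m, a, b) for j = 0, ..., m; returns an array of shape (len(t), m + 1)."""
    s = (np.atleast_1d(t).astype(float) - a) / (b - a)
    j = np.arange(m + 1)
    return comb(m, j) * s[:, None] ** j * (1.0 - s[:, None]) ** (m - j)

def Lambda_n(t, xi, a, b):
    """Sieve approximation sum_j exp(xi_j) B_j(t, m, a, b) with m = len(xi) - 1;
    the exp() transform keeps all coefficients positive."""
    return bernstein_basis(t, len(xi) - 1, a, b) @ np.exp(xi)
```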

In the following, we will first establish the asymptotic properties of \(\hat{\theta }\) and then discuss its determination. To describe the asymptotic properties, for any \(\theta _j^* = (\alpha _j^* , \beta _j^* , \gamma _j^* , \sigma _j^{2 *} , \Lambda _{T j}^* (\cdot ) , \Lambda _{C j}^* (\cdot )) \in \Theta , j = 1 , 2 ,\) define the distance

$$\begin{aligned} d ( \theta _1^* , \theta _2^* )= & {} \Vert \alpha _1^* - \alpha _2^* \Vert + \Vert \beta _1^* - \beta _2^* \Vert + \Vert \gamma _1^* - \gamma _2^* \Vert + |\sigma _1^{2 *} - \sigma _2^{2 *} | + \Vert \Lambda _{T1}^* \\&-\Lambda _{T2}^* \Vert _2 + \Vert \Lambda _{C1}^* - \Lambda _{C2}^* \Vert _2 , \end{aligned}$$

and the \(L_2(P)\) norm \(\Vert g(Y)\Vert _2 = (\int |g|^2 dP)^{1/2}\) for a function g and a random variable Y with probability measure P. Let \(\theta _0 = ( \alpha _0,\beta _0,\gamma _0,\sigma _0^2,\Lambda _{T0}, \Lambda _{C0})\) denote the true value of \(\theta \). We will first describe the consistency and convergence rate of \(\hat{\theta }\) and then the asymptotic normality of \(\hat{\alpha },\) \(\hat{\beta },\) \(\hat{\gamma }\) and \(\hat{\sigma }^2.\) All limits below are taken as \(n \rightarrow \infty \).

Theorem 1

Suppose that Conditions A1–A4 described in the Appendix hold. Then \(\hat{\alpha },\) \(\hat{\beta },\) \(\hat{\gamma }\) and \(\hat{\sigma }^2\) are strongly consistent estimators of \(\alpha _{0},\) \(\beta _{0},\) \(\gamma _0\) and \(\sigma _0^2\), respectively, and furthermore, we have

$$\begin{aligned} \Vert \hat{\Lambda }_{1n} - \Lambda _{T0}\Vert _2{\longrightarrow } \, 0 \; , \; \; \Vert \hat{\Lambda }_{2n}-\Lambda _{C0}\Vert _2{\longrightarrow } \, 0 \end{aligned}$$

almost surely.

Theorem 2

Suppose that Conditions A1–A4 described in the Appendix hold. Then we have

$$\begin{aligned} \Vert \hat{\Lambda }_{1n}-\Lambda _{T0}\Vert _2 + \Vert \hat{\Lambda }_{2n}-\Lambda _{C0}\Vert _2 = O_p(n^{-(1-\nu )/2}+n^{-r/2}) \, . \end{aligned}$$

Theorem 3

Assume that Conditions A1–A5 described in the Appendix hold. Then we have

$$\begin{aligned} n^{1/2} \, ( (\hat{\alpha } \, - \, \alpha _0)', (\hat{\beta } \, - \, \beta _0)',(\hat{\gamma } \, - \, \gamma _0)',(\hat{\sigma }^2-\sigma _0^2) )' \, \rightarrow \, N ( 0 , \Sigma ) \end{aligned}$$

in distribution and furthermore, \((\hat{\alpha }',\hat{\beta }',\hat{\gamma }',\hat{\sigma }^2)'\) is semiparametrically efficient, where \(\Sigma \) is given in the Appendix.

The proofs of the theorems above are sketched in the Appendix. In the next subsection, we will discuss the determination of \(\hat{\theta }\) and present an EM algorithm.

3.2 EM algorithm

To determine \(\hat{\theta }\), we will develop an EM algorithm. For this, first note that if the \(b_i\)'s were observed, the pseudo-complete data likelihood function based on \(\{ (O_i , b_i ) , \; i=1,\ldots ,n\}\) would be

$$\begin{aligned} L_c (\theta ) = L_1 (\theta _1) \, L_2 (\theta _2) \, L_3 ( \sigma ^2 ) \, . \end{aligned}$$

In the above, \(\theta _1 = ( \alpha , \beta , \Lambda _T (\cdot ) )\), \(\theta _2 = ( \gamma , \Lambda _C (\cdot ) )\),

$$\begin{aligned} L_1(\theta _1)= & {} \prod _{i=1}^n\{\pi (Z_i)[1-\exp (-\Lambda _{T}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))]\}^{\delta _i}\\&\times \{1-\pi (Z_i)+\pi (Z_i)\exp (-\Lambda _{T}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))\}^{1-\delta _i} \, ,\\ L_2(\theta _2)= & {} \prod _{i=1}^n \exp (-\Lambda _{C}(\tilde{C}_i) \exp (X_i^\tau \gamma +b_i))(\exp (X_i^\tau \gamma +b_i)\lambda _{C}(\tilde{C}_i))^{\rho _i} \, , \end{aligned}$$

and \(L_3 (\sigma ^2) = \prod _{i=1}^n f (b_i;\sigma ^2)\).

Let \(\hat{\theta }^{(k)}\) denote the estimator of \(\theta \) obtained at the k-th iteration. For the \((k+1)\)th iteration, in the E-step, we need to compute

$$\begin{aligned} E_b \{\log L_c(\theta )| O_i 's ,\hat{\theta }^{(k)}\}= & {} E_b \left\{ \log L_1(\theta _1)| O_i 's,\hat{\theta }^{(k)} \right\} \\&+\, E_b \left\{ \log L_2(\theta _2)| O_i 's ,\hat{\theta }^{(k)} \right\} {+} E_b \left\{ \log L_3(\sigma ^2)| O_i 's ,\hat{\theta }^{(k)}\right\} , \end{aligned}$$

where \(E_b \{ \, \log L_c(\theta )| O_i 's ,\hat{\theta }^{(k)} \, \}\) denotes the conditional expectation of \(\log L_c(\theta )\) with respect to b given \( O_i 's\) and the current parameter value \(\hat{\theta }^{(k)}.\) In the M-step, we need to maximize

$$\begin{aligned} E_b \{ \, \log L_1(\theta _1) | O_i 's , \hat{\theta }^{(k)} \, \} \; , \; E_b \{ \, \log L_2(\theta _2) | O_i 's ,\hat{\theta }^{(k)} \, \} \; , \; E_b \{ \, \log L_3(\sigma ^2) | O_i 's ,\hat{\theta }^{(k)} \, \} \end{aligned}$$

with respect to \(\theta _1\), \(\theta _2\) and \(\sigma ^2\), respectively. The expectations above clearly do not have closed forms, and for this, following Chen et al. (2009) and others, we can employ the Monte Carlo method. Specifically, for an arbitrary function \(h(b_i)\) and a sufficiently large L, we can approximate the conditional expectation \(E_b \{ \, h (b_i) | O_i 's , \hat{\theta }^{(k)} \, \}\) by

$$\begin{aligned} \frac{\int _{-\infty }^\infty h(b_i)\Psi _i(b_i;\hat{\theta }^{(k)}) f(b_i;(\hat{\sigma }^{(k)})^2)db_i}{\int _{-\infty }^\infty \Psi _i(b_i;\hat{\theta }^{(k)}) f(b_i;(\hat{\sigma }^{(k)})^2)db_i} \, \approx \, \frac{ L^{-1} \sum _{l=1}^L h(b_{il})\Psi _i(b_{il};\hat{\theta }^{(k)}) }{ L^{-1} \sum _{l=1}^L \Psi _i(b_{il};\hat{\theta }^{(k)})} \, . \end{aligned}$$
(4)

In the above,

$$\begin{aligned} \Psi _i(b_i;\theta )= & {} \{\pi (Z_i)[1-\exp (-\Lambda _{T}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))]\}^{\delta _i}\\&\times \{1-\pi (Z_i)+\pi (Z_i)\exp (-\Lambda _{T}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))\}^{1-\delta _i}\\&\times \exp (-\Lambda _{C}(\tilde{C}_i)\exp (X_i^\tau \gamma +b_i))(\exp (X_i^\tau \gamma +b_i)\lambda _{C}(\tilde{C}_i))^{\rho _i} \, , \end{aligned}$$

and \(\{b_{il},l=1,\ldots ,L\}\) are generated from the density function \(f ( b_i ; (\hat{\sigma }^{(k)})^2 )\).
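In code, (4) is a self-normalized Monte Carlo average. The following minimal Python sketch is ours; Psi_i is assumed to implement the weight function \(\Psi _i(b_i;\theta )\) displayed above.

```python
import numpy as np

rng = np.random.default_rng(7)

def cond_expect(h, Psi_i, sigma_k, L=5000):
    """Approximate E_b{ h(b_i) | O_i, theta^(k) } as in (4)."""
    b = rng.normal(0.0, sigma_k, size=L)  # b_il ~ f(.; (sigma^(k))^2)
    w = Psi_i(b)                          # unnormalized posterior weights
    return np.sum(h(b) * w) / np.sum(w)   # ratio of the two Monte Carlo averages
```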

In the M-step, to obtain the updated estimator of \(\theta _1\) in the \((k+1)\)th iteration, we need the following score functions

$$\begin{aligned} S_\alpha (\theta _1)= & {} \frac{\partial E_b \{ \log L_1(\theta _1)|O,\hat{\theta }^{(k)} \} }{ \partial \alpha } = \sum _{i=1}^n \, \left\{ \delta _i \frac{Z_i\exp ({Z_i}^\tau \alpha )}{\pi (Z_i)(1 +\exp ({Z_i}^\tau \alpha ))^2} \right. \\&\left. +\, (1-\delta _i)\frac{Z_i\exp ({Z_i}^\tau \alpha )}{(1+\exp ({Z_i}^\tau \alpha ))^2} E_{b_i}\big [\phi _{i 2}(b_i;\theta _1)|O_i,\hat{\theta }^{(k)}\big ] \right\} \, ,\\ S_\beta (\theta _1)= & {} \frac{\partial E_b \{ \log L_1(\theta _1)|O,\hat{\theta }^{(k)} \} }{\partial \beta } = \sum _{i=1}^n E_{b_i} \left\{ \phi _{i 1}(b_i;\theta _1)|O_i,\hat{\theta }^{(k)} \, \right\} \, , \end{aligned}$$

and

$$\begin{aligned} S_{\xi _j}(\theta _1) = \frac{\partial E_b \{ \log L_1(\theta _1)|O,\hat{\theta }^{(k)} \} }{\partial \xi _j} = \sum _{i=1}^n E_{b_i} \left\{ \phi _{i3}(b_i;\theta _1)|O_i,\hat{\theta }^{(k)} \right\} \end{aligned}$$

for \(j = 0 , 1,\ldots , m\). In the above,

$$\begin{aligned}&\phi _{i 1} (b_i;\theta _1)=\delta _i\frac{X_i\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i)}{1-\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))}\\&\quad +\, (\delta _i-1)\frac{\pi (Z_i)X_i\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))\Lambda _{1n}(\tilde{C}_i) \exp (X_i^\tau \beta +b_i)}{1-\pi (Z_i)+\pi (Z_i)\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))} \, ,\\&\phi _{i 2} (b_i;\theta _1)=\frac{\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))-1}{1-\pi (Z_i)+\pi (Z_i)\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))} \, , \end{aligned}$$

and

$$\begin{aligned}&\phi _{i3} (b_i;\theta _1)=\delta _i\frac{\exp (\xi _j)B_j(\tilde{C}_i,m,a,b)\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))\exp (X_i^\tau \beta +b_i)}{1-\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))}\\&\quad +\, (\delta _i-1)\frac{\pi (Z_i)\exp (\xi _j)B_j(\tilde{C}_i,m,a,b)\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))\exp (X_i^\tau \beta +b_i)}{1-\pi (Z_i)+\pi (Z_i)\exp (-\Lambda _{1n}(\tilde{C}_i)\exp (X_i^\tau \beta +b_i))} \, . \end{aligned}$$

For the determination of the updated estimators of \(\gamma \) and \(\Lambda _C (t)\) in the M-step, we need the following score functions

$$\begin{aligned} S_\gamma (\theta _2)= & {} \frac{\partial E_b \{ \log L_2(\theta _2)|O,\hat{\theta }^{(k)} \} }{\partial \gamma }\\= & {} \sum _{i=1}^n \Bigg [\rho _i X_i - E_{b_i}(\exp (b_i)|O_i, \hat{\theta }^{(k)}) X_i \exp (X_i^\tau \gamma ) \Lambda _{2n}(\tilde{C}_i) \Bigg ] \, , \end{aligned}$$

and

$$\begin{aligned} S_{\eta _j} (\theta _2)= & {} \frac{\partial E_b \{ \log L_2(\theta _2)|O,\hat{\theta }^{(k)} \} }{\partial {\eta _j}} =\sum _{i=1}^n \Bigg [ \rho _i \frac{\exp (\eta _j)B'_j(\tilde{C}_i,m,a,b)}{\sum _{j=0}^m \exp (\eta _j)B'_j(\tilde{C}_i,m,a,b)}\\&-E_{b_i} (\exp (b_i)|O_i, \hat{\theta }^{(k)})\exp (X_i^\tau \gamma ) \exp (\eta _j) B_j (\tilde{C}_i,m,a,b) \Bigg ] \, , \end{aligned}$$

where \(B'_j(t,m,a,b)\) denotes the first-order derivative of \(B_j(t,m,a,b)\) for \(j = 0 , 1,\ldots , m\). Note that in the sieve space \(\Theta _n\), \(\lambda _C (t)\) has the form

$$\begin{aligned} \lambda _{2n} (t)=\sum _{j=0}^m \exp (\eta _j)B'_j(t,m,a,b) \, . \end{aligned}$$
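For computing \(B'_j(t,m,a,b)\), one may use the standard differentiation identity for Bernstein polynomials (a known property of the basis, stated here for convenience):

$$\begin{aligned} B'_j(t,m,a,b) \, = \, \frac{m}{b-a} \, \big [ B_{j-1}(t,m-1,a,b) - B_j(t,m-1,a,b) \big ] \, , \end{aligned}$$

with the conventions \(B_{-1}(t,m-1,a,b) = B_m(t,m-1,a,b) = 0\).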

Finally, in the M-step, the updated estimator of \(\sigma ^2\) has the closed form

$$\begin{aligned} (\hat{\sigma }^{(k+1)})^2=\frac{1}{n}\sum _{i=1}^n E_{b_i}(b_i^2|O_i,\hat{\theta }^{(k)}) \, . \end{aligned}$$
(5)

By combining all discussions above, the EM algorithm can be described as follows.

  • Step 0. Choose m and initial values \(\hat{\theta }^{(0)}\) for all parameters.

  • Step 1. At the \((k+1)\)th iteration, generate a random sample \(\{ \, b_{il} , \; l = 1 ,\ldots , L , i = 1 ,\ldots , n \, \}\) from \(f ( b ; (\hat{\sigma }^{(k)})^2)\).

  • Step 2. Calculate the five conditional expectations \(E_{b_i} ( \exp \{b_i\} | O_i 's , \hat{\theta }^{(k)})\), \(E_{b_i} ( b_i^2 | O_i 's , \hat{\theta }^{(k)})\), \(E_{b_i} ( \phi _{i1} ( b_i ; \theta _1 ) | O_i 's , \hat{\theta }^{(k)})\), \(E_{b_i} ( \phi _{i2} ( b_i ; \theta _1 ) | O_i 's , \hat{\theta }^{(k)})\) and \(E_{b_i} ( \phi _{i3} ( b_i ; \theta _1 ) | O_i 's , \hat{\theta }^{(k)})\) at \(\theta = \hat{\theta }^{(k)}\).

  • Step 3. Determine the updated estimators \(\hat{\alpha }^{(k+1)}\) and \(\hat{\beta }^{(k+1)}\) by solving the equations \(S_\alpha ( \theta _1) = 0\) and \(S_\beta ( \theta _1) = 0\) with \(\xi _j = \hat{\xi }_j^{(k)}\), \(j = 0 , 1 ,\ldots , m\).

  • Step 4. Determine the updated estimators \(\hat{\xi }_j^{(k+1)}\)’s by solving the equations \(\{ \, S_{\xi _j} ( \theta _1 ) = 0 \, \}\) with \(\alpha = \hat{\alpha }^{(k+1)}\) and \(\beta = \hat{\beta }^{(k+1)}\).

  • Step 5. Determine the updated estimator \(\hat{\gamma }^{(k+1)}\) by solving the equation \(S_\gamma (\theta _2) = 0\) with \(\eta _j = \hat{\eta }_j^{(k)}\), \(j = 0 , 1 ,\ldots , m\).

  • Step 6. Determine the updated estimators \(\hat{\eta }_j^{(k+1)}\)’s by solving the equations \(\{ \, S_{\eta _j} ( \theta _2 ) = 0 \, \}\) with \(\gamma = \hat{\gamma }^{(k+1)}\).

  • Step 7. Determine the updated estimator \((\hat{\sigma }^{(k+1)})^2\) by the estimator given in (5).

  • Step 8. Repeat Steps 1–7 until convergence.
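Putting the steps together, the overall iteration has the following shape. This is a structural Python sketch only, not the authors' implementation: e_step and m_step are hypothetical callables standing in for the Monte Carlo expectations of Step 2 and the score-equation solves of Steps 3–6.

```python
import numpy as np

def em_fit(data, theta0, e_step, m_step, eps=1e-4, max_iter=500):
    """Monte Carlo EM skeleton; theta is a dict with keys
    'alpha', 'beta', 'gamma', 'xi', 'eta', 'sigma2'."""
    theta = dict(theta0)                      # Step 0: initial values
    for _ in range(max_iter):
        E = e_step(data, theta)               # Steps 1-2: draw b_il, average
        new = m_step(data, E, theta)          # Steps 3-6: update alpha, beta, xi, gamma, eta
        new["sigma2"] = np.mean(E["b2"])      # Step 7: closed-form update (5)
        done = all(                           # Step 8: parameter-change criterion
            np.linalg.norm(np.atleast_1d(new[k]) - np.atleast_1d(theta[k])) <= eps
            for k in new)
        theta = new
        if done:
            break
    return theta
```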

To implement the EM algorithm described above, one needs to pay attention to several issues. One is the choice of methods for solving the equations involved in Steps 3–6, for which various algorithms can be employed. In the numerical studies below, we applied the Newton–Raphson algorithm in Steps 3 and 5 and the Broyden–Fletcher–Goldfarb–Shanno method (Fletcher 1987) in Steps 4 and 6. The selection of initial values is another important issue, and for this we employed the following procedure, which temporarily treats the current status data as right-censored data and ignores the dependent censoring. More specifically, to obtain the initial values for \(\alpha \), \(\beta \) and the \(\xi _j\)'s, \(j=0,1,\ldots ,m\), we fit a standard mixture cure model to the data treated in this way. The initial values for \(\alpha \) and \(\beta \) are set to be the corresponding maximum likelihood estimators, and the initial values for the \(\xi _j\)'s are obtained by fitting the estimated cumulative baseline hazard function of T with Bernstein polynomials. Similarly, the initial value of \(\gamma \) is taken to be the maximum partial likelihood estimator and the initial values of the \(\eta _j\)'s are obtained by fitting the Nelson–Aalen estimator of the cumulative baseline hazard function of C with Bernstein polynomials.

For the convergence, we employed two criteria. One is to check whether \(\Vert \hat{\alpha }^{(k+1)} - \hat{\alpha }^{(k)}\Vert \le \epsilon \), \(\Vert \hat{\beta }^{(k+1)} - \hat{\beta }^{(k)}\Vert \le \epsilon \), \(\Vert \hat{\xi }_j^{(k+1)} - \hat{\xi }_j^{(k)}\Vert \le \epsilon \), \(\Vert \hat{\gamma }^{(k+1)}-\hat{\gamma }^{(k)}\Vert \le \epsilon \), \(\Vert \hat{\eta }_j^{(k+1)} - \hat{\eta }_j^{(k)}\Vert \le \epsilon \), and \(| (\hat{\sigma }^{(k+1)})^2 - (\hat{\sigma }^{(k)})^2 | \le \epsilon \) for a given positive number \(\epsilon \), \(j = 0 , 1 ,\ldots , m\). The other is to stop the EM algorithm when the standard errors of the \(\alpha ^{(k)}\)'s, \(\beta ^{(k)}\)'s, \(\xi _{j}^{(k)}\)'s, \(\gamma ^{(k)}\)'s, \(\eta _{j}^{(k)}\)'s and \(\sigma ^{(k)}\)'s from 30 consecutive iterations are all smaller than \(\epsilon _1\), another given positive number. It is well known that the Monte Carlo method adopted for approximating the conditional expectation in (4) can cause approximation errors (Chen et al. 2012); we also observed that the estimators may not converge completely but rather fluctuate within a small range, and these observations motivated the joint use of the two criteria above.

A third issue in implementing the EM algorithm above is the selection of m, the degree of the Bernstein polynomials, which controls the smoothness of the approximation. A simple approach is to try several different values of the order \(O(n^\nu )\) and compare the obtained results. Following the BIC criterion for model selection (Burnham and Anderson 2002), an alternative is to choose the value of m that minimizes

$$\begin{aligned} \text {BIC}= -2l_n(\hat{\theta }_n)+(m^2 +2m +p) \log n. \end{aligned}$$
(6)
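As an illustration, this selection step can be coded as a small loop; fit_model below is a hypothetical wrapper (not from the paper) that returns the maximized log likelihood \(l_n(\hat{\theta }_n)\) for a given degree m. In the application below, for example, the candidate set \(\{3, 4, 5\}\) would be passed as candidate_ms.

```python
import numpy as np

def select_m(data, n, candidate_ms, fit_model, p):
    """Return the Bernstein degree m minimizing the BIC in (6);
    p is the dimension of the parametric part of the model."""
    bic = {m: -2.0 * fit_model(data, m) + (m**2 + 2*m + p) * np.log(n)
           for m in candidate_ms}
    return min(bic, key=bic.get)
```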

For inference about \(\theta \), of course, one also needs to estimate \(\Sigma \) or the covariance matrix of \(\hat{\theta }_n\). For this, we employ the method given in Louis (1982) and estimate it by the inverse of the matrix

$$\begin{aligned} I(\hat{\theta }) = E_b \left\{ \frac{-\partial ^2 \log L_c (\theta )}{\partial \theta \partial \theta '}|O,\hat{\theta } \right\} -E_b \left\{ \frac{\partial \log L_c (\theta )}{\partial \theta }\cdot \frac{\partial \log L_c (\theta )}{\partial \theta '} |O,\hat{\theta } \right\} \, . \end{aligned}$$
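Numerically, the two conditional expectations in \(I(\hat{\theta })\) can be approximated with the same weighted frailty draws used in (4). The sketch below is our illustration only; score and hessian are hypothetical callables returning the complete-data score vector and Hessian of \(\log L_c(\theta )\) at a given frailty value, and the per-subject summation is suppressed for brevity.

```python
import numpy as np

def louis_covariance(b_draws, Psi_vals, score, hessian):
    """Monte Carlo version of the Louis (1982) information I(theta_hat):
    E[-Hessian | O, theta_hat] minus E[score score' | O, theta_hat],
    inverted to give the estimated covariance matrix."""
    w = Psi_vals / np.sum(Psi_vals)           # normalized posterior weights
    neg_hess = -sum(wl * hessian(bl) for wl, bl in zip(w, b_draws))
    outer = sum(wl * np.outer(score(bl), score(bl))
                for wl, bl in zip(w, b_draws))
    return np.linalg.inv(neg_hess - outer)
```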

4 A simulation study

Now we present some results obtained from an extensive simulation study conducted to assess the performance of the estimation procedure proposed in the previous sections. In the following, we focus on the situations where \(X \ne Z\) and b follows the normal distribution; the situation with \(X = Z\) gave similar results. For the generation of the covariates, we considered several distributions, including the uniform distribution and the normal distribution, and allowed them to be either one- or two-dimensional. To generate the data, we first generated the \(X_i\)'s and the \(b_i\)'s and then the \(C_i\)'s from model (3) with \(\Lambda _{C} (t) = 0.01 \, t^2\). Of course, before generating the failure times \(T_i\), we also needed to generate the \(U_i\)'s based on model (1) and the \(T_i^*\)'s based on model (2) with \(\Lambda _{T} (t) = 0.05 \, t^2\). The noninformative censoring time \({C}_i^*\) was set to be a constant \(\tau \), chosen to give the desired percentage of right-censored \(C_i\)'s. The results given below are based on \(n = 200\) or 400 with 1000 replications.
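For concreteness, the one-covariate set-up used in Tables 1, 2 and 3 below can be generated as in the following sketch (ours); the inverse-transform formulas follow from \(\Lambda _T(t)=0.05\,t^2\) and \(\Lambda _C(t)=0.01\,t^2\) in models (2) and (3).

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate(n, alpha0, beta0, gamma0, sigma0, tau):
    Z = rng.uniform(0.0, 2.0, n)                 # covariate for the cure part
    X = rng.normal(0.0, 1.0, n)                  # covariate for the hazard parts
    b = rng.normal(0.0, sigma0, n)               # shared frailty
    U = rng.binomial(1, 1.0 / (1.0 + np.exp(-alpha0 * Z)))  # model (1)
    # Invert Lambda(t) * exp(linear predictor + b) = -log(u), u ~ Uniform(0, 1):
    C = np.sqrt(-np.log(rng.uniform(size=n)) / (0.01 * np.exp(gamma0 * X + b)))
    T_star = np.sqrt(-np.log(rng.uniform(size=n)) / (0.05 * np.exp(beta0 * X + b)))
    T = np.where(U == 1, T_star, np.inf)         # cured subjects never fail
    C_tilde = np.minimum(C, tau)                 # administrative censoring at tau
    delta = (T <= C_tilde).astype(int)           # current status indicator
    rho = (C <= tau).astype(int)                 # administrative censoring indicator
    return C_tilde, delta, rho, X, Z
```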

Table 1 Results on estimation of regression coefficients based on the simulated data with one covariate and \(\sigma =0\)
Table 2 Results on estimation of regression coefficients based on the simulated data with one covariate and \(\sigma =0.5\)

Tables 1, 2 and 3 give the results on estimation of regression parameters \(\beta \), \(\alpha \) and \(\gamma \) and variance parameter \(\sigma \) for the single covariate situation with the \(Z_i\)’s following the uniform distribution over ( 0 , 2), \(X_i\)’s following N(0, 1), and \(\alpha _0 = 1\), \(( \beta _0 , \gamma _0 ) = (0,0)\), (0, 0.5), (0.5, 0), or (0.5, 0.5). Table 1 considered the situation with \(\sigma _0 = 0 \), meaning no correlation between T and C, Table 2 studied the situation with \(\sigma _0\) = 0.5, and Table 3 assumed \(\sigma _0 \,=\, 1\). They include the estimated bias (Bias) given by the average of the estimates minus the true value, the sample standard deviation (SSD) of the estimates, the average of the estimated standard errors (SEE), and the 95 % empirical coverage probability (CP). The right-censoring rate (Cr) and the cure proportion (Cure%) for each scenario are also provided. For the degree of the Bernstein polynomials, similar to Zhou et al. (2016), we took \(m = [n^{1/4}]= 3\) for \(n = 200\) and \(m = [n^{1/4}]= 4\) for \(n = 400\).

Table 3 Results on estimation of regression coefficients based on the simulated data with one covariate and \(\sigma =1\)

One can see from the tables that the proposed estimator seems to be unbiased and that the variance estimation is reliable, as the average estimated standard error is close to the sample standard deviation. In addition, the 95 % empirical coverage probabilities are close to 0.95, indicating that the normal approximation to the distribution of the proposed estimator appears to be appropriate. Furthermore, as expected, both the bias and the estimated standard error decreased and the 95 % empirical coverage probability became closer to 0.95 as the sample size increased. Also the results seem robust to different magnitudes of \(\sigma _0\); in particular, it is interesting to note that when the censoring time is actually independent of the failure time (\(\sigma _0=0\)), the proposed estimation procedure still provides robust inference for the regression parameters \(\alpha \), \(\beta \) and \(\gamma \). However, we found that the estimate of \(\sigma \) is sensitive to its initial value and became unreliable when the initial value was chosen to be large.

Table 4 Results on estimation of regression coefficients based on the simulated data with two covariates and \(\sigma =0.5\)

The results obtained for the two-covariate situation are presented in Table 4 with \(\beta _0 = ( 1 , 1.5 ) '\), \(\alpha _0 = ( 1 , 1 ) '\), \(\gamma _0 = ( 1 , 1 ) '\) or \(( 1 , -1 )'\), and \(\sigma _0 = 0.5\). Here we considered the case where \(Z_1\) and \(Z_2\) were generated from the standard normal distribution and the uniform distribution over \(( -1 , 1 )\), respectively, and \(X_1\) and \(X_2\) were generated from the Bernoulli distribution with \(p=0.5\) and the standard normal distribution, respectively. For the degree of the Bernstein polynomials, again we chose \(m = 3\) for \(n = 200\) and \(m = 4\) for \(n = 400\). The results give similar conclusions to those seen in the one-covariate scenarios. To assess the performance of the proposed method on estimation of the cumulative hazard functions \(\Lambda _T (t)\) and \(\Lambda _C (t)\), Fig. 1 displays the averages of the proposed estimators for the situation of \(\gamma _0 = ( 1 , 1)'\) with \(n = 400\), and they indicate that the proposed estimation approach performs well. We also considered other set-ups, including different values for m, and obtained similar results.

Fig. 1

The estimated cumulative baseline functions with solid lines denoting true functions and dashed lines the estimates

5 An application

Now we apply the methodology proposed in the previous sections to a set of current status data arising from a 2-year tumorigenicity study conducted by the National Toxicology Program. The study consisted of groups of 50 male and female F344/N rats and B6C3F1 mice exposed to chloroprene at concentrations of 0, 12.8, 32, or 80 ppm by inhalation, 6 h per day, 5 days per week, for 2 years. Some animals died naturally during the study and those that survived to the end of the study were sacrificed. At the time of death or sacrifice, each animal was examined for the presence or absence of tumors in various organs; therefore only current status data are available for the tumor occurrence times. In the following, we will focus on a liver tumor, hepatocellular carcinoma, in the 200 male and female B6C3F1 mice in the control and highest dose groups. As is well known, like most tumors, this liver tumor is between lethal and nonlethal; that is, its occurrence can have some effect on the death rate. In other words, the tumor onset time (failure time) and the death time (censoring time) are correlated, and thus we have informative current status data.

Before applying the proposed inference procedure, to check the possible existence of a cured subgroup in the data, we first fitted the model for dependent current status data without incorporating the cure model. Figure 2 presents the estimated marginal survival functions for the time to tumor onset, and one can see that the estimated survival curves for both the female and male control groups have a clear non-zero plateau at the tail, indicating the possible existence of a non-susceptible subgroup of mice. In other words, it seems reasonable and more appropriate to incorporate the cure model when analyzing these data.

Fig. 2

The estimated marginal survival functions for the time to tumor onset assuming no cured subgroup

For the analysis, define \(X_1 = 0\) for female animals and 1 otherwise, and \(X_2 = 0\) for the animals in the control group and 1 otherwise. That is, we have two covariates. For the cure model, we assume that it has the form

$$\begin{aligned} P( U=1 | X) = \frac{ \exp ( \alpha _0 + \alpha _1 X_1 + \alpha _2 X_2 )}{ 1 + \exp ( \alpha _0 + \alpha _1 X_1 + \alpha _2 X_2 )} \, . \end{aligned}$$

Table 5 presents the estimated covariate effects along with the estimated standard errors and the p-values for testing no covariate effects. Here, for comparison, we considered several different values of m; the results given in the table were obtained with \(m = 3\), 4 and 5, and by the BIC criterion defined in (6), \(m = 4\) yields the best result.

Table 5 Results on estimation of the covariate effects for the tumorigenicity study

With respect to the dose effect on tumor growth, the results in Table 5 suggest that the mice in the higher dose group had a significantly higher hazard of developing the tumor. The male mice also seem to be more likely than the female mice to develop tumors, although the effect is not statistically significant. The estimated non-susceptible (cure) probabilities are given in Table 6. They indicate that the animals in the higher dose group seem to be more susceptible to the tumor, but the gender did not seem to have a significant effect on the cure rate. One possible reason for this could be that most healthy mice in a normal environment never develop liver tumors during their lifetime. With respect to the dose effect on the animal death rate, the conclusion is similar to that for the dose effect on tumor onset, but the death rate did not seem to have any significant relationship with the gender of the animals. In addition, the estimate of \(\sigma ^2\) was 0.672, which seems to indicate the existence of some informative censoring. On the other hand, one should be careful about this conclusion because, as shown in the simulation study, the estimate can be sensitive to the non-testable dependence assumption and to the initial value.

Table 6 Estimated cure (non-susceptible) probabilities
Fig. 3

The estimated marginal survival functions for the time to tumor onset assuming the existence of a cured subgroup

To see the dose and gender effects on tumor onset graphically, Fig. 3 presents the estimated marginal survival functions for the times to tumor onset separately by group, and they lead to similar conclusions. Note that the left plot of Fig. 3 presents the estimated survival functions for the entire population, while the right plot presents those for the susceptible subpopulation. Here \(\hat{S}(t|U=1)\) is the estimated marginal survival function of \(T^*\) based on model (2) with \(b=0\). By comparing Figs. 2 and 3, one can see that, as mentioned above, omitting the cured subgroup results in an overestimation of the survival of the susceptible mice.

6 Concluding remarks

In this paper, we discussed regression analysis of current status data in the presence of a cured subgroup and informative censoring. For the problem, we employed the mixture cure rate model and frailty model approaches. To estimate the regression and other parameters, a semiparametric sieve maximum likelihood estimation approach was developed with the use of Bernstein polynomials, and an EM algorithm was presented for the determination of the proposed estimators. The procedure has the advantage of reducing the dimensionality of the estimation problem, thus relieving the computational burden, and can be easily implemented. In addition, the proposed estimators were shown to be consistent and asymptotically normal, and the simulation studies suggested that the method works well in practical situations.

There exist several directions for future research. One is that in the preceding sections, it has been assumed that the failure time of interest marginally follows the proportional hazards model. This may be questionable in some situations, where other models such as the additive hazards model or the linear semiparametric transformation model may be more appropriate, and thus it would be useful to develop similar inference procedures for those models. Another direction is that the focus here has been on the mixture cure model, while sometimes one may prefer the promotion time model (Liu and Shen 2009; Zeng et al. 2006), which has the advantage of allowing one to model the underlying failure time variable in a unified manner. Although the ideas discussed above may still apply to the latter situation, it may not be easy or straightforward to establish the corresponding estimation procedure.

It is well known that under right censoring, the independent censoring assumption cannot be tested without strong modeling assumptions, and this is true for the situation discussed here. On the other hand, under the models described above, one may ask whether it is possible to test the independent censoring assumption by using the estimator of \(\sigma ^2\). As discussed above, one should be careful about this: the simulation study suggested that in the case of dependent censoring with \(\sigma _0^2 > 0\), the estimator seems to perform well, but in the situation of independent censoring with \(\sigma _0^2 = 0\), it seems to be sensitive to the initial value of \(\sigma ^2\). In other words, more work is needed here, or one may want to develop other, more direct procedures for testing the assumption.