The Poisson Generalized Birnbaum-Saunders Cure Model and Application in Breast Cancer Data

The cure rate survival models are generally used to model lifetime data with long term survivors. We assumes the number of competing causes of the event of interest has the Poisson distribution and the time to the event of interest follows the Generalized Birnbaum-Saunders distribution. The Poisson GB-S distribution has been defined and useful representations for its density function have been presented which facilitates obtaining some mathematical properties. The parameters of the model with cure rate have been estimated using the maximum likelihood method. For different sample sizes and censoring percentages, several simulations have been performed and a real data set from the medical area has been analyzed. Citation: Meshkat M, Baghestani AR, Zayeri F (2018) The Poisson Generalized Birnbaum-Saunders Cure Model and Application in Breast Cancer Data. J Biom Biostat 9: 389. doi: 10.4172/2155-6180.1000389


Introduction
Survival analysis is generally defined as a set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest [1]. The time to event or survival time can be measured in days, weeks, years, etc. In survival analysis, subjects are usually followed over a specified time period and the focus is on the time at which the event of interest occurs. A number of models are available to analyze the relationship of a set of predictor variables with the survival time. Methods include parametric, nonparametric and semiparametric approaches. Parametric methods assume that the underlying distribution of the survival times follows certain known probability distributions. Popular ones include the exponential, Weibull, and lognormal distributions.
The Birnbaum-Saunders (BS) distribution has a close relation to the normal distribution and has been applied in a wide variety of fields and distribution is a non-negative support that has attracted considerable interest [2]. This distribution is positively skewed, unimodal and has two parameters [2,3]. Generalized Birnbaum-Saunders (GB-S) distribution is a flexible lifetime model that admits different degrees of kurtosis and asymmetry while possessing unimodality and bimodality proposed by Díaz-García et al. ., Owen and Leiva [4][5][6][7][8]. Also, GB-S distributions produce models whose parameter estimates are usually non-sensitive to outliers and robust to irregular data [8][9][10][11]. This interest in the GB-S distribution is due to its physical and theoretical arguments, relationship with the normal distribution and other attractive properties.
For t>0, the survival function, cumulative distribution function and probability density function of the GB-S distribution are [6,7]: Where Φ(.) is the standard normal cumulative distribution function, α>0, β>0 and λ>0 are shape scale and location parameters, respectively.
In survival analysis, there are situation which the studied population is a mixture of uncured (susceptible individuals -who may experience the event of interest), and cured (non-susceptible individuals -who will never experience the event), the standard survival models are usually not appropriate because they do not account for the possibility of cure. Many patients with disease like cancer can be long-term survivors, and thus cure models can be a useful tool to analyze and describe their survival data.
Cure rate models contain situations where subjects not sensitive to the occurrence of the event of interest there are situations where a fraction of individuals are not expected to experience the event of interest ; that is, those individuals are cured. These models have become very popular due to the significant progress in treatment therapies leading to enhanced cure rates. In the medical studies, many patients with disease like cancer can be long-term survivors, and thus cure models can be a useful tool to analyze and describe their survival data.
The most popular type of cure rate model is the mixture cure model (MCM) which was introduced by Boag to study cases where there was a proportion of cured patients among treatment receivers for mouth cancer [12]. This model was developed later by Berkson et al. [13][14][15][16]. In this distribution, it is assumed that a specific number of the patients, say p 0 , are cured, in the sense that they do not present the event of interest during a long period of time and can be observed as to be immune to the cause of death under study or cured. In this model, the survivor function for the entire population is given by where S (t) is survival function and p 0 is the cured fraction.
In this pepper, we have proposed a new distribution family here, the Poisson generalized Birnbaum-Saunders cure rate model (PGB-S), which is conceived inside a latent competing causes scenario with cure fraction where there is no information regarding the exact cause of the individual death or tumor recurrence and only the minimum lifetime value among all of the risks is observed and a fraction of the population is not susceptible to the event of interest. This paper is organized as follows: in Section 2 the PGB-S model is formulated by defining the density, cumulative distribution and hazard rate functions of the Poisson generalized Birnbaum-Saunders (PGB-S) distribution and some of its properties are also studied. In Section 3 we explain the maximum-likelihood estimation procedure and parameter inference. In Section 4 the performance of the parameter estimates using Monte Carlo simulation is evaluated. Then, an application to a real data set on breast cancer is given in Section 5. Finally, in Section 6 some conclusions are drawn.

Model formulation
We formulate the model within a biological context. Let M define the random variable unobservable that denotes the number of causes (risk factors) of the event under study for an individual in the population. We assume that M follows a Poisson distribution with parameter φ and mass probability The promotion time for the jth cause (for example tumor cell) to produce the event of interest is defined by Z j , j=1,…, M. We assume that, conditional on M, the Z j s are i.i.d. random variables having the generalized Birnbaum-Saunders cumulative function given by eqn. (2). Also, we assume that Z 1 , Z 2 ,… are independent of M. The observable time to event of interest is defined by the random variable T min (Z 1 ,…, Z M ), and T=∞ if M=0 with P(T=∞|M=0)=1. Under this setup and used the eqn. (5), the survival function for the population is The cure function is defined by p 0 =S POP (∞)=exp(-φ) and the corresponding density function becomes: where f GB-S (t) is the probability density function of the distribution given in eqn. (3). In addition, the population hazard function is given by: The function h POP (t) is multiplicative in φ and f GB-S (t) thus, it has the proportional hazard structure when the covariates are modeled through φ.
The distribution in eqn. (6) can be written as a mixture distribution [13]: Then, from eqns. (4) and (9), the survival function for the noncured population from the Poisson Generalized Birnbaum Saunders (PGB-S) survival function is hereafter given by It should be noted that S (PGB-S) (0)=1 and S (PGB-S) (∞)=0, so that it is a proper survival function. The density function of the PGB-S distribution is given by: From eqn. (11), the parameter β controls the scale of the density distribution while the parameters α, λ and φ control its shape. As φ approaches zero, the PGB-S distribution converges to a GB-S distribution. In Figure 1, the PGB-S density function has been plotted for selected values of the parameters. The PGB-S distribution can be used to model data in survival analysis.
From eqns. (10) and (11), it is easy to verify that the hazard rate function of the non-cured population is Numerical examples allow us to determine that the hazard rate function eqn. (12) is either increasing or unimodal (Figure 2). The first moment [27] of the PGB-S distribution is given by where f PGB-S (t) is the probability density function of the distribution given in eqn. (11).
The skewness and kurtosis of the distribution can be calculated from the ordinary moments given in (13) using numerically obtainable in common statistical software such as SAS university. Figure 3 shows graphical representations of kurtosis and skewness where it can be

Inference
The lifetimes of all the sampling units may not be observable, due to censoring that could be present in the data. We consider the case that the minimum lifetime of all the competing causes is not completely observed and is subject to right censoring. If we consider C i denote the censoring time, Then we observe δ i =I(T i ≤ C i ) and t i min{T i , C i }, where δ i 1 if T i is the observed time to the event defined before and δ i 0 if it is right censored, for i =1… n. Hereafter, we Consider γ (α ,β ,λ) T as the parameter vector of the distribution function of the time-to-event, from n pairs of times and censoring indicators (t 1 , δ 1 ), . . , (t n , δ n ), the observed likelihood function under non-informative censoring [28] is where S POP (t i ) and f POP (t i ) ,respectively, have been defined in eqns. (6) and (7). In much medical research, the lifetimes are usually affected by explanatory variables such as the sex, age, and many others. The parameter ϕ in eqn. (6) is now linked to a vector X i of explanatory variables by assuming a log link, ( ) 1, where η (η 1 ,… , η p ) denotes the vector of regression coefficients. However, we consider ω (γ,η) as the vector of model parameters. By substituting eqns. (6) and (7) into eqn. (14), and use of eqn. (15) the log-likelihood function is We can derived the model parameters by numerical maximization of the log-likelihood function obtained from eqn. (16) using the mathematical and statistical software's. The computational program is available from the authors upon request. Under suitable regularity conditions, the maximum likelihood (ML) estimator ω can be approximated by the multivariate normal distribution with mean vector ω and covariance matrix ( ) ˆêstimated at ∑ = ω ω , namely the information matrix which is as follows: The required second derivatives can be computed numerically. Different models may be compared by penalizing over-fitting using the Akaike information criterion (AIC) and the Schwartz Bayesian criterion (SBC). The model with the smallest value of any of these criteria is taken as the preferred model for describing the given data set. The quasi-Newton and Nelder-Mead methods (we can use option TECH in PROC NLP of SAS) were utilized to obtain the maximum likelihood estimates and asymptotic tests for the parameters were performed [29].

Simulation study
We considered the proposed model with the PGB-S distribution for the event times (T) with the parameters α 2, β=2 and λ (0.4, 0.5, 0.6). For each individual i, i=1,…, n, the number of causes (N i ) of the event of interest for the ith individual is generated from the Poisson distribution with parameter The censoring times are selected from the uniform distribution on the interval (0, τ), where τ was adjusted in order to control the proportion of censored observations. The proportion of censored observations was equal to 40% on average.
We took the sample sizes to be n=50, 100, 500 and 1000. For each sample, we calculated the average of maximum likelihood estimates of the cured fraction ( ) ( ) 0 1 0 0 ( p and p ) , standard deviations (SD) and square root of mean square errors (SRMSE) of the maximum likelihood estimates for 1000 conducted simulations. The simulated data of PGB-S cure models are shown in Table 1. As the sample size increases, the average of maximum likelihood estimates becomes closer to the true values of cured fraction, with the SDs and SRMSEs decreasing.

Results
We discuss an example employing the modeling approach presented in Section 2. The data set includes 557 women with breast cancer who were visited and treated at Cancer Research Center in Shahid Beheshti University of Medical Sciences, Tehran, Iran during 1992 to 2012; the patients were followed up until October 2014. The patients or patients' family members were contacted via phone calls to confirm their health status (i.e. whether they are still alive or not) and to fill any gaps in their medical records. We had to exclude some patients because of some reasons, first their medical records had incomplete information, second they were related to male and third, their cause of death was not breast cancer. The mean of observed time (t) is 4.07±0.94 years with ranges from 3.23 to 211.97 months, (from 0.2692 to 17.66 years) and refers to the time until the patient's death or the censoring time. The Patients who died from other causes, as well as patients who were still alive at the end of the study were censored observations (83%).Stage of cancer (I, n=126; II, n=252; III, n=163; IIII, n=16), grade of cancer (I, n=61; II, n=312; III, n=184), Estrogen Receptor (absent, n=169; present, n=388) and Progesterone Receptor (absent, n=369; present, n=188) are taken as covariates. The Kaplan-Meier estimate of the surviving function is given in Figure 4. The presence of a flat line above 0.4 indicates that the cure models are suitable choices for analyzing this data set.
In the next step the maximum likelihood estimates of the coefficients as well as AIC and SBC selection criteria the PGB-S model are given in Table 2. We also reported the results of Hashimoto. According to both criteria, the PGB-S model stands out as the best one. In order to compare the nested models, which is the case when comparing the PGB-S and PB-S models, one can compute the maximum values of log-likelihoods to obtain the LR statistics for testing H 0 :λ=0.5 versus H 0 :λ≠0.5. The result of this test shows that the null hypothesis can be rejected (p<0.05).
The quantile-quantile plot (QQ plot) of the normalized randomized quantile logarithm residuals from Dunn and Smyth [30], Rigby and Stasinopoulos [31] in Figure 5 suggests that the PGB-S model has yielded a good fit. From Table 2 for two models, the covariate stage of cancer had a significant effect (III and IV) on the reduction of the cured fraction. This estimate can easily be computed from Table 2 due to the parametrization of the cured fraction.      Table 2: Maximum likelihood estimates for the parameters (SD) of the PB-S and PGB-S models with the cure rate fitted to the breast cancer data set.

Discussion and Conclusion
In the analysis of lifetime data we could have the presence of cure fraction and covariates, especially in medical applications. The Birnbaum-Saunders distribution has been extensively used for modeling in several fields such as medical sciences, biological studies, engineering and insurance. Because the generalized Birnbaum-Saunders (GB-S) distribution is a highly flexible lifetime model, so it would be a better fit to the data model which was introduced by Owen [7].
In this paper, a PGB-S model for analyzing survival data with cure fraction was proposed which is conceived inside a latent competing cause's scenario with cure fraction. We introduced a four-parameter continuous model called the PGB-S distribution which extends the GB-S one. We proposed cure rate model has the structure of the mixture cure rate model, and the PGB-S distribution represents the distribution associated with the individuals at risk.
The interpretation of the covariates turns out to be straightforward due to the parameterization of the cure fraction. Looking at the strong points of the PGB-S model, it can be employed as an interesting option to explain/predict the survival time for long-term individuals.