The Generalized Gamma Shared Frailty Model under Different Baseline Distributions

In the analysis of clustered survival data, shared frailty models are often used when observations in the same group share common unknown risk factors or frailty. There is dependence in the event times belonging to the same group, while event times from different groups are conditionally independent given their covariates. In such models, the known effect on survival time is described using the baseline distribution and regression coefficients while the unknown effect is described through a frailty distribution. In this paper, the Gompertz, log-logistic, and generalized exponential distributions are studied as baseline distributions, under a shared frailty effect described by the generalized gamma distribution. Their hazard functions have been compared and their applicability under different settings and performance with generalized gamma frailty has been explored. These models are fitted to three real life datasets using Bayesian estimation methods and compared using the Bayesian Information Criteria (AIC, BIC, and DIC) and the Bayes Factor. KeywordsGompertz hazard, Log-logistic hazard, Generalized exponential hazard, Bayesian information criteria, Bayes factor.


Introduction
In survival studies, there are often observations that need to be grouped together on the basis of the study centre, hospital, city, etc. In all such cases, individuals belonging to the same group or cluster are exposed to the same environmental factors. Such factors are unknown and are depicted in the models as a common unknown risk or shared frailty of a group. This causes a dependence in survival times of a group while observations of different groups are conditionally independent. In shared frailty models, the effect on survival times due to known covariates or treatment effects is described using a baseline distribution and regression coefficients while effect due to unknown risk factors or randomness in the data is described using a frailty distribution.
The baseline distribution describes the overall common risk to all individuals in the study. The distribution of event times in survival studies is generally discussed in terms of the hazard function. The hazard function is the instantaneous probability of failure over time and hence the form of the baseline hazard function, ℎ 0 ( ) is of considerable importance. It may be increasing, decreasing or constant over time and thus, different distributions are applicable in different studies.
The frailty distribution in the shared frailty model is helpful in describing a dependence structure where the observations within the clusters are correlated as they share a common frailty, while clusters among themselves are conditionally independent given the covariates. In grouped data, it is expected that there will be high heterogeneity among clusters. However, as the factors affecting frailty are unknown, it would be more appropriate to describe the frailty effect using a flexible distribution whose probability density function (PDF) can take various shapes. The generalized gamma distribution (GGD) includes many distributions as its particular cases viz. Weibull, Gamma, exponential, and log-normal distributions (Khodabin and Ahmadabadi, 2010) and hence it is a convenient choice to model frailty (Sidhu et al., 2018).
Earlier work done on the generalized gamma shared frailty model (Balakrishnan and Peng, 2006;Chen et al., 2013) shows how this distribution can estimate the heterogeneity among clusters more efficiently than other distributions. However, there are computational problems due to the integral in the unconditional likelihood not being in a closed form, making it an unfavourable choice to model frailty. Bayesian estimation of the parameters of the model (Sidhu et al., 2018) can make estimation easier and faster as prior information can be incorporated in the model or the problems of multiple modes or non-convergence can be diagnosed quicker.
In this article, the applicability of the model under three different baseline distributions viz. Gompertz, log-logistic, and generalized exponential, have been explored. The distributions are widely applicable in survival studies for different types of datasets and have been modelled with a gamma frailty, inverse Gaussian frailty, and log-normal frailty (Manton et al., 1986;Klein et al., 1999;Wienke et al., 2003Wienke et al., , 2005Locatelli et al., 2004;Hens et al., 2009;Hanagal and Dabade, 2013;Hanagal and Sharma, 2013) among other distributions, but have not yet been used in conjunction with the generalized gamma frailty effect.
In Section 2, the generalized gamma shared frailty model with non-informative right censoring is described. This is followed by a discussion on some basic properties of the baseline hazard functions of Gompertz, log-logistic, and generalized exponential distributions in Section 3. In Section 4, an outline of the estimation procedure is given and these models are compared using real life datasets in Section 5.

Shared Frailty Model with Censoring
Consider G groups of survival times with ni observations (i = 1, 2… G) per cluster. For the j th individual in the i th cluster, the survival time, censoring indicator, and the vector of s known covariates are denoted by tij, δij, and Xij respectively.
The survival times considered in this paper are right censored, that is, tij is the true survival time (Tij) when the event is observed and is the censoring time (Cij) when the event is not observed due to any reason such as loss in follow-up or failure to complete the study. In other words, Associated with each cluster is the shared frailty effect zi, which acts multiplicatively on the hazard function, increasing or decreasing the hazard for the j th individual in the i th cluster. The conditional hazard function for j = 1, 2… ni, i = 1, 2… G, is given by where β is a vector of s regression coefficients corresponding to the known covariates in the study.
The corresponding conditional survival function is written as where 0 ( ) is the cumulative hazard function.
Under the assumption that the event times and censoring times are independent and that the censoring in the study is non-informative, the conditional likelihood for the frailty model is given as Integrating L over the entire range of the frailty variable Z, the unconditional likelihood function L, is obtained as where (. ) denotes the PDF of the generalized gamma frailty distribution (GGD (b, d, k)) given by Note that d and k are the shape parameters and b is the scale parameter of the GGD. To make the parameters of the model identifiable, we set the mean of the frailty parameter equal to 1. Hence, This gives

Baseline Distributions 3.1 Gompertz Distribution
The Gompertz distribution has been widely used in survival studies, especially in demographical or biological areas. In actuarial studies, it has been used to describe an exponentially increasing mortality with age. Its PDF, hazard function, and cumulative hazard function are given by The hazard function for this distribution increases from ω at t = 0 to ∞ as t → ∞ for α > 0 and is applicable in studies where the hazard is increasing at an exponential rate with respect to time.
For α < 0, the distribution has a decreasing hazard which converges to -ω/α as time increases. This implies that a certain proportion of the population never experiences the event. While this may be useful in certain cases like cure rate models, but in this paper, α > 0 has been considered.
Although, the Gompertz distribution provides a close fit to adult mortality in developed countries, this distribution cannot be used where the risk of failure is fairly constant as t → ∞. Such phenomenon is experienced often in clinical trials relating to the treatment of a disease or in reliability studies of machines.

Log-Logistic Distribution
The two parameter log-logistic (LL) distribution has a fairly flexible hazard function as compared to the Gompertz distribution as shown by Figure 1. The PDF, hazard function, and cumulative hazard function are given by Its hazard function is either decreasing or hump shaped as the shape parameter α varies. It fits datasets that have a hazard rate that is either initially high or increases rapidly at the beginning of the study to a finite maximum and later gradually reduces as t → ∞.
Even though it is one of the closest alternatives to the Weibull distribution, this distribution cannot be used in cases where the dataset has an increasing hazard throughout the study period.

Generalized Exponential Distribution
The generalized exponential (GE) distribution has been considered as the third baseline distribution in this article due to its simplicity in analysing data with an increasing, decreasing or flat hazard rate. The corresponding PDF, hazard function, and cumulative hazard function are given by The hazard function for GE distribution varies when the shape parameter α changes. At α = 1, the hazard is constant and equal to ω, while it increases from 0 to ω for α > 1 and decreases from ∞ to ω for α < 1. The main advantage of using this distribution to model the baseline hazard is that it incorporates a case of a constant hazard with respect to time, which may be true for the common effect in some survival studies.

Estimation Procedure 4.1 Likelihood Function
After replacing the conditional hazard and conditional survival functions (expressions (1) and (2) respectively) in the likelihood function given by (3), and using Gauss Laguerre quadrature rule, L reduces to The integral Ii is approximated as The ℎ 0 ( ) and 0 ( ) are defined in sub-sections 3.1, 3.2, and 3.3 for the three models, that is, Gompertz, log-logistic, and generalized exponential baseline, under a generalized gamma frailty distribution. In the sequel, these models shall be referred to as Models I, II, and III respectively.

MCMC Algorithm for Bayesian Estimates
In the Bayesian method of estimation, the parameters of the model are considered to be random variables. The dataset in the form of the likelihood function ( | ), is clubbed with the prior information (prior density) ( ), available about the parameter. This gives the posterior density Since we generally have limited information about the parameters of the model, prior distributions are taken to be flat by using distributions with high variances. The Normal distribution, N (0, 1000) is used for regression coefficients while Gamma (0.001, 0.001) is used for all other non-negative parameters.
Assuming that all parameters of the model are independently distributed and using the likelihood function given by (5)   In order to obtain the estimates of the parameters, a sample is drawn using the Metropolis-Hastings Algorithm (Metropolis and Ulam, 1949;Metropolis et al., 1953;Hastings, 1970) from each posterior density and conclusions are drawn.
Detailed derivation of the expression in sub-section 4.1 and the algorithm for estimation in subsection 4.2 have been given in Sidhu et al. (2018).

Model Comparison
In the presence of numerous choices to model datasets, it becomes important to be able to compare different models to choose the one that provides the best fit. For this purpose, the Akaike Information Criteria (Akaike, 1974), Bayesian Information Criteria (Schwarz, 1978), Deviance Information Criteria (Spiegelhalter et al., 2002), and Bayes Factor are used. The models with lower values of AIC, BIC, and DIC are preferred. Generally, these three methods give concurrent results. However, AIC penalizes the number of parameters less strongly than BIC and may not be preferable.
DIC incorporates the effective number of parameters and is frequently used when the estimates are obtained using the output from a Markov chain. However, DIC suffers from theoretical drawbacks and may not be a suitable choice (Spiegelhalter et al., 2014).

The Bayes factor (BF) as a model choice criterion is defined as
Although, (2 10 ) is approximately equal to the difference in the BIC values for the models, we use the method given by Kass and Raftery (1995), to compute ( | ) from the MCMC sample obtained for each of the parameters in the model. A value of more than 10 for (2 log 10 ) indicates an extremely strong positive evidence to favour 1 over 0 while a value between 0 and 2 is insufficient evidence to favour either model (Kass and Raftery, 1995). A value between 2 -6 or 6 -10 indicates a mild or a moderately strong evidence respectively, to prefer the numerator model.

Applications
To demonstrate the use of Models I, II, and III, the Metropolis-Hastings algorithm is applied to the Catheter Infection (CI) dataset by McGilchrist and Aisbett (1991), chronic granulotomous disease (CGD) dataset by Fleming and Harrington (2011), and the tumorigenesis study on rats (RATS) by Mantel et al. (1977).
CI dataset contains recurrent infection times from the use of catheters for 38 patients using a portable dialysis machine. The two infection times per patient are grouped together in a cluster. Other information available is the censoring status of each infection time, patient's age, gender (0 -male, 1 -female), and disease (Glomerulo Nephritis (GN), Acute Nephritis (AN), and Polycystic Kidney Disease (PKD)).
CGD dataset contains time to first serious infection from the use of gamma interferon in chronic granulotomous disease. There are 128 observations recorded at 13 centres along with their treatment status (placebo or rIFN-g), sex (0 -female, 1 -male), age, pattern of inheritance (0 -Xlinked, 1 -Autosomal Recessive), use of corticosteroids (0 -Used, 1 -Did not use), and use of prophylactic antibiotics (0 -Used, 1 -Did not use). We group the dataset on the basis of the institute of treatment, although this data can be grouped together on the basis of multiple other factors like institution category, recurrent times, or pattern of inheritance.
RATS dataset consists of times to appearance of the tumor in three rats (one treatment and two control) that belong to the same litter. The original dataset had grouping on the basis of gender as well, but the male population was heavily censored. Hence, we consider only the 50 female litters for analysis in this article. The data available are treatment status (0 -control, 1 -Treatment) and censoring indicator. To illustrate how different these datasets are, we obtain preliminary estimates of the parameters using "frailtypack" package in R (Rondeau et al., 2012). The built-in function "frailtyPenal" is used where Weibull distribution is taken to model the baseline hazard and the shared frailty is assumed to follow gamma distribution. Table 1 shows the estimated value of α, the shape parameter of the Weibull distribution and 2 , the variance of the frailty term. For the Weibull distribution, α > 1 indicates an increasing hazard while α < 1 points to a decreasing hazard with respect to time. As per Table 1, the estimated frailty variance signifies a low to moderate degree of heterogeneity among the clusters. Figure 2. shows the predicted hazard function for the three datasets. As per Figure 2, the hazard functions for CI data and RATS data are both increasing with respect to time, hence GE and Gompertz distributions can be used to model the baseline effect. CGD data indicates a decreasing hazard with time and ideally log-logistic baseline should give the best fit. In order to find the most appropriate baseline for these three datasets, they are fitted to Models I, II, and III using the estimation procedure mentioned in Section 4.   Table 2 lists the estimated parameters and the credible intervals of the covariates for CI dataset.
The value of ̂, shows that the population hazard increases with time. Gender is a significant factor affecting hazard as its credible interval does not contain zero. Negative value for β 2 indicates a lower hazard for the female population.  Table 3, presents the results from CGD dataset fitted to the three models. As per the results, the factors treatment, age, and inherit significantly affect the hazard rate. While the hazard is lower for individuals under treatment and older in age, there is an increased hazard rate for individuals with Autosomal Recessive Inheritance. Additionally, Models I and II also indicate a significantly higher risk for the female patients and those not using corticosteroids. As there is a difference in the results of these models, it becomes important to look at the results of the model comparison. Both AIC and BIC indicate that Model II is the best model for the dataset, hence the two factors sex and steroids should also be considered as significant factors affecting survival times. As per the results reported in Table 4 for RATS dataset, the treatment given significantly increases the hazard rate thus implying an increased risk of tumors with the treatment.
The best model as per AIC and BIC values for CI and RATS datasets (Tables 2 and 4) is the GE baseline while LL baseline is best for CGD dataset (Table 3). DIC values however, do not give us the same conclusions. Also, none of the AIC, BIC, or DIC values indicate whether one model is significantly better than the other. For this reason, Table 5 listing the values of (2logBF) for a pair of models is considered. The table lists only positive values as the reciprocal model is the negative of the value listed, that is, (2logBF01) = -(2logBF10). The values in Table 5 indicate that, for CI data, the GE and Gompertz baseline provides a significantly better fit than the LL baseline as the corresponding values are greater than 6 (6.62 and 9.87). However, GE baseline is only slightly better than the Gompertz baseline as the corresponding value (3.24) falls between 2 to 6.
For CGD dataset, all positive values are greater than 10 indicating a marked difference in the fit provided by the three baselines with LL being the best model with value of 68.58 when compared with Gompertz and 133.91 when compared with GE. GE provides the worst fit among the three with Gompertz being favourable among GE and Gompertz with a value of 65.33.
For RATS dataset however, GE and Gompertz baselines provide a slightly better fit than LL baseline with corresponding values of 3.52 and 3.46. However, both GE and Gompertz provide a comparable fit (value falling between 0 and 2) with GE/Gompertz = 0.06.
When the variance of the shared frailty term ( 2 ) under gamma distribution (Table 1) is compared with the best models suggested under the generalized gamma distribution (Models in bold font in Tables 2-4), it appears that the variance tends to be underestimated under gamma frailty. The models under different frailty distributions can also be compared using similar methodology to lend credence to an improved variance estimation under the generalized gamma distribution when using Bayesian methods, as has been performed earlier for the estimation under classical inference (Balakrishnan and Peng, 2006;Chen et al., 2013). Heterogeneity among the clusters indicates a source of variation in the hazard that has not been accounted for in the study. Since this can impact the results of the study, it is important for researchers to be able to diagnose the source of variation in order to have more meaningful results.

Conclusions
In this paper, the use of the Gompertz, log-logistic and generalized exponential distribution to model the baseline hazard under a generalized gamma shared frailty effect has been studied. Bayesian estimation is possible for all models and has been illustrated with three real life applications. Model comparison is performed using the Akaike Information Criteria, Bayesian Information Criteria, Deviance Information Criteria, and Bayes Factor and the best model is suggested for each dataset. The variance of the frailty term is also found to be underestimated when the shared frailty effect follows the gamma distribution.