Bayesian and Frequentist Analytical Approaches Using Log-Normal and Gamma Frailty Parametric Models for Breast Cancer Mortality

One of the major causes of death among females in Saudi Arabia is breast cancer. Newly diagnosed cases of breast cancer among the female population in Saudi Arabia is 19.5%. With this high incidence, it is crucial that we explore the determinants associated with breast cancer among the Saudi Arabia populace—the focus of this current study. The total sample size for this study is 8312 (8172 females and about 140 representing 1.68% males) patients that were diagnosed with advanced breast cancer. These are facility-based cross-sectional data collected over a 9-year period (2004 to 2013) from a routine health information system database. The data were obtained from the Saudi Cancer Registry (SCR). Both descriptive and inferential (Cox with log-normal and gamma frailties) statistics were conducted. The deviance information criterion (DIC), Watanabe–Akaike information criterion (WAIC), Bayesian information criterion (BIC), and Akaike information criterion were used to evaluate or discriminate between models. For all the six models fitted, the models which combined the fixed and random effects performed better than those with only the fixed effects. This is so because those models had smaller AIC and BIC values. The analyses were done using R and the INLA statistical software. There are evident disparities by regions with Riyadh, Makkah, and Eastern Province having the highest number of cancer patients at 28%, 26%, and 20% respectively. Grade II (46%) and Grade III (45%) are the most common cancer grades. Left paired site laterality (51%) and regional extent (52%) were also most common characteristics. Overall marital status, grade, and cancer extent increased the risk of a cancer patient dying. Those that were married had a hazard ratio of 1.36 (95% CI: 1.03–1.80) while widowed had a hazard ratio of 1.57 (95% CI: 1.14–2.18). Both the married and widowed were at higher risk of dying with cancer relative to respondents who had divorced. For grade, the risk was higher for all the levels, that is, Grade I (Well diff) (HR = 7.11, 95% CI: 3.32–15.23), Grade II (Mod diff) (HR = 7.89, 95% CI: 3.88–16.06), Grade III (Poor diff) (HR = 5.90, 95% CI (2.91–11.96), and Grade IV (Undiff) (HR = 5.44, 95% (2.48–11.9), relative to B-cell. These findings provide empirical evidence that information about individual patients and their region of residence is an important contributor in understanding the inequalities in cancer mortalities and that the application of robust statistical methodologies is also needed to better understand these issues well.


Introduction
Cancer of the breast known as breast cancer occurs when cells that are in the breast grow out of control, usually forming a tumour that can be detected by X-ray or felt as a lump. Malignant tumours can metastasize to invade surrounding tissues or spread to distant areas of the body, including the blood and lymph system. Several studies have identified breast cancer as one of the causes of death that occur in the Western world [1]. It is also observed as one of the common malignancies among the people of Saudi Arabia, where it affects 21.8% of women. e Saudi Cancer Registry at the King Faisal Specialist Hospital and Research Centre stipulates that about 930 cancer cases are diagnosed yearly in Saudi Arabia or around 19.5% of Saudi and non-Saudi women.
Al-Qahtani [2] found breast cancer to be among the top two common malignancies among Saudi Arabia women. In 2010, it was estimated to be the 9 th most common cause of women mortality in the Kingdom of Saudi. About 27.4% (1,473 out of 5,378) newly diagnosed cancers were female breast cancers [3,4]. Ibrahim et al. [5] predicted that in the next decade, there will be an increase due to population growth and aging. e 2012 report (annual) of the Saudi National Cancer Registry recorded about 26.4% of women less than 40 years compared to 6.5% the United States of America. e diagnoses of breast cancer before age 40 are linked to poorer prognoses, more aggressive cancers, and higher mortality and recurrence rates than diagnoses after age 40 [6,7]. Anders et al. [8] found that seven percent of female breast cancer are diagnosed before they reach the age of 40. Also, survival rates among the younger women are much worse than that for the elderly. ey also found that the incidence of breast cancer among the male population was estimated at 1/100 th of the rate for females.
Few studies have investigated factors that influence cancer progression among women in the Saudi population.
ose that have done so have used mostly standard survival models such as Cox without considering dependencies of the data collected neither do they account for variations that may exist among participants with either similar or different characteristics.
ough this approach may be viewed as being complex, it is necessary to help in determining significant factors of breast cancer.

Data Source and Variables.
We used data from the Saudi Cancer Registry (SCR) to conduct our study.
e Saudi Cancer Registry (SCR) is responsible for cancer data collected across the cancer registries from all the administrative regions in the Kingdom. ese regions include Riyadh, the Eastern and Northern regions, Makkah, Madinah, Qassim, Hail, Jouf, Tabouk, Najran, Baha, Asir, and Jezan. ese regions contain information of the people in the country. e data obtained and used in this analysis are from all the hospitals under the Ministry of Health and also from the private hospitals, clinics, and laboratories. e data are abstracted from patients who are categorised as cancer patients via clinical, histopathological, and radiological diagnoses. e SCR supervises all its offices across the regions to ensure accuracy and quality controls for data collection. e current data contain information on 8,312 patients (8,172 women and 140 men). ese patients were diagnosed with advanced breast cancer between the years 2004 and 2013. Data collected include patients' survival time, censorship, sex of participant, age of the participant, marital status of the participant, address of the participant, nationality of the participant, and tumour details of the participant, such as laterality (primary site), behaviour, grade, stage (extent), and topography of the participant. e primary site (topography) and histology (morphology) of the malignancies were identified using the International Classification of Diseases for Oncology 3rd Edition (World Health Organization, 2000).
Data entry was carried out using CanReg 4 (IACR) software. After removing 2,880 patients due to missing information, the remaining 5,432 patients were analyzed. We conducted a serious of steps, including verifying sites, morphologies, and staging, to ensure data quality and accuracy for all regions. We also verified case linkage (tumour and patient) and performed data consolidation. Other data collected included personal identification, demographic information, and tumour details. e outcome variable was defined as survival time in years for all the patients who were diagnosed with breast cancer. ose who died as a result of breast cancer were the patients classified as having had the event and assigned the number 1. Patients who either dropped out of the study, did not die within the study period, or died from other diseases unrelated to breast cancer were censored (using right censoring mechanism as illustrated in equation (8)) and assigned the number 0.

Statistical
Approach. Series of Weibull models were fitted, using both Bayesian and frequentist frameworks. e standard Weibull model was fitted first and denoted as Model 1. Random terms were added to capture variation at the regional level (Model 2), and extent was captured as Model 3. e decision to use the Weibull regression model was premise on the bases of previous knowledge on the distribution of breast cancer deaths [9]. e Weibull model [10] is a well-established model appropriate for modelling hazards that are either monotonically decreasing or increasing [11]. Application of parametric survival models to model breast cancer has been carried out [9,12] and elaborated on the advantages of using this model to account for censoring in data from administrative and historical databases [11]. Based on the model formulated by Nasejje et al. [13], which was parameterized from Martino et al. [14], we utilized the proportional hazards model of the form: h i (t) � h 0 (t)exp(η i ), t > 0, where the baseline hazard is represented by h 0 (·) and the predictors by η i . In this analysis, we assume that the data follow the right censoring mechanism, where individuals who die as a result of breast cancer are said to have had the event and all others (death not related to breast cancer, dropouts, and lost to follow-up) are said to have censored (have not had the event of interest). If we specify a Weibull model for the baseline hazard, then we have h 0 (t) � λt λ− 1 , λ > 0, while log likelihood for the observation (t i , δ i ) taking into consideration the right censored approach can be further formulated as Following the definition of Martino et al. [14], we let the predictors to be represented by η i , which assumes an additive structured form. We extend equation (2) to the Weibull regression model. Expressing the latent field x, in terms of the predictor η i , the standard Weibull regression is expressed as 2 Computational and Mathematical Methods in Medicine where β 0 ∼ N(0, 0.01 − 1 ) and β � βSex, β 2 Lat, β 3 Top, β 4 Marita, β 5 Grade, β 6 Ext, β 7 Cause, β 8 AddressCode} � N(0, I).
We specify a Gaussian distribution with hyperparameters θ � (θ 1 , θ 2 ) as priors for the Weibull regression coefficients and the gamma distribution Γ(a, b), with mean � a/b and variance � a/b 2 as priors for λ. Flat priors were also assigned to τ � Γ(1, 1).

Conditional Survival Models (Gamma and Log-Normal
Frailty Models). Implementing standard survival methodologies without accounting for the correlational structure has methodological and practical implications. ese approaches methodologically ignore the clustering effect among communities within the same region which has the potential to bias the estimation due to violations of the statistical assumptions of independence. It is for this reason that conditional (frailty) models which allow for the correlational survival experiences of patients and therefore provide better estimates of the coefficients and their standard errors were developed. Broadly, the importance of frailty models cannot be overemphasized as they are used to estimate the effect of unmeasured factors on the risk of cancer deaths. e frailty term in survival models is considered a random variable over/of the population and constitutes a frailty distribution.
In this work, we extend marginal survival models that have been extensively applied in modelling breast cancer risk factors elsewhere [15,16] by including the random effect (the frailty) which is acting multiplicatively on the hazards function, as demonstrated elsewhere [14,[17][18][19][20].
In conditional survival models, it is argued that individual risk of death is a function of the measured factors and that of a random perturbation on the baseline hazard as a result of the unobserved effects. In this work, the standard Weibull function was extended to include the frailty term, which was constructed as described below.
Let t ij be the survival time for the j th cancer patient in the i th region. In this work, say patients' j, where j � 1, . . . , n, are nested within regions (address code), that is, i � 1, . . . , s. We further assumed that these t ij follow the Weibull regression model. e Weibull model can mathematically be expressed as We conditioned the hazard function of t ij on the parameter η ij and the Weibull scale parameter λ.
is is mathematically expressed as where η ij � β 0 + Z T ij β in which β is a p × 1 vector of regression coefficients, β 0 is the intercept, and z ij is a p × 1 covariate vector.
A frailty model that is expressed in a univariate case introduces what is referred to as an unobserved multiplicative effect on the hazard [21] such that it can be conditioned on the frailty as where ψ i (for the i th group) is taking to be a random positive quantity which has a mean of 1 and variance say, θ. Individuals with the frailty parameter of ψ i > 1 are said to be frail and with an increased risk of failure (breast cancer mortality) due to unexplained reasons by the covariates while individuals with ψ i < 1 are said to be less frail and with a decreased risk of failure (tern to leave longer). Due to the multiplicative nature of ψ i , one can think of a frailty as representing the cumulative effect of one or more omitted covariates as illustrated in equation (6) (for details on frailty modelling, refer to [22][23][24][25][26][27] and [28]). Using the Laplace approximation modelling technique as detailed in Martino et al. [14] and Martino and Rue [29], we formulate the Weibull regression with the frailty parameter as where ψ i � log(μ i ) ∼ N(0, t − 1 ) with completed likelihood assuming a right censoring mechanism given by Log-normal frailty was used to establish the association that exists between survival times that maybe related to patients from the same region. Normal priors were assigned to the regression coefficients (betas) while flat priors were assigned to τ and λ parameters.
We allowed δ ij to represent censoring with a value 1 if the j th patient in the i th region dies and 0 otherwise with D � (t, x, δ, ψ).
However, fitting gamma frailty models in INLA, which is based on the Gaussian Markov field, requires employing a different technique.
us, we reparametrized the Weibull function as advised by Martins and Rue [30]. Parameterization of the Weibull survival model to include gamma frailty was achieved by imposing restrictions on the non-Gaussian components of the latent field so that this distribution could be very well approximated by a Gaussian density. is correction is appropriate, especially in light of adding flexibility around a Gaussian distribution, as advised by Martins and Rue [30].
Using equation (7), with η ij representing the linear predictors as described in Martins and Rue [30], the gamma frailty term ψ i was further reconstructed as log is the mean and τ ψ (k) is the precision parameter of the Gaussian approximation to the log gamma random term.

Bayesian Analysis Using a Deterministic Approach (INLA).
e Bayesian approach is implemented using integrated nested Laplace approximation approach (INLA). INLA provides approximations of simulation-free Bayesian inference using integrated nested Laplace approximations. e technique has found favour in modelling works that involve complex models, as it has been known to circumvent computational cost which is a big challenge in most samplingbased modelling techniques [14,30,31]. Modelling in INLA is based on directly approximating the posterior marginal; π(x i /y) and π(θ i /y) from the posterior distribution; e approximated posterior marginals are used for computing summary statistics of interest, including the posterior means, variances, or quantiles. e strategy involves three components: the data, the latent model, and the hyperparameter. e data in INLA are generated from the original components of the observation x and hyperparameter θ.
is is also part of the likelihood function defined as π(y/x, θ).
INLA is principally based on statistical inference for latent Gaussian Markov random field (GMRF) models. GMRF models are hierarchical models involving several steps [32][33][34]. e first steps include finding a distributional assumption for the observed y, which is usually conditioned on some latent parameters η and some parameters θ formulated as e latent model component is the most appealing part of the INLA framework, especially as it relates to how frailty modelling works. e latent modelling component is drawn from the unobserved components x. e unobserved component also represents the structured random effect (region effects) and the unstructured random effects representing effects at individual and group levels estimated together with predictors. e latent components are linked to the likelihood through linear predictors, as shown in equation (12) according to [30].
e third component is on the hyperparameters (θ � θ 1 , θ 2 ), which are very important components/elements/ constituents of the Bayesian framework. e hyperparameters are helpful in specifying the prior distribution and are important for determining precision for unobserved factors.

Description of Analysis.
Descriptive statistics such as means and standard deviations were used to analyze measures such as age of the patient. Frequencies as well as proportions were used to describe categorical variables. e Kaplan-Meier survival curve and the log-rank test were carried out to determine differences in survival by gender, extent, and address code. All variables that were included in the final analysis for each of the models were observed to be either significant at the simple regression modelling stage or were found to be a significant predictor of our outcome from previous findings (literature).
From the Bayesian perspective, associated factors of breast cancer mortality were modelled using the integrated nested Laplace approximation (INLA) approach via a shared gamma and log-normal frailty models. Estimates generated from INLA were exponentiated to obtain posterior Bayesian summaries which were reported as hazard ratios with their associated 95% credible intervals.
is approach enabled us to account for the unmeasured factors and further allowed us to compare the variances between patients, regions, and tumor extent. A frailty variance that is greater than one suggests a higher rate for the event (or shorter survival times) than would be predicted under the basic Weibull model, while that less than one suggests a lower rate (longer survival times). A 95% credible interval that excludes one is deemed to be significant. Comparison of models and their fit was assessed via the deviance information criterion (DIC) and Watanabe-Akaike information criterion (WAIC) in the Bayesian paradigm. Models based on the frequentist approach were compared using the Akaike information criterion and the Bayesian information criterion. For the Bayesian approach, marginal log likelihoods were extracted and used to compute the AIC outside the INLA environment. Using the Computed DIC from the Bayesian approach and the extracted AIC from a frequentist approach, we were able to determine the best fit model between the Bayesian and frequentist fitted models. All statistical analyses were performed following the R (frailtypack) package, Rondeau et al. [35] for the frequentist approach, while the Bayesian approach was implemented using a deterministic based approach, known as INLA, a variant of R open source software. Before running the models, we scaled the survival time to the maximum of 1 (one) similar to variable centring (time/time max). is was done in order for the INLA software to converge or else it crashes. We used the Bayesian statistical approach in order to obtain posterior estimates which are easy and straightforwardly interpretable mainly because of the random nature of the parameters in the Bayesian as opposed to the frequentist framework.

Results
In Table 1, we provide a summary of the covariates included in this analysis. ere are evident disparities by regions (address code), with Riyadh, Makkah, and Eastern Province having the highest number of cancer patients at 28.39%, 26.30%, and 20.47%, respectively. Grade II (5.86%) and Grade III (4.63%) are the most common cancer grades. Left paired site laterality (50.92%) and regional extent (52.32%) were also most common characteristics. In general, there were 13.19% cancer deaths and 86.81% noncancer deaths. e majority of the participants were females (98.73%) with 1.27% being males. Quite a number of the participants were divorced (84.28%) with only 2.51% being married.  Table 2). We feel it is the parametrization via the INLA software that might have inflated the BIC and DIC values under the Bayesian approach. Due to the large values obtained for the BIC and DIC via the Bayesian fitted models, we anticipate model fitting issues, and this also reveals lack of data support especially for the proposed gamma frailty distributed model.

Fixed Effect Model (Standard Weibull Model Results).
We estimated two models that assume that cancer patients from same region have uncorrelated survival risk. Furthermore, two additional models were constructed to control for frailty effect at the regional level. is strategy enabled us to examine the effects of unobserved factor estimates of the known determinants for cancer. Presented in Table 3 are the fixed effects results from the two standard Weibull models constructed to analyze cancer patients' data. Overall marital status, grade, cause of death, and cancer extent increased the risk of a cancer patient dying. Respondents who are married had a hazard ratio (HR) of 1.36 and a 95% CI (1.03-1.80) and that of the widowed had a HR of 1.57 and a 95% CI (1.14-2.18). Both of them were at a higher risk of dying with cancer relative to those respondents who were divorced. For grade, the risk was higher for all the levels i.e Grade I (Well diff) had an HR of 7.11 and a 95% CI   Table 3 provides more information about the determinants included in this analysis. Tables 4 and 5 provide results from the two conditional models fitted to control for unobserved (regional) effects. e estimated variance  associated with the frailty effect in the Weibull log-normal frailty model presented in Table 4 was less than 0.01 in Model 2a and 5.2 in Model 2b. For Models 3a and 3b as presented, which included the gamma frailty parameter, the frailty variance was 0.98 for Model 3a and negligible for Model 3b. e frailty parameter values for Model 2b and Model 3a are statistically significant indicating that survival risks among cancer patients vary as a result of unobserved effects which is shared by members of the same region. Following the work of Govindarajulu et al. [22], the estimated frailty as shown in Model 2b indicates that a cancer patient's death in a particular region increases the risk of the death for the index cancer patient by (exp( ��� 5.2 √ ) � 9.78) times relative to the overall risk of breast cancer patient's death. Also, the estimated frailty variance value of 0.98, for Model 3a, denotes that every death is associated with (exp( ��� � 0.98 √ ) � 2.69) increased risk of the indexed cancer patient dying relative to the average risk of death. ough the standard errors for the hazard ratios of the covariates are generally higher with the frailty Weibull models as compared to the standard model with higher confidence intervals for grade, for instance, the impact of the frailty term on parameter estimates is small and does not change the significance of any of the parameter estimates in the model. Similarly, the magnitude of the effect of the determinants included as covariates in the model is largely unchanged in the presence of the random term known as frailty.

Comparing Conditional Models.
However, though the two modelling strategies produced similar results, conflicting results were also observed. For instance, marital status and cancer extent exhibited consistently reduced risk of cancer deaths compared to what was reported using frequentist approach. ere were also considerable changes in the effects of selected covariates used in the frequentist when analyzed using the Bayesian approach. For example, the effect of age, paired site as a measure of literality, and gender on the risk of death became apparent as the confidence intervals of the two variables did not include 1 (one) in the Bayesian approach. Overall, the magnitude effect of topography and grade became nonsignificant when analyzed using the Bayesian approach. We observed that the

Discussion and Conclusion
We have examined, in this paper, the effects of demographics, extent, topography, grade, and region on breast cancer patient risk of dying in Saudi Arabia. e study compared two advanced statistical approaches that have been used to correct for dependencies in data structures. As demonstrated in this work, marginal survival models (fixed effects models) can be extended to conditional survival models by introducing a random term in the regression model which is later modelled, either through a semiparametric or a parametric approach [19,36]. ese types of models are used more, particularly in medical studies where the patient's levels are recurrent in nature and have correlated biomedical data. e history of application for these types of models in medical statistical studies include under five studies [19], cancer studies [37,38], kidney transplants [39], and genetic studies [22].
Our results suggest that there is evidence of some variation among regions in the risk of cancer mortality that may not be accounted for by the measured factors. e magnitude of effects of the covariates is smaller, and the standard errors are slightly inflated with the frailty models compared to the standard models. As noted, the frailty term addition in the model did not change the direction of the effect of the covariates and at best the estimates remained the same throughout the models, and substantive conclusions from the Weibull Standard models remain valid. Consistent with findings from other studies that applied both frequentist [18,40] and Bayesian approach [36,41] separately, the presence of heterogeneity in our study acting at the region  [18] in Iran reported the frailty relative risk of 7.2 in patients with breast cancer, while Gorfine et al. [42] established that accounting for heterogeneity improved the ability of the model to predict the risk of developing a disease over time based on carrier status. Bakhshi et al. [40] proved that cure and frailty models were better than the Cox model to estimate patient survival probabilities.
On the other hand, the presence of unobserved effects in this study reflects a different level of factors that can be categorized as genetic or behavioral, occurring at individual and regional levels, which may not have been measured. We are, however, confident that the other factors that were not considered but helped to reduce the unobserved household effect among cancer mortality risk may have details of specific family history, education background, type of treatment, and progesterone receptor (PR) [40,43] among other factors.
is modelling approach is an extension of the generalized mixed effects model and can be classified as a generalized survival linear mixed model. ese types of models have a complex structure which is easily exploited using the frequentist approach and now the Bayesian approach as demonstrated in this work due to the availability of statistical software.
ough our approach is complex, the results obtained are consistent with what has been reported elsewhere. Similar to this study, metastasis increased the effects on the hazard of the event in a study conducted in Iran [40].
We have also demonstrated that some of the effects of the region level variables are difficult to understand. For instance, in this study, topography and grade were found not to be statistically significantly associated with cancer mortality. is is in contrast to what is biologically known and reported elsewhere, where lesion [44] and grade [45] were reported to be strong breast cancer risk factors.
However, this study shows that marital status is predictive of breast cancer mortality risks, and it appeared likely to be a surrogate for cluster effect at regional level including other contextual factors that may not have been examined in this study.

Strength and Limitations
is research has a number of limitations that need to be pointed out. e power of the study would have improved if spatial temporality was incorporated in the analysis strategy to account for possible spatial effect. Conflicting evidence in case of age, paired site as a measure of literality, and gender on the risk of death revealed inherit differences that could be attributed to differences in computational strategies. INLA being a deterministic strategy, though fast, required reparametrization of the likelihood and the prior distribution so that gamma frailty which is a non-Gaussian distributed parameter can be specified and handled appropriately. We feel that it is this parametrization that could have affected the results especially in instances where ambiguous values were obtained. Due to the large values obtained for the BIC and DIC via the Bayesian fitted models, we anticipate model fitting issues, so our estimates should be interpreted with caution. Furthermore, ambiguous and unrealistic estimates obtained in the Bayesian approach revealed lack of data support to the proposed gamma frailty distributed model contrary to what is assumed elsewhere [19,46]. In this work, the gamma distribution with parameters (kappa, kappa) assumed a fixed expected value of 1 (one) and variance 1/ kappa. However, more discrepancy was also noted between log-normal frailty models fitted using the two approaches.
Notwithstanding the above limitations, the findings as presented show a further step toward an improved understanding of the risk of breast cancer survival among the people of Saudi Arabia.
is study has provided further evidence for the argument that information about individual patients and region is important in understanding inequalities in cancer mortalities and the application of robust statistical methodologies is also possible. Unlike the Bayesian approach, the frequentist approach via both the gamma and the log-normal frailties did not exhibit any problem with the fit of the data. is shows that the frequentist approach exhibits a good fit to the breast cancer data when modelled through the Weibull proportional regression with frailty.

Data Availability
Data were obtained from the Cancer Registry in King Faisal Specialist Hospital (KFSH) and the Research Centre (RC) upon request. To have access to the data, a formal application of request must be submitted to info@shc.gov.sa or through the cancer registry website: https://nhic.gov.sa/eServices/ Pages/manageregisters.aspx.

Conflicts of Interest
e authors declare that they have no conflicts of interest.