Bayesian Generalized Linear Mixed Modeling of Breast Cancer

Background: Breast cancer is one of the most common cancers among women. Breast cancer treatment strategies in Nigeria need urgent strengthening to reduce mortality rate because of the disease. This study aimed to determine the relationship between the ages at diagnosis and established the prognostic factors of modality of treatment given to breast cancer patient in Nigeria. Methods: The data was collected for 247 women between years 2011–2015 who had breast cancer in two different hospitals in Ekiti State, Nigeria. Model estimation is based on Bayesian approach via Markov Chain Monte Carlo. A multilevel model based on generalized linear mixed model is used to estimate the random effect. Results: The mean age of the patients (at the time of diagnosis) was 42.2 yr with 52% of the women aged between 35–49 yr. The results of the two approaches are almost similar but preference is given to Bayesian because the approach is more robust than the frequentist. Significant factors of treatment modality are age, educational level and breast cancer type. Conclusion: Differences in socio-demographic factors such as educational level and age at diagnosis significantly influence the modality of breast cancer treatment in western Nigeria. The study suggests the use of Bayesian multilevel approach in analyzing breast cancer data for the practicality, flexibility and strength of the method.


Introduction
Breast cancer refers to a malignant tumor developed from cells in the breast.Breast cancer is the most common cause of malignancy among women worldwide (1)(2)(3) and is a public health challenge among Nigeria women.Some years ago, breast cancer was not common in African countries especially Nigeria and was thought to be a leading course of death in the developed countries (4).Thirty Nigerian women die of breast cancer every day in 2008 and this had risen to 40 women in 2012.Over 508000 of Nigerian women died in 2011 as a result of breast cancer (5)(6)(7)2).In a study about prevalence of breast cancer in western Nigeria found that breast cancer alone accounted for 37% of all the cancer cases present in western Nigeria (8).With advancement in medical technology, advances have been made in the treatment of breast cancer among Nigeria women.Better treatment for breast cancer patients is difficult to define and older women are sometimes excluded from clinical treatment trials, probably because of their age (9).Since breast cancer biology differs from patient to patient with respect to factors like age, variation in response to treatment, and substantial competing risks of mortality (10)(11)(12), the ex-clusion of some patients might not be valid.This implies that those aged women included in trials are probably not a true representation for the general older population (13,14).Because of this, treatment of cancer in women with concise strategy is urgently needed.Therefore, there is need for modeling of breast cancer using evidencebased strategy technique.Prognostic factors for breast cancer especially in Western Nigerian had been well studied; there is paucity of data on population-based research.In addition, few studies have used Bayesian to establish the prognostic factors associated with the disease.Hence, we sought to determine risk factors for treatment given to female breast cancer in western Nigeria using generalized linear mixed models.This research focuses on the justification for the use of the conventional surgery method in the treatment of cancer, based on data from two understudied hospitals in western part of Nigeria, one federally owned, and the other state-owned.Hence, we aimed to provide knowledge on risk factors for breast cancer treatment modality, using both the classical and Bayesian approach via generalized linear mixed model.

Ethics
Ethical clearance to conduct the study was sought from the Ethical Review Committee of the Federal Teaching Hospital, Nigeria.

Generalized Linear Mixed Model
Generalized linear mixed models (GLMMs) are an extension of linear mixed models to allow response variables from different distributions, such as binary responses.Alternatively, GLMMs is an extension of generalized linear models (e.g., logistic regression) to include both fixed and random effects (hence mixed models).The general form of the generalized linear mixed model is expressed: ( | , , ) ' where  is an Nobs × 1 column vector, the out- come variable;   ′ is a N x p matrix of the p pre-dictor variables;  is a p x 1 column vector of the fixed-effects regression coefficients; B is an Nobs × Nr design matrix for the random effects (the random complement to the fixed τ); k  is a q x 1 vector of the random effects (the random complement to the fixed  ); and is a N × 1 column vector of the errors.The present study deals with a specific case of generalized linear mixed model where the response variable is binary and two levels of analyses (hospitals) are considered which is the best model to apply when we have this type of situation.Only hospital is considered at two levels in the model, then the B component is removed from the model in equation (1).For a given observation i within a hospital k, the generalized linear mixed model excluding the error term becomes: where k  is the random effect representing the influence of hospital k on its within observation.Hence, the model utilized a logistic link function . Therefore, the final model is now expressed as: [4] where i  represents the predictors for an individ- ual i in an hospital k which is categorical.
i  de- note the response variable,  is the vector of the fixed regression coefficients and k  is the ran- dom effect component (14).The estimated values of the regression parameters that maximize the probability of obtaining the observed data in classical statistics (15,17) can be obtained by maximum likelihood.Likelihood of k independent measurements, given vectors of parameters  and explanatory (predictors) variables i  is repre- sented as (15): In case of binary logistic regression where response (outcome) variable i  = 1 or 0, the likeli- hood function is estimated as (15): The above expression can be expressed as follows The estimate of  that maximizes the likelihood function can be computed using advanced calculus.This is achieved by taking the logarithm of the likelihood function: Hence, the confidence interval of the parameters is then computed using Wald test and is represented as µ ¶ µ where µ j  is fixed effect coefficient associated with j t h covariate, ¶ SE is the estimate of the standard error while ¶ The classical statistics fit the logistic regression by means of an iterative procedure like maximum likelihood.In many situations, as a result of the assumptions underlying this iterative procedure, the estimation in classical statistics may result in non-convergence.These shortcomings as a result of non-convergence can be addressed using Bayesian inference as an alternative approach.Markov chain Monte Carlo algorithm approach can be used to provide a very general recipe for estimating properties of complicated distributions in Bayesian statistics without any difficulties.There obviously remain, however, some challenges in concluding whether Bayesian methods perform better than classical statistics in modeling breast cancer data.In this paper, we compared the result of classical statistics and Bayesian method.As a requirement of the Bayesian approach, several diagnostics tests were performed to ensure convergence of the Markov chain Monte Carlo and the true reflection of the posterior distribution.

Bayesian Method
The methodology employed in this paper describes the Bayesian inference with emphasis on the posterior distribution, prior distribution as well as likelihood function for the Bayesian logistic regression.In the context of Bayesian statistics, data are regarded as fixed and unknown parameters as random variables.Considering a given parameter  and a set of observed data, the interest of Bayesian technique lies on the probability of the parameter  given the set of data  , i.e. ( | ) p .The main object of interest is the posterior distribution of the unknown parameters given the data.Mathematically, this can be written as Equation 8 is known as the principle of Bayesian method.Hence, () p  is the prior distribution while the likelihood ( | ) p  is represented as: Hence, the likelihood function is expressed: The preferred prior for logistic regression parameters is a multivariate normal distribution and is given as (15,17,18,20): [11] Hence, we have: From equation [8], replacing  with k  result in Therefore, the posterior distribution for logistic regression (fixed effects only) is represented as: Posterior distribution is usually of high dimension and analytically intractable which sometimes required knowledge of powerful integration.In order to overcome this complex nature of posterior, we employed Markov chain Monte Carlo (MCMC) algorithm.MCMC technique is among of the technique employed to generates the estimates of unknown parameters  and corrects the values generated in order to have a better estimate of the desired posterior distribution (21,31).All the analysis in this study was carried out using WinBUGS (21).There were 1500000 iterations, with the first 5000 discarded to cater for the burn-in period.We assessed MCMC convergence of all models parameters by checking Gelman Rubin plot (17,18).The scale reduction factor, also known as Gelman-Rubin convergence diagnostic (39) was calculated for model parameter to assess convergence and adequate mixing of the chains.The test statistics for the Gelman Rubin diagnostic test can be estimated as where  is the number of iterations of the chains.

Socio-demographic characteristics of participants
Out of 237 patients, 192 cases accounting for (81.01%) were malignant breast lesions, while 45 cases (18.99%) were benign giving a ratio of 4.3:1 for malignant to benign breast lesion.The mean age of the respondents was 42.2 ± 16.6 yr with 52% of the women aged between 35-49 yr.93.67% of those who participated in this study were Christians while 6.33% were Muslims.Various prognostic factors are considered which include: intercept (λ0), breast cancer types: malignant (λ1), age: 35-49(λ2), 50-69(λ3),70+ (λ4), level of education: at least high school (λ5), religion: Christian (λ6), tribe: Yoruba (λ7), occupation: site of the cancer (λ8), hospital (λ9).Table 1 gives a view of the sampled data with the following attributes: type of breast cancer, age, educational status, race, ethnicity, hospital and site of the breast cancer.The association between the variables of interest and the treatment of breast cancer patients received was calculated for the entire study (Table 1 model 3).Breast cancer (malignant) type was associated with the type of treatment given to patients, while other socio-economic factors were not significantly related, (model 3).The treatment given to breast cancer patients should be first based on the diagnosis of the breast cancer type.Hospital type was significant when breast cancer type was not included in the model (model 2).Hospital was significant but it's influence passes through the type of breast cancer a patients was diagnosed.
We can then say that the hospital a patient attended after being diagnosed with breast cancer played an important role in their chance of survival.Model 1 in Table 1 shows that the age and the hospital type were both significant with re-spect to the modality of treatment given to patient.The exclusion of type of breast cancer and educational status as a determinant for patients' possibility of receiving surgical treatment makes age and hospital to be significant (model 1).

Assessing the performance of Markov Chain Monte Carlo (MCMC) chains in WinBUGS
The performance of a diagnostic test can be examined in several ways.The diagnostics examined in this paper are those of Geweke (15), Heidelberger and Welch (16), and Raftery and Lewis (17), which look at convergence of an individual chain, and that of Gelman and Rubin (18), which bases convergence on analysis of multiple chains.In order to assess whether a chain has converged or not, we plot the sampled value against its number in the chain.Figure 1 displays the representation of the parameter behavior after 1,500,000 Monte Carlo repetitions.It was found that the kernel densities for shape and scale parameters exhibit approximately symmetric distribution.When the time series centered around a constant mean, it implies that the chain has reached conver-gence as reported in Fig. 2 and have the chains converged to the same solution.

Discussion
Our results confirm findings from previous studies conducted in Nigeria on prognostic factors and breast cancer among women in Nigeria (1, 8, 2-13).Age could be a risk factor for breast cancer.Similar studies have been documented in other parts of Africa and the rest of the world (28,29).Compared to the literature, the main contribution of the present study is the analysis of risk factors associated with modality of breast cancer treatment in western Nigeria in addition to the use of Bayesian approach.Previous studies that focus on the pattern of treatment given to breast cancer patients across time frame (years), age at diagnosis and educational categories are non-existent in western Nigeria.To the best of our knowledge, this could be the first study to address such scenario.For breast cancer treatment in Nigeria, the reason for surgery treatment on cancer patients is not clear (26).The present study showed prognosis factors for breast cancer treatment in western Nigeria.In this study, patient with at least high school education is significant with treatment modality given to breast cancer, which is a new and unexpected finding in this part of Nigeria.From our analysis, the higher proportion of those treated with surgical treatment is found among educated and younger women.One of the possible explanations for these results may be that women with at least high school education presented their breast cancer cases to a medical practitioners than the less educated women.The result of Bayesian also reveals that those women with at least high school education are 2 times more likely to take surgery treatment than others.Other studies have shown similar result but not with treatment modality as investigated in the current finding (28)(29)(30)32).We can infer from our findings that literacy level may influence treatment-seeking behavior.This finding could be used as an avenue to create awareness and advocacy campaigns that target less educated women in western Nigeria.Therefore, in the context of Nigeria, this is a good contribution in the area of breast cancer modeling as previous studies only found age at diagnosis as the risk factor that influence treatment pattern.However, our study did not observe the influence of education on breast cancer patients, a recent study by (33) observed the impact of educational level on breast cancer risk using Swedish data.Women with higher education were more likely to be diagnosed with in situ breast cancer.In addition, patients who had low level of education were at higher risk of late presentation of their breast cancer (34,35) .67 times more likely to receive surgical treatment than their counterparts.This suggests that younger patients in western Nigeria were more likely to be treated with surgery, compared to the older patients, an observation supported by previous studies (3,29,30).Some studies attributed poor prognosis of breast cancer with age and the report indicated further that younger age is affected most due to an increase in invasive breast disease among this age group (3,30,32,36).Our result also shows that malignant breast lesion was a predictor of breast cancer treatment and it appeared to have higher distribution among those who had at least high school education, an observation that supports previous studies (3,(37)(38)(39)(40)(41)(42) that malignant breast lesion is predominant in this part of Nigeria.

Conclusion
We have demonstrated the socio-demographic factors associated with treatment modality of breast cancer in western Nigeria.Since understanding the risk factors associated is necessary for curbing the menace of breast cancer prevalence, these results provide a rationale for the need to create more awareness campaigns, strategies and sensitization that target less educated women to enhance patronization of breast cancer screening in Nigeria as a whole.

Table 1 :
Classical logit model for independent variables

Table 2
is the results for the model with Bayesian non-informative prior.From the 95% posterior intervals of λj, we observe that only the posterior distribution of the age and malignant are away from zero, indicating a significant effect of these variables on the modality of treatment patients received from this part of Nigeria while other predictors considered in this study are not significant.

Table 2 :
Posterior Summary for Bayesian multilevel approach with non-informative . An observation attributed to low level of awareness and knowledge about breast cancer treatment and screening.Another finding from this study is that age at diagnosis is associated with treatment modality.Majority of the previous studies have indicated that age is an important prognostic factor for breast cancer screening but not many studies have examined age at diagnosis as a factor that influences the modality of breast cancer treatment in Nigeria particularly in western region.Therefore, this study adds to the large body of research indicating that age at diagnosis influence the pattern of treatment modality for breast cancer patients in Nigeria.Considering the effects of age in the treatment modality of breast cancer patient, Bayesian findings highlight that patients aged 35-49 yr are 0