Bayesian Cox Model with Categorical Predictors for Time to Event Breast Cancer Data

Survival analysis has become a standard tool for modeling cancer trial data when the event of interest is “time to event”. Cox regression, which implements the proportional hazards model, is designed for analysis of time until an event or time between events, introduced by Cox (1972) in order to estimate the effects of differ- ent covariates influencing the time-to-event data. This model has been used extensively in time to event of cancer trial data for given categorical predictor variables. The Bayesian analysis has advantageous in dealing with small sample of censored data more than a frequentist method. The main objective of this article is to apply a Bayesian Cox model and is being compared with a frequentist method. Gibbs sampling technique is used to assess the posterior quantities of interest and to avoid the complexity in calculations. The posterior is arrived using SAS package.


INTRODUCTION
In many cancer trials, the applications of proportional hazards model is often a more realistic model than the other survival models in the analysis of time to event data. The Cox proportional hazards model constructs an analytical model for time-to-event breast cancer data. The model produces a survival function that predicts the probability that the event of interest has occurred at a given time t for given values of the predictor variables. The predictors are in nature of categorical type, shape of the survival function and the regression coefficients for the predictors are estimated from observed subjects.
To describe the distribution of survival time has assumed that the hazard function is completely specified given the baseline hazard function and the values of the covariates. In cancer studies, there may be factors other than the measured covariates that significantly affect the distribution of survival time. This condition is often referred to as heterogeneity of the subjects. Among the early papers of Vaupel, Manton and Stallard (1979) who used the concept of to describe the differences in survival time apparently among similar individuals. Hougaard (1995) presented an overview of the models, proposed for the use in time to event data. Aalen (1994) also provides a relatively nontechnical summary with a focus on fully parametric models. Klein and Moeschberger (1997) presented methods based on incorporating in proportional hazards models and its technical details.
There are two approaches to a Bayesian analysis of the Cox Proportional Hazards Model (PHM) .This is based on the partial likelihood L(β; y) combined with a prior π(β) which produces the posterior π(β|y). The baseline hazard is left unspecified. Inference is made from samples drawn from π(β|y). Thus the partial likelihood is treated as a likelihood function just as in the classical analysis. The basic idea of a model is to incorporate a comparison between Bayesian Cox PHM with non Bayesian approach of Cox PHM.
which may be re-expressed as (1) (2) Using the relationship between the survival and hazard function, it has the conditional survival function as In the Bayesian point of view, the variance "t", would be expressed as a hyperparameter and prior knowledge concerning its value which will be summarized in a hyper prior distribution.

GIBBS SAMPLER ALGORITHM
The Gibbs sampler technique is one of the best known MCMC sampling algorithms in the Bayesian computational methods. The Gibbs sampler by Grenander (1983), . The performance of a Metropolis-Hastings algorithm depends on the choice of a proposal density q. The Metropolis-Hastings algorithm can be used within the Gibbs sampler when direct sampling from the full conditional posterior is difficult.

PRIOR
Prior elicitation perhaps plays the most crucial role in Bayesian inference. Survival analysis with covariates, the most popular choice of informative prior for b is the normal prior, and the most common choice of non informative prior for b is the uniform prior. The non-informative and improper priors may be useful and easier to specify for certain problems, but they cannot be used in all applications, such as model selection or model comparison, as it is well known that proper priors are required to compute Bayes factors and posterior model probabilities (Ibrahim, et al., 2004). Also non informative priors may cause instability in the posterior estimates and lead to convergence problems for the Gibbs sampler. Moreover, non informative prior do not make use of real prior information that one may use for a specific application.

APPLICATION
We consider the database consisting of 368 breast cancer women patients diagnosed at Cancer Institute (WIA), Chennai, India and follow-up period up to 180 months, represented by the variable Time. The event of interest was time to death in months. A censoring indicator variable, Status, is created from the data, with the value 0 indicating a censored time and the value 1 indicating an event time.
Overall 187(51%) cases have experienced the event and 63% of 130 are of stage 3B cases.
The demographic and disease characteristics of the patients are given in table 1 From the table1, we see that death increases with the severity of stages and age. The event experienced cases among age group in more than 50 years is higher than the less than 50 years (Pari Dayal et al., 2013). The linear predictor is set equal to the intercept in the reference group (stage = 3); this defines the baseline hazard. The corresponding distribution of survival time is Gamma distribution (Cox and Oakes, 1984).
The Cox model with a parameter for each individual is using for identifying the risk variables for breast cancer patients. Here, the age and stages are considered as risk factors of categorical predictor variables. We analyzed the data assuming a Weibull distribution for the survivor function, and including random effect for each patient. The hazard model is as follows where AGE i has two levels(Age <50yrs(=1 as reference) and Age>50yrs), and stage has 3-level of categorical predictor covariates (stage2B = 1(as reference), stage 3A = 2 and stage 3B = 3) 1,2,3) (k STAGE ik = are dummy variables representing the 3-level factor for underlying stage.

Cox Proportion Hazards regression
The Cox proportional hazards model to these data, variables age group and stages, which are categorical variables, are also take part as stratified aspects. By default, they categorized by using the reference coding with the last category as the reference category. However, it can explicitly specify the reference category of our own preference. Here, Age Group(less than 50 years=0) is chosen as the reference category for age, stage (stage2B=1) is chosen as the reference category for stages of cancer. Coded variables are resulted (table below "Class Level Information") with the reference coding has zero and the rest are having one for all variables. Since age group variable is binary, the variable has a value of 0 for the reference category. The variable stage has three categories and is represented by two dummy variables.

Class Level Information Class
Value Design Variables Age group 0 0 Reference 1 1 The test results of individual model effects are shown in ta-ble2. There is a strong prognostic effect of stages on patient's survivivorship (p0.0007), and the survival times for patients of different age groups play a role other way that it differ non significantly (0.4083). In the Cox proportional hazards model, the effects of the covariates are to act multiplicatively on the hazard of the survival time, and therefore it is a little easier to interpret the corresponding hazard ratios than the regression parameters. For a categorical variable parameter, the hazard ratio is the ratio of the hazard rates between the given category and the reference category. The hazard rate of age group of less than 50 years is 1.137 times more that of Age Group more than 50 years, the hazard rate of stage of 3A is 0.951 times more that of stage2B, and the hazard rate of stage3B is 1.715 times more that of stage2B. Moreover there is evidence that the stage influences the event of interest.
Table4: Initial Values of the Chain on each Parameter The further illustrate the use of the backward elimination process to identify the effects that affect the survivorship of the breast cancer patients. It is specified to carry out the backward elimination which specifies the significant level for retaining the effects in the model. Hence, it results of the backward elimination process that it retained the stage and eliminated the age group subsequently.

Bayesian Cox Proportion Hazards Regression
Cox Proportion Hazards Regression uses the partial likelihood of the Cox model as the likelihood and generates a chain of posterior distribution samples by the Gibbs Sampler.
The Bayesian Cox PH model invokes the Bayesian analysis with generating samples by Gibbs sampler to maintain reproducibility to accumulate the posterior distribution samples using the data. By default, a uniform prior distribution is assumed on the regression coefficient Group. The uniform prior is a flat prior on the real line with a distribution that reflects ignorance of the location of the parameter, placing equal probability on all possible values the regression coefficient can take. Using the uniform prior, it would expect the Bayesian estimates to resemble the classical results of maximizing the likelihood (see: Table2). It should make sure that the posterior distribution samples have achieved convergence before using them for Bayesian inference.
This analysis generates a posterior chain of 10,000 iterations after 2,000 iterations of burn-in and it also displayed the names of the parameters and their corresponding effects and categories. Further it computes the maximum likelihood estimates of regression parameters are depicted in table3. These estimates are used as the starting values for the simulation of posterior samples.   Trace, autocorrelation and density plots are produced for each of the four parameters θ in figure1 also confirm the convergence of the Markov chain. It is imperative that these are examined before any conclusions are drawn from the simulated posterior samples. The results shown on the table6 are almost perfect. The trace plot show excellent mixing, the autocorrelation decreases to near zero, and the density is bell-shaped. The trace plots are centered near their respective posterior mean and later the posterior space with small fluctuations. For the theta which corresponds to the all group, the trace plot is centered near the posterior mean. Samples in both tails are covered. These results exhibit convergence of the Markov chain to its stationary distribution.

Figure1: Confirmation of convergence by Trace, autocorrelation and density plots
The first hazard ratio table compares the hazards between the age group less than 50 years and more than 50 years. Summaries of the posterior distribution of the correspond-ing hazard ratio are shown in Table 6. There is a 95% chance that the hazard ratio of the age group less than 50 years and more than 50 years lies between 0.655 and 1. The results which are presented in this paper followed the same trend and in fact it showed the reality. Results in all tables and all visual approximate estimates which are presented in this paper are consistent. The DIC is a suitable device to draw conclusions. Before drawing inferences from the posterior sample, we should examine the trace, autocorrelation and density plots for each parameter to be content that the underlying chain has converged. The plots for the two parameters shown the mixing in the chain is acceptable, although we notice long correlation times. The hazard ratio statement delivers the Bayes solution corresponding to the previous classical analysis in Table 2.
These results we can also use post sample to assess the posterior probability that the HR for age vs stage after the event of interest is <1. The probability is over 99%.
However, an enormous statistical knowledge is required for it to be used correctly. This approach provides an alternative validation that could be used to confirm results of 'frequentist' approach. Bayesian inference has a number of advantages over the frequentist approaches, mostly in the flexibility of model-building for time to event breast cancer survival data. In addition, for many models, 'frequentist' inference can be obtained as a special case of Bayesian inference with the use of non-informative priors (Ibrahim et al., 2001). The Bayesian approach enables us to formulate accurate inference based on the posterior distribution for any sample size, whereas the 'frequentist' approach relies heavily on the large sample approximation. The most important concern is that there is a risk involved in the erroneous usage of the Bayesian methods which could lead to improper data analysis