Risk Factors Affecting Survival Time of Breast Cancer Patients: The Case of Southwest Ethiopia

Background: Breast cancer is a non-communicable disease and a leading cause of death worldwide. In Ethiopia, breast cancer is the second most common cancer-related health problem among women. The main objective of this study was to identify the potential risk factors affecting the survival time of breast cancer patients in Southwest Ethiopia. Study design: A retrospective study design. Methods: The data were taken from the medical records of patients registered from January 1, 2015, to January 31, 2020. Different shared frailty survival models were employed to analyze the dataset. Results: Out of 642 recorded breast cancer patients, 447 (69.6%) died during the study period, and 195 (30.4%) were lost to follow-up for unknown reasons. The median time to death for breast cancer patients was 10 months, and hospitals were used as a cluster effect. The results revealed that women with no smoking habit had a survival time about 3.35 times longer than that of patients who had a smoking habit, and that survival time decreased as the age of breast cancer patients increased (acceleration factor 0.99). Moreover, the survival time of breast cancer patients in rural areas was about 0.14 times that of patients who were urban residents. Conclusions: Age, place of residence, treatment taken, stage, histologic grade, tumor size, oral contraceptives, and smoking habits were significant risk factors for shorter survival time. To reduce the burden of breast cancer, awareness should be given to the community.


Introduction
Breast cancer is a noncommunicable disease [1]. It is one of the leading causes of death, the most commonly diagnosed cancer, and the leading cause of cancer death in women worldwide [2,3]. In 2018, breast cancer accounted for approximately 24.2% of new cancer cases and 15% of cancer deaths globally [4,5]. Of these deaths, 60% were observed in low- and middle-income countries [3,6]. By 2040, the cancer burden is projected to reach 28.4 million cases, a 47% increase from 2020, with a larger increase in transitioning than in transitioned countries due to demographic changes, although this may be further worsened by increasing risk factors associated with globalization and a growing economy [7].
Breast cancer accounts for 28% of all cancers, and more than 24% of breast cancer incidence was recorded in Africa. The highest incidence rate of breast cancer was observed in North Africa, followed by East Africa [8]. Furthermore, Sub-Saharan African countries had the highest incidence rate together with the highest age-standardized breast cancer death rate [2]. Today, most African countries face a double burden of cervical and breast cancer, which are the leading cancer killers of women aged 30 years and older [7]. In developed countries, breast cancer is likewise a prominent cause of death among women [2].
In Ethiopia, breast cancer is a major cause of death [9]. It accounts for approximately 22.6% of cancer incidence and 17% of cancer deaths [5,10]. Most women living in rural areas seek remedies from traditional healers before seeking help from government health facilities [11].
In 2018, the estimated number of breast cancer cases in Ethiopia was 13,987, with a crude incidence rate of 28.2 per 100,000 population, accounting for 33% of all cancer cases among women [10]. Studies showed that in developed countries breast cancer is often diagnosed at an early stage, and patients have a good prognosis. However, it is more often diagnosed at [...]
The main objective of this study was to identify risk factors affecting the survival time of breast cancer patients. In this study, time-to-death data were recorded at regular follow-up visits until the patient died of breast cancer, fully recovered, or was lost to follow-up. The data therefore have a survival structure, with death as the event of interest [19]. Accordingly, survival models are required for this type of data. The non-parametric Kaplan-Meier estimator can be used to estimate the survival time of patients [20], or parametric models, such as parametric shared frailty models, can be applied [21-24]. In this study, parametric shared frailty models were used for modeling and inference on the time to death of breast cancer patients.

Data Source
A retrospective study was conducted in four hospitals in Southwest Ethiopia: Jimma Medical Center, Bedelle, Mizan-Aman, and Mettu Karl. The hospitals were selected by simple random sampling. The study population included all breast cancer patients who were registered at the selected hospitals with regular follow-up from January 1, 2015, to January 31, 2020. Women with breast cancer confirmed clinically and histologically, and with full information in the registration books, were eligible for inclusion. Women who had cancer at another site, and those with insufficient information in the registration books, were not eligible. The starting point was when the women received treatment or were diagnosed at the hospital, and the ending point was when they died from breast cancer. A total of 642 cases were obtained using simple random sampling at a 95% confidence level.

Ethical Clearance
The Research Ethics Review Board of Jimma University, Jimma, Ethiopia, provided ethical clearance for the study. A formal written cooperation letter was sent to Jimma Medical Center, Bedelle, Mizan-Aman, and Mettu Karl hospitals, where the data were obtained. Because the study relied on retrospective data, it was conducted without individual informed consent and without recording the names of the patients. The five-year card-based records were obtained together with their corresponding covariates.

Study variable
The response variable is the time to death, or survival time, of breast cancer patients, measured in months from the time of diagnosis. Death was the event of interest in this study and was coded as "1". Patients who were lost to follow-up, dropped out, or were transferred to other hospitals for unknown reasons were considered censored and coded as "0". To investigate the effect of risk factors on the survival time of breast cancer patients, factors known to affect survival time were measured and classified as socio-demographic or clinical. The socio-demographic factors were baseline age, alcohol consumption (yes or no), breastfeeding (yes or no), smoking habit (yes or no), and place of residence (urban or rural). The clinical factors were treatment taken; stage of breast cancer, categorized as Stage I, II, III, or IV according to the American Joint Committee on Cancer 2002 system; body mass index (BMI, kg/m^2), categorized as obese (≥30.0), overweight (25.0-29.9), normal weight (18.5-24.9), or underweight (<18.5) [25]; histological grade (well-differentiated, moderately-differentiated, or poorly-differentiated); tumor size (<2cm, 2-5cm, or >5cm); and family history (yes or no).
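The event coding described above can be illustrated with a small sketch. The outcome labels and the `SurvivalRecord` type are hypothetical names chosen for illustration; the paper does not describe the raw record format, and the two records shown are made up.

```python
from dataclasses import dataclass

# Hypothetical outcome labels; the paper does not describe the raw record format.
CENSORED_OUTCOMES = {"lost_to_follow_up", "dropped_out", "transferred"}


@dataclass(frozen=True)
class SurvivalRecord:
    months: float  # follow-up time in months
    event: int     # 1 = death (event of interest), 0 = censored


def encode_record(months: float, outcome: str) -> SurvivalRecord:
    """Map a follow-up outcome to the 1/0 event coding used in the study."""
    if months < 0:
        raise ValueError("follow-up time cannot be negative")
    if outcome == "died":
        return SurvivalRecord(months, 1)
    if outcome in CENSORED_OUTCOMES:
        return SurvivalRecord(months, 0)
    raise ValueError(f"unknown outcome: {outcome!r}")


# Illustrative (made-up) records, not data from the study.
records = [encode_record(10, "died"), encode_record(14, "lost_to_follow_up")]
print(records)
```

Rejecting unknown outcomes, rather than silently treating them as censored, keeps coding mistakes visible before any model is fitted.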

Statistical analysis
Survival analysis was applied in this study. In survival analysis, observations often need to be grouped together on the basis of the study center, city, or region. In this situation, the assumption of a homogeneous population fails because individuals belonging to the same group share unobserved covariates. Since the homogeneity assumption fails, the appropriate way to handle unobserved heterogeneity is to introduce a frailty term [21,26]. The most widely used frailty distributions are the Gamma and the Inverse Gaussian, which act multiplicatively on the baseline hazard [23]. Because of their computational convenience, the Gamma and Inverse Gaussian distributions were used as frailty distributions in this study.
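To make the phrase "act multiplicatively on the baseline hazard" concrete, a shared frailty model can be written as below. This is a sketch in assumed notation that the text does not define: $j$ indexes patients within hospital $i$, $x_{ij}$ is the covariate vector, $\beta$ the coefficients, $h_0$ and $H_0$ the baseline and cumulative baseline hazards, $z_i$ the frailty shared by all patients in hospital $i$, and $L$ the Laplace transform of the frailty distribution.

```latex
% Conditional hazard: the shared frailty z_i multiplies the baseline hazard.
h_{ij}(t \mid z_i) = z_i \, h_0(t) \, \exp\!\left(x_{ij}^{\top}\beta\right)

% Integrating z_i out gives the joint survival function of cluster i
% in terms of the Laplace transform L of the frailty distribution.
S_i(t_{i1}, \dots, t_{in_i})
  = E\!\left[\exp\!\Big(-z_i \sum_{j=1}^{n_i} H_0(t_{ij}) \exp(x_{ij}^{\top}\beta)\Big)\right]
  = L\!\Big(\sum_{j=1}^{n_i} H_0(t_{ij}) \exp(x_{ij}^{\top}\beta)\Big)
```

The second line is why frailty distributions with a closed-form Laplace transform, such as the Gamma and the Inverse Gaussian, are computationally convenient.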

A shared gamma frailty model
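The two-parameter gamma density function for the frailty term $z_i$ with shape parameter $k$ and scale parameter $\theta$ is given by

$$f(z_i) = \frac{z_i^{k-1}\exp(-z_i/\theta)}{\Gamma(k)\,\theta^{k}}, \qquad z_i > 0,\; k > 0,\; \theta > 0.$$

The Laplace transform of this density has the form [23]:

$$L(s) = E\left[\exp(-s z_i)\right] = (1 + \theta s)^{-k}.$$

The first and second derivatives of the Laplace transform $L(\cdot)$ with respect to $s$, evaluated at $s = 0$, give the mean and variance of the frailty term:

$$E(z_i) = -L'(0) = k\theta, \qquad \mathrm{Var}(z_i) = L''(0) - [L'(0)]^{2} = k\theta^{2}.$$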
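The intermediate steps can be sketched as follows, assuming the scale parameterization of the gamma distribution. The symbols $z_i$ (frailty), $k$ (shape), and $\theta$ (scale) are notational assumptions, and the identifiability constraint in the final comment is a common convention rather than something stated in the text.

```latex
% Assumed: z_i ~ Gamma(shape k, scale \theta), so that L(s) = (1 + \theta s)^{-k}.
\begin{align*}
L'(s)  &= -k\theta\,(1+\theta s)^{-k-1},          & L'(0)  &= -k\theta, \\
L''(s) &= k(k+1)\theta^{2}\,(1+\theta s)^{-k-2},  & L''(0) &= k(k+1)\theta^{2}, \\
E(z_i) &= -L'(0) = k\theta,                       & \mathrm{Var}(z_i) &= L''(0) - [L'(0)]^{2} = k\theta^{2}.
\end{align*}
% Common identifiability constraint (an assumption here):
% k = 1/\theta gives E(z_i) = 1 and Var(z_i) = \theta.
```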

The inverse Gaussian frailty model
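The probability density function of an inverse Gaussian distributed frailty random variable $z_i$ with parameters $\mu > 0$ and $\theta > 0$ is given by [23]:

$$f(z_i) = \frac{1}{\sqrt{2\pi\theta}}\, z_i^{-3/2} \exp\left(-\frac{(z_i-\mu)^{2}}{2\theta\mu^{2} z_i}\right), \qquad z_i > 0,$$

where the Laplace transform of this function has the form

$$L(s) = \exp\left(\frac{1}{\theta\mu}\left(1 - \sqrt{1 + 2\theta\mu^{2} s}\right)\right).$$

We set $\mu = 1$ for simplicity, and we have that

$$L(s) = \exp\left(\frac{1}{\theta}\left(1 - \sqrt{1 + 2\theta s}\right)\right), \qquad E(z_i) = 1, \qquad \mathrm{Var}(z_i) = \theta.$$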
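As a check on the mean and variance, the derivatives of the Laplace transform can be worked out explicitly. This sketch assumes an inverse Gaussian frailty $z_i$ with mean fixed at 1 and a single remaining parameter $\theta$; both symbols are notational assumptions.

```latex
% Assumed: L(s) = \exp\{(1/\theta)(1 - \sqrt{1 + 2\theta s})\}, i.e. mean parameter set to 1.
\begin{align*}
L'(s)  &= -L(s)\,(1+2\theta s)^{-1/2},                                   & L'(0)  &= -1, \\
L''(s) &= -L'(s)\,(1+2\theta s)^{-1/2} + \theta\,L(s)\,(1+2\theta s)^{-3/2}, & L''(0) &= 1 + \theta, \\
E(z_i) &= -L'(0) = 1,                                                     & \mathrm{Var}(z_i) &= L''(0) - [L'(0)]^{2} = \theta.
\end{align*}
```

Under this parameterization, $\theta$ is directly the frailty variance, so a larger $\theta$ means more heterogeneity between clusters.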

Parametric estimation
Several R packages are available for fitting frailty models. The parfm package [27] was used to estimate the parameters of the parametric shared frailty models proposed in this study; it provides the estimates and standard errors of the parameters of interest.

Model comparison and diagnostics
The Akaike Information Criterion (AIC) [28] was used to compare models in this study, and the model with the smallest AIC value was considered the best fit [27]. After model comparison, it is important to check how adequately the selected model describes the outcome. For an appropriate baseline distribution, the diagnostic plot of the selected accelerated failure time model should be approximately linear and pass through the origin [20].
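The selection rule can be sketched in a few lines. AIC is computed as $2p - 2\log L$, where $p$ is the number of estimated parameters and $\log L$ the maximized log-likelihood. The model names, log-likelihoods, and parameter counts below are made up for illustration and are not results from the study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2p - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(fits):
    """Given {name: (log_likelihood, n_params)}, return the best name and all scores."""
    scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up log-likelihoods and parameter counts, for illustration only.
example_fits = {
    "weibull-gamma": (-1430.0, 16),
    "lognormal-inverse-gaussian": (-1405.0, 16),
}
best_model, aic_scores = select_by_aic(example_fits)
print(best_model, aic_scores)
```

Because the penalty term grows with the number of parameters, two models with equal likelihood are ranked in favor of the simpler one.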

Results
Of the 642 breast cancer patients, 315 (70.8%) of those living in urban areas and 132 (67.0%) of those living in rural areas died of breast cancer. During the study period, 426 (70.6%) of the patients who had a smoking habit died, compared to 21 (53.8%) of the non-smokers. Regarding tumor size, 83 (64.3%), 167 (71.4%), and 197 (70.6%) of the patients with tumors below 2cm, between 2cm and 5cm, and above 5cm died, respectively (Table 1). Differences in survival experience between patient groups were also assessed using the Log-rank and Breslow tests [20]. Table 2 shows a significant difference in survival time by smoking habit, treatment taken, stage of breast cancer, family history of breast cancer, and breastfeeding at the 5% significance level. Since the null hypothesis was rejected for these risk factors, post hoc analysis was conducted to perform pair-wise comparisons among the categories of each factor. The survival curves of breast cancer patients differed by smoking habit, treatment taken, stage of breast cancer, family history of breast cancer, and breastfeeding. However, place of residence, tumor size, obesity, histologic grade, alcohol consumption, and oral contraceptives did not show a clear difference (Table 2).
Furthermore, Table 3 summarizes the status of breast cancer patients in Southwest Ethiopia. The median follow-up time was 10 months for patients who were censored, and about 75% of the patients had at most 14 months of follow-up (upper quartile). The median time to death from breast cancer was 10 months.
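A median time to death of this kind is commonly read from a Kaplan-Meier curve; the text does not state how it was obtained, so the following is only a minimal sketch of that estimator. The function names and the toy data are hypothetical and unrelated to the study records.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, S(time)) at each distinct death time.

    times  -- follow-up times (e.g. months)
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy data for illustration only (not from the study).
times = [2, 4, 4, 6, 9, 10, 12, 14]
events = [1, 1, 0, 1, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print(median_survival(curve))
```

Censored patients contribute to the risk set up to their last follow-up time but never reduce the survival estimate themselves, which is how the estimator uses partial information from patients lost to follow-up.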

Comparison of Models with Akaike Information Criterion
This study considered four baseline hazard functions (Weibull, Log-logistic, Log-normal, and exponential) and two frailty distributions (Gamma and Inverse-Gaussian). Accordingly, Gamma and Inverse-Gaussian shared frailty models were fitted with hospital as the random (frailty) effect in order to select the best model for this study. The AIC value of the Log-normal-inverse Gaussian model was 2838.15, the minimum among all fitted models, indicating that it was the most suitable parametric frailty model for describing the breast cancer dataset (Table 4).
Based on the results of the Log-normal-inverse Gaussian shared frailty model, age, place of residence, treatment taken, stage, histological grade, tumor size, smoking habit, and oral contraceptives were significant. However, obesity, family history of breast cancer, and breastfeeding were not significant (Table 5). In the shared frailty model, the key quantity to interpret is the acceleration factor. A factor is statistically significant if the 95% confidence interval of its acceleration factor does not include 1; otherwise, it is not significant. An acceleration factor (ϕ) greater than 1 indicates a prolonged time to death, and a value below 1 indicates a shortened time to death. From the Log-normal-inverse Gaussian shared frailty model, it was found that increasing age (ϕ=0.99; 95% CI: 0.98-0.99) led to a decrease in the survival time of breast cancer patients. The acceleration factor for rural residents was 0.14. This implies that rural residents had a shorter time to death, compared to breast cancer patients who were urban residents.
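The interpretation rule can be sketched as follows, assuming the usual convention that the acceleration factor is the exponential of a log-scale coefficient and that a Wald-type 95% interval is used; the text does not state how the intervals were computed. The coefficient and standard error in the example are hypothetical, while the interval (0.98, 0.99) is the one reported for age.

```python
import math


def acceleration_factor(beta: float, se: float, z: float = 1.96):
    """Return (phi, lower, upper) for a log-scale coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def is_significant(lower: float, upper: float) -> bool:
    """True when the confidence interval for phi excludes 1."""
    return not (lower <= 1.0 <= upper)


def direction(phi: float) -> str:
    """Describe the effect on time to death implied by phi."""
    if phi > 1.0:
        return "prolongs survival time"
    if phi < 1.0:
        return "shortens survival time"
    return "no effect"


# Hypothetical coefficient and standard error, for illustration only.
phi, lower, upper = acceleration_factor(beta=-0.25, se=0.10)
print(round(phi, 2), is_significant(lower, upper), direction(phi))

# Interval reported in the text for age: 0.98-0.99 excludes 1.
print(is_significant(0.98, 0.99))
```

Working on the log scale and exponentiating the endpoints keeps the interval strictly positive, which matches the multiplicative meaning of ϕ.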
Patients at stage IV had an acceleration factor of 0.32 (95% CI: 0.22-0.46), which indicates that patients at stage I had a longer survival time, compared to breast cancer patients at stage IV. Moreover, patients who had no smoking habit had a longer survival time than those who had a smoking habit (ϕ=3.35; 95% CI: 2.12-5.27). In addition, patients who used oral contraceptives (ϕ=0.78; 95% CI: 0.64-0.96) had a shorter survival time, compared to those who did not. The acceleration factor for poorly-differentiated histologic grade was 0.64 (95% CI: 0.50-0.83). This indicates that breast cancer patients with poorly-differentiated histologic grade had a shorter survival time than those with well-differentiated histologic grade. The estimate of the shape parameter in the Log-normal-inverse Gaussian shared frailty model was δ=0.214. The heterogeneity between treatment centers, which were used as clusters, was estimated by the selected model at θ=0.998 (P=0.004), and the dependence within clusters was about τ=33.3%. This indicates that the death rate of patients differed between hospitals in Southwest Ethiopia.
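For orientation, the reported within-cluster dependence can be related to the frailty variance through Kendall's τ. The sketch below uses the gamma-frailty relationship τ = θ/(θ+2), which reproduces the reported value numerically; the text does not say which expression was used, and for an inverse Gaussian frailty the expression for τ is different, so this is an assumption and not a description of the authors' computation.

```python
def kendall_tau_gamma(theta: float) -> float:
    """Kendall's tau implied by a gamma frailty with variance theta."""
    if theta < 0:
        raise ValueError("frailty variance must be non-negative")
    return theta / (theta + 2.0)


# theta = 0.998 is the heterogeneity estimate reported in the text.
tau = kendall_tau_gamma(0.998)
print(f"{tau:.3f}")  # 0.333
```

A value of τ near 0 would mean that survival times of patients in the same hospital are nearly independent, while larger values indicate stronger clustering.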

Model Diagnostics
If the diagnostic plot for the Weibull, Log-normal, Log-logistic, or exponential baseline is linear, that baseline distribution is appropriate for the given dataset [29,30]. The respective plots are given in Figure 1; the plot for the Log-normal baseline distribution is closer to a straight line than those for the Weibull, exponential, and Log-logistic baseline distributions.
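The graphical check can be sketched with the standard linearizing transforms of an estimated survival function S(t): -log S(t) against t for the exponential, log(-log S(t)) against log t for the Weibull, the log failure odds against log t for the Log-logistic, and the inverse normal CDF of 1 - S(t) against log t for the Log-normal. The text does not state which transforms were plotted in Figure 1, so these, the function names, and the R-squared summary are assumptions for illustration.

```python
import math
from statistics import NormalDist


def diagnostic_points(curve, baseline):
    """Transform (t, S(t)) pairs so that the named baseline plots as a straight line."""
    points = []
    for t, s in curve:
        if t <= 0 or not (0.0 < s < 1.0):
            continue  # the transforms are undefined at S = 0 or S = 1
        if baseline == "exponential":
            points.append((t, -math.log(s)))
        elif baseline == "weibull":
            points.append((math.log(t), math.log(-math.log(s))))
        elif baseline == "loglogistic":
            points.append((math.log(t), math.log((1.0 - s) / s)))
        elif baseline == "lognormal":
            points.append((math.log(t), NormalDist().inv_cdf(1.0 - s)))
        else:
            raise ValueError(f"unknown baseline: {baseline!r}")
    return points


def r_squared(points):
    """Squared correlation of the transformed points; values near 1 suggest linearity."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return (sxy * sxy) / (sxx * syy)


# Synthetic survival curve from an exact Log-normal law (mu = 2, sigma = 0.5).
nd = NormalDist()
demo_curve = [(t, 1.0 - nd.cdf((math.log(t) - 2.0) / 0.5)) for t in (3.0, 6.0, 9.0, 12.0)]
for name in ("exponential", "weibull", "loglogistic", "lognormal"):
    print(name, r_squared(diagnostic_points(demo_curve, name)))
```

In practice the curve passed in would be a non-parametric survival estimate, and the baseline whose transformed points are closest to a line is preferred, which complements the AIC comparison.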

Discussion
Different accelerated failure time models were applied to analyze the dataset, since there was heterogeneity between treatment centers (hospitals), which were used as a cluster effect. Based on the AIC values, the Log-normal-inverse Gaussian model was the best fit for the dataset. The diagnostic plots also support the AIC-based choice of the Log-normal baseline distribution for the given dataset. In addition, it was observed that as the stage of breast cancer increased, the survival time of breast cancer patients decreased. This result is consistent with the findings of the studies conducted by Hoang P et al. and Mensah A et al. [31,32]. Based on the results, as the age of breast cancer patients increases, the survival time decreases. This finding is also in line with the results of a study performed by Allemani C et al. [33].
The findings of this study revealed that breast cancer patients with a tumor size between 2cm and 5cm or above 5cm had a shorter survival time, compared to those with a tumor size below 2cm (the reference category). This result is similar to the findings of a previous study carried out in Ghana [32]. In this study, breast cancer patients with poorly-differentiated histologic grade had a shorter survival time, compared to those with well-differentiated histologic grade (reference group). This finding is consistent with the results of the studies conducted by Baghestani AR et al. and Alotaibi RM et al. [16,17].
The results also indicated that breast cancer patients who lived in rural areas had shorter survival times than those who lived in urban areas. This result is supported by the studies carried out by Tolessa L et al. and Balekouzou A et al. [18,34]. It may be due to lower awareness of health-related issues in rural areas. Moreover, even those who were aware might not have had access to health centers because of limited resources in the hospitals.
The study revealed that breast cancer patients who were oral contraceptive users had a shorter survival time, compared to those who did not use oral contraceptives. This result is not consistent with the findings of a study performed by Brinton LA et al. [35], which indicated that oral contraceptive use did not seem to increase the risk of breast cancer. However, oral contraceptive use before a first full-term pregnancy or for more than five years can modify the development of breast cancer.
In this study, breastfeeding, obesity, and family history were not found to be risk factors affecting the survival time of breast cancer patients. However, different studies reported that women with a family history of cancer, and those who were obese, were more likely to be affected by breast cancer [18,36,37,38]. Smoking habit was an important risk factor that significantly influenced the survival of breast cancer patients in this study. Women who did not smoke had a survival time about three times longer than those who smoked. This result is consistent with the findings of a study conducted by Catsburg C et al. [39], who found a strong association between smoking and breast cancer and noted that the timing of cigarette exposure was also important in this regard.

Strength and Limitations of Study
Regarding the strengths of the study, first, different shared frailty survival models were applied to identify the risk factors affecting the survival time of breast cancer patients while handling heterogeneity between hospitals. Second, an appropriate sampling design was applied to collect the dataset. However, many clinical variables were not included in this study because the records did not contain full information on them; the analysis was limited to the variables described in the methods. This in turn might affect our conclusions.

Conclusions
Factors such as age, place of residence, treatment taken, stage, histologic grade, tumor size, smoking habit, and oral contraceptives significantly influenced the survival time of breast cancer patients. Breast cancer patients with higher age, a smoking habit, oral contraceptive use, poorly-differentiated histologic grade, stage IV breast cancer, and rural residency had a shorter survival time. On the other hand, early stage (stage I), well-differentiated histological grade, urban residency, no oral contraceptive use, and no smoking habit were associated with prolonged survival time of breast cancer patients, compared to others. Awareness should be given to the community to reduce the burden of breast cancer.