Statistical data analysis of cancer incidences in insurgency affected states in Nigeria

This article provides details about the various cancer types recorded in the Northeastern states of Nigeria currently affected by insurgency. The dataset is described, and the chi-square test is used to determine whether the variables under consideration depend on each other. In addition, linear, logarithmic, inverse, quadratic, cubic, power, growth, exponential and logistic regression models were fitted to the dataset to show the relationship between the variables.


© 2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area: Medicine
More specific subject area: Oncology, Public health, Biostatistics
Type of data

Data
The data set records the age, gender and topological (Top) location of cancer on the body for cancer patients at the University of Maiduguri Teaching Hospital, located in Maiduguri, the capital of Borno state, Nigeria.
The teaching hospital is the only tertiary health care facility in the state and often serves other northeastern states such as Yobe, Taraba, Adamawa, Bauchi and Gombe.
A total of 1671 patients were considered during the study period, and SPSS version 20 was used to perform the analysis. The dataset is available as Supplementary data, while a brief summary of the data is presented in Table 1.
Table 1 shows that the gender of one patient was not recorded, hence the one missing value.
The frequency distribution of the gender of the patients is presented in Table 2. The frequency distribution of the patients' age is presented in Table 3. The parts of the body affected by cancer and the number of patients affected (frequencies) are given in Table 4. Table 4 shows that the most frequently affected part of the body is the prostate gland. This is represented graphically in Fig. 3.

Experimental design, materials and methods
The data set was obtained from patients' records at the data center of the University of Maiduguri Teaching Hospital. As stated earlier, the hospital serves a large population from the six Northeastern states of Nigeria and beyond. The condition of the Northeastern region in particular, and of the entire northern region of the country, is at variance with its natural endowments, such as vast fertile lands, rivers and lakes for irrigation, mineral resources and abundant sunshine for renewable energy. The weak social structure of the region has resulted in excruciating poverty, which often manifests as homelessness and destitution, insurgency, violence and crime [1]. The region has a high poverty index, a low human development index, a lack of potable drinking water, electoral violence, a dearth of medical personnel, high mortality, low life expectancy and decayed infrastructure. It is also an epicenter of joblessness, underage and teenage pregnancy, female genital mutilation, epidemics, illiteracy, malnutrition and now terrorism, which takes the form of coordinated attacks on military and police formations and remote villages, guerrilla attacks, kidnappings, regicide, suicide bombings, mass killings, abduction of schoolgirls, extrajudicial killings and summary executions, hypnotizing and forced conscription, indoctrination, forceful conversion to Islam and so on. The decadence is assumed to result from corruption, tribalism, military intervention in governance, inequality, misappropriation, financial recklessness, bankruptcy of ideas, a dearth of developmental agendas and reduced capital allocation caused by shortfalls in Nigeria's revenue following the decline in crude oil prices. Globally, efforts to improve healthcare and reduce the incidence of cancer have yielded the desired results except in some developing countries, where cancer-related deaths remain stubbornly high.
Cancer awareness, screening, prevention, management and treatment strategies are very limited in the region studied in this article. Regrettably, capital allocations to the health sector are inadequate, and the available funds are often allegedly diverted by corrupt government officials.
In addition, maternal mortality is one area currently affected by the Boko Haram insurgency in the region, as reported in [2]. Other areas have also been seriously affected, for example food security and dynamics, under-five malnutrition, child mortality, escalation of cholera outbreaks, infections, sexually transmitted diseases, unsafe birth practices and abortion, child prostitution, sex for food at the displaced persons' camps and an increase in polio cases; see [3][4][5][6][7][8] for details. Some related articles can also be explored.
Next, the collected dataset is analyzed using the chi-square test of independence and curve estimation.

Chi-square test of independence
The chi-square test of independence was used to investigate the relationship between the location of the cancer ("Top") and the gender and age of the patients.
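For readers who want to reproduce this kind of test outside SPSS, the following is a minimal sketch of the Pearson chi-square test of independence using only the Python standard library. The contingency table is hypothetical (it is not taken from the dataset), the function names are illustrative, and the sketch assumes every expected count is greater than zero.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def chi_square_sf(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite correction series.
    term, total = math.sqrt(x), 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Hypothetical counts: rows are two cancer sites, columns are female and male patients.
hypothetical_table = [[30, 10], [20, 40]]
statistic, dof = chi_square_independence(hypothetical_table)
p_value = chi_square_sf(statistic, dof)
# statistic is about 16.67 with 1 degree of freedom for this table.
print(statistic, dof, p_value)
# Independence is rejected at the 0.05 level when p_value < 0.05.
```

The same functions apply to a "Top" by age-group table once ages are binned; how the ages were grouped for the tests reported here is not stated in the source.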
Test of independence between "Top" and gender of the patients
Remark: Table 2 indicates that there are more female than male patients with cancer. This is represented pictorially in Fig. 1.
Hypothesis Testing I:
H0: There is no significant association between the topological location of cancer and the gender of the patients.
Versus
H1: There is a significant association between the topological location of cancer and the gender of the patients.
The result of the analysis is presented in Table 5.
The information about the correlation coefficient and its corresponding p-value is presented in Table 6.

Test of independence between "Top" and age of the patients
Hypothesis Testing II:
H0: There is no significant association between the topological location of cancer and the age of the patients.
Versus
H1: There is a significant association between the topological location of cancer and the age of the patients.
The result of the analysis is presented in Table 7. Information about the correlation coefficient and its corresponding p-value is presented in Table 8.

Curve estimation
Linear, logarithmic, inverse, quadratic, cubic, power, growth, exponential and logistic regression models were fitted to the dataset, with "Top" as the dependent variable and age as the independent variable. The summary of the variables used is presented in Table 9. From Table 3, the youngest patient recorded is 3 years old while the oldest is 95 years old. Cancer affected both young and old patients, but the age with the highest number of cancer cases is 60 years. This information is represented in Fig. 2.
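The article does not print the model equations. Assuming the fits follow the standard definitions used by the SPSS Curve Estimation procedure (an assumption based on the stated use of SPSS version 20), the nine models can be written as follows, where Y denotes "Top", t denotes age, the b_i are the estimated parameters, and u is the optional upper bound of the logistic model:

```latex
\[
\begin{aligned}
\text{Linear:}      &\quad Y = b_0 + b_1 t \\
\text{Logarithmic:} &\quad Y = b_0 + b_1 \ln t \\
\text{Inverse:}     &\quad Y = b_0 + \frac{b_1}{t} \\
\text{Quadratic:}   &\quad Y = b_0 + b_1 t + b_2 t^2 \\
\text{Cubic:}       &\quad Y = b_0 + b_1 t + b_2 t^2 + b_3 t^3 \\
\text{Power:}       &\quad Y = b_0\, t^{b_1} \\
\text{Growth:}      &\quad Y = e^{\,b_0 + b_1 t} \\
\text{Exponential:} &\quad Y = b_0\, e^{\,b_1 t} \\
\text{Logistic:}    &\quad Y = \frac{1}{1/u + b_0\, b_1^{\,t}}
\end{aligned}
\]
```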

Simple linear regression
The summary of the simple linear regression model is presented in Table 10.
The corresponding analysis of variance (ANOVA) table, which tests the fit of the model, is presented in Table 11. The linear regression model is significant at the 0.05 level of significance, with an R-square value of 3%.
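As an illustration of what the model summary and ANOVA table report, here is a minimal sketch of simple linear regression by ordinary least squares in plain Python. The ages and "Top" codes are hypothetical values chosen for easy checking, not records from the dataset, and the function name is illustrative.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with R-square and the ANOVA F statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy                # explained sum of squares
    ss_residual = syy - ss_regression
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": ss_regression / syy,
        # F statistic on 1 and n - 2 degrees of freedom.
        "f_statistic": ss_regression / (ss_residual / (n - 2)),
    }

# Hypothetical data: patient ages and numeric codes for the cancer site.
ages = [20, 30, 40, 50, 60]
top_codes = [2, 4, 5, 4, 5]
print(simple_linear_regression(ages, top_codes))
```

For a simple linear regression, F = R²(n − 2)/(1 − R²), so a small R-square can still be significant in a large sample. If roughly 1670 cases entered the fit (an assumption, since the number of cases used is not restated here), an R-square of about 0.03 corresponds to F of about 52.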

Logarithmic model
The summary of the logarithmic model is presented in Table 12.
The parameter estimates are given in Table 13. The ANOVA table for the logarithmic model is presented in Table 14.
The logarithmic model is significant at the 0.05 level of significance, with an R-square value of 1.7%.

Inverse model
The summary of the inverse model is presented in Table 15.
The parameter estimates for the inverse model are presented in Table 16.
The corresponding ANOVA table is presented in Table 17. The inverse model is not significant, as its p-value is greater than the level of significance (0.05).
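The logarithmic and inverse models are both linear in a transformed predictor, so each can be fitted with ordinary least squares after replacing age by ln(age) or 1/age. The sketch below shows this under the assumption that the models take the forms Y = b0 + b1 ln(t) and Y = b0 + b1/t; the helper names and the synthetic data are illustrative only.

```python
import math

def ols(u, y):
    """Least-squares fit of y = intercept + slope * u; returns (intercept, slope, r_squared)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = suy / suu
    intercept = mean_y - slope * mean_u
    return intercept, slope, suy ** 2 / (suu * syy)

def fit_logarithmic(x, y):
    """Fit y = b0 + b1 * ln(x); requires every x > 0."""
    return ols([math.log(v) for v in x], y)

def fit_inverse(x, y):
    """Fit y = b0 + b1 / x; requires every x != 0."""
    return ols([1.0 / v for v in x], y)

# Synthetic check: data generated exactly from each model are recovered.
x = [1, 2, 4, 8, 16]
print(fit_logarithmic(x, [2 + 3 * math.log(v) for v in x]))
print(fit_inverse(x, [1 + 10 / v for v in x]))
```

Both transforms are defined for the ages reported in this dataset, since the youngest recorded patient is 3 years old.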

Quadratic model
The summary of the quadratic model is presented in Table 18.
The parameter estimates for the quadratic model are presented in Table 19.
The corresponding ANOVA table is presented in Table 20.
The quadratic model is significant at the 0.05 level of significance, with an R-square value of 3.8%.

Cubic model
The summary of the cubic model is presented in Table 21.
The parameter estimates for the cubic model are presented in Table 22.
The corresponding ANOVA table is presented in Table 23. The cubic model is significant, with an R-square value of 3.9%.
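The quadratic and cubic models are polynomial regressions. A minimal sketch of fitting them by solving the least-squares normal equations with Gaussian elimination is given below; the function names and data are hypothetical, and this is not a description of how SPSS computes the fit.

```python
def polyfit_least_squares(x, y, degree):
    """Least-squares polynomial coefficients [b0, b1, ..., b_degree] via the normal equations."""
    m = degree + 1
    # Normal equations A b = c with A[j][k] = sum(x^(j+k)) and c[j] = sum(y * x^j).
    a = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    c = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            c[r] -= factor * c[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        partial = sum(a[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = (c[r] - partial) / a[r][r]
    return coeffs

def r_squared(x, y, coeffs):
    """Share of the variance of y explained by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    fitted = [sum(b * xi ** p for p, b in enumerate(coeffs)) for xi in x]
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total

# Hypothetical data; with real ages (up to 95) rescale first, e.g. age / 100,
# because the normal equations become ill-conditioned for large powers.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for degree in (1, 2, 3):
    print(degree, r_squared(x, y, polyfit_least_squares(x, y, degree)))
```

Because the linear model is nested in the quadratic and the quadratic in the cubic, R-square cannot decrease as the degree increases, which is consistent with the reported values of 3%, 3.8% and 3.9%.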

Power model
The summary of the power model is presented in Table 24.
The parameter estimates for the power model are presented in Table 25.
The corresponding ANOVA table is presented in Table 26. The power model is significant at the 0.05 level of significance, with an R-square value of 2.5%.

Growth model
The model summary for the growth model is presented in Table 27.
The parameter estimates for the growth model are presented in Table 28.
The corresponding ANOVA table is presented in Table 29.
The growth model is significant at the 0.05 level of significance, with an R-square value of 4.7%.

Exponential model
The model summary for the exponential model is presented in Table 30.
The parameter estimates for the exponential model are presented in Table 31. The corresponding ANOVA table is presented in Table 32. The exponential model is significant at the 0.05 level of significance, with an R-square value of 4.7%.
Remarks on Hypothesis Testing I: The null hypothesis (H0) is rejected since the p-value (reported as 0.000) is less than the level of significance (0.05). Therefore, it can be concluded that there is a significant association between the topological location of cancer and the gender of the patients.
Remarks on Hypothesis Testing II: Since the p-value is also less than 0.05, we conclude that there is a significant association between the topological location of cancer and the age of the patients.

Logistic model
The model summary for the logistic model is presented in Table 33. The parameter estimates for the logistic model are presented in Table 34. The corresponding ANOVA table is presented in Table 35.
The logistic model is also significant at the 0.05 level of significance, with an R-square value of 4.7%.
Lastly, all the fitted models are illustrated in Fig. 4.
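The growth, exponential and logistic models all report the same R-square of 4.7%. One plausible explanation, assuming the models were fitted in their linearized SPSS forms, that R-square is reported on the transformed scale, and that no upper bound was set for the logistic model (none of which is stated in the article), is that all three regress a logarithm of "Top" on age. The sketch below illustrates this with hypothetical data; it requires every response value to be positive.

```python
import math

def ols(u, v):
    """Least-squares fit of v = intercept + slope * u; returns (intercept, slope, r_squared)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    svv = sum((vi - mean_v) ** 2 for vi in v)
    slope = suv / suu
    return mean_v - slope * mean_u, slope, suv ** 2 / (suu * svv)

def fit_log_response_models(x, y):
    """Fit growth, exponential and (unbounded) logistic models through ln(y)."""
    log_y = [math.log(v) for v in y]
    a, b, r2 = ols(x, log_y)
    # Growth: ln(Y) = b0 + b1 * t.
    growth = {"b0": a, "b1": b, "r_squared": r2}
    # Exponential: ln(Y) = ln(b0) + b1 * t, the same line with b0 exponentiated.
    exponential = {"b0": math.exp(a), "b1": b, "r_squared": r2}
    # Logistic without upper bound: ln(1 / Y) = ln(b0) + ln(b1) * t.
    a_neg, b_neg, r2_neg = ols(x, [-v for v in log_y])
    logistic = {"b0": math.exp(a_neg), "b1": math.exp(b_neg), "r_squared": r2_neg}
    return growth, exponential, logistic

# Hypothetical ages and numeric site codes.
ages = [20, 30, 40, 50, 60]
top_codes = [2, 4, 5, 4, 5]
for name, fit in zip(("growth", "exponential", "logistic"),
                     fit_log_response_models(ages, top_codes)):
    print(name, fit)
```

Under these assumptions the three fits describe the same curve with different parameterizations, so equal R-square values do not indicate three independently good models.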

Table 16. Parameter estimation using the inverse model. The independent variable is age.

Table 25. Parameter estimation for the power model. The independent variable is age.

Table 31. Parameter estimation for the exponential model. The independent variable is age.

Important points
More female than male patients were recorded with cancer.
The age with the highest recorded incidence of cancer is 60 years.
The part of the body most frequently affected by cancer is the prostate gland (based on the data set collected).
There is a significant association between the topological location of cancer and the gender of the patients.
There is a significant association between the topological location of cancer and the age of the patients.
All the models fitted to the data produced low R-square values; nevertheless, based on R-square, the best-fitting models are the growth, exponential and logistic models (4.7% each).