Time series analysis of malaria in Kumasi: Using ARIMA models to forecast future incidence

Abstract
Malaria is a disease caused by parasites of the genus Plasmodium and transmitted between humans by Anopheles mosquitoes. The study sought to assess trends in malaria incidence in the Kumasi Metropolis and to forecast future incidence. A retrospective comparative study design was employed using data from the Regional Health Directorate from January 2010 to December 2016. Trends in malaria incidence were analysed and compared by year and by month. The data were entirely secondary, drawn from the monthly malaria cases recorded at various hospitals in Kumasi. A quadratic trend model was used to forecast half-year malaria incidence, while an autoregressive integrated moving average (ARIMA) (1, 1, 2) model was used to forecast monthly malaria incidence for 2018 and 2019 in the Kumasi Metropolis. In the general pattern, July recorded the highest number of cases and January the lowest in each year. Also, 2010 was the best-performing year, since it recorded the lowest number of malaria cases (10,336). The projected number of malaria cases is 61,371.8 for the first half of 2018 and 77,842.0 for the second half. This model is recommended to the metropolitan health directorate and to researchers who want to monitor reported malaria cases in the metropolis and other parts of the world. Measures should be put in place to curb malaria during the periods of the year when incidence is highest.


PUBLIC INTEREST STATEMENT
Malaria is common in the tropical and subtropical areas of the world. Human infection begins when an infectious female mosquito takes a blood meal: the parasites in the mosquito's saliva enter the human bloodstream through the bite and migrate to the liver. Malaria is endemic in African countries. This study sought to statistically analyse malaria incidence in the Kumasi Metropolis in order to determine its trend pattern and project future incidence based on that trend. The study applied time series analysis to monthly malaria cases gathered from hospitals in Kumasi. The findings revealed that July records the highest number of malaria cases. Moreover, malaria cases in 2018 and 2019 are expected to be lower than previous cases.

Introduction
Malaria is caused by parasites of the genus Plasmodium, which are transmitted between humans by Anopheles mosquitoes (Thomas, 2014). P. falciparum and P. vivax are the species that most commonly cause malaria in humans. P. falciparum is the most dangerous because of the multi-drug resistance of this species (Medical Research Council, 2001). A severe episode of cerebral malaria can result in epilepsy, cerebral palsy, or intellectual or physical disabilities (Davies & Eaton, 2018). Most people affected by malaria live in the poorest, and sometimes most remote, parts of the world, which makes it harder to find support to cope with the disease (Davies & Eaton, 2018).
Malaria is both curable and preventable with medication therapy; however, a vaccine is not available. According to the World Health Organization, there were approximately 207 million cases of malaria in 2012, resulting in 627,000 deaths (World Health Organization, 2014). The overwhelming majority of these cases, about 90%, occur in Africa (Medical Research Council, 2001). Most of the deaths occur in children; however, the death rate in children has been reduced by 54% since 2000 (World Health Organization, 2014). The countries with the most confirmed cases are in sub-Saharan Africa and India. Moreover, malaria accounted for 2.05% of total global deaths in 2000 and for 9% of all deaths in Africa (WHO, 2003). WHO also estimated that the total cost of malaria in Africa was US$1.08 billion in 1995 and US$2 billion in 1997 (WHO, 1997). Malaria is therefore a massive problem that affects all segments of society, and it remains a major health challenge worldwide (World Health Organization, 2013): over three billion people in the world are at risk of malaria (World Health Organization, 2013). Despite local and international prevention efforts, the rates of illness and death from malaria remain alarmingly high (Adebayo, Akinyemi, & Cadmus, 2015).
Future values of variables such as monthly case counts are often predicted from their own history, which is what time series analysis does. This study was therefore conducted to identify trends in malaria cases in the Kumasi Metropolis over the period 2010 to 2016 using time series analysis, and to forecast future incidence for 2018 and 2019. A sound description of malaria incidence is important, as it supports proper planning and evaluation of programs to monitor and control the disease, especially in endemic zones.

Study design
A retrospective comparative study design was employed using malaria cases reported by hospitals in the Ghanaian city of Kumasi from January 2010 to December 2016. Trends in malaria incidence were analysed and compared by year and by month.

Malaria data
The analysis used secondary data: records of the monthly malaria cases reported from January 2010 to December 2016, obtained from the various hospitals in Kumasi. The data included the monthly number of malaria cases in the various age groups.

Autoregressive integrated moving average model
An autoregressive integrated moving average (ARIMA) model was fitted to the time series of malaria incidence in Kumasi. The model captures temporal dependence between successive observations (Helfenstein, 1991). Because of the transmissibility and seasonality of malaria, models with an ARIMA structure have more predictive power than other methods (Nobre, Monteiro, Telles, & Williamson, 2001); over the past decades, such models have been applied to predict numerous infectious and non-infectious diseases with similar periodic patterns (Luz, Mendes, Codeço, Struchiner, & Galvani, 2008; Ture & Kurt, 2006). Another advantage of the ARIMA approach is the relative simplicity and stability of the model in predicting malaria cases where limited resources have led to a lack of detailed data, making it difficult to estimate the parameters needed to build more complex malaria models (Pascual, Cazelles, Bouma, Chaves, & Koelle, 2008).
ARIMA models are widely regarded as among the most effective models for forecasting a univariate time series. The procedure involves identifying an appropriate model, estimating its parameters and verifying the model through diagnostic checks. This model was used to forecast malaria incidence for the year 2018.
The monthly reported cases from January 2010 to December 2016 were entered into a Minitab spreadsheet to find the best-fitting trend, which was the quadratic model. The seasonal indices (S.I.) were obtained by dividing the actual value at each time t by its trend forecast.
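The ratio-to-trend step described above can be sketched as follows. This is a minimal illustration, not the authors' Minitab procedure: it assumes a quadratic trend fitted by ordinary least squares to t = 1, 2, ..., n, averages the actual-to-trend ratios for each position in the seasonal cycle (period 12 for months, 2 for half-years), and rescales those averages so they have mean 1. The function names and the synthetic series are hypothetical.

```python
import numpy as np


def quadratic_trend(y):
    """Fit y_t = b0 + b1*t + b2*t^2 by least squares, with t = 1..n."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    coeffs = np.polyfit(t, y, deg=2)  # returned highest power first: [b2, b1, b0]
    return np.polyval(coeffs, t), coeffs


def seasonal_indices(y, period):
    """Ratio-to-trend seasonal indices, normalised to average 1."""
    y = np.asarray(y, dtype=float)
    trend, _ = quadratic_trend(y)
    ratios = y / trend  # actual value divided by its trend forecast
    # Average the ratios that fall in the same position of the cycle.
    raw = np.array([ratios[s::period].mean() for s in range(period)])
    return raw / raw.mean()


if __name__ == "__main__":
    # Synthetic half-year series (14 points); not the study data.
    t = np.arange(1, 15)
    pattern = np.tile([0.9, 1.1], 7)
    y = (30000 + 500 * t + 100 * t**2) * pattern
    print(seasonal_indices(y, period=2))
```

With period=2 the two returned values play the role of the first- and second-half indices; with period=12 and monthly data they correspond to the twelve monthly indices.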
The quadratic trend model for the half year was given as $y_t = 35{,}573 - 287t + 159t^2$.

To choose the best model for forecasting monthly malaria incidence in Kumasi for 2018, measures of accuracy were computed for ARIMA (1, 1, 1), ARIMA (1, 1, 2), ARIMA (0, 1, 1) and ARIMA (0, 1, 2). The results indicated that ARIMA (1, 1, 2) gave the best fit, as it had low values of MAPE, MAD and MSE (Tables 1 and 2), with the least MAD (1,351) and MSE (3,658,866) among all the models. It also had the minimum normalized BIC, 15.085. ARIMA (1, 1, 2) was therefore selected for forecasting monthly malaria incidence in 2018.
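The three accuracy measures used to compare the candidate models can be computed as below. The definitions assumed here are the usual ones (MAPE as the mean absolute percentage error, MAD as the mean absolute error, MSE as the mean squared error); the function name and the numbers in the example are hypothetical, and Minitab or SPSS may differ in small details such as the divisor used for MSE.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE in %, MAD, MSE) for paired actual/fitted values."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    # MAPE: average of |error / actual|, expressed as a percentage.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # MAD: average absolute error.
    mad = sum(abs(e) for e in errors) / n
    # MSE: average squared error.
    mse = sum(e * e for e in errors) / n
    return mape, mad, mse


# Hypothetical example: two candidate fits of the same short series.
actual = [100.0, 200.0, 150.0]
fit_a = [110.0, 190.0, 150.0]
fit_b = [90.0, 230.0, 140.0]
for name, fit in [("A", fit_a), ("B", fit_b)]:
    print(name, accuracy_measures(actual, fit))
```

Under the selection rule applied in the text, the candidate with the smallest values on these measures (together with the smallest normalized BIC) is preferred.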
Considering the results, ARIMA (1, 1, 2) appeared fit for forecasting monthly malaria incidence for the year 2018 in Kumasi metropolis.
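Hence the model for predicting future malaria incidence was

$$\nabla y_t = \mu + \alpha\,\nabla y_{t-1} + Z_t - \beta_1 Z_{t-1} - \beta_2 Z_{t-2}, \qquad \nabla y_t = y_t - y_{t-1},$$

where $\mu$ is a constant, $\alpha$ is the autoregressive parameter, $\beta_1$ and $\beta_2$ are the moving-average parameters, and $Z_t$ is the residual term.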
The fitted model is obtained by substituting the parameter estimates reported in Table 4.
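To show how an ARIMA (1, 1, 2) model turns into month-by-month forecasts, the sketch below applies the forecasting recursion for the differenced series. It assumes the parameterisation $\nabla y_t = \mu + \alpha \nabla y_{t-1} + Z_t - \beta_1 Z_{t-1} - \beta_2 Z_{t-2}$ with $\nabla y_t = y_t - y_{t-1}$ (moving-average terms subtracted, as in Minitab's convention). Future shocks $Z_t$ are replaced by their expected value of zero. The parameter values, residuals and series in the example are hypothetical, not the estimates in Table 4.

```python
def forecast_arima_112(y, resid, mu, alpha, beta1, beta2, steps):
    """h-step forecasts from an ARIMA(1, 1, 2) model with constant.

    Assumed model for the differenced series w_t = y_t - y_{t-1}:
        w_t = mu + alpha*w_{t-1} + Z_t - beta1*Z_{t-1} - beta2*Z_{t-2}
    y     : observed series (at least two values)
    resid : in-sample residuals Z_t aligned with y (at least two values)
    """
    if len(y) < 2 or len(resid) < 2:
        raise ValueError("need at least two observations and residuals")
    w_prev = y[-1] - y[-2]          # last observed difference w_T
    z1, z2 = resid[-1], resid[-2]   # Z_T and Z_{T-1}
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        w = mu + alpha * w_prev - beta1 * z1 - beta2 * z2
        level += w                   # undo the differencing
        forecasts.append(level)
        w_prev = w
        z2, z1 = z1, 0.0             # future shocks are replaced by zero
    return forecasts


# Hypothetical values, for illustration only.
series = [5200.0, 5600.0, 6100.0, 5900.0]
residuals = [0.0, 150.0, 120.0, -80.0]
print(forecast_arima_112(series, residuals, mu=20.0, alpha=0.5,
                         beta1=0.3, beta2=0.2, steps=3))
```

Because the moving-average part has order 2, the observed residuals affect only the first two forecasts; after that, the forecasts are driven by the constant and the autoregressive term alone. A twelve-month lead period corresponds to steps=12.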

Quadratic model
Also, to choose the best model for forecasting half-year malaria incidence for 2018 and 2019, measures of accuracy were computed for the linear and quadratic trend models; the results are summarized in Table 3.
The quadratic model appeared to fit the half-year data best, as it had the lower values of MAPE, MAD and MSE. The quadratic trend model used to forecast half-year malaria incidence was $y_t = 35{,}573 - 287t + 159t^2$.

For the monthly ARIMA models, the selection of the appropriate model also depended on the values of the normalized BIC and on the ACF and PACF, whose graphs are shown in Figures 1 and 2 respectively. Four tentative models were examined, and the model with the minimum normalized BIC was chosen.
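As a worked check of the half-year forecasting step, the sketch below evaluates the fitted quadratic trend $y_t = 35{,}573 - 287t + 159t^2$ and scales it by a seasonal index. It assumes that t counts consecutive half-years of the fitted series (t = 1, ..., 14 for the data) and that forecasts are multiplicative, trend × seasonal index, using the first-half index of 0.91541 reported for Figure 4. The function names are hypothetical.

```python
# Coefficients of the fitted half-year trend y_t = 35,573 - 287t + 159t^2.
B0, B1, B2 = 35573.0, -287.0, 159.0
FIRST_HALF_INDEX = 0.91541  # seasonal index for January-June


def quadratic_trend(t):
    """Trend value for half-year number t."""
    return B0 + B1 * t + B2 * t ** 2


def seasonal_forecast(t, index):
    """Multiplicative forecast: trend value scaled by the seasonal index."""
    return quadratic_trend(t) * index


for t in (15, 17):
    print(t, round(quadratic_trend(t)), round(seasonal_forecast(t, FIRST_HALF_INDEX), 1))
```

Evaluating at t = 15 and t = 17 gives trend values of 67,043 and 76,645, and seasonally adjusted values of 61,371.8 and 70,161.6, the first-half figures quoted for 2018 and 2019. The second-half figures follow in the same way from the second-half index.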
The normal probability plot of the residuals showed that the residuals did not deviate much from the straight line, indicating that the errors were close to normal with no clear outliers; the histogram of residuals confirmed that the normality assumption holds. The plot of residuals versus fitted values showed no trend in dispersion, indicating that the model satisfies the constant variance assumption. The plot of residuals versus the order of the data suggested that the residuals were uncorrelated, which is supported by the non-significant p-values of the Ljung–Box statistics (Table 5); thus the independence assumption is not violated. Since these assumptions hold, the model can be considered valid for prediction.

The final parameter estimates for the model (Table 4) show that the AR (1) and MA (2) parameters have p-values of 0.000, indicating that they are significant. The estimation converged after 25 iterations. The test statistic for the constant is 13.31, with a p-value of 0.000, indicating that the constant is significant at both the 5% and 10% levels.

Table 5 shows the modified Box–Pierce (Ljung–Box) $\chi^2$ statistic. All the lags have p-values greater than the 0.05 significance level; the residual autocorrelations are therefore not significant, implying that the model is appropriate.
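The Ljung–Box (modified Box–Pierce) statistic reported in Table 5 is computed from the residual autocorrelations $r_k$ as $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$. The plain-Python sketch below returns Q only, using a hypothetical residual series. To obtain a p-value, Q would then be compared with a chi-square distribution whose degrees of freedom equal h minus the number of estimated parameters; packages differ slightly in exactly which parameters they count.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / denom


def ljung_box_q(resid, h):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(resid)
    if not 1 <= h < n:
        raise ValueError("need 1 <= h < len(resid)")
    return n * (n + 2) * sum(autocorrelation(resid, k) ** 2 / (n - k)
                             for k in range(1, h + 1))


# Hypothetical residuals; a strongly alternating series gives a large Q.
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], h=1))  # 4.5
```

Small values of Q, and hence large p-values, are what indicate uncorrelated residuals, as reported for all lags in Table 5.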
The statistical software used for the analysis was Minitab Version 16, SPSS Version 16 and Microsoft Excel 2013. The data were grouped as monthly malaria cases and half-year malaria cases. The monthly reported cases from January 2010 to December 2016 were entered into a Minitab spreadsheet and analysed to obtain descriptive statistics. This was followed by graphical and tabular exploration of yearly, monthly and half-year incidence of malaria, and by forecasting of future incidence.

Ethical consideration
An introductory letter was sent to the various hospitals, even though this was a retrospective study using monthly aggregated, anonymous counts of clinical cases gathered from the various hospitals in Kumasi.

In Table 6, the highest standard deviation is seen in 2011, implying that the reported cases in that year had the greatest month-to-month variation, whereas the lowest standard deviation was recorded in 2012, implying that the monthly reported cases varied least in that year.

Malaria incidence in months
The monthly analysis of malaria cases covered 84 months, that is, 12 months for each year of the 7-year period. The seasonal indices obtained are shown in Figure 2. The seasonal index plot indicates that malaria cases in June, July, August, September, October and November are above the average mark of 100%, whereas those in January, February, March, April, May and December are below it. The indices were adjusted to average 100%, so a month with an index above 100% has above-average incidence and one below 100% has below-average incidence. Hence, from Figure 2, malaria incidence in July was 31.4% (131.37% - 100%) above the monthly average, while January incidence was 22.96% (100% - 77.043%) below the average. The same calculation can be done for all the other months. Thus July had the highest incidence of malaria and January the lowest.
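The adjustment step and the percentage readings above can be sketched as follows. The sketch assumes that the monthly indices are rescaled so that the twelve values sum to 1200 (that is, they average 100%). The raw averages in the test are hypothetical; only the July and January indices (131.37 and 77.043) come from the text.

```python
def normalise_monthly(raw_averages):
    """Rescale twelve raw monthly averages so that they average 100%."""
    if len(raw_averages) != 12:
        raise ValueError("expected twelve monthly values")
    factor = 1200.0 / sum(raw_averages)
    return [v * factor for v in raw_averages]


def percent_from_average(index_pct):
    """Positive: above the monthly average; negative: below it."""
    return index_pct - 100.0


print(round(percent_from_average(131.37), 2))   # July: 31.37
print(round(percent_from_average(77.043), 3))   # January: -22.957
```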
In general, the second half of the year records higher incidence of malaria than the first half.

Time series plot showing half year malaria incidence
The time series plot of the half-year reported cases from January 2010 to December 2016 is shown in Figure 3. The plot shows that the series is non-stationary.
From Figure 3, the series exhibits an increasing trend and seasonal variation, implying that half-year malaria incidence has both increased and fluctuated seasonally over the period.
The plot also shows that the second half of each year consistently recorded more malaria cases than the first half.

Half year malaria cases
The half-year analysis of malaria cases covered fourteen half-years, that is, two for each year from 2010 to 2016. The seasonal indices obtained are shown in Figure 4. The seasonal index plot in Figure 4 indicates that malaria cases in the first half of the year were below the average mark of 1 by 8.46% (1 - 0.91541), whereas those in the second half of the year were above the average mark of 1 by 8.56%.

Yearly malaria cases
Table 7 shows the total number of reported malaria cases for each year from 2010 to 2016. The yearly series is cyclical and also exhibits an increasing trend over the period under consideration.

Quadratic trends and the fitted graph
Figure 5 shows the average half-year trend of malaria cases in Kumasi together with its forecast. Table 8 gives the expected monthly forecasts of malaria incidence from the ARIMA (1, 1, 2) model. According to Table 8, the forecast shows a consistent but gradual increase in incidence from June to December and a decrease from January to May.

Trend analysis for the half year malaria incidence for 2018 and 2019
A forecast for each half of the next two years using the quadratic trend model is shown in Table 9. From Table 9, the projected number of malaria cases for the first half of 2018 is 61,371.8, while the second half of the year is expected to record 77,842.0 cases. In the first half of 2019, 70,161.6 cases are expected, rising to 88,959.4 cases in the second half of the year. This suggests a general reduction of malaria cases from previous years.

Discussion
Malaria, which is transmitted by mosquitoes, has afflicted humans for thousands of years.
Preliminary and further analyses were based on yearly, half-year and monthly malaria cases, and in all three groupings malaria cases showed a continuously increasing trend. Previous studies of malaria incidence and incidence rates across different countries, including Burundi in Africa, have suggested the existence of high-frequency (intra-annual) cycles as well as seasonal or annual cycles (Gomez-Elipe, Otero, van Herp, & Aguirre-Jaime, 2007; Roca-Feltrer, Schellenberg, Smith, & Carneiro, 2009), and some of these have been associated with cyclic factors in mosquito and transmission patterns, the natural environment, or both (Shumway, 1988). The yearly malaria cases recorded in this study were also cyclical: for instance, cases rose sharply in 2011 compared with 2010, whereas 2012 saw a decline from 2011. This alternating yearly incidence could be attributed either to a lack of vigilance about the disease among residents or to inconsistency in its management by health care authorities. This study also found seasonal variation in the general pattern of recorded malaria cases: among the months, July recorded the highest number of cases and January the lowest. This may be a result of the heavy rainfall in July, which falls within the main rainy season in Ghana, and of the temperatures associated with the rainy season. This supports Mabaso, Craig, Ross, and Smith (2007), who reported that rainfall seasonality and minimum temperature have been associated with the number of P. falciparum infective bites received by an individual during a season or a year.
Previous research has shown that climate is also a key factor in explaining the incidence of malaria, with temperature, precipitation, humidity and atmospheric pressure all possibly linked to the disease (Anyamba et al., 2006; Campbell-Lendrum & Woodruff, 2006; Craig et al., 1999; Gagnon et al., 2002; Hay et al., 2002; Jones et al., 2007; Mantilla et al., 2009; Pascual et al., 2006; Patz et al., 2005; Poveda et al., 2001; Thomson et al., 2006; Zhou et al., 2004).
According to Thomson, Mason, Phindela, and Connor (2005), Zhou et al. (2004), Small, Goetz, and Hay (2003) and Hay et al. (2002), rainfall influences the incidence of malaria because mosquitoes require standing water to complete their life cycle. However, a few studies, conducted in Sri Lanka, the Amazon Basin and Romania, found a negative relationship between rainfall and malaria (Briët, Vounatsou, Gunawardena, Galappaththy, & Amerasinghe, 2008; Olson et al., 2009). This can be linked to the changing landscape of these regions from forest to agriculture, as argued by Chaves, Cohen, Pascual, and Wilson (2008), Guerra et al. (2006) and Massarani and Shanahan (2006), and possibly to topography, as argued by Olson et al. (2009). According to Olson et al. (2009), because these areas have relatively flat ground, rainfall washes breeding sites away, so mosquitoes are unable to complete their life cycle.
For the half-year results, there was a consistent rise in malaria cases in the second half of each year compared with the first half. This may be due to climatic conditions in the second half of the year, when rains are usually frequent in the Metropolis and the harmattan sets in towards the end of the year; this period is usually associated with extreme weather conditions. Pascual et al. (2006) report that increased temperatures can translate into a 30-100% increase in mosquito abundance. This is supported by Patz and Olson (2006), who found that increased temperatures also shorten larval development, decreasing the time needed for adult mosquitoes to spread malaria and allowing more mosquitoes to develop. These previous studies are in line with this study, since the seasonal index for the first half of the year was below the average mark of 1 by 8.46%, whereas that for the second half was above it by 8.56%. This implies that malaria cases consistently rise in the second half of each year.
The time series model developed for predicting the number of monthly malaria cases was ARIMA (1, 1, 2), while the quadratic model was used to forecast half-year malaria incidence. ARIMA (1, 1, 2) can therefore be used as a forecasting model that projects future values of the series based entirely on its own inertia. ARIMA works best when the data exhibit a stable or consistent pattern over time with few outliers (Labys, 2006). This model can be used by researchers for forecasting reported malaria cases (Nobre et al., 2001); however, it should be updated from time to time with current data.

Conclusion and recommendations
Based on the findings of the study, we make a number of conclusions and recommendations.
The general trend of both monthly and half year malaria cases follows an increasing quadratic trend and there is seasonality in both cases. For the general pattern, July recorded the highest number of cases whereas January recorded the lowest cases in each year and the second half of each year records higher number of malaria cases.
Malaria cases for 2018 and 2019 were projected to be lower than in previous years: the projection for the first half of 2018 was 61,371.8 cases, rising to 77,842.0 in the second half. For the first half of 2019, 70,161.6 cases of malaria were projected, rising to 88,959.4 cases in the second half of the year.
The ARIMA (1, 1, 2) model was used for forecasting the number of expected monthly cases of Malaria. The model was used to predict a twelve-month lead period of malaria cases for 2018. The Quadratic model was also used for the forecasting of expected half year incidence of Malaria cases for 2018 and 2019. This model is recommended to the metropolitan health directorate and researchers who would want to monitor the malaria reported cases in the metropolis and other parts of the world.
Where spraying is needed, indoor residual spraying is recommended, as it minimizes environmental contamination. Indoor residual spraying should be applied before the rainy season to prevent and control epidemic outbreaks.
The use of insecticide-treated nets is also recommended, especially in the rainy season when malaria cases increase.