Forecasting Confirmed Malaria Cases in Northwestern Province of Zambia: A Time Series Analysis Using 2014–2020 Routine Data

Background. Malaria remains a significant public health problem, especially in resource-poor settings. We aimed to forecast the year 2021 monthly confirmed malaria cases in the northwestern province of Zambia. Methods. The total number of confirmed monthly malaria cases recorded at health facilities over a 7-year period (January 2014 to December 2020) was taken from the District Health Information System version 2 (DHIS.2) database. The Box–Jenkins autoregressive integrated moving average (ARIMA) method was used to forecast monthly confirmed malaria cases for 2021. STATA software version 16 was used for analyzing the time series data. Results. Between 2014 and 2020, there were 3,795,541 confirmed malaria cases in the northwestern province, with a monthly mean of 45,185 cases. ARIMA (2, 1, 2) (0, 1, 1) was the best fit and the most parsimonious model. The forecasted mean monthly confirmed malaria cases were 60,284 (95% CI 30,969–121,944), and the total forecasted confirmed malaria cases were 723,413 (95% CI 371,626–1,463,322) for the year 2021. Conclusion. The forecasted confirmed malaria cases suggest that there will be an increase in the number of confirmed malaria cases for the year 2021 in the northwestern province. Therefore, there is a need for concerted efforts to prevent and eliminate the disease if the goal to eliminate malaria in Zambia by 2030 is to be realized.


Background
Malaria remains a significant public health challenge in low- and middle-income countries (LMICs) despite some advancements in diagnostic and treatment modalities and resources applied to prevention, control, and elimination [1–3]. Malaria is a disease caused by infection with parasites of Plasmodium species and transmitted through the bites of infected Anopheles mosquitoes [4,5]. Though preventable and treatable, malaria continues to contribute significantly to global morbidity and mortality, especially in LMICs, mainly in sub-Saharan Africa, Latin America, and Asia [3,6,7]. Globally, in the year 2019, approximately 229 million people were estimated to have had malaria, with about 409,000 reported deaths due to the disease [8]. The African region has a disproportionately large share of the global malaria burden, accounting for 94% of malaria cases and deaths in 2019 [8]. As of 2019, malaria transmission was endemic in 87 countries and territories, mainly in Africa, Asia, and South America. Malaria has substantial economic consequences at individual, household, and country levels, such as low individual productivity, household food insecurity, and poor economic growth, respectively [9]. It is estimated that the annual cost of control and elimination interventions is US$ 3 billion, whilst a recent systematic review estimated the annual per capita cost of control and elimination at US$ 2.21 and US$ 3.00, respectively [8,9].
There have been significant prevention and control intervention measures and efforts to eliminate malaria [7,10,11]. Malaria elimination is one of the key priorities in Zambia's National Health Strategic Plan, and it is the main objective of the 2017-2021 National Malaria Elimination Strategic Plan [11,12]. With support from international organizations and partners such as the Global Fund to Fight AIDS, Tuberculosis and Malaria, the United States President's Malaria Initiative, and the World Bank, Zambia utilizes vital interventions such as indoor residual spraying (IRS), distribution of long-lasting insecticidal nets (LLINs), prompt diagnosis and treatment of malaria cases, and intermittent preventive treatment of malaria in pregnancy (IPTp), among others, to fight the disease [11,13]. However, in certain parts of Zambia, such as the northwestern province, malaria cases have persisted, and in some years, even resurgence has been recorded [5,14]. Malaria is still one of the leading causes of morbidity and mortality, especially in children aged below five years, with the peak of transmission during the rainy season, between December and April every year [2,15].
There is a need to estimate future malaria occurrence for planning and intervention purposes and to mobilise additional resources for malaria elimination programming. The autoregressive integrated moving average (ARIMA) time series method can produce an estimation model from known monthly malaria cases [16]. Forecasting malaria cases is essential for allocating appropriate preventive and control measures and for eventual elimination strategies [5]. Time series analysis can predict malaria cases in a particular month based on preceding months [16,17]. There is heterogeneity in malaria occurrence in Zambia in terms of geospatial distribution and temporal effects of temperature and precipitation [2,11]. Therefore, we designed this study to develop a temporal model for forecasting confirmed malaria cases based on previous malaria cases in the northwestern province of Zambia, one of the subnational regions with malaria resurgence.

Study Area.
The study was conducted using data from the northwestern province of Zambia, one of the ten provinces of the country. The province is located at approximately 13.005°S, 24.9042°E, covers an area of 125,826 square kilometres, and had an estimated population of 950,789 in 2020 [18]. More than three-quarters (77%) of the province is rural, and it has an average annual rainfall of more than 1200 mm, which is above the national average of about 1000 mm [2]. The higher rainfall predisposes the province to an increased risk of malaria; thus, predicting malaria in one of the high-burden areas can help the country in the fight against the disease. Resistance to commonly used insecticides such as pyrethroids has also been documented in Zambia in areas adjacent to the northwestern province. However, due to limited research conducted in the country, the authors did not find literature on insecticide resistance in the province [19,20].
The Churches Health Association of Zambia (CHAZ) complements the Ministry of Health in managing malaria programs in this region. Over the time period considered in this study, mass distribution campaigns of LLINs to households were carried out in 2014 and 2017, while continuous distribution to children below five years and pregnant women was done at health facilities during under-five clinics and antenatal clinics, respectively [21]. Other services, such as IPTp, case management, and social behavioural change communication, were provided on a routine basis throughout the years. Indoor residual spraying is done annually in selected households [11]. Information on health-seeking behaviour among people in the northwestern province with suspected symptoms of malaria is not available; however, it has been noted that the presence of chronic carriers of malaria parasites among the adult population may lead to underestimation of the true number of malaria cases in the population over the time period of the study [22]. Among children under the age of five years, a national survey in 2015 found that only about 22% of those who had had a fever in the two weeks preceding the survey had sought treatment from health facilities [5]. Despite these limitations in available data, the monthly reported cases of confirmed malaria in the District Health Information System (DHIS) are helpful to health authorities when planning for those who seek medical care at health facilities.

Data Collection.
All health facilities in Zambia collect routine attendance data and report monthly to the district health authorities. The district health offices, in turn, collate the inputs from all health facilities within their jurisdictions and report to the Ministry of Health Headquarters via the web-based District Health Information System version 2 (DHIS.2). Malaria cases confirmed by either rapid diagnostic tests (RDTs) or microscopy, whether in the communities by community health volunteers or at health facilities, are reported through the system. CHAZ supports the Ministry of Health in managing malaria programs in three provinces, namely, eastern, northwestern, and southern provinces. The number of monthly confirmed malaria cases (RDTs and microscopy) from January 2014 to December 2020 was extracted from the DHIS.2 database by CHAZ monitoring and evaluation staff, using a Microsoft Excel data extraction sheet. There were no policy changes in malaria diagnosis using either RDTs or microscopy during the study period, as RDTs were introduced before 2014 [23].

Modelling of Time Series.
The ARIMA model, which is also known as the Box-Jenkins methodology, was used to model the time series and was applied to confirmed malaria cases [16]. This methodology is based on the presence of autocorrelation within the time series [24]. The observations were equally spaced in time, and the model was constructed on stationary data (constant mean and variance over time) [24]. It was applied to count data treated as continuous (number of confirmed cases per month). Studies have suggested that there should be at least 50 observations; our study had 84 observations [25].

Notation of the ARIMA.
The notation is as follows: p stands for the order of the autoregressive (AR) component, d stands for the order of differencing, and q stands for the order of the moving average (MA) component, which captures the error, that is, the difference between the observed and estimated values [26].
A seasonal ARIMA model is represented by ARIMA (p, d, q) (P, D, Q) s, where p and P represent the nonseasonal and seasonal autoregressive orders, respectively; d and D are the nonseasonal and seasonal differencing orders, respectively; q and Q are the nonseasonal and seasonal moving average orders, respectively; and s represents the length of the seasonal period, which in this case is 12.
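For reference, the seasonal ARIMA notation can be written compactly with a backshift operator. The form below is a standard textbook expression, not an equation reported in this study; the symbols B, φ, Φ, θ, Θ, y_t, and ε_t are introduced here purely for illustration, and sign conventions for the polynomials vary between texts and software.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t,
\qquad B^{k}y_t = y_{t-k},
\\[4pt]
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\qquad
\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}
```

Here y_t denotes the modelled monthly series and ε_t a white-noise error term; Φ_P and Θ_Q are defined analogously to φ_p and θ_q, but in powers of B^s.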
The Box-Jenkins methodology was used to build the ARIMA model through four main steps: identification, estimation, diagnostics, and forecasting (Figure 1).
Step 1. ARIMA model identification
ARIMA model identification requires that the data are stationary, that is, that there are no systematic changes in the mean and variance and no periodic variation. This was achieved by first-order differencing of the data. After first-order differencing, the graphical tools used to identify the model were the autocorrelation function (ACF) and the partial autocorrelation function (PACF). We used the ACF and PACF to decide the suitability of the moving average and autoregressive components, respectively. To further confirm the stationarity of the data, the augmented Dickey-Fuller test was conducted, which has a null hypothesis that the data are not stationary. The test statistic (−4.024) was less than the critical value (−3.535) at the 1% level, and the p value was statistically significant (p = 0.003), suggesting stationarity of the data.
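The differencing and autocorrelation steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the study used STATA, the function names `difference` and `sample_acf` are hypothetical, and the toy series is invented rather than taken from the DHIS.2 data.

```python
def difference(series, lag=1):
    """Return the lag-differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean)
                  for t in range(k, n))
        acf.append(num / denom)
    return acf


# Invented toy series: a linear trend plus an alternating component.
toy = [100 + 10 * t + (5 if t % 2 else -5) for t in range(12)]

# First-order differencing removes the linear trend.
diffed = difference(toy)

# Autocorrelations of the differenced series at lags 0 to 3.
acf_values = sample_acf(diffed, 3)

# Approximate 95% band used to judge significance on a correlogram.
band = 1.96 / len(diffed) ** 0.5
print(diffed, [round(r, 3) for r in acf_values], round(band, 3))
```

Passing `lag=12` to `difference` gives the seasonal difference for monthly data. A formal unit-root test such as the augmented Dickey-Fuller test would still be needed to confirm stationarity.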
Step 2. Model estimation
The 8 tentative models obtained from the ACF and PACF plots were estimated. All models were considered, and the final model was the one with the highest log-likelihood value, the lowest volatility, the most significant coefficients, and the lowest BIC and AIC values.
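As an illustration of how information criteria rank candidate models, the sketch below computes AIC and BIC from a log-likelihood using their standard definitions. The model names, log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not values from Table 1.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates (NOT the study's fitted models).
candidates = {
    "model_A": {"loglik": -700.0, "k": 4},
    "model_B": {"loglik": -695.0, "k": 6},
}
n_obs = 71  # hypothetical effective sample size

scores = {
    name: (aic(m["loglik"], m["k"]), bic(m["loglik"], m["k"], n_obs))
    for name, m in candidates.items()
}

# Lower AIC (and BIC) indicates the preferred model.
best_by_aic = min(scores, key=lambda name: scores[name][0])
print(scores, best_by_aic)
```

Both criteria reward a higher log-likelihood and penalise extra parameters; BIC penalises them more heavily as the sample size grows, so the two can disagree.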
Step 3. Model diagnostic checks
To test the adequacy of the selected ARIMA model, we plotted the ACF of the residuals of the fitted model and checked the residuals for normal distribution. We then conducted the portmanteau (Q) test to test for the presence of white noise. In addition, the Q-Q plot and Shapiro-Wilk test were used to test for normality of the residuals. After the diagnostic tests were completed, the ARIMA model was considered appropriate when the diagnostic results were within acceptable limits.
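A portmanteau test summarises residual autocorrelation over several lags in a single statistic. The sketch below assumes the Ljung-Box form of the Q statistic, which the text does not state explicitly; the function names and toy residuals are hypothetical.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    total = sum(autocorr(residuals, k) ** 2 / (n - k)
                for k in range(1, max_lag + 1))
    return n * (n + 2) * total


# Invented residuals that alternate in sign, i.e. clearly not white noise.
toy_residuals = [1, -1, 1, -1, 1, -1]
q_stat = ljung_box_q(toy_residuals, 1)
print(round(q_stat, 4))
```

Under the null hypothesis of white noise, Q is compared with a chi-squared distribution; a large Q (small p value) indicates remaining autocorrelation in the residuals.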

Step 4. Forecasting
We used the selected model to forecast malaria cases and evaluated its forecast accuracy by dividing the data into two periods: model development was based on data from January 2014 to December 2020, and forecasting was done for January 2021 to December 2021. Forecast accuracy was assessed using the mean absolute percentage error (MAPE). All data analyses were conducted using STATA 16 (STATA Corp, College Station, Texas, USA), and p < 0.05 was considered statistically significant.
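The MAPE is the average of the absolute forecast errors expressed as a percentage of the observed values. A minimal sketch follows; the function name `mape` and the example numbers are hypothetical and are not the study's data.

```python
def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    terms = [abs((o - f) / o) for o, f in zip(observed, forecast)]
    return 100.0 * sum(terms) / len(terms)


# Invented monthly counts: relative errors of 10%, 10%, and 0%.
example_observed = [100, 200, 400]
example_forecast = [110, 180, 400]
example_mape = mape(example_observed, example_forecast)
print(round(example_mape, 2))
```

Because each error is scaled by the observed value, MAPE can be compared across series of different magnitudes, but it is undefined for months with zero observed cases.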

Ethical Considerations.
Approval to conduct this study was obtained from the ERES Converge Institutional Review Board Committee (Ref. no. 2020 Nov 003). No personal data were used as the data were in aggregate form; therefore, confidentiality was maintained.

Results
From January 2014 to December 2020, there were 3,795,541 malaria cases for all ages in the northwestern province. The time series plot of monthly confirmed malaria cases showed seasonal patterns. The augmented Dickey-Fuller (ADF) test results showed the presence of a unit root (z(t) = −0.852, p = 0.835, lags = 15), suggesting that the series was not stationary (Figure 2(a)). After first differencing, the series became stationary (Figure 2(b)), and all further statistical analyses were conducted on stationary data.
To identify the appropriate lags for the AR and MA processes, we used correlograms (plots of the ACF and PACF against lag length) (Figure 3). The first two lags of the ACF were significant (outside the 95% CI band). For the PACF, the first two lags and lag 14 were significant, with decay over time. Based on the ACF and PACF plots, tentative models were identified.
Candidate models were compared using the log-likelihood, sigma-squared (volatility), number of significant coefficients, Akaike information criterion (AIC), and Bayesian information criterion (BIC); first-order differenced models were assessed first, followed by nondifferenced models. Model selection was based on the highest log-likelihood, the lowest sigma-squared (volatility), the highest number of significant coefficients, and the lowest AIC and BIC. The ARIMA (2, 1, 2) (0, 1, 1) 12 model was identified as the most suitable (Table 1).
Plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the ARIMA model residuals showed that the autocorrelations and partial autocorrelations were not significantly different from zero (white noise) (Figure 4). The portmanteau Q-test (Q15 = 18.4, p = 0.241) favoured the null hypothesis of no autocorrelation in the residuals. The residual histogram plot showed no volatility clustering, and the residuals were assumed to be homoscedastic (Figure 5).
The skewness-kurtosis test (χ² = 4.4, p = 0.133) was consistent with the normality assumption for the model residuals. The mean number of malaria cases for the 2021 forecast period was 60,284. The model estimated an increase in malaria cases compared to the mean of the previously observed months, as shown in the plot of the observed and forecasted values of malaria cases (Figure 5). A t-test of the observed and projected mean malaria cases showed no significant difference (p > 0.05) (Table 2). The estimated mean absolute percentage error (MAPE) was 10.4% (SD = 6.2).
A forecast for the year 2021 using the ARIMA (2, 1, 2) (0, 1, 1) 12 model is given in Table 2. According to this forecast, the mean monthly confirmed malaria cases in 2021 are expected to be 60,284 (95% CI 30,969–121,944), which is higher than the monthly mean of 45,185 cases from 2014 to 2020. This suggests a 33% increase in anticipated malaria cases from previous years.
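The 33% figure follows directly from the totals reported in this article. The short check below reproduces the arithmetic; the variable names are hypothetical, and the inputs are the reported totals for 2014–2020 and for the 2021 forecast.

```python
# Totals reported in the article.
total_cases_2014_2020 = 3_795_541   # confirmed cases, Jan 2014 to Dec 2020
months_observed = 84                # 7 years x 12 months
forecast_total_2021 = 723_413       # forecasted confirmed cases for 2021

# Monthly means for the observed and forecast periods.
historical_monthly_mean = total_cases_2014_2020 / months_observed
forecast_monthly_mean = forecast_total_2021 / 12

# Relative change of the forecast mean against the historical mean.
relative_increase_pct = (
    (forecast_monthly_mean - historical_monthly_mean)
    / historical_monthly_mean * 100
)

print(round(historical_monthly_mean), round(forecast_monthly_mean),
      round(relative_increase_pct, 1))
```

The comparison is between point estimates only; the wide 95% confidence interval around the forecast mean means the size of the increase is uncertain.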

Discussion
This study set out to forecast malaria cases in the northwestern province, one of the ten provinces in Zambia and one with high malaria transmission, using the ARIMA model, which models the temporal dependence structure of a time series [27,28]. Herein, we developed an ARIMA model that offers a simple tool for forecasting the probable number of malaria cases per month based on previously observed malaria cases over several years. Our ARIMA (2, 1, 2) (0, 1, 1) 12 model was the best in forecasting malaria cases in the study area. The developed model was validated and seemed to fit the data well, given the tolerable level of forecasting error. Other studies have found ARIMA (0, 1, 1) (0, 1, 0) 12 [27] and (2, 1, 1) (0, 1, 1) 12 [29] to be the best for forecasting malaria. Also, studies from Ghana [30] and Ethiopia [31] have used the ARIMA model for forecasting malaria cases.
Furthermore, ARIMA models have been used in the assessment and prediction of new HIV infections in Korea [32], forecasting of monthly dengue infections in Brazil [33], the occurrence of haemorrhagic fever in China [34], and hantavirus outbreaks in Chile [35]. ARIMA models have one main advantage: they account for periodic changes, trends, and random disturbances in the time series. To the best of our knowledge, as informed by our search of published literature, this is the first time that ARIMA models have been applied to project confirmed malaria cases on a monthly basis for a given area in Zambia. The Ministry of Health can adopt this approach at provincial and district levels in planning for malaria interventions and management.
This study established that ARIMA can predict monthly confirmed malaria cases in the northwestern province of Zambia. The fitted model indicates the need for first-order differencing to make the data stationary and then a second-order autoregressive term and a second-order moving average term to accommodate serial correlation in the data. This suggests that expected malaria cases for each month are directly influenced by confirmed malaria cases in the preceding months and by the prediction errors of the current and preceding months. Our model can be used to inform, advocate, and plan for interventions for the year 2021 and going forward. This study found that the forecasted number of confirmed malaria cases for the northwestern province in 2021 is higher than that of the previously observed periods. The forecasted malaria cases are in keeping with a recent upsurge in malaria cases reported in the secondary analysis of three recent Malaria Indicator Surveys in Zambia [2]. The increase is despite the recent implementation of prevention and treatment interventions such as the mass distribution of insecticide-treated nets [21], prompt diagnosis and treatment of malaria cases, indoor residual spraying, and intermittent preventive therapy for pregnant women [2]. Malaria is endemic to Zambia, with transmission all year round. Still, the northwestern province is among the provinces with the highest malaria cases, with peak transmission during the rainy season between November and April [2]. Although a recent study in Zambia showed that climatic indicators such as rainfall and temperature were not significant determinants of increases in malaria, ARIMA models elsewhere have demonstrated otherwise, as these factors increase mosquito populations and influence their biting behaviour [36,37].
In this study, results indicate that the ARIMA (2, 1, 2) model gives a good forecast of malaria cases for January 2021-December 2021 (12 months) with a 9% margin of prediction error, i.e., MAPE = 9.4. The error was higher than the 4% reported in another study that used the ARIMA (2, 1, 1) (0, 1, 1) 12 model to forecast the occurrence of malaria cases in Bhutan [29]. One plausible explanation for the difference is that our observations covered a shorter period of 84 months compared to 168 months in the previous study, whose longer series may have yielded greater accuracy. In the future, studies should consider assessing the effect of other time-varying variables such as malaria treatment policy, vector control, and malaria drug resistance on malaria cases over time, and should also forecast using different models, such as ARIMA models with meteorological factors and ARIMA models without a constant term, to see how these models perform.

Limitation.
Our study has some limitations. Firstly, the ongoing malaria interventions could distort the forecasted values as we did not factor interventions into the model; however, the influence of the interventions in the province on confirmed malaria cases has been captured indirectly, as we used the actual number of confirmed cases in the province over the last seven years, during which the various interventions were ongoing. We believe our model will hold as long as the same types of interventions continue at the same levels. Secondly, the study did not consider meteorological variables such as rainfall, temperature, and humidity, which have been shown to influence malaria transmission. The effects of these factors on the numbers of confirmed malaria cases have likewise been captured indirectly through our use of actual monthly reported cases of confirmed malaria; therefore, unless there are drastic climatic changes that differ from those experienced in the last seven years, our model will still hold. Thirdly, this study predicted malaria cases at the provincial level, which is still a large and geospatially heterogeneous area. Different districts within the province might experience malaria differently. However, from the program planning and implementation perspective, the provincial health authorities guide how the districts and health facilities within the province operate; therefore, planning at the provincial level can have more impact in prioritising resources and in advocacy for resource mobilisation with the central government and cooperating partners. We, therefore, feel that our approach can have a greater impact at the provincial level than at lower levels such as districts and health facilities, given that these lower levels have less capacity than the provincial level to use data to engage central government and cooperating partners.

Conclusion
This study used historical data that reflect seasonal patterns at the provincial level to forecast malaria cases on a monthly basis. The predicted cases show an expected increase. Forecasted malaria cases provided in advance can help planners and implementers of malaria programs to mobilise resources and implement effective prevention and elimination measures. Further studies should evaluate the usefulness of incorporating a forecasting model such as this one into the existing malaria prevention and elimination programs, to assess its impact in reducing malaria and the cost of intervention measures.

Data Availability
The data used to support the findings of this study are available upon request from CHAZ management through e-mail ed@chaz.org.zm.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
DMM conceived the idea and developed the initial draft concept. RKZ, JM, HM, and CMM collected the data, and NM did the analysis. All the authors made substantial contributions to the development of the manuscript and reviewed and approved the final manuscript. KS contributed to all stages of the development and provided the overall leadership.