COVID-19 prevalence estimation: Four most affected African countries

The world has been confronted with several disease outbreaks that have posed, and still pose, a serious menace to public health globally. Recently, COVID-19, a new kind of coronavirus, emerged from Wuhan city in China and was declared a pandemic by the World Health Organization. About 8,622,985 cases had been reported, with 457,355 deaths globally, as of 15.05 GMT, June 19, 2020. South Africa, Egypt, Nigeria and Ghana are the African countries most affected by this outbreak. Thus, there is a need to monitor and predict COVID-19 prevalence in this region for effective control and management. Different statistical tools and time series models, such as the linear regression model and autoregressive integrated moving average (ARIMA) models, have been applied to predict disease prevalence/incidence in different disease outbreaks. In this study, we adopted the ARIMA model to forecast the trend of COVID-19 prevalence in the aforementioned African countries. The datasets examined in this analysis spanned February 21, 2020, to June 16, 2020, and were extracted from the World Health Organization website. ARIMA models with the minimum corrected Akaike information criterion (AICc) and statistically significant parameters were selected as the best models. Accordingly, the ARIMA (0,2,3), ARIMA (0,1,1), ARIMA (3,1,0) and ARIMA (0,1,2) models were chosen as the best models for South Africa (SA), Nigeria, Ghana and Egypt, respectively. Forecasts were made based on the best models. The ARIMA models appear appropriate for predicting the prevalence of COVID-19. The forecasts suggest a form of exponential growth in the trend of this virus in Africa in the days to come. Thus, the government and health authorities should pay attention to the pattern of COVID-19 in Africa. Necessary plans and precautions should be put in place to curb this pandemic in Africa.


Introduction
Infectious diseases are becoming prominent and have continued to infect and reduce human populations. The world has been confronted with several disease outbreaks that have posed, and still pose, a serious menace to public health globally. These include the severe acute respiratory syndrome (SARS), which resulted in 800 deaths out of approximately 8,000 cases in 2002; the H1N1 outbreak, which recorded about 18,500 deaths in 2009; the Middle East respiratory syndrome (MERS) outbreak, which led to the death of 800 people out of 2,500 cases in 2012; and the Ebola epidemic, with 11,310 deaths out of 28,616 cases in 2014 (Anjorin, 2020). Recently, another pandemic, called COVID-19, began. COVID-19 is a novel kind of coronavirus that has been recognized as belonging to a family of zoonotic coronaviruses. This family includes the severe acute respiratory syndrome coronavirus (SARS-CoV) and the Middle East respiratory syndrome coronavirus (MERS-CoV) (Ogundokun et al., 2020; Wang et al., 2020). The coronavirus disease outbreak had caused 457,355 deaths globally out of 8,622,985 positive cases identified as of 15.05 GMT, June 19, 2020 (WHO, 2020). COVID-19 is a respiratory infectious disease caused by a new strain of coronavirus that causes illness in humans. Scientists are still learning about the disease and think that the virus began in animals. At some point, one or more humans acquired the infection from an animal, and those infected humans began transmitting the infection to other humans. The disease spreads from person to person through infected air droplets that are projected during sneezing or coughing. It can also be transmitted when humans have contact with hands or surfaces that contain the virus and then touch their eyes, nose, or mouth with the contaminated hands (Ayinde et al., 2020).
COVID-19 was first reported in China, but it has now spread throughout the world. Before COVID-19 was officially declared a pandemic by the World Health Organization on March 11, 2020, the virus had already spread to more than 114 countries and territories.
Recently, Ayinde et al. (2020) applied the linear regression model and some other curve estimation models to predict the prevalence of COVID-19 in Nigeria. The SEIR model and a regression model have been used to make predictions for COVID-19 in India (Pandey et al., 2020). Linear regression analysis was adopted to predict the number of deaths in India due to SARS-CoV-2 (Ghosal et al., 2020). Among the various statistical tools used to predict epidemic cases is the autoregressive integrated moving average (ARIMA) model. Many researchers have applied the ARIMA model to predict disease prevalence/incidence in different disease outbreaks. These include Guan et al. (2004), Earnest et al. (2005), Gaudart et al. (2009), Liu et al. (2011), Nsoesie et al. (2013), Zheng et al. (2015), He and Tao (2018), Fang et al. (2020), Polwiang (2020) and Cao et al. (2020). Sato (2013) presented the process of using the ARIMA model in evaluating infectious and non-infectious disease management. The ARIMA model has been used to improve the forecast accuracy for several epidemic diseases (Pan et al., 2016). Perone (2020) estimated an ARIMA model to forecast the epidemic using Italian epidemiological data at the national and regional levels. Benvenuto et al. (2020) fitted an ARIMA model to the Johns Hopkins epidemiological data to predict the trend of the prevalence and incidence of COVID-2019. Rauf and Oladipo (2020) forecasted the spread of COVID-19 in Nigeria using the ARIMA model. The ARIMA model was also used to predict the pattern of confirmed cases of COVID-19 in different countries (Tania et al., 2020). Hitesh et al. (2020) developed an ARIMA model and then employed it to forecast future COVID-19 cases in India.
Ceylan (2020) applied the auto-regressive integrated moving average (ARIMA) model to predict the prevalence of COVID-19 in Italy, Spain, and France.
In this study, we focus on the estimation and prediction of COVID-19 prevalence in the countries that are Africa's epicentres: South Africa, Egypt, Nigeria, and Ghana. The datasets examined in this analysis spanned February 21, 2020, to June 16, 2020. According to Cao et al. (2020), ARIMA models have made significant progress in the health sciences, as well as in other fields, for efficient epidemic forecasting. This simple time series model was adopted in this study. Finally, this study will help to monitor the trends of this pandemic in Africa and provide reliable estimates and forecasts. It will further help the federal government to take decisions to curb the outbreak.

Methodology
The COVID-19 dataset employed in this study was taken from the WHO website (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/), and the analysis was done using Gretl software. The descriptive statistics of the COVID-19 data for the African COVID-19 epicentres between 28/02/2020 and 15/06/2020 are given in Table 1. According to Box et al. (2015), at least 30 observations are required for stable and effective ARIMA modelling. The sample size in this study is more than 30. We predicted the COVID-19 prevalence of South Africa, Egypt, Nigeria and Ghana over the next twenty days with 95% relative confidence intervals. Table 1 shows that, among the epicentres, South Africa had the highest number of confirmed COVID-19 cases as of June 15, 2020. There were days on which each of these countries did not record any cases of COVID-19. On average, 687, 437, 148 and 126 people are reported to have COVID-19 daily in South Africa, Egypt, Nigeria and Ghana, respectively.
The skewness values for each of the countries are higher than one (1), which shows that the daily cases reported in each country are highly skewed to the right. The kurtosis values further indicate that the data depart from a normal distribution. Among these countries, Ghana has the fewest reported cases daily. It is clear from Fig. 1 that Nigeria was the first among these countries to report a COVID-19 case, on February 28, 2020. Both South Africa and Egypt recorded their first cases on March 6, 2020, while Ghana's first case occurred four days later.
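The skewness and kurtosis reading above can be illustrated with a minimal sketch. It uses the simple moment-based definitions (Gretl or other packages may apply small-sample corrections, so values can differ slightly), and the `daily_cases` list is made-up data, not the WHO series used in this study.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments.

    Skewness above 0 indicates a longer right tail; excess kurtosis
    above 0 indicates heavier tails than a normal distribution.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical daily counts: many small values and a few large ones.
daily_cases = [0, 0, 1, 2, 3, 5, 8, 12, 20, 45, 90, 240]
skew, excess_kurt = skewness_and_kurtosis(daily_cases)
print(f"skewness={skew:.2f}, excess kurtosis={excess_kurt:.2f}")
```

A series dominated by low counts with occasional large spikes, as in the early phase of an outbreak, yields a skewness well above one under this definition.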
Also, from Fig. 1, we observed slow progression in the number of COVID-19 cases reported daily in the four countries until March 23. The number of cases reported daily until March 24 was in the tens. However, as seen in Fig. 1, South Africa then began to experience a peak, recording hundreds of COVID-19 cases daily. Fig. 1 confirms that the overall prevalence of COVID-19 used in this study does not show seasonal patterns. The box plot in Fig. 2 provides good insight into the number of outlying cases. For instance, COVID-19 cases in Nigeria were outlying on days 104, 106 and 107. About eleven (11) outlying cases were identified in South Africa. Four and eight outlying points were detected in Ghana and South-Africa, respectively. Fig. 3 is the time series plot of the cumulative confirmed cases of COVID-19.
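To make the notion of an "outlying" day concrete, the following sketch applies the usual box-plot rule, which flags points lying more than 1.5 interquartile ranges beyond the quartiles. This is an assumption about how the box plot was drawn; software packages differ in their quartile conventions, and the `counts` list is invented for illustration.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return (day, value) pairs outside the box-plot whiskers.

    Days are numbered from 1 in the order the values are given.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [(day, v) for day, v in enumerate(values, start=1)
            if v < lower or v > upper]


# Hypothetical daily counts with one unusually large day.
counts = [10, 12, 11, 13, 12, 14, 11, 13, 12, 95]
print(boxplot_outliers(counts))
```

With these made-up counts, only the tenth day falls outside the whiskers.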

ARIMA models
According to Fanoodi et al. (2019), time-series data are a set of observations indexed and ordered in time. Time series analysis seeks to analyse time-series data to reveal the characteristics of the data and obtain meaningful information. Time series forecasting employs a time series model to predict future values of the series based on previously observed values (Liu et al., 2011; Elevli et al., 2016; He and Tao, 2018; Benvenuto et al., 2020; Ceylan, 2020). The autoregressive integrated moving average (ARIMA) model is a popular time series model introduced by Box and Jenkins in the 1970s (Box et al., 2015). ARIMA models account for non-stationarity in the data by differencing the series once or twice. The model is applicable to all kinds of data, including those with trend, seasonality, and cyclicity (Ceylan, 2020). It is generally denoted ARIMA (p,d,q), where p is the order of the autoregressive model, d is the degree of differencing, and q is the order of the moving average (Li et al., 2019). The ARIMA model is a generalization of the autoregressive moving average (ARMA) model. The autoregressive (AR) part involves regressing the variable of interest $Y_t$ on its lagged values $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$. The moving average (MA) part involves regressing the series $Y_t$ on the current residual $\varepsilon_t$ and its lagged residual series $\varepsilon_{t-1}, \varepsilon_{t-2}, \varepsilon_{t-3}, \ldots, \varepsilon_{t-q}$. The integrated (I) part denotes taking the difference between the actual time series data and its lagged values. The differencing can be done once or twice. The ARMA(p,q) model consists of the AR and MA processes and is defined as follows:
$$Y_t = \alpha + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \tag{1}$$
where $\alpha$ is a constant, and $\phi_i$ and $\theta_j$ are the autoregressive and moving average parameters, respectively.
$Y_t$ is the observed value at time $t$ and $\varepsilon_t$ is the value of the residual at time $t$, such that $\varepsilon_t \sim N(0, \sigma^2)$. The ARIMA modelling approach includes model identification and model selection, parameter estimation, diagnostic checking, and forecasting. The first steps of model identification and model selection are to ensure that the time series variable is stationary and to identify whether the series is seasonal or not. The autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the dependent variable were adopted for model identification and selection. The parameters in the model are estimated using maximum likelihood estimation. We conduct statistical model checking by testing whether the estimated model conforms to the specifications of a stationary univariate process. In particular, the residuals should be independent of each other and constant in mean and variance over time. Misspecification can be identified by plotting the mean and variance of the residuals over time and performing a Ljung-Box test. Alternatively, we can plot the autocorrelation and partial autocorrelation of the residuals. If the estimation is inadequate, we have to return to step one and attempt to build a better model. The augmented Dickey-Fuller test was adopted to check the stationarity status of the series (daily confirmed cases of COVID-19 in South Africa (SA), Nigeria, Ghana and Egypt). The result of the test is provided in Table 2. We observed that the daily confirmed cases in all four countries are not stationary in levels. The time-series data become stationary at the first difference for Nigeria, Egypt and Ghana, while that of South Africa becomes stationary at the second difference. Thus, the series are ready for modelling in line with the Box-Jenkins ARIMA modelling approach. The parameters of the ARIMA models were determined according to the ACF and PACF plots (see Fig. 4a and b).
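The differencing step and the ARMA recursion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `difference` and `arma_one_step` are hypothetical, the numbers are invented, and the MA terms follow the minus-sign convention of the ARMA(p,q) definition given earlier. It is not the Gretl estimation routine used in the study.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' step of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def arma_one_step(alpha, phi, theta, past_y, past_eps):
    """Conditional mean of the next value of an ARMA(p, q) process.

    phi[i] multiplies the (i+1)-th most recent observation, and
    theta[j] multiplies the (j+1)-th most recent residual, which
    enters with a minus sign.
    """
    ar_part = sum(phi[i] * past_y[-1 - i] for i in range(len(phi)))
    ma_part = sum(theta[j] * past_eps[-1 - j] for j in range(len(theta)))
    return alpha + ar_part - ma_part


# Made-up series, for illustration only.
toy_series = [1, 3, 6, 10, 15]
first_diff = difference(toy_series, d=1)    # [2, 3, 4, 5]
second_diff = difference(toy_series, d=2)   # [1, 1, 1]
next_mean = arma_one_step(1.0, [0.5], [0.2], past_y=[4.0], past_eps=[1.0])
print(first_diff, second_diff, next_mean)
```

In the toy series, the first difference still trends upward while the second difference is constant, which mirrors the situation in which a series needs d = 2 before it looks stationary.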
Potential model combinations were obtained from the ACF and PACF plots for SA, Nigeria, Ghana and Egypt. The order of each model was determined according to the ACF and PACF after differencing each country's prevalence series. The results for the different possible models are presented in Table 3. The ACF and PACF plots for the countries' COVID-19 prevalence series are displayed in Fig. 4a-d. ARIMA models with the minimum corrected Akaike information criterion (AICc) and statistically significant parameters were selected as the best models. Accordingly, the ARIMA (0,2,3), ARIMA (0,1,1), ARIMA (3,1,0) and ARIMA (0,1,2) models were chosen as the best models for SA, Nigeria, Ghana and Egypt, respectively. The models fitted the COVID-19 data reasonably well (Table 4), with minimum AICc = 1471.74, 1225.67, 1374.12 and 1296.01 for SA, Nigeria, Ghana and Egypt, respectively.
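The selection rule above can be sketched as follows, using the standard small-sample correction AICc = AIC + 2k(k+1)/(n-k-1) with AIC = 2k - 2 ln L. The log-likelihoods, parameter counts and sample size below are hypothetical placeholders rather than values from Table 3, and conventions for counting k (for example, whether the error variance is included) vary between packages.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion.

    k is the number of estimated parameters, n the number of
    observations used in estimation (requires n > k + 1).
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: name -> (log-likelihood, number of parameters).
n_obs = 108
candidates = {
    "ARIMA(0,1,1)": (-610.0, 2),
    "ARIMA(1,1,1)": (-609.8, 3),
    "ARIMA(2,1,0)": (-612.0, 3),
}
scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)
print(best_model, round(scores[best_model], 2))
```

In this made-up comparison, the extra parameter of ARIMA(1,1,1) improves the likelihood only slightly, so the penalty term leaves the simpler ARIMA(0,1,1) with the lowest AICc. The study additionally required the parameters of the chosen model to be statistically significant, which this sketch does not check.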
The best models are estimated in Table 4 and were then employed to forecast the daily spread series of COVID-19. The forecast is available in Table 6.
From the histograms of forecast errors in Fig. 5a-d, for the COVID-19 prevalence series of SA, Nigeria, Ghana and Egypt, respectively, it seems that some of the forecast errors are not normally distributed with mean zero. However, the formal Chi-square test in Table 5 indicates that the forecast errors are normally distributed. This suggests that the ARIMA models provide an adequate predictive model of the prevalence of COVID-19 in SA, Nigeria, Ghana and Egypt. Furthermore, the assumptions upon which the prediction intervals were based are thus valid.
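The link between normally distributed forecast errors and the prediction intervals can be illustrated for the simplest of the selected models, ARIMA (0,1,1) without a constant, written with the minus-sign MA convention as $Y_t - Y_{t-1} = \varepsilon_t - \theta \varepsilon_{t-1}$. Under that assumption, the h-step forecast is flat at $Y_T - \theta \varepsilon_T$ and its error variance is $\sigma^2 [1 + (h-1)(1-\theta)^2]$. The function name and all input values below are hypothetical, and parameter uncertainty is ignored.

```python
import math


def arima011_forecast(last_value, last_residual, theta, sigma2, horizon, z=1.96):
    """Point forecasts and normal-theory intervals for ARIMA(0,1,1).

    Returns a list of (h, point, lower, upper) for h = 1..horizon.
    z = 1.96 gives an approximate 95% interval when errors are normal.
    """
    point = last_value - theta * last_residual
    rows = []
    for h in range(1, horizon + 1):
        std_err = math.sqrt(sigma2 * (1 + (h - 1) * (1 - theta) ** 2))
        rows.append((h, point, point - z * std_err, point + z * std_err))
    return rows


# Hypothetical inputs, not estimates from Table 4.
forecast_rows = arima011_forecast(last_value=500.0, last_residual=10.0,
                                  theta=0.6, sigma2=400.0, horizon=3)
for h, point, lower, upper in forecast_rows:
    print(h, round(point, 1), round(lower, 1), round(upper, 1))
```

The interval widens with the horizon, and nothing in the formula keeps the lower bound above zero, so a negative lower limit can appear even though case counts cannot be negative.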
As shown in Table 6, the daily spread data from June 16 to July 5, 2020 (20 days) were predicted using the ARIMA (0,2,3), ARIMA (0,1,1), ARIMA (3,1,0) and ARIMA (0,1,2) models for SA, Nigeria, Ghana and Egypt, respectively. Based on the Ljung-Box test, the results suggested that the predicted values fitted the actual values well. The fitted and forecasted values are presented in Fig. 6a-d. We observed an exponential increase in the trends of future COVID-19 cases in the selected African countries. However, the growth rate is not sporadic in Ghana. The result from Table 6 shows that the forecasted COVID-19 prevalence for the four most affected African countries as at July 3 is expected to be as follows: 59998.4-8517.69 in SA, 583.88 to 780.19 in Nigeria, -3 to 886.45 in Ghana, and 1379 in Egypt. Ghana is the only country that tends to experience a rapid decrease. It is also important to state that Ghana has recorded the fewest deaths of COVID-19 infected patients among these countries. For the other countries, there is no downward trend in the predicted number of confirmed cases. The result of this study is somewhat alarming, especially for South Africa. Due to the recent relaxation of lockdown in each of these countries, clinical and social problems may become unmanageable and consequently lead to a crisis.
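The Ljung-Box check mentioned above tests whether the residuals of a fitted model are free of autocorrelation. A minimal sketch of the statistic, $Q = n(n+2) \sum_{k=1}^{m} r_k^2 / (n-k)$, is given below; the residual series is artificial, and the comparison value assumes a chi-square reference distribution whose degrees of freedom are usually taken as the number of lags minus the number of fitted ARMA parameters.

```python
def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    r = autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


# Artificial residuals that alternate in sign, so they are strongly
# (negatively) autocorrelated at lag 1.
alternating = [1.0, -1.0] * 10
q_stat = ljung_box_q(alternating, max_lag=1)
print(round(q_stat, 2))
```

For one degree of freedom the 5% chi-square critical value is about 3.84, so a Q of this size would reject the hypothesis of uncorrelated residuals. An adequate model should instead give a Q below the critical value.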

Conclusion
In this study, we presented time series modelling for the novel COVID-19, which emerged recently in Wuhan, China. ARIMA models were applied to the daily confirmed COVID-19 cases of the four most affected African countries: South Africa, Egypt, Nigeria and Ghana. We obtained the best ARIMA model for each of the countries and made a daily forecast from June 16 to July 5, 2020. The result shows that the daily number of COVID-19 cases in these epicentres will continue to increase, especially in South Africa, Nigeria and Egypt. There is a downward and upward movement in Ghana. The country does not exhibit the exponential progression seen in the other countries. The death rate in Ghana also suggests that the country can manage the situation better than the other three. The forecast indicates a tendency towards more cases, especially in South Africa, if the outbreak is not well controlled. The findings in this study show that the government still has a lot to do in managing the present pandemic. We believe this study will help the government and other health authorities to plan and supply resources for effective management of this pandemic in the days to come.