ONLINE FORECASTING OF COVID-19 CASES IN NIGERIA USING LIMITED DATA



Abstract
The novel Coronavirus disease (COVID-19) was first identified in Wuhan, China, in December 2019 and later spread to other parts of the world. At the time of writing, the disease has been declared a pandemic by the World Health Organization (WHO). Mathematical models, artificial intelligence, big data, and similar methodologies are potential tools for predicting the extent of the spread and the effectiveness of containment strategies in stemming the transmission of this disease. In societies with constrained data infrastructures, modeling and forecasting COVID-19 is an extremely difficult endeavor. Nonetheless, we propose an online forecasting mechanism that streams data from the Nigeria Center for Disease Control to update the parameters of an ensemble model, which in turn provides updated COVID-19 forecasts every 24 hours. The ensemble combines an Auto-Regressive Integrated Moving Average (ARIMA) model; Prophet, an additive regression model developed by Facebook; and a Holt-Winters Exponential Smoothing model combined with Generalized Autoregressive Conditional Heteroscedasticity (GARCH). The outcomes of these efforts are expected to provide academic support to policymakers in deploying containment strategies and/or assessing containment interventions aimed at stemming the spread of the disease in Nigeria. © 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.
(http://creativecommons.org/licenses/by/4.0/)

Specifications Table
Subject: Decision Sciences
Specific subject area: Application of an online framework for forecasting the range of COVID-19 cases in Nigeria using limited data.
Type of data:

Value of the data
• These data are useful because they present facts that drive analytics on COVID-19 cases in Nigeria.
• Academic institutions, public health agencies, scientific communities, researchers, students, and independent learners can use these data, code, and models to analyze COVID-19 cases in Nigeria and beyond.
• The data (updated daily), models, code, and analysis presented here can be applied to drive analytics, policy development, and decision making in other countries where data are scarce. They also provide an early reference for future work.
• The ensemble of models leverages the strengths and compensates for the weaknesses of the individual forecasting algorithms, even with limited data.

Data Description
The daily number of COVID-19 cases in Nigeria from February 27, 2020, to April 5, 2020, was automatically mined every 24 hours from the official NCDC website (http://covid19.ncdc.gov.ng/) and Wikipedia (http://tiny.cc/nigeria_covid19) using a Python script. The case numbers up to April 5 can be found in the supplemental data (Appendix A). At the time of writing this brief, the dataset contains 39 time-series data points. For forecasting purposes, earlier days with zero incidences of COVID-19 were filtered out before building the forecast models. The data are presented in tabular form in Tables 1 and 2 and in visual form in Fig. 1, while the number of new cases per day is shown in Fig. 2. Seven successive daily forecasts by the ensemble, from March 29, 2020, to April 5, 2020, are presented in Table 3. The autocorrelation and partial autocorrelation behavior of the dataset (Fig. 3) informs the ARIMA modeling. Similarly, Fig. 4 visualizes the forecast beams, indicating the direction and strength of increases or decreases in the forecasted number of cases. Table 4 highlights the relative strengths and weaknesses of the individual models, and finally, Fig. 5 presents a visual comparison of COVID-19 cases in Nigeria and South Africa for analyzing policy impact.
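The preprocessing described above can be sketched in Python. This is a minimal illustration, not the authors' script: it assumes the mined data arrive as (date, cumulative cases) pairs, derives the new cases per day of the kind plotted in Fig. 2, and reads "filtering earlier days with zero incidences" as dropping zero-new-case days before a hypothetical cutoff date. The function names and the toy values are illustrative only, not NCDC figures.

```python
from datetime import date
from typing import List, Tuple

# (day, cumulative confirmed cases), as mined once every 24 hours
Series = List[Tuple[date, int]]


def daily_new_cases(cumulative: Series) -> Series:
    """Difference a cumulative series into new cases per day.

    Assumes the series starts on the day of the first reported case, so the
    first day's new-case count equals its cumulative value.
    """
    out: Series = []
    previous = 0
    for day, total in cumulative:
        out.append((day, total - previous))
        previous = total
    return out


def drop_early_zero_days(new_cases: Series, cutoff: date) -> Series:
    """Drop days before `cutoff` on which no new cases were reported.

    `cutoff` is a hypothetical parameter; days on or after it are kept as-is.
    """
    return [(day, n) for day, n in new_cases if day >= cutoff or n > 0]


if __name__ == "__main__":
    # Toy values for illustration only (not NCDC data).
    toy = [(date(2020, 2, 27), 1), (date(2020, 2, 28), 1),
           (date(2020, 2, 29), 1), (date(2020, 3, 1), 2)]
    print(drop_early_zero_days(daily_new_cases(toy), cutoff=date(2020, 2, 29)))
```

Keeping the differencing and the filtering as separate steps makes it easy to change the filtering rule without touching the derived daily counts.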

Experimental Design, Materials, and Methods
In this paper, we present the application of ensemble forecasting models in a data-constrained environment. The objective is to establish lower and upper bounds on the possible number of COVID-19 cases per day using a framework that automatically streams web data in real time from reliable sources. These data are used to retrain and adapt the parameters of an ensemble of three models, which in turn updates its forecast for the following day. Each of the three models provides an estimated lower bound and upper bound for the number of cases. The ensemble forecast is obtained by taking the minimum of the lower bounds and the maximum of the upper bounds. Because data sources are limited (the only available information is a single variable, "number of COVID-19 cases per day"), it is difficult to implement specialized, advanced, and more generalizable methods that often require a variety of features and larger datasets. It is also important to highlight that there are only a few samples (22) in the dataset, which makes it equally challenging to apply nonparametric neural models. The Nigeria Center for Disease Control (NCDC) records the number of COVID-19 cases in Nigeria using established epidemiological methods [1]. These data are published on social media several times a day as updates arrive and are available at the NCDC secretariat/website. Unfortunately, information about the number of tests carried out per day and other factors is not readily available. This leaves us with only one variable: the total number of cases. In the same vein, Wikipedia maintains a data table of COVID-19 cases in Nigeria, updated using information from NCDC as well as other reliable, verified news and media outlets. Surprisingly, even under these data and information constraints, the bounds of our ensemble forecast accurately captured the daily total number of cases from March 29, 2020, to April 5, 2020 (Tables 1 and 2; Figs. 1 and 2). We also provide an informative data visualization comparing COVID-19 cases in Nigeria with those in South Africa in light of South Africa's lockdown policy.

Table 4 (Holt-Winters ES row)
Model: Holt-Winters ES
Strengths: Strong and accurate forecasting (short-term), favors recent data samples, requires few data points, straightforward implementation.
Weaknesses: Lagged forecasts.
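The ensemble rule used in this framework takes the minimum of the three lower bounds and the maximum of the three upper bounds, and the envelope is recomputed each time a new daily count arrives. The Python sketch below illustrates that daily update loop under stated assumptions: each model is represented by a hypothetical callable that refits on the full history and returns a one-day-ahead (lower, upper) interval, and data retrieval from NCDC or Wikipedia is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) for tomorrow's count

# Hypothetical interface: each model refits on the full history and returns
# a one-day-ahead interval (e.g. wrappers around ARIMA, Prophet, or ES + GARCH).
Model = Callable[[Sequence[float]], Interval]


def ensemble_bounds(intervals: Sequence[Interval]) -> Interval:
    """Combine per-model intervals: min of lower bounds, max of upper bounds."""
    if not intervals:
        raise ValueError("at least one model interval is required")
    lower = min(lo for lo, _ in intervals)
    upper = max(hi for _, hi in intervals)
    return lower, upper


def online_step(history: List[float], new_value: float,
                models: Sequence[Model]) -> Interval:
    """Append the latest daily value, refit every model, return the envelope."""
    history.append(new_value)
    return ensemble_bounds([model(history) for model in models])


if __name__ == "__main__":
    # Two toy "models" standing in for the real ones.
    toy_models = [lambda h: (h[-1], h[-1] + 5.0),
                  lambda h: (h[-1] - 2.0, h[-1] + 1.0)]
    history = [1.0, 2.0]
    print(online_step(history, 3.0, toy_models))  # (1.0, 8.0)
```

Taking the outermost bounds deliberately widens the envelope: it contains the actual count whenever any single model's interval does, at the cost of a less precise range.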

Ensemble of Forecasting
$y_t$ represents the differenced series, $\phi_i$ are the coefficients, and $y_{t-p}$ and $\varepsilon_{t-q}$ are the lagged predictors of the model [2]. The ARIMA model generalizes many sub-models and is characterized by three parameters: the autoregressive order $p$, the degree of differencing $d$, and the number of moving-average terms $q$. ARIMA rests on the fact that a nonstationary time series can be made stationary through differencing. Stationarity can be verified using autocorrelation plots and unit root tests such as the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [3]. For this model, we are interested in non-seasonal phenomena. Using a brute-force search and inspection of the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots (Fig. 3), an ARIMA (2,1,0) model was chosen.
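As a rough illustration of the ARIMA (2,1,0) specification, the sketch below fits an AR(2) model with intercept to the first differences $d_t = y_t - y_{t-1}$ by conditional least squares, then undoes the differencing to produce a one-day-ahead interval of the form point ± z·σ. This is a simplified stand-in, not the authors' estimation procedure (ARIMA libraries usually fit by maximum likelihood); the function names and the 1.96 multiplier are our assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: series too short or degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_arima210(y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Conditional least squares for d_t = c + phi1*d_{t-1} + phi2*d_{t-2} + e_t,
    where d_t = y_t - y_{t-1}. Returns (c, phi1, phi2, residual std)."""
    d = [b - a for a, b in zip(y, y[1:])]
    rows = [(1.0, d[t - 1], d[t - 2]) for t in range(2, len(d))]
    target = [d[t] for t in range(2, len(d))]
    if len(rows) < 4:
        raise ValueError("need at least 7 observations")
    # Normal equations (X^T X) beta = X^T z for beta = (c, phi1, phi2).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, target)) for i in range(3)]
    c, phi1, phi2 = _solve(xtx, xtz)
    resid = [z - (c + phi1 * r[1] + phi2 * r[2]) for r, z in zip(rows, target)]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(resid) - 3))
    return c, phi1, phi2, sigma


def arima210_interval(y: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """One-day-ahead interval: forecast the next difference, add it back to y_t."""
    c, phi1, phi2, sigma = fit_arima210(y)
    next_diff = c + phi1 * (y[-1] - y[-2]) + phi2 * (y[-2] - y[-3])
    point = y[-1] + next_diff
    return point - z * sigma, point + z * sigma
```

Because $y_{t+1} = y_t + d_{t+1}$, the one-step forecast error of the level equals the next innovation, so σ from the differenced fit sets the interval width directly (parameter uncertainty is ignored here).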
ii. Prophet is an additive regression time-series forecasting algorithm developed by Facebook [4, 5]. It handles strong seasonal effects, missing data, outliers, and shifts in trend, and is designed to be fully automatic. It is implemented with a Stan backend, which fits the model quickly using L-BFGS (the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm). Prophet uses a decomposable time-series model defined by
$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t,$$
where $g(t)$ is the trend, $s(t)$ represents seasonal changes, $h(t)$ captures irregular effects, and $\varepsilon_t$ is the error term. We treat the trend $g(t)$ as of primary importance in developing the forecast. The trend model in this work uses a piecewise saturating growth model with time-varying carrying capacity:
$$g(t) = \frac{C(t)}{1 + \exp\left(-\left(k + \alpha(t)^{\top}\delta\right)\left(t - \left(m + \alpha(t)^{\top}\gamma\right)\right)\right)},$$
where $C(t)$ is the time-varying carrying capacity, $k$ is the growth rate, and $m$ is an offset. The growth rate is not constant but piecewise, with $\alpha(t)$, $\gamma$, and $\delta$ defining its structure.
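To make the piecewise trend concrete, the sketch below evaluates the saturating growth $g(t)$ in plain Python, following the published Prophet formulation in which $\alpha(t)$ is a 0/1 indicator of the changepoints $s_j$ already passed and the offset adjustments $\gamma_j$ are chosen so that $g(t)$ stays continuous at each changepoint. It is not Prophet's own code (Prophet estimates $k$, $m$, and $\delta$ from data with its Stan backend); the function name is hypothetical, and the changepoints, rate changes, and capacity function are inputs you supply.

```python
import math
from typing import Callable, List, Sequence


def piecewise_logistic_trend(t: float,
                             capacity: Callable[[float], float],
                             k: float, m: float,
                             changepoints: Sequence[float],
                             deltas: Sequence[float]) -> float:
    """Saturating growth g(t) with rate changes delta_j at sorted times s_j.

    alpha(t) is the 0/1 indicator vector [t >= s_j]; gamma_j shifts the
    offset so that g(t) is continuous at every changepoint.
    """
    gammas: List[float] = []
    rate = k
    for s_j, delta_j in zip(changepoints, deltas):
        new_rate = rate + delta_j  # assumed non-zero
        gammas.append((s_j - m - sum(gammas)) * (1.0 - rate / new_rate))
        rate = new_rate
    active = [t >= s_j for s_j in changepoints]
    k_t = k + sum(d for d, a in zip(deltas, active) if a)  # k + alpha(t)^T delta
    m_t = m + sum(g for g, a in zip(gammas, active) if a)  # m + alpha(t)^T gamma
    return capacity(t) / (1.0 + math.exp(-k_t * (t - m_t)))
```

With no changepoints this reduces to a plain logistic curve, so $g(m) = C(m)/2$.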
iii. Holt-Winters exponential smoothing is a well-known time-series modeling and forecasting algorithm that emerged in the 1950s from a series of scientific reports [6][7][8]. Its forecast is a weighted average of past observations with exponentially decaying weights, which lets it capture the trend in a time-series dataset. In its basic form,
$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\,\hat{y}_t, \qquad 0 \le \alpha \le 1,$$
where $\alpha$ tunes the response of the model: values close to 0 emphasize past input data, while values close to 1 emphasize recent input data. The Holt-Winters algorithm extends this basic ES with trend and seasonal components. In addition to the ES model, a GARCH model [9] was used to forecast variances; combined with the ES model, it gives the upper and lower bounds for this model.
iv. Forecasts: Table 3 lists the date of each forecast along with the corresponding actual (official) cases reported by NCDC. The performance of the ensemble can easily be visualized in Fig. 4, where the forecast envelope accurately captures the actual number of cases reported by NCDC.
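As a rough sketch of the ES + GARCH component, the Python code below produces a point forecast with Holt's linear-trend exponential smoothing (the seasonal term of Holt-Winters is omitted for brevity) and sets the interval width from a GARCH(1,1) one-step variance forecast of the smoothing errors. All parameter values (alpha, beta, omega, arch, garch, z) are illustrative assumptions, not the configuration used in this paper; in practice the smoothing constants and the GARCH coefficients would be estimated from the data, the latter typically by maximum likelihood.

```python
import math
from typing import List, Sequence, Tuple


def holt_forecast(y: Sequence[float], alpha: float,
                  beta: float) -> Tuple[float, List[float]]:
    """Holt's linear-trend exponential smoothing (seasonal term omitted).

    Returns the next-day point forecast and the in-sample one-step errors.
    """
    level, trend = y[0], y[1] - y[0]
    errors: List[float] = []
    for obs in y[1:]:
        forecast = level + trend
        errors.append(obs - forecast)
        new_level = alpha * obs + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level + trend, errors


def garch11_next_variance(errors: Sequence[float], omega: float,
                          arch: float, garch: float) -> float:
    """GARCH(1,1): sigma2_{t+1} = omega + arch * e_t**2 + garch * sigma2_t,
    started from the mean squared error of the series."""
    sigma2 = sum(e * e for e in errors) / len(errors)
    for e in errors:
        sigma2 = omega + arch * e * e + garch * sigma2
    return sigma2


def es_garch_interval(y: Sequence[float], alpha: float = 0.8, beta: float = 0.5,
                      omega: float = 1.0, arch: float = 0.1, garch: float = 0.8,
                      z: float = 1.96) -> Tuple[float, float]:
    """Point forecast from ES, width from the GARCH variance forecast.
    Default parameter values are illustrative, not fitted."""
    point, errors = holt_forecast(y, alpha, beta)
    sigma = math.sqrt(garch11_next_variance(errors, omega, arch, garch))
    return point - z * sigma, point + z * sigma
```

Separating the mean (ES) from the variance (GARCH) lets the interval widen after days with large forecast errors even when the point forecast is stable.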
By combining these three algorithms, we compensate for their relative weaknesses while reinforcing their relative strengths (Table 4). Ultimately, we subscribe to George Box's maxim: "All models are wrong, but some are useful."

Visualization of data comparing COVID-19 cases in South Africa with Nigeria and policy impact
Although far more information is available on the COVID-19 situation in North America, Europe, and Asia, Fig. 5 compares COVID-19 cases in Nigeria (NG) with those in a fellow African country, South Africa (SA). The figure shows that the number of confirmed cases in SA is much higher than in NG. This observation could be explained from two perspectives: (i) SA has many more people infected with COVID-19 than NG, or (ii) SA tests a much larger number of people per day. The latter explanation is probably the stronger one, as there is corroborating evidence for it: as of 20 March 2020, SA had conducted 6,438 tests [10], while Nigeria had performed only 69 [11].
One of the current strategies for containing COVID-19 is locking down affected regions to prevent further spread through human movement. SA declared a national lockdown on 26 March 2020 [12] to tame the spread of infection. As shown in Fig. 5, there was a kink in the progression of confirmed cases on March 27, 2020, and the curve remained relatively flat and steady for a few days thereafter. In other words, the steep upward trend of infections was checked almost immediately after the national lockdown was declared. People infected before the lockdown who present themselves for testing and treatment afterwards could account for a slight rise in cases within 2 weeks of the lockdown. The degree of compliance with the policy and the number of new cases after the first 2 weeks of lockdown will guide further action in SA.
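The "kink" described for South Africa can be made quantitative by comparing the average daily growth factor of the cumulative curve before and after the lockdown. The sketch below is a hypothetical helper for that comparison, not an analysis performed in this paper; the split index (which would correspond to March 27, 2020, in the SA series) and the geometric-mean growth measure are our assumptions, and no real case counts are included.

```python
from typing import Sequence, Tuple


def mean_daily_growth(cumulative: Sequence[float]) -> float:
    """Geometric-mean daily growth factor of a cumulative case series."""
    if len(cumulative) < 2 or cumulative[0] <= 0:
        raise ValueError("need at least two values, starting above zero")
    days = len(cumulative) - 1
    return (cumulative[-1] / cumulative[0]) ** (1.0 / days)


def growth_before_after(cumulative: Sequence[float],
                        split: int) -> Tuple[float, float]:
    """Growth factor up to index `split` and from `split` onward (e.g. lockdown day)."""
    return (mean_daily_growth(cumulative[: split + 1]),
            mean_daily_growth(cumulative[split:]))
```

A lower growth factor after the split is consistent with, but does not by itself prove, a policy effect, since testing volume can change over the same period.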
Modeling COVID-19 (indeed, any real-life scenario) is inherently difficult: the number of tests, randomness, interventions, stay-at-home compliance, curfews, epidemiological realities, and many other factors all complicate forecasting in this case. Countries, especially those in Africa that are only now witnessing a progressive rise in COVID-19 cases, must be decisive in implementing containment interventions and ensure strict compliance by their citizens.