Generalized Richards model for predicting COVID-19 dynamics in Saudi Arabia based on particle swarm optimization Algorithm

COVID-19 pandemic is spreading around the world becoming thus a serious concern for health, economic and social systems worldwide. In such situation, predicting as accurately as possible the future dynamics of the virus is a challenging problem for scientists and decision-makers. In this paper, four phenomenological epidemic models as well as Suspected-Infected-Recovered (SIR) model are investigated for predicting the cumulative number of infected cases in Saudi Arabia in addition to the probable end-date of the outbreak. The prediction problem is formulated as an optimization framework and solved using a Particle Swarm Optimization (PSO) algorithm. The Generalized Richards Model (GRM) has been found to be the best one in achieving two objectives: first, fitting the collected data (covering 223 days between March 2nd and October 10, 2020) with the lowest mean absolute percentage error (MAPE = 3.2889%), the highest coefficient of determination (R2 = 0.9953) and the lowest root mean squared error (RMSE = 8827); and second, predicting a probable end date found to be around the end of December 2020 with a projected number of 378,299 at the end of the outbreak. The obtained results may help the decision-makers to take suitable decisions related to the pandemic mitigation and containment and provide clear understanding of the virus dynamics in Saudi Arabia.


Introduction
In early 2019, the COVID-19 disease has started spreading from Wuhan city, Hubei province in China. Few days later, it quickly spread across the world infecting 39,170,503 persons and causing the death of 1,102,926 as of October 16, 2020. Although 29,378,739 individuals have recovered/discharged, 8,688,838 are reported as still active cases [1]. Several actions have been implemented by different countries ranging from total lockdown to social distancing measures in order to contain the virus and limit its spread [2]. Moreover, economic activities and health systems around the globe have been extremely affected which made the world in front of an unprecedented difficult situation.
Large attention is being paid by scientists from different backgrounds since they are hardly working on COVID-19 various aspects including the virus dynamics modeling. Several researches have been published aiming to predict the number of infected cases, the number of deaths and more particularly a probable end date in different countries and sometimes in different cities (provinces) inside the same country. As examples, we can cite references [3] for Saudi Arabia, [4] for Kuwait and [5] for India. The reported results have shown varying degrees of accuracy and reliability due to several causes including the quality and quantity of available information about the virus [6]. Suspected-Infected-Recovered (SIR) model as well as its variants have been used to predict the virus dynamics [2,4,7]. Although these models are continuous-time and the data of reported cases are available in discrete-time frequency (daily), several studies have considered SIR based on discrete-time frameworks [8][9][10]. The effect of containment and control measures has been also considered by using appropriate values of the model parameters. Moreover, time-series models such as autoregressive moving average (ARIMA) [11][12] and its variants in addition to different artificial intelligence (AI) based models are facing luck of sufficient information both quantitatively and qualitatively [5][6]10,12]. The logistic growth model and many of its variants have been utilized for predicting COVID-19 future in different countries. For example, the study in [3] have developed simultaneously SIR and logistic growth models for the case study of Saudi Arabia using data between March 2 nd and May 15 th , 2020. Unfortunately, by checking back their forecasted cumulative infected numbers and projected end date, it has been found that their projections were inaccurate since they expected a number ranging between 69,000 and 79,000 at the end of June 2020 as probable end date (the real number was 190,823 in June 30, 2020) [13]. The case study of Kuwait using logistic regression models has been examined in [4]. The predicted and real total infected numbers are checked and found to be different at the projected end date (predicted: 4,100, real: 17,700) [14]. Five-parameter logistic growth model has been used for reconstructing COVID-19 data in USA [15]. Results showed a good fit and a relatively accurate estimation of new infected cases as for April 4 th , 2020 but unfortunately the developed model has failed in predicting the end date since the virus continues its increase till today (October 16, 2020). The case study of India has been also investigated using logistic growth model in [16]. The developed model has been found to over-estimate the total number of cumulative infected cases by May 22, 2020 (predicted: 1 million, reported: 124,000). The drawback of the logistic growth model is that it is suitable only for modeling early stages of epidemics [17]. Combined models have been also considered for predicting COVID-19 [18][19][20]. A combination of logistic model with machine learning [18], ANFIS with virus optimization algorithm [19] and Gaussian mixture model are examples of such frameworks reported to be efficient. As a conclusion, it is clearly observed that most of the developed models succeeded in fitting the historical data but faced difficulties in predicting the number of infected cases in the coming future.
Until now, based on the above discussed references and to the best of the authors' knowledge, few researches have used phenomenological models for COVID-19 forecasting. Moreover, few studies that considered using Richards model combined with a global optimization swarm intelligence technique (the Particle Swarm Optimization (PSO)) for predicting COVID-19 dynamics are available. For this aim, this study will focus on implementing phenomenological models, namely, generalized growth model (GGM), generalized logistic model (GLM), classical logistic growing model (CLGM) and generalized Richards model (GRM) to predict the dynamics of COVID-19 dynamics in Saudi Arabia. SIR model will be also implemented on the same dataset for comparison purpose. The main motivations behind this choice are summarized as follows: (i) this class of models is known to be robust, (ii) they include few parameters to be calibrated optimally and (iii) COVID-19 (like any other epidemic) should pass through an exponential growing phase at the beginning, should reach a peak and should exhibit a flat and stable curve which is well-captured by both CLGM and GRM. In this study, the identification of the models' parameters is defined as a minimization problem. The quadratic error between the cumulative infected cases and the reported infected cases is minimized by respect to the model parameters. The identification problem is then solved using the global PSO technique. The uniqueness and the stability of the developed models will also be investigated.
The remaining of this paper is organized as follows. Section 2 presents the models investigated in this study as well as details about the computational methodology used to identify the models' parameters. In Section 3, the results of the conducted experiments using Saudi Arabia case study are provided. The obtained results will be discussed in Section 4. Finally, conclusion and recommendations are provided in Section 5.

Materials and methods
In this section, the four phenomenological models as well as the SIR compartmental model used to forecast the dynamics of COVID-19 in short-term horizon are presented and discussed in detail [21][22][23]33,34].

Generalized growth model (GGM)
The generalized growth model is described in Eq 1 below: This model is usually used in early stages of an epidemic spread. It relies on two parameters; namely the intrinsic growth rate, and the scaling growth rate, . According to the value of , three growth profiles can occur: a straight linear behavior ( = 0), an exponential growth ( = 1) and a sub-exponential growth ( < 1). In a model identification framework, the parameter search space is set to be the interval [0-1]. ( ) denotes the cumulative number of infected cases and ( ) is its derivative with respect to the time, t.

Classical logistic growth model (CLGM)
The classical logistic growth model is described in Eq 2.
The model parameters are the same as in the GGM. This model is expected to capture both the early stage exponential curve and a steady state (flat curve) reached at the end of the epidemic. During the late stages of the virus, the number of new cases becomes small and the number of cumulative cases becomes constant equal to the final capacity, K.

Generalized logistic model (GLM)
The generalized logistic model is described in Eq 3 below: This model is similar to the CLGM except of the inclusion of a scaling growth rate, .

Generalized Richards model (GRM)
The generalized Richards model is described in Eq 4 below: This model includes all the features covered by the previous three models and it includes an exponent factor used to capture the deviation of the symmetric S-shaped dynamics of the simple logistic model. It is known to be pertinent since it can capture the epidemic curve in all its phases.

Suspected-Infected-Recovered (SIR) model
In this paper, the SIR model is adopted since it has been reported to be simpler than other complicated models such as SIER [8,[35][36][37]. It will be used for comparison with the developed four logistic models. SIR model includes a set of differential equations (in continuous-time) describing the relationships between subsets of a population including Suspected (S), Infected (I) and Recovered (R) [9][10]. In SIR model, the dynamics of the COVID-19 are described as follows Eq 5-Eq 7: where is the contact rate expressing the probability of being infected and is the characteristic duration of the disease.
Since the COVID-19 pandemic data are available in a discrete-time level (daily), finding SIR model solutions should be based on efficient algorithms operating on discrete-time [9]. For this purpose, the first-order Euler method is used for transforming the SIR continuous-time model (Eq 5-Eq 7) into a discrete-time form as follows (Eq 8-Eq 10) (for this purpose, dS(t) = S(t+dt) − S(t) and dt = 1 day): where the subscript indicates the day number.

Computational methodology
PSO has been used in the literature for modeling the COVID-19 dynamics. For example, in [24], PSO has been used among other metaheuristic algorithms to tune optimally the parameters of an adaptive neuro-fuzzy inference system (ANFIS) used for predicting the number of infected cases in upcoming days in China. PSO has provided good results in terms of coefficient of determination (R 2 = 0.9492) and mean absolute percentage error (MAPE = 5.12%). The PSO technique has been also used successfully in calibrating SEIR model for the case study of Hubei, China [25]. In addition to a good data fitting, the PSO algorithm helped in detecting several nonlinear aspects including chaos. The identified model parameters have been used later to establish efficient control strategies. Robust machine learning approach based on PSO has been investigated by [26] to predict the number of infected individuals in Italy. Seven-parameters SEIR model has been found to provide accuracies in the number of susceptible cases ranging from 10% of the population to 40% respectively in Lombardy and Valle d'Aosta.
In this study, the identification of the five proposed models' parameters is performed using a swarm intelligence population-based algorithm, the particle swarm optimization (PSO) [27]. PSO is an emerging optimization technique inspired from animals' social behavior such as bird flocking or fish schooling. It is known to be easy to implement and simple in its intuitive principle [28]. In the PSO principle, the swarm is composed of S particles coding each one a candidate solution for the epidemic model parameters' identification problem. During the optimization procedure, these particles fly across the D-dimensional search-space looking for the optimal position leading to the minimum value of an objective function. The objective function in this study is the nonlinear least-square error between the reported cumulative infected cases and the number of cases calculated using the D parameters of the model associated to each particle. After being initialized randomly at the beginning of the optimization process, the particle move is based on three components: following its current direction, returning back to its best personal position so far visited and moving toward the position reached by the best particle in the swarm. Let i be the index of the i th particle and j be the j th dimension of the model parameters vector. The particle position is defined as: To the i th particle are associated three vectors having the same size and the same parameters orders as in X i . , and denote respectively the best position so far visited, the position of the best particle among the swarm and the current velocity vector. During the optimization process (with k denotes the iteration index), each particle inside the swarm evaluates its current objective function and updates its position following the set of Eq 12-Eq 14: The inertia weight coefficient is included to create a balance between global search ability at the beginning (high value of the inertia weight) and local search ability at the end of the of optimization procedure. The inertia weight decreases linearly across the iteration index. The computations are stopped when a maximum value of the iteration index will be reached. The adopted solution of the identification problem is reported as the position vector of the global best particle at the final iteration. The optimization problem is then defined as follows: X jl and X ju denote respectively the lower and upper limits of the j th dimension in X. For the SIR model, the criterion to be minimized is given below (Eq 17): 0 and are the first day and the last day of the COVID-19 collected data, respectively. ̂( ),̂( ) ̂( ) are the estimates of S, I and R at the t th day.

Results
In this section, the results of using the particle swarm optimization (PSO) algorithm described in the previous section for calibrating four phenomenological epidemic models and SIR compartmental model using data of COVID-19 cumulative cases in Saudi Arabia covering the period from March 2 nd to October 10 th , 2020 are provided. The PSO setting parameters are presented in Table 1. The dataset used to calibrate the five previously described models is collected from the Saudi Ministry of Health website covering the period between March 2 nd and October 10 th , 2020. This dataset includes 223 daily observations which have been divided into two subsets: 190 observations used for training the models and the remaining 33 observations used for the models' validation. Testing dataset is composed of the five days immediately following October 10, 2020. This dataset is not included in the training and validation phases. When examining the Saudi Arabia's COVID-19 curves, it is easy to detect that an exponential growth for the cumulative infected cases occurs at the beginning of the pandemic and that around September 20, 2020, the curve starts becoming flat. The new daily reported cases' curve exhibits a clear fluctuating and variable behavior presenting four peaks (May 17, June 17, June 30 and July 6). This fluctuation is correlated with numerous measures implemented by the Saudi authorities to mitigate the virus spread while considering economic and social issues.
To compare the four models' prediction results, three statistical performance indicators, namely, the coefficient of determination (R 2 ), the mean absolute percentage error (MAPE), the root mean squared error (RMSE) [27] and the Kolmogorov-Smirnov (K-S) test [29] are used. K-S is a two-sample test used to compare the cumulative distribution probabilities of the reported number of infected individuals and the same number forecasted by one of the investigated models. In Table 2 below, H = 0 indicates that "Do not reject the null hypothesis at the 5% significance level" and H = 1 indicates that "Reject the null hypothesis at the 5% significance level".
In this paper, the results of the best run of each model are reported since PSO is a stochastic optimization technique which need to be run many times. Table 2 shows the optimal parameters for each model as well as the values of the performance indicators including the K-S test result.

Comparison between the five developed models
From Table 2, it can be seen that the generalized Richards model (GRM) outperforms all the other four models in terms of all performance indicators. Moreover, this model allows predicting a probable end date (found to be around the end of 2020). The SIR model is ranked in the second place after the GRM in terms of performance metrics. Both GRM and SIR models were valid according to the Kolmogorov-Smirnov K-S test indicating that they provide forecasted and real (reported) infections derived from similar empirical distribution functions [29]. Both models provide also almost similar values of the coefficient of determination R 2 (0.9953 for GRM and 0.9926 for SIR). This result can be attributed to the fact that logistic models are rigorously derived from the simple epidemiological SIR model as reported in [9,30]. The third-ranked model is found to be the GLM. In fact, it provided a probable end date near from the one provided by the best model (the GRM) but the number of infected cases at the end of the pandemic is higher than the one provided by the GRM. The GLM model is found to be valid according to the K-S test. The fourth model is CLGM. In fact, it yielded a final capacity of the virus almost near of the one issued from the best model (GRM) but an end date around the end of August, 2020 which is not true according to the current COVID-19 statistics in Saudi Arabia. Moreover, the K-S test was non valid for the CLGM model. It should be noted that, according to the curve in Figure 4, the CLGM has a chance to meet the end of the virus capacity since the real curve is still under the curve provided by the CLGM. The last model is the GGM. This model is found to be good in fitting data for the early stage of the epidemic but it becomes inaccurate in the coming stages [31]. Conceptually, the GGM model cannot capture the probable end of the outbreak since its curve will continue growing with time. In Table 3, the actual and forecasted (by the GRM) cumulative number of infected cases are provided for the validation phase. The differences are relatively small which confirms the good fit performance of the GRM when applied to a dataset not used during the training phase.  In Table 4, the number of cumulative infected cases provided by the GRM as well as real reported numbers are compared. A relative error around 4% is found to characterize the model. Note here that the considered five days were not included in the model training and validation phases.

Analysis of the generalized Richards model (GRM) forecasts
The generalized Richards model (GRM) is found to outperform all the other four models developed in this paper. Therefore, its main features including the advantages, the limitations and the robustness will be discussed in detail. In forecasting exercises, the data available at the hand of the designer is usually divided into 80% for training the model and 20% for validating it. However, when investigating our models including the GRM, we have tried other choices for the two subsets and we found that this didn't affect a lot the overall forecasting performances more particularly for the case of the GRM. One of the main drawbacks of the GRM is that it can't predict the fatality curve like in other researches such as [38].
Possible sources of errors in final estimates of the cumulative number of infections have been included in the models' parameters through the particle swarm optimization (PSO) algorithm. In fact, during the parameters' identification process, the vector of parameters is perturbed using the r1 and r2 random numbers (see Table 1: parameter settings of the PSO and Eq 12: particle move). A stretched logistic equation for COVID-19 spreading in Italy has been proposed in [30]. This model is expected to take into account the time-dependency of the growth rate. By comparing the curve provided in Figure 6 of [30] and the curve provided by our GRM model ( Figure 5 of this paper), we can conclude that our model is behaving similarly to a stretched logistic model with the difference that in Saudi Arabia, COVID-19 pandemic didn't reach the flat curve, indicating the end of the virus current wave, yet.
In order to highlight the effect of the changing situation of transmissibility, several scenarios related to intermediate date ranges have been investigated for the GRM as representative of the studied four logistic models. The GRM has been calibrated respectively using the first 100, 150, 200 and 223 days used for the model training. The results of different scenarios are summarized in Table 5. As depicted in Table 5, the four runs' parameters are extremely different which reflects relatively high variability of the growth dynamics of the epidemic. The situation of transmissibility is found to be changing. This fact is also clear in Figure 1 where it is shown that the curve of new infections presented four local peaks. Obviously, the coefficient of determination R 2 increases when the length of the sample used for training the GRM increases. This confirms also that using more data for training may allow detecting different dynamics of the COVID-19 pandemic and therefore being able to explain the effect of the virus containment measures [39] and the efficiency of the social behavior including isolation like in the case of Brazil [40].
• The GRM model has been trained using other parameter fitting techniques (genetic algorithm (ga) and Levenberg-Marquardt (LM)) in order to show the superiority of the particle swarm optimization technique. The PSO is found to be more efficient at least in three points: (1) The PSO has global search ability since the particles are initialized randomly over the whole search space at the beginning of the identification process; (2) The particles are then driven towards the "sub-optimal" solution. Following the inertia weight coefficient which favors the "exploration" at the beginning and favors the "exploitation" at the end of the identification process; and (3) The PSO has been found to operate in complex search-spaces that should be non-convex. However, both LM and "ga" have been trapped into local minima since they are sensitive to the initialization.
• The PSO technique used in this paper to calibrate all the developed models is known to be stochastic since it initializes the particles randomly within the search-space limits and it includes two numbers (r1 and r2) which are generated randomly inside the interval [0-1] at each iteration. The two random numbers multiply respectively the local search component and the global search component (see Eq 12). The particle move includes through the use of r1 and r2 some kind of perturbations reflecting thus the robustness of the derived solutions. Due to this random aspect, the algorithm has been run several times and the results of the best run are adopted. Thus, the PSO is found to yield "sub-optimal" solutions that are not unique. A trade-off between optimality and feasibility is ensured.
• In order to compare our approaches to other approaches applied to the case study of Saudi Arabia, three references studying the prediction of the COVID-19 number of infected cases are selected. The results of comparison are included in Table 6 below.
It can be concluded from this comparison that the results are extremely variable since the forecasting performance metrics depend on the length and quality of the available data, the virus stage and transmissibility variation, the measures taken by the local authorities and on the used method. Forecasting is then a case-sensitive exercise.

Conclusion
In this paper, a modeling procedure based on phenomenological and compartmental models combined with the particle swarm optimization (PSO) technique is carried out using reported cases from Saudi Arabia for the period starting on March 2 nd and ending on October 10, 2020. The cumulative infected cases and a probable end date of the COVID-19 pandemic are predicted using five models including the four-parameter generalized Richards model (GRM). This model has provided good fit of data and a probable projected end date around the end of 2020 with a total number of infected cases around 379 thousand. According to the study results, it has been shown that the COVID-19 outbreak is approaching its end. The Saudi experience can be considered as successful in containing the first wave until now and more efforts in enforcing social distancing should contribute to mitigate the virus.