Expert Systems With Applications

The sanitary emergency caused by COVID-19 has overwhelmed countries and generated a worldwide health and economic crisis. To support countries' responses, numerous lines of research have been developed. The spotlight was put on effectively and rapidly diagnosing and predicting the evolution of the pandemic, one of the most challenging problems of the past months. This work contributes to the existing literature by developing a two-step methodology to analyze the transmission rate, designing models applied to territories with similar pandemic behavior characteristics. Virus transmission is modeled using bacterial growth curves to understand the spread of the virus and to make predictions about its future evolution. Hence, an analytical clustering procedure is first applied to create groups of locations where the virus transmission rate behaved similarly in the different outbreaks. A curve decomposition process based on an iterative polynomial process is then applied, obtaining meaningful forecasting features. Information of the territories belonging to the same cluster is merged to build models capable of simultaneously predicting the 14-day incidence in several locations using Evolutionary Artificial Neural Networks. The methodology is applied to Andalusia (Spain), although it is applicable to any region across the world. Individual models trained for a specific territory are carried out for comparison purposes. The results demonstrate that this methodology achieves statistically similar, or even better, performance for most of the locations. In addition to being extremely competitive, the main advantage of the proposal lies in its complexity cost reduction. The total number of parameters to be estimated is reduced by up to 93.51% for short-term and 93.31% for mid-term forecasting. Moreover, the number of required models is reduced by 73.53% and 58.82% for the short- and mid-term forecasting horizons, respectively.


Introduction
Growth curves model the development of a population of organisms or individuals over time. Their analysis comprises a set of statistical methods used to measure the evolution of the population and to identify the factors that most influence growth. Their versatility makes them applicable to different sciences, such as economics, to study the change over time of economic indicators (Encinas-Ferrer & Villegas-Zermeño, 2015), or engineering, to describe the crack growth curves of materials (Mohanty, Verma, Ray, & Parhi, 2010; Siqueira, Baptista, Guimarães, & Ruchert, 2010). However, the most popular use of growth curves is in the study of controlled bacterial growth, i.e., to describe the reproduction and behavior of microorganisms under different conditions, such as temperature or pH (Allen & Waclaw).

In response to the COVID-19 pandemic, a set of control measures to limit the spread of the virus was proposed. For instance, some of these measures were nationwide lockdowns, curfews, the use of facial masks or social restrictions, among others. Since then, research has been carried out to demonstrate the effectiveness of these measures in mitigating the effects of the virus (Kharroubi & Saleh, 2020; Kraemer et al., 2020). In the case of Spain, the national emergency state was declared on March 11, 2020, allowing the government to close non-essential stores and establish a strict home confinement that lasted until June 21, 2020, among other safety measures. The main objective of these control policies was to avoid hospital overcrowding due to the imminent increase in hospital admissions and ICU bed occupancy.
During the first outbreak, most countries adopted the same restrictive measures, resulting in similar wave behavior and duration. However, the nature of the successive waves taking place in the different countries depended on numerous parameters, such as different government decisions or population density. Despite this, all the outbreaks show the same contagion rate behavior, exhibiting common worldwide pandemic dynamics. Hence, the existence of robust forecasting systems is paramount for anticipating the critical moment of an outbreak and devising suitable prevention measures to minimize the socioeconomic impact on the population. Since the first transmission of SARS-CoV-2 in Wuhan, there has been a growing interest in the scientific community, with different proposals contributing to and supporting the health and social sectors. Regarding the Machine Learning (ML) field, two areas were specifically explored: early SARS-CoV-2 diagnosis and pandemic indicators forecasting. On the one hand, regarding the early diagnosis area, many ML techniques have been applied to improve traditional diagnostic tools, providing support systems or standalone methods when viral testing is unavailable. For instance, Hidayat et al. (2022) proposed a hierarchical clustering approach combined with a feature selection procedure for the detection of COVID-19 infections using data from gas sensors. Several ML algorithms were trained and evaluated in de Fátima Cobre et al. (2022) using data from Chinese patients to predict the diagnosis, severity and fatality of COVID-19, identifying biomarkers associated with the positivity, severity and fatality of the virus. Alao, Cevik, Yasin, Jaiganesh, and Abu-Zidan (2022) studied the impact of the COVID-19 pandemic on the pattern of injury and outcome of hospitalized trauma patients in a city of the United Arab Emirates, in order to prepare as much as possible for future pandemics.
Clustering techniques were also applied in Mittal, Pandey, Pal, and Tripathi (2021), in order to improve the results of supervised techniques limited by the small amount of chest X-ray (CXR) images, chest computed tomography (CT) scans, and lung ultrasound data. CXR images are also used in Liu, Cai, Tang, Zhang, and Wang (2022), where class residual attention networks are proposed in combination with an image preprocessing step, achieving good accuracy in the diagnosis of positive cases. Furthermore, it is worth mentioning that chest CT image features have been very useful in the creation of ML models to support the diagnosis of COVID-19 cases (Li, 2020; Ufuk & Savaş, 2020). A comprehensive review of these techniques can be found in Raptis et al. (2020), Salehi et al. (2020) and Sun, Zhang, Li, and Xu (2020).
On the other hand, regarding the pandemic indicators forecasting field, the first proposals consisted in applying epidemiological models (Kermack & McKendrick, 1927), which acquired enormous importance in modeling the evolution of the pandemic (Thompson, 2020). These models describe the dynamics of the population using different compartments, with individuals flowing between compartments according to a set of parameters. For instance, in Zhang et al. (2022) the authors proposed a SEIR-FMi (Susceptible-Exposed-Infectious-Recovery with Flow and Medical investments) method, extending traditional epidemiological models with intra- and inter-city population movement and medical investment resources. Haghrah, Ghaemi, and Badamchizadeh (2022) presented another approach, using data from seven countries to build a fuzzy SIDR (Susceptible-Infectious-Recovered-Dead) model to solve the lack of parameter adaptability of conventional SIDR models. A comprehensive review of this kind of model is presented in Xiang et al. (2021).
Apart from epidemiological models, ML techniques have been widely applied to forecast the evolution of the COVID-19 pandemic. The complexity of the ML techniques used in this field varies from simple to advanced approaches. Regarding the former, Rath, Tripathy, and Tripathy (2020) proposed a multiple linear regression to accurately predict the number of active cases in India and Odisha. Another work using traditional models was presented in Kufel (2020), where an Autoregressive Integrated Moving Average (ARIMA) model was used to forecast the dynamics of the pandemic at different stages in several European countries. In addition to these classical methods, many studies were carried out using more complex forecasting techniques. For instance, in Chimmula and Zhang (2020), a forecasting system was developed using a Long Short-Term Memory (LSTM) model to predict the dynamics at different time horizons, from 2 to 14 days. In this sense, several works have considered this sort of model. Islam, Islam, and Asraf (2020) proposed a combination of convolutional neural networks and LSTM to diagnose COVID-19 from X-ray images. Shahid, Zameer, and Muneeb (2020) exploited the use of Bidirectional LSTM (Bi-LSTM) models to predict the number of confirmed cases, deaths and recoveries of COVID-19 in ten major countries. Also, Garg, Kumar, and Muhuri (2022) proposed the use of a Multi-Source Deep Transfer Learning (MSDTL) approach for forecasting the evolution of COVID-19 infections in regions where there is not enough information. A comprehensive review of ML techniques, including regression, classification, time series processing, and deep learning techniques applied to model COVID-19 dynamics and temporal features, is presented in Comito and Pizzuti (2022).
The combination of clustering and prediction techniques for modeling COVID-19 behavior has been largely unexplored. Nonetheless, some studies have proposed methodologies in this area. One such methodology, proposed in Said, Erradi, Aly, and Mohamed (2021), involves clustering countries based on their properties, constructing multivariate time series, and using a Bi-LSTM network to forecast daily cumulative COVID-19 cases for any country based on its cluster's data and associated lockdown measures. While this approach is promising, the suitability of the Bi-LSTM network for short time series may be questionable, and using many multivariate series could worsen data quality issues that impact the model's accuracy, which is heavily dependent on the availability and quality of the data used for training and prediction. Another methodology involving clustering is presented in Liu, Clemente, et al. (2020), where news media activity and digital traces are considered as inputs in a model for forecasting the virus spread in several Chinese provinces. While these sources of data can provide valuable insights into public awareness and perception of the outbreak, they may also introduce biases and inaccuracies into the forecasting model. For example, individuals may search for information about COVID-19 for reasons unrelated to actual disease activity in their region, such as out of curiosity or due to news coverage from other regions. Similarly, news media activity may not accurately reflect the actual spread of the disease, as news outlets may prioritize sensational or attention-grabbing stories over accurate reporting.
In this study, a two-phased methodology is developed to reduce the number of forecasting models and their complexity by treating the number of SARS-CoV-2 contagions over time as a bacterial growth curve. Bacterial growth curves typically have four phases: lag (the bacteria prepare to grow and multiply), exponential or log (they actively grow and divide), stationary (the growth slows down or stops), and death (the population begins to decrease). In our case, the number of contagions is treated as a bacterial growth curve due to the similarities they share in shape. In this sense, the proposed methodology can be applied to any process that behaves similarly, carrying out the same statistical analysis to obtain similarities between a group of curves.
While viruses and bacteria are indeed distinct biological entities, they share deep similarities in their growth patterns. These similarities make it reasonable to consider a bacterial growth rate model for viral growth rate analysis. Both viruses and bacteria may be affected by various environmental factors such as temperature, pH, and nutrient availability. In addition, both types of pathogens may exhibit similar patterns of growth and transmission, including the initial lag phase followed by exponential growth, a stationary phase, and eventual decline.
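As a minimal illustration of these shared growth patterns, the lag, exponential, and stationary phases can be sketched with a Gompertz curve, a classical bacterial growth model (the death phase would need an extra decay term, omitted here for brevity). All parameter values below are hypothetical:

```python
import math

def gompertz(t, A=1000.0, mu=50.0, lam=10.0):
    """Zwietering-parameterized Gompertz growth curve.

    A   -- asymptotic population size (stationary-phase level)
    mu  -- maximum growth rate (slope at the inflection point)
    lam -- lag time (approximate duration of the lag phase)
    """
    return A * math.exp(-math.exp(mu * math.e / A * (lam - t) + 1.0))

# Population size over 60 days: slow start, exponential rise, plateau.
curve = [gompertz(t) for t in range(60)]
```

Treating the 14-day incidence of an outbreak as such a curve is precisely what motivates the clustering and decomposition phases of the methodology.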
The mentioned methodology consists of two phases. Firstly, an analytical procedure is carried out by transforming the 14-day incidence curve of an outbreak into a set of descriptive and informative features. A novel dataset is built as a result of applying this transformation to all the outbreaks of all the considered areas. This dataset contains the entire behavior of each location during the pandemic period. This novel representation aims to discover similar pandemic behaviors across different areas by applying a curve clustering technique. In our case, due to the ease of transmission, the SARS-CoV-2 positive diagnoses growth curves bear a strong similarity to bacterial growth curves, and thus the four aforementioned curve phases have been taken into account for the clustering stage. Concretely, bacterial growth curves and COVID-19 contagion curves are similar in that both represent the growth or spread of a population over time. Both sorts of curves typically have an initial period of slow growth or spread, followed by an exponential phase where the population spreads rapidly, and then a plateau or decline as the population reaches its maximum size or the spread of the disease is contained. Concerning the second phase, a curve decomposition procedure is applied to all the outbreaks and all the areas through an iterative polynomial fit procedure, obtaining informative features for each day. Then, the information from the locations belonging to the same cluster is combined, and different forecasting datasets are created to model both short- and mid-term 14-day incidence predictions using Evolutionary Artificial Neural Networks (EANNs).
The results demonstrate that this methodology has two main advantages: (1) it reduces the number of models by 75.53% and 58.82% for short- and mid-term forecasting, respectively, and (2) it reduces the number of parameters to be estimated by up to 94.36%, while maintaining, or even improving, the accuracy in several locations.
Summarizing, the main contributions of this methodology are the following:
1. The use of an analytical clustering procedure based on decomposing the curve of contagions. For this, an iterative polynomial process has been applied to obtain meaningful characteristics, which represent the dynamics of the contagions in a specific region.
2. The clustering stage groups similar sanitary districts as much as possible. This grouping implies a reduction in the number of parameters used by the forecasting models: one forecasting model is used to forecast the evolution of the COVID-19 contagion rate in several similar sanitary districts.
3. A comparison against individual forecasting models developed per sanitary district has been carried out, demonstrating that the loss in performance is minimal compared to the reduction in complexity.
4. The methodology has been assessed in Andalusia, the most populated region of Spain, although it is applicable to any region across the world.
5. The use of autoregressive models for building the datasets used in the forecasting stage. Specifically, vector autoregressive models have been used, allowing the use of recent past values of several exogenous variables. Note that, in this kind of problem, past information is valuable as it provides significant information on the recent dynamics of the time series.
6. Forecasting models have been developed using artificial neural networks trained with an evolutionary algorithm. These models have been shown to achieve excellent performance in forecasting the COVID-19 contagion rate.
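As a minimal sketch of the autoregressive dataset construction mentioned in contribution 5, the pairs below use the most recent past values as inputs and a value several steps ahead as the target. The function name, lag count, and horizon are illustrative; the paper's vector autoregressive datasets would additionally stack several exogenous variables:

```python
def make_ar_dataset(series, n_lags, horizon):
    """Build autoregressive (input, target) pairs from a univariate series.

    Each input holds the n_lags most recent past values; the target is
    the value `horizon` steps ahead of the last input value.
    """
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y

# Toy series 0..9: the lags [0, 1, 2] predict the value 2 steps ahead, 4.
X, y = make_ar_dataset(list(range(10)), n_lags=3, horizon=2)
```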
The remainder of this paper is organized as follows: a complete description of the data used, the temporal limitations of the outbreaks, and the 14-day incidence definition is presented in Section 2. Then, Section 3 details the two-step methodology. Concretely, Section 3.1 presents the first step of the methodology, including the clustering algorithm and internal validation metrics used, whereas Section 3.2 describes the second step of the methodology, including the curve decomposition procedure and the forecast models. Section 4 presents the experimental settings, the results of the clustering and the forecast steps, and a statistical analysis. Finally, Section 5 concludes this work.

Data description
This study is focused on the 34 sanitary districts of Andalusia, the country's second largest and most populated autonomous region, located in southern Spain. Sanitary districts are geographical divisions that have their own local health competencies. The geographical representation of the Andalusian territory and the districts' populations are shown in Fig. 1 and Table 1, respectively.
The COVID-19 infection data used in this study was collected from the official website of the Andalusian Government. 1 The website provides daily reports on the number of COVID-19 infected, recovered, and deceased individuals across the 34 sanitary districts of Andalusia. The data was collected until the end of the sixth wave on March 28th, 2022, since from that date a less exhaustive case detection surveillance system was implemented.
This study considers the reported COVID-19 contagion data in these districts from August 1, 2020, to March 22, 2022. During this period, five different outbreaks (from the second to the sixth) took place, starting almost at the same time in all the regions. The first wave (spanning from March 11, 2020 to July 31, 2020) was excluded due to a combination of factors. Firstly, a strict lockdown took place during this time, which caused an abnormal spread of the virus. Secondly, the reporting system was not completely functional, which resulted in an erratic low-intensity wave, as the reporting of positive diagnoses was not considered a priority. For these reasons, we decided to exclude the first wave from our analysis. Moreover, the fourth wave (spanning from March 13, 2021, to June 15, 2021) was also excluded since the vaccination campaign took place during this period, making the corresponding data irregular across the different locations and unusable for the analytical procedure proposed in this paper.
Consequently, the following outbreaks are considered for this study, with the time interval of each one being the same for all the districts. The second wave, spanning from August 1, 2020, to December 15, 2020, covers the period from mid-summer to the beginning of Christmas. The third outbreak happened during the 2020 Christmas holidays, from December 16, 2020 to March 13, 2021; the high mobility of the population during these dates led to a fast increase in the number of contagions. The fifth wave began on June 16, 2021 and lasted until September 15, 2021; thanks to the high number of vaccinated people, this outbreak was less intense than the previous summer outbreak. Finally, the sixth wave spanned from November 5, 2021 to March 22, 2022. Its incidence severely increased in all countries due to the Omicron variant of SARS-CoV-2, a new strain of the virus involving higher risks of reinfection (Araf et al., 2022). In this sense, the effectiveness of the current vaccines against this variant is still under study (Chenchula, Karunakaran, Sharma, & Chavan, 2022). However, this new strain provoked less severe symptoms than the previous variants, leading to a lower hospitalization rate (Mahase, 2021). Considering the limitations previously discussed in Díaz-Lozano, Guijo-Rubio, Gutiérrez, Gómez-Orellana, et al. (2022), the final number of daily observations corresponding to the second, third, fifth, and sixth waves is 137, 88, 92, and 138, respectively, making a total of 455 observations. An important aspect of the reported data is the presence of an inherent pattern of fewer positive COVID-19 diagnoses on weekends, which results in noisy daily contagion time series. This effect is shared by most countries (Ricon-Becker, Tarrasch, Blinder, & Ben-Eliyahu, 2020) and is caused by different factors, such as delays in testing results.
For the analysis carried out in this paper, the contagion time series of the 34 sanitary districts have been transformed into the 14-day cumulative COVID-19 cases per 100,000 people, also known as the 14-day incidence. This is a well-known indicator used by governments as a measure of the intensity of the different outbreaks, as it takes into account the population size of the region, a crucial aspect when the virus transmission rate is high. This indicator is calculated as follows:
$$I_t = \frac{100{,}000}{P} \sum_{i=t-13}^{t} C_i,$$
where $I_t$ and $C_t$ are the 14-day incidence and the number of reported cases at day $t$, respectively, and $P$ is the population size of the district. The number of daily reported COVID-19 cases in the Sevilla distrito (SD) and Córdoba distrito (CD), two of the 34 districts in Andalusia, as well as their 14-day cumulative transformation, are displayed in Figs. 2 and 3, respectively. The sawtooth effect produced by the weekend pattern is easily visible in Figs. 2(a) and 3(a), and is mitigated by applying the aforementioned 14-day incidence transformation (Figs. 2(b) and 3(b)).
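The transformation above can be sketched directly from its definition; the constant daily-case series and the district population used in the example are hypothetical:

```python
def incidence_14d(daily_cases, population):
    """14-day cumulative COVID-19 cases per 100,000 inhabitants.

    daily_cases -- chronologically ordered daily reported cases
    population  -- population size of the sanitary district
    Returns one value per day from day 14 onwards (full windows only).
    """
    return [sum(daily_cases[t - 13:t + 1]) * 100_000 / population
            for t in range(13, len(daily_cases))]

# A district of 300,000 people reporting 30 cases every day for 20 days:
series = incidence_14d([30] * 20, population=300_000)
# Each 14-day window holds 420 cases, i.e. an incidence of 140 per 100,000.
```

Because each value sums a full 14-day window, the weekend sawtooth in the daily series averages out, as shown in Figs. 2(b) and 3(b).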
Although the evolution and the intensity of the outbreaks are slightly different for each region, they always exhibit similar behavior and timing: at the beginning of the wave, a small number of people get infected. Then, when a sufficiently high number of people contract the disease, the growth in infections becomes exponential, given the ease of contagion of the virus. At this point, control measures are usually applied to bring the outbreak under control, and the maximum number of contagions is reached. Shortly after the peak, the number of daily contagions decreases rapidly, and the outbreak stabilizes with residual infections. Hence, in each outbreak, three different important stages can be identified: beginning, growth and stabilization.

Methodology
In this section, we detail the proposed two-phased methodology. First, as the data previously described in Section 2 has a temporal component, patterns are treated as time series. A univariate time series is a chronologically ordered sequence of values, i.e., $Y = (y_1, y_2, \ldots, y_N)$. Time series can be found in a wide variety of fields, from renewable energies (Guijo-Rubio, Gómez-Orellana, et al.) to medicine (Mohan et al., 2022). Moreover, several ML tasks can be applied to time series, from classification (Ismail Fawaz, Forestier, Weber, Idoumghar, & Muller, 2019) or clustering (Aghabozorgi, Shirkhorshidi, & Wah, 2015) to forecasting (De Gooijer & Hyndman, 2006; Lim & Zohren, 2021).
Time series can also be multivariate if they have more than one dimension. A multivariate time series is composed of several exogenous variables (i.e., dimensions) whose relationship with the endogenous variable is used in the analysis. Formally, a $D$-dimensional multivariate time series of length $N$ is defined as $Y = (Y_1, Y_2, \ldots, Y_D)$, where $Y_d = (y_{d,1}, y_{d,2}, \ldots, y_{d,N})$, i.e., $y_{d,n}$ is the $n$th observation of the $d$th dimension. The proposed methodology consists of two phases: (1) a cluster analysis to group those sanitary districts with similar behavior during the pandemic by identifying the typical four stages of a bacterial growth curve, and (2) for each cluster, a forecasting approach applied to predict the COVID-19 incidence over all the sanitary districts of that cluster at once. In this way, the forecast modeling benefits from combining information from several sanitary districts. Moreover, the task is simplified as much as possible, since one forecast model provides a prognosis for the 14-day incidence of several districts. To demonstrate that this methodology achieves competitive results, a comparison against 34 individual analyses (i.e., forecast models trained only with the specific information of the corresponding sanitary district) is carried out. Figs. 4 and 5 depict the methodologies for the individual and cluster analyses, respectively.

Cluster analysis
The first phase of the proposal consists in performing a descriptive analysis following a clustering approach. The clustering task aims to group sanitary districts sharing similarities in their behavior across the pandemic. Typically, clustering is used as a preprocessing step before applying prediction techniques (Aghabozorgi & Teh, 2014; Graves & Pedrycz, 2010). Therefore, this phase serves to identify which sanitary districts have behaved similarly; hence, comparable measures to control the expansion of the pandemic can be applied in future outbreaks or upcoming pandemics. In this way, achieving robust clusters is essential to accurately predict the daily 14-day incidence for a given group of sanitary districts.
Two sorts of clustering techniques can be applied: (1) traditional clustering techniques and (2) specific Time Series Clustering (TSC) techniques. The former comprises a wide variety of well-known techniques, among which two standard approaches stand out, partitional clustering (Celebi, 2014) and hierarchical clustering (Murtagh & Contreras, 2012), as they have been applied to a wide variety of fields. On the one hand, $k$-means is the most well-known partitional clustering algorithm, whose aim is to partition the patterns into $k$ clusters, assigning each pattern to the cluster with the closest centroid (the average of the features of the patterns belonging to the cluster). On the other hand, hierarchical clustering (Murtagh & Contreras, 2012) builds a hierarchy of groups through a dendrogram, a diagram that shows the clusters' hierarchy and the distance between them. The two best-known algorithms of this sort are the agglomerative and the divisive (Roux, 2018) ones. The former follows a bottom-up strategy: at the beginning, each clustering pattern forms its own cluster, and then, iteratively, the two most similar clustering patterns (or clusters) are merged. Conversely, the divisive algorithm does the opposite: all the clustering patterns start in one cluster, and then, at each iteration, the two most dissimilar clustering patterns (or clusters) are split. Also, several enhancements have been proposed in the literature. For instance, in Wu, Peng, Lee, Leibnitz, and Xia (2021), the authors presented a novel method for hierarchical clustering using two types of structural similarities in a nearest neighbor graph, considering that the maximum similarity is a transitive and closure relation between clusters, instead of being the highest similarity at each iteration.
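The bottom-up merging loop described above can be sketched in a few lines of plain Python. For brevity, this toy version merges by single linkage (distance between the closest pair of points) rather than by the Ward criterion adopted later in this paper, and the sample points are hypothetical:

```python
def agglomerative(points, n_clusters):
    """Minimal bottom-up agglomerative clustering (single linkage).

    Every point starts in its own cluster; at each step the two closest
    clusters are merged, until only n_clusters remain.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the two closest clusters
    return clusters

groups = agglomerative([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
```

Recording the merge distances at each step would yield the dendrogram used later to choose the number of clusters.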
Furthermore, as the data considered in this work are time series of the daily 14-day incidence, Time Series Clustering (TSC) techniques may also be applied (Aghabozorgi et al., 2015), considering the temporal dimension of the data. During the last few years, there has been an explosion of interest in TSC, with several novel proposals. For instance, in Guijo-Rubio, Durán-Rosal, Gutiérrez, Troncoso, and Hervás-Martínez (2020), the authors presented a novel TSC technique that combines a preprocessing segmentation approach with a standard clustering technique. Recently, Holder, Middlehurst, and Bagnall (2022) presented a comprehensive review of elastic distance measures for TSC, recommending the use of the move-split-merge method along with $k$-medoids as a baseline technique for TSC.
Even though the data used in this work consist of 14-day incidence time series, these time series have two important peculiarities: (1) they are discontinuous, as only outbreaks are considered and the points between some waves are not taken into account; moreover, the fourth wave is excluded, as mentioned above, causing a considerable gap between outbreaks; and (2) the time series (i.e., the outbreaks) are considerably short, which makes it difficult to apply specific TSC techniques. In this regard, traditional clustering techniques are more appropriate. Nevertheless, these well-known techniques are not applied directly to the original time series data but to a transformation that accordingly characterizes the time series (Holder et al., 2022). This strategy has been proven to achieve outstanding results while reducing the computational cost, as the input space is significantly reduced. In this sense, the original 14-day incidence time series is transformed into a set of features by means of a curve characterization process, which is explained in Section 3.1.1. After that, the agglomerative hierarchical clustering technique is applied. The main reasons for choosing this method over the rest of the traditional techniques are: (1) it is deterministic, since there is no random initialization or any other stochastic step; (2) the dendrogram facilitates the decision-making about the number of clusters to be chosen; (3) it is the most straightforward approach, which makes it easy to interpret; and finally, (4) this technique can achieve outstanding performance and, when applied to small- or medium-sized datasets, does not entail an enormous computational load.

Curve characterization
The feature extraction procedure followed in this paper was first proposed in Díaz-Lozano, Guijo-Rubio, Gutiérrez, and Hervás-Martínez (2022). This method, known as curve characterization, transforms the TSC problem into a traditional clustering task, obtaining high-quality results in terms of internal validation metrics.
This procedure identifies three critical instants of each outbreak curve: (1) the instant when the 14-day incidence begins to grow exponentially ($t_b$); (2) the instant when the outbreak stabilizes ($t_s$); and (3) the peak of the outbreak ($t_p$). The first two instants are identified by analyzing the percentage change of the 14-day incidence with respect to the previous day. Thus, the day with the maximum percentage change between the beginning of the outbreak and the peak is denoted as $t_b$, whereas the day with the minimum percentage change between the peak and the end of the outbreak is denoted as $t_s$. The outbreak peak, denoted as $t_p$, is the day of maximum incidence. To graphically illustrate this procedure, Fig. 6 shows both the original 14-day incidence curve and its percentage change evolution for the third outbreak of the district of Granada.
These critical instants define, in turn, 6 stages for each outbreak. These stages are formed by the periods between pairwise critical instants and the beginning and end of the outbreak, and correspond to some of the phases of a typical bacterial growth curve: (1) from the beginning of the outbreak to $t_b$ (rising stage), which behaves similarly to the bacterial lag phase; (2) from $t_b$ to $t_p$ (exponential stage), which exhibits a growth behavior similar to that of the bacterial log phase; (3) from the beginning to $t_p$ (growing stage); (4) from $t_p$ to the end of the outbreak (stabilization stage), which represents the death phase of a bacterial growth curve; (5) from $t_p$ to $t_s$ (decreasing stage); and (6) from $t_s$ to the end of the outbreak (ending stage). After that, the intensity of each stage ($\rho$) is computed as the tangent of the angle ($\alpha$) formed by the increment of 14-day incidence ($\Delta I$) and the stage duration in days ($\Delta t$). As an example, the intensity of the growing stage is computed as $\rho = \tan(\alpha) = \Delta I / \Delta t$. These stage intensities work as quantitative representations of the phases of each outbreak, specifying the contagion speed rate and the stabilization tempo of a specific wave. It is worth mentioning that the rising and exponential stages are sub-stages of the growing stage, whereas the decreasing and ending ones are sub-stages of the stabilization stage. These sub-stages are intended to describe the encompassing stage in more detail. To provide an insight into these intensities, Fig. 7 graphically shows the stages identified for the third outbreak of Granada's sanitary district, and Table 2 shows them quantitatively. Each sanitary district is therefore characterized by 24 intensity features (6 values per outbreak).
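Assuming an outbreak is given as a list of daily 14-day incidence values, the critical instants and the stage intensities can be sketched as follows. The symbols $t_b$ (start of exponential growth), $t_p$ (peak), and $t_s$ (stabilization), the function names, and the toy series are chosen here for illustration:

```python
def critical_instants(incidence):
    """Locate the three critical instants of an outbreak curve:
    t_b -- day of maximum percentage change before the peak,
    t_p -- day of maximum incidence (the peak),
    t_s -- day of minimum percentage change after the peak."""
    pct = [0.0] + [(incidence[t] - incidence[t - 1]) / incidence[t - 1]
                   for t in range(1, len(incidence))]
    t_p = max(range(len(incidence)), key=incidence.__getitem__)
    t_b = max(range(1, t_p + 1), key=pct.__getitem__)
    t_s = min(range(t_p, len(incidence)), key=pct.__getitem__)
    return t_b, t_p, t_s

def intensity(incidence, start, end):
    """Stage intensity: incidence increment over stage duration in days
    (the tangent of the stage's slope angle)."""
    return (incidence[end] - incidence[start]) / (end - start)

outbreak = [10, 12, 30, 60, 100, 90, 70, 65, 64]  # toy incidence curve
t_b, t_p, t_s = critical_instants(outbreak)
growing_intensity = intensity(outbreak, 0, t_p)  # beginning -> peak
```

Applying `intensity` to the six stage boundaries of each of the four outbreaks yields the 24 features that characterize a sanitary district.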

Clustering
As aforementioned, a bottom-up agglomerative approach is used. In this technique, each descriptive pattern starts in its own cluster, and clusters are merged iteratively according to an objective function. For this, we have considered Ward's method (Ward, 1963) as the objective function, as it is one of the most popular. This method considers the distance between two clusters, $A$ and $B$, as the increment in the sum of squared distances between the clustering patterns of the merged group, $A \cup B$. This increment, called merging cost, is defined as follows:
$$\Delta(A, B) = \frac{n_A n_B}{n_A + n_B} \lVert \mathbf{c}_A - \mathbf{c}_B \rVert^2,$$
where $n_A$ and $n_B$ are the number of clustering patterns, and $\mathbf{c}_A$ and $\mathbf{c}_B$ are the centroids, of clusters $A$ and $B$, respectively. $\lVert \mathbf{c}_A - \mathbf{c}_B \rVert$ is the Euclidean distance between both centroids. This distance increases as clusters are merged; nevertheless, Ward's method aims to keep this growth to the minimum possible.
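Ward's merging cost can be illustrated with a short, self-contained sketch (the function name and data layout are our own assumptions):

```python
import numpy as np

def ward_merging_cost(A, B):
    """Increase in the within-cluster sum of squared distances caused by
    merging clusters A and B (rows are descriptive patterns)."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    nA, nB = len(A), len(B)
    cA, cB = A.mean(axis=0), B.mean(axis=0)      # cluster centroids
    return nA * nB / (nA + nB) * np.sum((cA - cB) ** 2)
```

In practice, the whole agglomeration is usually delegated to a library routine such as `scipy.cluster.hierarchy.linkage(X, method="ward")`, which applies this criterion at every merge.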
The hierarchical clustering builds a dendrogram showing how clusters are merged, including the distances between them. The number of clusters is appropriately chosen according to several criteria, which focus on measuring the quality of the resulting groups attending to different parameters (Arbelaitz, Gurrutxaga, Muguerza, Pérez, & Perona, 2013). These criteria are divided into two categories: internal and external metrics. On the one hand, external metrics validate the cluster analysis using ground-truth labels. On the other hand, internal metrics measure the goodness of the cluster structure without considering external information. For this, internal metrics measure two meaningful characteristics of a cluster analysis (Liu, Li, Xiong, Gao, & Wu, 2010): cohesion, which aims to reduce the intra-cluster distance, and separation, which aims to maximize the inter-cluster distance. Multiple internal metrics have been proposed in the last few years. Apart from the dendrogram, three internal metrics are considered to select the appropriate number of clusters. The internal validation metrics considered are the following: 1. Sum of Squared Errors ($SSE$), which measures the cluster cohesion by computing the Euclidean distance between the patterns of a cluster and their respective centroids. The $SSE$ is calculated as follows:
$$SSE = \sum_{k=1}^{K} \sum_{\mathbf{x}_{i,k} \in C_k} \lVert \mathbf{x}_{i,k} - \mathbf{c}_k \rVert^2,$$
where $K$ is the number of clusters, $C_k$ represents the $k$th cluster, $\mathbf{x}_{i,k}$ is the $i$th descriptive pattern belonging to the $k$th cluster, and $\mathbf{c}_k$ is the centroid of the $k$th cluster. Note that the $SSE$ is a strictly decreasing function, being $SSE = 0$ for $K = N$, where $N$ is the total number of clustering patterns. 2. Silhouette coefficient ($SI$) (Rousseeuw, 1987), which measures both the cluster cohesion and separation. The $SI$ of a given cluster result is calculated as follows:
$$SI = \frac{1}{N} \sum_{i=1}^{N} s(\mathbf{x}_i),$$
M. Díaz-Lozano et al.
Fig. 8. Cumulative 14-day incidence ($I^*$) of the second, third, fifth, and sixth waves for the district of Granada.
being $s(\mathbf{x}_i)$ the individual silhouette value for the $i$th descriptive pattern, calculated as:
$$s(\mathbf{x}_i) = \frac{b(\mathbf{x}_i) - a(\mathbf{x}_i)}{\max\{a(\mathbf{x}_i), b(\mathbf{x}_i)\}},$$
where $a(\mathbf{x}_i)$ is the average distance of $\mathbf{x}_i$ to all other clustering patterns of its same cluster and $b(\mathbf{x}_i)$ is the minimum mean distance from $\mathbf{x}_i$ to all the clustering patterns belonging to a different cluster. This metric ranges in $[-1, 1]$ and measures how well clustered a descriptive pattern is. High positive values indicate that the within dissimilarity of $\mathbf{x}_i$, $a(\mathbf{x}_i)$, is much smaller than the smallest between dissimilarity, $b(\mathbf{x}_i)$; hence, the pattern is well clustered. On the contrary, high negative values indicate a poor-quality cluster result for a given pattern. 3. Calinski-Harabasz Index ($CHI$) (Caliński & Harabasz, 1974), which also measures both the cluster cohesion and separation. The $CHI$ is calculated as follows:
$$CHI = \frac{\sum_{k=1}^{K} n_k \lVert \mathbf{c}_k - \mathbf{c} \rVert^2 / (K - 1)}{\sum_{k=1}^{K} \sum_{\mathbf{x}_{i,k} \in C_k} \lVert \mathbf{x}_{i,k} - \mathbf{c}_k \rVert^2 / (N - K)},$$
where $n_k$ is the number of patterns belonging to the $k$th cluster and $\mathbf{c}$ is the global centroid.
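The three internal validation metrics can be computed directly from a labelled clustering, as in the following illustrative sketch (names are ours; libraries such as scikit-learn also provide silhouette and Calinski-Harabasz implementations):

```python
import numpy as np

def internal_metrics(X, labels):
    """Compute SSE, silhouette coefficient (SI) and Calinski-Harabasz
    index (CHI) for a clustering of the rows of X."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ks = np.unique(labels)
    N, K = len(X), len(ks)
    global_c = X.mean(axis=0)
    within = between = 0.0
    for k in ks:
        Ck = X[labels == k]
        ck = Ck.mean(axis=0)
        within += np.sum((Ck - ck) ** 2)                  # cohesion term
        between += len(Ck) * np.sum((ck - global_c) ** 2) # separation term
    chi = (between / (K - 1)) / (within / (N - K))
    # Silhouette: a = mean intra-cluster distance, b = min mean distance
    # to any other cluster, s = (b - a) / max(a, b).
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    s = []
    for i in range(N):
        same = labels == labels[i]
        a = D[i, same].sum() / max(same.sum() - 1, 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        s.append((b - a) / max(a, b))
    return {"SSE": within, "SI": float(np.mean(s)), "CHI": chi}
```

Evaluating these metrics for a range of cluster counts reproduces the kind of curves used in Fig. 11 to select $K$.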

Predictive modeling
The second phase develops forecasting models to estimate the short- and mid-term incidence. For this, an analytical procedure is applied to create datasets containing information about the dynamics of the wave time series for each district. This procedure consists, in turn, of two sub-steps: first, a curve decomposition method is applied to each 14-day incidence curve for all the waves and the 34 sanitary districts. Second, specific information for each day is extracted. After that, autoregressive features are included using lags of the original features. Finally, Evolutionary Artificial Neural Networks (EANNs) are used to perform the final forecast of the short- and mid-term incidence.

Curve decomposition
Curve fitting techniques are a well-known tool for describing the evolution of dynamic systems. When observed data exhibits shapes that can be described using an algebraic expression, curve fitting can help to both understand the inherent behavior of a process and extract relevant information, which can, in turn, help in posterior analysis tasks. In this sense, curve fitting techniques have been applied in many studies to use the information gained from the curve fitting methodology by including it in ML tasks, both supervised (Hamidi, Ghassemian, & Imani, 2018) and unsupervised (Abraham, Cornillon, Matzner-Løber, & Molinari, 2003; Martin et al., 1998).
In the present study, the original 14-day incidence ($I$) curves of all the districts have been transformed to their accumulated form, i.e., $I^*_{t,v} = \sum_{j=0}^{t} I_{j,v}$. This transformation results in a monotonically non-decreasing time series where the forenamed outbreak stages are still present and visible, i.e., when the number of infected people is low at the beginning and end of an outbreak, the cumulative curve keeps growing but with lower intensity. In Fig. 8, the cumulative 14-day incidence ($I^*$) curves of the second, third, fifth, and sixth waves in the district of Granada are shown. Following the guidelines given in Díaz-Lozano, Guijo-Rubio, Gutiérrez, Gómez-Orellana, et al. (2022), we perform a decomposition of the outbreak curves using a polynomial model. A $d$-degree polynomial function is composed of $d + 1$ parameters and is defined as follows:
$$I^*_{t,v} = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \dots + \alpha_d t^d + \epsilon,$$
where $I^*_{t,v}$ is the cumulative 14-day incidence $t$ days from the beginning of the $v$th outbreak, $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_d)$ are the parameters of the polynomial fit, being $\alpha_0$ the intercept, and, finally, $\epsilon$ is the error term. The choice of a 3-degree polynomial model was based on the analysis of the incidence curves, which exhibit a characteristic shape with a single inflection point. A 3-degree polynomial is the simplest model capable of adjusting to this shape, with only one inflection point needed to capture the change in trend. Additionally, this model has a lower variance compared to higher-degree models, which tend to overfit, decreasing the generalization performance and producing non-representative features. Then, to extract information for each day of each outbreak, a 3-degree polynomial model is fitted using observations from the start of an outbreak (day 0) to the day $t$. It is necessary to mention that a minimum number of observations (days), $t_{\min}$, is required to achieve accurate models for each wave and district.
Accordingly, $t$ can only take values in the range $[t_{\min}, T_v]$, being $T_v$ the duration in days of the $v$th outbreak. The accuracy of the fitted models is measured using the well-known coefficient of determination, $R^2$. The mean $R^2$ of the fitted models for each day has been computed independently for each sanitary district.
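The per-day polynomial decomposition can be sketched as follows, using `numpy.polyfit` as the fitting routine. The function name and the `T_MIN` constant are our own (the minimum of 10 days is reported later in the experimental section), and the inflection instant $-\alpha_2 / (3\alpha_3)$ is included as it becomes a relevant feature:

```python
import numpy as np

T_MIN = 10  # minimum number of days needed for a reliable fit (assumption)

def daily_polynomial_features(cum_incidence, t):
    """Fit a 3-degree polynomial to the cumulative 14-day incidence from
    day 0 up to day t (t >= T_MIN) and return the coefficients, the R^2
    of the fit and the inflection instant -a2 / (3 * a3)."""
    y = np.asarray(cum_incidence[: t + 1], dtype=float)
    x = np.arange(t + 1, dtype=float)
    a3, a2, a1, a0 = np.polyfit(x, y, deg=3)      # highest degree first
    y_hat = np.polyval([a3, a2, a1, a0], x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    inflection = -a2 / (3.0 * a3)                 # zero of the second derivative
    return {"alpha": (a0, a1, a2, a3), "R2": r2, "inflection": inflection}
```

Sliding $t$ from $t_{\min}$ to $T_v$ produces one feature pattern per day of the outbreak, as described next.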
After applying this curve extraction procedure, every day of an outbreak and district is transformed into a pattern or case, $\mathbf{x}_{t,v}$, composed of 5 features, detailed in Table 3. Hence, a day is described by 5 values representing the behavior of the outbreak evolution, resulting in a 5-dimensional multivariate time series, where $I^*_{t,v}$ is the endogenous variable, i.e. the variable to be estimated.
It is worth mentioning that the wave identifier, $v$, has been intentionally excluded as an input feature of $\mathbf{x}_{t,v}$. The reason behind this exclusion is to develop general forecasting models able to capture the evolution of all waves, regardless of the order and the moment at which they took place.

Autoregressive models
Autoregressive (AR) techniques model univariate time series using a linear combination of past values or lags. Concretely, a $p$-order AR model estimates a future value of a time series using $p$ lags of itself, in this case, past values of $I^*_{t,v}$. Formally, it is defined as follows:
$$I^*_{t+h,v} = \beta_0 + \sum_{j=1}^{p} \beta_j I^*_{t-j+1,v} + \epsilon,$$
where $\beta_j$ is the $j$th coefficient of the AR model, with $\beta_0$ being the intercept, $h$ is the forecast horizon considered, and $\epsilon$ is the error term. When dealing with multivariate time series, AR models cannot be straightforwardly applied, and a Vector Autoregressive (VAR) model is employed instead. VAR models are an extension of AR models. In this case, the future value is modeled using a linear combination of exogenous variables instead of a single one. Hence, VAR models comprise lags for all the dimensions composing the multivariate time series. The $p$-order VAR model used for modeling the cumulative 14-day incidence is defined as follows:
$$I^*_{t+h,v} = \beta_0 + \sum_{j=1}^{p} \boldsymbol{\beta}_j^{\top} \mathbf{x}_{t-j+1,v} + \epsilon,$$
where $\boldsymbol{\beta}_j$ is now a vector of coefficients and $\mathbf{x}_{t-j+1,v}$ are the $(t-j+1)$-lagged values of the variables shown in Table 3. Using VAR models makes it possible for models to benefit from the inherent relationship between all the involved dimensions of the time series. In other words, when VAR models are applied, forecasting models are able to consider recent past information about the dynamics of the time series. This valuable information allows the forecasting model to perform a better prediction than if it did not know the exact previous situation. For instance, it allows the forecasting model to know whether the contagion rate is being predicted when the outbreak is starting or ending. Previous works, such as Guijo-Rubio et al. (2018) or Lydia, Kumar, Selvakumar, and Kumar (2016), have demonstrated that VAR models are a powerful methodology to improve the performance of standard forecast models.
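Building the lagged, VAR-style predictive patterns from a multivariate outbreak series can be sketched as follows (a simplified illustration with our own names, not the authors' exact pipeline; the endogenous variable is assumed to be stored under the key `I_cum`):

```python
import numpy as np

def build_var_patterns(features, horizon, lags=2):
    """Turn a multivariate outbreak time series into (pattern, target)
    pairs: each pattern holds `lags` past values of every variable plus
    the number of days since the outbreak started; the target is the
    endogenous variable `horizon` days ahead."""
    n = len(features["I_cum"])
    X, y = [], []
    for t in range(lags, n - horizon):
        row = []
        for name in sorted(features):             # deterministic column order
            for l in range(1, lags + 1):
                row.append(features[name][t - l]) # lagged values
        row.append(t)                             # days since outbreak start
        X.append(row)
        y.append(features["I_cum"][t + horizon])  # value h days ahead
    return np.array(X), np.array(y)
```

With two variables and `lags=2`, each pattern has 5 columns; adding the polynomial coefficients of Table 3 yields the 7-feature patterns used later in Section 4.1.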

Artificial neural networks
Once the forecasting dataset is built, any forecasting ML technique could be applied. In this case, we have chosen Artificial Neural Networks (ANNs) (Bishop et al., 1995), since they have been successfully applied in several fields, such as medicine, as aforementioned in Section 1. Several approaches have been presented in the literature over the last decades: one of the first approaches of Feed-Forward ANNs (FNNs) was presented in Joseph (1961), which included a hidden layer with several nodes. The MultiLayer Perceptron (MLP) (Bishop et al., 1995) includes Sigmoid Unit (SU) functions in the nodes of the hidden layer, while the Radial Basis Function (RBF) (Broomhead & Lowe, 1988) neural networks include Gaussian transfer functions in the neurons of the hidden layers. Finally, the Product Unit (PU) (Durbin & Rumelhart, 1989) neural networks are based on a weighted product for the input of the nodes.
Regardless of the basis functions of the hidden layer, a feed-forward one-layer ANN can be formally defined as:
$$f(\mathbf{x}, \boldsymbol{\gamma}, \mathbf{W}) = \gamma_0 + \sum_{j=1}^{M} \gamma_j B_j(\mathbf{x}, \mathbf{w}_j),$$
where $f(\mathbf{x}, \boldsymbol{\gamma}, \mathbf{W})$ is the output of the model considered for the forecasting task. For a given day $t$ and a forecast horizon of $h$ days, $f(\mathbf{x}, \boldsymbol{\gamma}, \mathbf{W}) = \hat{I}^*_{t+h}$, and $\mathbf{x}$ is a predictive pattern composed of the inputs of the models, i.e. the features describing any outbreak evolution after $t$ days. $\boldsymbol{\gamma}$ is a vector including the weights from the nodes in the hidden layer to the output node, $\boldsymbol{\gamma} = (\gamma_0, \gamma_1, \dots, \gamma_M)$, with $\gamma_0$ representing the bias. $\mathbf{W} = (\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_M)$ includes the weights from the inputs to the nodes of the hidden layer, i.e. $\mathbf{w}_j = (w_{j1}, w_{j2}, \dots, w_{jn})$ contains the weights from the input nodes to the $j$th hidden node. Finally, $B_j(\mathbf{x}, \mathbf{w}_j)$ represents the basis function of the $j$th hidden neuron. In this study, we have considered as basis functions the Product Units (PUs), which are able to obtain excellent results for regression and classification tasks, while limiting the number of connections when properly trained. Moreover, they have been proven to achieve excellent performance in time series forecasting (e.g. Fernández-Navarro, de la Cruz, Gutiérrez, Castaño, and Hervás-Martínez (2018)). PUs are typically used when there is a strong interaction between the inputs and when the decision regions are not separable by hyperplanes. Among the advantages of using PUs as basis functions stands out their ability to form higher-order combinations of the inputs, providing a higher information capacity with a small architecture. PUs are formally defined as:
$$B_j(\mathbf{x}, \mathbf{w}_j) = \prod_{i=1}^{n} x_i^{w_{ji}}.$$
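A forward pass of such a PU-based network can be written compactly (an illustrative sketch; the function and variable names are ours). Note that inputs must be strictly positive for real-valued exponents, which is one motivation for the $[0.1, 0.9]$ input scaling reported later:

```python
import numpy as np

def pu_network(x, gamma, W):
    """One-hidden-layer network with Product Units: each hidden node j
    computes prod_i x_i ** w_ji, and the output is the weighted sum
    gamma_0 + sum_j gamma_j * B_j(x, w_j)."""
    x = np.asarray(x, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    W = np.asarray(W, dtype=float)                # one row of weights per PU
    B = np.prod(x[None, :] ** W, axis=1)          # product-unit activations
    return gamma[0] + gamma[1:] @ B
```

With integer exponents a PU reduces to a monomial of the inputs, which makes explicit the higher-order input combinations mentioned above.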

Evolutionary artificial neural networks
For the training step of ANNs, several approaches have been presented in the literature: one of the first methods was the backpropagation algorithm (BP) (Rumelhart, Hinton, & Williams, 1986), which has been widely used. However, it suffers from the problem of not being guaranteed to find the global minimum of the error function, but only a local minimum. Later, among other approaches, Evolutionary Algorithms (EAs) (Yao, 1993) were proposed. EAs are a good approach for training ANNs, as they overcome the previous inconveniences of the BP algorithm. An EA is part of the evolutionary computation family, which uses mechanisms inspired by biological evolution to determine the best solution from a set of candidate solutions known as the population. By mimicking the evolutionary process and natural selection, EAs perform a global search to achieve excellent solutions (i.e. reaching the region of the error surface with the lowest error). In this sense, EAs are an excellent approach for optimizing both the architecture and the weights of ANNs, giving rise to Evolutionary ANNs (EANNs).
In this study, we have used the EA proposed in Martínez-Estudillo, Martínez-Estudillo, Hervás-Martínez, and García-Pedrajas (2006), whose main steps are detailed in Fig. 9. The goal of the EA is to maximize the fitness of the population of ANNs throughout the evolution stage, i.e. the population is optimized by maximizing its fitness (improving its performance) during the evolutionary stage. The fitness is a quality value computed using a Fitness Function (FF). This FF is calculated as a strictly decreasing transformation of the loss function, in this case, the Mean Squared Error (MSE). In this way, the best individual of the ANN population is the ANN with the lowest MSE and, hence, the highest fitness value. Specifically, the FF is defined as:
$$FF = \frac{1}{1 + \text{MSE}},$$
where MSE is the loss function considered, which is in turn defined as:
$$\text{MSE} = \frac{1}{N} \sum_{v \in V} \sum_{t} \left( I^*_{t+h,v} - \hat{I}^*_{t+h,v} \right)^2,$$
where $I^*_{t+h,v}$ and $\hat{I}^*_{t+h,v}$ are the observed and forecasted cumulative 14-day incidence, respectively, $N$ is the number of training patterns, and $V = \{2, 3, 5, 6\}$ contains the indices of the considered outbreaks. Note that $\hat{I}^*_{t+h,v}$ is obtained by using the output of the model $f(\mathbf{x}_{t,v}, \boldsymbol{\gamma}, \mathbf{W})$.
Specifically, the EA detailed in Fig. 9 starts by initializing a random population of ANNs, which consists in randomly defining the number of nodes in the hidden layer, the connections between the inputs and the hidden neurons, and the synaptic weights of these connections. Then, the entire population is assessed using the fitness function defined in Eq. (12). Later on, the population is sorted according to the fitness function. After that, the worst 10% of the population is replaced by a copy of the best 10%. In other words, the individuals at the bottom of the sorted population are replaced by those at the top. Subsequently, two mutations are applied: (1) the parametric mutation is applied to the best 10% of the population, and (2) the structural mutation is applied to the remaining 90%. The strength of each mutation varies during the evolution stage. The parametric mutation modifies the synaptic weights by adding Gaussian noise of zero mean and a variance that decreases throughout the evolution. In this sense, the EA moves from exploring solutions to exploiting them. On the other hand, the structural mutation alters the structure of the ANN by adding or removing hidden neurons and the connections that link them to the input and output layers. As this mutation is applied to the remaining 90% of the population, the EA explores a wide range of structures, expanding the area of the search space and keeping a diverse population. We have considered five different structural mutations: adding neurons, deleting neurons, adding connections, deleting connections, and fusing neurons, which are sequentially performed according to a probability. If, due to this probability, no structural mutation is performed, one is randomly chosen and applied to keep exploring new structures. The mutations are performed as specified in Gutiérrez, Hervás, Carbonero, and Fernández (2009) and Martínez-Estudillo et al. (2006). After performing the mutations, the entire population is evaluated again, and the procedure continues until the stopping criterion is met. At this moment, the best solution according to the fitness is obtained.
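The evolutionary loop described above can be sketched, in heavily simplified form, as follows. This is not the NNEP implementation: the real algorithm evolves variable architectures with five structural operators, whereas here the "structural" mutation is reduced to dropping a single connection, and all names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness_fn, dim, pop_size=50, generations=200, sigma0=1.0):
    """Simplified sketch of the evolutionary loop: rank by fitness,
    replace the worst 10% with copies of the best 10%, apply parametric
    mutation (Gaussian noise of decreasing variance) to the elite and a
    crude 'structural' mutation (zeroing one connection) to the rest."""
    pop = rng.normal(size=(pop_size, dim))
    n10 = max(1, pop_size // 10)
    for g in range(generations):
        order = np.argsort([-fitness_fn(ind) for ind in pop])
        pop = pop[order]                            # best individuals first
        pop[-n10:] = pop[:n10]                      # worst 10% <- best 10%
        sigma = sigma0 * (1.0 - g / generations)    # exploration -> exploitation
        pop[:n10] += rng.normal(0.0, sigma, size=(n10, dim))  # parametric
        for ind in pop[n10:]:                       # structural (simplified)
            ind[rng.integers(dim)] = 0.0            # delete one connection
    return max(pop, key=fitness_fn)

def make_fitness(X, y, model):
    """Fitness = strictly decreasing transformation of the MSE."""
    return lambda w: 1.0 / (1.0 + np.mean((model(X, w) - y) ** 2))
```

Even this toy version shows the two roles of the mutations: the elite is refined with ever-smaller perturbations, while the rest of the population keeps structural diversity.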
In this study, we differentiate two models: individual EANNs (IEANNs) and clustered EANNs (CEANNs). Despite having the same parametric and structural configurations for both models, the input data considered for each model is different. In this sense, as it will be explained in the following Section 4.1, CEANNs models are trained using information from all the districts belonging to a given cluster. On the other hand, IEANNs are trained for each district separately.

Experimental settings and results
This Section includes a description of the datasets and the experimental settings used for both steps of the methodology (clustering and forecasting). Moreover, the results achieved and a comprehensive analysis of them are also detailed. Finally, this Section is closed with the statistical analysis carried out to demonstrate the significance of these results.

Datasets
Two sorts of datasets have been used in this study, depending on the phase being accomplished: clustering and forecasting. For the clustering step, one dataset is built following the methodology detailed in Section 3.1.1. Time series data of the outbreaks is transformed into a set of quantitative features describing the behavior of each outbreak. This procedure resulted in a dataset composed of 34 patterns, one per sanitary district. Each pattern is formed, in turn, by 24 intensity features, 6 per considered outbreak. A hierarchical clustering approach is carried out to find common patterns, i.e., to find districts with similar intensity features across the 4 considered outbreaks.
On the other hand, for the forecasting step, two sorts of datasets are built: (1) Clustered Forecasting Datasets ($CFD$), which combine the information of all the districts belonging to the same group or cluster; and (2) Individual Forecasting Datasets ($IFD$), which contain the information about the daily outbreak incidence time series of a single district. As can be noticed, the $CFD$s are aggregations of the $IFD$s of the districts belonging to the $k$th cluster. Therefore, $K$ and 34 are the total number of datasets built for the $CFD$ and $IFD$ types, respectively, where $K < 34$. In this way, the information for each district is represented as a set of 5 features per day resulting from the polynomial decomposition described in Section 3.2.1. As mentioned above, a minimum number of samples, $t_{\min}$, is required to build accurate polynomial fits. In our case, the results indicate that, for the 4 waves considered in this study, the minimum number of days to estimate accurate polynomial models is $t_{\min} = 10$. Hence, a dataset of 415 (455 − 40) predictive patterns and 5 features is built for each district. However, using the $\alpha_0$ and $\alpha_1$ coefficients in the modeling step resulted in overfitted models with poor generalization capacity. For this reason, only the $\alpha_2$ and $\alpha_3$ coefficients (the coefficients determining the inflection point of the polynomial function, reached at $t'' = -\frac{\alpha_2}{3\alpha_3}$) are considered for training the models. Therefore, the dimension of the original datasets is reduced from 5 to 3. This inflection instant is mathematically related to the change in the dynamics of the 14-day incidence causing the exponential growth. Hence, at this point, the outbreak gets out of control.
Regarding the VAR models, all the $IFD$s are extended using $l = 2$ lags for each feature. The choice of $l$ is based on the trade-off between the need for autoregressive information and the generalization capability of the trained models. In addition to the autoregressive information, the number of days since the outbreak started, $t$, is also included, temporally locating the evolution of the outbreak. Accordingly, the 7 features for each predictive pattern are $\mathbf{x}_{t,v} = (I^*_{t-1,v}, I^*_{t-2,v}, \alpha_{2,t-1,v}, \alpha_{2,t-2,v}, \alpha_{3,t-1,v}, \alpha_{3,t-2,v}, t)$. Given that 2 lags of each outbreak are incorporated, the number of samples for each $IFD$ is reduced to 407 (415 − 8). Therefore, as the resulting $IFD$s consist of 407 observations and 7 features, each $CFD$ is composed of $407 \cdot n_k$ observations and the same 7 features, being $n_k$ the number of districts in the $k$th cluster.
Furthermore, the cumulative 14-day incidence $h$ days ahead is considered as the output variable for each district. The magnitude of the forecast horizon, $h$, is a choice that depends on the problem being tackled. The inherent trade-off between anticipation and precision requires establishing a forecast horizon suited to the prediction objective. For example, in Ghalehkhondabi, Ardjmand, Weckman, and Young (2017), forecast horizons from 5 to 10 years are considered long-term, whereas horizons from 1 month to 5 years are considered mid-term. Nevertheless, in Du Jardin and Séverin (2011), a 1-year forecast horizon is considered short-term. In this study, in order to anticipate possible upturns of an outbreak incidence by adopting preventive measures, 3 and 5 days are considered short- and mid-term, respectively. These forecast horizons offer enough anticipation while keeping a reliable accuracy. Hence, $I^*_{t+3}$ and $I^*_{t+5}$ are used as output variables. In order to validate the trained models, a hold-out strategy is considered: 80% of the time series data is used for estimating the model parameters, and the remaining 20% for model testing. As aforementioned, three different stages are identified for each outbreak: beginning, growth, and stabilization. In this hold-out approach, to generate powerful forecast models capable of analyzing the evolution of an outbreak at different stages, diverse periods of distinct outbreaks are included in the train set. Concretely, the test datasets have been created with data from the beginning stage of the second wave, growth stage data from the third and sixth waves, and stabilization data from the fifth wave. The same train and test periods have been used for the $CFD$s and $IFD$s. To graphically illustrate this hold-out split, Fig. 10 shows the cumulative 14-day incidence in the district of Córdoba during the second, third, fifth, and sixth waves and the hold-out partition carried out.
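The stage-aware hold-out partition can be sketched with a small helper (illustrative only; the wave identifiers and day ranges below are placeholders, not the actual split used in the paper):

```python
import numpy as np

def stage_aware_holdout(wave_data, test_windows):
    """wave_data: dict wave_id -> 2-D array of daily predictive patterns.
    test_windows: dict wave_id -> (start, end) day range whose stage is
    held out for testing (e.g. the growth stage of one wave); all the
    remaining days of every wave go to the training partition."""
    train, test = [], []
    for wave, X in wave_data.items():
        lo, hi = test_windows.get(wave, (0, 0))
        test.append(X[lo:hi])
        train.append(np.concatenate([X[:lo], X[hi:]]))
    return np.concatenate(train), np.concatenate(test)
```

Holding out a different stage from each wave is what lets the test set probe the models on beginning, growth, and stabilization dynamics simultaneously.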

Experimental settings
This Section presents the experimental configuration and the values for the parameters of the methods used in the proposed methodology. For the first phase of the methodology (i.e. the analytical clustering procedure), a hierarchical agglomerative approach has been applied to find similarities among sanitary districts. For this, districts are characterized using the methodology detailed in Section 3.1.1. Moreover, Ward's method, defined in Eq. (2), is used to measure the distance between clusters. Regarding the internal validation metrics, only the Silhouette Index has parameters: the Euclidean distance has been chosen for calculating the similarity and dissimilarity between patterns.
Concerning the predictive modeling step, the same hyperparameter configuration has been applied for the training of both sorts of models, CEANNs and IEANNs, except for the maximum number of generations. In this sense, as the CEANN models combine pandemic information about several districts, the complexity of estimating the model parameters is notably higher. This increase justifies the difference in the number of generations, being 1500 and 6000 for the IEANN and CEANN models, respectively. It is worth mentioning that CEANNs are more efficient, since the 6000 generations are used to estimate generic models covering several districts together.
The evolutionary algorithm has been configured to build simple single-layer ANNs composed of a maximum of 4 PUs. The weights between the input, hidden, and output layers are randomly initialized. The input features have been scaled in the range [0.1, 0.9], whereas the output layer values are scaled in the range [−10, 10]. The ANNs only contain one output neuron, the estimation of the 3- or 5-days-ahead cumulative 14-day incidence. The rest of the parameters have been configured following the guidelines published in Gutiérrez et al. (2009) and Martínez-Estudillo et al. (2006), and they are presented in Table 4. Finally, the performance of the EANNs is evaluated using the Standard Error of Prediction ($SEP$). The MSE, although highly correlated with the $SEP$, is not a good choice since the magnitude of the incidence cannot be compared across sanitary districts. Instead, the Standard Error of Prediction counteracts this disadvantage, being a dimensionless metric. The $SEP$ is defined as follows:
$$SEP = \frac{100}{|\bar{I}^*_{t+h}|} \sqrt{\text{MSE}},$$
where $\bar{I}^*_{t+h}$ is the average of the 14-day incidence $h$ days ahead and MSE was introduced in Eq. (13).
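The SEP metric is straightforward to implement from the definition above (a sketch; the function name is ours):

```python
import numpy as np

def sep(y_true, y_pred):
    """Standard Error of Prediction: RMSE expressed as a percentage of
    the mean observed value, making errors comparable across districts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    return 100.0 * np.sqrt(mse) / np.abs(y_true.mean())
```

Because the RMSE is normalized by the mean observed incidence, a SEP of 10 means the typical error is 10% of the average incidence, regardless of the district's population or scale.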
As the weights of the forecasting models are randomly initialized, the models are stochastic, leading to different results when the seed changes. Thus, with the aim of obtaining robust results, 30 runs of every forecast model (IEANNs and CEANNs) have been carried out using different seeds. Hence, the results will be expressed as the mean and standard deviation (SD) of the $SEP$ of these 30 executions. Both forecasting models have been developed using the NNEP (Neural Network Evolutionary Programming) software, developed in Java, to construct and optimize the neural network models. NNEP is a powerful and flexible software tool that combines the benefits of neural networks and evolutionary programming techniques to create complex neural network architectures for regression and classification tasks. The software utilizes an evolutionary algorithm to evolve the structure and weights of neural networks to achieve optimal performance on the given task.

Results
This section presents the results for the two-step methodology proposed, i.e. the conclusions extracted from the clustering analysis and the performance of the forecasting models. For the first step, to determine an appropriate number of groups, the evolution of the internal quality metrics (Section 3.1.2) for $K = 1, 2, 3, \dots, 8$ is shown in Fig. 11.
Following the results presented in Fig. 11, several conclusions can be drawn: (1) following the elbow method for the SSE, $K = 2$ is the optimum number of clusters, determined when the decrease in the SSE is no longer accentuated; (2) regarding the SI, which has to be maximized, the decrease begins at $K = 3$, indicating that this number of clusters is also descriptively accurate; and (3) as for the CHI, its results support both numbers of clusters. For these reasons, the 34 districts are grouped into $K = 2$ and $K = 3$ clusters. In addition, Fig. 12 presents the corresponding dendrogram, and Fig. 13 shows the geographical representation of the sanitary districts belonging to their respective clusters.
The resulting dendrogram shows two major groups, comprising 20 (orange) and 14 (green) sanitary districts, respectively. When $K = 3$, three districts (Levante Alto Almanzora, Campo de Gibraltar Este, and Córdoba Norte) from the smallest group are segregated into another group. Considering both options, Table 5 presents the number of predictive patterns used in the forecasting step for each configuration.
Fig. 11. Internal quality measures ($SSE$, $SI$ and $CHI$) for the clusters resulting from the hierarchical agglomerative clustering using $K = 1, 2, 3, \dots, 8$.
Once the cluster analysis is performed, the two sorts of EANN models are trained: (1) IEANNs are individually trained for each of the 34 sanitary districts, using the corresponding $IFD$ of each district. Hence, 34 models are trained.
(2) CEANNs are trained for each of the clusters obtained in the first phase, using the corresponding $CFD$. In this case, $K$ models are trained ($K = 2$ or $K = 3$), being known as CEANN$_{K=(2,3)}$. Then, to evaluate the difference between the IEANNs and the CEANN$_{K=(2,3)}$, the test partitions of the 34 $IFD$s are applied to both models, obtaining the performance for each sanitary district separately. Table 6 presents the results on the testing dataset for all the districts. It is worth mentioning that the results are expressed in terms of the mean and standard deviation of the $SEP$ of the 30 runs performed. For each sanitary district, the performances achieved by the individual and clustered models are presented for the two forecast horizons ($h = 3$ and $h = 5$).
The results presented in Table 6 show that, for a short-term forecast horizon of 3 days, the IEANNs achieve the best results in only 13 of the 34 districts, i.e., the CEANNs achieve the best results for most of the districts. Specifically, the 3-cluster models generally perform better than those generated with 2 clusters, achieving the best results in 16 of the remaining 21 districts. On the contrary, when using $h = 5$, the IEANN models generally perform better, obtaining the best results in 18 districts, whereas the CEANN$_{K=(2,3)}$ achieve the best performance in 16 districts (8 for $K = 2$ and 8 for $K = 3$). This behavior was expected due to the greater complexity caused by anticipating with a longer horizon: the CEANN$_{K=(2,3)}$ were far more difficult to train, presumably requiring an increase in the number of training generations. Despite this, the CEANN$_{K=(2,3)}$ models exhibit a competitive performance in the mid-term forecast horizon, each obtaining the best results in 8 different districts.
Table 6. Performance achieved by the IEANN and CEANN$_{K=(2,3)}$ models for the $I^*_{t+3}$ and $I^*_{t+5}$ outputs on the 34 sanitary districts. Results are expressed as the $SEP$ on the generalization set of the $IFD$s.
Note that, as detailed in Section 4.2, the CEANN models present a higher complexity due to the increase of information that they must process with respect to the IEANN ones. For this reason, the limit of the evolution for both EANNs was set differently: concretely, 1500 generations for the IEANNs and 6000 for the CEANNs. Even so, considering that 1500 generations are used for each district individually, the CEANN models are more efficient, since the higher number of generations is used to estimate the pandemic evolution for up to 20 districts (in the case of the largest cluster in this study). In this sense, the model estimation procedure is also simplified, assuming an 80% reduction in the number of generations in the case mentioned above. In order to obtain an overall perspective on the results generated by our methodology for each wave, Table 7 presents the mean errors, in terms of $SEP$, achieved by the CEANN models for both time horizons. To calculate these mean errors, we employed the interval data for each specific outbreak and performed 30 independent runs for each district.
As presented in Table 7, the worst average results were obtained during the second wave for both models and time horizons. Although the transmission rate during this outbreak was not as high as for other waves, the period of maximum incidence lasted longer due to the delay in implementing control measures. This delay may explain the observed average deviations from the actual results. The second-worst results were obtained during the sixth wave, which was produced by the Omicron strain. This variant was characterized by its rapid transmission, resulting in the fastest chain of infection during the pandemic. The severity of this wave is reflected in the deviation of the results from the actual data, indicating the challenge of accurately predicting outbreaks caused by highly infectious strains. Apart from the results, the model complexities are also studied, given that complexity reduction is one of the objectives of the proposed methodology. In this sense, the benefits of reducing the number of models would be compromised if the resulting CEANNs were much more complex in terms of architecture. The complexity of the models is measured using the total number of connections in the architecture. Note that the number of hidden neurons is not included, as all the models have the maximum number of hidden neurons allowed (4). Then, to perform a fair comparison between the CEANN and IEANN models, the complexities are computed in the following way: (1) IEANN complexities are measured as the sum of the connections of the $30 \cdot n_k$ models, that is, the connections of the 30 models trained for each district belonging to the cluster are summed up. (2) CEANN complexities are measured as the sum of the connections of the 30 models for each cluster. Table 8 presents the complexities of the IEANN and CEANN$_{K=(2,3)}$ models for both forecasting horizons, $h = (3, 5)$.
As can be observed from Table 8, the total number of links required by the CEANN models is much lower than that required by the IEANN models. For example, for the short-term predictions (ℎ = 3), the number of connections of the IEANN models is reduced by up to 93.51% and 90.51% using CEANN (k = 2) and CEANN (k = 3), respectively. Finding simple models with good generalization capacity is of high interest to the community. As has been demonstrated, CEANN models can achieve at least the same, if not better, performance without incurring a huge complexity cost. Hence, their use is justified, as they model several sanitary districts together. As expected, the model complexities increase with the forecast horizon, as the forecasting problem becomes more challenging. However, CEANNs still reduce the total number of connections with respect to IEANNs by up to 93.31% and 90.02% for k = 2 and k = 3, respectively.
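As a minimal sketch of how these reduction figures are derived, the following helper computes the percentage reduction from total connection counts; the concrete totals used below are hypothetical illustrations (the actual counts come from Table 8 and are not reproduced here):

```python
def connection_reduction(ieann_connections: int, ceann_connections: int) -> float:
    """Percentage reduction in total connections when CEANNs replace IEANNs."""
    return 100.0 * (ieann_connections - ceann_connections) / ieann_connections

# Hypothetical totals: 10,000 IEANN connections vs 649 CEANN connections
# would yield a 93.51% reduction, i.e. the order of magnitude reported above.
print(round(connection_reduction(10_000, 649), 2))  # → 93.51
```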

Statistical analysis
A statistical study is performed to further analyze the differences between the results obtained by the different forecast models. For this purpose, a Kolmogorov-Smirnov test (Massey, 1951) is applied to the 30 results achieved by the IEANNs and CEANNs, i.e., to the results presented in Table 6. In all cases, the obtained p-values show that the null hypothesis of normality cannot be rejected.
Then, a paired t-test is used to find statistical differences between the results obtained with the IEANN and CEANN (k = 2, 3) models in all the districts for ℎ = (3, 5). The results of this test are presented in Table 9 where, considering a significance level of α = 0.05, statistically different model results are highlighted in bold. * and ** indicate that statistically significant differences are found in favor of the CEANN and IEANN models, respectively. For the sake of clarity, Table 10 summarizes the number of districts where CEANN models outperform, underperform, and perform similarly to the IEANNs.
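The statistical pipeline just described (a normality check followed by a paired comparison per district) can be sketched with SciPy; the SEP samples below are synthetic stand-ins for the 30 runs of each model, not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the SEP of 30 independent runs of each model
sep_ieann = rng.normal(loc=12.0, scale=1.5, size=30)
sep_ceann = rng.normal(loc=11.4, scale=1.5, size=30)

# Kolmogorov-Smirnov normality check (parameters estimated from the sample,
# which makes the test conservative)
for sample in (sep_ieann, sep_ceann):
    _, p_norm = stats.kstest(sample, "norm",
                             args=(sample.mean(), sample.std(ddof=1)))

# Paired t-test at a significance level of alpha = 0.05
t_stat, p_value = stats.ttest_rel(sep_ieann, sep_ceann)
ceann_better = (p_value < 0.05) and (sep_ceann.mean() < sep_ieann.mean())
```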
As can be observed in the statistical overview presented in Tables 9 and 10, for the short-term horizon (ℎ = 3), only 8 and 6 districts present significant differences in favor of IEANNs against CEANN (k = 2) and CEANN (k = 3), respectively. On the other hand, for the mid-term horizon (ℎ = 5), 13 and 16 districts are significantly better modeled using IEANNs than using CEANN (k = 2) and CEANN (k = 3), respectively. However, these districts are a minority: most districts obtain statistically similar, or even better, results with CEANNs.
In this regard, attending to Table 10, to achieve the best results combining both IEANN and CEANN models, the number of models required is computed as the number of districts performing better using IEANNs plus the number of models (k) built using CEANNs. For instance, forecasting at a short-term horizon (ℎ = 3) requires training 9 models, i.e. 6 IEANNs and 3 CEANNs (as k = 3). It is worth mentioning that not only is there a complexity reduction, but the results also significantly improve for 15 districts when using CEANNs. Likewise, the mid-term horizon (ℎ = 5) requires 14 models, i.e. 12 IEANNs and 2 CEANNs (as k = 2). In this case, the results also significantly improve for 12 districts.
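The model-count arithmetic above can be checked directly:

```python
def required_models(n_ieann_better: int, k_clusters: int) -> int:
    """Models needed when each CEANN covers one cluster and IEANNs cover the
    districts for which they remain significantly better."""
    return n_ieann_better + k_clusters

def model_reduction(n_districts: int, n_models: int) -> float:
    """Percentage reduction with respect to one IEANN per district."""
    return round(100.0 * (n_districts - n_models) / n_districts, 2)

# Figures reported in the text (34 sanitary districts in total)
print(required_models(6, 3), model_reduction(34, 9))    # short-term → 9 73.53
print(required_models(12, 2), model_reduction(34, 14))  # mid-term  → 14 58.82
```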
In addition, Fig. 14 graphically represents the significant differences between IEANNs and CEANNs as a function of the maximum number of generations of the CEANN models. This figure shows the number of districts without statistically significant differences between both models, as well as the number of sanitary districts where results are significantly different in either direction. Note that, for this comparison, the number of generations of the IEANNs is fixed at 1500.
As expected, the number of districts where CEANN (k = 2, 3) models outperform or perform equally to the IEANN models grows as the maximum number of generations for the CEANN models increases. Although CEANN models are more costly to train, even with a maximum of 1000 generations (500 fewer than the IEANN models), higher performance is achieved for several districts using CEANN (k = 2, 3). Concretely, for the short-term forecast horizon (ℎ = 3), most districts achieve at least the same results with this number of generations when using CEANN (k = 3) models. For CEANN (k = 2) models, 2000 generations are needed to achieve at least similar results for most districts. When CEANN (k = 2, 3) models are trained using 6000 generations, only a few districts keep obtaining the best results with IEANN models. On the other hand, regarding the mid-term forecast horizon (ℎ = 5), the number of cases where the CEANN models outperform IEANNs does not markedly increase with a higher maximum number of generations. For CEANN (k = 2) models trained for up to 6000 generations, the cumulative 14-day incidence is estimated with at least statistically equal accuracy for 22 districts. This means that, for ℎ = 5, instead of training 34 IEANN models, the set can be reduced to 2 CEANN (k = 2) models and 12 IEANNs.

Conclusions
This study proposes a novel methodology for forecasting the pandemic incidence in Andalusia, Spain, which aims to achieve excellent performance while notably reducing the complexity of the required models. Two forecast horizons have been considered in this work: a short-term horizon of 3 days and a mid-term horizon of 5 days. The proposed methodology consists of two steps. First, a cluster analysis is performed to identify districts where the pandemic has behaved similarly, by considering certain characteristics of bacterial growth curves. These characteristics are obtained by identifying the different phases of each wave and measuring their intensity. Then, a hierarchical clustering technique is applied. The performance of the clustering approach is evaluated using internal validation metrics, which indicate that 2 and 3 are the most appropriate numbers of clusters for grouping the sanitary districts. Forecasting models are then applied to the resulting groups.
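The clustering step can be sketched as follows. The feature matrix here is a random placeholder for the wave-phase intensity descriptors, and Ward linkage is an assumed choice (the text does not specify the linkage criterion at this point):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Placeholder wave-phase intensity features for the 34 sanitary districts
features = rng.random((34, 6))

# Agglomerative hierarchical clustering; Ward linkage is an assumption here
Z = linkage(features, method="ward")

# Cut the dendrogram into the k = 2 and k = 3 partitions selected
# by the internal validation metrics
labels_k2 = fcluster(Z, t=2, criterion="maxclust")
labels_k3 = fcluster(Z, t=3, criterion="maxclust")
```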
Secondly, a forecast step is carried out using a numerical representation of the 14-day incidence curves obtained through a decomposition procedure. This procedure is based on an iterative polynomial fitting method where, for each day of each outbreak, a 3-degree polynomial function is fitted to the cumulative 14-day incidence. Each day is then characterized by the 4 parameters of the polynomial function and the cumulative 14-day incidence value. After that, autoregressive models are built to include past information of the outbreak dynamics. The forecast models are built using Evolutionary Artificial Neural Networks (EANNs) with product units as basis functions. EANNs trained by combining all the districts belonging to the same cluster are known as Clustered EANNs (CEANNs). They have been compared against Individual EANN (IEANN) models specifically trained for each sanitary district.

Table 9: Statistical differences between IEANN and CEANN (k = 2, 3) models. Results are expressed as the p-value of a paired t-test computed over the results of the 30 models for each district.
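A minimal sketch of the per-day polynomial decomposition described above, assuming a trailing fitting window (the window length is a hypothetical parameter; the methodology fits iteratively for each day of each outbreak):

```python
import numpy as np

def decompose_day(incidence: np.ndarray, day: int, window: int = 7) -> np.ndarray:
    """Fit a 3-degree polynomial to the cumulative 14-day incidence over a
    trailing window ending at `day`, and return the 4 polynomial coefficients
    plus the incidence value of that day (5 features in total)."""
    lo = max(0, day - window + 1)
    days = np.arange(lo, day + 1, dtype=float)
    coeffs = np.polyfit(days, incidence[lo:day + 1], deg=3)
    return np.concatenate([coeffs, [incidence[day]]])
```

Each day of each outbreak then contributes one such feature vector, which feeds the autoregressive inputs of the EANN models.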
The results obtained demonstrate that the CEANN models perform at least equally to, or even better than, most of the IEANN models for both horizons. Moreover, CEANNs notably reduce the number of models required to carry out a global forecast analysis of the whole Andalusian territory. Furthermore, CEANN models generally perform better than IEANNs on the short-term horizon, whereas for the mid-term horizon they perform similarly to IEANNs. A statistical analysis demonstrates that CEANNs perform statistically equally to, or outperform, IEANNs for 28 and 22 districts for the short- and mid-term horizons, using 3 and 2 clusters, respectively. It is important to take into account that there are some districts with particular characteristics in their pandemic evolution where CEANN models do not perform properly. This is the case of Costa del Sol, Sevilla Norte, Jerez Costa Nordeste, and La Vega, which must be individually modeled.
In addition to being extremely competitive, the main advantage of CEANNs lies in their complexity cost reduction. The number of connections used by IEANNs is reduced by up to 93.51% and 93.31% (CEANN, k = 2) for the short- and mid-term horizons, respectively. Besides, this reduction is also noticeable in the number of models required: of the 34 IEANN models initially required, the same or better prediction accuracy can be achieved using only 9 models (3 CEANN (k = 3) and 6 IEANN models) to predict the short-term incidence. For the mid-term horizon, 14 models are required (2 CEANN (k = 2) and 12 IEANN models). In this way, the number of models for predicting the 14-day incidence 3 and 5 days ahead is reduced by 73.53% and 58.82%, respectively. Finally, it is worth mentioning that even though this methodology has been applied to the region of Andalusia (Spain), it can be applied to any region across the world, or even to any process whose temporal evolution resembles a bacterial growth curve.
Despite its potential advantages, the proposed methodology has some limitations. Firstly, the accuracy and reliability of the predictions heavily depend on the quality and availability of the input data. Errors and inconsistencies in the data may negatively affect the accuracy of the predictions. Secondly, the system operates with an automated process that lacks human intervention, which could potentially limit its flexibility and adaptability in response to unexpected events or changes in the environment. For instance, lockdowns or alterations in mobility restrictions could cause significant fluctuations in the model's predictions, as demonstrated by the variations in the average results per wave presented in Table 7.
Finally, as future work, there is a wide range of approaches for tackling this kind of problem. One of them is regulatory networks based on time series data, which have been successfully applied to a range of areas with excellent results (Weber, Defterli, Gök, & Kropat, 2011). Another is fuzzy systems (Kropat, Özmen, Weber, Meyer-Nieberg, & Defterli, 2016), which allow systems to be investigated under various kinds of uncertainty.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.