A Bayesian predictive analytics model for improving long range epidemic forecasting during an infection wave

Following the outbreak of the coronavirus epidemic in early 2020, municipalities, regional governments and policymakers worldwide had to plan their Non-Pharmaceutical Interventions (NPIs) amidst a scenario of great uncertainty. At this early stage of an epidemic, where no vaccine or medical treatment is in sight, algorithmic prediction can become a powerful tool to inform local policymaking. However, when we replicated one prominent epidemiological model to inform health authorities in a region in the south of Brazil, we found that this model relied too heavily on manually predetermined covariates and was too reactive to changes in data trends. Our four proposed models access data of both daily reported deaths and infections as well as take into account missing data (e.g., the under-reporting of cases) more explicitly, with two of the proposed versions also attempting to model the delay in test reporting. We simulated weekly forecasting of deaths from the period from 31/05/2020 until 31/01/2021, with first week data being used as a cold-start to the algorithm, after which we use a lighter variant of the model for faster forecasting. Because our models are significantly more proactive in identifying trend changes, this has improved forecasting, especially in long-range predictions and after the peak of an infection wave, as they were quicker to adapt to scenarios after these peaks in reported deaths. Assuming reported cases were under-reported greatly benefited the model in its stability, and modelling retroactively-added data (due to the “hot” nature of the data used) had a negligible impact on performance.


Introduction
The World Health Organization (WHO) declared COVID-19 a global pandemic in mid-March 2020, prompting countries to take actions to reduce the spread of the virus in view of the serious respiratory problems that require specialised care in Intensive Care Units (ICU) [1,2]. Little was known about this new strain of coronavirus that threatened to overwhelm health systems, had forced several countries to lockdown and had already been rapidly spreading across Brazil [3].
In many countries, measures such as self-isolation, border closures, testing and social distancing were proposed [4], and it was said that ''these protective measures are crucial to managing this disease'', although vaccines were considered to be critical [5]. With no drug treatments or vaccines in sight at that time and in the face of a lack of national measures to prevent the spread of the disease [6,7], governors

Contributions
In this paper, we present four variations to the Flaxman et al.
model (here called base model) that forecast deaths by  and overcome limitations we observed after replicating the model on a weekly basis to the seven macro-regions that compose the state of Santa Catarina, a southern state in Brazil. Work on this project started in March 2020, and as we replicated the algorithm every week, we noticed that the original model could not accurately identify the wave pattern so characteristic in an epidemic death curve. The base model seemed to assume a linear tendency when forecasting deaths; if deaths had been increasing for the past couple of weeks, they would keep increasing in the following weeks, and vice versa. We conjectured this happened because the base model did not consider the reported infections, it only used reported deaths as data input, and therefore it could not anticipate the changes in the infection patterns that had led to the reported deaths.
The covariates used in the model were also inadequate because they limited the effective reproductive number to change only at manually predetermined time points, where an NPI measure came into effect [21]. Other obstacles not directly considered by the base were the under-notification of infected cases [22], and the delays in test reporting [23], all common problems at the early stage of the pandemic. From the data available in the official database of Santa Catarina state, we could estimate an average delay of 5 days from RT-PCR test collection until the result was available. Consequently, data from the previous week was guaranteed to be incomplete and uninformative. Despite these limitations, the base model was used elsewhere to estimate the impact of NPI measures in two Brazilian states [24].
Our proposed methods aim at overcoming the limitations mentioned above, allowing this mathematical model to be used more effectively for forecasting. Additional equations and algorithmic strategies allow the model to use reported cases to estimate deaths by COVID-19.

Literature review
In this section, we comment on related literature to our work. The Covid-19 pandemic brought the research community together, and many predictive analytics methods were tested to help solve different problems in the pandemic. One can find statistical and algorithmic solutions to various related logistic and forecasting problems related to the spread of the disease, such as vaccine allocation [5], metaheuristic feature selection methods for detecting Covid-19 [25], Covid-19 detection with medical images [26], to name a few. In our literature review, however, we focus on three research areas that we consider the most relevant to our study. In Section 2.1 we talk about non-Bayesian methods for Covid-19 forecasting; Section 2.2 focuses on describing Bayesian approaches for forecasting and the main differences between them and our model; finally, in Section 2.3 we describe recent related methods that attempt to perform missing case imputation, as a way to overcome under-reporting and/or delays in test reporting.

Deterministic forecasting
Traditionally, modelling and forecasting in epidemiology are informed by the use of compartmental models (e.g. SIR, SEIR), where individuals are placed in compartments according to their status (Susceptible, Infectious, Exposed, Recovered, etc.). A recent review, which assessed multiple studies of such type of methods during the COVID-19 pandemic, identified most of these to be ''(...) deterministic in nature, by default'', with ''extensions to stochastic models'' being possible [27]. Furthermore, it found that the use of stochastic models was considered to be ''more realistic than deterministic models'' since deterministic models are valid only where there is a sufficiently large population [28]. But one can also find studies of deterministic approaches that are outside the traditional compartmental literature. For example, the established field of time-series forecasting [29] inspired the models of Parbat and Chakraborty [30] and Sharin et al. [31]. Upon closer look, however, these methods still have room for improvement in their methodology. The cross-validation employed therein was a simple kfold, whereas standard practice for time series forecasting is to employ some temporally-aware train vs test splitting, such as Walk-forward testing [32,33] or last-block evaluation [34]. In fact, our models use a variation of last-block evaluation, to avoid providing unrealistic predictions and to provide a more grounded model validation [34].
Given the general limitations of deterministic models and the issues we highlighted above, we give more attention in our review of the literature to models that are stochastic in nature or to those in which uncertainty is an integral assumption.

Bayesian forecasting
We also found compartmental models that use Bayesian inference as part of their solution, the same methodological framework used in our method. In it, prior distributions describe the initial assumptions about the values of random variables in the model. Then, with the foundation of Bayes's Theorem, an optimisation algorithm -typically a variation of a Markov Chain Monte Carlo (MCMC) algorithm -finds solutions to the problem that fit the available data while considering the priors plus any extra custom equations that govern the model [35]. In Roda et al. [11], the authors proposed SIR and SEIR Bayesian compartmental models solved by MCMC, and argued that their simpler SIR model was the most accurate. Whereas this method only uses reported cases as data to fit the model, our models consider previous reported cases and deaths, the patterns of mobility in the geographic region during the period, and it also contains variables to account for missing data and under-reporting. A few other similar models also attempt to model the pandemic without explicitly providing periodic forecasting (e.g., [14,15]).
Many Bayesian models were developed with the intent to measure the impact of public policies on the reproduction rate of virus [10,18,24], or on the economy [16]. Assessing the effectiveness of public policies is not a goal of our study. Instead, we aim to improve forecasting of deaths by COVID-19, even where such policy information is not readily available or unreliable, and therefore we do not compare our proposed models with these in detail. The closest of these models to our case would be [24], which still attempts to measure the impact of NPIs in Brazil using the model made available in Flaxman et al. [20]. Thus, we chose these as baselines and explained them in more detail in Section 2.4

Missing data imputation
None of the abovementioned work tackle two problems central to COVID forecasting in a low-resource country such as Brazil, namely the under-notification of infected cases [22] and delays in test reporting [23]. These problems, common at the early stage of the pandemic, persisted throughout most of the time in our case study . Although vast, the literature on data imputation is geared mainly towards imputing tabular data (e.g., [36]), which is not directly applicable to our case. Hence, we opted for a data-driven approach that explicitly considers under-notification, delays in test reporting, or both when imputing missing data.
A recent paper by de Nicola et al. [37] used Generalised Additive Models (GAMs) [38] implemented in the mgcv R package [39]

Baseline
Due to the abovementioned reasons, we chose Flaxman et al.'s previously-proposed [20] model as our main baseline, referred to us as base. This baseline model is almost identical to the MCMC-based model made available by Flaxman et al. [20], except for the covariates used. Instead of manually curated non-pharmaceutical intervention measures, we used Google Mobility data as covariates to the model since our primary goal is to forecast new infections and not measure the effect of NPIs on the reproduction rate of the pandemic.
This strategy also counteracts an implicit confirmation bias of the model [21] and have been used by the original group in a subsequent work [40,41]. In contrast to European countries targeted by the original base model, the State of Santa Catarina did not impose strict state-wide measures consistently during this period. Local governments (cities and regional associations of municipalities) were responsible for independently deciding on their social distancing measures [8], which rendered changes in legislation impractical.

Proposed models
In this section, we describe the equations and modifications of our proposed models, base-ron, base-rnn, base-ror and basernr. The reader is referred to Fig. 1 for a high level depiction of the fixed parameters, data and variables used in this section. Each weekly run, all methods receive the same information: the reported cases and deaths, fatality rate, infection-to-death probability distribution, fixed parameters (population size and serial interval), and the Google Mobility covariates. What distinguishes each model is how they treat this data. Whereas the base model implicitly treats reported cases as ''ground-truth'', our proposed variations all ''augment'' (i.e., impute) the reported data to account for delays in test reporting or undernotification of infections by adding variables to the Bayesian inference model. Table 1 offers for a more technical comparison between base model and our proposals.

Proposed models
Since our Bayesian models rely on the same distributions as the base model, we will use the same symbols to mean the same distributions whenever possible. We refer the reader to the original paper for a full description of the symbols [20]. In mathematical terms, we observe daily deaths , and daily cases , for days ∈ {1, … , } and geographical regions ∈ {1, … , }. These values might be augmented (i.e., imputed) in some models (e.g. base-ror and base-rnr) to a * , and * , , which are explained later in this section. Deaths by COVID-19 are statistically modelled as: * where , represents the number of modelled cases, following: where xf r is either the region-specific case-fatality rate cf r (in models base-rnn and base-rnr) or the infection-fatality rate if r ≈ 0.0076 estimated for Brazil in [40,41] (in models base, base-ron, and base-ror). All models have an uncertainty factor added in the same way as in the original model. Reported infections, on the other hand, are modelled as in the original model: where number of reported cases ( , ) depends on the number of susceptible individuals , reproduction rate , and the generation distribution , a fixed polynomial that is also usually referred to as ''serial interval'' [42].
A key component in all variations of the proposed algorithm is that we do not use , directly, but instead, we ''augment'' (i.e., impute) it. Thus, rather than relying solely on the death-based data to model the pandemic -arguably the most reliable source -we also use the reported infections data as a way to make the model less reactive. In some model variations, we assume that the number of reported infections , is under-reported and we model this by adding a normally distributed overestimate variable overestimate ∼  (11.5, 2.0) to model under-reporting explicitly, following estimates that the number of COVID-19 cases in Brazil was about 11 times higher than what was officially being reported [22]. Table 1 The models differ mainly in which objectives they optimise and how one can interpret the model. For our proposed models, we use infection data (with and without an estimate of under-reporting) to try to make the model less reactive, since data depending on deaths may only reflect the situation from a few weeks back. , represents the number of predicted deaths and * , is the number of deaths used as an input to the model, in the same fashion , and * , are the number of predicted cases and the number used as an input to the model.

base-rnr
Includes reported cases, model retroactive data but does not attempt to overestimate infections As reported a These values may not be exact, since the model has to take into consideration the number of both cases and deaths.
b  (11.5, 2.0) follows from estimates that the number of COVID-19 cases in Brazil was about 11 times higher than officially reported [22]. c These cases are interpreted as the number of real cases as long as all the assumptions of the model hold true, which most likely they do not. d Cases , and deaths , from the last week of a data snapshot are augmented (i.e., imputed) according to how historically these values had been retroactively changed, by having * , ≈ , , , and * , ≈ , , , .
Hence, in addition to Eq. (1), we also rely on Eq. (4) below to calibrate our model: * where * , is the new (augmented) number of reported infections and , = , overestimate estimates the actual number of people infected at time considering an overestimate. That is, the augmented case number is where our model performs imputation due to delayed case notification, while the overestimate parameters is where our method imputes missing data due to under-reporting.
Our models have = 7 covariates. Six of them are the Google Mobility indicators (described in Section 4.1), and the remaining covariate is the percentage of the population of a region Susceptible to infection , . The reproduction rate , is assumed to vary with the covariates: where the percentage of the population that is susceptible to the disease , (the seventh covariate) was modelled with a similar impact measure as the mobility data. On both the and our proposed models we use an extra-region impact measure as well as a perregion impact measure , . To consistently simulate the baseline algorithm, our simulations with model did not include , as a covariate, and we also used the same way of weighting these covariates, with both a state-wide and a per-region , .
Some of our models (base-ror and base-rnr) also try to take into account the delay between PCR test collection and test result notification. At any given week, we expect the number of people getting infected to be higher than reported. When we look back at the tally of infections for the same week a few weeks later, we will notice that infections can generally be between 2.5x to 7.5x higher than their initial reported values, and from 60% up to more than 85% of cases are reported with 5 days of delay. See Figures S1 and S2 for a comparison of the delay in the number of reported cases and reported deaths, respectively. Deaths follow a similar but less volatile pattern, rarely passing values 2x higher than their initial reports and possibly having 50% of the data reported after this period. While this confirms that deaths are the most reliable source of information, this also shows an issue in using such ''hot'' data for modelling: it can often be incomplete and lead to a false decrease in the number of cases and deaths.
In the models mentioned above, we compensate for this delay by augmenting the number of cases and deaths of the past week by a percentage, Δretroactive , , and Δretroactive , , , respectively. These values vary per region and were calculated every week based on historical data up until that point. A sample of their distributions can be seen in Figures S3 and S4. This gives us the augmented values: * and * , = , (1 + Δretroactive , , ) Another minor modification was that the original model assumed the onset-to-death distribution to follow the distribution Gamma(17.8, 0.45). We kept modelling this as a gamma function but we estimated the average and deviation from the data at each snapshot. For reference, this value was close to Gamma(20.67, 0.76) on the latest simulated weeks and the distribution of onset-to-death in Santa Catarina can be seen in Figure S7.

Test workflow and validation
Our main goal is to assess which combination of equations would most accurately forecast the curve of deaths by COVID-19 for the seven demographic macro-regions within Santa Catarina. We ran the models as close as possible to what happened in real life. For every week, we only used the data available at that point in time, akin to the last-block evaluation methods normally used for time series forecasting [34]. Forecasts of the regions were then aggregated to compose the overall prediction for the entire state.
The models produce an average prediction which we compare to the ''ground-truth'' number of deaths using Root Mean Squared Error (RMSE) and the Mean Average Error (MAE) metrics. To validate our results, we selected the data snapshot from 07/03/2021 to represent the ''ground-truth'' -1 month after the last run simulation -to account for the notification delays discussed in the previous section. We also calculated RMSE and MAE of the upper and lower confidence intervals and took their average. The resulting metrics, RMSE_conf and MAE_conf, provide a measure of how distant the borders of the confidence interval are from the truth values.
Another important aspect of our testing procedure is how we set up the priors for the weekly simulations (a summary can be seen in Algorithm 1). At any given week, except the first one, posteriors inferred from the previous week were used as starting points for the current models. This practice of updating the priors with previous estimations is known as Sequential Bayesian Updating, or simply Bayesian Updating [43], and has been used for similar purposes in related literature [44], as well as in other academic fields [45,46]. This strategy allowed the inference optimisation to converge much faster in our experiments, so we were able to run fewer iterations of the algorithm than if we had to build the model from the ground up every week. The number of iterations (see Table S1) was chosen after a preliminary test phase where we analysed the trade-off between reliability and execution time. While this sequential nature of the experiments means that we only report a single run of the model for each week, we still believe the sample size produced is more than enough to allow us to identify which models are better.

Data
We were granted access to the state government big data platform Plataforma BoaVista [47], from where we obtained anonymised data on every confirmed case of COVID-19 in SC along with the date of onset of first symptoms, date of PCR test collection, date of death and municipality of residence. We collected data every week, starting from 31/05/2020 -when daily snapshots of data became available in the official database system of the state -until 31/01/2021. We also downloaded mobility data from Google Mobility community reports [48]. This data describes how people's mobility has changed during the pandemic and were available per day and in six categories: Grocery & pharmacy, Parks, Transit stations, Retail & Recreation, Residential, and Workplaces. In practice, when simulating the weekly runs, Google Mobility data from the past week onward was unavailable; therefore, we assumed that these covariates would remain constant from one week before the snapshot date.
The state of Santa Catarina has over 7 million inhabitants organised in 297 municipalities organised in 6 distinct geographical macro-regions distributed across 95 square kilometres of land area. Estimated population in each of the seven regions of Santa Catarina were obtained from the Brazilian Institute of Geography and Statistics, IBGE [49]. The state government imposed suspensions of many economic activities after the first deaths were confirmed in SC in March 2020 but ended up relaxing social distancing measures, eventually leading to a decree in June 2020 after which municipal governments would be responsible for most decisions regarding NPI measures. By 24 March 2021, when we completed this study, over 764,000 cases and over 9800 deaths by COVID-19 had been confirmed, hospitals were fully occupied, and local news reported that at least 397 people were on the waiting list for ICU beds [50].

Results
Results of our simulations of the base model and the four variations of our proposed model, (base-ron, base-rnn, base-ror, basernr) is presented here. All of our models add the newly reported infections to the equations but they differ in how the number of infections is augmented (i.e., imputed) and whether an estimated percentage of cases and deaths are added retroactively in the data before running the model to account for delay in test reporting.
On average, all models had a similar prediction accuracy on the first 7-10 days of forecast ( -value > 0.05, One-way ANOVA) but our proposed models outperformed the base model in the medium term ( -value < 10 −14 , One-way ANOVA). This is illustrated in Fig. 2, where we show the average residual errors for all weekly forecasts aggregated to the entire state of SC. Notice how the margin of prediction errors made by base model grew wider over time while our models maintained a more stable error throughout the forecasting period. Table 2 also provide a numerical comparison of these errors for a window of 7 and 30 days, respectively. In terms of both RMSE and MAE for a 30-day forecasting window, all of our models were considered to be significantly different from the base model ( -value < 10 −5 , independent T-test), while for a 7-day window all models except base-rnn were individually considered significantly different ( -value < 0.01, independent T-test). On Table S2, one could also inspect the predictive performance of the models for each of the seven individual geographic regions that compose the state of Santa Catarina.
The gap between the baseline model and the proposed method over time is most noticeable at particular points in time, as indicated by RMSE plots for 7-day and 30-day forecasting windows of the models in Fig. 3. The base model showed the largest short-term error in the middle of August 2020 and at the end of November 2020. On the 30-day window, the baseline algorithm is clearly making the worst predictions, particularly during August 2020 and the beginning of 2021 ( Fig. 3(b)). The dates where we observe higher errors on base models correspond to predictions made on dates during or immediately after the peaks in the daily number of deaths, as highlighted in Fig. 4. Diagnostic graphs produced by the model for these dates confirm that base was unable to reflect major changes in the trend of death data.
The model predicted that the number of infections was growing even though data regarding new reported infections already displayed a downward trend (Figures S5a-S5c). One could contrast the diagnostic plots above to the ones obtained by base-ron model for the same dates in Figures S6a-S6c, where this misdirection in predicting death trend was not present in our proposed models.
Another way to visualise these results is by comparing the graph of cumulative deaths with predictions made by base and one of the best performing models base-ron (Fig. 5). Scenarios 01 and 03 show models' 95% confidence interval around the average prediction indicated as Scenario 02. It is clear that our model has improved the predictions, providing a narrower confidence interval which was closer to the real value.

Discussion
Our methods outperform the baseline in nearly all runs, with the best algorithms being the ones with the ''overestimate'' variable: base-ron and base-ror. These methods yielded the most stable predictions over both forecasting periods examined. Interestingly, the posterior distribution of the ''overestimate'' parameter in these models resulted in values much smaller compared to the priors we set (overestimate ∼  (11.5, 2.0)) -see Figure S8. For example, the mean value of overestimate for the macro-region ''Foz do Rio Itajai'' was close to 7.5, and lower than 2.5 for the ''Grande Oeste'' macroregion. The fact that the MCMC inference algorithm automatically converged to smaller values consistently across macro-regions suggests that, although present and significant, the sub-notification in the state of Santa Catarina was not as high as we had assumed. On the other hand, base-rnn and base-rnr exhibited larger errors in predictions for certain periods in time, for example the middle of July 2020, later August 2020 or at the end of December 2020 (Fig. 3(a)).
One interesting finding of is that augmenting the number of infections by a percentage of estimated retroactive data did not seem to contribute much to the predictive accuracy, as this feature was present both in one of the best-performing models (base-ror) and in one of the less predictive ones, base-rnr. We refer the reader to Section 3 and in Table 1 for other assumptions embedded in each model variation.
There is evidence in the literature that COVID-19 models are inefficient for long-range forecasting [51]. However, these results show that one can use ''hot'' data (i.e., the most recent data that is constantly being updated and might still not be complete) to update a model every week and achieve higher accuracy. Although we concede that such ''hot'' data come with some issues, such as unreliability and delays in updating, when forecasting epidemic outbreaks, one has to address real-time uncertainties and changes as closely as possible. A model which does not take into account the fast pace production of data is bound to underperform in a real-time setting.

Conclusions
In this paper, we propose Bayesian inference models to overcome some limitations of the original algorithm in Flaxman et al. [20], mostly by introducing original equations that allow the model to access data regarding daily reported infections and letting to account for underreporting in a more explicit way. We show that these changes increase the predictive accuracy of forecasting, not only in the near future (a week) but even in the medium term (thirty days). We have also tested some variations in the algorithm to account for the delay within test collection and notification but these did not prove as useful in predicting the death curve in the state of Santa Catarina.
These alternative models, however, are not without their failings, of course. While predictions have improved and some assumptions of the model could be confirmed by inspection of the data -for example, the onset-to-death indeed seems to follow a gamma distribution with parameters very close to what the original model assumed -there are just too many assumptions in the original model that have not been thoroughly validated [52] (for example, the value of initial reproduction number, the 0 parameter). Another issue is that, from the point of view of the optimisation algorithm, estimated values and estimated number of people infected daily are interchangeable. That is, the algorithm could reach two opposing configurations that are equally valid and optimal: one where the reproduction rate is low, but there is a large pool of infected people in the population, and another separate solution in which the number of infected people is small but is larger. Also, even though adding infection data into the model has given it more adaptability, the same data could make the model more fragile in the future. If the dynamics of infections change (e.g. because of new, more transmissible strains of the virus), the model might be biased to readjust its fitting of the historical data to compensate. In theory, this could be counteracted by more reliable epidemiological data from other sources (i.e., tracking the prevalence of new virus strains, information about age, or information on entry and exit into ICU) or by including even more granular mobility data, at the expense of the citizens' privacy, all of which are generally more expensive or infeasible to obtain. One could also consider adding assumptions to the model, such as the cultural orientation of the population [53] or the peculiarities of the test strategy in place in the modelled region [54].   New or data of higher resolution can alleviate some challenges in epidemic modelling but, importantly, new observations can help us revise assumptions of existing models in search of a more accurate description of real-world cases [51,55]. Our proposed model is one step in that direction of scientific inquiry. We show that a model can become more accurate by adding one more data source and a new assumption about under-reporting the tests, and therefore more useful for forecasting and decision making. As more immediate plans for future works, we also want to extend our models to account for the geographical spread of disease using concepts from network analysis in a principled way [56]. We plan to adapt algorithms our group has previously developed to analyse other types of networks in tasks involving regression, clustering and temporal data [57][58][59]. We also intend to review the other assumptions built into the original model and continue to investigate how much the changes we have introduced are sustained in the face of new epidemic waves, new variants of the virus and new political measures that affect the dynamics of contagion.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
The research presented in this paper was conceived in the context of the Intersectorial Data Intelligence Center for COVID-19 (NI-IDC), a group of volunteers that ran from March until June 2020 from academia, private companies and public local state departments dedicated to discussing data analytic strategies with potential to inform decision-makers in the southern Brazilian state of Santa Catarina (SC) [60,61]. Special thanks to Dr. Ana Luiza Curi Hallal for the valuable exchanges and discussions about epidemic modelling in the early days of this group. We thank CIASC (SC) for providing us the credentials to download data from Platform BoaVista. Thanks, Bang Wong, for providing a colour scheme inclusive for colour-blind people [62]. Data Science Brigade acknowledges funding from ICASA (SC). P.H.C.A. and L.C.L. acknowledge that this study was financed in part by CNPq (Brazilian Research Council) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001. P.H.C.A. acknowledges that during his stay at KCL and A*STAR he is partly funded by King's College London and by the A*STAR Research Attachment Programme (ARAP).

Code availability
The source code to replicate this study and to generates the figures in this paper are available at: https://github.com/jonjoncardoso/paper-covid19-modelling-2022-healthcare-analytics. The source code of the models, written in R and Stan, can be found on the following Github repository: https://github.com/jonjoncardoso/modelo-epidemiologicosc.

Appendix A. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.health.2022.100115.