Comparing the accuracy of several network-based COVID-19 prediction algorithms

Researchers from various scientific disciplines have attempted to forecast the spread of coronavirus disease 2019 (COVID-19). The proposed epidemic prediction methods range from basic curve fitting methods and traffic interaction models to machine-learning approaches. If we combine all these approaches, we obtain the Network Inference-based Prediction Algorithm (NIPA). In this paper, we analyse a diverse set of COVID-19 forecast algorithms, including several modifications of NIPA. Among the algorithms that we evaluated, the original NIPA performed best at forecasting the spread of COVID-19 in Hubei, China and in the Netherlands. In particular, we show that network-based forecasting is superior to any other forecasting algorithm.


Introduction
In December 2019, SARS-CoV-2, the virus that causes coronavirus disease 2019 , emerged in the Chinese province of Hubei. The number of COVID-19 cases in China rose dramatically to almost 80,000 by the end of February 2020. From China, COVID-19 quickly spread throughout the whole world, with almost ten million cases by the end of June 2020. Many countries imposed nation-wide lockdowns to slow down the spread of COVID-19. A reliable forecast of the pandemic outbreak is key for targeted disease countermeasures and for the appropriate design of exit strategies to lift lockdowns.
Unfortunately, just as weather forecasts, the prediction of epidemic outbreaks is subject to fundamental limits (Moran et al., 2016). One aspect is the limited availability of data, because epidemic time series are relatively short, and carrying out medical tests on a large * Corresponding author. scale is challenging. Also, the final number of infected cases is highly sensitive to initial perturbations . Nonetheless, many methods have been developed and applied to forecast the spread of COVID-19. Perhaps the simplest approach is based on fitting the number of infections to a sigmoid curve, such as the logistic function (Roosa et al., 2020;Verhulst, 1845), Hill function (Hill, 1910), or Gompertz function (Gompertz, 1825). Using nonlinear regression, the parameters of the sigmoid curve can be estimated. For the comparison of prediction algorithms in this work, we focus on the logistic function. The logistic function is of particular interest, because the logistic function is the (approximate) solution for the number of infected cases (Van Mieghem, 2016) in the Susceptible-Infected-Susceptible (SIS) epidemic model, and for the number of removed cases in the Susceptible-Infected-Removed (SIR) epidemic model (Kermack & McKendrick, 1927;. By fitting the number of infected cases to a sigmoid curve, we implicitly assume that the spread in a particular https://doi.org/10.1016/j.ijforecast.2020.10.001 0169-2070/© 2020 The Author(s). Published by Elsevier B.V. on behalf of International Institute of Forecasters. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). region is independent of other regions, which contrasts with the strong interconnectedness of our modern world. The interaction between different regions, which is due to the movement of people, is taken into account by network-based techniques.
The interaction can be described by a network G with N nodes. Each node i in the network G represents a particular region (country, province, municipality, or city), and the link a ij ∈ {0, 1} represents the existence of an interaction from region j to region i, specified by a link weight β ij denoting the infection probability from region j to region i. The self-infection probability within a region i is given by β ii , which we expect to be dominant over the other infection probabilities, because the interaction within a region is stronger than the interaction with other regions. The N × N infection probability matrix B, with elements β ij is, however, unknown and must be derived from past observations of the epidemic. We address this issue in more detail in Section 2.
Throughout this work, we often use ''the number of infected cases", which we understand as ''the number of cases reported by local authorities". Asymptomatic individuals, who do not feel sick and even do not know that they are infected and infectious, are not reported and can infect others unwittingly. To gain an understanding of the percentage of asymptomatic cases, one possibility is to test the population at random with, for example, blood tests. For COVID-19, the fraction of asymptomatic cases is estimated to be as large as 80% (Day, 2020). Since the number of asymptomatic cases cannot be determined on a daily basis, we confine ourselves to the number of reported cases in this work.
Most epidemic models forecast the number of infected cases as a point forecast (generally: the mean of a distribution) rather than a complete distribution. All models in this work were designed to provide point forecasts, but can be generalised to provide prediction intervals. We discuss this topic further in Section 2.
The focus of this work is the comparison of a diverse set of methods for forecasting the spread of COVID-19, ranging from fitting closed-form epidemic curves and comprehensive machine-learning algorithms to networkbased approaches. We focus on the spread of COVID-19, but we emphasise that all methods can be applied to general epidemic outbreaks. We show that pure machine-learning and network-agnostic algorithms or epidemiological models are inferior to algorithms that combine multiple approaches and rely on the underlying network topology. In particular, the Network Inferencebased Prediction Algorithm (NIPA) is superior to any other algorithm that we evaluated. In Section 2, we explain eight forecast algorithms for predicting the future number of COVID-19 cases. In Section 3, we demonstrate their performance in two selected regions-Hubei, China and the Netherlands-and discuss the strengths and weaknesses of each algorithm. Finally, we summarise our findings in Section 4.

Prediction algorithms
The spread of COVID-19 can be measured in terms of the daily number of reported cases. We model the course of the epidemic with an SIR compartmental model, where each individual is either susceptible (healthy), infected (can infect the susceptible), or removed (recovered or died). We denote the (discrete) time by k = 1, . . . , n, where n is the total number of observation days. The first COVID-19 case was reported on day k = 1. Given that nearly all governments report their epidemic data once a day, we take a time step of one day as a natural choice and investigate the effect of the time step on the prediction accuracy in Appendix G. The SIR epidemic model with time-varying spreading parameters is given by: (Kermack & McKendrick, 1927;Prasse & Van Mieghem, 2020a;Youssef & Scoglio, 2011)).
and the fraction of susceptible individuals follows as Here, β ij [k] ≥ 0 denotes the infection probability from region j to region i at time k, and δ i > 0 denotes the curing probability of region i.
The spread of COVID-19 cannot be described exactly by the SIR equations 1, (2) and (3). The COVID-19 pandemic evolves in continuous time, whereas the SIR model evolves in discrete time, with a time step of one day. Additionally, the SIR model is unable to describe phenomena like personal social distancing, nation-wide lockdowns, and the availability of vaccinations. Each of these model assumptions introduces model errors. Prior to the introduction of several forecasting algorithms, we explain how model errors can be used to obtain prediction intervals for the forecasted number of infected cases.
As described in Prasse, Achterberg, Ma and Van Mieghem (2020), we obtain the fraction of susceptible , and removed R i [k] individuals in region i from the observed infections y i [k]. We aim to find the best possible forecastŷ i [k] for the cumulative number of infected cases y i [k] for region i and time k. In this work, we discuss eight prediction methods.

Potential generalisation to prediction intervals
Before introducing the different prediction methods, we emphasise that this work focuses on short-term point forecasts. Long-term epidemic behaviour is very random, and providing forecast intervals is essential to give a complete picture of the long-term viral spread (Cirillo & Taleb, 2020). Extending the point forecast methods in this work to prediction intervals is outside the scope of this work. Nonetheless, we consider it valuable to conceptually discuss an extension of the SIR equation (1) to allow for the computation of prediction intervals. A real epidemic does not follow the SIR model (1) exactly. Instead, the infection state I i [k] evolves from time k to k + 1 as The details of the outlined method for obtaining prediction intervals are beyond the scope of this paper. Two particular challenges are the determination of the distribution of the model errors w[k] and the implementation of a computationally efficient sampling method.

Sigmoid curves
The logistic function is a well-known example of an epidemiological sigmoid curve (Van Mieghem, 2016;Verhulst, 1845). We assume the cumulative number of infected cases y i [k] in region i at time k to follow a logistic function: where y ∞,i is the long-term fraction of infections, K i is the logistic growth rate, and t 0,i is the inflection point, also known as the epidemic peak. The parameters y ∞,i , K i , and t 0,i are estimated for each region separately using a nonlinear curve fitting procedure, which is explained in Appendix F. Other sigmoid curves, like the Hill function and Gompertz function, are also discussed in Appendix F.

Long short-term memory
Recurrent neural networks (Elman, 1990) (RNNs) have been used in various tasks related to sequences (Goodfellow, Bengio, & Courville, 2016), time series analysis and forecasting, speech recognition or natural language processing (Young, Hazarika, Poria, & Cambria, 2018), and they have been demonstrated to achieve state-of-the-art performance. LSTM networks (Hochreiter & Schmidhuber, 1997) are specific types of RNNs that resolve the long-standing problem of long-term dependencies. LSTM introduces additional input, output, and optional forget gates as interfaces with additional weights on the top of standard input data and hidden weights in the standard RNN unit. There are several variations (Gers & Schmidhuber, 2001;Gers, Schmidhuber, & Cummins, 2000) of LSTM networks, such as LSTMs with or without a forget gate and a ''peephole connection", (Jozefowicz, Zaremba, & Sutskever, 2015). For the internal mechanism between the gates and the exact mathematical relations, we refer the reader to Gers et al. (2000) or Yu, Si, Hu, and Zhang (2019). Here, we utilise the most common mechanisman LSTM with a forget gate. In the simulations, we use an LSTM with sequence and hidden sizes both equal to four in a single LSTM layer (e.g., it is possible to stack a few LSTM layers, which leads to more overfitting), a learning rate of 0.1, and the Adam optimiser (Kingma & Ba, 2014), with mean squared error loss in 2000 epochs of training.

Network inference-based prediction algorithm (NIPA)
Network-based approaches take into account the interactions between different regions. However, the contact network G is unknown (and consequently also the infection probability matrix B) and must be inferred from the epidemic outbreak. NIPA was originally proposed in Prasse and Van Mieghem (2020a), and an adaption of NIPA was applied to the spread of COVID-19 in Hubei, China (Prasse, Achterberg, Ma et al., 2020) and Italy (Pizzuti, Socievole, . NIPA consists of two steps. First, the underlying infection matrix B is inferred from the epidemic outbreak. Second, the infection matrix B and the estimated curing rates δ i for node i are used to forecast the outbreak by iterating the SIR model on the estimated infection matrix B. Even though NIPA successfully forecasted the spread of COVID-19 in the Chinese province of Hubei, the underlying infection matrix B could not be inferred (Prasse & Van Mieghem, 2020b).

NIPA applied to each region separately
As a benchmark model, we apply NIPA to each region separately, which we name NIPA separate. NIPA separate is a machine-learning method based on the SIR model, but it does not consider the interaction between different regions. Table 1 All algorithms discussed in this paper. *If the algorithm is based on a phenomenological epidemic process, like the SIR model. **If the algorithm is able to forecast small perturbations in the global trend. ***If the spread between different regions is considered.

NIPA static prior
The formulation of NIPA can be extended to include knowledge of the underlying contact network. We use a time-independent traffic network (with the corresponding traffic intensity matrix M) to obtain a prior for the infection probability matrix B as We explain our motivation for the prior infection matrix B prior in Appendix B. The positive scalars c 1 , . . . , c N are unknown and are set by cross-validation. We assume that the true infection matrix B is normally distributed around the prior infection matrix B prior . Based on the prior infection matrix B prior and observations of the spread of COVID-19, we obtain the Bayesian estimate B posterior by solving the optimisation problem ) T at all times k = 1, . . . , n. Using the estimated infection matrix B posterior and the estimated curing rates δ i for region i, we forecast the outbreak by iterating the SIR model. For details on NIPA static prior, see Appendix C.

NIPA dynamic prior
During the COVID-19 pandemic, many countries have imposed some kind of lockdown, in which the free movement of people is significantly restricted. Thus, the true contact network G is not static but varies over time. We use a time-varying traffic matrix M[k] as an approximation for the prior infection matrix B prior [k], whose entries equal for all times k. The positive scalars c 1 , . . . , c N are unknown and are set by hold-out validation. We propose a Bayesian approach called NIPA dynamic prior to estimate the true infection matrix B[k] from the time series of infected cases y i [k] and the prior infection matrix B prior [k]. Using the estimated time-varying infection matrix B posterior [k] and the curing rates δ i for each region i, we forecast the outbreak by iterating the SIR model. Appendix D explains the technical details of NIPA dynamic prior.
One challenge to NIPA dynamic prior is the unavailability of the contact network in the future. Hence, we assume that the traffic matrix will remain constant after the last observation point n: B prior [n + k] = B prior [n] for all k > 0. We summarise all prediction algorithms in Table 1.

Evaluation of the prediction performance
We evaluate the prediction accuracy of the methods discussed in Section 2 by forecasting the spread of COVID-19 in a selected number of regions. We set the maximal forecast horizon to six days, because of the difficulty of predicting epidemic outbreaks .

Each prediction algorithm produces a forecastŷ i [k] for the cumulative number of infected cases y i [k] for region
i at time k. To quantify the prediction error at time k, we use the symmetric mean absolute percentage error (sMAPE) which is commonly used in forecasting (Hyndman & Koehler, 2006). Furthermore, we quantify the percentage error (PE) as follows: for region i and time k to investigate over-and underestimations. We consider the spread of COVID-19 in two regions: the cities in Hubei, China, and the provinces in the Netherlands. These regions cannot be regarded as full representatives of the spread of COVID-19, let alone general infectious diseases. Rather, these regions illustrate the strengths and weaknesses of our methods.

Hubei, China
We evaluate the prediction accuracy first in the Chinese province Hubei. In December 2019, the first cases of COVID-19 were detected in Wuhan, the capital of Hubei. The first case outside Wuhan was reported on January 21. From January 24 onwards, the whole province Hubei was under lockdown, prohibiting any non-urgent travel. On February 15, the local government in Hubei changed the diagnosing policy, causing an erratic increase in the number of reported cases on February 15. Therefore, we restrict ourselves to the period from January 21 to February 14. The reported cases are provided by the Health Commission of Hubei (2020). The majority of COVID-19 patients were reported in Wuhan, as shown in Fig. 1. We removed the region Shennongjia from our analysis, because of the small number of infections in that region.
For NIPA static prior, we require a traffic network describing the interactions between the cities in Hubei.  The sMAPE error in Fig. 2 tends to decrease as time evolves, because a growing amount of data is available. Furthermore, the total number of infected cases quickly increases, whereas the daily infected cases increase at a lower rate, indicating sub-exponential growth (Maier & Brockmann, 2020;. Sub-exponential growth will inevitably reduce the sMAPE error, because sMAPE is a relative error metric. On the other hand, the prediction accuracy decreases rapidly if the forecast horizon is enlarged. In particular, the number of cases five and six days ahead around February 1 cannot be predicted accurately, which is illustrated by Figs. 2(e) and 2(f), respectively.
In general, the logistic function performs worse than the other algorithms. There may be several reasons for this. First, by fitting a logistic curve, we assume the number of cases to follow the SIR model closely (Kermack & McKendrick, 1927;. Hence, we do not allow any individual or governmental responses to COVID-19, which typically flattens the (logistic) curve. Second, the logistic function ignores the spread between regions, which further deteriorates the prediction accuracy. Third, the logistic function is symmetric around the epidemic peak at k = t 0 ; the increase and decrease in the number of cases around the peak is equal. Most epidemic outbreaks of COVID-19 show a rapid increase and a more gradual decrease in the daily number of cases. A possible reason for this is that most lockdowns are enforced immediately, whereas lockdown measures are lifted gradually. Occasionally, the Hill function (Hill, 1910) and Gompertz function (Gompertz, 1825) are used to predict epidemic outbreaks, because they allow asymmetry around the epidemic peak. In this work, we focus on the logistic function because of its relation to the solution of the SIR and SIS models, and we discuss the Hill function and the Gompertz function in Appendix F.
The performance of LSTM is fairly good, but LSTM fails to find an accurate forecast around January 31. Since the time series is the shortest at the left-most part of Fig. 2, less data is available to train the LSTM. Pure machinelearning algorithms are known to yield a lower prediction accuracy than other methods if the time series is short (Makridakis, Spiliotis, & Assimakopoulos, 2020).
The prediction accuracy of all NIPA methods in Fig. 2 is similar, although NIPA static prior is considerably worse around February 4 for predictions of three or more days ahead. A possible reason is that the impact of the nationwide lockdown on January 24 is captured incorrectly by the static prior, whereas the original NIPA method has more freedom to adjust its contact network accordingly and NIPA dynamic prior receives a more tailored, time-varying prior during the lockdown situation. Another reason is that the prior network (dynamic or static) may deviate significantly from the true infection matrix. Under ideal circumstances, namely when the epidemic outbreak exactly follows the SIR model, we show that NIPA static prior outperforms NIPA in Appendix E. Fig. 2 also shows that the negligence of the network interaction by the NIPA separate model decreases the prediction accuracy compared to NIPA. Hence, a networkbased approach appears beneficial for forecasting. We summarise the results in Section 4.
Another interesting topic is forecast bias: the tendency to systematically overestimate or underestimate the true number of infected cases. Using the Percentage Error (PE), we estimate the bias for all prediction algorithms for region i at time k. The surface error plots in Fig. 3 show the PE as a function of time for a four-days-ahead prediction. The logistic function and LSTM show the largest deviation around the mean, especially around February 1, which is in agreement with Fig. 2. Furthermore, Fig. 3 illustrates that the logistic function and LSTM systematically underestimate the true number of cases. On the other hand, NIPA static prior appears to overestimate the true number of cases. A possible reason for this is the following. The static network is taken to be proportional to the traffic flow before the lockdown measures. When a lockdown is introduced, the static prior remains constant, so the algorithm overestimates the true result. After some time, the newly collected data shows evidence that the prior is not very accurate, so NIPA static prior ignores the prior and uses the data instead, which improves the forecast accuracy again.

The Netherlands
As a second case study, we regard the spread of COVID-19 in the Netherlands. The first patient, who had visited Italy the week before, was diagnosed on February 27. After February 27, the number of cases grew rapidly, as depicted in Fig. 4. The epidemic peak was observed at the end of March, and the daily number of cases subsequently dropped. We consider the spread of COVID-19 at a provincial level, for which data is available from the Dutch National Institute for Public Health and the Environment, called RIVM (RIVM, 2020). The Netherlands is subdivided into 12 provinces, for which the RIVM reports the daily number of new infections. Since the number of infected cases increased more gradually in the Netherlands than in Hubei, China, the total epidemic period is longer and more data points are available. A more gradual increase in the number of cases should be beneficial for the prediction accuracy.
For NIPA static prior, we require a traffic network as an approximation for the interaction between the provinces. Statistics Netherlands (Centraal Bureau voor de Statistiek) reports the number of people m ij working in province i and living in province j, averaged over one year (CBS, 2018). We use the Google Mobility Data ''Workplaces" to estimate the time-varying traffic network for each province in the Netherlands (Google LLC, 2020). Google reports the percentage decrease of traffic p i [k] on day k in province i compared to an ordinary day between January 3 and February 6, 2020. During the lockdown, we expect p i [k] < 1 because of the lockdown measures. Then, we construct the time-dependent traffic matrix as follows: The prediction accuracy for the Netherlands is outlined in Fig. 5. Before April 1, the situation in the Netherlands is similar to Hubei, where the NIPA methods perform the best, but there are large deviations in the prediction accuracy. After April 1, the accuracy of the NIPA methods is nearly identical to each other. In other words, the  influence of the initial static/dynamic network on the prediction is small. The main reason for this is that the NIPA algorithms are trained on a growing amount of infection data as time advances. Among the best performing methods over the whole period are original NIPA and NIPA separate, whereas the logistic function and LSTM show the worst performance.
The prediction accuracy of NIPA separate and NIPA are comparable, except at the left-hand side of Fig. 5. A possible reason for this is that the spread of the coronavirus

Table 2
The performance of all algorithms discussed in this paper. The Netherlands is abbreviated as NL. *As input, each algorithm requires the population size N i of each region i and a time series of the infected cases y i [k] in each region i at any time k.

Algorithm
Additional input* Error ( After imposing the lockdown at the end of March, the interaction between provinces decreased significantly, so the spread of the coronavirus mainly took place within each province.

Conclusion
We compared the prediction accuracy of eight algorithms designed to forecast the spread of COVID-19. We summarise the results in Table 2. The error in Table 2 was obtained by averaging over all sMAPE forecast errors for forecast horizons between one and six days. Fitting a sigmoid curve, like the logistic function, performed the worst among the methods considered. The main reasons for the low prediction accuracy are the imposed symmetry around the epidemic peak and the negligence of the interaction between regions. Other sigmoid curves, such as the Hill function and the Gompertz function, performed slightly better than the logistic function, but performed worse than most other algorithms. The LSTM machinelearning algorithm is not based on any phenomenological epidemic processes, nor does it consider provincial interactions. Table 2 shows that the prediction accuracy of LSTM is comparable to the Hill and Gompertz functions.
The Network Inference-based Prediction Algorithm (NIPA) is a combination of machine learning and phenomenological epidemiology (SIR model), and it considers the interaction between different regions. Table 2 illustrates that the prediction accuracy of NIPA is better than that of any other algorithm. Applying NIPA for each region separately (NIPA separate) yielded a forecast error comparable to that of LSTM. We thus conclude that a network-based approach is beneficial for accurate forecasts. We also showed that choosing a time-varying or static prior close to the true contact network may improve the forecast accuracy of NIPA. Surprisingly, the inclusion of a time-varying or static prior in NIPA on real infection data does not improve the forecast accuracy for the considered regions. Among several reasons, the chosen prior might be an inaccurate estimate of the true contact network.
In a practical setting, such as the current COVID-19 pandemic, policymakers might prefer to anticipate to worst-case prediction of the number of infected cases. In that case, an asymmetric error metric that penalises underestimations more significantly than overestimations may be more suitable.

Acknowledgments
LM is supported by the China Scholarship Council. This work was supported by the Universiteitsfonds Delft in the program TU Delft COVID-19 Response Fund, The Netherlands.

Appendix A. SIR epidemic model
The SIR epidemic model is defined in Definition 1. The COVID-19 pandemic does not exactly follow the SIR epidemic model. Instead, at any time k, the fraction of COVID-19 infections in region i obeys  Lemma 4 (Prasse, Achterberg, Ma et al., 2020). Suppose that ≤ 1 at any time k ∈ N for any node i.
Proof. We prove Lemma 4 by induction. Suppose that at time k for any node i it holds that Under Assumption 3, it holds that 0 ≤ δ i ≤ 1 and β ij ≥ 0.
Thus, we obtain from the SIR governing equation (1) and (A.6) that both I i [k + 1] and R i [k + 1] equal a sum of positive addends, which implies that Furthermore, we obtain for any node i that

Appendix B. Motivation for the static and dynamic prior
We intend to give a short motivation for the static prior in Eq. (6). Suppose that each individual has on average ⟨d⟩ contacts (here, ⟨·⟩ denotes the average) in the population. If a person is infected and that person's neighbours are healthy, the person can infect any of its neighbours independently with probability p. Hence, the total number of infections follows a Binomial distribution In case ⟨d⟩ is large and λ ≡ p⟨d⟩ is small, we can approximate (B.1) by a Poisson distribution If there are N visiting, infected individuals that may all infect the population independently, the resulting distribution is the sum of independent, identically distributed Poisson distributions, which is again a Poisson distribution with ⟨m⟩ = Nλ.
We denote the number of people living in region j and travelling for work to region i by m ij . Each individual has ⟨d⟩ contacts and can infect each individual with probability p. Then, region j has on average m ij ⟨d⟩p new infections, provided that no two individuals who visit the same region j have contact with the same people. In particular, the fraction of new infections that region i gets from region j is given by , we obtain Eq. (6).

Appendix C. Details on NIPA static prior
We assume that the infection matrix B is normally distributed around the prior B prior , whose elements equal b prior,ij = c i m ij : Here, c i denotes the proportionality constant, and the con- The normal distribution ( With the constraint in (C.3), we ensure that the predictions of the infections satisfy 0 ≤ I i [k] ≤ 1; see Lemma 4 in Appendix A. We define the (n − 1) × 1 vector V i and the (n − 1) × N matrix F i as follows (Prasse, Achterberg, Ma et al., 2020): . . . and We obtain the Bayesian estimate B posterior by solving a constrained linear least-squares problem. Proposition 6 is an adaptation of the Bayesian interpretation in Prasse and Van Mieghem (2020b).
Proposition 6. Under Assumptions 2 and 5, the Bayesian estimation problem (C.3) is equivalent to solving the optimisation problem for any region i, where the penalisation parameter equals Proof. The objective function of the optimisation problem (C.3) is equivalent tô In the following, we rewrite the two terms in (C.7). First, with (C.1), it holds that Neither the term log (α i ) nor the term log ( √ 2π σ i ) depend on the matrix B. Furthermore, the prior log (Pr [B]) is finite only if 0 ≤ β ij ≤ 1 for all regions i, j. Thus, the optimisation problem (C.7) is equivalent tô Second, since the model errors w i [k] are stochastically independent for different regions i, we can rewrite the second term in the objective of (C.9) as where the second equality follows from (A.1), and by defining Under Assumption 2, the model error w i [k] follows the normal distribution. Thus, it holds that (C.13) The term log is independent of the matrix B.
Thus, it follows from (C.10) and (C.13) that the second term in the objective of (C.9) can be replaced by where the equality follows from the definition of the vector V i and the matrix F i in (C.4) and (C.5), respectively. Hence, the optimisation problem (C.9) becomeŝ The problem (C.15) can be optimised independently for any region i. Thus, we obtain, after multiplication with 2σ 2 w , the equivalent optimisation problem for any region i as (C.16) By identifying ρ i = σ 2 w /σ 2 i , we obtain that (C.16) with the constraint ∑ N j=1 β ij ≤ 1 is equivalent to the constrained linear least-squares problem (C.6). □ The first term in the objective of (C.6) measures the fit to the observed epidemic data. The second term measures the deviation of the infection rates β ij from the prior (C.1). The scalar parameter ρ i balances the two terms: if the prior (C.1) is very accurate or the model errors w i [k] are large, then ρ i should be large. The optimal value of the parameter ρ i is equivalent to the ratio of the unknown variances σ 2 w and σ 2 i of the model errors w i [k] and the prior (C.1), respectively. The optimisation problem (C.6) is convex and can be solved efficiently (Boyd & Vandenberghe, 2004). To obtain the solution to (C.6) numerically, we make use of the Matlab command lsqlin. We stress the similarity of the optimisation problem (C.6) to the least absolute shrinkage and selection operator (LASSO) of Tibshirani (Tibshirani, 1996), which is the basis of NIPA without prior (Prasse, Achterberg, Ma et al., 2020). Instead of the second least-squares term in the objective of (C.6), LASSO considers the ℓ 1 -norm penalisation term In fact, NIPA without prior can also be interpreted as a Bayesian estimation approach (Prasse & Van Mieghem, 2020b).

C.1. Pseudocode
To solve the optimisation problem (C.6) for the infection rates β i1 , . . . , β iN , we must specify three unknown variables. First, we must specify the curing rate δ i of region i, which determines the fractions S i [k] and R i [k] of susceptible and recovered individuals, respectively (Prasse, Achterberg, Ma et al., 2020). Second, we must specify the parameter ρ i . Third, the proportionality constant c i of the prior (C.1) is also unknown. We perform cross-validation to set the three unknown variables δ i , ρ i , c i .
NIPA static prior is similar to NIPA without prior, except for two alterations. First, we solve the constrained linear least-squares problem (C.6) instead of LASSO. Second, in addition to the parameter ρ i and the curing rate δ i , for Bayesian NIPA there is one more unknown variable, namely the proportionality constant c i , which is a parameter of the prior distribution (C.1). To determine the constant c i , we consider 50 logarithmically equidistant candidate values in the set Ψ = {c min , . . . , c max }. The minimal and the maximal values are set to c min = 0.01 and c max = 100, respectively. We set the value of c i by cross-validation. To obtain the epidemic outbreak prediction of Bayesian NIPA, we execute (Prasse, Achterberg, Ma et al., 2020, Algorithm 1), where (Prasse, Achterberg, Ma et al., 2020

Appendix D. Details on NIPA dynamic prior
We assume that the time-varying infection rates β ij [k] are proportional to the known population flow m ij [k]. More precisely, we assume that the infection rates β ij [k] for all regions i, j, when i ̸ = j, equal for some unknown proportionality constant c i > 0. Furthermore, we assume that the self-infection probabilities β ii do not change over time k. With (D.1), the SIR model in Definition 1 yields that  To solve the optimisation problem (D.4) for the constants c i and β ii , we must specify two unknown variables. First, we must specify the curing rate δ i of region i, which determines the fractions S i [k] and R i [k] of susceptible and recovered individuals, respectively (Prasse, Achterberg, Ma et al., 2020). Second, we must specify the parameter ρ i . We perform hold-out cross-validation to set the unknown variables δ i and ρ i : The training set consists of the first 80% of the observations, and the validation set equals the last 20% of the observations. In pseudocode, NIPA dynamic prior is given by Algorithm 2.

Appendix E. NIPA static prior under perfect conditions
The original NIPA method is known to provide accurate predictions when the epidemic perfectly follows the SIR model (Prasse, Achterberg, Ma et al., 2020, Supplementary Material 1). Here, we intend to show that NIPA static prior performs even better if the prior matrix is close to the real infection matrix.
Suppose we generate data from an SIR epidemic as in Definition 1. We use a network with N = 10 nodes with an equal curing rate δ for each node: δ i = 0.2 for all i. We set the curing rate δ i in the NIPA algorithms to the exact curing rates δ i = 0.2, such that both NIPA and NIPA static prior will always estimate the curing rates correctly. We consider infection probabilities β ij that are uniformly distributed in the interval (0, 1). The effective reproduction number R 0 can be computed as (Van den Driessche & Watmough, 2002) ) . (E.1) We normalise B element-wise such that the basic reproduction number R 0 equals 2.0. Furthermore, we set the population size N i for each region i to a uniformly distributed number in the interval [10 5 , 10 6 ] and start with an initial y 1 [1] = 100 infected cases in node 1, and zero infected cases in the other nodes. Most importantly, we set the prior infection matrix B prior to the exact infection matrix B, multiplied by some noise Here, w ij is uniformly distributed in the interval [1,2]. The other parameters are the same as in the main article.
The result in Fig. E.6 is clear: NIPA static prior is able to capture the dynamics much better than NIPA. Hence, we conclude that NIPA static prior in combination with a good prior yields better prediction accuracy than the original NIPA method.

Appendix F. Sigmoid curves
In epidemiology, sigmoid curves are commonly used to forecast the future number of infected cases.

The logistic function was developed by Verhulst in 1845
to explain the growth of the population in a specific region (Verhulst, 1845). The logistic function is the most often used sigmoid curve in epidemiology, because the logistic function is the (approximate) solution of the SIS and SIR model . The logistic function assumes the cumulative number of infected cases y i [k] in region i and time k to follow where y ∞,i is the long-term fraction of infections, K i is the logistic growth rate, and t 0,i is the inflection point, which is also known as the epidemic peak.
The Hill function was introduced in 1910 to describe the binding of molecules to surfaces (Hill, 1910). Later, it was successfully applied to describe the spread of epidemics (Kiskowski & Chowell, 2016). The Hill function assumes the cumulative number of infected cases y i [k] in region i at time k to follow where y ∞,i is the long-term fraction of infections, K i is the Hill growth rate, n i is the Hill coefficient, and t 0,i is the inflection point, also known as the epidemic peak.
The Gompertz function was introduced in 1825 to describe human mortality in a general population (Gompertz, 1825). Later, the Gompertz function was also used to describe the spread of epidemics (Winsor, 1932). where y ∞,i is the long-term fraction of infections, c i is a displacement factor (comparable to the inflection point), and a i is the Gompertz growth rate.
We describe the curve-fitting procedure here for the logistic function, but the parameters for any curve can be estimated analogously. Suppose that we have a time series of the cumulative number of reported cases y rep,i [k] for time k = 1, . . . , n and for any region i. Then, we minimise the mean squared error for each region separately: where N i is the population of region i. We evaluate the nonlinear minimisation problem (F.4) by the command GlobalSearch in Matlab. As initial conditions, we provide y ∞,i = y(t obs ), K i = 1, t 0,i = t obs . The parameters (y ∞,i , K i , n i , t 0,i ) for the Hill function and (y ∞,i , c i , a i ) for the Gompertz function can be estimated analogously.

Appendix G. Influence of the time step on the prediction accuracy
In the discrete-time SIR model (1), we use the time step ∆t = 1 day. By approximating a continuous-time process (the COVID-19 pandemic) by a discrete-time process (SIR model) we make a model error. We investigate the influence of the time step on the prediction accuracy by comparing the NIPA prediction accuracy for various time steps, ranging from ∆t = 0.5 days to ∆t = 3 days. Since the number of infected cases is (generally) reported once a day, the data for the time step ∆t = 0.5 days is obtained by linearly interpolating the number of cumulative cases y i [k]. For time steps ∆t = 1 day and ∆t = 0.5 days, we smooth the raw data before calling the NIPA algorithm (Prasse, Achterberg, Ma et al., 2020).
For time steps ∆t = 2 days and ∆t = 3 days, there are two possible methods. Method (A) assumes that the cumulative number of cases y i [k] is reported every two (or three) days, and is unreported on the intermediate days. Then, we smooth the remaining data before the NIPA algorithm is used. In fact, we have omitted the data on the intermediate days. In contrast, method (B) first smooths all raw data. Thereafter, we only use the cumulative number of cases y i [k] every two or three days for a time step of two or three days, respectively. The main difference is that method (A) completely neglects the data on intermediate days, whereas method (B) first applies a smoother, and then neglects the intermediate data.
Figs. G.7 and G.8 show an exemplary situation from the Netherlands for three initial dates. The configuration for the time steps ∆t = 1 day and ∆t = 0.5 days is equal in both figures. At the beginning of the COVID-19 outbreak, as shown in Fig. G.7(a) for method (A) and Fig. G.8(a) for method (B), the prediction accuracy is similar for all time steps. The small amount of available data and the rapidly increasing number of cases hampers accurate forecasting. As the epidemic evolves, method (A) and method (B) start to deviate. By omitting data, as in method (A), the sMAPE error in Fig. G.7 increases more quickly for time steps of two and three days than for smaller time steps. Hence, removing data causes the prediction accuracy to decrease. On the other hand, method (B) in Fig. G.8 shows similar behaviour for all time steps. We conclude that if the amount of data is unchanged, the choice of the time step has limited effect on the prediction accuracy.