An iterative model of the COVID-19 virus spread based on population density-a case study of Poland

The correct recognition of spread patterns of the COVID-19 virus is crucial for effective decisionmaking on preventive measures to be taken and protection of society. A forecast of hazard development, substantiated by reliable virus propagation models, can support control of the infection risk and minimize its impact on the population. Decisions to close or limit social activities or to lift bans can then be made in the right area at the right time. This study proposes an iterative agent-based and compartmental model used to analyse the COVID‐19 virus spread. The contagion mechanism is based on population density data and Monte Carlo methods. The model parameters were optimized and verified using data about incidence in Poland between March 3 and November 24, 2020. To obtain optimum values of model parameters, a simulation and analysis were completed for three vaccination strategies begun on December 27, 2020. This study contains numerical results and spatial analyses that represent and confirm the accuracy of assumptions of the proposed model. Certain limitations of the proposed approach and planned directions of future work are also indicated in the final section.


Introduction
A new variant of coronavirus named SARSCoV-2 (Severe Acute Respiratory Syndrome Coronavirus) 1,2 began to spread by the end of 2019 in the city of Wuhan 3 , Hubei province, China 4 . The World Health Organization (WHO) announced in January 2020 a new coronavirus epidemic posing a hazard to public health on an international scale 5 . Considering the contagious disease caused by the new coronavirus, the WHO chose in February 2020 the official name -COVID-19 (an abbreviation of Coronavirus Disease 2019) 2 , and on March 11, 2020 announced the pandemic of COVID-19 6 . Three coronaviruses characterized by rapid transmission and pathogenic properties appeared over the last 20 years, causing the severe acute respiratory syndrome: SARS-CoV (Severe Acute Respiratory Syndrome) 7 , MERS-CoV (Middle East Respiratory Syndrome Coronavirus ) 8 and the recently identified new coronavirus, temporarily termed 2019-nCoV 1 , that on February 11, 2019 was finally named COVID-19 2 , and since February 14, 2020 the pathogen causing that disease is known as the SARS-CoV-2 9 virus.
Coronaviruses cause various diseases in mammals and birds, beginning with inflammatory bowel disease and conditions of the upper respiratory tract, ending with potentially fatal infections of the respiratory tract in humans 10 . There are coronavirus variants that can affect animals, and in exceptional cases the coronavirus can cross the inter-species barrier to colonize the human population 8 . It is highly probable that the new coronavirus could cross the barrier between an animal species and people to spread in the human population 11 .

Severity of the pandemic
The best-known historical pandemic was plague in the mid-14th century (Black Death, 1347-1351) that killed about 25% to 50% of the then population (about 200 million deaths) 12,13 . The dynamic development of medicine at the turn of the 20th and 21st centuries reduced the frequency of epidemics and pandemics. However, pandemics are not rare phenomena these days. Due to anthropogenic impact on the environment, that has intensified for decades, and the emergence of new conditions for pathogen transmission 9 , several pandemics occurred over the last ten years: HIV/AIDS (Human Immunodeficiency Virus Infection and Acquired Immunodeficiency Syndrome) 14 , A/H1N1 (species Influenza A virus of the genus Influenzavirus A) 15 , SARS (Severe acute respiratory syndrome-related virus of the genus Betacoronavirus) 16 , EBOLA (EBOV 7 ) (the hemorrhagic fever caused by the Ebola virus) 15 . Now in 2020, the humankind faces the most severe crisis caused by   18 . A pandemic is an outbreak of infectious disease that occurs over a wide geographical area and that is of high prevalence 19 . An epidemic afflicts numerous people, transmitted simultaneously between them, but remains limited to a region. A pandemic exposes to the hazard of infection entire countries, continents or the world. The correct recognition of spread patterns of the COVID-19 virus is crucial for effective decisionmaking on preventive measures to be taken and immediate protection of society. A forecast of hazard development, substantiated by reliable virus propagation models, can support control of the infection risk by minimizing its impact on the population. Decisions to close or limit social activities or to lift bans can then be made in the right area at the right time. Precise demographic data, good laboratory infrastructure (for performing tests) and logistics can contribute to the understanding of incidence, spread of the illness and its trends. This knowledge can support a rapid, precise and effective response of the public health sector to the COVID-19 pandemic 20 .

Models used to describe the pandemic
The contemporary modelling techniques and available computing power made a valuable contribution to the understanding of epidemic and pandemic spread phenomena 16 . The models currently used are divided into two categories: agent-based models [21][22][23] and compartmental models) [24][25][26] . The agent-based models (ABM) ) 21-23 represent computational structures on a system (macro) scale described by behaviours of agents on a micro scale. "Agents" may include both individuals and collective entities, employees of services, units, particles or distinguished quantitative components such as social groups, institutions or organizations. A typical ABM consists of three components: agents, the environment and the rules followed by an agent plus agent's mutual local interactions with other components of the system and environment. Due to the development of advanced computer techniques, agent-based modelling has become an attractive tool in research into complex and dynamic systems, especially in the life sciences 22 . The compartmental models 21,24 are characterized by a simple notation and effectiveness in the modelling of epidemiological hazards. They simplify mathematical modelling of contagious diseases. The population is assigned to compartments characterized by status labels, e.g. (susceptible, infectious, recovered) 25 . Groups of people can be transferred between compartments and exposed to factors that assign them defined status labels. The order of labels determined in a pattern is essential. The typical patterns used include: SIR (Susceptible, Infected, Recovered), SEIS (Susceptible, Exposed, Infected, Susceptible) and SEIR (Susceptible, Exposed, Infected, Recovered), qualified as repeatedly susceptible 27 . The models date back to the early 20th century and were developed in the study published by Kermack and McKendrick in 1927 28 . The models are usually described using simple differential equations but can also be expanded by stochastic (random) components that are more realistic and far more difficult to analyse. The models are built to predict the pattern of disease spread, define the current and total number of infected people, the number of fatal infections, the number or recovered patients and duration of the epidemic. The modelling process leads to estimates of various epidemiological parameters and of effects of preventive measures. It enables us to assess how various actions taken by the public health sector affect the outcome of the epidemic, what is the result of limited mobility, the result of direct protection measures and what is the most effective strategy of giving a limited number of vaccines in the population 24,29 . The authors of agent-based models frequently criticize the compartmental models due to their difficulties in reflecting the real characteristics and relations between various areas, regions, in describing population characteristics, such as population distribution 30,31 . This study proposes an iterative agent-based and compartmental model used to describe and analyse the spread of the COVID-19 virus. The contagion process has been expanded by Monte Carlo methods 32 . The model is based on random infection and mobility. It represents an expanded classical SIRS epidemic model 29 . The model takes into account the characteristics of real world described by the distribution of population in the analysed region. Importantly, it also considers interrelations between various unit areas and those interrelations represent crucial factors that accelerate the process of transformation of an epidemic into a pandemic and pandemic development. The model describes a two-dimensional network of unit areas wherein cells, due to their mobility characterized in a random manner, are capable of infecting cells in other unit areas. The proposed model is based on transmission of the disease described by a contagion function. The pattern of statuses that may be assumed by the cells in the population includes the labels "healthy", "susceptible", "infected", recovered", "dead" and "vaccinated". The healthy cells represent a set in the entire population that may be exposed to the risk of infection. The susceptible cells include the entire population of cells minus dead, recovered and vaccinated cells. It is assumed that the recovered and vaccinated cells will not join the group of susceptible cells. At present, there is no information about acquisition by people of perennial immunity to the SARSCov-2 virus, because it is known for a dozen of months only. The acquisition of a pool of antibodies following contact with the virus or effective vaccination gives immunity for at least several months 9,33,34 . The relationship between the time of acquired immunity and the dozen-month period of research into the SARSCov-2 makes the above assumption plausible. The infected cells can become recovered or dead. Vaccines can be received by healthy cells.

Data sets
A model of population density in Poland in 2020 was adopted on the basis of data obtained by the University of Louisville and Columbia University in the United States 35 and verified using data gathered in the Demographic Yearbook of Poland 36 . The generally accessible data set can be collected in the GEOTIFF or ASCII XYZ format with a resolution of 30 [arc]. That data resolution approximates population density per 1km 2 . Data mapping is done in WGS84. One pixel represents the number of people on 1 km 2 . In the described model, the analysed sample population was written as a matrix with dimensions 698x1202 elements, each representing a unit area of 1km 2 . According to the Demographic Yearbook of Poland 36 , the population of Poland stands at 38 382.7 thousand (Fig.1).

Figure 1. Population density per 1 km 2
The data describing infection spread (Fig. 2) and the real status of exposure in the period between March 3, 2020 and November 24, 2020 were implemented based on daily updated sets published at gov.pl "Serwis Rzeczypospolitej Polskiej" 37 .

Social and economic conditions during the pandemic
The social and economic conditions in Poland have been heavily influenced by pandemic pressure since March 2019. The most serious problems include the aversion to social contacts and limited mobility as well as restrictions imposed on social activities and functions. These responses are dictated principally by the fear of infection and unknown development of the disease that may be mild and asymptomatic but also serious and fatal. At present, nobody can predict the variant of disease development in an individual. The limitations imposed on activity of bars, restaurants, pubs, hotels and guest houses, restrictions affecting the gym and fitness industry and swimming pools arouse growing social discontent. On the other hand, the measures implemented by government, aimed to establish emergency health care centres, to limit social activities and functions, to provide businesses with funds, support continued existence of society. Due to replacement of the sanitary regime in Poland by a more liberal policy (May / June 2020) 9 abolishing the obligation to wear masks and keep social distance, and introducing completely free market, the spread of disease, contained until May 2020, was released in society. This is clearly illustrated by a rapid increase in the number of infections in Fig. 2. The period between March 2020 and May 2020 can be characterized as a time of a controlled increase in infections. Lifted bans and the effects of a less restrictive sanitary regime are reflected in the curve of infections with a delay. The bifurcation point is observed in October 2020. Thereafter, the curve illustrating the number of new infections indicates a completely new, dynamically growing trend.
Considering potential threats, recent research projects demonstrate that the health care system may collapse four weeks after the outbreak of coronavirus epidemic, and this involves an uncontrolled increase in infections and a dramatic growth in mortality rate. An example is provided by the growing problems of the health care system in the United States 38 and the collapse of the health care system in Italian Lombardy and in Spain 39 . At present, there is no reason to argue that without the measures aimed to maintain social distance (masks, physical distance, limited mobility) and without improved immunity acquired by vaccination the spread of disease will be contained and will not cause a dramatic growth in incidence and in fatal cases. The documented historical pandemics killed significant percentages of afflicted populations 19 .

A description of components of the COVID19 spread model considering population density -a case study of Poland
The concept of contagion model is based on an iterative description of infection spread, incorporating a random process (Eq. 1). The model is used to process information contained in a matrix of population with a defined density per 1km 2 . Only the susceptible cells can be infected. The number of infected cells among the susceptible cells is defined by the value of contagion function (Eq. 2) that depends on time. The contagion function has the form of two polynomials of degree 2. The bifurcation point 29,40 , 220 days that passed since the beginning of the pandemic between March 3 and October 8, 2020, is determined by the curve of infection cumulation in Fig. 2. (1) The structure of the proposed iterative model of SARSCoV-2 contagion is suitable for calculating all components / cell statuses (healthy, infected, dead, recovered and vaccinated) for individual cases in unit areas at a moment.

The random model of infection
The infection process is described by a random vector of contagion, disease spread. The coordinates of infection spread are sampled using the Monte Carlo method and a two-dimensional normal distribution (Eqs. [3][4][5][6][7][8]. For each unit area (an element of the matrix with the coordinates ), the process of sampling of the number of vectors of infection spread is repeated. Thus, each of the areas can affect itself and all other areas.
A multi-dimensional normal distribution can be described by: -a random vector created using the Monte Carlo method -the probability density function, -the matrix of covariance -the expected value vector, for the two-dimensional case.
The matrix of covariance: Final assumptions: The expected value vector was assumed a priori . This assumption results from the adopted contagion process originating in a region located symmetrically relative to the current element ( ) with an area of 1km 2 . The model assumes mobility, but a cell returns to its original place of stay within the same time unit. Consequently, only propagation of the disease takes place without changes in the density of cells per area unit. This assumption is adopted considering the avoidance of travel by cells due to the pandemic and the restrictions imposed on mobility. The assumption of limited mobility excludes cross-border contagion.
Assuming that the direction of infection spread is not distinguished, the adopted value of Pearson correlation coefficient equals 0, and hence the elements outside the main diagonal of the matrix of covariance equal zero. The values on the main diagonal were initially defined based on 41,42 and then determined using an optimization procedure. Variances, were adopted at a level of 40000 km 2 , and this in view of the restriction on mobility and the aversion to travel gives the same standard deviations in direction X and direction Y that equal 200 km.

A model of fatal cases and recoveries
The simplest available mortality model (Eq. 9) is applied in the described example. A constant, independent of time coefficient of fatal cases to infected cases is adopted. An analysis of the number of fatal cases shows that the coefficient equals 1.588% of the total number of infections in a 95% confidence interval (1.544% to 1.631%) 43 . Considering the iterative model of infection spread, the value of mortality coefficient was defined so that the total of fatal cases in the analysed time interval of 267 days equals the total of actual cases. The value adopted: 5.3494e-4. A similar method was used to estimate the number of recoveries (Eq. 10). A constant coefficient is adopted that describes the number of recoveries relative to the total of infected cases. An analysis of the number of recoveries shows that the coefficient equals 44.98% of the total number of infections in a 95% confidence interval (44.09% to 45.86%) 43 . In the model of infection spread, the value of recovery coefficient was defined so that the total of recoveries in the analysed time interval of 267 days equals the total of actual cases. The value adopted: 1.6994e-2.

Problem definition and optimization of parameters of the COVID19 spread model
Considering the large dimensions of the matrix describing the population (698x1202) and the hardly definable start values of the decision variable vector (an iterative model and the random nature of infections), the following steps were taken to define optimum values. In the first approximation, a matrix of population with smaller dimensions was assumed. In that initially limited population, optimization was performed for the periods between March 3 and October 8, 2020 (220 days) and between October 9 and November 24, 2020. The values of actual infection data were reduced in proportion to the population contained in the matrix with reduced dimensions. Assuming that the nature of population matrix described by the population density per 1km 2 only slightly differs from the entire matrix of population, the optimum values for the limited population matrix were determined as the initial values used in the proper optimization procedure. The optimum values were determined for the entire matrix of population in the following steps, using the starting points. The procedure led to determination of values for 6 decision variables of the contagion function and one parameter of the matrix of covariance. Thus, seven decision variables were defined. The procedure for optimizing the parameters of the contagion function was programmed to correspond to the sequence of actual events. The contagion process was initiated in Zielona Góra (Lubuskie province), on the first day, March 4, 2020 9 . The simulation model with the optimization and visualization module was developed in MATLAB by the author. In the optimization procedure, a criterion function (Eq.11) was adopted defined as a minimum of the total sum of squares of the difference between actual values of infection cases and computational values. A vector was sought that minimizes the value of the adopted objective function.  Table 1.

Verification of the model parameters and precision of simulation for the entire territory of Poland
The goodness of fit was determined by the value of slope (Eq.12) that in the case of complete conformity of the results of simulation with the actual data should assume the value of 1 ( Table 2). Also the extreme values for the 95% confidence interval and the value of the coefficient of determination are given.

Verification of calculation results on a regional scale
The simulation results were verified at a regional level separately for each province. The administrative criterion was adopted due to the nature of published information about the pandemic in Poland. Table 3 contains the values of slope (Eq.12) describing conformity of computational results with the actual data and the limits of confidence interval at the 95% level. Additionally, the coefficient of determination is given and simplified demographic data from 16 provinces with the population size and density per 1km 2

An analysis of simulation results
The structure of agent-based and compartmental model gives access to the results of calculation of all adopted cell statuses (labels) in unit areas, at all time intervals. Due to the high resolution (the number of population elements) of information about the population density, visualization of each analysed status is simple and immediate. The results of various computational cases are spatially depicted below.

A model of vaccination and adopted vaccination strategies
The model of vaccination (Eq.13) like the model of fatal cases and recoveries is built assuming a constant factor of proportionality. A constant, independent of time coefficient of vaccines given, defining the number of vaccinated cells relative to the number of healthy cells is adopted. In its initial assumptions, the Polish government planned to vaccinate 1 million people monthly. On average, 33 thousand people can be vaccinated daily. The value of vaccination coefficient is determined to obtain a total number of vaccinated people similar to that planned.   Table 4. Population structure in various vaccination strategies (in thousands) -forecasting An analysis of the calculation results obtained for the 3 adopted vaccination strategies (Table 4, Fig. 5) demonstrates how the structure of Poland's population changes in a 3-year pandemic time horizon. It also shows the scope of involvement in the vaccination process. Strategy I is the worstcase scenario for the population, considering such criteria as the number of deaths and infected people. Strategy I requires minimum involvement of health service personnel and resources and thus represents the cheapest solution.

Conclusion
The above description of the COVID-19 virus spread is obtained using an iterative agent-based and compartmental model with the population density as its key parameter. The contagion process is expanded by a random component based on Monte Carlo methods. Seven decision variables (model parameters) were defined, and their values calculated in an optimization procedure. The model parameters were optimized for data from the period between March 3, 2020 and November 24. 2020. The model was assessed and verified using data about contagion in the entire country and at a regional level of 16 provinces. The calculations and forecast analyses were done entirely for the territory of Poland, considering 3 vaccination strategies. The model represents an expanded classical SIR epidemic model. It may be described using the acronym SIR-V (Susceptible, Infected, Recovered -Vaccinated). The model is dynamic and does not include time delays. The model incorporates the properties of real world such as population distribution and describes a two-dimensional network of unit areas wherein cells by their mobility characterized in a random manner are capable of infecting cells in other unit areas. The proposed model is based on transmission of the disease described by a contagion function. The structure of the model is capable of grasping the effects of real measures, such as restrictions imposed or lifted bans, by defining a suitable form of the contagion function. The model assumes mobility, but a cell returns to its original place of stay within the same time unit. Consequently, only propagation of the disease takes place without changes in the density of cells per area unit. The adopted contagion function is independent of the coordinates of element in the population matrix. Even with this simplification, the results of modelling are confirmed both on a national and regional scale. The results of model verification on a national scale demonstrate that computational results are consistent with real data at a level of coefficient of determination R 2 between 0.924 and 0.996. The value of coefficient of determination for regional data is contained between 0.910 (zachodniopomorskie) and 0.995 (opolskie). The iterative model with the proposed, unique for the territory of Poland, forms of contagion, recovery and mortality functions, based on population density data, represents a flexible approach that can be used in any country or territory. The model is limited in that it ignores time delays in the process of infection, recovery, and acquisition of immunity due to contact with the virus or vaccination. If the relationship between delay times and the simulation period is negligible, the delays have no material effect on the model usefulness.
The structure of the model can be expanded to incorporate delays characteristic of various model components. Conformity of computational results with the real data on a regional scale confirms the material effect of the population density on the virus spread. A change in the form of criterion function from the national into the regional level (with the normalization procedure applied) can improve conformity of calculation results with reality. Another step includes transition from a "national"contagion function to "regional" contagion functions with contagion process expansion by regionally defined random processes. These notes also indicate the directions of future development work.