From a single host to global spread. The global mobility based modelling of the COVID-19 pandemic implies higher infection and lower detection rates than current estimates.

Background: Since the outbreak of the COVID-19 pandemic, multiple efforts of modelling of the geo-temporal transmissibility of the virus have been undertaken, but none succeeded in describing the pandemic at the global level. We propose a set of parameters for the first COVID-19 Global Epidemic and Mobility Model (GLEaM). The simulation starting with just a single pre-symptomatic, yet infectious, case in Wuhan, China, results in an accurate prediction of the number of diagnosed cases after 125 days in multiple countries across three continents. Methods: We have built a modified SIR model and parameterized it analytically, according to the literature and by fitting the missing parameters to the observed dynamics of the virus spread. We compared our results with the number of diagnosed cases in sixteen selected countries which provide reliable statistics but differ substantially in terms of strength and speed of undertaken precautions. The obtained 95% confidence intervals for the predictions fit well to the empirical data. Findings: The parameters that successfully model the pandemic are: the basic reproduction number R0, ~4.4; a latent non-infectious period of 1.1 days followed by 4.6 days of the presymptomatic infectious period; the probability of developing severe symptoms, 0.01; the probability of being diagnosed when presenting severe symptoms, 0.6; the probability of diagnosis for cases with mild symptoms or asymptomatic, 0.001. Also, the higher the testing rate per country, the lower the discrepancy between data (diagnosed cases) and model. Interpretation: Parameters that successfully reproduce the observed number of cases indicate that both R0 and the prevalence of the virus might be underestimated. This is in concordance with the newest research on undocumented COVID-19 cases. Consequently, the actual mortality rate is putatively lower than estimated. 
Confirmation of the pandemic characteristic by further refinement of the model and screening tests is crucial for developing an effective strategy for the global epidemiological crisis.

From a single host to global spread. The global mobility based modelling of the COVID-19 pandemic implies higher infection and lower detection rates than current estimates. Since the outbreak of the COVID-19 pandemic, multiple efforts of modelling of the geo-temporal transmissibility of the virus have been undertaken, but none succeeded in describing the pandemic at the global level. We propose a set of parameters for the first COVID-19 Global Epidemic and Mobility Model (GLEaM). The simulation starting with just a single pre-symptomatic, yet infectious, case in Wuhan, China, results in an accurate prediction of the number of diagnosed cases after 125 days in multiple countries across three continents.

Methods
We have built a modified SIR metapopulation transmission model and parameterized it analytically, according to the literature and by fitting the missing parameters to the observed dynamics of the virus spread. We compared our results with the number of diagnosed cases in sixteen selected countries which provide reliable statistics but differ substantially in terms of strength and speed of undertaken precautions. The obtained 95% confidence intervals for the predictions fit well to the empirical data.

Findings
The parameters that successfully model the pandemic are: the basic reproduction number R0, ~4·4; a latent non-infectious period of 1·1 days followed by 4·6 days of the presymptomatic infectious period; the probability of developing severe symptoms, 0·01; the probability of being diagnosed when presenting severe symptoms, 0·6; the probability of diagnosis for cases with mild symptoms or asymptomatic, 0·001.

Introduction
A novel coronavirus, SARS-CoV-2, has already spread into 186 countries and territories around the world (as of 21 March 2020). With over 250 thousand confirmed infections and over 10 thousand deaths, it has become a global challenge. COVID-19, the disease caused by this coronavirus, was characterised as a pandemic by WHO on 11 March 2020.
While a number of different measures to contain the virus have been implemented by countries all over the world, their effectiveness remains to be seen. The models used to inform decision-makers differ significantly in their basic assumptions, because this is the first coronavirus with such an impact in terms of the number of cases. Also, the existing modelling approaches often use biased data for tuning parameters or assessing model quality. Until an effective treatment is available, the accuracy of these models and the decisions made on their basis are the major factors in reducing overall mortality in the COVID-19 pandemic.
In this study, we present putatively the first global model of SARS-CoV-2 spread that, within confidence intervals, accurately depicts the current state of diagnosed cases of COVID-19 for multiple countries at once. Implications for transmissibility and policymaking are also discussed.

Research in context
Evidence before this study
Multiple efforts of calculating the transmissibility of the SARS-CoV-2 virus and its geo-temporal modelling have been undertaken, but none of the models succeeded in describing the pandemic at the global level. For those models, the estimates of the basic reproduction number of the virus were typically obtained using only Chinese data on the number of diagnosed cases. Additionally, the actual prevalence of the virus remains unknown, as many infections are mild, asymptomatic or present with atypical symptoms. In fact, many COVID-19 cases pass unnoticed (in China, over 50% according to the research). This hampers successful modelling of the pandemic.

Added value of this study
This study presents the first global modelling of the COVID-19 pandemic that builds on top of the successful GLEaM modelling framework. The basic reproduction number for SARS-CoV-2 used in the simulation is 4·4. It is higher than the value proposed by WHO, but best fits the observed number of diagnosed cases over 125 days in multiple countries around the globe. Our analysis also provides an estimate of the global rate of total diagnosed to undiagnosed cases of 0·0061. The set of parameters used in our simulation forms a solid foundation for further modelling of the pandemic.

Implications of all available evidence
Our model implies that the current consensus estimates of the basic reproduction number of SARS-CoV-2 and of its prevalence are inaccurate. The overall global data on the pandemic dynamics seem strongly biased by large regions where official statistics may not accurately reflect the actual state of the epidemic, and by the fact that many COVID-19 cases may go unnoticed. The basic reproduction number of the virus should be confirmed on the basis of reliable data, and its prevalence determined by conducting properly designed screening tests. Our model, if confirmed, could be used as a tool for forecasting and for optimizing non-drug interventions and policymaking.

Modelling software
The model is based on the Global Epidemic and Mobility Model (GLEaM) framework 1, used through the GLEAMviz software 2. The GLEaM model integrates sociodemographic and population mobility data in a spatially structured stochastic disease approach to simulate the spread of epidemics at the worldwide scale. It was previously used for real-time numerical forecasting of the global spread of A/H1N1 3, and the accuracy of this modelling was later confirmed 3.

Data sources
The reference data on the number of SARS-CoV-2 diagnosed patients in the period from Jan 22, 2020, to Mar 16, 2020, were downloaded from the Johns Hopkins University & Medicine Coronavirus Resource Center GitHub repository https://github.com/CSSEGISandData/COVID-19.
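The per-country series of diagnosed cases can be derived from such a repository with a few lines of code. The sketch below is only an illustration: it assumes a wide CSV layout with the columns Province/State, Country/Region, Lat, Long followed by one column per date, and the sample rows (countries "Examplia" and "Federatia" and their counts) are made up, not taken from the reference data.

```python
import csv
import io
from collections import defaultdict


def country_totals(csv_text):
    """Sum cumulative case counts over provinces for each country.

    Assumed layout (an assumption, not confirmed by the source):
    Province/State, Country/Region, Lat, Long, then one column per date.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        if not row:  # skip blank lines
            continue
        country = row[1]
        for i, cell in enumerate(row[4:]):
            totals[country][i] += int(float(cell or 0))
    return dates, dict(totals)


# Made-up sample rows, for illustration only.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Examplia,10.0,20.0,1,3
North,Federatia,30.0,40.0,2,4
South,Federatia,31.0,41.0,0,5
"""

dates, totals = country_totals(SAMPLE)
print(dates)                # ['1/22/20', '1/23/20']
print(totals["Federatia"])  # [2, 9]
```

Summing over provinces matters for countries reported at sub-national level, since the comparison with the model is made per country.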
Other data sources, such as subpopulation selection, commuting patterns, or air travel flows used during simulation are embedded in the GLEAM software and well described by its developers.

Model parametrization
Below and in Table 1 we present two subsets of model parameters: 1) reliable, evidence-based values derived from the literature, and 2) estimates based on knowledge and analysis.
The average latency period (lp) of 5·6 days is a consensus of different estimates calculated previously 4.
Due to 1) the long lp, effectively much longer than reported for other coronaviruses, and 2) known cases of presymptomatic transmission 5,6, for modelling purposes we decided to split the latency period into two parts: 1) an average latent non-infectious period (lnip) of 1·1 days (based on the time of infectivity for other viruses 7) and 2) an average presymptomatic infectious period (pip) of 4·5 days. This split produces two parameters used in the model: 1) the latency rate for the non-infectious period, non-infectious epsilon (niε): $ni\varepsilon = 1/\mathit{lnip}$, and 2) the latency rate for the infectious period, infectious epsilon (iε): $i\varepsilon = 1/\mathit{pip}$.
As the Republic of Korea provides high-quality, reliable data and conducted a large number of tests during the pandemic, we decided to use the Korean proportion of severe to diagnosed cases as a base for the probability of developing the severe condition (pS), and we set it to 0.01. We assumed that patients with mild symptoms, in contrast to those in severe condition, are still capable of travelling. For model simplicity, we decided to merge mild and asymptomatic cases into one compartment.
We decided to set the probability of detection of severe infection (pDS) to 0.6, in order to mimic two obstacles that typically prevent proper diagnosis. Firstly, the majority of patients with a severe course of the infection are either chronically ill or above 60 8; their symptoms might be mistaken for those caused by their general health condition and not reported on time. Secondly, the model is supposed to reflect the average illness detection around the globe, which includes many countries with low-quality or underfinanced healthcare.
Another parameter of the model, pDM, is the probability of being diagnosed with COVID-19 when expressing either mild symptoms or an asymptomatic illness course. This parameter depends on the previously defined pS and pDS, as well as on the rate of total diagnosed to undiagnosed cases (tDR). Knowing the limitations of previous modelling attempts 9-15, we decided to test a radically different COVID-19 epidemiologic paradigm, i.e. to significantly lower tDR. This means that in our model we assume a higher proportion of undetected cases in comparison to other models proposed so far. Taking into account that none of them was capable of providing a plausible global simulation of the pandemic, plus the fact that the potential low detectability has already been discussed in the literature 16, we decided to test such a possibility in simulation by setting the lowest possible tDR. Its relation to pDM bounds tDR from below: for the previously set pS and pDS values, tDR must be greater than 0.006, thus the value used in our simulation was set to 0.0061.
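One way to see where the 0.006 bound comes from is sketched below. It rests on an interpretation rather than on a formula given in the text: the floor is assumed to be reached when no mild or asymptomatic case is ever diagnosed, so that only the fraction of infections that are both severe and diagnosed, pS · pDS, remains. The variable names are illustrative.

```python
# Parameter values stated in the text.
p_severe = 0.01        # pS: probability of developing severe symptoms
p_diag_severe = 0.6    # pDS: probability of diagnosis for severe cases

# Assumption: with pDM -> 0, only severe-and-diagnosed infections are detected.
tdr_lower_bound = p_severe * p_diag_severe

tdr_used = 0.0061      # value used in the simulation, just above the bound

print(round(tdr_lower_bound, 6))   # 0.006
print(tdr_used > tdr_lower_bound)  # True
```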
Other important and closely interconnected parameters required by the model are as follows: the effective contact rate, β; its reduction level for patients who developed severe symptoms of the disease but were not diagnosed, rβ; and the average recovery time since symptom development, μ.
The parameter β is derived from the time a host remains infectious, d, and the basic reproduction number of the virus, R0: $\beta = R_0 / d$. The estimation of R0 is a topic widely discussed in the literature, with values ranging from 1·4 to 6·49 17-22. However, following the assumption of a much higher rate of undiagnosed and asymptomatic cases than currently assumed, we decided to use in our model a higher rate of transmissibility, yet well within the range of 2-5 modelled for SARS 7. The assumed R0 value leading to the presented results is 4·4.
In our study μ is derived from a safe quarantine period for diagnosed cases 6. As the safe quarantine time is estimated to be 10 days 6, we assumed μ to last on average 7 days from symptom development to recovery. Adding μ to the previously estimated pip (presymptomatic infectious period) gives d equal to 11·5 days and the parameter β equal to 0.38261.
We decided to set rβ to 0.5, following the assumption for this parameter used in the GLEaM modelling of the 2009 influenza outbreak 1. Patients who were diagnosed with the SARS-CoV-2 infection are assumed to be isolated and, as such, not spreading the disease any further.
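The arithmetic behind d and β can be checked with a short calculation. This is a minimal sketch using only the values quoted in the text; the variable names are illustrative and β is taken as R0 divided by d, which reproduces the reported 0.38261.

```python
r0 = 4.4             # assumed basic reproduction number
lp = 5.6             # average latency period, days
lnip = 1.1           # latent non-infectious period, days
recovery_days = 7.0  # mu: average time from symptom development to recovery, days

pip = lp - lnip          # presymptomatic infectious period, days
d = pip + recovery_days  # total time a host remains infectious, days
beta = r0 / d            # effective contact rate, per day

print(round(pip, 1))   # 4.5
print(round(d, 1))     # 11.5
print(round(beta, 5))  # 0.38261
```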

Model compartmentalisation
To model the virus spread, we modified the compartmental SIR metapopulation transmission model to represent the nature of the COVID-19 epidemic.
In our model, we use seven different population compartments (Figure 1).
1. Susceptible population - equal to the general global population. We assume no existing immunity to infection.
2. Latent non-infectious - infected population in the first incubation stage, not yet infectious.
3. Presymptomatic infectious - infected population already infectious, but without developed symptoms.
4. Mild symptoms - joint populations of asymptomatic cases and those with inconspicuous symptoms.
5. Severe symptoms - population of cases with symptoms affecting their travel ability.
6. Diagnosed - population identified as infected with SARS-CoV-2 virus. This is the reference line for the model accuracy.
7. Recovered - joint populations of recovered and fatal cases.
The prepared model served as an input for 10 runs (the maximum available in the free tier) of the GLEaM Monte Carlo analysis based on human mobility, integrating population and two (local and air) mobility layers.
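To make the compartment structure concrete, the sketch below integrates a deterministic, single-population version of the seven compartments with a simple Euler scheme. It is not the GLEaM simulation, which is stochastic and spatially structured. The flows are an assumed reading of the compartment descriptions: presymptomatic and mild cases transmit at rate β, undiagnosed severe cases at rβ·β, symptomatic cases leave their compartment at rate 1/7 per day and are then either diagnosed or recovered, and diagnosed cases are isolated. The value pDM = 0.001 is taken from the summary; all names in the code are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Params:
    r0: float = 4.4              # basic reproduction number
    lnip: float = 1.1            # latent non-infectious period, days
    pip: float = 4.5             # presymptomatic infectious period, days
    recovery_days: float = 7.0   # symptom development to recovery, days
    p_severe: float = 0.01       # pS
    p_diag_severe: float = 0.6   # pDS
    p_diag_mild: float = 0.001   # pDM (value quoted in the summary)
    r_beta: float = 0.5          # transmission reduction for severe cases


def step(state, p, dt):
    """Advance (S, L, P, M, Sev, D, R) by one Euler step of length dt days."""
    s, lat, pre, mild, sev, diag, rec = state
    n = sum(state)
    beta = p.r0 / (p.pip + p.recovery_days)
    mu = 1.0 / p.recovery_days
    # Assumed force of infection: diagnosed cases are isolated and excluded.
    force = beta * (pre + mild + p.r_beta * sev) / n
    new_inf = force * s * dt
    to_pre = lat / p.lnip * dt
    onset = pre / p.pip * dt
    mild_out = mu * mild * dt
    sev_out = mu * sev * dt
    diag_out = mu * diag * dt
    return (
        s - new_inf,
        lat + new_inf - to_pre,
        pre + to_pre - onset,
        mild + (1 - p.p_severe) * onset - mild_out,
        sev + p.p_severe * onset - sev_out,
        diag + p.p_diag_mild * mild_out + p.p_diag_severe * sev_out - diag_out,
        rec + (1 - p.p_diag_mild) * mild_out
            + (1 - p.p_diag_severe) * sev_out + diag_out,
    )


def simulate(days, population, p, dt=0.1):
    """Start from one presymptomatic infectious case and integrate for `days`."""
    state = (population - 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
    for _ in range(int(round(days / dt))):
        state = step(state, p, dt)
    return state


# Hypothetical closed population of 10 million, for illustration only.
final_state = simulate(125, 1e7, Params())
print([round(x) for x in final_state])
```

Every flow that leaves one compartment enters another, so the total population is conserved, which gives a quick sanity check on the bookkeeping.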

Results
The simulation was started on Nov 12, 2019, with a single presymptomatic individual located in Wuhan, China, and the development of the pandemic spread was modelled for 125 days. The model has no information on movement restrictions or preventive measures already implemented by different governments. As overall data on the pandemic dynamics around the globe are likely to be biased by regions, often considerable in size and population, for which official statistics might be inaccurate, we decided not to compare overall model results with global data, but to limit the analysis of modelling results to sixteen countries across four continents (see Table 2) which, in our belief, are: a) divergent in the proportion of the tested population, quality of healthcare, and strength of undertaken preventive measures; b) likely to provide the public with real data; c) reporting a number of cases high enough to assume that their population exchange with the rest of the world did not significantly change the pandemic dynamics.
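The simulated window can be lined up with the reference data by simple date arithmetic. The sketch below uses only the dates stated in the text and shows that 125 days from the start date ends on the last day of the reference data; the variable names are illustrative.

```python
from datetime import date, timedelta

start = date(2019, 11, 12)          # simulation start
end = start + timedelta(days=125)   # end of the modelled period
first_report = date(2020, 1, 22)    # first day of the reference data

# Simulation day on which the reference data begin.
offset_days = (first_report - start).days

print(end.isoformat())  # 2020-03-16
print(offset_days)      # 71
```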
The obtained 95% confidence intervals of predicted numbers of diagnosed patients were compared with empirical data. In Figure 2 we present the percentage difference over time between the number of reported confirmed cases and the confidence interval limits of the modelled predictions. Positive values indicate that the model overestimates the number of diagnosed cases; negative values indicate that it underestimates them. Where the observed number of cases falls within the model CI, the difference is equal to 0. For the selected countries the model predictions fit the observed data well.
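The comparison metric described above can be sketched as follows. The sign convention follows the text; expressing the difference relative to the observed count is an assumption, as is the function name.

```python
def ci_percentage_difference(observed, ci_low, ci_high):
    """Percentage difference between an observed count and the model CI.

    Returns 0 when the observation lies inside [ci_low, ci_high], a positive
    value when the model overestimates (observation below the CI) and a
    negative value when it underestimates (observation above the CI).
    Normalising by the observed count is an assumption of this sketch.
    """
    if observed <= 0:
        raise ValueError("observed count must be positive")
    if observed < ci_low:
        return 100.0 * (ci_low - observed) / observed
    if observed > ci_high:
        return 100.0 * (ci_high - observed) / observed
    return 0.0


# Illustrative numbers only, not taken from the study.
print(ci_percentage_difference(100, 80, 120))  # 0.0
print(ci_percentage_difference(50, 80, 120))   # 60.0
print(ci_percentage_difference(150, 80, 120))  # -20.0
```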
The notable spread in model accuracy between the countries is negatively correlated with the number of tests performed per million citizens reported as of March 9, 2020 (see Figure 3). The Spearman correlation coefficient calculated for the number of performed tests and the average percentage difference between modelled and reported numbers of diagnosed cases is -0·778 (95% confidence interval -0·945 to -0·291). This agrees well with intuition: the more tests are carried out in a country, the larger its local tDR becomes and the model starts underestimating the number of detected cases, and vice versa.
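For readers who want to reproduce this kind of rank correlation, a minimal pure-Python sketch is given below. It computes the Spearman coefficient as the Pearson correlation of average ranks; the input numbers are made up for illustration and are not the study's data.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: tests per million vs. average percentage difference.
tests_per_million = [50, 400, 1500, 4000]
avg_pct_difference = [120.0, 30.0, -10.0, -60.0]
print(spearman(tests_per_million, avg_pct_difference))
```

In this made-up example the second series decreases strictly as the first increases, so the coefficient is -1 up to floating-point rounding.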

Figures 4-19 compare the number of actual confirmed COVID-19 cases with the confidence intervals for the modelled number of diagnosed cases.
Some countries, e.g. the Republic of Korea, Japan, or Italy, present epidemic dynamics different from the model. However, the direction of these deviations may be explained by the measures undertaken by their governments or by the societal response. We believe that modelling efforts including manual, country-level parameter modifications depending on specific events, intervention actions, and society's response to the danger would greatly improve the accuracy of the model, but this is outside the scope of this work.

Discussion
The presented model has multiple implications concerning the major characteristics of the COVID-19 pandemic, such as the basic reproduction number of the virus, R0 (higher than previously assumed, yet not above the values estimated for other coronaviruses), and the rate of diagnosed cases, tDR (much lower than assumed so far, especially for mild and asymptomatic cases). This would indicate that the vast majority of COVID-19 infections are so mild that they pass unnoticed. This is not implausible, considering that there are 1·9 billion children aged below 15 years in the world (27% of the global population) and that the course of their infections is predominantly (ca. 90%) mild or asymptomatic 23. Additionally, they gather in large groups at schools on a regular basis, which facilitates further disease transmission. Also, some COVID-19 cases may show atypical symptoms (e.g. diarrhoea) 24, which hinders correct diagnosis. Taking all this into account, plus the results of our model, one may risk a hypothesis that the virus is more prevalent in the global population than shown in official statistics at the moment and, consequently, that its mortality rate is much lower.
To verify this hypothesis, further actions are required. First, the model should be simulated with a larger number of iterations, which will narrow the obtained confidence intervals and allow further refinement of the parameters. Also, a simulation with the tDR parameter increasing over time or varying geographically might better reflect the actual virus detectability in the course of the pandemic. Finally, the real spread of the virus should be assessed empirically by conducting a sufficient number of tests on fully random samples (currently most tests are limited to individuals with strong and typical symptoms). Only after obtaining a solid measurement of the actual prevalence of the virus might one draw conclusions about its true mortality rate.
We emphasize that our conclusions are a hypothesis based on a single computational model without empirical verification; they may serve as a platform for further research. At this stage, by no means should they be used as a reason for governmental decisions on lifting the precautions. Even if the true mortality of the virus is indeed lower than announced by the media, many people remain in the high-risk group. Lack of population resistance facilitates their contact with the virus and may lead to a rapid increase of severe cases in a short period of time (as seen in Italy), leading to the collapse of the healthcare system, which affects the entire society and results in many additional deaths not related to the virus itself. Careful use and tuning of non-drug intervention methods, constant balancing of the disease spread and healthcare capacity, protecting the most vulnerable individuals, farsighted anticipation and agility in decision making may altogether be able to minimize the number of deaths without resulting in a global economic breakdown.

Declaration of interests
We declare no competing interests.

Tables and figures
Table 1: Summary of all the parameters used in the deployed model.

Transition from symptomatic groups occurs at rate μ. Individuals who developed severe symptoms do not travel within or between modelled subpopulations and may either be diagnosed, with probability pDS, or recover, with probability 1-pDS. Individuals whose mild (or non-existent) symptoms do not stop them from travelling may be diagnosed with probability pDM or recover with probability 1-pDM. The diagnosed individuals are considered isolated and effectively non-contagious, and recover at rate μ. Recovery does not discriminate between true recovery and fatal cases.

medRxiv preprint, doi: https://doi.org/10.1101/2020.03.21.20040444. This preprint was not peer-reviewed. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Figure 19: An overlay of the modelled confidence interval for diagnosed cases and reported values in Vietnam.