The influence of passenger air traffic on the spread of COVID-19 in the world

Countries around the world are suffering from COVID-19 and are seeking to control it. Some authorities have therefore adopted new policies, and some have even stopped passenger air traffic. These decisions were not uniform, and this study examines how passenger air traffic might influence the spread of COVID-19 in the world. We used case data from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University and air transport data (passengers carried) from the World Bank. We fitted Poisson, quasi-Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models with cross-validation to make sure that our findings are robust. We found that when passenger air traffic increases by one unit, the number of cases increases by one new infection.


Introduction
COVID-19 is a deadly viral disease that has been at the origin of a worldwide pandemic since December 2019. In the beginning, 27 infections were detected in Wuhan City, China (Ayittey et al., 2020). On 30/01/2020, the World Health Organization (WHO) declared the COVID-19 outbreak a public health emergency of international concern (World Health Organization et al., 2020). This situation pushed many countries to investigate how the virus spreads and how it can be controlled. Experts also found that the virus could be transmitted from person to person (Chan et al., 2020). Consequently, face mask wearing and social distancing (Alice et al., n.d.), home quarantine, confinement, and sanitary cordons (Pan et al., 2020), contact tracing (Hellewell et al., 2020), travel restrictions (Chinazzi et al., 2020), and many other measures have been put in place to slow and control the spread of the pandemic.
The first COVID-19 cases in countries around the world were imported from other countries (Hellewell et al., 2020; Yuan et al., 2020). Besides, human mobility has been found to influence the COVID-19 epidemic in China significantly (Kraemer et al., 2020; Zhao et al., 2020). We therefore consider it plausible that means of transportation influence the spread of COVID-19 in the world. At the beginning of the pandemic, the airport of Wuhan did not stop any passenger flights (Bogoch et al., 2020b; Wilson and Chen, 2020). Given the global passenger air traffic network, it is easy to see how unknown contagious diseases can spread around the world (Wilson and Chen, 2020). Indeed, air transport has contributed to the spread of many diseases in the world (Tatem et al., 2006; Colizza et al., 2006; MacFadden et al., 2015). Moreover, some researchers found that airline travel could influence the spread of viruses such as influenza (Grais et al., 2003; Brownstein et al., 2006; Hosseini et al., 2010), Ebola, Zika (Bogoch et al., 2016), and dengue (Tian et al., 2017). All those works support our hypothesis that the number of air passengers is related to the spread of the pandemic. Even while China was passing its peak, new infections with significant increases were recorded in other regions of the world. A recent similar work (Lau et al., 2020) focused on the association between air traffic and the coronavirus outbreak, but its methods were limited to linear correlation. Many infected cases were imported from China to the other continents (Kim et al., 2020; Rodriguez-Morales et al., 2020). We should also recall that numerous countries did not strictly stop passenger air traffic while the virus was still infecting people.
Considering the aforementioned points, we notice that there is no scientific body of knowledge (with a robust modeling process and cross-validation) about the plausible influence of air transport on the spread of COVID-19. We therefore hypothesized in this study that passenger air traffic has a significant influence on the number of COVID-19 infections in the world. The results of this paper provide additional technical information, with strong evidence, to help control the spread of the pandemic in the world.

Sample description
In this work, we used a data set with two variables: aggregated global cases (GC) and recent worldwide data on passenger air traffic (PAT). The data were accessed on 12/03/2020. GC comes from the Center for Systems Science and Engineering (CSSE) of Johns Hopkins University (JHU) (https://github.com/CSSEGISandData/COVID-19), and PAT was downloaded from the World Bank (World Bank, 2018). GC is the count of total confirmed cases in a country per day, and PAT is the count of total air passengers per country in 2018 (the most recent data set available). The data cover the period from 22/01/2020 to 13/07/2020.
The quality and accuracy of WHO data sets about COVID-19 are based on the technical guidance developed by topic (World Health Organization, 2020a) for the virus. Every country is expected to follow these processes; otherwise, the data it provides are not validated. Moreover, a confirmed case is a person with laboratory confirmation of COVID-19 infection, irrespective of symptoms and clinical signs. The passenger air traffic data cover international and domestic aircraft passengers registered in each country and aggregated by the World Bank.

Methodology
The goal is to propose a model that uses predictors to approximate the dependent variable and to check whether their influence is significant. The variable of interest in this work is a count variable (with zeros at some dates), namely the total number of confirmed cases GC, denoted by $y_i$. To model count variables, researchers use many approaches, such as the Poisson model (PM), the quasi-Poisson model (QPM), the negative binomial model (NBM), zero-inflated models (ZIM) (Lambert, 1992), and hurdle models (HM) (Mullahy, 1986). HM and ZIM both combine a Bernoulli probability with a Poisson probability, which gives flexibility in modeling zero outcomes. In addition, we suspect overdispersion because, at the beginning of the pandemic, total cases varied greatly (and disproportionately) among countries. Consequently, in the current work we fit all of these models except the hurdle ones, which are similar to the ZIM. In the context of COVID-19, we think that a zero count means either the absence of the pandemic or its non-detection in a given country. Hurdle models assume that all zeros come from a single source, which does not fit our study.

Poisson and quasi-Poisson regression
In the Poisson regression model, the variance is assumed to be equal to the mean. In the quasi-Poisson model, the variance is assumed to be the dispersion parameter times the mean (i.e., the variance is proportional to the mean). Let $N_i$, $X_i$, and $\mu_i$ be, respectively, the number of total cases, the passenger air traffic, and the mean number of total cases in a given country $i$. In these models, $\theta$ is the dispersion parameter, $\alpha$ is the intercept, $\beta$ is the influence of $X$ on $N$, and $\varepsilon_i$ represents the error term. Note that PM is QPM with $\theta = 1$. When $\theta > 1$, there is overdispersion relative to Poisson.
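A common way to write the two models with the symbols defined above is sketched below. The log link is an assumption made for this sketch; the surrounding text specifies only the mean-variance relationships and the roles of the parameters.

```latex
% Sketch under an assumed log link; symbols as defined in the text.
\begin{align}
  \mathbb{E}[N_i] &= \mu_i, &
  \log(\mu_i) &= \alpha + \beta X_i + \varepsilon_i, \\
  \text{PM:}\quad \operatorname{Var}(N_i) &= \mu_i, &
  \text{QPM:}\quad \operatorname{Var}(N_i) &= \theta\,\mu_i .
\end{align}
```

Setting $\theta = 1$ in the QPM variance recovers the PM variance.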

Negative binomial regression
We use the negative binomial regression model when we suspect overdispersion. It is a generalization of PM that assumes the variance is a quadratic function of the mean. In other words, when the parameter $\lambda$ of the PM follows a gamma distribution with parameters $(r, \rho)$, we use the NBM, which is a Poisson-gamma mixture. Its probability mass function involves the gamma function $\Gamma$.
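For reference, one standard parameterization of the Poisson-gamma mixture is sketched below. It assumes that $\lambda$ follows a gamma distribution with shape $r$ and rate $\rho/(1-\rho)$; the text does not state which convention $(r, \rho)$ refers to, so this choice is an assumption.

```latex
% Assumed convention: lambda ~ Gamma(shape r, rate rho / (1 - rho)).
\begin{align}
  \Pr(N_i = n) &= \frac{\Gamma(n + r)}{n!\,\Gamma(r)}\,\rho^{r}\,(1 - \rho)^{n},
  \qquad n = 0, 1, 2, \dots \\
  \mathbb{E}[N_i] &= \mu_i = \frac{r(1 - \rho)}{\rho}, \qquad
  \operatorname{Var}(N_i) = \mu_i + \frac{\mu_i^{2}}{r}.
\end{align}
```

Under this convention the variance is quadratic in the mean, and it approaches the Poisson variance as $r$ grows.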

Zero-inflated regressions
In this work, there is an excess of zero counts at the beginning of the pandemic. Zero-inflated regression is the type of regression that researchers use to handle such zero-inflated count data. ZIM mix a process that generates zeros (logistic regression in this study) with another one (PM, QPM, NBM) that models counts.
With $\pi_i$ the probability of getting a zero in country $i$ from the zero-generating process, the possible models are the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models.
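The usual form of a zero-inflated count model is sketched below. The symbol $f$, standing for the probability mass function of the count component, is notation introduced here for illustration and does not come from the text.

```latex
\begin{align}
  \Pr(N_i = 0) &= \pi_i + (1 - \pi_i)\, f(0; \mu_i), \\
  \Pr(N_i = n) &= (1 - \pi_i)\, f(n; \mu_i), \qquad n = 1, 2, \dots
\end{align}
% ZIP:  f(n; \mu_i) = e^{-\mu_i} \mu_i^{n} / n!   (Poisson)
% ZINB: f is the negative binomial probability mass function with mean \mu_i
```

A zero can therefore arise either from the zero-generating process (with probability $\pi_i$) or from the count process itself.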

Analysis process
Checking the influence of PAT on the spread of COVID-19 is a crucial point that requires a rigorous method before any final conclusion. Since we have daily case counts per country but only a single PAT value representing the air traffic of each country, we decided to check the significance of the influence of PAT on GC for every day of the data set. This means that we repeated the check 174 times for each model. In addition, to assess the goodness of fit of our models, we implemented a cross-validation method. The process is summarized in Fig. 1.
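The day-by-day procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a log link, fits only the Poisson model with a hand-written Newton-Raphson routine, and uses hypothetical names (`fit_poisson`, `daily_coefficients`) and toy numbers, with PAT expressed in arbitrary scaled units.

```python
import math

def fit_poisson(x, y, iters=100, tol=1e-10):
    """Fit log(mu_i) = a + b * x_i by Newton-Raphson and return (a, b).

    Assumes at least one non-zero count, so the starting value is finite.
    """
    a = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b = 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information, the 2x2 matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def daily_coefficients(pat, cases_by_day):
    """Fit one regression per day: the same PAT vector against that day's counts."""
    return [fit_poisson(pat, day_counts) for day_counts in cases_by_day]

# Toy data (hypothetical): 5 countries, 3 days.
pat = [0.0, 1.0, 2.0, 3.0, 4.0]
cases_by_day = [
    [1, 1, 2, 3, 5],
    [1, 2, 4, 7, 12],
    [2, 4, 7, 13, 25],
]
coefs = daily_coefficients(pat, cases_by_day)
```

The key design point is that the predictor vector stays fixed while the response changes each day, which is why the study yields one coefficient and one p-value per model per day.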

Preliminary checking
The variable of interest is the number of cases per country in the world. The daily evolution of the pandemic is shown in Fig. 2.
The hypothesis in this work is that air traffic has influenced the spread of COVID-19. Thus, we computed the Pearson correlation between the two variables (daily cases and passenger air traffic) over the study period (from 22/01/2020 to 13/07/2020). It is illustrated in Fig. 3.
The minimum Pearson correlation coefficient is 0.05 (from 01/03/2020 to 04/03/2020) and the maximum is 0.94 (from 07/05/2020 to 25/05/2020), with a mean of 0.66. These results motivated us to deepen our analysis with the models, to better understand the plausible daily influence of air traffic on the spread of COVID-19.
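The daily correlation series can be computed as in the sketch below. The function name `pearson`, the day labels, and the numbers are hypothetical and serve only to illustrate the calculation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Undefined (division by zero) if either sequence has zero variance,
    for example a day on which every country reports the same count.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data (hypothetical): PAT per country and case counts on two days.
pat = [10.0, 20.0, 30.0, 40.0]
cases_by_day = {
    "day_1": [0, 0, 3, 1],
    "day_2": [5, 11, 14, 22],
}
# One coefficient per day, always against the same PAT vector.
daily_r = {day: pearson(pat, counts) for day, counts in cases_by_day.items()}
```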

Modeling
Recall that from 22/01/2020 to 09/04/2020 some countries had zero cases. After that date, every country in our data set had at least one case. These many zeros (128 on 22/01/2020 and 1 on 09/04/2020) also motivated us to use zero-inflated models on the period with zeros. To interpret the model results, we proceeded as follows:
• Firstly, after the initial computation of the models, we noticed that the PM, QPM, and NBM regressions converged (i.e., the coefficient estimates became approximately stable across iterations), whereas ZIP and ZINB did not (only the part of ZIP from 24/03/2020 to 09/04/2020 converged). Since our main objective is to check the influence of air traffic on the spread of COVID-19, we cannot consider a model that does not converge, because it does not output the p-value needed to interpret the influence.
• Secondly, we noticed that during the study period, all four retained models (PM, QPM, and NBM for every day, and ZIP only from 24/03/2020 to 09/04/2020) have p-values smaller than 0.05, except QPM from 22/02/2020 to 17/03/2020. All the coefficients are near zero, and consequently their exponentials are close to one. This means that our finding is robust whatever the model. When air traffic increases by one unit, the number of cases in a country increases by one too.
Though all the retained models (with significant p-values) give the same results, the variability among coefficients per model can be seen in Fig. 4.
We can notice that, in the beginning, NBM tends to estimate higher coefficients, which indicates a higher estimated influence when we use that model. After getting estimates with the training data sets, we computed forecasts on the test data and used the root mean square error (RMSE) to select the best model. In Table 1, the column of best models considering p-values first discards the models that did not converge before any choice is made. Meanwhile, the column of best models without considering p-values focuses only on the best fit, that is, the smallest RMSE after cross-validation, without first checking any p-values. In particular, Table 1 shows that, regardless of p-values, the best models when there is an excess of zeros are the zero-inflated ones. This is informative because we expected the ZIM to suit the period with excess zeros, and we did not choose them only because of convergence. In other words, had they converged, the best models for the period with zeros would have been the ZIM.
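The arithmetic behind "coefficients near zero give exponentials near one" can be illustrated with the short sketch below. It assumes a log link, under which the exponential of a coefficient acts as a multiplicative factor on the expected count per one-unit increase in the predictor. The coefficient values and the helper `expected_count` are hypothetical.

```python
import math

def expected_count(alpha, beta, x):
    """Expected count under an assumed log link: mu = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# Hypothetical coefficient values, chosen only to illustrate the arithmetic.
alpha, beta = 2.0, 1e-6
x = 1_000.0

# Multiplicative change in the expected count per one-unit increase in x.
rate_ratio = math.exp(beta)

# The same quantity obtained directly from two expected counts.
ratio_from_means = expected_count(alpha, beta, x + 1) / expected_count(alpha, beta, x)
```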
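The RMSE-based selection, with and without first discarding non-converged models, can be sketched as follows. The helper names (`rmse`, `best_model`) and all numbers are hypothetical; the forecasts stand in for predictions that fitted models would produce on a test set.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted counts."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def best_model(actual, predictions_by_model, excluded=()):
    """Return (name, scores) for the model with the smallest test RMSE.

    Models listed in `excluded` (for example, those that did not converge)
    are skipped before the comparison.
    """
    scores = {
        name: rmse(actual, preds)
        for name, preds in predictions_by_model.items()
        if name not in excluded
    }
    return min(scores, key=scores.get), scores

# Hypothetical test-set counts and forecasts for one day.
test_counts = [0, 0, 3, 10, 25]
forecasts = {
    "PM": [1.0, 1.5, 4.0, 9.0, 22.0],
    "NBM": [1.2, 1.8, 4.5, 8.0, 20.0],
    "ZIP": [0.2, 0.4, 3.5, 9.5, 24.0],
}

# Best fit on RMSE alone, then again after discarding a non-converged model.
best_any, _ = best_model(test_counts, forecasts)
best_converged, _ = best_model(test_counts, forecasts, excluded=("ZIP",))
```

The two calls mirror the two columns described for Table 1: one ranks every model by RMSE, the other ranks only the models that were retained.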

Discussion
The principal aim of this study is to check, with a robust process, whether passenger air traffic had a significant influence on the number of COVID-19 infections. In the results, we noticed that the higher the passenger air traffic, the higher the number of infections. Our finding is quite similar to that of the recent work (Lau et al., 2020) on air traffic and the pandemic. Wuhan City is the town in which COVID-19 was first reported (Ayittey et al., 2020), and many other countries' first cases were imported. A recent study (Bogoch et al., 2020a) of Wuhan, China, found that commercial air travel can influence the international dissemination of pneumonia of unknown aetiology. In another work (Wilson and Chen, 2020), the authors mentioned that travellers can give wings to COVID-19 too. At the beginning of the pandemic, not all countries rigorously stopped passenger air traffic, which explains the finding of this work because it allowed imported cases to trigger local transmission. Many works (Grais et al., 2003; Brownstein et al., 2006; Hosseini et al., 2010; Bogoch et al., 2015, 2016; Tian et al., 2017) have also mentioned that air traffic can facilitate the spread of a virus, as in this study. When we also take into account spatial parameters with passenger mobility, we can understand that a passenger who travels to a country can infect others or be infected. Once the passenger is back home, the incidence of COVID-19 there increases.
In addition, spatial concerns are considered in this work through each country of the data set. Furthermore, the conclusion of this work comes from the comparison and variability among all the countries affected by the pandemic. For instance, though Africa is predicted to be significantly affected by the pandemic (Lone and Ahmad, 2020), according to our data Africa has the smallest passenger air traffic, and this also explains its small number of cases compared to other regions. This work thus offers an explanation of how passenger air traffic contributes to the relative number of infections in each country. To control the pandemic, authorities should strictly stop passenger air traffic to their countries, except for emergencies. Once internal transmission has been controlled, they can progressively reopen their air traffic and greatly strengthen screening tests for passengers. Passenger air traffic should be considered whenever there are contagious public health issues. Additionally, this work is the first to compute 5 different models on 174 days with cross-validation to check how robust the influence of passenger air traffic on the spread of COVID-19 is, and to propose an appropriate model. This work is quite important for researchers in the domains of applied statistics, transportation, and public health.
However, QPM alone does not confirm the influence of air traffic on the number of COVID-19 cases from 22/02/2020 to 17/03/2020. Recall that WHO declared the COVID-19 outbreak a pandemic on 11/03/2020 (World Health Organization, 2020b). Consequently, the absence of influence in that period can be explained by the fact that countries were facing the pandemic crisis and might have changed their daily air traffic significantly. Fig. 3 also shows that the correlation coefficients between the number of cases and air passengers are smaller in the same period. This study has some limitations that are mostly related to the cross-sectional PAT data set. We think that daily time series of PAT and COVID-19 cases would give stronger evidence. We also lack information about the means of transportation of imported COVID-19 cases. Consequently, research on this information (daily PAT time series and imported cases' means of transportation) would provide deeper insight for the current work.

Declaration of competing interest
None.

Acknowledgements
We thank Innocent Koovi, an independent researcher, who helped us in the proofreading of this paper.