The global transmission of new coronavirus variants

Coronavirus Disease 2019 (COVID-19) has caused tremendous losses worldwide. This study addresses the impact and diffusion of five major new coronavirus variants, namely the Alpha, Beta, Gamma, Eta, and Delta lineages. The results indicate that Africa and Europe will be affected by the new coronavirus variants more than other continents. The comparative analysis indicates that vaccination can contain the spread of the virus on most continents, and that non-pharmaceutical interventions (NPIs), such as restrictions on gatherings and closure of public transport, will effectively curb the pandemic, especially on densely populated continents. According to our Global Prediction System of COVID-19 Pandemic (GPCP), the diffusion of the Delta lineage in the US shows seasonal oscillation: the first wave will occur in October 2021, peaking at 323,360 cases, followed by a small resurgence in April 2022 (184,196 cases) and a second wave reaching 232,622 cases in October 2022. Our study will raise public awareness of the new coronavirus variants and help governments issue appropriate directives to cope with them.


Introduction
COVID-19 has spread to 192 countries and regions. Unlike severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), which have relatively high fatality rates (Chen et al., 2020a, b; Lonergan and Chalmers, 2020), COVID-19 is highly infectious but less lethal. Early research showed that SARS-CoV-2 has two major lineages, L and S (Tang et al., 2020). As a single-stranded RNA virus (V'kovski et al., 2020), SARS-CoV-2 has accumulated many mutations. Most mutations have no severe consequences for the spread or mortality of the virus (Chen et al., 2020a, b), but several variants have caused global concern. It is therefore important to understand clearly how the new coronavirus variants diffuse and which methods can effectively contain the pandemic.
Five major lineages have been identified and have spread across the world so far. The earliest, Alpha (α), was identified in the UK in September 2020, and Beta (β) and Gamma (γ) were identified in South Africa and Brazil in September 2020 and October 2020, respectively. The Eta (η) and Delta (δ) lineages were both identified in November 2020, in the UK and India, respectively.
Compared with the original lineage, the new coronavirus variants show higher transmissibility and resistance to antibodies (Greaney et al., 2021; Volz et al., 2021a, b). For instance, the share of infections in the UK caused by the Alpha lineage rose from 0.1% in early October to 49.7% in late November (Leung et al., 2021). In the early stage of the COVID-19 pandemic, the reproduction numbers in Italy, Spain, France, and Germany were all greater than two (Yuan et al., 2020), which caused mass outbreaks in these countries. The Alpha lineage increases the reproduction number further, with a transmission rate 71% higher than that of the original lineage (Tang et al., 2021), and it was projected that 95% of COVID-19 infections in England would be caused by this variant by 15 February 2021 (Davies et al., 2021). The Alpha lineage has spread quickly to other countries, and research suggested that it might soon become the dominant lineage in Japan (Murayama et al., 2021). The Beta lineage, on the other hand, may be able to infect immunized people (Planas et al., 2021), and the Gamma lineage was found to be strongly resistant to neutralizing antibodies. The Eta and Delta lineages share mutations with the Beta and Alpha lineages and therefore also exhibit higher transmissibility and resistance to vaccines. Because an RNA virus accumulates mutations as it replicates inside host cells, the chance that new variants appear grows as more people are infected. Fast, timely surveillance should therefore be promoted to keep pace with the rapid mutation of SARS-CoV-2 and minimize the negative impact of the pandemic (Liu et al., 2021a, b).
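The rise in the Alpha share from 0.1% to 49.7% can be turned into a rough growth estimate. The sketch below is our own illustration, not an analysis from Leung et al. (2021): it assumes the variant share follows a logistic curve, so that its log-odds grow linearly in time, and it assumes roughly 50 days between "early October" and "late November".

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p, which must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return math.log(p / (1.0 - p))


def log_odds_slope(p_start: float, p_end: float, days: float) -> float:
    """Daily growth of the variant's log-odds under a logistic model.

    If p(t) = 1 / (1 + exp(-(a + r * t))), then logit(p) = a + r * t,
    so r is the change in log-odds divided by the elapsed time.
    """
    return (logit(p_end) - logit(p_start)) / days


# Shares from the text (0.1% -> 49.7%); the 50-day gap is an assumption.
r = log_odds_slope(0.001, 0.497, days=50)
odds_doubling_days = math.log(2) / r

print(f"log-odds slope: {r:.3f} per day")
print(f"odds of Alpha vs. other lineages double every {odds_doubling_days:.1f} days")
```

With these inputs the slope is about 0.14 per day, meaning the odds that an infection was Alpha rather than another lineage doubled roughly every five days; a different interval assumption scales the slope proportionally.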
The new coronavirus variants have spread from their places of origin to many other countries. The Alpha lineage has been detected in at least 114 countries (Davies et al., 2021), and the other variants are also spreading to many countries. Europe is the most affected region in the world because its open-border policy provides a good opportunity for the virus to spread (Pillai et al., 2020). Previous studies pointed out that human mobility and the number of overseas travelers were positively correlated with the number of infections (Chinazzi et al., 2020), and importation and exportation risks are considered important factors in transmitting the virus during the pandemic (Nakamura and Managi, 2020). Research found that 76% of the Brazilian strains in early 2020 were imported from Europe (Candido et al., 2020). Besides human mobility, some environmental and atmospheric factors can also affect the diffusion of COVID-19. Several studies found that high levels of air pollution, such as PM10 and ozone, and low wind speed increase the number of infected cases (Coccia, 2020b, 2020c, 2021a, 2021c). Additionally, air pollutants can increase the lethality of COVID-19 and prolong recovery time (Domingo et al., 2020). With the development of vaccines, there appears to be a great opportunity to mitigate the pandemic effectively. However, vaccine hesitancy makes the outlook uncertain and has become the main obstacle to vaccine uptake (Schaffer DeRoo et al., 2020). Vaccine hesitancy has various causes, such as people's ethnicity and party identification (Viswanath et al., 2021), insufficient trust in vaccine safety (Syed Alwi et al., 2021), misleading information online (Kanyike et al., 2021), a lack of reliable information about vaccines (Murphy et al., 2020), and immature vaccine distribution strategies (Freed, 2021).
To accelerate vaccine uptake, governments should communicate effectively with each community, for example by emphasizing the importance of achieving herd immunity through vaccination (Schwarzinger et al., 2021), and develop corresponding strategies to increase public trust in vaccines (Elhadi et al., 2021; Seale et al., 2021; Echoru et al., 2021). Optimal allocation of vaccines also helps the vaccination campaign; for instance, health care staff should receive vaccines in the initial phase (Dooling et al., 2020).
In this study, we describe and predict the impact of the new coronavirus variants and discuss the effectiveness of vaccines and NPIs in containing the pandemic. This study will raise public awareness of the hazards of the new coronavirus variants and help policy makers implement effective measures to contain the pandemic.

Sample and data
The new coronavirus variant sequences and the total number of sequenced samples are provided by https://cov-lineages.org (O'Toole et al., 2021). They are sampled from all countries that have identified any of the new coronavirus variants. The variant sequence data are updated daily, but the website is missing data for a few days; we therefore used linear interpolation to impute the missing values. Data were collected as of 23 June 2021. The inverse probability weighting technique was employed to weight the number of new coronavirus cases in each country so that the results are less biased. The calculation formulas are:
$$P_i = \frac{n_i}{N_{\text{total}}}, \qquad w_i = \frac{1}{P_i}, \qquad n_{\text{weighted}} = w_i \cdot n_{\text{real}},$$
where $n_i$ is the number of tested (or sequenced) people in continent $i$, $N_{\text{total}}$ is the total number of tested (or sequenced) people in the five continents, and $i \in \{\text{Europe, Asia, Africa, Americas, Oceania}\}$. The same formulas are applied to the test counts and to the sequence counts. The flight data were obtained from https://zenodo.org (Strohmeier et al., 2021). The daily new COVID-19 cases, test capacity, vaccination rate, and non-pharmaceutical intervention data, including the vaccination policy, restrictions on internal movements, restrictions on gatherings, and close public transport indexes, were downloaded from https://ourworldindata.org (Ritchie et al., 2021). These data are sampled country-wise, but data are unavailable for some countries. The COVID-19 data were downloaded from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Dong et al., 2020).
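The two preprocessing steps described above, gap filling by linear interpolation and inverse probability weighting, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the rule for gaps at the start or end of a series is an assumption (the text does not say how edges were handled), and all input numbers are made up.

```python
from __future__ import annotations

from typing import Optional, Sequence


def interpolate_missing(values: Sequence[Optional[float]]) -> list[float]:
    """Fill missing daily values (None) by linear interpolation.

    Interior gaps are interpolated between the nearest observed neighbours.
    Leading or trailing gaps are filled with the nearest observed value
    (an assumed edge rule).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: list[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def ipw_weights(counts: dict[str, float]) -> dict[str, float]:
    """Inverse probability weights w_i = 1 / P_i, where P_i = n_i / N_total."""
    total = sum(counts.values())
    return {region: total / n for region, n in counts.items()}


def weight_counts(real: dict[str, float], weights: dict[str, float]) -> dict[str, float]:
    """Apply n_weighted = w_i * n_real region by region."""
    return {region: weights[region] * n for region, n in real.items()}


# Hypothetical inputs for illustration only (not data from the study).
daily_sequences = [10, None, None, 16, 18]
print(interpolate_missing(daily_sequences))  # [10.0, 12.0, 14.0, 16.0, 18.0]

sequenced = {"Europe": 600, "Asia": 200, "Africa": 50, "Americas": 140, "Oceania": 10}
weights = ipw_weights(sequenced)
print(weight_counts({"Africa": 3, "Europe": 30}, weights))
```

The weighting inflates counts from continents that test or sequence little (here Africa, with weight 20) relative to those that sequence heavily (Europe, with weight about 1.67), which is the intended bias correction.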

Measures of variables
The non-pharmaceutical interventions are quantified with ordinal indices that start at zero; the maximum level ranges from two to five depending on the index.
Vaccination policy: 0-No availability; 1-Availability for one of the following: key workers/clinically vulnerable groups/elderly groups; 2-Availability for two of the following: key workers/clinically vulnerable groups/elderly groups; 3-Availability for all of the following: key workers/clinically vulnerable groups/elderly groups; 4-Availability for all three plus partial additional availability (select broad groups/ages); 5-Universal availability.
Restrictions on internal movements: 0-No measures; 1-Recommend movement restriction; 2-Require closing (or work from home) for some sectors or categories of workers; 3-Restrict movement.
Restrictions on gatherings: 0-No restrictions; 1-Restrictions on very large gatherings (the limit is above 1000 people); 2-Restrictions on gatherings between 100 and 1000 people; 3-Restrictions on gatherings between 10 and 100 people; 4-Restrictions on gatherings of fewer than 10 people.
Close public transport: 0-No measures; 1-Recommend closing (or significantly reduce volume/route/means of transport available); 2-Require closing (or prohibit most citizens from using it).

Model and data analysis procedure
All statistical analyses were performed in R. Violin plots were employed to illustrate the distribution of new variant cases on different continents; their skewness, dispersion, and outliers are analyzed in detail in the results section. Analysis of variance (ANOVA) and Tukey's Honestly Significant Difference (Tukey-HSD) test were used to examine the relationship between the diffusion of new variant cases and their reported location (continent level). In addition, the Tukey-HSD test was used in a comparative analysis of the effectiveness of vaccination and NPIs in containing the spread of the new coronavirus variants.
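The ANOVA and Tukey-HSD steps can be sketched in plain Python to make the arithmetic explicit. This is a minimal illustration, not the authors' R code (in R the same analysis would typically use aov() followed by TukeyHSD()). It computes the one-way ANOVA F statistic and the Tukey-Kramer q statistic for every pair of groups; converting these into P-values requires the F and studentized range distributions, which are not reimplemented here, so the critical value q_crit must come from a table or a statistics package. The group values are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from math import sqrt
from statistics import mean


def _within_group_stats(groups: dict[str, list[float]]) -> tuple[dict[str, float], float, int]:
    """Return group means, the within-group mean square (MSW), and its degrees of freedom."""
    means = {g: mean(v) for g, v in groups.items()}
    n_total = sum(len(v) for v in groups.values())
    df_within = n_total - len(groups)
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    return means, ss_within / df_within, df_within


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    means, msw, df_within = _within_group_stats(groups)
    grand_mean = mean(x for v in groups.values() for x in v)
    df_between = len(groups) - 1
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    return (ss_between / df_between) / msw, df_between, df_within


def tukey_kramer_q(groups: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Studentized range statistic q for every pair (Tukey-Kramer form, unequal sizes allowed)."""
    means, msw, _ = _within_group_stats(groups)
    q = {}
    for a, b in combinations(groups, 2):
        se = sqrt(msw / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q


def significant_pairs(q: dict[tuple[str, str], float], q_crit: float) -> list[tuple[str, str]]:
    """Pairs whose q exceeds the critical studentized range value q_crit."""
    return [pair for pair, value in q.items() if value > q_crit]


# Hypothetical weighted variant counts per country, grouped by continent.
groups = {
    "Africa": [9.1, 8.7, 9.5, 8.9],
    "Europe": [6.2, 6.8, 6.5, 6.0],
    "Asia": [5.1, 6.9, 4.8, 5.5],
}
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
q = tukey_kramer_q(groups)
# 3.95 is approximately the tabulated 5% value for 3 groups and 9 df; verify against a table.
print(significant_pairs(q, q_crit=3.95))
```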
The Global Prediction System of COVID-19 Pandemic (GPCP) was employed to predict the development of the new coronavirus variants. The base model of this system is a modified SEIR model. The simplest and most widely used epidemic model is the SIR model (S: susceptible, I: infectious, R: recovered), but it ignores the incubation phase: each susceptible individual who becomes infected moves immediately into the infectious group and then recovers (Li et al., 1999). The classic SEIR model (S: susceptible, E: exposed, I: infectious, R: recovered) improves on this by adding a latent period, represented by class E. However, both models have limitations in real situations (Annas et al., 2020; Peng et al., 2020). To fit the real data better and predict the development of the pandemic more accurately, we chose a modified SEIR model with three additional compartments, namely P (protected cases), Q (confirmed and quarantined cases), and D (death cases) (López and Rodó, 2021). The compartments satisfy
$$S + P + E + I + Q + R + D = N,$$
where $N$ is the total population. The GPCP system takes the influence of environmental factors, such as temperature and humidity, into account; the transmission rate $\beta$ is therefore calibrated using $F(T_{2m})$ and $F(RH_{2m})$, the probability distribution functions of temperature and humidity obtained by Huang et al. (2020a, b), where $T_{2m}$ is the temperature and $RH_{2m}$ the humidity at 2 m above ground level. Here $\beta_0$ is the transmission rate when temperature and humidity are excluded, and $\beta_1$ and $\beta_2$ represent the transmission rates when the temperature and humidity factors are included, respectively.
Moreover, the weekly and seasonal cycles of COVID-19, public behavior, and government policies are parameterized in the GPCP system to make the predictions more precise (Liu et al., 2021a, b; Huang et al., 2021).
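To show how the seven compartments interact, the sketch below integrates one common generalized SEIR formulation with protected, quarantined, and death compartments, in the form used by Peng et al. (2020). It is an assumption-laden illustration, not the GPCP: all rates are hypothetical constants, β is not modulated by temperature, humidity, weekly cycles, or policy, and a simple forward-Euler scheme is used. Because every outflow from one compartment is an inflow to another, the total S + P + E + I + Q + R + D stays equal to N.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical per-day rates; the GPCP's calibrated values are not given in the text."""
    beta: float   # transmission rate (constant here; GPCP adjusts it for climate)
    alpha: float  # protection rate, S -> P
    gamma: float  # 1 / latent period, E -> I
    delta: float  # 1 / time to confirmation and quarantine, I -> Q
    lam: float    # recovery rate, Q -> R
    kappa: float  # death rate, Q -> D


def derivatives(y: dict[str, float], r: Rates) -> dict[str, float]:
    """Right-hand side of the generalized SEIR-PQD system."""
    n = sum(y.values())
    infection = r.beta * y["S"] * y["I"] / n
    return {
        "S": -infection - r.alpha * y["S"],
        "P": r.alpha * y["S"],
        "E": infection - r.gamma * y["E"],
        "I": r.gamma * y["E"] - r.delta * y["I"],
        "Q": r.delta * y["I"] - (r.lam + r.kappa) * y["Q"],
        "R": r.lam * y["Q"],
        "D": r.kappa * y["Q"],
    }


def simulate(y0: dict[str, float], r: Rates, days: int, steps_per_day: int = 10) -> list[dict[str, float]]:
    """Forward-Euler integration; returns one state per day, starting with y0."""
    dt = 1.0 / steps_per_day
    y = dict(y0)
    history = [dict(y)]
    for _ in range(days):
        for _ in range(steps_per_day):
            dy = derivatives(y, r)
            y = {k: y[k] + dt * dy[k] for k in y}
        history.append(dict(y))
    return history


# Hypothetical population and parameters, for illustration only.
y0 = {"S": 999_850.0, "P": 0.0, "E": 100.0, "I": 50.0, "Q": 0.0, "R": 0.0, "D": 0.0}
rates = Rates(beta=0.5, alpha=0.01, gamma=1 / 5, delta=1 / 3, lam=0.1, kappa=0.002)
history = simulate(y0, rates, days=180)
peak_day = max(range(len(history)), key=lambda d: history[d]["Q"])
print(f"quarantined cases peak on day {peak_day}: {history[peak_day]['Q']:.0f}")
print(f"population check: {sum(history[-1].values()):.1f}")
```

A production forecast would replace the constant beta with a time-varying function and fit the rates to observed data; the sketch only demonstrates the compartment flows.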

The development and distribution of the new coronavirus variants
At least eight regions have reported the discovery of new coronavirus variants so far, namely California (US), Kent (UK), Manaus (Brazil), New York (US), Kampala (Uganda), Maharashtra (India), Nelson Mandela Bay (South Africa), and Osun (Nigeria). As shown in Fig. 1, these regions lie on every continent except Oceania and Antarctica. The bar plots beside each location show the daily average new cases before (green) and after (red) the appearance of the new variants. A previous study showed that the optimal temperature range for spreading the coronavirus is between 5 and 15 °C (Huang et al., 2020a, b). Although cold weather and mass gatherings accelerated the spread of the virus, the large difference between the two bars still implies that the new coronavirus variants also increased the transmission rate and thus led to more infections. Additionally, the timeline of the identification of each variant shows that most of the new variants appeared in late 2020 (September to December), which suggests that as more people are infected by the virus, the chance of mutation increases as well.
The Alpha lineage, one of the variants of concern, is the most widely spread lineage because of its higher transmissibility compared with variants that are not of concern (Volz et al., 2021a, b). To show the percentage that each continent accounts for, we divided the number of countries on each continent that identified a lineage by the total number of countries that identified that lineage. Fig. 2 indicates that among all countries that identified the α lineage, European countries account for 34%, followed by Asia (26%), Africa (20%), the Americas (18%), and Oceania (2%). As the origin of α, the UK has the highest number of variant sequences, with cumulative sequences almost double those of the second-ranked country. This reflects the fast spread of α in the UK, where it became the dominant lineage within six months of its identification. After the large-scale emergence of the α lineage, another new lineage, named Beta, was detected breaking out in South Africa. Fig. 3 shows that the β lineage has spread to more than 90 countries and regions as of 8 June 2021, making it the second most widely spread lineage. It originated in Nelson Mandela Bay in South Africa but has spread to most European countries. The β lineage has fewer cases than the α lineage, and the heavily affected countries are concentrated in Europe, the United States, and South Africa. Although its number of variant sequences is smaller than that of the α lineage, the β lineage shows stronger resistance to vaccines (Abu-Raddad et al., 2021) and should receive sufficient attention before it causes irreparable losses. The bar plot in Fig. 3 indicates that the number of β variant sequences is only about 1/10 of that of α. The proportion of each continent is similar to that of α: Europe accounts for 33%, followed by Africa (27%), Asia (25%), the Americas (12%), and Oceania (3%).
Fig. 4 shows that another major lineage, Gamma, has spread to most American and European countries, whereas Africa had not reported any cases as of 8 June 2021. The γ lineage developed mutations at the same three receptor-binding domain residues as the β lineage and therefore also shows resistance to vaccines. Although the γ lineage originated in Brazil, it has affected the US most: the number of variant sequences in the US is more than twice that in Brazil. Fewer countries are affected by the γ lineage than by α and β, with 54 countries and regions affected as of 8 June 2021. As shown by the bar plot, apart from the US and Brazil, the remaining countries have small numbers of variant sequences compared with α and β. Countries near Brazil, such as Argentina and Chile, were affected more severely than other South American countries. The pie chart shows that Europe and the Americas account for 74% of the global confirmed cases, while Asia accounts for 22% and Oceania only 4%.
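The continent percentages in the pie charts follow the rule stated above: the number of countries on a continent that identified a lineage divided by the total number of countries that identified it. A minimal sketch of that calculation, using a hypothetical country-to-continent mapping rather than the study's data:

```python
from __future__ import annotations

from collections import Counter


def continent_shares(reporting: dict[str, str]) -> dict[str, float]:
    """Percentage of reporting countries located on each continent.

    `reporting` maps each country that identified the lineage to its continent.
    """
    counts = Counter(reporting.values())
    total = sum(counts.values())
    return {continent: 100.0 * n / total for continent, n in counts.items()}


# Hypothetical list of countries reporting a lineage.
reporting = {
    "United Kingdom": "Europe",
    "France": "Europe",
    "Japan": "Asia",
    "Kenya": "Africa",
}
print(continent_shares(reporting))  # {'Europe': 50.0, 'Asia': 25.0, 'Africa': 25.0}
```

Because these are shares of countries rather than of cases or sequences, a continent with many reporting countries can have a large share even if its sequence counts are low.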
The Eta lineage, also known as B.1.525, was first identified in the UK and quickly became the dominant lineage in Nigeria. It carries the same E484K mutation as the β lineage, so the η lineage exhibits similar resistance to vaccines (Zhou and Wang, 2021). Fig. 5 shows that the η lineage has spread to 60 countries and regions, with Europe and North America still the hot spot regions. The bar plot shows that the United States has the highest number of variant sequences. European countries still occupy most places in the top-10: seven European countries rank among the top-10 affected countries, together with one African country and one Asian country. The pie chart shows that Europe, Africa, and Asia are the most affected continents, contributing 87% of the confirmed cases in total, while the Americas and Oceania account for only 11% and 2%, respectively. Similar to the α lineage, the spatial distribution of the η lineage indicates that countries located near the UK are more heavily affected than other countries.
Fig. 6 shows the distribution of the Delta lineage. Although it was first identified in November 2020, it did not cause much concern until early 2021. The δ lineage originated in India and spreads fast. It is estimated to be about 60% more transmissible than the Alpha lineage (Callaway, 2021) and can reduce vaccine-induced neutralization (Lustig et al., 2021). As one of the non-B.1.1.7 variants, the Delta lineage has immune escape ability (Mishra et al., 2021) and has thus replaced the α lineage as the dominant lineage in the UK (Wall et al., 2021). It has spread to 62 countries and regions as of 8 June 2021. Europe and Asia are the two continents most affected by the δ lineage, accounting for 74% of the countries that identified it, while Africa accounts for 11%, and the Americas and Oceania account for 10% and 5%, respectively.
The bar plot shows that the UK has the largest number of variant sequences, more than six times that of India. Unlike the previous lineages, the δ lineage has hit Asian countries heavily, with three Asian countries ranking in the top-10. Nevertheless, Europe is still the continent most affected by the δ lineage. The fast spread of the δ lineage and its ability to reduce the effectiveness of vaccines have caused great concern globally, and it might trigger a third wave of the epidemic.
The overall distribution of the five lineages is shown in Fig. 7. Europe is the continent most affected by the variants. The UK has the largest number of variant sequences among all countries, since two lineages originated there, followed by the US. Europe, Africa, and Asia account for similar proportions of the countries that identified new coronavirus variants. With few exceptions, the new coronavirus variants have spread rapidly to almost every country in the world, and they are therefore playing an important role in triggering a third wave of the pandemic.
Fig. 8 shows the number of flights from the UK, South Africa, India, and Brazil to other countries; we summarized the top-10 destination countries for travelers from each of these four countries. The bar plots show that, because of travel restrictions, domestic travel became the first choice for many people. For outbound travel, the most popular destinations for travelers from the UK are almost all European countries, with the US the only non-European country. European countries are also popular destinations for South African and Brazilian travelers, whereas most of the popular destinations for Indian travelers are Asian countries. This finding explains why Europe is the hot spot region for the Alpha, Beta, Gamma, and Eta variants, and why Asia is heavily affected by the Delta lineage. Human mobility increases the chance of infection, and with the higher transmissibility of the new coronavirus variants, human-to-human transmission will become faster and easier than before, leading to larger outbreaks. Governments should therefore implement stricter restrictions, such as closing public transport and restricting internal movements, to contain the spread of the epidemic.
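The top-10 destination summary behind Fig. 8 amounts to counting flights per destination for each origin country. The sketch assumes the flight records have already been reduced to (origin country, destination country) pairs; the schema of the raw Zenodo dataset (Strohmeier et al., 2021) is not described in the text, so mapping its fields to countries is left out. The records are hypothetical, and domestic flights appear as pairs whose origin and destination are the same country.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def top_destinations(
    flights: Iterable[tuple[str, str]], origin: str, k: int = 10
) -> list[tuple[str, int]]:
    """Return the k most frequent destination countries for flights leaving `origin`.

    Domestic flights (destination == origin) are kept, as in the bar plots.
    """
    counts = Counter(dest for orig, dest in flights if orig == origin)
    return counts.most_common(k)


# Hypothetical (origin country, destination country) records.
flights = [
    ("United Kingdom", "United Kingdom"),
    ("United Kingdom", "United Kingdom"),
    ("United Kingdom", "United Kingdom"),
    ("United Kingdom", "Spain"),
    ("United Kingdom", "Spain"),
    ("United Kingdom", "United States"),
    ("India", "United Arab Emirates"),
]
print(top_destinations(flights, "United Kingdom", k=2))
# [('United Kingdom', 3), ('Spain', 2)]
```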

Statistical analysis on the new coronavirus variants
We collected data on vaccination policy, restrictions on internal movements, restrictions on gatherings, and close public transport in each country, and used ANOVA and the Tukey-HSD test to compare, continent by continent, the effectiveness of each factor on the diffusion of new variant cases. The ANOVA results presented in Tables S1 to S5 show that vaccination policy significantly affects the diffusion of new variant cases on most continents, such as Europe, Asia, and Oceania. The Tukey-HSD results further show that vaccination policy becomes significantly effective once universal vaccination is achieved. However, universal vaccination remains impractical for most countries, so non-pharmaceutical interventions become the priority. The results in Tables S1 to S5 show that restrictions on internal movements (P-values: Europe: 0.000434, Asia: 0.021903, Africa: 0.000126), restrictions on gatherings (P-values: Asia: <2e-16, Africa: 1.42e-10), and close public transport (P-value: Americas: 0.00318) contain the spread of new variant cases significantly. For populous continents, such as Europe, Asia, Africa, and the Americas, restrictions on human mobility and gatherings are useful for containing the virus. For Oceania, which has a small population, vaccination is a more effective way to contain the virus. Non-pharmaceutical interventions should therefore be promoted and implemented before universal vaccination is achieved.
Fig. 9 shows the distribution of the new coronavirus variants on different continents; weighted data were used to make the results less biased. The violin plot shows that Africa has the highest median value and that its distribution is left skewed, which indicates that most African countries tend to have larger numbers of variant sequences.
The median values of Asia and the Americas are close to each other, but the density plots of these two continents show different trends. The density plot of the Americas has its higher probability on the right side, which means that the Americas have higher numbers of variant sequences. The violin plot of Asia, by contrast, shows large variability in the number of variant sequences, but its density plot and median value indicate that most of the data are concentrated at the lower end. For Europe, the distribution is more symmetric than for the other continents, and most European countries have similar numbers of variant sequences. Oceania, which has the third-highest median value, has two peaks, one on the left side and one on the right side. This is because Oceania has few countries: the higher variant sequence counts are mainly observed in Australia and New Zealand, while the remaining countries have small numbers of variant sequences.
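The skewness read from the violin plots can be cross-checked numerically. The sketch below computes the Fisher-Pearson moment coefficient g1; a negative value indicates a left-skewed distribution (a longer tail toward small values, with most countries at the high end), which is how the African distribution is described above. The text does not report skewness coefficients, and the sample values here are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def skewness(xs: list[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2 ** 1.5."""
    if len(xs) < 2:
        raise ValueError("need at least two values")
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical weighted variant counts for one continent.
values = [1.0, 8.0, 9.0, 10.0]
print(f"g1 = {skewness(values):.2f}")  # negative: left skewed
```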
In addition, the ANOVA results show that continent is a significant contributor to the spread of the new coronavirus variants (P-value < 2e-16). The null hypothesis is that the group means are equal, $H_0: \mu_{\text{Europe}} = \mu_{\text{Asia}} = \mu_{\text{Africa}} = \mu_{\text{Americas}} = \mu_{\text{Oceania}}$, and the alternative hypothesis $H_1$ is that at least one group mean differs from the others. To better understand how location affects the diffusion of new coronavirus variants, we conducted Tukey's Honestly Significant Difference (HSD) test to compare the mean differences between each pair of groups. The detailed results are shown in Table S6, from which we conclude that a country in Africa is highly likely to be affected by the new coronavirus variants more severely than countries on other continents (P-values: Americas-Africa: 0.0000000, Asia-Africa: 0.0000000, Europe-Africa: 0.0000001, Oceania-Africa: 0.0000000).
Fig. 10 shows the predicted diffusion of the Delta lineage in the US from September 2021 to December 2022. According to the US CDC, the Delta lineage was estimated to account for more than 90% of infected cases in the US ("COVID Data Tracker Weekly Review", 2021), so it is worthwhile to predict its diffusion over the next year. The prediction in Fig. 10 demonstrates that the development of the Delta lineage exhibits seasonal oscillation, with outbreaks in autumn and spring. Based on our prediction, the first wave will occur in October 2021, peaking at 323,360 cases. A small resurgence will occur in April 2022, with cases reaching 184,196. With the arrival of cold weather in late October 2022, another wave will come, peaking at 232,622 cases, and the pandemic will not be contained within the next year.
Researchers found that the Delta lineage results in a higher hospital admission rate and increases patients' chance of developing severe disease compared with the Alpha lineage (Twohig et al., 2021). Although vaccine-induced neutralization was found to be reduced against the Delta lineage, researchers estimated the efficacy of the Pfizer and AstraZeneca vaccines against the Delta variant at 83% and 61%, respectively, 14 days after the second dose, decreasing to 35.6% and 30.0%, respectively, 21 days after a first dose alone (Nowroozi and Rezaei, 2021). Vaccination is therefore important for containing the spread of the Delta lineage.
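A classic back-of-the-envelope result shows why the difference between one and two doses matters. Under homogeneous mixing and an all-or-nothing vaccine, transmission is interrupted when the vaccinated fraction exceeds $V_c = (1 - 1/R_0)/E$, where $R_0$ is the basic reproduction number and $E$ the vaccine efficacy. This formula is not part of the study's analysis, and the $R_0$ value in the example is hypothetical; only the efficacies are the figures quoted above.

```python
def critical_coverage(r0: float, efficacy: float) -> float:
    """Vaccinated fraction needed to push the effective reproduction number below 1.

    Assumes homogeneous mixing and an all-or-nothing vaccine; a result above 1
    means the threshold cannot be reached by vaccination alone.
    """
    if not 0.0 < efficacy <= 1.0:
        raise ValueError("efficacy must be in (0, 1]")
    if r0 <= 1.0:
        return 0.0
    return (1.0 - 1.0 / r0) / efficacy


r0 = 2.5  # hypothetical basic reproduction number, for illustration only
for label, eff in [("Pfizer, 2 doses", 0.83), ("AstraZeneca, 2 doses", 0.61),
                   ("Pfizer, 1 dose", 0.356), ("AstraZeneca, 1 dose", 0.300)]:
    v = critical_coverage(r0, eff)
    status = "reachable" if v <= 1.0 else "not reachable by vaccination alone"
    print(f"{label}: required coverage {v:.2f} ({status})")
```

Under these assumptions both two-dose figures keep the threshold below full coverage while the single-dose figures do not; a larger $R_0$ raises every threshold, so the conclusion is sensitive to that assumed value.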

Discussion and conclusion
As one of the most severe public health crises, COVID-19 has shown its destructive power to the world. Vaccines have played a vital role in defeating infectious diseases such as smallpox and polio (Harrison and Wu, 2020). Increasing public acceptance of vaccination is a big challenge for many countries. Herd immunity is hard to achieve without vaccines, because reaching it through infection alone would overwhelm the healthcare system (Brett and Rohani, 2020). Governments should therefore make considerable efforts to convince people that vaccination plays the key role in achieving herd immunity (Verger and Peretti-Watel, 2021; Frederiksen et al., 2020). Besides vaccination, non-pharmaceutical interventions (NPIs) remain effective in containing the diffusion of the virus. Limiting human mobility through measures such as lockdowns, canceling public events, closing schools, and restricting gatherings can reduce infections effectively (Flaxman et al., 2020; Askitas et al., 2021). Although these NPIs can curb the pandemic, they come with negative social impacts: longer lockdowns did not decrease the mortality rate and also harmed economic growth (Coccia, 2021f). To cope with the next potential pandemic, governments should establish mature crisis management, including containment and mitigation strategies (Coccia, 2021d), and research found that smaller population size and mature governance and health systems can be considered useful indices for designing pandemic preparedness (Coccia, 2021e). In addition, increasing health expenditure and forming environmental strategies, including improving air quality and urban ventilation, are also important for reducing prevalence and fatality rates (Coccia, 2020a, 2021b).
The statistical models and prediction results in this study demonstrate that the new coronavirus variants, especially the Delta lineage, are spreading fast, with Africa and Europe the most affected continents, and few countries can escape the invasion of new coronavirus variants. NPIs are considered effective methods for mitigating the pandemic; Lian et al. (2021) pointed out that an effective lockdown policy can reduce the number of confirmed cases within approximately 15 days. However, the negative impacts of NPIs on society urge governments to accelerate vaccination campaigns, and governments should therefore make the public aware of the importance of vaccination.
The conclusions of this study are general and broad. Some factors, such as demographic data (gender, income, age, etc.), are not included in the model. The continent scale is relatively coarse, and the analysis could be narrowed to the country scale or to specific scenarios (indoor and outdoor). Nevertheless, our study provides insights into the spread and distribution of new coronavirus variants. The recent COVID-19 outbreak in February 2021 implied that new coronavirus variants were its main cause. This study provides suggestions for containing the spread of the virus and can alert the public to the next possible outbreak, so that people and governments can prepare and reduce the loss of life.

Credit author statement
Jianping Huang designed the study and contributed to the ideas, interpretation and manuscript writing. Yingjie Zhao contributed to software, data collection, data analysis, figure plotting and manuscript writing. Li Zhang contributed to data collection and software. Siyu Chen contributed to manuscript writing and reviewing. Jinfeng Gao contributed to software. Hui Jiao contributed to figure plotting. All of the authors contributed to the discussion and interpretation of the manuscript. All of the authors reviewed the manuscript.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.