How the global health security index and environment factor influence the spread of COVID-19: A country level analysis

The progress of viral diseases such as the new coronavirus disease (COVID-19) can be influenced not only by social isolation policies, but also by climatic factors. Understanding how these factors affect the progress of the pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) may be essential to know the risks each country is facing because of the disease. In this study, we verified the existence of a relationship between the basic reproduction number (R0) of SARS-CoV-2 and different climate variables, while also considering the Global Health Security Index (GHS). We collected data on confirmed cases of COVID-19 along with the respective GHS scores and climate data, from December 31, 2019 to April 13, 2020, for 52 countries. The generalized additive model (GAM) was applied to explore the effect of temperature, relative humidity, solar radiation index, and GHS score on the spread rate of COVID-19. The countries that showed similarity to each other were grouped into clusters using the Kohonen self-organizing map methodology to investigate the importance of each variable in the dissemination of the disease. The temperature variable presented a linear relationship (p < 0.001) with the R0, with an explained variation of 36.2%, while the relative humidity variable did not present a significant relationship with the R0. The response curve of the solar radiation variable presented a significant nonlinear relationship (p < 0.001) with an explained variation of 32.3%. The GHS index variable, with a significant nonlinear relationship (p < 0.001), presented the largest explanatory response in the control of COVID-19, with an explained variation of 38.4%; further, it was observed that the countries with the largest GHS index scores were less influenced by climate variables.


Introduction
Coronaviruses (CoV) are a large family of viruses that affect humans, targeting mainly their respiratory systems. These viruses stand out for having the largest genomes among all RNA viruses, with approximately 30,000 nucleotides, and primary transmission based on zoonotic transfer [17]. Among the coronaviruses that can infect humans, two had been previously identified as causing severe diseases: Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). In late December 2019 a group of patients in China with an unknown initial diagnosis presented symptoms similar to SARS, leading researchers to identify a new coronavirus (SARS-CoV-2); the disease it causes was later named COVID-19 by the World Health Organization (WHO) [33]. The new virus has a high transmissibility rate and its propagation occurs mainly by direct contact or by droplets generated when infected individuals cough or sneeze [3]. The rate of transmission can be measured by the basic reproduction number (R0), which indicates the average number of infections generated by one infected individual in a fully susceptible population. When the R0 exceeds one, the number of infected individuals multiplies rapidly and can result in an epidemic [7]. The R0 of a highly contagious disease, such as measles, ranges from 13 to 18, while that of the common flu is 1.3; the estimate for COVID-19 is between 2 and 3 [27].
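The threshold behaviour of R0 around one can be illustrated with a minimal sketch. It assumes the simplest possible setting, with discrete generations and a fully susceptible population throughout; the function name and the example values are hypothetical and are not taken from the study.

```python
def expected_cases_by_generation(r0, generations, initial_cases=1.0):
    """Expected new infections in each generation, assuming every case
    infects r0 others on average and the population stays fully susceptible."""
    return [initial_cases * r0 ** g for g in range(generations + 1)]


# Hypothetical values: one above and one below the threshold of one
growing = expected_cases_by_generation(2.0, 3)
fading = expected_cases_by_generation(0.5, 3)
print(growing, fading)
```

With R0 above one the expected number of cases grows with each generation, while with R0 below one it shrinks toward zero.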
The basic reproduction number has been used to guide government decisions aimed at containing the disease outbreak, and the best option to curb its advance has been to adopt social isolation measures since a vaccine or drug that demonstrates total efficacy is not yet available [32].
The manner in which the disease has progressed in each country reveals the discrepancy in health system capacity, political security, and socioeconomic risk of each country; further, it highlights that efficient measures are required to mitigate the pandemic. These discrepancies were measured through the Global Health Security Index (GHS) that indicates a country's capacity to detect and respond to an emerging outbreak. It estimates a global average for 195 countries, by considering the following factors: capacity for early detection and reporting of epidemics; a prepared and robust health sector to treat patients and protect health workers; commitment to improve country capacity, and funding plans to adhere to global standards [19].
Although coronaviruses have been the subject of scientific studies since 1968, this new strain of the virus presents some characteristics that have not yet been well determined, such as its survival rates in outdoor and indoor environments. It is known that viruses do not replicate outside the living cells of their hosts [3]. However, infectious viruses such as SARS-CoV can survive on contaminated environmental surfaces [9]. Studies conducted during the SARS epidemic showed that days with lower temperature, relative air humidity, and solar radiation increased the number of cases of the disease by 18 times when compared to opposite climatic conditions [15]. In the COVID-19 pandemic it was observed that in March 2020, most of the confirmed cases were registered in countries with low temperatures and humidity [31]. Although the behavior of COVID-19 in different climatic conditions is not well known thus far, understanding the stability of the virus in these various environments is of utmost importance to understand the transmission of this new infectious agent.
Thus, the aim of this study was to verify the relationship between the dissemination rate of COVID-19 and climate information, such as environmental temperature, relative humidity, and solar radiation, based on the compilation of data obtained in several countries, whilst also considering the Global Health Security Index (GHS) of the selected countries.

Epidemiological data
We collected publicly available data on 1,033,796 confirmed COVID-19 cases from 52 countries having more than 100 confirmed cases of the disease (Appendix A), in different climate regions (Fig. 1), for the period from December 31, 2019 to April 13, 2020. The data were extracted from online reports on the European Centre for Disease Prevention and Control (ECDC) website [6]. (See Table A.)

Climate data
Data were collected on temperature (°C), relative humidity (%), and solar radiation (kWh/m²) during the months of December 2019 to April 2020 (Appendix A). The data used to calculate the daily averages for the months studied were extracted from the Weather Spark online web service [25].

Global Health Security Index (GHS)
In order to minimize the differences in effects between developed and underdeveloped countries, including those arising from their significant cultural differences, the scores used in this study were based on the Global Health Security Index (GHS) (Appendix A). The GHS index is a project of the Nuclear Threat Initiative (NTI) and the Johns Hopkins Center for Health Security (JHU), and it was developed along with the Economist Intelligence Unit (EIU). The GHS indicates a country's ability to detect and respond to an emerging outbreak. The indicators and sub-indicators used for developing the scores can be found online on the GHS Index website [19]. The way in which this index is calculated may introduce a bias into these analyses; however, according to [5], the GHS index uses the arithmetic mean of the data for the six categories under analysis, so the weights are distributed equally among them. When each of these categories is analyzed separately using different types of statistics, the categories covering rapid response to a pandemic and risk detection, which are the factors evaluated in this article, show the least impact under both statistical tests.

Estimation of the basic reproduction number (R0)
To quantify the transmission of COVID-19, we used serial intervals following a log-normal distribution. The mean and standard deviation of the serial interval were 8.4 and 3.8 days, respectively. Because estimating R0 is difficult while the pandemic is still in progress, the calculations for each country started once ten cases had been reported, in order to obtain results closer to the actual course of the disease.
All analyses were done with the R software (version 3.6.0) using the "R0" package developed by [1].
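The estimation step can be sketched in a few lines of standard-library Python. This is an illustration only and not the authors' R code: the text does not state which estimator of the "R0" package was used, so the sketch assumes the exponential-growth approach, in which R is obtained from the epidemic growth rate r and the discretized serial interval distribution w as R = 1 / Σ w_d exp(-r d). All function names and the synthetic case series are hypothetical; only the serial interval mean (8.4 days), standard deviation (3.8 days), and the ten-case starting threshold come from the text.

```python
import math


def lognormal_params(mean, sd):
    """Convert the mean and SD of a log-normal variable to (mu, sigma) on the log scale."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_cdf(t, mu, sigma):
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def discretized_serial_interval(mean=8.4, sd=3.8, max_days=60):
    """Daily probabilities for serial intervals of 1..max_days days, renormalized to sum to one."""
    mu, sigma = lognormal_params(mean, sd)
    w = [lognormal_cdf(d + 0.5, mu, sigma) - lognormal_cdf(d - 0.5, mu, sigma)
         for d in range(1, max_days + 1)]
    total = sum(w)
    return [v / total for v in w]


def trim_to_threshold(daily_cases, threshold=10):
    """Drop leading days until the cumulative case count reaches the threshold."""
    cumulative = 0
    for start, cases in enumerate(daily_cases):
        cumulative += cases
        if cumulative >= threshold:
            return daily_cases[start:]
    return []


def growth_rate(daily_cases):
    """Least-squares slope of log(daily cases) against day, skipping zero-count days."""
    pts = [(t, math.log(c)) for t, c in enumerate(daily_cases) if c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return sxy / sxx


def r0_from_growth_rate(r, w):
    """Exponential-growth estimate (an assumption): R = 1 / sum_d w_d * exp(-r * d)."""
    return 1.0 / sum(p * math.exp(-r * d) for d, p in enumerate(w, start=1))


# Hypothetical case series with a constant exponential growth rate of 0.1 per day
synthetic = [2.0, 3.0] + [10.0 * math.exp(0.1 * t) for t in range(20)]
series = trim_to_threshold(synthetic)
w = discretized_serial_interval()
r = growth_rate(series)
r0 = r0_from_growth_rate(r, w)
print(round(r, 3), round(r0, 2))
```

A growth rate of zero gives R = 1 under this formula, which matches the epidemic threshold; other estimators offered by the package (for example maximum likelihood) would give somewhat different values for the same data.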

Generalized additive model (GAM)
When working with nonlinear relationships, as is the case with climatic and epidemiological factors, the model must be chosen carefully so that it represents the association well. We used the generalized additive model (GAM), a semiparametric extension of the generalized linear model (GLM) that has been successfully employed to study nonlinear relationships [14,16,20,22,30]. The general model of GAM as given by Hastie and Tibshirani [10] is as follows:

$$g(\mu_i) = \beta_0 + \sum_{j=1}^{n} s_j(x_{ij}) + \varepsilon_i$$

where $g(\mu_i)$ denotes the values of the basic reproduction number; $\beta_0$ is the overall average response; $s_j$ is the smoothing function of the covariates $x_{ij}$; $n$ is the total number of covariates; and $\varepsilon_i$ is the i-th residual, with $\mathrm{var}(\varepsilon_i) = \sigma^2$, assumed to be normally distributed. The choice of the smoothing parameters was made through the restricted maximum likelihood (REML) method. The predictor variables analyzed in this study were the GHS index, temperature (temp), solar radiation index (SR), and relative humidity (RH). In our study, the model was developed as the following:

$$R_{0,i} = \beta_0 + s(\mathrm{GHS}_i) + s(\mathrm{temp}_i) + s(\mathrm{SR}_i) + s(\mathrm{RH}_i) + \varepsilon_i$$

This GAM model was implemented through the "mgcv" package [29] in the R environment (version 3.6.0). The statistical tests were two-sided.
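The additive structure of the model, an intercept plus one smooth function per covariate, can be illustrated with a minimal sketch. It does not reproduce the "mgcv" fit: mgcv uses penalized regression splines with REML smoothing-parameter selection, whereas the sketch below swaps in classical backfitting with a nearest-neighbour running-mean smoother, chosen only because it fits in a few lines of standard-library Python. The data are synthetic, and all names, coefficients, and the window size `k` are assumptions.

```python
import math
import random


def running_mean_smoother(x, resid, k=9):
    """Smooth partial residuals against covariate x with a k-nearest-neighbour mean."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = k // 2
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        window = order[max(0, pos - half): pos + half + 1]
        fitted[i] = sum(resid[j] for j in window) / len(window)
    return fitted


def backfit(columns, y, n_iter=20, k=9):
    """Fit y = beta0 + sum_j s_j(x_j) + error by cycling over the covariates."""
    n = len(y)
    beta0 = sum(y) / n
    smooths = [[0.0] * n for _ in columns]
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            # Partial residual: remove the intercept and every other smooth term
            partial = [y[i] - beta0 - sum(smooths[m][i]
                                          for m in range(len(columns)) if m != j)
                       for i in range(n)]
            s = running_mean_smoother(x, partial, k)
            mean_s = sum(s) / n
            smooths[j] = [v - mean_s for v in s]  # centre each term for identifiability
    fitted = [beta0 + sum(s[i] for s in smooths) for i in range(n)]
    return beta0, smooths, fitted


def explained_variation(y, fitted):
    """Proportion of the variance of y accounted for by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


# Synthetic data for 52 hypothetical countries (not the study data)
rng = random.Random(42)
temp = [rng.uniform(-5.0, 30.0) for _ in range(52)]
ghs = [rng.uniform(30.0, 85.0) for _ in range(52)]
y = [2.5 - 0.02 * t + 0.5 * math.sin(g / 10.0) + rng.gauss(0.0, 0.1)
     for t, g in zip(temp, ghs)]

beta0, smooths, fitted = backfit([temp, ghs], y)
ev = explained_variation(y, fitted)
print(round(beta0, 3), round(ev, 3))
```

For a Gaussian response the proportion of variance explained coincides with the deviance explained, which is the kind of quantity reported as "explained variation" in the results.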

Self-organizing maps
We grouped the countries that displayed similar predictor variables into clusters using Kohonen's methodology of self-organizing maps [12], in order to investigate how important each variable was in the dissemination of the disease. This methodology is based on a two-layer network that is able to organize the analyzed data from a random start. The results show the natural relationships that exist between the patterns provided to the network, combining an input layer with a competitive layer trained by an unsupervised learning algorithm [11]. The data were standardized to prevent variables with wider ranges from dominating the map organization.
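The two steps described above, standardization followed by competitive unsupervised training, can be sketched in standard-library Python. This is not the authors' implementation: the Gaussian neighbourhood function, the linear decay of the learning rate and radius, the one-sample-per-iteration update, and all names and data are assumptions made for illustration. The grid dimensions and iteration count are arbitrary defaults of the sketch.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that no variable dominates the distance computation."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(dim)]
    return [[(r[j] - means[j]) / sds[j] for j in range(dim)] for r in rows]


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (competitive step)."""
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, n_cols=5, n_rows=7, n_iter=1000, seed=0):
    """Train a rectangular SOM from a random start, one random sample per iteration."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(c, r) for r in range(n_rows) for c in range(n_cols)]
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in grid]
    radius0 = max(n_cols, n_rows) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac) + 0.01          # learning rate decays over time
        radius = radius0 * (1.0 - frac) + 0.5   # neighbourhood shrinks over time
        x = rng.choice(data)
        bc, br = grid[best_matching_unit(weights, x)]
        for u, (c, r) in enumerate(grid):
            d2 = (c - bc) ** 2 + (r - br) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood (assumed)
            weights[u] = [w + lr * h * (v - w) for w, v in zip(weights[u], x)]
    return weights


# Hypothetical rows of [R0, GHS score, temperature, solar radiation] for 52 countries
rng = random.Random(1)
rows = [[rng.uniform(1.1, 3.5), rng.uniform(30.0, 85.0),
         rng.uniform(-5.0, 30.0), rng.uniform(1.0, 7.0)] for _ in range(52)]
z = standardize(rows)
weights = train_som(z)
assignment = [best_matching_unit(weights, x) for x in z]
print(sorted(set(assignment)))
```

Each country is mapped to its best-matching neuron; grouping neighbouring neurons into a small number of clusters is a separate step that is not shown here.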
All analyses were performed using the R software (version 3.6.0). The "kohonen" package [26] was used to construct the networks.

Fig. 1 shows the estimates of R0 values of the 52 countries studied. The probability estimates of R0 ranged from 1.1 to 3.47, with the lowest estimate being for Thailand and the highest for Russia. These values may vary according to the estimation method used as well as the population and containment structure of each country. Because it is a disease with a short progression time and there is insufficient data on this aspect, the general estimates regarding COVID-19 are still considered biased. However, it is expected that these estimation errors will be minimized as more data are amassed. Thus, considering these conditions, the results of this study corroborate the initial WHO estimate that the R0 of SARS-CoV-2 is between 2 and 3 [28].

Generalized additive model (GAM)
The response relationship curves of climatic factors and GHS index with R0, including the visual diagnosis of the residual, are found in Fig. 2. The GHS and Solar Radiation variables presented nonlinear relationships (p < 0.001) and explained variations of 38.4% and 32.3%, respectively. The temperature variable presented a decreasing linear relationship (p < 0.001) and an explained variation of 36.2%. The relative humidity variable did not present any significant relationship with R0 (p > 0.05).
When analyzing the behavior of the curve presented by the temperature variable (Fig. 2b), it was found that each increase of one unit led to a decrease of 0.001 in the value of R0. The other climate variable that presented a significant association was Solar Radiation (Fig. 2a), showing a curved pattern similar to the GHS index (Fig. 2c). When analyzing the behavior of Solar Radiation, we observed an increase of 0.152 R0 from its point of origin, reaching its critical point of 3.35 R0, with a subsequent gradual decrease of 0.099 R0, until it reached the [...]

Considering the conditional density of the residual represented in Fig. 2, homoscedasticity is verified for the variables temperature and solar radiation, which present a constant variance of the residuals. The same cannot be observed in the GHS index variable, which shows a higher variability of the residual at lower values of the index, confirming the presence of heteroscedasticity. It was not possible to obtain a constant distribution of the residual even through the use of smoothing functions. Thus, lower GHS values present lower confidence bands, even though the Confidence Interval (CI) is centered on the regression function.

Self-organizing maps
The optimal network topology was found using five columns and seven rows (5 × 7), totaling 35 neurons. One thousand training iterations were performed in order to select the best network topology, culminating in the formation of five clusters. Fig. 3 shows the list of 52 countries studied and the cluster each one was grouped into. The averages obtained in each cluster for the variables R0, GHS index, temperature (Temp), and solar radiation index (SR) are shown in Table 1.

Discussion
In this study, we sought to understand whether climatic factors and the GHS index are associated with the increase or decrease in the transmissibility of the virus. When analyzing the temperature variable, it was observed that its increase led to a decrease in R0. The results corroborate previous studies conducted during the SARS epidemic, in which [4] demonstrated that high temperature synergistically inactivated the viability of SARS, while low temperatures increased the survival rate of the virus in contaminated environments. Low temperatures also impact the immunological health of hosts, increasing the chance of lung infections and therefore causing hosts to become more susceptible to viral diseases [21] such as COVID-19. Recently, similar studies have been carried out with COVID-19, which also show negative associations between the number of confirmed cases and climate temperatures [18,24].
Another climatic factor evaluated in this study, solar radiation, presented a negative relationship with the transmissibility of COVID-19. The ultraviolet-B (UV-B) wavelength range has a fundamental effect on the human body, specifically in the synthesis of vitamin D [2]. This, in turn, is strongly related to immunity. In a controlled experiment, Cannel et al. observed that patients who received higher doses of vitamin D were three times less likely to develop flu symptoms. Thus, individuals with lower vitamin D levels have a significantly greater chance of being infected by respiratory diseases [23]. The level of vitamin D manufactured by the human body is dependent on the duration of the skin's exposure to the sun's UV-B rays [2]. Studies conducted with the H1N1 virus showed that countries with higher rates of solar radiation had lower numbers of influenza-infected people, while countries with lower rates of solar radiation had higher numbers of patients with the disease [8].
However, the variable that presented the greatest explanation for the corresponding R0 response was the GHS index. Countries in which this index was higher, i.e., with greater capacity to detect and respond to emerging disease outbreaks, such as those in Clusters 3 and 4, presented similar R0 values. The exception was Cluster 2, which presented a low R0 value and a low GHS index. The countries that comprised this cluster also had the smallest confidence bands due to the greater variability of the residuals, which can be explained by under-reporting resulting from the different capabilities for COVID-19 detection and notification [13].
Discrepancies in the results for temperature among countries in Clusters 3 and 4 (with a higher GHS index) indicate that climatic difference was not the determining factor for the change in R0 in these cases. It is noted that in countries with lower temperatures (Clusters 4 and 5), prone to greater spread of the virus, the discordant values of R0 are mainly due to GHS scores. In countries with higher temperatures (Clusters 1 and 3), differences between R0 values were lower when compared to colder countries, but were also influenced by GHS indices.
Finally, it can be observed that the R0 index was low in countries with an advanced health system capacity, lower socioeconomic risk, and greater political security, demonstrating that they possess more capacity to control the outbreak regardless of the climate variables. This differs from countries with lower GHS scores, in which the climate has a greater influence on transmissibility (Clusters 1 and 5).
Several limitations of the present study exist. First, given the novelty of COVID-19, it is possible that the onset of the disease and other event data have been treated differently between countries. Second, our data includes a number of different observations from confirmed cases between countries, which may cause a bias in the R0 value. Finally, this study was made possible only through the open sharing of data from confirmed cases in the countries under study, and there may be bias due to the notification and disclosure being the responsibility of each country. Continuous communication of dates and other details related to exposure and infection is crucial to promote scientific understanding of the virus, the infections it causes and measures that can be used to contain and mitigate the spread of the epidemic.

Fig. 2. Response curves for the effects of GHS, Temperature, and Solar Radiation. The x-axis contains the explanatory variables and the y-axis represents the smoothing functions for the R0 values. The relationship between the variables is indicated by the solid lines, and the dotted lines represent the 95% confidence interval (CI). p(y|x) is the probability of the conditional density of the residual.

Conclusions
The results indicate the existence of a negative relationship between the incidence of COVID-19 and local temperature. A higher solar radiation index was also associated with a lower degree of dissemination of the virus. The variable with the highest explanatory response in the control of COVID-19 was the GHS index, as countries with the lowest values of this indicator also demonstrated greater influences of climate variables on the transmissibility of SARS-CoV-2.

Declaration of Competing Interest
None.

Fig. 3. Map representing the 52 countries studied, grouped into clusters using the Kohonen methodology. The colors represent the five clusters formed.

Table 1
Averages of the variables basic reproduction number (R0), GHS index (GHS index), temperature (Temp), and solar radiation index (SR) of the clusters formed using the Kohonen methodology.