Global risks of infectious disease outbreaks and its relation to climate

Infectious disease outbreaks are recurring events that can cause a large number of fatalities every year. Such outbreaks occur infrequently, and the role of global warming and modes of climate variability in those outbreaks is not clear. Here we use an extreme value statistics approach to examine annual, spatially aggregated infectious disease fatality data and to compute their probability of occurrence using generalized Pareto distribution (GPD) models. The GPD provides a good model for the fatality data and reveals that the number of fatalities follows a power law. We find that the magnitude of Covid-19 is of an expected level given previous fatality data over the period 1900–2020. We also examine whether including co-variates in the GPD models provides better model fits. We find evidence that a pure linear trend is the best co-variate and that the propensity of large outbreaks has thus increased for most continents and world-wide. This suggests that mainly non-climate factors affect the likelihood of outbreaks. The linear trend function provides a crude representation of socio-economic trends such as improved public health. However, for South America the Atlantic multidecadal oscillation is the best co-variate and modulates the outbreak propensity, showing that climate can play some role in infectious disease outbreaks in some regions.


Introduction
Infectious disease outbreaks occur occasionally and are caused by micro-organisms such as viruses, bacteria, fungi and parasites. Many infectious diseases are easily cured nowadays, but as the current Covid-19 pandemic shows, major outbreaks still occur with a major human toll, threatening to overwhelm our health care systems and economies (Watts et al 2020). Climate variability and anthropogenic global warming are thought to also affect infectious disease outbreaks. Watts et al (2020) show that as a result of ongoing climate change, such as changes in global and local temperature, precipitation and relative humidity, the likelihood of transmission of infectious diseases such as dengue fever, malaria, and pathogenic Vibrio bacteria has increased significantly. Senior (2008) also argues that anthropogenic climate change may increase the number of disease outbreaks and their consequences, especially in poorer countries.
However, climate and weather related factors are not the only influences on the number of infectious disease incidents. One example of a non-climate related factor is the increased proximity between the human population and wildlife. Allen et al (2017) show that land-use changes may increase the risk of zoonotic disease transmission. Jones et al (2008) show that not only wildlife host richness and weather conditions, but also increased human population density, may increase the probability of infectious disease emergence. Changed human behaviour patterns are another factor which can increase the propensity of infectious disease outbreaks. Desbordes (2021) provides evidence that cross-border effects, in particular movements of people and goods, negatively affect health care and can contribute to the severity of outbreaks.
On the other hand, Lafferty (2009) argues that there is little evidence that climate change affects the emergence of infectious diseases. Liang and Gong (2017) summarize the ongoing debate and attribute the mixed results, which show positive, negative or uncertain impacts of climate change on the occurrence of infectious diseases, to different effects of climate change, different types of pathogens and the different spatial and time scales used in the research.
Furthermore, improved health-care systems provide an obvious countermeasure to the increase of infectious diseases and may, like some other socio-economic co-variates, influence the observed relation between climate change and incidents of infectious diseases. While there exists a vast literature on the relation between climate change and the emergence of infectious diseases, many studies are limited to a particular region or particular diseases. Yi et al (2019) state, based on a literature review, that temperature and relative humidity may increase the incidence of bacterial and viral diseases in China. The exact increase depends on the disease and the geographical region of China, among other factors. For example, in the case of dysentery a temperature increase of 1 °C may lead to a 3.6%-14.8% increase in the incidence in southern China. Waits et al (2018) survey the literature on climate change and infectious diseases in the Arctic. They identify increased temperature and precipitation as the most important climatic factors, and tick-borne diseases, tularemia, anthrax, and vibriosis as the diseases that are mainly affected by climate change in the Arctic. Prachumsri (2011) investigates the impact of climate change on malaria incidents in Thailand. Amuakwa-Mensah et al (2017) show that temperature has a negative linear effect on the number of infectious and parasitic disease patients in Sweden. Depending on the season the relation can be non-linear. Coates and Norton (2021) review the literature and show that climate change may increase the occurrence of dermatological diseases caused by different pathogens. Varo et al (2019) survey the literature on the relation between atmospheric phenomena such as hurricanes and cyclones and cholera.
In our study we take a global perspective and put the current Covid-19 pandemic into its historical context. Most previous studies focused on particular diseases or regions. Here we examine infectious disease outbreak risk by aggregating the number of fatalities from all infectious disease outbreaks over whole continents and the whole world. In that sense our study may not be as detailed as some specific studies, but it provides insight into whether the aggregated effects of climate change and modes of climate variability affect the propensity of infectious disease outbreaks. This is of special importance given the mixed, partial evidence from regional studies. We also concentrate on disaster events, as these may have the highest impact on health-care systems and economies, not only because of the recorded numbers of infected people but also because of their concentration in time. Since infectious disease outbreaks occur episodically, and not every year, it is best to model them in a probabilistic fashion. As many outbreaks affect rather small numbers of people and a few cause a pandemic, an extreme value statistics approach is appropriate; we therefore use the GPD modeling approach here.

Data
We use the EM-DAT data base for infectious disease outbreaks and number of fatalities. We use the category 'biological', sub-category 'epidemic' with subtypes 'bacterial disease', 'parasitic disease' and 'viral disease' for the period 1900-2020. EM-DAT includes all disasters from 1900 until the present, conforming to at least one of the following criteria:
• 10 or more people dead
• 100 or more people affected
• The declaration of a state of emergency
• A call for international assistance
Hence, EM-DAT provides estimates of the confirmed number of fatalities due to infectious disease outbreaks. We last accessed EM-DAT on 4th January 2021. At that time EM-DAT did not record any Covid-19 related fatalities. For Covid-19 fatalities we used data from https://ourworldindata.org/. We last accessed that site on 27 March 2021.
In order to make the fatalities data comparable across continents and time we normalize them to fatalities per 100 000 residents. We use population data from the World Bank data base https://data.worldbank.org/. We last accessed that site on 18 January 2021. The World Bank population data cover the period 1960-2020. For the period 1900-1959 we use data from the OECD compiled in Maddison (2001). Those estimates are only available for the years 1870, 1913 and 1959. Since continentally aggregated population data are very smooth we use a spline interpolation to compute estimates for the missing years. Maddison (2001) does not explicitly provide data for the Oceania region; thus, we focus on the continents Africa, Asia, Europe, North America and South America. By whole world we mean the aggregate of those continents, excluding Oceania.
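The normalization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all numbers are hypothetical, and a piecewise-linear interpolation is used as a simple stand-in for the spline interpolation described above, so the interpolated values would differ somewhat from those used in the study.

```python
from bisect import bisect_right


def interpolate_population(known, year):
    """Piecewise-linear stand-in (NOT the spline used in the study)
    for estimating the population in a year between known estimates."""
    years = sorted(known)
    if year < years[0] or year > years[-1]:
        raise ValueError("year outside the range of known estimates")
    if year in known:
        return float(known[year])
    hi = bisect_right(years, year)          # index of the next known year
    y0, y1 = years[hi - 1], years[hi]
    w = (year - y0) / (y1 - y0)             # fractional position in the gap
    return (1.0 - w) * known[y0] + w * known[y1]


def fatalities_per_100k(fatalities, population):
    """Normalize a fatality count to fatalities per 100 000 residents."""
    return 100_000 * fatalities / population


# Hypothetical numbers for illustration only (not Maddison or EM-DAT values).
known_population = {1870: 100_000_000, 1913: 143_000_000, 1959: 235_000_000}
pop_1918 = interpolate_population(known_population, 1918)
rate_1918 = fatalities_per_100k(50_000, pop_1918)
```

Dividing by the population of the same year is what makes an outbreak in 1918 comparable with one in 2020, despite the much smaller population in the earlier period.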
To see whether climate change and modes of climate variability affect infectious disease outbreaks we use the following index time series (figure 1):
• The global mean surface temperature (GMST) based on the HadCRUT 5.0.1.0 data set (Morice et al 2020). GMST represents the effects of anthropogenic global warming.
• The El Niño-Southern Oscillation (ENSO) index, which is the most important mode of climate variability of the coupled atmosphere-ocean system (Wang and Fiedler 2006, Timmermann et al 2018). We use the ENSO3.4 index. ENSO constitutes variations in winds and sea surface temperatures over the tropical eastern Pacific Ocean. ENSO affects much of the tropics and subtropics, and its teleconnections also reach the extratropics (Feldstein and Franzke 2017). ENSO and its teleconnections represent the strongest year-to-year fluctuations of the global climate system. The ENSO3.4 index is detrended.
• The Atlantic multidecadal oscillation (AMO) index (Knight et al 2006). The AMO represents the variability of the sea surface temperature of the North Atlantic Ocean on the timescale of several decades. The AMO affects many regional climate impacts such as Brazilian and African Sahel rainfall, Atlantic hurricanes and North American and European summer climate (Knight et al 2006). The AMO index is detrended.
• The Pacific decadal oscillation (PDO) index (Mantua and Hare 2002). The PDO represents the ocean-atmosphere climate variability centered over the mid-latitude Pacific basin. The PDO affects regional temperatures and precipitation in the Pacific region, Eastern Siberia, Australia and also the Indian summer monsoon (Mantua and Hare 2002). The PDO index is detrended.
• A simple linear time trend, which we consider as a co-variate to see whether changes can be attributed to other changes, possibly of a socio-economic nature. We expect that health care availability and quality also exhibit trends, as does the mortality of the considered diseases. To distinguish these effects from the trend in GMST we added this linear trend function as a co-variate.
The climatological data sets have been downloaded from https://climexp.knmi.nl. We tested the modes of climate variability for stationarity using the augmented Dickey-Fuller and the Kwiatkowski-Phillips-Schmidt-Shin tests (Leybourne and Newbold 1999). GMST is non-stationary due to the global warming trend, while ENSO, the AMO and the PDO are stationary. We also tested for co-integration between GMST and the fatalities data using the Engle-Granger test (Rao 1997). Co-integration tests probe for long-term correlations between time series. When linear trends are considered in the test, it turns out that GMST and the fatalities data for the continents are not co-integrated.

Generalized Pareto distribution
Since infectious disease outbreaks are very episodic phenomena we use a probabilistic approach to investigate them. We use a threshold exceedance approach based on the GPD (Coles 2001). We estimate the GPD parameters with the maximum likelihood method using the R package extRemes (Gilleland and Katz 2016). The GPD has the following functional form:
$$G(x) = 1 - \left(1 + \zeta z\right)^{-1/\zeta},$$
where z = (x − µ)/σ and x are the fatalities per 100 000 residents, σ > 0 denotes the scale parameter, ζ the shape parameter and µ the threshold. We used the goodness-of-fit test by Villaseñor-Alva and González-Estrada (2009) to test whether the GPD fits the fatalities data well. The test cannot reject the null hypothesis that the fatalities data stem from a GPD (p-values > 0.312; see table 1). Hence, the GPD is a reasonably good choice for analyzing the fatality data. Furthermore, we also perform a non-stationary extreme value analysis using co-variates. In this analysis we make the scale parameter dependent on the co-variates. In order to ensure positivity of the scale parameter we perform a log-transformation ϕ = log(σ) and allow the following dependence of the scale parameter on the co-variates:
$$\phi = \phi_0 + \phi_1 \, \mathrm{cov}_i,$$
where cov_i is the ith co-variate and ϕ_0 and ϕ_1 are fitted coefficients. Because of the sparseness of the fatalities data we only allow one co-variate at a time. To decide which co-variates improve the fit of the GPD with respect to the stationary GPD model we carry out a model selection exercise using the Bayesian information criterion (BIC) (Burnham and Anderson 2003, Franzke and Torelló i Sentelles 2020). The preferred, most parsimonious model is the one with the lowest BIC value. If the difference between the lowest BIC value and the BIC value of any other model is larger than 4, then that other model has considerably less empirical support (Burnham and Anderson 2003). If the difference is between 0 and 2, both models have substantial empirical support (Burnham and Anderson 2003). We present the results in the form of return levels and return periods.
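To make the model concrete, the following Python sketch writes out the standard GPD distribution function (Coles 2001), the negative log-likelihood with a log-linear scale parameter, and the BIC. It is a minimal illustration and not the extRemes implementation used in the study; the function names and the parameters phi0 and phi1 are hypothetical labels for the intercept and slope of log(σ), and the numerical optimization over the parameters is left out.

```python
import math


def gpd_cdf(x, mu, sigma, zeta):
    """Standard GPD distribution function for a value x and threshold mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < 0:
        return 0.0                      # below the threshold
    if abs(zeta) < 1e-12:
        return 1.0 - math.exp(-z)       # exponential limit for zeta -> 0
    base = 1.0 + zeta * z
    if base <= 0:
        return 1.0                      # above the upper end point (zeta < 0)
    return 1.0 - base ** (-1.0 / zeta)


def gpd_neg_log_likelihood(xs, mu, zeta, phi0, phi1=0.0, covariate=None):
    """Negative log-likelihood with log(sigma_t) = phi0 + phi1 * cov_t.

    With phi1 = 0 (or no covariate) this is the stationary model.
    """
    if covariate is None:
        covariate = [0.0] * len(xs)
    total = 0.0
    for x, cov in zip(xs, covariate):
        sigma = math.exp(phi0 + phi1 * cov)   # log link keeps sigma positive
        z = (x - mu) / sigma
        if z < 0:
            raise ValueError("all observations must exceed the threshold")
        if abs(zeta) < 1e-12:
            total += math.log(sigma) + z
        else:
            base = 1.0 + zeta * z
            if base <= 0:
                return math.inf               # outside the support
            total += math.log(sigma) + (1.0 + 1.0 / zeta) * math.log(base)
    return total


def bic(neg_log_lik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return 2.0 * neg_log_lik + n_params * math.log(n_obs)
```

In such a setup the stationary model has two free parameters (ϕ_0 and ζ, for a fixed threshold) and each single-co-variate model has three, so the BIC penalizes the extra slope parameter.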
A return period is the average time between two events of a specific magnitude. The return period is equivalent to the reciprocal of the probability that such an event occurs in any given year. A return level is the magnitude of an event that occurs on average once every y years. The concept of the return period and return level assumes that the system is stationary (Cooley 2013, Rootzén and Katz 2013). For non-stationary systems effective return levels can be computed (Cooley 2013, Rootzén and Katz 2013, Franzke and Czupryna 2020). The return levels then depend on the co-variate. For example, effective return levels can depend on whether ENSO is in a positive or negative anomaly. That way we can identify the impact of ENSO on the number of fatalities and potentially make long-term probabilistic predictions, assuming that we are able to make skillful long-term ENSO predictions.
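The return level of a fitted GPD follows from inverting the distribution function. The sketch below uses the standard formula from Coles (2001); it is an illustration, not the code used in the study. The parameter exceed_rate (the probability that the threshold is exceeded in a given year) and the names phi0 and phi1 (intercept and slope of the log-scale parameter) are hypothetical, and all numbers are made up.

```python
import math


def return_level(period_years, mu, sigma, zeta, exceed_rate=1.0):
    """Magnitude exceeded on average once every `period_years` years."""
    m = period_years * exceed_rate
    if m < 1.0:
        raise ValueError("return period too short for this threshold")
    if abs(zeta) < 1e-12:
        return mu + sigma * math.log(m)          # exponential-tail limit
    return mu + sigma / zeta * (m ** zeta - 1.0)


def effective_return_level(period_years, mu, zeta, phi0, phi1,
                           covariate_value, exceed_rate=1.0):
    """Return level when log(sigma) = phi0 + phi1 * covariate_value."""
    sigma = math.exp(phi0 + phi1 * covariate_value)
    return return_level(period_years, mu, sigma, zeta, exceed_rate)


# Made-up parameters: the 100-year level for two states of a co-variate.
level_neutral = effective_return_level(100, 0.0, 0.5, 0.0, 0.3, 0.0)
level_positive = effective_return_level(100, 0.0, 0.5, 0.0, 0.3, 1.0)
```

Because the co-variate enters through the scale parameter, a positive slope phi1 stretches the whole distribution, so every effective return level grows by the same factor exp(phi1 × co-variate) above the threshold.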

Results
The standardized fatality numbers are shown in figure 2. One pronounced event is the Spanish flu, which shows large numbers of fatalities in Europe and North America during 1918-1920. Africa suffered from a big outbreak of 'sleeping sickness' in 1901 in Uganda (Koerner et al 1995). In 1920, bubonic plague and cholera outbreaks also caused many fatalities. Overall, in the period 1900-1920 many big outbreaks occurred in Africa and Asia, while Europe and North America were mainly affected by the Spanish flu in that time period. For Africa, Asia and Europe, Covid-19 is not the biggest event on record, whereas it is for North and South America.

Extreme value analysis
We fit GPDs to the normalized fatalities time series with and without the 2020 Covid-19 cases to see whether the GPD parameter estimates overlap. If they did not overlap, we would classify Covid-19 as an unexpected event outside the expected range of prior events, that is, a Black Swan event (Taleb 2007), a metaphor for a rare and hard to predict event. By that we mean that based on past observations we would not have expected an event of this magnitude. An inspection of figure 2 already shows that Covid-19 is not the largest event for most continents; it is the largest event only for North and South America. The GPD parameter estimates together with their error bounds are given in table 2. As can be seen, for all continents the parameter estimates together with their error estimates overlap. Thus, including Covid-19 does not change the GPD so strongly that we could speak of two significantly different distributions. From this we conclude that Covid-19 is not outside of the expected range of the fatalities of an infectious disease, when taking the existing data into account. In a probabilistic sense it was predictable in its size; see also Taleb and Spitznagel (2020). However, including Covid-19 increases the values of the shape and scale parameters for all continents.
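The overlap criterion used above can be written as a small check on the error bounds of each parameter. The sketch below is only an illustration: the function name and the intervals are hypothetical and are not the values in table 2. Note that overlapping intervals are a heuristic indication of compatible fits and not a formal significance test.

```python
def intervals_overlap(first, second):
    """True if two (lower, upper) error intervals share at least one point."""
    (lo_a, hi_a), (lo_b, hi_b) = first, second
    if lo_a > hi_a or lo_b > hi_b:
        raise ValueError("each interval must be given as (lower, upper)")
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical shape-parameter bounds without and with the 2020 data point.
shape_without_covid = (0.40, 1.10)
shape_with_covid = (0.55, 1.30)
same_regime = intervals_overlap(shape_without_covid, shape_with_covid)
```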
What was the probability of an event of Covid-19's magnitude occurring? World-wide it was a 1-in-88 year event before Covid-19; when Covid-19 is included it is a 1-in-72 year event in terms of the number of fatalities per 100 000 residents (table 3). Such an event is most likely to occur in Asia. For Europe and North America it is a much rarer event of about 1-in-200 years. For South America it is the most extreme event. While the magnitude of Covid-19 is not unexpected given the statistics of past events, including it in the fit still makes an event of this magnitude more likely (1-in-72 rather than 1-in-88 years world-wide).
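Since a return period is the reciprocal of the annual occurrence probability, the world-wide figures quoted above correspond to roughly 1.1% (1-in-88) and 1.4% (1-in-72) per year. The following sketch converts return periods into probabilities; the multi-year calculation additionally assumes that years are independent and that the climate and society are stationary, which is a simplification introduced here for illustration and not a result of the study. The function names are hypothetical.

```python
def annual_probability(return_period_years):
    """Probability that the event occurs in any given year."""
    if return_period_years < 1:
        raise ValueError("return period must be at least one year")
    return 1.0 / return_period_years


def prob_at_least_one(return_period_years, horizon_years):
    """Chance of at least one event within the horizon,
    assuming independent and identically distributed years."""
    p = annual_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


p_before = annual_probability(88)        # fit without Covid-19
p_after = annual_probability(72)         # fit including Covid-19
risk_30y_before = prob_at_least_one(88, 30)
risk_30y_after = prob_at_least_one(72, 30)
```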
That such outbreaks are more likely to occur in Asia than on other continents is probably because Asia is a hot spot of environmental change and rapid socio-economic change and has a high population density: about 30% of the global population lives in East and Southeast Asia (Horby et al 2013). While health care has improved over the last few decades, infectious diseases are still a problem in this region (Horby et al 2013).
Furthermore, all shape parameters are positive (table 2). This indicates that the fatalities data are power-law distributed, with even extremely large numbers of fatalities still having a non-negligible probability of occurring. A power law also indicates that there is no typical number of fatalities for such outbreaks; rather, the distribution is scale-free.
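The link between a positive shape parameter and a power law can be made explicit. Assuming the standard GPD form of Coles (2001), with threshold µ, scale σ and shape ζ, the probability of exceeding a level x behaves as follows:

```latex
\begin{align}
  1 - G(x) &= \left(1 + \zeta\,\frac{x - \mu}{\sigma}\right)^{-1/\zeta},
  \qquad \zeta > 0, \\
  1 - G(x) &\sim \left(\frac{\zeta}{\sigma}\right)^{-1/\zeta} x^{-1/\zeta}
  \qquad \text{as } x \to \infty .
\end{align}
```

The tail therefore decays as a power law with exponent 1/ζ, so larger shape parameters mean heavier tails. As a consequence, the mean of the distribution is finite only for ζ < 1 and the variance only for ζ < 1/2.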
The GPD allows us to compute return levels (figures 3 and 4). Figure 3 reveals that the GPD model can represent the fatality data well. The return levels of the empirical data are almost always inside the uncertainty bounds of the GPD model, which again confirms that the GPD model fits the fatalities data well. Furthermore, figure 4 reveals that the likelihood of an outbreak with large numbers of fatalities is largest in Asia, with a 1-in-100 year event causing on average 175 fatalities per 100 000 residents. The next most affected continent is Africa with 33 fatalities per 100 000 residents for a 1-in-100 year event, followed by Europe with 15 fatalities per 100 000 residents. North America faces the least threat of a major infectious disease outbreak with 11 fatalities per 100 000 residents for a 1-in-100 year event. However, for the whole world we can expect 446 fatalities per 100 000 residents for a 1-in-100 year event.
In order to see whether global warming or modes of climate variability have an impact on infectious disease outbreaks and their number of fatalities we also estimate non-stationary GPD models with covariates in the scale parameter. We then use the BIC to identify the best fit models for the different continents (table 4).
The most relevant co-variate is the linear trend for most continents. The linear trend provides a decisively better model than any other co-variate according to Burnham and Anderson (2003). Since the linear trend also provides a much better fit than GMST, we conclude that global warming is not directly affecting the propensity of infectious disease outbreaks. This conclusion is additionally supported by the heuristic proposed by Burnham and Anderson (2003): the differences between the BIC values of the linear trend and GMST models for Africa, Asia, Europe and the whole world are larger than 4. This suggests that factors such as increased human-wildlife contact or urbanization, where infections can spread more rapidly, are likely responsible for this trend. The exception is South America, where the AMO is the most relevant co-variate. We get qualitatively similar results when using the Akaike information criterion.
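The model selection rule applied here can be sketched as follows. The BIC values are made up for illustration and are not the entries of table 4; the function name is hypothetical. BIC differences between 2 and 4 are labelled separately because the rule of thumb as quoted in this paper does not classify them.

```python
def classify_by_bic(bic_values):
    """Map each model to (delta_bic, label) relative to the lowest BIC."""
    best = min(bic_values.values())
    result = {}
    for name, value in bic_values.items():
        delta = value - best
        if delta == 0:
            label = "best"
        elif delta <= 2:
            label = "substantial support"
        elif delta > 4:
            label = "considerably less support"
        else:
            label = "not covered by the rule of thumb quoted above"
        result[name] = (delta, label)
    return result


# Made-up BIC values for one region (illustration only).
example = {
    "stationary": 212.4,
    "linear trend": 203.1,
    "GMST": 208.9,
    "ENSO": 204.6,
}
ranking = classify_by_bic(example)
```

In this made-up example the linear trend model is preferred, the ENSO model is still plausible, and the GMST and stationary models have considerably less support.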
In figure 3, we display the effective return levels. As can be seen, the risk of infectious disease outbreaks has increased considerably for most regions (world-wide, Africa, Europe and North America), while it has decreased only for Asia. For South America, the negative phase of the AMO increases the likelihood of infectious disease outbreaks. The AMO affects South American rainfall (Knight et al 2006, Ting et al 2011, Jones and Carvalho 2018) and has been linked to disease outbreaks (Zell et al 2008).

Discussion
The current Covid-19 pandemic raises the question whether anthropogenic global warming is increasing the propensity of infectious disease outbreaks. Here we used an extreme value statistics approach to address this question. We find that a GPD fits the fatalities data well. Using co-variates and a model selection exercise based on the BIC, we find that a general linear trend of the scale parameter in the GPD provides the best model. This general linear trend model is substantially better supported by the data than GMST. This suggests that mainly non-climate related changes have changed the likelihood of infectious disease outbreaks. Possible causes are increased human-wildlife contact, increasing urbanization and travel. The fact that the global temperature increase, when compared with a linear trend, has a relatively small impact on the propensity of infectious disease outbreaks does not rule out the possibility that further global temperature increase may affect the propensity of outbreaks in the future.
Infectious diseases flourish and spread best in their optimal climatic range (Liu-Helmersson et al 2014, Lafferty and Mordecai 2016, Blagrove et al 2017, Mordecai et al 2017, Rohr and Cohen 2020). Due to global warming, new areas can become more hospitable for microbes and viruses where they could not thrive before (for example malaria (Caminade et al 2014)). Here we assume that global mean surface temperature is a good proxy for this and for other climate variables such as moisture and precipitation. How strong a limitation this is needs further research, since socio-economic factors, land use and urbanization can also influence the habitability of regions for infectious diseases. This calls for further and more regional studies.
By examining the fatalities data with and without Covid-19 fatalities we find that the current Covid-19 event was predictable in a probabilistic sense. The number of Covid-19 related fatalities is inside the expected range from historical events. This means that Covid-19 is not a 'Black Swan' event (Taleb 2007). Covid-19 may be better described as a 'Dragon King' (Sornette 2009), a metaphor for a large event of a unique origin, or as a 'Perfect Storm' (Glette-Iversen and Aven 2021), a metaphor for a large event amplified by the joint occurrence of rare phenomena.
On most continents the effective return levels of the number of fatalities have increased, with the exception of Asia. Whether the decrease in Asia is related to better health care resulting from recent economic development needs further research.
To better understand which effects have caused the increase in infectious disease risk we need to examine a spectrum of socio-economic data such as the quality of public health care, social life patterns, the structure of the social networks underlying contacts with other people, and agricultural practices. Unfortunately, such data are only available for recent decades. Thus, studies over longer periods of time are not feasible.
Table 4. BIC values for GPD fits to fatalities data. BIC values in bold font denote the best fit model, while BIC values in italics denote models which still have substantial support according to the rule of thumb by Burnham and Anderson (2003).

Data availability statement
No new data were created or analysed in this study.