The overall mortality caused by COVID-19 in the European region is highly associated with demographic composition: A spatial regression-based approach

Demographic factors have a substantial impact on the overall casualties caused by COVID-19. In this study, the spatial association between key demographic variables and COVID-19 cases and deaths was analyzed using spatial regression models. A total of 13 (for the COVID-19 case factor) and 8 (for the COVID-19 death factor) key variables were considered for the modelling. Five regression models, namely Geographically Weighted Regression (GWR), Spatial Error Model (SEM), Spatial Lag Model (SLM), Spatial Error_Lag Model (SEM_SLM), and Ordinary Least Squares (OLS), were used for the spatial modelling and mapping of model estimates. The local R2 values, which indicate the influence of the selected demographic variables on the overall casualties caused by COVID-19, were highest in Italy and the UK. Moderate local R2 values were observed for France, Belgium, Netherlands, Ireland, Denmark, Norway, Sweden, Poland, Slovakia, and Romania. The lowest local R2 values for COVID-19 cases were found for Latvia and Lithuania. Among the 13 variables, the highest local R2 was calculated for total population (R2 = 0.92), followed by crude death rate (R2 = 0.9), long-term illness (R2 = 0.84), population with age>80 (R2 = 0.59), employment (R2 = 0.46), life expectancy at 65 (R2 = 0.34), crude birth rate (R2 = 0.31), life expectancy (R2 = 0.31), population with age 65-80 (R2 = 0.29), population with age 15-24 (R2 = 0.27), population with age 25-49 (R2 = 0.27), population with age 0-14 (R2 = 0.23), and population with age 50-65 (R2 = 0.23).


Introduction
The global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel coronavirus, has become a global health concern because of its unpredictable nature and the lack of adequate medicines (WHO, 2020; Ma et al., 2020; Gorbalenya et al., 2020). Since no medicine is yet available to treat this novel disease, mortality and casualties due to COVID-19 have risen rapidly worldwide since its first emergence in December 2019 in Wuhan, China. However, according to WHO (2020), the rate of COVID-19 deaths depends on the immunity of a person, as most people infected with COVID-19 have experienced mild to moderate respiratory illness and recovered without requiring special treatment. As of 2 May 2020, 3 272 202 cases and 230 104 deaths from COVID-19 had been reported in 215 countries (subject to the case definitions and testing strategies adopted by different countries) (WHO, 2020).
Considering its far-reaching impact on overall human development, the United Nations (2020) declared the disease a social, human, and economic crisis. Most developing countries are experiencing the impacts and burden of this virus on their national economies. However, the negative socioeconomic consequences of COVID-19 are not limited to developing countries; the disease has also severely affected developed Western countries (United Nations, 2020). The Congressional Research Service (2020) analyzed the economic impact of COVID-19 and predicted a 24% reduction in annual global gross domestic product (GDP) and a 13% to 32% decline in global trade (Mollalo et al., 2020).
Demographic factors play a crucial role in shaping the pattern of COVID-19 positive cases and deaths across the globe. According to UN DESA (2019) and WHO (2020), international and internal migrants, especially those in low-income jobs, are among the most affected by and vulnerable to COVID-19 infection and death. As of 22 April 2020, migrants accounted for 10% of the total population in 10 of the 15 countries with the highest number of COVID-19 cases. However, in many cases, migrants played a crucial role in tackling the COVID-19 emergency by working in several critical sectors (UN DESA, 2020; WHO, 2020). Ageing is also a crucial factor in COVID-19 deaths and spread. The high number of COVID-19 deaths and infections in Italy may be linked to the demographic structure of the country. The median age of the population in Italy is 46 years, and nearly a quarter of its population is over the age of 65, which ranks the country fourth in the proportion of elderly population. The same pattern is evident in Spain (the median age of the population is 43.9 years, and more than 25000 COVID-19 deaths have been reported so far) (Slate, 2020; Population Europe, 2020). According to WHO (2020), more than 95 percent of people who have died of COVID-19 in Europe were over 60. Long-term illness and a history of respiratory disease are also associated with COVID-19 deaths. Zhou et al. (2020), in a study in Wuhan, China, found that patients with pre-existing conditions, including hypertension, diabetes, and coronary heart disease, were the most vulnerable to COVID-19 death. Almagro & Orane-Hutchinson (2020) developed a regression model to evaluate the statistical significance of the relationship between the control variables (neighborhood characteristics, occupations) and the response variables in New York City's neighborhoods.
That study found that occupation substantially explained the observed COVID-19 patterns, as people with high levels of social outreach and social interaction were more vulnerable to infection. Several other studies have also evaluated the association between COVID-19 cases and deaths across the globe and explanatory variables such as neighborhood characteristics (Borjas, 2020); age structure (Dowd et al., 2020; Kulu & Dorey, 2020); psychological interventions (Duan & Zhu, 2020); pre-existing health records (Fu et al., 2020); population flows and control measures (Kraemer et al., 2020); and the influence of social and economic ties (Mogi & Spijker, 2020).
This study further advances the assessment of the impact of demographic parameters on the spread of COVID-19 cases and deaths across Europe by adopting spatial regression-based approaches. Spatial regression models have been used extensively in virus studies ranging from local to global scale (Zhao et al., 2020; Guo et al., 2020; Liu et al., 2020; Liu et al., 2020). Diuk-Wasser et al. (2006) evaluated the spatial distribution of mosquito vectors for West Nile virus in Connecticut, USA, using logistic regression models. Kauhl et al. (2015) evaluated the spatial distribution of Hepatitis C virus infections and associated determinants using a Geographically Weighted Poisson Regression (GWPR) model. They advocated the use of Geographic Information Systems (GIS) and spatial epidemiological methods to provide viable screening interventions by identifying spatial hotspots/clusters as well as the demographic and socio-economic determinants that are strongly associated with the casualties caused by the virus. Linard et al. (2007), in a study of the determinants of the geographic distribution of Puumala virus and Lyme borreliosis infections in Belgium, found that environmental and socio-economic factors play a crucial role in determining the spatial variation in disease risk. Mollalo et al. (2020) performed GIS-based spatial modelling to evaluate the impact of socioeconomic, behavioural, environmental, topographic, and demographic factors on COVID-19 incidence in the continental United States and found that explanatory variables including income inequality, median household income, the proportion of black females, and the proportion of nurse practitioners largely control the spatial distribution of COVID-19 cases in the USA.
Malesios et al. (2020) evaluated the spatiotemporal evolution patterns of the bluetongue virus outbreak on the island of Lesvos, Greece, and found a strong spatial autocorrelation between the spread of bluetongue virus and farms located nearby. Since COVID-19 is a novel disease and no study is available so far that evaluates the close association between demographic determinants and the spread of COVID-19, this study attempts to address this research gap and to provide effective solutions for future preparedness for COVID-19-like situations.

Data source and variable selection
Initially, a total of 28 demographic variables were considered for the modelling and spatially explicit mapping of model estimates. The demographic data for the European region were collected from Eurostat. The variables chosen in this study are described in Table 1. Using regression-based selection procedures, including stepwise, forward, and backward regression, a total of 13 (for the COVID-19 case factor) and 8 (for the COVID-19 death factor) variables were selected for the analysis. Some countries (including Serbia and Turkey) were removed from the analysis due to data unavailability. After filtering, a total of 31 European countries were selected for spatial regression modelling and mapping.
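The variable-screening step described above can be illustrated with a minimal sketch of forward stepwise selection. This is not the authors' procedure: the selection criterion (gain in adjusted R2), the threshold `min_gain`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R2 of an OLS fit of y on X (an intercept is added here)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_select(X, y, min_gain=0.01):
    """Greedily add the column that most improves adjusted R2.

    Stops when the best remaining candidate improves the score by less
    than min_gain (a hypothetical stopping rule).
    """
    remaining = list(range(X.shape[1]))
    selected, best = [], -np.inf
    while remaining:
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic example: 31 observations, 6 candidate variables, 2 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=31)
selected, score = forward_select(X, y)
```

Backward elimination follows the same pattern in reverse, starting from all candidates and dropping the variable whose removal costs the least.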

Spatial regression modelling
Spatial regression models (SRM) have been used extensively for demographic pattern analysis (Chi & Zhu, 2008), land surface temperature estimation (Jain et al., 2019; Chakraborti et al., 2018), urban air quality monitoring (Fang et al., 2015), and ecosystem service valuation (Sannigrahi et al., 2020a; Sannigrahi et al., 2020b). Understanding spatial effects such as spatial autocorrelation, spatial stationarity, and heterogeneity of a feature distribution is one of the fundamental applications of spatial regression models. In this study, five regression models, namely Geographically Weighted Regression (GWR), Spatial Error Model (SEM), Spatial Lag Model (SLM), Spatial Error_Lag Model (SEM_SLM), and Ordinary Least Squares (OLS), were implemented to evaluate how demographic factors shape the pattern of COVID-19 cases/deaths across Europe. Among these five models, the global interaction between the demographic factors and COVID-19 cases/deaths was analyzed using the OLS, SEM, SLM, and SEM_SLM models, as these models are not impacted by spatial autocorrelation or homogeneity in the feature space. The local association between the control and response variables was calculated using the GWR model.
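Spatial autocorrelation, mentioned above as one of the spatial effects of interest, is commonly summarized with global Moran's I. The following is a minimal sketch only: the statistic is not reported in this section, and the five-region chain, the binary contiguity weights, and the function names are assumptions made for illustration.

```python
import numpy as np

def row_standardize(W):
    """Scale each row of a binary adjacency matrix so that it sums to 1."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

def morans_i(values, W):
    """Global Moran's I for values under the spatial weights matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical chain of five regions: each borders its immediate neighbours.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
W = row_standardize(adjacency)

clustered = morans_i([1, 2, 3, 4, 5], W)    # similar values sit side by side
alternating = morans_i([1, 5, 1, 5, 1], W)  # neighbours are dissimilar
```

Positive values indicate that neighbouring regions carry similar values, and negative values indicate that neighbours differ.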
The GWR model is a local spatial regression model built on the premise that traditional 'global' regression models such as OLS, SEM, and SLM may not be effective enough to describe spatial variation in interactions, especially when the spatial process varies with spatial context (Chen et al., 2018; Oshan et al., 2019). Unlike the OLS, SEM, and SLM regression models, the GWR model rests on the assumption of spatial non-stationarity and heterogeneity in feature space and quantifies locally varying parameter estimates (Fotheringham et al., 1996; Brunsdon et al., 2002). GWR calculates the location-specific interaction between the control and response variables after integrating the spatially referenced data layers (Brunsdon et al., 2002; Lugoi et al., 2019).
In the GWR formulation, Yi is the response variable (COVID-19 cases/deaths in this case), β0, βi, and ε are the model parameters, a and b are the geographical coordinates (latitude and longitude), and X denotes the explanatory variables (demographic variables). Brunsdon et al. (1996) suggested that GWR can easily compute locally varying parameter estimates and is therefore highly effective for producing detailed, spatially explicit maps of locational variations in relationships.
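The idea of locally varying parameter estimates can be sketched as one weighted least-squares fit per location, with weights that decay with distance. The study fitted GWR in ArcGIS Pro and its kernel settings are not given here, so the Gaussian kernel, the fixed bandwidth, the Euclidean distances, the function name `gwr_fit`, and the synthetic data below are all assumptions.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Fit one weighted least-squares model per location.

    Returns local coefficients (n, p + 1), intercept first, and local R2 (n,).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    betas = np.empty((n, A.shape[1]))
    local_r2 = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel (assumed)
        sw = np.sqrt(w)
        # Weighted least squares solved as a rescaled ordinary least squares.
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        resid = y - A @ beta
        y_bar = np.sum(w * y) / np.sum(w)         # kernel-weighted mean
        ss_res = np.sum(w * resid ** 2)
        ss_tot = np.sum(w * (y - y_bar) ** 2)
        betas[i] = beta
        local_r2[i] = 1.0 - ss_res / ss_tot
    return betas, local_r2

# Synthetic data: the slope on x drifts from about 1 in the west to 3 in the east.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(200, 2))
x = rng.normal(size=200)
true_slope = 1.0 + 0.2 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=200)

betas, local_r2 = gwr_fit(coords, x[:, None], y, bandwidth=2.0)
west = betas[coords[:, 0] < 3.0, 1].mean()
east = betas[coords[:, 0] > 7.0, 1].mean()
```

A global OLS fit would return a single slope for the whole region, whereas the local fits recover the west-to-east drift. The bandwidth controls how quickly the influence of distant observations fades.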
OLS is a global regression model that examines the (non)spatial relationships between the set of control and response variables under the fundamental assumption of homogeneity and spatial non-variability (Sun et al., 2020; Oshan et al., 2019; Mollalo et al., 2020; Ward and Gleditsch, 2018). In the OLS model, yi is the COVID-19 incidence parameter at location i, β0 is the intercept, and xi is the vector of explanatory variables.
The SLM is based on a "spatially-lagged dependent variable" and assumes a close association between the response and control variables. Additionally, the SLM assumes dependency between the independent variables, meaning that an independent variable can be influenced by another independent variable in the neighbouring region (Z. Wu et al., 2020). Therefore, a spatial lag function, which computes the influence of adjacent independent variables on another independent variable, can be used as a new independent variable in spatial regression modelling (Z. Wu et al., 2020). The SLM thus incorporates spatial dependency between the parameters into the regression model (Anselin, 2003; Ward and Gleditsch, 2018; Mollalo et al., 2020; Z. Wu et al., 2020).
In the SLM, ρ is the spatial lag parameter and Wi is a vector of spatial weights.
The SEM assumes spatial dependence in the OLS residuals, which arises from the OLS error term because OLS often ignores spatially dependent independent variables in the modelling (Guo et al., 2020; Z. Wu et al., 2020; Mollalo et al., 2020). Therefore, the OLS residuals are decomposed into two components: a spatial error term and a random error term (to satisfy the modelling assumptions).
In the SEM, ui and uj are the error terms at locations i and j, respectively, and λ is the coefficient of the spatial error component.
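For reference, the three global models described above are commonly written in the following textbook forms. These equations are not taken from the source. The symbols $\beta$ (vector of regression coefficients), $\varepsilon_i$ (random error term), and $w_{ij}$ (element of the spatial weights matrix) are assumptions chosen to agree with the symbols defined in the text (β0, xi, ρ, Wi, λ, ui, uj).

```latex
\begin{align}
\text{OLS:}\quad & y_i = \beta_0 + x_i \beta + \varepsilon_i \\
\text{SLM:}\quad & y_i = \beta_0 + x_i \beta + \rho W_i y + \varepsilon_i \\
\text{SEM:}\quad & y_i = \beta_0 + x_i \beta + u_i,
  \qquad u_i = \lambda \sum_{j} w_{ij} u_j + \varepsilon_i
\end{align}
```

Setting $\rho = 0$ in the SLM, or $\lambda = 0$ in the SEM, reduces each model to OLS. This is why the spatial models are usually read as OLS with one additional spatial term.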
The GWR model was fitted using ArcGIS Pro 2.5.0. The other regression models, i.e., SEM, SLM, OLS, and SEM_SLM, were fitted in GeoDaSpace. All statistical analyses were performed in RStudio (an integrated development environment for R), Python, XLSTAT (https://www.xlstat.com/en/), and SPSS (https://www.ibm.com/analytics/spss-statistics-software). Mapping and data visualization were done in ArcGIS Pro and RStudio.

Results
The spatial distribution of COVID-19 cases and deaths is presented in Fig. 1. ... Bulgaria, Greece, Slovenia, and Austria, respectively (Fig. 1).
The spatially varying local R2 and intercept were computed using the GWR model for both the COVID-19 case and death factors (Fig. 2). For the case factor, the highest association between the demographic variables and COVID-19 cases was observed in Italy and the UK. Moderate local R2 values were observed for France, Belgium, Netherlands, Ireland, Denmark, Norway, Sweden, Poland, Slovakia, and Romania. The lowest local R2 values for COVID-19 cases were found for Latvia and Lithuania (Fig. 2). The intercept value was found in the western European region (Portugal, Spain, France, Ireland, UK, Netherlands, Belgium, Germany, and Denmark). The association between the demographic variables and COVID-19 deaths was also computed and was found to be highest in Italy, Greece, Bulgaria, Croatia, and Slovenia (Fig. 2). Using the 13 and 8 filtered demographic variables, the GWR model explained 92% and 93% of the model variance for COVID-19 cases and COVID-19 deaths, respectively (Table 2). The adjusted R2 was higher for the COVID-19 death factor, which suggests the accuracy of the GWR model in explaining the spatial distribution of total COVID-19 deaths and its association with the demographic structure of the country. Additionally, the high local R2 values derived from the GWR model for COVID-19 cases and deaths also indicate the influence of population characteristics on the spread of the COVID-19 pandemic.
For the spatial association between Tot_Pop and COVID-19 cases, the highest local R2 values were observed for Italy, the UK, Slovenia, and Croatia. A moderate association between Tot_Pop and COVID-19 cases was found in Spain, France, Ireland, Germany, Belgium, Netherlands, Norway, Denmark, and Sweden (Fig. 3), whereas a lower association was found for Estonia and Latvia (Fig. 3). The local R2 values estimated for Pop0_14, Pop25_49, Pop50_64, Pop65_79, Pop>80, LIlln_Edu, and D_CDR were lowest over the western European region, and relatively higher R2 values for these variables were found in the northern and eastern European regions (Fig. 3).
The spatial association between the 8 demographic variables (considered for COVID-19 death) and COVID-19 deaths was analysed and is presented in Fig. 4 and ... = 0.26), and Inf_Mor (R2 = 0.24) did not show any significant association with the COVID-19 death factor. The spatially varying local R2 values of the demographic variables were highest in the southern and southeastern European region (Italy, Greece, Bulgaria), while lower local R2 values were recorded in the western region, except for the Emplo variable (Fig. 4).
For the COVID-19 case factor, the coefficient of determination (R2) was 0.79, while for the death factor, the linear model explained 64% of the model variance (Fig. 5).
Using the combination of demographic variables and the GWR model, COVID-19 cases and deaths were predicted and are presented in Fig. 6. For the case factor, 93% model accuracy was observed between the actual and predicted COVID-19 cases. For the death factor, the accuracy between the predicted and actual COVID-19 deaths was 97%. The GWR-based predictions for both the case and death factors suggest the effectiveness of spatial regression models in explaining and predicting the casualties caused by future epidemics/pandemics.
Fig. 7 and Fig. 8 show the linear association between the demographic variables and COVID-19 cases and deaths. For the case factor, 5 of the 13 variables were strongly associated with the spread of COVID-19 cases. For the death factor, Tot_Pop and D_CDR were highly associated with COVID-19 deaths. The correlations among the demographic variables, and between them and COVID-19 cases and deaths, are presented in Fig. 9. Among the causative factors, high correlations were observed for Tot_Pop, D_CDR, Pop>80, and LExp. Fig. 10 shows a similar pattern of association, as Tot_Pop, D_CDR, LIlln_Edu, and LExp were found to be key determinants and explained the most model variance.
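Agreement between actual and predicted values, as reported above for the GWR predictions, is often summarized with the coefficient of determination or the Pearson correlation. The source does not state which measure underlies the reported accuracy, so the sketch below shows both under that uncertainty. The numbers are hypothetical and are not data from the study.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def pearson_r(a, b):
    """Pearson correlation coefficient between two series."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical values for illustration only (not data from the study).
actual = np.array([120.0, 340.0, 560.0, 910.0, 1500.0])
predicted = np.array([150.0, 300.0, 600.0, 880.0, 1450.0])

r2 = r_squared(actual, predicted)
r = pearson_r(actual, predicted)
```

The two measures answer slightly different questions: `r_squared` penalizes systematic bias in the predictions, while `pearson_r` only measures how closely the two series move together.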
The overall summary of the five regression models is reported in Table 7.

Discussion
The spatial distribution of COVID-19 deaths and positive cases was mapped, and its association with key demographic variables was evaluated to understand how the ... pneumonia and its association with COVID-19 and found that patients with existing respiratory illness were more susceptible to COVID-19.

Conclusion
In this study, the spatial association between the demographic variables and COVID-
Fotheringham, A., Charlton, M., & Brunsdon, C. (1996).
The linear association between demographic variables and COVID-19 case and death.
Fig. 6 The predicted values of COVID-19 case and death derived from the GWR model.
Fig. 7 The linear association between 13 demographic variables and the COVID-19 case.
Fig. 8 The linear association between 8 demographic variables and COVID-19 death.
Fig. 9 The correlation among and between the demographic variables and COVID-19 case and death.