Examining the association between socio-demographic composition and COVID-19 fatalities in the European region using spatial regression approach

Highlights
• The spatial associations between socio-demographic composition and COVID-19 deaths and cases were evaluated.
• Four spatial regression models were implemented.
• Socio-demographic composition significantly affected the overall casualties caused by COVID-19.
• The spatially predicted COVID-19 cases and deaths were highly consistent with actual estimates.


Introduction
The global pandemic of coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global health concern because of its unpredictable nature and the lack of adequate medicines (Gorbalenya et al., 2020; Ma et al., 2020; World Health Organization, 2020). Since no medicine is yet available to treat this novel disease, mortality and casualties due to COVID-19 have risen sharply worldwide since its first emergence in December 2019 in Wuhan, China. However, according to World Health Organization (2020), the rate of COVID-19 deaths depends on the immune system of a person, as most infected persons have experienced mild to moderate respiratory illness and recovered without requiring special treatment. As of July 8th, 2020, 11,635,939 confirmed cases and 539,026 deaths of COVID-19 had been reported in 215 countries (subject to the case definitions and testing strategies adopted by different countries) (World Health Organization, 2020). Considering its profound impact on overall human wellbeing, the United Nations (2020) declared the disease a social, human, and economic crisis. Most developing countries are experiencing the impacts and burden of this virus on the national economy. However, the negative socioeconomic consequences of COVID-19 are not limited to developing countries; the disease has also severely affected the western developed countries (United Nations, 2020). The Congressional Research Service (2020) analyzed the economic impact of COVID-19 and predicted a 24 % reduction of annual global gross domestic product (GDP) and an additional 13%-32% decline in global trade (Mollalo, Vahedi, & Rivera, 2020).
Socio-demographic factors play a crucial role in shaping the pattern of COVID-19 positive cases and deaths across the globe. According to UNDESA (2019) and World Health Organization (2020), international and national migrants, especially those involved in low-income jobs, are the most affected by and vulnerable to COVID-19 infection and death. As of 22 April 2020, migrants accounted for 10 % of the total population in 10 out of the 15 countries with the highest number of COVID-19 cases. However, in many cases, migrants performed a crucial role in tackling the COVID-19 emergency by providing essential services in several critical sectors (United Nations Department of Economic & Social Welfare, 2020; World Health Organization, 2020). Population aging is also a crucial factor in COVID-19 deaths and spread. The high number of COVID-19 deaths and infections in Italy may be linked with the demographic structure of the country. The median age of the population in Italy is 46.5 years, and nearly a quarter of its population is over the age of 65 (CIA, 2018). The same pattern is evident in Spain (the median age of the population is 43.9 years, and more than 25,000 COVID-19 deaths have been reported so far) (Population Europe, 2020; Slate, 2020). According to World Health Organization (2020), more than 95 percent of people who have died due to COVID-19 in Europe were over 60. Long-term illness and a history of respiratory disease are also associated with COVID-19 deaths. Zhou et al. (2020), in a study in Wuhan, China, found that patients with existing conditions, including hypertension, diabetes, and coronary heart disease, are the most vulnerable to COVID-19 death. Almagro and Orane-Hutchinson (2020) developed a regression model to evaluate the statistical significance of the relationship between the control variables (neighborhood characteristics, occupations) and the response variable (COVID-19 incidence) in New York City's neighborhoods. That study found that occupations substantially explained the observed COVID-19 patterns, as people with high levels of social outreach and social interaction were more vulnerable to infection. Several other studies have also evaluated the association between COVID-19 cases and deaths across the globe and explanatory variables such as neighborhood characteristics (Borjas, 2020); age structure (Dowd et al., 2020; Kulu & Dorey, 2020); psychological interventions (Duan & Zhu, 2020); pre-existing health records; population flows and control measures; and the influence of social and economic ties (Mogi & Spijker, 2020).
This study further advances the assessment of the impact of demographic and socio-economic parameters on the spread of COVID-19 cases and deaths across Europe by adopting spatial regression-based approaches. Spatial regression models have been used extensively in epidemiological studies across a range of scales (Guo, Wu, & Chen, 2020; Guo, Yang et al., 2020; Liu, Sun, & Feng, 2020; Liu, Zhang, Jin, & Liu, 2020; Zhao, Zhan, Yao, & Yang, 2020). Diuk-Wasser, Brown, Andreadis, and Fish (2006) evaluated the spatial distribution of mosquito vectors for West Nile virus in Connecticut, USA, using logistic regression models. Kauhl et al. (2015) evaluated the spatial distribution of Hepatitis C virus infections and associated determinants using a Geographically Weighted Poisson Regression (GWPR) model. Kauhl et al. (2015) also advocated the use of Geographic Information Systems (GIS) and spatial epidemiological methods for providing viable screening interventions by identifying spatial hotspots/clusters as well as the demographic and socio-economic determinants that are strongly associated with the casualties caused by the virus. Linard et al. (2007), in a study of the geographic distribution of Puumala virus and Lyme borreliosis infections in Belgium, found that environmental and socio-economic factors play a crucial role in controlling the spatial variation in disease risk. Mollalo et al. (2020) performed GIS-based spatial modelling to evaluate the impact of socioeconomic, behavioural, environmental, topographic, and demographic factors on COVID-19 incidence in the continental United States and found that explanatory variables including income inequality, median household income, the proportion of black females, and the proportion of nurse practitioners largely control the spatial distribution of COVID-19 cases in the USA. Malesios et al. (2020) evaluated the spatiotemporal evolution patterns of the Bluetongue virus outbreak on the island of Lesvos, Greece, and found a strong spatial autocorrelation between the spread of Bluetongue virus and farms located nearby.
Traditional statistical approaches, including principal component analysis (Varraso et al., 2012), clustering (Merlo et al., 2006), factor analysis (Meigs, 2000), single/multiple regression (Blyth, Kincaid, Craigen, & Bennet, 2001), and multivariate regression (Lewis & Ward, 2013), have been used extensively in epidemiological studies to identify the determinants that regulate the incidence, prevalence, and overall mortality caused by a virus. However, all these traditional approaches rest on one fundamental assumption: that the samples used in the models are independent of one another (Kauhl et al., 2015; Wu, Chen, Cai et al., 2020). This assumption, and the resulting neglect of spatial dependency in parameter estimates, makes these approaches unreliable when the observations are spatially dependent. On the other hand, spatial regression models (SRM), such as the spatial lag model (SLM), spatial error model (SEM), spatial autoregressive model (SAM), spatial Durbin model (SDM), and geographically weighted regression (GWR), have been found highly effective and reliable when variables are locally varying, spatially dependent, and autocorrelated. Unlike ordinary regression, the spatial regression approach considers the spatial autocorrelation among the observations. Moreover, spatial regression models can effectively estimate the influence of independent factors on target variables by accounting for spatial dependence through the lag and error components of the independent features (Kauhl et al., 2015; Wu, Chen, Cai et al., 2020; Yang & Jin, 2010). These capabilities make spatial regression models a promising alternative for spatial epidemiological studies. Only a few studies so far have investigated the association between socio-demographic determinants and the spread of COVID-19 using a spatial regression approach.
Therefore, this study attempts to address this research gap and to provide effective solutions for future preparedness for COVID-19-like situations. The main objectives of this study are: (1) to identify the key socio-demographic driving factors that have a substantial impact on the overall pattern of COVID-19 casualties; and (2) to implement global and local spatial regression models to assess the spatial association between the driving factors and COVID-19 cases/deaths.

Data source and variable selection
The COVID-19 cases and deaths data were retrieved for the period from 31st December 2019 to 29th April 2020 from the European Union Open Data Portal¹. A few European countries (Albania, Andorra, Bosnia and Herzegovina, Czech Republic, Faroe Islands, Guernsey, Jan Mayen, Jersey, Liechtenstein, Macedonia, Monaco, Montenegro, San Marino, Serbia, and Turkey) were excluded from the analysis due to data unavailability. COVID-19 cases and deaths per 100,000 population were used for the predictive modelling and subsequent interpretation.
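As a minimal illustration of the normalization described above, the sketch below converts raw counts into rates per 100,000 population. The function name and the example figures are hypothetical and are not taken from the study data.

```python
def rate_per_100k(count, population):
    """Return the number of events per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return count * 100_000 / population


# Hypothetical country: 25,000 cases among 5,000,000 inhabitants.
example_rate = rate_per_100k(25_000, 5_000_000)
```

Working with rates rather than raw counts keeps small and large countries comparable in the regression models.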
The socio-demographic data for the European region were collected from Eurostat². Initially, a total of 28 socio-demographic variables were identified for regression modelling. The description of the variables is given in Table S1. These variables have gone through horizontal and cross-sectional adjustments to make the data comparable between different countries. The European Union Statistics on Income and Living Conditions³ (EU-SILC) usually provides two types of data: (i) cross-sectional data, which refer to a particular time or time frame, and (ii) longitudinal data, which concern the changes of an individual component over time or over a time period. The detailed background methodology describing how these predictors were computed can be found on Eurostat⁴.
Using the stepwise forward regression approach, a total of 2 (for COVID-19 cases) and 3 (for COVID-19 deaths) variables were selected for the final analysis. A log transformation was adopted to address the scale effect and skewness in the datasets. To further check data normality, four different tests were employed, i.e., Shapiro-Wilk, Anderson-Darling, Lilliefors, and Jarque-Bera. All four tests collectively indicate that the data are normally distributed; hence, we did not reject the null hypothesis, which assumes that "the variable from which the sample was extracted follows a normal distribution." For the COVID-19 deaths, a four-parameter model was developed using three explanatory variables: income (Inc), poverty (Pov), and total population (TotPop). For the COVID-19 cases, a three-parameter regression model was developed using two variables, i.e., income (Inc) and poverty (Pov). These filtered variables had acceptable (<2) variance inflation factor (VIF) values and explained substantial model variance. All these variables exhibited spatial nonstationarity and hence produced spatially dependent output in different modelling set-ups. Additionally, partial least squares regression (PLSR) and principal components regression (PCR) modelling was conducted to identify the key variables and to develop multivariate regression models for COVID-19 cases and deaths.
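The log transformation and the collinearity screening can be sketched as follows. This is an illustrative sketch only: it assumes a natural logarithm (the base is not stated in the text), and it uses the two-predictor identity VIF = 1 / (1 - r²), where r is the Pearson correlation between the two predictors. With three predictors, as in the deaths model, each VIF would instead come from regressing one predictor on the remaining two. All function names and data are hypothetical.

```python
import math


def log_transform(values):
    """Natural-log transform of strictly positive values (assumed base e)."""
    return [math.log(v) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)
```

Under this identity, a VIF below 2 corresponds to a squared correlation between the two predictors below 0.5.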
All four spatial regression models produced statistically significant estimates at different probability levels. Additionally, both global (spatial autocorrelation) and local (Getis-Ord Gi hotspot/cold spot) analyses were carried out to evaluate the global and local distribution and significance of the features. Different spatial dependence tests, including Moran's I (error), Lagrange Multiplier (lag), Robust LM (lag), Lagrange Multiplier (error), Robust LM (error), and Lagrange Multiplier (SARMA), were performed to evaluate the spatial dependencies in the observations and the relevance of spatial regression modelling in this study.
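For readers unfamiliar with the global spatial autocorrelation statistic, the following is a minimal sketch of global Moran's I with binary contiguity weights. It is not the GeoDa implementation used in the study; the neighbour dictionary below is a hypothetical four-unit chain rather than the actual Queen contiguity structure of European countries, and no significance test is included.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    values    : list of observations, indexed 0..n-1
    neighbors : dict mapping each index to a list of neighbouring indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights; with binary weights it is the link count.
    s0 = sum(len(nbs) for nbs in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, nbs in neighbors.items() for j in nbs)
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


# Hypothetical chain of four spatial units: 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Values that increase smoothly along the chain give a positive statistic, while strictly alternating values give a negative one.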

Spatial regression modelling
Spatial regression models (SRM) have been used extensively for demographic pattern analysis (Chi & Zhu, 2008), estimating land surface temperature (Jain et al., 2019), urban air quality monitoring (Fang, Liu, Li, Sun, & Miao, 2015), and ecosystem service valuation (Sannigrahi, Zhang, Pilla et al., 2020). The specific application of spatial regression models is to understand spatial effects such as spatial autocorrelation, spatial stationarity, and heterogeneity of feature distribution. In this study, a total of four regression models, i.e., Geographically Weighted Regression (GWR), Spatial Error Model (SEM), Spatial Lag Model (SLM), and Ordinary Least Squares (OLS), were implemented to evaluate how socio-demographic factors shape the pattern of COVID-19 cases/deaths across Europe. Among these four models, the global interaction between the socio-demographic factors and COVID-19 cases/deaths was analyzed using the OLS, SEM, and SLM models, as these models are not impacted by spatial autocorrelation or homogeneity in the feature space. The local association between the control and response variables was calculated using the GWR model.
The GWR model is a local spatial regression model that assumes that traditional 'global' regression models such as OLS, SEM, and SLM may not be effective enough to describe the spatial variation of interactions, especially when the spatial process varies with spatial context (Chen, Shen, & Wang, 2018; Oshan, Li, Kang, Wolf, & Stewart Fotheringham, 2019; Mollalo et al., 2020). Unlike the OLS, SEM, and SLM models, the GWR model relies on the assumption of spatial non-stationarity and heterogeneity in feature space and quantifies locally varying parameter estimates (Brunsdon, Fotheringham, & Charlton, 2002; Fotheringham and Oshan, 2016). GWR calculates the location-specific interaction among the control and response variables after integrating the spatially referenced data layers (Brunsdon et al., 2002; Lugoi, Bamutaze, Martinsen, Dick, & Almås, 2019; Fotheringham and Oshan, 2016). The GWR model will be ineffective and may produce biased estimates if spatially autocorrelated regression residuals are statistically significant, or if one or more control variables exhibit unexpected spatial variation among the regression coefficients.
The GWR model can be written as:

$$y_i = \beta_{i0} + \sum_{j} \beta_{ij} X_{ij} + \varepsilon_i$$

where $y_i$ is the value of the response variable at location $i$, $\beta_{i0}$ is the intercept, $\beta_{ij}$ is the $j$th regression parameter, $X_{ij}$ is the value of the $j$th explanatory variable, and $\varepsilon_i$ is the random error. Parameter estimates for the explanatory variables were obtained using the equation defined by Fotheringham and Oshan (2016):

$$\hat{\beta}(i) = \left( X^{T} W(i) X \right)^{-1} X^{T} W(i)\, y$$

where $\hat{\beta}$ refers to the vector of the parameter estimates ($m \times 1$), $X$ denotes the matrix of the explanatory variables ($n \times m$), $W(i)$ is the local spatial weight matrix ($n \times n$), and $y$ is the vector of the response variable (Fotheringham and Oshan, 2016; Mollalo et al., 2020). GWR can readily compute locally varying parameter estimates and has thus been found highly effective for producing detailed, spatially explicit maps of locational variations in relationships. Regarding kernel selection and the definition of the local weight matrix, an adaptive bi-square kernel (based on nearest neighbors), which has been found more accurate than a fixed distance-based kernel parametrization, was adopted for GWR modelling. For optimum bandwidth selection using the nearest-neighbor information, the model's built-in golden search function was used. Additionally, the bandwidth and the number of nearest neighbors were selected by verifying the AIC values. The other regression models (OLS, SEM, SLM) are global in nature, and therefore no local spatial weight is parameterized for these models. However, for defining the global spatial weight, the first-order Queen's contiguity approach was adopted.
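To make the adaptive bi-square kernel and the locally weighted estimation more concrete, here is a minimal single-predictor sketch. It is not the ArcGIS Pro or MGWR implementation. It assumes the common convention that the bandwidth at a focal location equals the distance to its k-th nearest observation (counting the focal observation itself at distance 0), so that observations at or beyond the bandwidth receive zero weight. All names and numbers are hypothetical.

```python
def adaptive_bisquare_weights(distances, k):
    """Bi-square weights for one focal location.

    distances : distances from the focal location to every observation
    k         : number of nearest observations that define the bandwidth
    """
    if not 1 <= k <= len(distances):
        raise ValueError("k must be between 1 and the number of observations")
    bandwidth = sorted(distances)[k - 1]
    if bandwidth == 0:
        raise ValueError("bandwidth is zero; increase k")
    return [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in distances]


def local_wls(x, y, weights):
    """Weighted least squares for one predictor; returns (intercept, slope).

    At least two observations with positive weight and distinct x are needed.
    """
    sw = sum(weights)
    x_bar = sum(w * a for w, a in zip(weights, x)) / sw
    y_bar = sum(w * b for w, b in zip(weights, y)) / sw
    sxy = sum(w * (a - x_bar) * (b - y_bar) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - x_bar) ** 2 for w, a in zip(weights, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

Repeating `local_wls` with a fresh weight vector at every location yields the location-specific coefficients that GWR maps; the single-predictor closed form here is the scalar counterpart of the matrix estimator.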
OLS is a global regression model that examines the (non-)spatial relationships between the set of control and response variables under the fundamental assumption of homogeneity and spatial non-variability (Mollalo et al., 2020; Oshan et al., 2019; Sun, Wang, & Wang, 2020; Ward & Gleditsch, 2018):

$$y_i = \beta_0 + x_i \beta + \varepsilon_i$$

where $y_i$ is the COVID-19 incidence parameter at location $i$, $\beta_0$ is the intercept, $x_i$ is the vector of selected demographic variables, $\beta$ is the vector of regression coefficients, and $\varepsilon_i$ is a random error. The fundamental function of OLS is to optimize the regression coefficients ($\beta$) by minimizing the sum of squared prediction errors (Anselin & Arribas-Bel, 2013; Mollalo et al., 2020; Oshan et al., 2019). The usual OLS method assumes that the residual errors are homogeneous and uncorrelated; traditional OLS has therefore proven inefficient when the errors are heterogeneous and spatially correlated, which leads to bias in the estimation of the regression coefficients (Goodchild, Parks, & Steyaert, 1993; Yang & Jin, 2010).
The SLM is based on a "spatially-lagged dependent variable" and assumes the close association between the response and control variables. Additionally, SLM also assumes dependency between the independent variables, which denotes that an independent variable could be influenced by another independent variable in the neighbourhood region (Z. Wu, Chen, Han, Ke, & Liu, 2020). Therefore, spatial lag function, which computes the influence of adjacent independent variables on another independent variable, can be used as a new independent variable in spatial regression modelling (Wu, Chen, Han et al., 2020). The SLM incorporates spatial dependency between the parameters into the regression model (Anselin, 2003;Mollalo et al., 2020;Ward & Gleditsch, 2018;Wu, Chen, Han et al., 2020).
The SLM can be written as:

$$y_i = \beta_0 + x_i \beta + \rho W_i y + \varepsilon_i$$

where $\rho$ is the spatial lag parameter, $W_i$ is a vector of spatial weights (a row of the spatial weights matrix), and $y$ is the vector of the response variable. The weight matrix ($W$) of the SLM indicates the neighbors at location $i$ and connects one independent variable to the explanatory variables in feature space (Anselin & Arribas-Bel, 2013; Mollalo et al., 2020).
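The spatially lagged term can be illustrated with a short sketch. It assumes row-standardized contiguity weights, so that the lag at each location is simply the mean of its neighbours' values; whether the study row-standardized its Queen's contiguity weights is not stated, so this is an assumption. The neighbour structure and values are hypothetical.

```python
def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: mean of the neighbours' values.

    values    : list of observations, indexed 0..n-1
    neighbors : dict mapping each index to a list of neighbouring indices
    """
    lags = []
    for i in range(len(values)):
        nbs = neighbors.get(i, [])
        # A unit without neighbours (an island) is given a lag of zero.
        lags.append(sum(values[j] for j in nbs) / len(nbs) if nbs else 0.0)
    return lags


# Hypothetical chain of three units: 0 - 1 - 2
lagged = spatial_lag([10, 20, 30], {0: [1], 1: [0, 2], 2: [1]})
```

In a spatial lag model, a vector such as `lagged` enters the regression as an additional right-hand-side term scaled by the spatial lag parameter.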
The SEM assumes spatial dependence in the OLS residuals, which arises from the OLS error term, as OLS often ignores independent variables with spatial dependence (Guo, Wu et al., 2020; Guo, Yang et al., 2020; Mollalo et al., 2020; Wu, Chen, Han et al., 2020). Therefore, the OLS residuals are decomposed into two components: a spatial error term and a random error term (to satisfy the modelling assumption).
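The SEM can be written as:

$$y_i = \beta_0 + x_i \beta + u_i, \qquad u_i = \lambda \sum_{j} w_{ij} u_j + \varepsilon_i$$

where $u_i$ and $u_j$ are the error terms at locations $i$ and $j$, respectively, $w_{ij}$ is the spatial weight between locations $i$ and $j$, $\lambda$ is the coefficient of the spatial error component, and $\varepsilon_i$ is a random error.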
The local model (GWR) was run in ArcGIS Pro 2.5.0⁵. Additionally, the multiscale geographically weighted regression (MGWR) Python package was used for estimating local correlation coefficients (LCC), local condition numbers (CN), local variance inflation factors (VIF), and local variance decomposition proportions (VDP) (Oshan et al., 2019). The other regression models, i.e., SEM, SLM, and OLS, were run in GeoDa and GeoDaSpace⁶. All the statistical analyses were performed in RStudio⁷ (an integrated development environment for the R programming language), Python, XLSTAT⁸, and SPSS⁹. Mapping and data visualization were done in ArcGIS Pro, Python, and RStudio. The bivariate local Moran's I and multivariate local Geary C cluster and outlier analyses were performed using GeoDa.

Results
The spatial distribution of COVID-19 cases and deaths is presented in Fig. 1. The highest numbers of cases (per 100,000 population) were observed in Luxembourg, Spain, Belgium, Ireland, Switzerland, and Italy, among others. In comparison, lower numbers of COVID-19 cases (per 100,000 people) were recorded in Bulgaria, Greece, Slovakia, Hungary, Poland, Latvia, and Lithuania. Moderate levels of COVID-19 cases (per 100,000 people) were recorded for the United Kingdom, Netherlands, Portugal, France, Sweden, and Austria (Figs. 1 and 2). Considering COVID-19 deaths (per 100,000 population) across Europe, the maximum counts were recorded in Belgium, Spain, Italy, France, and the United Kingdom, and moderate levels were observed in the Netherlands, Ireland, Sweden, Switzerland, and Luxembourg, among others. Moreover, considerably lower COVID-19 deaths (per 100,000 population) were reported in Slovakia, Greece, Latvia, Malta, Croatia, Lithuania, Poland, Cyprus, Romania, Hungary, and Slovenia (Figs. 1 and 2). These heterogeneous distributions of COVID-19 cases and deaths can be linked with the socio-demographic pattern of each country.
The spatially varying local R² and coefficient values for each explanatory variable were computed using the GWR model (Figs. 3-6, Table S2). These figures collectively demonstrate the profound impact of socio-demographic components on overall COVID-19 patterns in Europe. Considering the local R² values for the cases, the highest association between socio-demographic variables and COVID-19 cases was observed in Germany (R² = 0.93), Austria (R² = 0.93), Slovenia (R² = 0.92), and Switzerland (R² = 0.92). Moderate local R² values were observed for Italy (R² = 0.90), Luxembourg (R² = 0.90), Poland (R² = 0.90), Denmark (R² = 0.89), Croatia (R² = 0.89), Belgium (R² = 0.87), Slovakia (R² = 0.87), and the Netherlands (R² = 0.86). The lowest local R² values for COVID-19 cases were found for Ireland (R² = 0.56), Portugal (R² = 0.58), the United Kingdom (R² = 0.66), and Spain (R² = 0.67) (Fig. 3). In addition to the local R² approximation, the spatially varying coefficient values for the explanatory variables were analyzed and are presented in Fig. 4. For the income factor, all the European countries exhibited a positive coefficient of varying intensity, which was highest in Germany, Belgium, the Netherlands, Italy, Austria, Slovenia, and Switzerland. In contrast, comparably lower coefficient values were estimated for Spain, Portugal, Ireland, Norway, Sweden, and Finland (Fig. 4). On the other hand, a two-parameter GWR model, developed for evaluating the association between poverty and COVID-19 cases in the European countries, produced a negative coefficient in many cases (Fig. 4). This indicates that poverty and COVID-19 cases are negatively associated. For the COVID-19 deaths, four-parameter regression models were developed using the local GWR model (Fig. 5). The highest association between the explanatory variables, i.e., poverty, income, and total population, and COVID-19 deaths was found for Italy (R² = 0.71), Croatia (R² = 0.68), Slovenia (R² = 0.67), Austria (R² = 0.65), Hungary (R² = 0.64), and Greece (R² = 0.64). Conversely, the minimum association between these variables and COVID-19 deaths was found in the United Kingdom (R² = 0.002), Ireland (R² = 0.07), the Netherlands (R² = 0.44), Cyprus (R² = 0.45), Lithuania (R² = 0.46), and Latvia (R² = 0.48) (Fig. 5). In addition to the regression estimates, the spatially varying and autocorrelated coefficient estimates of the three explanatory variables were measured and are documented in Fig. 6. Among the three variables, income and total population exhibited a positive coefficient in the regression modelling. As expected, a negative association between poverty and COVID-19 deaths was found in the Northern European region, especially in Norway, Finland, Sweden, Estonia, Latvia, and Lithuania (Fig. 6).
The spatial distribution of four confirmatory components, i.e., local correlation coefficient, local condition number, local variance decomposition proportions, and local variance inflation factors, was computed in Python and is presented in Fig. 7. All these factors collectively indicate the robustness of the multi-parameter GWR model conceptualized for this research. The low values of spatial LCC, condition number, VDP, and VIF suggest that the independent variables chosen for the modelling do not suffer from multicollinearity and have substantial explanatory power. Additionally, for all three explanatory variables, the spatial local correlation and VIF values were negligible and lie within the acceptable threshold (Fig. 7). Based on these unbiased explanatory variables, a set of multi-parameter local GWR models was fitted. For COVID-19 cases, the three-parameter regression model explained 88 % of the model variance, and for COVID-19 deaths, the four-parameter regression model explained 72 % (Table 1). Additionally, both models are statistically significant at the P = 0.05 significance level.

⁵ https://www.esri.com/en-us/arcgis/products/arcgis-pro/resources
⁶ https://geodacenter.github.io/GeoDaSpace/
⁷ https://rstudio.com/
⁸ https://www.xlstat.com/en/
⁹ https://www.ibm.com/analytics/spss-statistics-software
The individual impact of the socio-demographic variables (2 for COVID-19 cases and 3 for COVID-19 deaths) on COVID-19 incidence was also evaluated using both global (OLS) and local (GWR) regression models (Tables 2-4). Among the 2 variables considered for the COVID-19 cases, the highest local R² was calculated for income (R² = 0.82), followed by poverty (R² = 0.74) (Table 2). However, the interaction effect of these variables was higher (R² = 0.85) than either individual effect. For COVID-19 deaths, the total population factor exhibited the highest individual effect (R² = 0.68), followed by income (R² = 0.57) and poverty (R² = 0.55). All these values were statistically significant at different probability levels. Considering the combined effect of these variables on COVID-19 deaths, the interaction effect of income and total population was highest (R² = 0.68), followed by income/poverty (R² = 0.62) and poverty/total population (R² = 0.62). Except for income/total population, the interaction effects of the other variable pairs (income/poverty and poverty/total population) were higher than their individual effects. The global OLS model was also run to re-confirm the local modelling estimates, and the results are presented in Tables 3 and 4. The three-parameter regression model estimated for the COVID-19 cases explained 76 % of the model variance, which is much lower than the local GWR model (88 %). For the COVID-19 deaths, the four-parameter global OLS model explained 68 % of the model variance, which is also lower than the variance explained by the local GWR model (72 %). Therefore, given the explanatory power of the local and global regression models, local regression models are recommended when the dependent or target variables are spatially inter-linked and autocorrelated.
The PLS and PCR models were also run to examine the model variances and to develop multivariate regression models for reproducibility and effective replication of the approaches adopted in this study. For cases, the PLS and PCR models explained 68 % of the model variance, and for deaths, the PLS and PCR models explained 63 % and 64 % of the model variance, respectively, which is much lower than the variance explained by the local GWR model (Table 5).
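The share of variance explained that is compared across models here is the coefficient of determination. A minimal sketch of its computation from observed and predicted values is given below; the function name and inputs are hypothetical, and the software packages used in the study may report adjusted or otherwise modified variants.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A model that reproduces the observations exactly scores 1, while one that only predicts the observed mean scores 0.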
Using the multi-parameter regression models, COVID-19 cases and deaths were predicted with the local GWR model (Fig. 8). For the cases, the income factor predicted COVID-19 cases with 82 % accuracy. On the other hand, the prediction accuracy was lower (77 %) for the poverty factor. Combined, the poverty and income factors explained substantial model variance and predicted the COVID-19 cases with 85 % accuracy. For the deaths, the predictive power of all three explanatory variables was measured and was highest for total population (75 %), followed by poverty (62 %) and income (57 %). Therefore, the GWR-based prediction for both cases and deaths suggests the superiority of spatial regression models in explaining the heterogeneous distribution of COVID-19 cases and deaths across Europe. Additionally, spatial regression modelling will be highly effective where spatial dependence among the observations is evident and widespread. Figs. 9 and 10 show the linear association between socio-demographic variables and COVID-19 for different parametric models. For the COVID-19 cases, the coefficient of determination (R²) was 0.74, while for the deaths, the linear model explained 50 % of the model variance (Fig. 9). Fig. 10 shows the linear association between each socio-demographic variable and COVID-19 cases and deaths. Among the three socio-demographic variables, the income and total population factors exhibited a strong association with COVID-19 cases/deaths. In comparison, the poverty factor did not show a strong association with COVID-19 cases/deaths. The correlations among the socio-demographic variables and between these variables and COVID-19 cases and deaths are presented in Fig. 11. Among the driving factors, a high correlation was observed for income; a comparably weak association was found for the total population factor. Moreover, a weak negative correlation was found between poverty and the COVID-19 factors (Fig. 11).
In addition, the correlation between the case and death factors (all three explanatory variables considered for cases/deaths were merged to evaluate their collective influence on COVID-19 cases) and COVID-19 counts was higher than their individual impact. The spatial dependency among the observations was tested using the global Moran's I statistic (Fig. 12). For both cases and deaths, statistically significant spatial clustering was found. The local Moran's I produced a similar pattern, where 4 European countries exhibit a high-high spatial cluster, 3 countries show a low-low spatial cluster, and a high-low spatial cluster was observed for a single country (Fig. S1). The spatial clusters and outliers were measured for the explanatory variables using the local Geary C multivariate cluster method, and the results are presented in Fig. S1. A total of 6 statistically significant spatial clusters were identified, of which 4 were significant at the P = 0.05 significance level and 2 at the P = 0.01 significance level. To evaluate the robustness and accuracy of the spatial regression models, a normality check was carried out on the standardized residuals of the GWR model; for all explanatory variables, statistically significant normality scores were measured, which together suggest that the model estimates are neither biased nor irrelevant.
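The high-high, low-low, and high-low labels reported above come from the quadrants of the Moran scatterplot. The sketch below shows one way to compute the local Moran statistic and its quadrant label. It is not the GeoDa implementation: it assumes row-standardized weights and a population standard deviation, and it omits the permutation-based significance test that decides which clusters are reported. The neighbour structure and values are hypothetical.

```python
import math


def local_moran_quadrants(values, neighbors):
    """Return a list of (local_I, label) for every spatial unit.

    The label has the form 'own-neighbours', e.g. 'high-low' means a high
    value surrounded by low neighbouring values.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    results = []
    for i in range(n):
        nbs = neighbors.get(i, [])
        # Row-standardized lag of the standardized values.
        lag = sum(z[j] for j in nbs) / len(nbs) if nbs else 0.0
        own = "high" if z[i] > 0 else "low"
        around = "high" if lag > 0 else "low"
        results.append((z[i] * lag, own + "-" + around))
    return results


# Hypothetical chain of four units: 0 - 1 - 2 - 3
chain4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = [label for _, label in local_moran_quadrants([10, 9, 2, 1], chain4)]
```

Units with a positive local statistic fall in the high-high or low-low quadrants (clusters), while a negative statistic marks a spatial outlier.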
The overall summary of the four spatial regression models is reported in Table 6. Among the 2 socio-demographic variables chosen for the cases, the highest average R² was observed for income (R² = 0.71), followed by poverty (R² = 0.45). For deaths, the highest R² value was calculated for income (R² = 0.51), followed by total population (R² = 0.49) and poverty (R² = 0.39). Considering the results of all four spatial regression models, the socio-demographic variables explained 88 % of the model variance for the COVID-19 cases and 72 % of the model variance for the COVID-19 deaths (Table 1).

Discussion
The spatial distribution of COVID-19 deaths and confirmed cases was examined in order to understand how the socio-demographic structure of a country can regulate the overall casualties caused by the novel coronavirus. The distribution of COVID-19 confirmed cases and deaths was found to be heterogeneous across Europe. This uneven distribution could be attributed to many factors, including demographic, climatic, cultural, or socio-economic differences among the countries. For both COVID-19 cases and deaths, the maximum records (actual values) were documented in the western European region (Spain, Italy, France, Germany, UK, Belgium, Netherlands), while the cases and deaths (actual values) were lower in the Eastern (Romania, Bulgaria, Greece, Estonia, Latvia, Lithuania) and Northern European regions (Norway, Finland, Sweden). However, this pattern is somewhat different when the proportion or density of COVID cases and deaths is taken into consideration. For instance, the case density (cases per 100,000 persons) was highest in Luxembourg, Belgium, Spain, and Ireland, and lowest in Bulgaria, Greece, and Slovakia. These figures suggest that the rate of COVID infection, which gives a more logical and reliable estimate than its absolute counterpart, should be taken into account for unbiased estimation and effective interpretation of results. Additionally, the uneven distribution (for both cases and deaths of COVID-19) in the European countries could be linked to the age of the population (old age and median age of the population). As can be seen in Fig. 2, the median age of the population in Italy and Spain is 46.5 and 43.9, respectively, and these two countries were badly affected by the COVID pandemic in terms of the number of cases and deaths. As of 11th July 2020, a total of 34,938 and 28,403 deaths had been recorded in Italy and Spain, respectively¹⁰.
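The difference between absolute counts and population-normalised rates can be made concrete with a short sketch. The helper name `rate_per_100k` and the two example regions are hypothetical and do not correspond to any country or figure in this study.

```python
def rate_per_100k(count, population):
    """Return the number of events per 100,000 persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population

# Hypothetical regions: (confirmed cases, total population)
regions = {
    "Region A": (10_000, 50_000_000),  # many cases, very large population
    "Region B": (2_000, 500_000),      # fewer cases, small population
}

for name, (cases, population) in regions.items():
    print(name, cases, rate_per_100k(cases, population))
```

Region A leads in absolute cases, but Region B has the far higher case density (400 versus 20 cases per 100,000 persons), which is the kind of rank reversal described above.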
The spatial association between the socio-demographic variables and COVID-19 cases and deaths was strongest in the central European region (Germany, Switzerland, Italy, Austria). All these countries have been badly affected in terms of the total number of cases and deaths caused by COVID-19. For the cases, a weak association between income and COVID-19 cases was evident in the western European countries (Portugal, Spain, Ireland). The same association was considerably stronger in the central European region. This suggests that income factors do not have a uniform and spatially stationary interaction with COVID cases. Several factors could be responsible for this uneven distribution of spatial association, including the age structure of the population, the ratio of the elderly population, the ratio of the dependent population, pre-existing health records, human mobility, and the socio-economic structure of the society. The intercept values were calculated for the two response variables.

Table 6. Overall summary of spatial regression models that indicates the linkages between the socio-demographic variables and total COVID-19 cases and deaths across Europe.

Using SLM, SER, PCR, PLS, and MLR, the individual and interaction effects of the demographic variables on COVID-19 cases and deaths were analyzed and reported. Of the 2 variables considered for COVID-19 cases, the income factor strongly regulated the COVID-19 cases across the European region. For COVID deaths, the income and total population factors were strongly correlated with the deaths and explained a substantial share of the model variance. The positive association between income/total population and COVID cases/deaths indicates that these two factors could be the key controlling variables that determine the overall casualties caused by this pandemic in the European countries. A similarly close association between socio-demographic factors and COVID-19 was observed in Wuhan, China (Wu, Chen, Cai et al., 2020).
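The idea of a spatially non-stationary association, which geographically weighted regression (GWR) is designed to capture, can be sketched as a separate weighted least-squares fit at each location. The following is a minimal illustration and not the authors' implementation: the function name `gwr_coefficients`, the Gaussian distance kernel, the fixed bandwidth, and the toy data are all assumptions.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local regression coefficients at every observation point.

    coords    : (n, 2) array of point coordinates
    X         : (n,) or (n, k) array of explanatory variables
    y         : (n,) array of the response variable
    bandwidth : scale of the Gaussian kernel (assumed fixed here)
    Returns an (n, k + 1) array; column 0 holds the local intercepts.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    betas = np.empty((len(y), design.shape[1]))
    for i, centre in enumerate(coords):
        dist = np.linalg.norm(coords - centre, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # nearer points weigh more
        xtw = design.T * w                          # X^T W without forming W
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Toy data (hypothetical): the slope of y on x grows from west to east.
coords = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]], dtype=float)
x = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 4.0])
y = (1.0 + 0.5 * coords[:, 0]) * x
local = gwr_coefficients(coords, x, y, bandwidth=1.5)
print(local[:, 1])  # local slopes, one per location
```

A single global regression would report one slope for the whole region, whereas the local slopes returned here differ from place to place, which is how a weak association in one group of countries and a strong one in another can coexist.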
Wu, Chen, Cai et al. (2020) found a close association between COVID-19 and pre-existing illnesses of patients, including Acute Respiratory Distress Syndrome (ARDS) and pneumonia, and stated that patients with existing respiratory illness were more susceptible to COVID-19. The psychological status of people, especially older people, is closely linked with the diagnosis of COVID-19. Therefore, a combination of effective psychological interventions, including reduced psychological pressure and behavioural practices that boost mental health, can be used to improve the psychological status of vulnerable communities. Aging adults (>65) with long-term illness who are incapable of household work were found to be highly vulnerable to COVID-19 (Lakhani, 2020). The proportion of COVID-19 deaths in Italy, the Netherlands, Spain, and France that occurred among the population aged >80 was 50 %, 58 %, 59 %, and 59 %, respectively (Medfod & Trias-Llimos, 2020). Likassa (2020) stated that the spatial distribution of COVID-19 cases was highly associated with the case-fatality rate, and that the linkage between these two variables was much stronger at older ages, reaching 8.0 % for patients aged 70-79 years and 14.8 % for patients aged >80 years. Likassa (2020) also stated that the high infection and death rates in China, Italy, Iran, and the USA could be linked with the spread of previous virus outbreaks. Lippi, Mattiuzzi, Sanchis-Gomar and Henry (2020) stated that three main determinants (male sex, age >60, and pre-existing comorbidities such as diabetes, hypertension, chronic respiratory diseases, cancer, and cardiovascular disorders) strongly determine the rate of COVID-19 death and infection.
These statistics signify the inherent connections between the socio-demographic composition and the overall COVID-19 deaths and cases reported so far in the European region (Almagro & Orane-hutchinson, 2020;Borjas, 2020;Dowd et al., 2020;Jia et al., 2020;Mollalo et al., 2020). Apart from the demographic factors, several climatic factors, including average temperature, minimum temperature, maximum temperature, rainfall, average humidity, wind speed, and air quality, have also regulated the spread and casualties of COVID-19 (Bashir et al., 2020;Ma et al., 2020). The availability of sufficient SARS-CoV-2 testing centers has also been found to be important for adopting control strategies and making decisions that minimize the impact of COVID-19 on the overall socio-ecological system (Rader et al., 2020). In addition, Kraemer et al. (2020) found that human mobility was the key factor that aggravated the spread of COVID-19 cases in China, as growth rates became stable or negative in areas where strong control measures were implemented and mandatorily imposed. However, mobility in other regions, where stringent regulations were not implemented, still poses a severe threat by transmitting the infection to the closest neighbours. Therefore, it has been suggested that controlling (inter)national migration, restricting population flows, modernizing the healthcare system by improving diagnosis and treatment capacity, and upgrading the public welfare system to make it fully functional in crisis situations could be the points of interest for effectively fighting situations like COVID-19 (Su et al., 2020).
Cities in the developing and developed world, including the UNESCO-defined creative cities, a designation created in 2004 to give maximum priority to creativity and sustainable urban development by achieving efficiency in all aspects, have been badly impacted by the outbreak of the COVID pandemic. Though the long-term impact of COVID-19 on the urban and city environment is challenging to predict, historical evidence suggests that the long-standing, inflexible, and outdated strategic plans of cities have always been reshaped by strong interventions, such as outbreaks of deadly viral diseases and natural calamities such as floods or earthquakes¹¹. Therefore, it is highly expected that in the COVID-19 recovery period, the concerned stakeholders, including government bodies, decision-makers, business leaders, city planners, land administrators, etc., will be compelled to rethink the viability of existing plans and find more comprehensive solutions that accelerate sustainability in city planning. There are a few specific areas where complete structural reforms can be made, such as promoting a healthier built environment, characterized by clean indoor air quality in both office and home environments. Healthier indoor environments can enhance a person's cognitive performance, which eventually boosts that person's logical and emotional intelligence. Therefore, a greater focus on nature-based landscape design, such as open spaces and greenery for meditation and exercise, green materials in living areas, proper ventilation systems, and the use of energy-efficient building materials, will be the need of the day.

Conclusion
In this study, the spatial association between the socio-demographic variables and COVID-19 cases and deaths was evaluated across Europe. The distribution of COVID-19 confirmed cases and deaths was found to be heterogeneous across Europe. This uneven distribution could be attributed to many factors, including demographic, climatic, cultural, or socio-economic differences among the countries. For both COVID-19 cases and deaths, the maximum records (actual values) were documented in the western European region (Spain, Italy, France, Germany, UK, Belgium, Netherlands), while the cases and deaths (actual values) were lower in the Eastern (Romania, Bulgaria, Greece, Estonia, Latvia, Lithuania) and Northern European regions (Norway, Finland, Sweden). This can be attributed to the socio-demographic composition of these countries, as Italy has the second-oldest population in the world and the oldest in Europe (23 % of the population aged 65 and above). The population composition of the other European countries that were badly affected by COVID-19, i.e., Spain, France, and the Netherlands, is also dominated by senior and old-age populations, thereby increasing the vulnerability to COVID-19 and to similar pandemics in the future. The case density (cases per 100,000 persons) was highest in Luxembourg, Belgium, Spain, and Ireland, and lowest in Bulgaria, Greece, and Slovakia. In comparison, the death intensity was highest in Belgium, Spain, and Italy, and lowest in Slovakia, Bulgaria, and Greece. These figures suggest that the rate of COVID infection and the rate of COVID recovery, which give a more logical and reliable estimate than their absolute counterparts, should be taken into account for unbiased estimation and effective interpretation of results.
To explain this spatially non-stationary distribution of COVID cases/deaths and to examine the spatial dependency among the observations, several spatial regression models, including GWR, OLS, SLM, SEM, etc., were fitted. COVID case and death counts were considered as dependent variables for regression modelling. All the explanatory variables produced statistically significant, locally varying associations with COVID cases/deaths. The strongest association was found for Germany, Austria, Italy, Spain, Luxembourg, and Croatia. A strong positive association between income/total population and COVID cases/deaths is found in this study, which indicates that these two factors could be the key controlling variables that determine the overall casualties caused by COVID-19 in European countries. In this study, the influence of the other controlling factors such as environmental pollution, socio-ecological status, climatic extremity, etc. has not been