1 Introduction

In December 2019, COVID-19 was first identified in Wuhan, China, and it rapidly spread across the globe. On March 11, 2020, the WHO (World Health Organization) declared a pandemic, and as of March 11, 2022, more than 6,000,000 deaths and almost 450,000,000 cases had been officially recorded worldwide. Today, the COVID-19 pandemic is one of the most critical disasters that humanity has faced, and it has affected countries physically, socially, and especially economically (United Nations, 2020). According to data released by the Ministry of Health in Turkey, COVID-19 appeared in Turkey at the beginning of March 2020 and then spread rapidly across the country through metropolitan cities such as Istanbul, Ankara, and Izmir. In particular, Istanbul, with its large population and high population density, became the Wuhan of Turkey (Bahçetepe 2021). COVID-19 is clearly transmitted easily across cities and regions, and researchers from different disciplines have investigated the reasons behind this phenomenon (Franch-Pardo et al. 2020).

Although the reasons for variation in the spread rate of the COVID-19 pandemic remain somewhat unclear, many studies around the world have found that geo-environmental factors (Coccia 2020), demographic characteristics (Mollalo et al. 2020; Arbel et al. 2021), and climatic conditions (Ma et al. 2020; Oto-Peralías 2020; Wu et al. 2020; Rios and Gianmoena 2021) can affect the severity and spread of COVID-19. Some researchers have also focused on the relationship between health and social geography (De Kadt et al. 2020; Gibson and Rush 2020; Kuupiel et al. 2020; Hierro and Maza 2022). Others have drawn interdisciplinary links between COVID-19 and social behavior (Allcott et al. 2020; Kuchler et al. 2020; Martinho 2021; Boumahdi et al. 2021; Cutrini and Salvati 2021; Florida and Mellander 2022).

While the COVID-19 pandemic has become a hot topic in the academic community, GIS (Geographic Information Systems) has also become a vital method for observing the spatial distribution of infectious diseases. GIS was already a helpful resource for preventing epidemics and improving health care (Mollalo et al. 2018, 2019; Xiong et al. 2020). Guan et al. (2020) first used GIS-based spatial statistics to examine the relationship between patients' demographic characteristics and the level of COVID-19. After these first geographic analyses provided a basis and methodology for others, many researchers reproduced them for other countries at different scales, such as the USA (Dong et al. 2020), South Korea (Rezaei et al. 2020), and Spain (Orea and Álvarez 2020). GIS has therefore become an essential tool for capturing, analyzing, and visualizing the COVID-19 epidemic. It supports decision-making and can help predict the onset of disease (Franch-Pardo et al. 2020).

The literature also contains some studies on the relationship between COVID-19 and spatial parameters in Turkey. Döker and Ocak (2020) discussed using GIS to monitor the geographic extent of the COVID-19 pandemic in Turkey. Zeren and Vildancı (2020) used hot spot analysis and the LISA model to understand the existing spatial pattern of the pandemic in Turkey. In addition, Zeren et al. (2021) evaluated virus transmission and mobility factors between sub-regions using the Global and Local Moran's I indexes. Moreover, the Istanbul Development Agency developed a vulnerability map of Istanbul using physical, economic, social, and environmental variables. However, no study in Turkey has evaluated the impact of spatial variables on the level of COVID-19 spread. Therefore, this study aims to understand and analyze the association between physical, natural, economic, demographic, and mobility factors and the spread of the pandemic in Turkey, within the scope of recent developments and information.

Following this purpose, Turkey and its 81 provinces (including both cities and rural areas) were chosen as the case area. The dataset was prepared at the province level from open-source websites. The dependent variable is COVID-19 incidence, as released by the Ministry of Health on 2 April 2021. This date was chosen because vaccination was not yet widely available in the community and new variants had not yet begun to dominate the pandemic. The explanatory variables consist of 20 demographic, economic, physical, natural, and mobility indicators. After data wrangling, the independent variables were examined for significance and multicollinearity through explanatory regression analysis. Finally, global and local regression models (OLS, SLM, SEM, and GWR) were employed in ArcGIS and GeoDa to analyze their statistical and spatial relationships. The results represent the first attempt to use regional geographic modeling in COVID-19 studies across Turkey and may provide policy-makers with valuable information for targeted intervention.

The paper comprises five main sections. Section two reviews the literature on place-based factors affecting the spread of the COVID-19 pandemic at different scales. Section three explains the research methodology step by step, including the data acquisition process and details of Moran's I Index, Explanatory Regression Analysis, Ordinary Least Squares, the Spatial Lag Model, the Spatial Error Model, and Geographically Weighted Regression. Section four presents the results of each analysis separately. The last section discusses the variables associated with the spread of COVID-19 in relation to the literature and concludes by highlighting the study's primary results, limitations, and potential for future studies.

2 Place-based factors affecting the propagation of COVID-19

As mentioned before, many researchers from different countries have concentrated on the influence of spatial variables on the pandemic level. One of the first papers, published on 28 February 2020, described the distribution of patients by province in China and their characteristics, such as age, gender, and symptoms (Guan et al. 2020). Following that, Chen et al. (2020) investigated the effect of migration between provinces in China; one of the study's notable results was a significant correlation between migration and patient numbers. Gross et al. (2020) and Zhang et al. (2020) also examined the ease of human mobility, the short distances between provinces, and virus propagation in China. Similarly, Hierro and Maza (2022) studied inter-district mobility (and other socioeconomic factors) in Madrid and its effect on disease propagation. Looking beyond inter-city dynamics, Bogoch et al. (2020) underlined the global spread potential by examining the impact of air passenger transport on the global dispersion of the COVID-19 pandemic. In short, all of these studies reported a strong correlation between mobility and virus transmission.

There are also some studies that have focused specifically on health geography topics. De Kadt et al. (2020) explored the difficulties of implementing health control measures in the context of poverty in South Africa. They prepared two maps. The first is a risk factor index based on six risk determinants that can be recognized as obstacles to practicing basic hygiene and social distancing: overcrowded living circumstances, distribution of hygiene and water usage, dependency on public health services, low accessibility to communication mechanisms, and reliance on public transportation. The second represents the vulnerability of neighborhoods based on their risk level. Also in South Africa, Gibson and Rush (2020) evaluated poverty, population density, and lack of infrastructure in relation to the vulnerability of settlements to COVID-19. At the international level, Padula and Davidson (2020) investigated the relationship between the number of nurses and the COVID-19 mortality ratio across countries, emphasizing a negative correlation between these factors. Similarly, Lakhani (2020) conducted spatial research in Queensland, Australia, pointing out a strong relationship between the difficulties people over 65 face in accessing health services and the disease's mortality rate. Finally, Florida and Mellander (2022) investigated place-based indicators and socioeconomic factors at the neighborhood scale in Sweden. They found that proximity to locations with a higher risk of infection, such as hospitals, was the most significant factor.

Other studies have built on interdisciplinary associations. Allcott et al. (2020) investigated the connections between COVID-19 cases, the community's negative attitude toward quarantine, and the dominant political party in each US state. The results show a clear relationship between areas with weaker social distancing and higher shares of Republican voters. Strong reactions against quarantine were also detected in areas with higher official numbers of COVID-19 patients. Moreover, Kuchler et al. (2020) analyzed the correlation between a social connectedness index based on Facebook friendship links and COVID-19 cases in two regions significantly affected by the disease. Their results suggest that such an index could help epidemiologists predict the spread of infectious disease.

The search for correlations between the physical environment, geography, socioeconomic factors, and COVID-19 is another motivation behind pandemic studies. Coccia (2020) analyzed the relationship between the geo-ecological and demographic characteristics of 55 Italian cities and the prevalence of COVID-19. For each city, he considered distance from the sea, latitude, population density, air pollution (PM10), and climate indicators (average temperature, relative humidity, and prevailing wind speed), and he found that faster diffusion of COVID-19 was linked to high levels of air pollution, certain weather conditions, and greater distance from the sea. Sridhar (2021) also set out to understand the impact of urbanization on COVID-19 prevalence in India, compiling socioeconomic factors at different levels for 600 districts. The results show that high rates of urbanization, higher workforce participation, and higher income levels contributed to increasing COVID-19 cases. Furthermore, while there is a strong positive correlation between urban poverty, slum areas, and the number of registered patients, there is a negative relationship between health services, open spaces, and COVID-19.

Similarly, Oto-Peralías (2020) worked with Spanish provinces, selecting several geographical and socioeconomic indicators to explain the distinct differences in the number of infected people between regions. The results show significant correlations between COVID-19 and both temperature and distance from Madrid, suggesting that higher temperatures and greater distance from Madrid reduce the contagion efficiency of COVID-19. Mollalo et al. (2020) also considered 35 explanatory determinants covering socioeconomic, behavioral, physical, natural, and demographic characteristics, and tested them through global and local spatial regression analysis. Their results show that determinants related to the physical environment do not substantially affect the number of COVID-19 patients. However, four indicators (average household income, income inequality, percentage of nurses, and percentage of black women) may explain much of the variation in disease rates across the United States.

Many scholars have continued to use global and local spatial regression techniques to test various place-based factors affecting the spread of COVID-19. Sannigrahi et al. (2020), Urban and Nakada (2021), Rahman et al. (2021), and Mansour et al. (2021) employed Ordinary Least Squares (OLS), the Spatial Lag Model (SLM), and the Spatial Error Model (SEM) as global models, and Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR) as local models. Sannigrahi et al. (2020) evaluated the spatial relationship between 28 sociodemographic variables and COVID-19 deaths and cases in 31 European countries. They found that total population, poverty, and income are key factors affecting the overall number of deaths caused by COVID-19. Urban and Nakada (2021) focused on 18 demographic and socio-environmental variables and compared the efficiency of regression models in explaining the spatial association between variables. Similarly, Rahman et al. (2021) analyzed 28 demographic, economic, built-environment, health, and facility-related factors for Bangladesh using OLS, SLM, SEM, and GWR, and identified four risk factors affecting COVID-19 incidence. Finally, Mansour et al. (2021) modeled the impact of 12 sociodemographic determinants on COVID-19 frequencies, underlining the effect of four variables and the success of MGWR in explaining spatial correlations.

In terms of economic factors, Cutrini and Salvati (2021) investigated the spatial distribution of COVID-19 in Italy in relation to selected global forces and economic drivers, using regression models and local spatial analysis. The results show that urban agglomerations, including large-scale industries and companies, significantly affect COVID-19 incidence in different regions. Another study, conducted by Boumahdi et al. (2021), clarifies the role of industrial clusters in the propagation of the virus. They employed spatial statistical techniques, such as exploratory analysis and spatial econometrics, to demonstrate the strong impact of industrial activity; moreover, they examined this issue over different periods, before and after the lockdown.

In addition, some scholars have focused on the relationship between demographic factors and COVID-19 cases within individual countries. Hamidi et al. (2020) used structural equation modeling to analyze the direct and indirect impacts of density on COVID-19 incidence in U.S. metropolitan counties. They highlighted that one of the most critical indicators of infection rates is metropolitan population size: larger metropolitan areas have higher infection and mortality rates. Arbel et al. (2021) evaluated the influence of population density and socioeconomic measures (such as the Gini index) of Israel's cities on COVID-19 infection. They highlighted that cities with high population density are more vulnerable to COVID-19 because of increased human interaction, and that urban policy-makers should enable and promote improved health infrastructure in denser cities. They also emphasized a moderate relationship between population density, city size, income level, and COVID-19.

Finally, some researchers have concentrated specifically on the influence of climate variables on the propagation of the disease. Sajadi et al. (2020) structured their study around temperature and humidity indicators, carrying out their investigation from a global perspective. Wang et al. (2020) evaluated the same topic at the national scale in China, observing that higher temperatures and humidity appeared to reduce the transmission of the virus, consistent with the behavior recorded for influenza transmission. Also in China, Ma et al. (2020) investigated the correlation between COVID-19 mortality and Wuhan's daily temperature range and humidity. This research was the first to associate reported deaths with climate indicators and was followed by various studies examining the correlation between climate and COVID-19 at global, national, and local levels.

To sum up, although the COVID-19 pandemic is still a riddle wrapped in an enigma, scientists from different disciplines have analyzed this phenomenon to understand which factors increase its transmission and mortality and to predict future scenarios (Table 1). Although these studies used different statistical methods, they sometimes reached the same conclusions; their results, though still preliminary, will become more comprehensive as more data are produced. Studies on COVID-19, which have increased day by day and been carried out in different parts of the world using different datasets, will undoubtedly continue to dominate the literature for a long time.

Table 1 Summary of spatial COVID-19 studies at different scales

3 Data and model development

The research methodology consists of five main steps (Fig. 1). The first stage is designing the dataset based on studies with a similar research scope identified in the literature review. Physical, economic, natural, demographic, and mobility indicators mentioned in these articles were combined and evaluated according to their relevance to the case area. The availability of data in online sources was also taken into account when determining the factors. The data cover the whole province, including urban and rural areas within the administrative borders. Data for certain factors, such as behavioral habits, humidity level, income level, percentage of people with chronic diseases, and the number of critical care units, are not available at the city level, so they were not included in the model.

Fig. 1 Research methodology

Consequently, 20 explanatory variables at the province scale were included in the model (Table 2). The Turkish Statistical Institute database, the statistics of the Ministry of Environment, Urbanization, and Climate Change, and the Social Security Institution were the fundamental sources for building the dataset. Since February 2021, the Ministry of Health has shared weekly COVID-19 incidence by province with the public. Therefore, the number of COVID-19 cases in provinces on 2 April 2021, the anniversary of sharing city-level data with the public, was selected as the dependent variable. The descriptive statistics of the dataset, including minimum, maximum, median, and mean values, standard deviations, and quartiles, are given in Table 3.

Table 2 Dependent and descriptive variables
Table 3 General dynamics of the dataset

In the second phase, GeoDa software was used to conduct the Moran's I Index and hot spot analyses. The spatial distribution of total COVID-19 patients at the city scale was tested using Moran's I Index, which measures the spatial autocorrelation of features based on both their locations and their attribute values. This method determines whether the observed pattern is clustered, dispersed, or random, and it tests the null hypothesis of spatial randomness by estimating a z score and p value. When the z score or p value indicates statistical significance, a positive Moran's I value indicates a tendency toward clustering, whereas a negative value indicates a tendency toward dispersion (Getis and Ord, 2010). For this analysis, a spatial weight matrix was built using Queen contiguity. The order of contiguity was set to one, meaning that a feature counts as a neighbor only if it is in direct contact (sharing a border or a vertex). Following the Moran's I Index results, Univariate Local Moran's I, a spatial analysis and mapping method that reveals local clusters of the phenomenon, was applied to COVID-19 incidence.
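The global Moran's I test described above can be sketched in plain Python. This is a minimal illustration, not GeoDa's implementation: it assumes row-standardized first-order queen contiguity weights (GeoDa row-standardizes weights for this statistic), a hypothetical 4 × 4 grid of cells standing in for provinces, and a permutation-based pseudo p value instead of the analytical z score. The helper names `queen_grid`, `moran_i`, and `moran_permutation_test` are hypothetical.

```python
import random


def queen_grid(rows, cols):
    """First-order queen contiguity on a rows x cols grid of cells.

    Two cells are neighbors if they share an edge or a corner (order of contiguity = 1).
    Returns a dict: cell index -> list of neighbor indices.
    """
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            neighbors[r * cols + c] = [
                rr * cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors


def moran_i(values, neighbors):
    """Global Moran's I with row-standardized weights (w_ij = 1 / number of neighbors of i).

    With row standardization the sum of all weights S0 equals n, so
    I = sum_i z_i * mean(z_j for j in N(i)) / sum_i z_i^2, where z = value - mean.
    Assumes every unit has at least one neighbor.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    numerator = sum(z[i] * sum(z[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n))
    return numerator / sum(d * d for d in z)


def moran_permutation_test(values, neighbors, permutations=999, seed=1):
    """Pseudo p value for positive autocorrelation from random relabelings of the values."""
    rng = random.Random(seed)
    observed = moran_i(values, neighbors)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if moran_i(shuffled, neighbors) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical data: high "incidence" in the two western columns, low in the two eastern ones.
neighbors = queen_grid(4, 4)
values = [10, 10, 1, 1] * 4
observed_i, p_value = moran_permutation_test(values, neighbors)
expected_i = -1 / (len(values) - 1)  # E[I] under spatial randomness
print(f"I = {observed_i:.3f}, E[I] = {expected_i:.3f}, pseudo p = {p_value:.3f}")
```

An observed I well above E[I] = -1/(n - 1) with a small pseudo p value is what "significant clustering" means in this setting; the permutation approach avoids relying on a normality assumption for the z score.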

Regression analysis was then used to identify the factors behind the variation in the hot spots. Regression analysis provides a model that examines and explores the relationships between a dependent variable and explanatory variables. In other words, it is a statistical process for estimating the relationships between variables, and it helps explain the factors behind observed spatial patterns (Chatterjee and Hadi, 2015). To make this understanding and prediction process more manageable, explanatory regression analysis was used in the third stage, before ordinary least squares regression. This method tests all combinations of the candidate explanatory variables and summarizes their multicollinearity and significance, which guides the construction of a reliable model.
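The combination search described above can be illustrated with a simplified sketch in the spirit of ArcGIS's Exploratory Regression tool; it is not that tool's implementation. It fits OLS (with an intercept) to every subset of up to `max_size` candidate variables, keeps models that pass an adjusted R2 threshold with all slope coefficients significant, and reports the share of models in which each variable is significant. For brevity it assumes a normal approximation (|t| > 1.96) instead of exact t-distribution p values and omits the VIF, Jarque–Bera, and residual autocorrelation checks. The function names and the synthetic variables `a` and `b` are hypothetical.

```python
import math
import random
from itertools import combinations


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, columns):
    """OLS with an intercept; returns (coefficients, t statistics, adjusted R2)."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx_inv = invert([[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)])
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(beta[a] * X[i][a] for a in range(k))) ** 2 for i in range(n))
    y_mean = sum(y) / n
    tss = sum((v - y_mean) ** 2 for v in y)
    sigma2 = rss / (n - k)
    t_stats = [beta[a] / math.sqrt(sigma2 * xtx_inv[a][a]) for a in range(k)]
    return beta, t_stats, 1 - sigma2 / (tss / (n - 1))


def explore(y, candidates, max_size=2, min_adj_r2=0.5, t_crit=1.96):
    """Fit every subset of up to max_size candidates; return passing models and % significant."""
    passing = []
    significant = {name: 0 for name in candidates}
    appearances = {name: 0 for name in candidates}
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            _, t_stats, adj_r2 = ols(y, [candidates[name] for name in combo])
            slope_t = t_stats[1:]  # skip the intercept
            for name, t in zip(combo, slope_t):
                appearances[name] += 1
                significant[name] += abs(t) > t_crit
            if adj_r2 >= min_adj_r2 and all(abs(t) > t_crit for t in slope_t):
                passing.append((combo, round(adj_r2, 3)))
    pct_significant = {name: 100 * significant[name] / appearances[name] for name in candidates}
    return passing, pct_significant


# Hypothetical data: y depends on "a" but not on "b".
rng = random.Random(0)
a = [rng.uniform(0, 10) for _ in range(50)]
b = [rng.uniform(0, 10) for _ in range(50)]
y = [2 * ai + rng.gauss(0, 1) for ai in a]
passing, pct_significant = explore(y, {"a": a, "b": b})
print(passing)
print(pct_significant)
```

The "% significant" summary mirrors the idea of a variable-significance table: a variable that stays significant across almost every model it enters is a strong candidate to keep.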

In the fourth stage, models were produced in ArcGIS to show each variable's effect on COVID-19 incidence. Ordinary Least Squares (OLS) is a non-spatial regression model, but it is a suitable starting point for all spatial regression techniques. It provides a global model: a single equation relating the dependent variable to all of the explanatory variables being studied (Mitchell 2005).
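For reference, the global OLS model can be written in its standard textbook form. Here y_i is COVID-19 incidence in province i, x_{ik} are the p explanatory variables, and n = 81 provinces; this notation is introduced for illustration and is not taken from the paper's tables.

```latex
\begin{aligned}
y_i &= \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n, \\
\hat{\boldsymbol{\beta}} &= \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}, \qquad
R^2_{\text{adj}} = 1 - \frac{\mathrm{RSS}/(n - p - 1)}{\mathrm{TSS}/(n - 1)}.
\end{aligned}
```

A single coefficient vector applies to every province, which is what makes OLS a global model.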

OLS assumes that province-level observations are independent, so it does not account for spatial dependence; it also assumes that relationships are homogeneous and do not vary across space. However, previous studies of COVID-19 dissemination show that the explanatory factors are spatially connected with the dependent variable. Therefore, the Spatial Lag Model (SLM) and Spatial Error Model (SEM) were also used along with OLS. Because these global models still cannot account for spatial non-stationarity, Geographically Weighted Regression (GWR) was employed in the last step to calibrate the model according to each feature's location. GWR, introduced by Brunsdon et al. (1996), is a local regression model that allows relationships between variables to vary across the study area. It estimates one equation for each feature, so the coefficients of the variables can differ by location (Wheeler and Paez 2010). The GWR model was run in ArcGIS with an adaptive kernel type and AICc-based bandwidth selection.
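The local calibration that distinguishes GWR from OLS can be sketched as follows: for each location i, the model y = β0(u_i, v_i) + β1(u_i, v_i)·x + ε is fitted by weighted least squares, with weights that decay with distance from i. This is a minimal single-predictor illustration, not ArcGIS's implementation. It assumes a bisquare kernel whose adaptive bandwidth is the distance to the k-th nearest neighbor, fixes k by hand instead of selecting the bandwidth by AICc, and uses hypothetical one-dimensional coordinates in which the true slope changes halfway across the study area.

```python
import math


def bisquare_weights(coords, i, k):
    """Adaptive bisquare kernel centred on location i (assumed kernel shape).

    The bandwidth is the distance from i to its k-th nearest other location
    (index 0 of the sorted distances is i itself), so dense areas get narrow
    kernels and sparse areas get wide ones.
    """
    distances = [math.dist(coords[i], c) for c in coords]
    bandwidth = sorted(distances)[min(k, len(distances) - 1)]
    return [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0 for d in distances]


def weighted_simple_ols(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


def gwr(coords, x, y, k):
    """One local regression per location, so coefficients can vary across space."""
    return [weighted_simple_ols(x, y, bisquare_weights(coords, i, k)) for i in range(len(coords))]


# Hypothetical data: 20 locations on a west-east line; slope is 1 in the west, 3 in the east.
coords = [(float(j), 0.0) for j in range(20)]
x = [(j * 37) % 11 + 0.1 * j for j in range(20)]
y = [xi + 2 if j < 10 else 3 * xi - 5 for j, xi in enumerate(x)]
local_coefficients = gwr(coords, x, y, k=5)
print(round(local_coefficients[0][1], 6), round(local_coefficients[19][1], 6))
```

A single OLS fit would average the two regimes into one slope; the local fits recover the western and eastern slopes separately, which is the kind of spatial variation mapped from GWR coefficients.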

In short, this study adopted a holistic, top-down research approach. While the effect of the factors on the overall distribution of COVID-19 was measured with OLS, their effect across space was examined with SLM, SEM, and GWR. The global and local models are designed to complement one another. Finally, the four models are compared by their R2 and AICc values to assess how well each explains COVID-19 incidence rates across Turkey.
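The AICc used to compare the models can be computed from the residual sum of squares. The sketch below assumes the small-sample corrected AIC in the form commonly used for GWR model comparison, where the effective number of parameters tr(S) equals the number of coefficients (intercept included) for OLS and the trace of the hat matrix for GWR. The RSS, TSS, and parameter counts in the example are hypothetical, not the paper's results.

```python
import math


def aicc(rss, n, effective_params):
    """Corrected Akaike Information Criterion for a Gaussian regression model.

    AICc = 2n ln(sigma) + n ln(2 pi) + n (n + tr(S)) / (n - 2 - tr(S)),
    with sigma = sqrt(RSS / n) (maximum-likelihood estimate of the error s.d.).
    For OLS, tr(S) is the number of coefficients including the intercept;
    for GWR it is the trace of the hat matrix.
    """
    sigma = math.sqrt(rss / n)
    return (2 * n * math.log(sigma) + n * math.log(2 * math.pi)
            + n * (n + effective_params) / (n - 2 - effective_params))


def r_squared(rss, tss):
    """Share of total variation explained by the model."""
    return 1 - rss / tss


# Hypothetical comparison for n = 81 provinces (illustrative numbers only).
n, tss = 81, 1000.0
models = {"global (4 parameters)": (100.0, 4), "local (tr(S) = 9)": (60.0, 9)}
for name, (rss, params) in models.items():
    print(f"{name}: R2 = {r_squared(rss, tss):.3f}, AICc = {aicc(rss, n, params):.2f}")
```

Unlike R2, which never decreases when parameters are added, AICc penalizes the extra effective parameters of a local model, so a lower AICc together with a higher R2 is the stronger evidence that a model fits better.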

4 Results

Before evaluating the models, the distribution of COVID-19 patients in Turkey was mapped based on the data released on 2 April 2021 (Fig. 2). According to the data, Istanbul, Ankara, Izmir, Konya, Samsun, and Bursa were the most affected by the pandemic, whereas Hakkari, Sirnak, Tunceli, Bitlis, Siirt, and Ardahan had the fewest COVID-19 cases over the same period.

Fig. 2 The distribution of COVID-19 patients per city on 2 April 2021

4.1 Spatial autocorrelation and Moran’s I

The distribution of the virus was tested statistically using Moran's I index. A weight matrix was created based on first-order Queen contiguity, and its descriptive statistics are provided in Table 4. Univariate Moran's I was then computed using this matrix.
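First-order queen contiguity, and the kind of summary reported for a weight matrix, can be sketched on toy polygons. This assumes polygons are given as lists of vertex coordinates that match exactly where boundaries meet; real province boundaries usually need a snapping tolerance or a topology-aware tool such as GeoDa's Weights Manager. The polygon names, coordinates, and the choice of summary statistics (minimum, maximum, mean, and median number of neighbors) are hypothetical illustrations.

```python
from statistics import mean, median


def queen_neighbors(polygons):
    """Two polygons are first-order queen neighbors if they share at least one vertex.

    polygons: dict name -> list of (x, y) vertices. Returns dict name -> sorted neighbor names.
    """
    vertex_sets = {name: set(vertices) for name, vertices in polygons.items()}
    names = list(polygons)
    neighbors = {name: [] for name in names}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if vertex_sets[a] & vertex_sets[b]:  # a shared corner is enough (rook would need an edge)
                neighbors[a].append(b)
                neighbors[b].append(a)
    return {name: sorted(nbrs) for name, nbrs in neighbors.items()}


def weight_summary(neighbors):
    """Descriptive statistics of the number of neighbors per unit."""
    counts = [len(nbrs) for nbrs in neighbors.values()]
    return {"min": min(counts), "max": max(counts), "mean": mean(counts), "median": median(counts)}


# Hypothetical unit squares: A-D form a 2 x 2 block meeting at (1, 1); E is isolated.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(0, 1), (1, 1), (1, 2), (0, 2)],
    "D": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "E": [(5, 0), (6, 0), (6, 1), (5, 1)],
}
neighbors = queen_neighbors(polygons)
print(neighbors["A"], weight_summary(neighbors))
```

A and D touch only at a corner, so they are queen neighbors but would not be rook neighbors; an isolated unit such as E would receive no neighbors and would need special handling in Moran's I.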

Table 4 Descriptive statistics of the weight matrix

The results show that Moran's I is 0.556 and that the pattern is significantly clustered at the 0.01 significance level (Fig. 3). In other words, there is less than a 1% likelihood that this clustered pattern could be the result of random chance. Examining the statistically significant clusters in detail through Univariate Local Moran's I reveals two hot spots and two cold spots. The first hot spot is located in northwestern Turkey and includes Istanbul, Tekirdag, Kirklareli, Edirne, Canakkale, Bursa, and Kocaeli. This result can be explained by the presence of Istanbul, which dominates the Marmara Region for many reasons. The second hot spot is formed by six provinces in the Black Sea Region: Samsun, Corum, Sinop, Amasya, Tokat, and Ordu. As for the cold spots, the first consists of eleven provinces in eastern Turkey (Sanliurfa, Diyarbakir, Mardin, Batman, Mus, Siirt, Bitlis, Sirnak, Hakkari, Van, and Agri), while the second includes Denizli and Burdur in western Turkey. Although the first hot spot can be attributed partly to the presence of Istanbul, explaining all of these clusters is one of the main aims of this study, and the analyses that follow attempt such an explanation.
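Hot and cold spots of this kind correspond to the High-High and Low-Low quadrants of the Local Moran's I (LISA) analysis. A minimal sketch of the local statistic and the quadrant labels follows. It assumes row-standardized weights and a hypothetical six-unit chain of neighbors, and it omits the conditional permutation test that GeoDa uses to decide which units are statistically significant, so here every unit receives a quadrant label regardless of significance.

```python
def local_moran(values, neighbors):
    """Local Moran's I_i = (z_i / m2) * mean(z_j for j in N(i)), with row-standardized weights.

    Also returns the Moran scatterplot quadrant: "HH" (hot spot), "LL" (cold spot),
    "HL" or "LH" (spatial outliers). Zero deviations are labeled "L" for simplicity.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    results = []
    for i in range(n):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])  # spatial lag of z at i
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((z[i] * lag / m2, quadrant))
    return results


# Hypothetical chain of six units: a high-incidence group next to a low-incidence group.
values = [9, 8, 9, 1, 2, 1]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in range(len(values))}
for i, (local_i, quadrant) in enumerate(local_moran(values, neighbors)):
    print(i, round(local_i, 3), quadrant)
```

Units at the boundary between the two groups get negative local values and "HL"/"LH" labels, which is why GeoDa's significance filtering matters before interpreting a unit as part of a hot or cold spot.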

Fig. 3 Results of Moran's I

4.2 Explanatory regression

As mentioned before, the explanatory regression method examines all combinations of the candidate explanatory variables. For the explanatory regression, the minimum acceptable R2 was set to 0.5, the maximum VIF value to 7.5, and the minimum acceptable p values for the Jarque–Bera and spatial autocorrelation tests to 0.05. This analysis also reports on multicollinearity and the significance of variables, which helps build a reliable equation. The summary of multicollinearity between variables (Table 5) shows that 11 factors explain overlapping parts of the variation, which reduces the model's reliability:

  • Young population (D1), elderly population (D2), active population (D3), population density (D4), and foreign immigrants (D6) in the demographic variables,

  • Active insured people (E9), registered unemployment (E10), and industry sector (E11) in the economic variables,

  • Health capacity (PY12) and the hospital capacity (PY13) in the physical factors, and

  • Car dependency (M19) in the mobility variables.
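The VIF threshold used in this step can be illustrated with a short sketch. It relies on the standard identity VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other explanatory variables; equivalently, VIF_j is the j-th diagonal element of the inverse correlation matrix. The three variables are hypothetical: `b` is built as roughly twice `a`, mimicking two indicators that explain the same part of the equation, while `c` is largely unrelated.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    corr = [[correlation(u, v) for v in columns] for u in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical explanatory variables.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1, 19.9]  # roughly 2 * a
c = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
for name, value in zip("abc", vif([a, b, c])):
    status = "above 7.5" if value > 7.5 else "acceptable"
    print(f"{name}: VIF = {value:.1f} ({status})")
```

Both `a` and `b` exceed the 7.5 cutoff because each is almost perfectly predicted by the other, which is why only one indicator from such a group is normally kept.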

Table 5 Summary of multicollinearity

When choosing among these 11 indicators, it is essential to know how, and how strongly, each factor contributes to the model; Table 6 summarizes the significance of the variables. According to the results, population density (D4) and health capacity (PY12) score highest among the variables. Population density is significant in 99.97% of the models in which it appears, with a positive coefficient in 100% of them; health capacity is significant in 99.58% of its models, also with a positive coefficient in 100% of them. Selecting these two factors is therefore the best option for ensuring the accuracy of the resulting model. After keeping these two and dropping the other nine collinear indicators, Ordinary Least Squares (OLS) regression was run with 11 variables in total (the 9 variables without multicollinearity plus the 2 selected ones).

Table 6 Summary of variable significance

4.3 Global models (OLS, SEM, and SLM)

Ordinary Least Squares regression was the first step in analyzing the effects of the selected indicators behind the clustering of COVID-19 incidence, before Geographically Weighted Regression was applied. According to the results of the first model (Table 7), only 5 of the 11 variables are statistically significant, based on their probability values, in explaining the distribution of COVID-19 patients: population density (D4), net migration rate (D7), GDP per capita (E8), health capacity (PY12), and annual temperature (N17). Before the final OLS stage, however, these five indicators were re-evaluated according to their significance in the explanatory regression. According to Table 6, GDP per capita (E8) is significant in only 25.74% of the models (with a negative coefficient in 75.37%), and net migration rate (D7) in only 17.50% (with a positive coefficient in 61.79%). These factors were therefore excluded from the final OLS model, because their low significance could undermine the model's accuracy. The final OLS regression was thus run with three factors: population density (D4), health capacity measured as the number of hospitals per 100,000 people (PY12), and annual temperature (N17) (Table 8). According to their probability values, all three variables are statistically significant. All VIF (Variance Inflation Factor) values are below 7.5, meaning that there is no problematic multicollinearity and each variable explains a different part of the variation. The Jarque–Bera test yields 0.059301, higher than 0.05, meaning that the residuals do not deviate significantly from normality, which suggests that the model predictions are not biased. Finally, the adjusted R-squared value is 0.975890, so the model explains about 98% of the variation. With the model verified, population density (D4) is the factor most strongly associated with the propagation of COVID-19 (99.97%), followed by health capacity (PY12) at 98.95%. Annual temperature (N17), on the other hand, is negatively correlated with the dependent variable: it is significant in 63.46% of the models, with a negative coefficient in 96.3% of them. In short, about 98% of the variation in COVID-19 incidence across Turkey is associated with the model's three factors.
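The Jarque–Bera check on the OLS residuals can be sketched as follows. It assumes the standard form of the statistic, JB = n/6 · (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the sample kurtosis, and uses the fact that under normality JB is asymptotically chi-square with 2 degrees of freedom, whose survival function is exp(-x/2). A large p value means the test does not detect a departure from normality; the residuals are hypothetical.

```python
import math


def jarque_bera(residuals):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p value."""
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((e - m) ** 2 for e in residuals) / n  # biased (moment) estimators
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-square with 2 degrees of freedom
    return jb, p_value


# Hypothetical, symmetric residuals.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
jb, p = jarque_bera(residuals)
print(f"JB = {jb:.4f}, p = {p:.4f}")
```

The asymptotic p value is only a rough guide for small samples, but with 81 provinces the chi-square approximation is the usual basis for the reported test.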

Table 7 Summary of OLS results, first model
Table 8 Summary of OLS results, final model

Examining the spatial distribution of the error terms (Fig. 4) shows that their Moran's I value is positive (0.146), which means that the residuals tend to cluster spatially rather than being randomly distributed. This indicates the importance of geography; therefore, SLM and SEM models were produced to incorporate spatial dependence and improve on the global OLS model (Table 9).

Fig. 4 The distribution of error terms

Table 9 Results of the SLM and SEM models

According to the results, the three factors (population density, health capacity, and annual temperature) remain significant in explaining COVID-19 incidence rates across Turkey. In addition, the spatial autoregressive coefficients of the models (Rho and Lambda) are statistically significant at confidence levels between 90% and 95%.
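The role of Rho and Lambda is clearest in the standard specifications of the two models, written here in matrix form. W is the spatial weight matrix (assumed here to be the row-standardized first-order queen contiguity matrix used for Moran's I), y is the vector of COVID-19 incidence, and X holds the explanatory variables; this notation is introduced for illustration.

```latex
\begin{aligned}
\text{SLM:}\quad & \mathbf{y} = \rho\,\mathbf{W}\mathbf{y} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \\
\text{SEM:}\quad & \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}, \qquad
\mathbf{u} = \lambda\,\mathbf{W}\mathbf{u} + \boldsymbol{\varepsilon}, \qquad
\boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \sigma^{2}\mathbf{I}\right).
\end{aligned}
```

A significant ρ means that incidence in a province depends on incidence in its neighbors, while a significant λ means that the unexplained part of incidence is spatially correlated. Setting ρ = 0 or λ = 0 recovers the OLS model.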

Comparing the models by their R2 and AICc values, Table 10 shows that both SLM and SEM have higher R2 values and lower AICc values than OLS. The SEM model provides the highest R2 value among the global models, indicating that it is the best option compared with OLS and SLM. However, the modeling of COVID-19 incidence rates in Turkey might be improved if the model were calibrated locally rather than globally. Therefore, GWR was employed to model COVID-19 incidence at a local scale.

Table 10 Comparison of OLS, SLM, SEM, and GWR models

4.4 Geographically weighted regression (GWR)

After establishing a reliable model through OLS, SLM, and SEM, GWR was used to analyze whether the effects of the variables differ by location, and each factor was evaluated separately. As mentioned before, the highest R2 value among the global models comes from SEM; this value increased from 0.979 in SEM to 0.988 in the GWR model (Table 10). The AICc value also decreased to its lowest value, 1393.33, in GWR, compared with the global models. This result shows that the GWR model explains 98.8% of the variation in COVID-19 incidence across Turkey and is the most effective option in this context.

The spatial distribution of the GWR coefficient values is shown in Fig. 5. The results show that population density (D4), annual temperature (N17), and health capacity (PY12) have broadly similar patterns. According to Fig. 5, population density is essential in explaining COVID-19 incidence in the western part of Turkey. This can be related to the highly populated metropolitan cities in the west, such as Istanbul, Kocaeli, Bursa, Izmir, Ankara, and Konya, which also had higher numbers of cases. The situation is similar for health capacity: in northwestern Turkey, health capacity influences COVID-19 incidence positively. As for annual temperature, the OLS regression had already indicated a negative correlation with COVID-19 incidence. Similarly, the map shows that annual temperature is an influential variable in the western part of Turkey but less influential in the country's northeastern region.

Fig. 5 Spatial distribution of coefficient values

Figure 6 shows the local R2 values of the GWR model. Provinces in the western part of Turkey have higher values than those in the east, although the values increase again toward the east beyond a certain region. Because GWR estimates the local R2 value of each province from its surrounding provinces, the lowest R2 values occur in central provinces that are hot or cold spots for neither the dependent nor the independent variables. This pattern can therefore be explained through the hot and cold spots.

Fig. 6 Spatial distribution of the local R2 values of the GWR model
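Local R2 in GWR measures how well the model calibrated at a province fits the observations around it, weighted by the same kernel. The Python sketch below uses one common formulation, 1 − RSS_w/TSS_w with geographically weighted sums of squares. Implementations differ in whether the total sum of squares is taken around the weighted or the global mean and in which fitted values enter the residual sum, so the choices here (weighted mean, predictions from the local coefficients of the province) are assumptions. The function name is hypothetical, and the weights and local coefficients are assumed to come from a GWR calibration.

```python
def local_r2(weights, x, y, b0, b1):
    """Geographically weighted R2 at one location.

    weights: kernel weights of all observations relative to the location
    b0, b1:  local intercept and slope estimated at that location
    """
    sw = sum(weights)
    y_bar_w = sum(w * yj for w, yj in zip(weights, y)) / sw  # weighted mean of y
    rss_w = sum(w * (yj - (b0 + b1 * xj)) ** 2
                for w, xj, yj in zip(weights, x, y))
    tss_w = sum(w * (yj - y_bar_w) ** 2 for w, yj in zip(weights, y))
    return 1.0 - rss_w / tss_w

# Hypothetical check: with equal weights and the OLS coefficients of this
# toy data (b0 = 0.0, b1 = 1.9), local R2 reduces to the ordinary R2.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
print(local_r2([1, 1, 1, 1], x, y, 0.0, 1.9))
```

Observations with near-zero weight barely affect the result, which is why a province's local R2 depends mainly on how well the model fits its neighbours rather than the whole country.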

5 Discussion

The COVID-19 Pandemic is one of the most dangerous disasters of the twenty-first century, and it has affected countries physically, socially, and especially economically. Although its causes and effects are still unclear, researchers from many disciplines are trying to shed light on the matter by identifying the leading indicators behind the spread of the disease. Similarly, within the scope of these recent developments, this study examines the correlation between physical, natural, economic, demographic, and mobility factors and the spread of the pandemic in Turkey. To this end, Turkey's 81 provinces were chosen as the case area and investigated using spatial/non-spatial and global/local regression methods: Explanatory Regression, Ordinary Least Squares, Spatial Lag Model, Spatial Error Model, and Geographically Weighted Regression.

Three of the twenty factors, namely population density, health capacity, and annual temperature, are highly correlated with the spread of COVID-19 in Turkey. Arbel et al. (2021) emphasized that high population density increases the vulnerability of cities to COVID-19, and Florida and Mellander (2022) underlined that the presence of hospitals increases the risk of infection through interaction between people. Many studies have also found a negative correlation between temperature and the spread of COVID-19 (Wang et al. 2020; Ma et al. 2020; Rios and Gianmoena 2021). Given these findings in the literature, the statistical significance of these three factors in the model is a plausible result.

On the other hand, many factors that the literature identifies as influential in the spread of the virus were not statistically significant according to the OLS results. Hierro and Maza (2022) and Chen et al. (2020) had already underlined the influential role of migration, which is related to human mobility, in the spatial distribution of COVID-19. In addition, most studies focused on international mobility between countries and underlined a positive correlation between air traffic and COVID-19. From a socioeconomic perspective, GDP per capita, as a global variable reflecting income level, affects the dissemination of the virus (Cutrini and Salvati 2021), while Gibson and Rush (2020) evaluated household size in the context of social distancing and argued that high density in informal settlements could increase vulnerability to the virus. Sridhar (2021) examined the relationship between the urbanization level of Indian cities and the spread of the virus and highlighted both positive and negative correlations for selected physical factors such as open spaces. Coccia (2020) investigated the impact of high air pollution and specific meteorological conditions on COVID-19 in different regions of Italy. Despite this supporting evidence in the literature, the absence of a statistical relationship between these factors and COVID-19 in the current study may be explained by Turkey's internal dynamics.

According to the GWR results, the impact of these factors also varies spatially, especially in the West–East direction. Population density is an influential factor in explaining COVID-19 incidence in the western provinces, especially Istanbul, Kocaeli, Bursa, Tekirdag, Kirklareli, and Edirne. Health capacity shows a similar pattern. This can be attributed to the concentration of metropolitan cities in the west compared with the east. These cities have the highest values of COVID-19 incidence, population density, and health capacity; therefore, they dominate the equation and determine the direction of the correlation. The last factor, annual temperature, shows the same spatial pattern as the other variables but in the opposite direction: it is an influential variable in the western part of Turkey and less influential in the country's north-eastern region. In brief, only 3 of the 20 factors are statistically associated with the spread of COVID-19 in Turkey. While the effects of these three factors differ spatially, the metropolitan cities, especially Istanbul, Kocaeli, İzmir, and Ankara, have the greatest influence on the equation. Since each variable has different dynamics depending on location, the direction of the correlation varies; however, the correlation generally follows the East–West axis of Turkey.

This study also offers an opportunity to compare different models and their performance in evaluating the role of place-based factors in COVID-19 incidence. When the three global models (OLS, SLM, and SEM) and the local model (GWR) are compared, SEM is the best of the global models, with a higher R2 (0.979) and a lower AICc value (1425.06), whereas GWR achieves the highest R2 (0.988) and the lowest AICc value (1393.339) overall. On the other hand, the high R2 values of all models stand out. This may be explained by the distribution of the variables in the data set and their relative influence. In future studies, the analyses could be repeated with normalized data and the resulting R2 values compared.
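The comparison above rests on differences in AICc. A small Python sketch makes the ranking explicit using the AICc values reported in this study for SEM and GWR. The threshold of 3 AICc units is a rule of thumb commonly used in the GWR literature to judge whether two models differ meaningfully, not a criterion stated by the authors, and the comparison assumes that both AICc values were computed for the same dependent variable and observations. The function name is hypothetical.

```python
def compare_by_aicc(aicc_by_model, threshold=3.0):
    """Rank models by AICc (lower is better).

    Returns the best model's name, its AICc gap to the runner-up, and
    whether that gap exceeds the rule-of-thumb threshold (assumed 3).
    """
    ranked = sorted(aicc_by_model.items(), key=lambda item: item[1])
    best_name, best_aicc = ranked[0]
    runner_up_aicc = ranked[1][1]
    gap = runner_up_aicc - best_aicc
    return best_name, gap, gap > threshold

# AICc values reported in the text for the best global model and for GWR.
reported = {"SEM": 1425.06, "GWR": 1393.339}
best, gap, clearly_better = compare_by_aicc(reported)
print(best, round(gap, 3), clearly_better)  # GWR 31.721 True
```

A gap of roughly 31.7 units is well above the threshold, which is consistent with the conclusion that GWR fits better. R2 is not used for the ranking here because, unlike AICc, it does not penalize the additional effective parameters of a local model.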

Although OLS, SLM, SEM, and GWR can explain the correlation between the dependent and explanatory variables at the global and local scales, the fundamental limitation of this study was data availability. Although the Ministry of Health began sharing weekly COVID-19 case numbers at the city scale after February 2021, data on mortality and ventilation cases, which are used in correlation studies in the literature, remain unclear. This limits the reliability of the study. In addition, as with COVID-19 mortality and ventilation rates, many explanatory factors highlighted in the literature, such as behavioral habits, humidity level, and income level, are not available at the city level and therefore could not be included in the model.

6 Conclusion

In conclusion, despite the difficulty of obtaining data, the study offers several notable results. Three factors, population density, health capacity, and annual temperature, are statistically significant in explaining the spread of COVID-19 across Turkey's provinces. While the effects of these factors differ spatially, the correlation is generally structured along the East–West axis of Turkey. Moreover, although all of the global and local methods explain the correlation between the dependent and explanatory variables, GWR is the best option among them because it has the highest R2 and lowest AICc values. However, the findings are shaped by Turkey's internal dynamics and may vary under different place-based conditions. These results provide a basis for future studies and fill an important gap in the literature. Similar studies can be developed using new datasets and different statistical methodologies, spatial or non-spatial. In this way, urban policymakers can develop appropriate approaches toward future pandemics, including COVID-19.