Spatiotemporal clustering patterns and sociodemographic determinants of COVID-19 (SARS-CoV-2) infections in Helsinki, Finland

This study aims to elucidate the variations in spatiotemporal patterns and sociodemographic determinants of SARS-CoV-2 infections in Helsinki, Finland. Global and local spatial autocorrelation were inspected with Moran's I and LISA statistics, and the Getis-Ord Gi* statistic was used to identify hot spot areas. Space-time scan statistics were used to detect clusters of high relative risk, and regression models were implemented to identify sociodemographic determinants of the clusters. The findings revealed the presence of spatial autocorrelation and clustering of COVID-19 cases. High-high clusters and high relative risk areas emerged primarily in Helsinki's eastern neighborhoods, which are socioeconomically vulnerable, with a few exceptions revealing local outbreaks in other areas. The variation in COVID-19 rates was largely explained by median income and the number of foreign citizens in the population. Furthermore, the use of multiple spatiotemporal analysis methods is recommended to gain deeper insights into the complex spatiotemporal clustering patterns and sociodemographic determinants of COVID-19 cases.


Introduction
Coronavirus disease 2019 (COVID-19), caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was first identified on 31 December 2019 in the Wuhan prefecture in the Hubei Province of China (WHO, 2020a), where the first cases were linked to the Huanan Seafood Wholesale Market (Hui et al., 2020). However, the origin of the pandemic has not yet been determined. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic (WHO, 2020b). In Finland, the first coronavirus case was diagnosed as early as 29 January 2020 in Lapland (Haveri et al., 2020), while the first positive case in the City of Helsinki was diagnosed on 25 February 2020 (Jarva et al., 2021). Thereafter, the disease spread quickly after winter holiday travelers to Austria, Italy and Spain returned to Finland. In April and early May, the daily number of new cases remained relatively high (5.6% of tested individuals were positive) in the Greater Helsinki area (Jarva et al., 2021). In June and July, the COVID-19 epidemic eased temporarily in Finland and infections dropped steadily. In autumn 2020, the number of new COVID-19 cases rose again and the second wave of coronavirus hit Finland, mainly seeded by newly imported SARS-CoV-2 strains. The incidence of the second wave peaked in early December and, after a brief descent, started to rise again in mid-January. This heralded the beginning of a third wave driven by the more transmissible alpha (B.1.1.7) and beta (B.1.351) SARS-CoV-2 variants, which peaked in late March. This study focuses on the second and early third wave of the COVID-19 epidemic in the City of Helsinki from 28th October 2020 to 24th March 2021.
Since the outbreak of the COVID-19 pandemic in early 2020, there has been growing interest in spatial modeling of COVID-19 to understand spatial patterns and spatiotemporal dimensions of the disease spread (Fatima et al., 2021). Clustering, hot spot analysis, space-time scan statistics, and regression modeling have been the most commonly used spatial methods (Franch-Pardo et al., 2020). Global Moran's I and Local Indicators of Spatial Association (LISA) statistics have been used in previous COVID-19 studies to explore the spatial epidemic dynamics of the virus (Kang et al., 2020), and to examine spatial patterns of COVID-19 incidence and death rates (Cavalcante et al., 2020; Kim et al., 2021). According to Fatima et al. (2021), less attention has been paid to analyzing COVID-19 hot spots with Getis-Ord Gi* statistics (see e.g. Das et al., 2021; Lakhani, 2020; Mollalo et al., 2020). To the best of our knowledge, previous COVID-19 studies did not include a comparison of LISA and Getis-Ord Gi* statistics.
The space-time scan statistic (Kulldorff, 1997) has been used to analyze space-time clusters and risk areas of COVID-19 in a number of studies (Andrade et al., 2020; Cordes and Castro, 2020; Desjardins et al., 2020), which mainly employed the prospective Poisson space-time scan statistic method. However, when a large maximum scanning window is used in this method, a large cluster can hide several smaller distinct clusters (Han et al., 2016). To overcome this problem, a Gini optimization parameter was introduced in the SaTScan software (Kulldorff, 2021), but it is not yet widely utilized.
Traditional OLS regression methods and spatial regression models have been used to understand how socioeconomic, demographic and environmental determinants explain the spatial variability of COVID-19 incidence and mortality (Mansour et al., 2021; Maiti et al., 2021; Mena et al., 2021; Middya and Roy, 2021; Mollalo et al., 2020; Zhang and Schwartz, 2020; Snyder and Parks, 2020). However, in the majority of these studies, only one regression-based method was employed. Furthermore, most of the earlier geospatial COVID-19 studies have been conducted at global-continental or country scale (McKenzie and Adams, 2020; Melin et al., 2020; Moonsammy et al., 2021; Pourghasemi et al., 2020; Sannigrahi et al., 2020; Sobral et al., 2020) or at county/municipality level (Han et al., 2021; Liu et al., 2021; Imdad et al., 2021; Martines et al., 2021; Rahman et al., 2021; Sun et al., 2020; Sun et al., 2021). Less research has been carried out at postal/ZIP code level (Cordes and Castro, 2020; DiMaggio et al., 2020; Kim et al., 2021). In addition, the majority of previous studies have investigated COVID-19 over only a single time period. This study represents a one-of-a-kind effort to contribute to the existing geospatial COVID-19 research by analysing second and early third wave COVID-19 cases with multitemporal postal code level data using GIS and spatial modeling methods.

Purpose of the study
This study aims to identify and map the significant clusters of COVID-19 cases and areas of significantly elevated disease risk in Helsinki, Finland. As recommended in previous studies, we used the prospective Poisson space-time scan statistic. In addition, because the Poisson scan approach with Gini optimization needs further investigation, we tested Gini-optimized cluster detection with the retrospective purely spatial Poisson scan statistic. A further goal of this research is to determine the sociodemographic factors that influence the spread of COVID-19. We employed three different regression methods: Ordinary Least Squares (OLS) (Ward and Gleditsch, 2018), Geographically Weighted Regression (GWR) (Brunsdon et al., 1996), and Multiscale Geographically Weighted Regression (MGWR) (Fotheringham et al., 2017) with sociodemographic determinants to explain the spatial variability of COVID-19 infection rates. This is the first comprehensive spatial analysis study of COVID-19 (SARS-CoV-2) infections in Finland, and it provides new insights into the disease's spatial spread, temporal trends and sociodemographic correlates. The findings could help public health services better target intervention locations and control disease spread. The spatiotemporal analysis methods presented here could be used to investigate and provide information to improve management and control of the ongoing COVID-19 crisis and future pandemics in other parts of the world.

Study area and COVID-19 infection data
Finland is situated in northern Europe, bordering Sweden, Norway and Russia. Helsinki, the capital of Finland, is located in the southern part of the country on the shore of the Gulf of Finland (Fig. 1). Helsinki is the most densely populated city in Finland (2934 people/km²), with 653 835 inhabitants (31 December 2019) of whom 9.6% are of foreign origin (City of Helsinki, 2020a). The City of Helsinki is officially divided into 60 neighborhoods, and these neighborhoods have altogether 84 postal code areas (Fig. 1).
The City of Helsinki provided the dataset of COVID-19 infections at the postal code level. The data are publicly available online and include information on new COVID-19 infections and new cases per 100,000 residents at approximately 14-day intervals (14-day notification rate) (City of Helsinki, 2020b). This study used data from 28th October 2020 to 24th March 2021. During this period, a total of 21,668 COVID-19 infections were diagnosed in Helsinki, comprising 29.5% of all cases in Finland during the period (City of Helsinki, 2020b). COVID-19 data were available for 69 postal code areas over 11 time periods (Fig. 2). Data for 15 postal code areas were missing because of data sensitivity: counts of fewer than five COVID-19 infections in a postal code area cannot be published (City of Helsinki, 2020b).
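The notification rate and the small-count suppression rule described above can be illustrated with a minimal sketch. The function names, area labels, and population figures are hypothetical and are not taken from the City of Helsinki dataset; only the per-100,000 scaling and the threshold of five cases come from the text.

```python
# Minimal sketch: 14-day notification rate per 100,000 residents,
# with suppression of small counts. All figures below are made up.

SUPPRESSION_THRESHOLD = 5  # counts below this are not published


def notification_rate(new_cases, population, per=100_000):
    """Return new cases per `per` residents for one reporting period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population * per


def publishable_rate(new_cases, population):
    """Return the rate, or None when the count is too small to publish."""
    if new_cases < SUPPRESSION_THRESHOLD:
        return None
    return notification_rate(new_cases, population)


if __name__ == "__main__":
    # Hypothetical postal code areas: (new cases in 14 days, residents)
    areas = {"area_a": (42, 12_000), "area_b": (3, 8_500)}
    for name, (cases, pop) in areas.items():
        print(name, publishable_rate(cases, pop))
```

Areas for which the sketch returns `None` correspond to the postal code areas that drop out of the analysis because their counts cannot be published.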

Sociodemographic data for regression analysis
Statistics Finland (2020) provided sociodemographic data, and the Helsinki Region Environmental Services SeutuData'19 database provided data on foreign citizens (HSY, 2019). Based on previous COVID-19 studies (DiMaggio et al., 2020; Liu et al., 2021; Mansour et al., 2021; Mollalo et al., 2020), seven potential predictor variables for regression analyses were chosen as outlined in Table 1.

Spatial association of COVID-19 infections in the City of Helsinki
First, we used global Moran's I to test the spatial independence of COVID-19 infection cases per 100,000 residents in Helsinki (Moran, 1950). Global Moran's I measures the degree of spatial autocorrelation across the entire study area. Its values range from −1 to +1: a positive value indicates that similar values cluster together, a negative value indicates that dissimilar values tend to be neighbors, and a value near 0 indicates a spatially random distribution.
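As an illustration of how the statistic behaves, the following is a minimal sketch of global Moran's I with a permutation-based pseudo p-value. It assumes binary contiguity weights and a one-sided test; the weighting scheme, the number of permutations, and the toy data are assumptions made for the example and are not specified by the study.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    values    -- list of rates, one per area
    neighbors -- dict mapping area index -> set of neighboring indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights; each neighbor link has weight 1.
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


def pseudo_p_value(values, neighbors, permutations=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign rates to areas at random
        if morans_i(shuffled, neighbors) >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)


# Toy example (hypothetical): four areas in a row, each adjacent to the next.
LINE = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

if __name__ == "__main__":
    print(morans_i([1, 2, 3, 4], LINE))      # smooth gradient: positive
    print(morans_i([10, 0, 10, 0], LINE))    # alternating pattern: negative
    print(pseudo_p_value([1, 2, 3, 4], LINE))
```

A smooth gradient along the row gives a positive value, while an alternating high-low pattern gives a negative one, matching the interpretation above.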
Second, two analyses of local spatial autocorrelation were performed: the Local Indicators of Spatial Association (LISA) (Anselin, 1995) and the Getis-Ord Gi* statistic (Getis and Ord, 1992). LISA analysis was used to detect four types of statistically significant local patterns: clusters of high values (high-high, H-H), clusters of low values (low-low, L-L), outliers in which a high value is surrounded primarily by low values (high-low, H-L), and outliers in which a low value is surrounded primarily by high values (low-high, L-H). The Getis-Ord Gi* statistic was used to determine the geographic distribution of potential hot spots (high values) and cold spots (low values) of COVID-19 infections per 100,000 residents.
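To make the hot spot statistic concrete, the following is a minimal sketch of the Getis-Ord Gi* z-score in its standard form, under the assumption of binary contiguity weights with each area included in its own neighborhood. The weights and the toy values are hypothetical; the study does not state the spatial weights used.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Return a Gi* z-score per area, using binary weights.

    neighbors maps area index -> set of neighboring indices; each area is
    also counted as its own neighbor, which is what the star denotes.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        window = set(neighbors[i]) | {i}
        w_sum = len(window)      # sum of binary weights
        w_sq_sum = len(window)   # sum of squared binary weights
        local_sum = sum(values[j] for j in window)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        # The denominator is zero when all values are equal or the window
        # covers every area; the numerator is then zero as well.
        scores.append(numerator / denominator if denominator else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical rates for five areas in a row; high values at one end.
    row = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    for area, z in enumerate(getis_ord_gi_star([10, 9, 1, 1, 1], row)):
        print(area, round(z, 2))
```

Large positive z-scores mark candidate hot spots and large negative z-scores mark candidate cold spots. Unlike LISA, Gi* does not distinguish outliers, which is one reason the two maps can differ.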

Space-time clustering patterns and epidemic curve
The space-time patterns of COVID-19 infections in Helsinki were examined in SaTScan software (version 10.0) to detect significant (p-value < 0.05) clusters using the prospective Poisson space-time scan statistic and the retrospective purely spatial Poisson scan statistic (Kulldorff, 2021). First, we ran purely spatial scan statistics for three time periods: 28.10.2020-9.12.2020, 23.12.2020-10.2.2021, and 24.2.2021-24.3.2021. We computed purely spatial scan statistics with and without Gini optimization to see if there were any 'Gini clusters' in the data. Then, for the same time periods, we used prospective Poisson space-time scan statistics to detect active and emerging clusters. We restricted the spatial and temporal scanning windows to include ≤20% of the population at risk and ≤50% of the study period. In addition, each candidate cluster had to contain at least 5 infection cases and last at least 2 days (Hohl et al., 2020). Larger spatial and temporal windows were not chosen, to avoid detecting extremely large clusters. To determine the statistical significance of space-time clusters, we used Monte Carlo testing with 999 simulations. The relative risk maps for COVID-19 infections were then reported and visualized for the three time periods, presenting the spatiotemporal variation of significant (p-value < 0.05) relative risk (RR) clusters with significantly more observed than expected cases, and relative risk values for each postal code area. In addition, an epidemic curve for COVID-19 infections in seven major districts of Helsinki was created.
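The quantities reported for each candidate cluster can be sketched as follows. This is a simplified illustration of the Poisson scan statistic for a single, already chosen window, not a reimplementation of SaTScan: the search over windows, the Gini optimization, and the simulation of null datasets are omitted, and the input numbers are hypothetical.

```python
import math


def expected_cases(pop_in_window, pop_total, cases_total):
    """Expected cases in the window if risk were uniform (null model)."""
    return cases_total * pop_in_window / pop_total


def relative_risk(c, e, cases_total):
    """Risk inside the window divided by risk outside it."""
    if c >= cases_total:
        return math.inf  # no cases outside the window
    inside = c / e
    outside = (cases_total - c) / (cases_total - e)
    return inside / outside


def log_likelihood_ratio(c, e, cases_total):
    """Poisson log likelihood ratio for a high-rate window."""
    if c <= e:
        return 0.0  # only windows with more cases than expected count
    llr = c * math.log(c / e)
    if c < cases_total:
        llr += (cases_total - c) * math.log((cases_total - c) / (cases_total - e))
    return llr


def monte_carlo_p_value(observed_llr, simulated_llrs):
    """Rank of the observed maximum LLR among the simulated maxima."""
    rank = 1 + sum(1 for s in simulated_llrs if s >= observed_llr)
    return rank / (1 + len(simulated_llrs))


if __name__ == "__main__":
    # Hypothetical window: 30 of 100 cases in 1,000 of 10,000 residents.
    e = expected_cases(1_000, 10_000, 100)
    print(e, relative_risk(30, e, 100), log_likelihood_ratio(30, e, 100))
```

With 999 simulated datasets, the smallest attainable p-value under this ranking is 1/1000 = 0.001, which is why the number of Monte Carlo replications is usually chosen as one less than a round number.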

Regression modeling
For each of the 14-day datasets, the ArcGIS Exploratory Regression data-mining tool was used prior to regression modeling. The tool evaluated all possible combinations of the seven candidate sociodemographic explanatory variables and looked for the best ordinary least squares (OLS) models that meet certain criteria; see Supplementary Materials 1 for more information. For each model, the variable combination with the lowest corrected Akaike Information Criterion (AICc) (Akaike, 1974) value was chosen for regression analysis. To identify the significant sociodemographic determinants of COVID-19 infection rates, we used three regression methods: OLS, GWR, and MGWR. Detailed discussions of these methods can be found elsewhere (e.g. Brunsdon et al., 1996; Fotheringham and Oshan, 2016; Ward and Gleditsch, 2018; Oshan et al., 2019), and their equations are presented in Supplementary Materials 1.
We may assume that COVID-19 infections are spatially autocorrelated, which violates the implicit assumptions of OLS. To allow parameters to vary spatially, we used GWR, which calculates regression coefficients for each individual data entity (in this case, postal code areas) separately rather than estimating global values for the regression parameters (Fotheringham and Oshan, 2016). However, GWR assumes that the scale of relationships remains constant across space, which may not be the case in spatial models. To overcome this implicit assumption, we applied MGWR (Fotheringham et al., 2017; Yu et al., 2020). For comparison, all regression models (OLS, GWR, and MGWR) were implemented with the same variables in the MGWR 2.2 software (Oshan et al., 2019). Model performance was compared using the adjusted R² and AICc: a higher adjusted R² value and a lower AICc value indicated a better model fit. Moran's I was used to check for significant spatial autocorrelation in the regression model residuals.
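The two comparison criteria can be illustrated with a minimal sketch. The AICc form used here is the common least-squares version with an additive constant dropped, which is an assumption: the MGWR software computes AICc with its own formula, using the effective number of parameters for GWR and MGWR, so absolute values would differ even though the "lower is better" reading is the same. All inputs are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def aicc_least_squares(rss, n, n_params):
    """Small-sample corrected AIC for a least-squares fit.

    Uses AIC = n * ln(RSS / n) + 2K, which drops an additive constant,
    plus the correction 2K(K + 1) / (n - K - 1).
    """
    aic = n * math.log(rss / n) + 2 * n_params
    return aic + 2 * n_params * (n_params + 1) / (n - n_params - 1)


def best_model(candidates):
    """Pick the candidate name with the lowest AICc.

    candidates maps model name -> (rss, n, n_params).
    """
    return min(candidates, key=lambda name: aicc_least_squares(*candidates[name]))


if __name__ == "__main__":
    # Hypothetical residual sums of squares for two models on 69 areas.
    models = {"model_a": (100.0, 69, 4), "model_b": (80.0, 69, 4)}
    print(best_model(models))
    print(adjusted_r2(0.5, 69, 3))
```

Because both criteria penalize extra parameters, a model with more predictors must reduce the residual error enough to offset the penalty before it is preferred.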

Global spatial autocorrelation
This study examined whether a spatial association occurred in the new infection cases of COVID-19 per 100,000 residents in the City of Helsinki. The results, presented in Table 2, indicate that positive spatial autocorrelation was detected over the whole study period from 28.10.2020 to 24.3.2021 (Moran's I = 0.1393, pseudo p-value = 0.029). However, the positive spatial autocorrelation was relatively weak for the majority of the time periods. The global Moran's I statistic reveals that the distribution of COVID-19 infections is not random for most of the studied periods, and spatial clustering was detected. The highest Moran's I values were 0.3243 for 11.11.2020, 0.2992 for 24.02.2021, 0.2645 for 24.03.2021, and 0.1866 for 10.03.2021. Moran's I produced lower values for other dates. Global Moran's I statistics in the City of Helsinki were only moderate at best, indicating spatial heterogeneity of COVID-19 case rates between postal code areas. As a result, it

Local spatial autocorrelation
LISA cluster maps were used to indicate the locations of significant spatial clusters and outliers of COVID-19 infection rates. Fig. 3 depicts the postal code areas associated with high-high (H-H), high-low (H-L), low-high (L-H), and low-low (L-L) values of COVID-19 infection rates in the LISA cluster map. Throughout the study period of 28th October 2020 to 24th March 2021, high-high (H-H) clustering areas were mostly found in the eastern parts of the city, while low-low (L-L) clusters were mostly found in the city's western postal code areas (Fig. 3). When the COVID-19 epidemic eased temporarily in Helsinki in December 2020 and January 2021, there were a few exceptions to the general pattern, and only a few high-high (H-H) clusters emerged as local outbreaks. Many high-high (H-H) clusters were identified in the eastern parts of the city, where SARS-CoV-2 infections spread rapidly in late February and early March 2021. Low-low (L-L) clusters were mostly observed in the city's western and southern outskirts. Low-high (L-H) outliers were mostly found near high-high (H-H) clusters, whereas high-low (H-L) outliers were evenly distributed across Helsinki (Fig. 3).

Hot spot analysis of COVID-19 infections
To detect hot spot and cold spot clusters of new COVID-19 cases in Helsinki, Getis-Ord Gi* hot spot analysis was used as an alternative method to LISA. Overall, the patterns were similar to those found with LISA analysis, indicating that the eastern suburbs were hot spots during the majority of the studied periods, while the western parts of Helsinki were cold spots (Fig. 4). However, a comparison of the LISA map (Fig. 3) and the hot spot map (Fig. 4) reveals certain differences in the recognition of COVID-19 hot and cold spots that must be considered when interpreting the results.

Spatial distribution of pure spatial and space-time clusters of COVID-19 and epidemic curve
The results of the retrospective pure spatial Poisson scan statistic and the prospective Poisson space-time scan statistic show that the detected COVID-19 clusters varied over time and space in Helsinki (Figs. 5 and 6). Tables S1, S2 and S3 in Supplementary Materials 2 provide the characteristics of the statistically significant (p < 0.05) pure spatial scan statistic and space-time scan results for COVID-19 infection rates at the postal code level for three aggregated time periods and for the whole period from October 28th 2020 to March 24th 2021.

Pure spatial scan statistic results
From late October to mid-December 2020 (28.10.2020-9.12.2020), two clusters (C1, C4) were detected in the eastern parts of Helsinki, two clusters (C2, C3) were identified in northwestern areas and one cluster (C5) was found closer to the city center (Fig. 5 a1-2). A similar pattern was observed in the 23.12.2020-10.2.2021 data (Fig. 5 b1-2), with the main cluster detected in the eastern parts of Helsinki and smaller clusters detected in northwestern areas and in the central parts of Helsinki. The smaller clusters had a high relative risk of COVID-19 infection. It is worth noting that one 'Gini cluster' (C5) was discovered in the eastern suburbs (Fig. 5 b2). The COVID-19 situation deteriorated in February and March 2021, as the number of infections increased rapidly, particularly in the eastern suburbs, where the main cluster with a high relative risk (RR 1.94) was discovered (Fig. 5 c1). Two more clusters (C2 and C3) were detected in the central and southeastern regions (Fig. 5 c1). Furthermore, the eastern suburbs had a high relative risk (RR) and in the eastern parts of Helsinki, many non-overlapping 'Gini clusters' were detected (Fig. 5 c2).

Space-time scan statistic results
According to the results of the prospective space-time scan statistics of COVID-19 clustering, the main clusters were active or emerged in the eastern parts of Helsinki, and smaller clusters were detected in other areas, possibly as a result of local outbreaks (Fig. 6). Space-time scan statistics, on the other hand, detected fewer significant clusters than the pure spatial scan statistics. Table S3 in Supplementary Materials 2 describes the characteristics of the statistically significant active and emerging space-time clusters of COVID-19 infection rates at the postal code level over three time periods. Fig. 7 depicts the epidemic curve of new COVID-19 infection cases per 100,000 Helsinki residents based on major districts. The eastern district had the highest infection rates throughout the study period, while the northern and southeastern districts had the lowest. Infection rates were relatively low in all districts in October 2020, but began to rise in November 2020 and continued to rise in December 2020. The spread of COVID-19 slowed in January 2021, but infection rates began to rise again in February 2021, particularly in the eastern district. Infections peaked in all districts in March 2021.
Fig. 5. Time periods 28.10.2020-9.12.2020, 23.12.2020-10.2.2021 and 24.2.2021-24.3.2021. Maps on the left (a1, b1 and c1) present pure spatial scan statistics, and maps on the right (a2, b2 and c2) present the pure spatial scan statistic with "Gini optimized" clusters.

Performance of global and local regression models
Different regression methods produced slightly different model-fit results. The models were unable to explain variation in COVID-19 data over many time periods, resulting in very low adjusted R² values (Table 3). Overall, MGWR models outperformed the other models, with the highest adjusted R² and lowest AICc values. Unexpectedly, OLS models outperformed MGWR and GWR models at three time periods: 9.12.2020, 23.12.2020, and 13.01.2021. GWR models performed best on two occasions: 10.02.2021 and 24.02.2021. Supplementary Materials 2 shows the regression variables and coefficients of the final multivariate OLS models for COVID-19 (Tables S4 and S5). Supplementary Materials 3 contains the results of all regression modeling (OLS, GWR, and MGWR). According to Supplementary Materials 3, for the entire study period (28.10.2020-24.03.2021), the OLS model with the variables median income of inhabitants, number of foreign citizens, and pensioners could explain approximately 40% of the variation in COVID-19 infection rates (adjusted R² = 0.401); the adjusted R² was 0.453 for GWR and 0.436 for MGWR. Moran's I statistics were computed for the residuals of all OLS, GWR, and MGWR regression models, and the test revealed significant spatial autocorrelation in some of the OLS models. Spatial autocorrelation was absent in all GWR and MGWR regression models. Fig. 8 shows the residuals and local R² values of the OLS, GWR, and MGWR models for COVID-19 infection rate data from 28.10.2020 to 24.3.2021. Generally, local R² values for GWR and MGWR models are high in the eastern part of the study area, decreasing gradually towards the western part of the city.
Fig. 7. The epidemic curve indicating the new COVID-19 infection cases per 100,000 residents per two-week period in different major districts of Helsinki.

Sociodemographic variables explaining variation in COVID-19 infection data
In this study, sociodemographic variables were investigated to determine the best predictors of COVID-19 infections in postal code areas in the City of Helsinki. To identify the best predictors, we used linear regression with the median COVID-19 case rate for the entire study period of 28.10.2020-24.03.2021 as the dependent variable (Table 4). Each independent sociodemographic variable was tested separately. Table 4 also shows how frequently each sociodemographic variable was chosen for the OLS regression models out of a total of 11 regression models run for different 14-day time periods to explain the variation in SARS-CoV-2 infection rates. Fig. 9 presents the spatial distribution of median COVID-19 infection rates and of the sociodemographic variables that best explain the variation in median COVID-19 infection rates in the City of Helsinki from October 28th, 2020 to March 24th, 2021. The maps show that COVID-19 infections are concentrated in areas with a lower median income, a relatively high number of foreign citizens, a low level of education, and a high number of unemployed citizens (Fig. 9).

Discussion
Throughout the epidemic, Helsinki has been the epicenter of COVID-19 in Finland. Approximately 30% of all COVID-19 infections in Finland were diagnosed in Helsinki between the 28th of October 2020 and the 24th of March 2021. This study sought to understand the spatiotemporal clustering patterns and sociodemographic determinants of the second and early third wave of COVID-19 (SARS-CoV-2) infections. We demonstrate a holistic approach to analyzing the COVID-19 epidemic at the local level using four spatial and spatiotemporal techniques: global and local spatial autocorrelation (Moran's I and LISA), Getis-Ord Gi* hot spot analysis, space-time scan statistics, and three regression modeling methods (OLS, GWR and MGWR).

Global and local spatial autocorrelation of COVID-19 in the City of Helsinki
Global Moran's I analysis revealed a moderate positive spatial autocorrelation (Moran's I = 0.1393, pseudo p-value = 0.029) between 28th October 2020 and 24th March 2021, indicating that COVID-19 infection rates were not randomly distributed in Helsinki, with clear variations between time periods (Table 2). The LISA map (Fig. 3) shows high-high COVID-19 clusters in the eastern parts of Helsinki. With a few exceptions, the Getis-Ord Gi* hot spot map yields essentially similar results (Fig. 4). Moran's I value was relatively low from late November 2020 to February 2021, with statistically significant clusters absent from the majority of Helsinki (Figs. 3 and 4). Only a few significant COVID-19 clusters that could be identified as local outbreaks occurred during that time period. COVID-19 infections began to rise with the third wave of coronavirus in mid-February 2021. Moran's I value was statistically significant from 24 February 2021 to the end of the study period. The LISA and Getis-Ord Gi* hot spot maps (Figs. 3 and 4) show that COVID-19 infections formed strong clusters in the eastern part of Helsinki. Previous research has shown that Moran's I is a useful method for understanding the overall spatial dependency of COVID-19 and for presenting reliable information about the disease's spatiotemporal patterns to local health authorities and policymakers (Kang et al., 2020; Kim et al., 2021). However, global Moran's I cannot reveal spatial heterogeneity of the studied phenomenon, and local spatial autocorrelation analysis, i.e. LISA statistics, is recommended for mapping the local variation of COVID-19-related phenomena (Liu et al., 2021; Sun et al., 2021). We utilized Getis-Ord Gi* hot spot analysis for comparison with the clustering patterns from the LISA statistic. Interestingly, the LISA cluster maps and the Getis-Ord Gi* hot spot maps yielded partly different clustering results (Figs. 3 and 4). This may imply that both cluster analysis methods are required to gain a deeper understanding of the spatiotemporal clustering of the studied phenomena, as observed in a previous study (Sánchez-Martín et al., 2019). Overall, our findings are consistent with previous studies in which LISA analysis of COVID-19 infection or mortality cases revealed local clustering patterns (Liu et al., 2021; Sun et al., 2021). Unlike previous studies, we were able to examine short-term variations in COVID-19 cases at 14-day intervals (14-day notification rate). We discovered spatiotemporal variations in which COVID-19 clusters emerged at short intervals, such as high-high (H-H) and low-low (L-L) clusters most likely related to local outbreaks (Figs. 3 and 4).

Space-time scan statistics, spatiotemporal trends and epidemic curve
Space-time scan analysis (Kulldorff, 1997) was performed over three aggregated time periods, yielding statistically significant COVID-19 clusters. Throughout the study period, clusters were found in the eastern areas of the City of Helsinki, though local clusters emerged in other parts of the city as well. A similar pattern was discovered in the relative risk of COVID-19 in postal code areas (Figs. 5 and 6). The prospective Poisson space-time scan analysis method has been found to be a valuable method for detecting COVID-19-related clusters and relative risk areas (Masrur et al., 2020; Xu et al., 2021). To detect smaller secondary clusters, we tested Gini-optimized cluster detection with the retrospective purely spatial Poisson scan statistic method, which has rarely been used in previous COVID-19 studies. We discovered a number of secondary clusters in eastern Helsinki, particularly between 24.2.2021 and 24.3.2021 (Fig. 5 c2), coinciding with the third wave of the COVID-19 epidemic, when the disease caused by the alpha and beta variants spread rapidly in the eastern suburbs, as shown in Fig. 7. In addition to space-time scan analysis, the multitemporal quintile map visualizes spatial and temporal trends in the diffusion of COVID-19 incidence, allowing comparison of the epidemic over time (Fig. 2). In most postal code areas, the number of infection cases per 100,000 residents remained low in autumn 2020. The virus began to spread throughout Helsinki in November and December 2020, but the number of infections remained relatively low, with only a few postal code areas having a high number of infection cases. The epidemic subsided temporarily in December 2020 and January 2021, but with the third wave of coronavirus, the number of COVID-19 infection cases began to increase rapidly in mid-February 2021. The third coronavirus wave exacerbated the situation, particularly in the eastern and northeastern suburbs (Fig. 2).
Interestingly, high population density in postal code areas did not appear to be the primary cause of clusters of high infection rates, as shown in Figs. 1 and 2. This suggests that other sociodemographic factors could account for COVID-19 clustering. The sociodemographic factors associated with these patterns are discussed in greater detail in a later section. The epidemic curve, along with an accompanying map, may reveal valuable information about the epidemic's spatial progression (Fig. 7). However, reporting delays may in some cases make it difficult to determine the progression of an epidemic.

Performance of OLS, GWR, and MGWR regression models
In this study, we discovered spatiotemporal clustering patterns of COVID-19 infections in Helsinki, but spatial analysis only revealed "half of the story." Therefore, we attempted to determine the factors underlying the spatial patterns of COVID-19 infection rates. To explain the variation in COVID-19 data, we used the most important sociodemographic determinants identified in previous studies. In the end, we used only seven sociodemographic predictors to reduce multicollinearity among predictors, an issue also observed in previous studies (Mollalo et al., 2020; Snyder and Parks, 2020). However, Souza et al. (2020) discovered that reducing multicollinearity may compromise the study's quality.
Similarly, in this study, reducing multicollinearity and then using only seven predictors could explain why some regression models had low explanatory power at certain time periods. Low model performance may also result from more efficient virus transmission throughout Helsinki, such as during the period 13.01.2021-27.01.2021, reducing the ability of sociodemographic factors to explain variation in COVID-19 infections. In addition to sociodemographic predictors, other variables not included in our study, such as environmental, distance-based, and behavioral factors, could improve the models' explanatory power. The best performance of regression models appears to have been achieved during time periods with positive local and global spatial autocorrelation and scan statistics detecting significant clusters in the COVID-19 data, such as on 10.03.2021 and 24.03.2021.

Sociodemographic predictors explaining variation in COVID-19 infections in Helsinki
In general, the median income of inhabitants and a high number of foreign citizens were the best predictors of variation in COVID-19 infection data, followed by a low level of education and a relatively high number of unemployed people in postal code areas (Table 4). Our findings are in line with previous socioeconomic studies in Finland, which found that COVID-19 infections were most common in adults with low incomes and a low level of education (Helsinki GSE, 2021), and that one out of every four coronavirus infections in Finland has been diagnosed among foreign citizens (Holmberg et al., 2022; THL, 2020). Furthermore, health authorities in the Helsinki region announced in 2020 that a high number of COVID-19 infections had been detected among the foreign population in the metropolitan area (Rantavaara, 2020). According to THL's MigCOVID survey (Skogberg et al., 2021), there is an increased risk of imported SARS-CoV-2 virus infection among people of migrant origin due to the following factors: working conditions, lower education and income, and crowded housing. However, one factor stood out above the rest: working conditions. According to Skogberg et al. (2021), foreigners do a lot of work that cannot be done remotely. Many of these professions, for example, are in the service sector, and only one-third of foreign citizens had the opportunity to work remotely. In addition, half of the foreign-speaking respondents said it was impossible to observe safety distances at work. Skogberg et al. (2021) also discovered that during the pandemic, foreign citizens traveled more than native Finnish citizens, primarily to visit relatives living in other countries. This could be another reason for the virus's rapid spread among foreign communities. The postal code areas in Helsinki with lower median incomes and higher proportions of foreign citizens are mostly found in the eastern and northeastern suburbs (Fig. 9).
In these areas, there are pockets of poverty neighborhoods with low median income, low levels of employment, and low levels of education (Kortteinen and Vaattovaara, 2015). Furthermore, in the eastern and northeastern suburbs, there may be concentrations of cramped condominiums with circular migration workers, such as construction workers from the Baltic and Eastern European countries. Many of the COVID-19 infections in Helsinki and the surrounding area were linked to these migrant worker housing conditions, and a large number of infections were observed on construction sites (Ervasti, 2020).
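To make the idea of ranking predictors by their independent explanatory power (as summarised in Table 4) more concrete, the following minimal sketch computes the coefficient of determination of a single-predictor least-squares fit for each variable. It is an illustration only: the function name `r_squared`, the variable names, and all numbers are hypothetical and are not taken from the study's data or workflow.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values for five postal code areas (NOT data from this study).
predictors = {
    "median_income": [21000, 24000, 28000, 33000, 40000],
    "foreign_citizens_pct": [18.0, 15.0, 9.0, 7.0, 4.0],
}
infection_rate = [950, 870, 610, 540, 400]  # made-up cases per 100,000

# Rank predictors by how much variation each explains on its own.
ranking = sorted(
    ((name, r_squared(values, infection_rate)) for name, values in predictors.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, r2 in ranking:
    print(f"{name}: R^2 = {r2:.3f}")
```

A single-predictor R² of this kind says nothing about causation and ignores spatial dependence, which is why the study complements it with multivariable and geographically weighted models.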

Targeting future interventions and control of the disease spread in the City of Helsinki
Based on the study's findings that COVID-19 infections are concentrated in areas with lower income, a relatively high number of foreign citizens, a low level of education and a high number of unemployed citizens, future disease-control interventions should be geographically targeted particularly at the eastern and northeastern suburbs, which have served as hotspots during most epidemic waves. Furthermore, as the virus was found to be spreading strongly among foreign citizens, public health authorities should be ready to co-operate with foreign communities and organizations. Active efforts should be made to disseminate COVID-19-related information in the City of Helsinki with multilingual and multi-channel communication and counseling. Moreover, COVID-19 infections were detected in foreign citizen communities, particularly among the younger generations; thus, low-threshold tracking and testing, as well as mobile vaccination points, are required particularly in these target groups and intervention areas.

Table 4 Linear regression performance of sociodemographic variables, each independently explaining the variation in median COVID-19 infection rates for the time period 28.10.2020-24.03.2021, and the total number of times each variable was selected for the OLS models out of a total of 11 regression models run for different 14-day time periods.

Holistic approach to analyze spatiotemporal patterns of COVID-19
This study takes a holistic approach to analysing spatiotemporal clustering patterns and sociodemographic determinants of COVID-19 infections in Helsinki. We are aware of only a few studies that take a holistic approach to the spatiotemporal aspects of COVID-19 transmission (Wang et al., 2021). However, these studies were conducted on a small scale and focused on factors such as population movement, meteorological parameters, and air pollutants. In contrast, our study was able to provide, for the first time, holistic insights into the spatial patterns and sociodemographic determinants of COVID-19 infections in Helsinki at the postal code level. Based on our study results, spatial analysis techniques can identify neighborhoods and communities where public health interventions should be targeted to reduce local COVID-19 outbreaks. Furthermore, these techniques could be used in contact tracing in healthcare units to enable more efficient action. However, these techniques were not used during the first, second, and third waves of the COVID-19 epidemic in Finland, as the governmental and local authorities were not fully aware of the benefits of geospatial analysis in fighting the pandemic. In future, in order to quickly study the spatiotemporal phenomena of emerging diseases and epidemics, spatial and molecular epidemiology data should be made available to researchers and aggregated more efficiently.
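As a reminder of what the global spatial autocorrelation statistic used in this approach measures, the sketch below computes Moran's I, I = (n / S0) * sum_ij w_ij (x_i - x̄)(x_j - x̄) / sum_i (x_i - x̄)^2, in plain Python. The helper `chain_weights`, the toy layout of six areas in a row, and the rates are assumptions made for illustration; the study itself used postal code polygons, and no significance testing is included here.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    denom = sum(d * d for d in dev)         # total squared deviation
    if s0 == 0 or denom == 0:
        raise ValueError("weights and values must not be degenerate")
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * (num / denom)


def chain_weights(n):
    """Hypothetical binary contiguity: n areas in a row, adjacent areas are neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 2, 3, 4, 5, 6]       # made-up rates rising smoothly across space
alternating = [1, 0, 1, 0, 1, 0]     # made-up rates where neighbours always differ

print("clustered:", morans_i(clustered, w))
print("alternating:", morans_i(alternating, w))
```

Positive values indicate that similar rates sit next to each other (clustering), while negative values indicate a checkerboard-like pattern; local statistics such as LISA and Getis-Ord Gi* then show where the clustering occurs.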

Limitations of spatiotemporal analysis in COVID-19 studies
Although our findings indicated that the overall spatial patterns of second-wave COVID-19 infections in the City of Helsinki could be analysed, and that space-time clusters and risk areas could be identified and mapped, this study has limitations. In 15 postal code areas, low numbers of COVID-19 infection cases prevented the City of Helsinki from publishing data for privacy reasons. These postal code areas could not be analysed, which may have influenced the results. Furthermore, spatial analysis at the postal code level aggregates information, hindering understanding of the local variations inside these areal units. Thus, interpretation of the results is limited to the postal code scale due to the ecological fallacy and the modifiable areal unit problem (MAUP) (Wang and Di, 2020). In addition, misinterpretations of the resulting maps of COVID-19 infections may arise from the way in which COVID-19 infection data were collected. Postal code information is based on the infected person's home address, not the location of the initial infection, which is often difficult to determine and may differ. As a result, study findings from spatial analysis of COVID-19 infection data collected at the postal code level must be interpreted with caution.
Previous spatiotemporal studies on COVID-19 infections have encountered similar difficulties in other parts of the world. Kim and Castro (2020) mentioned that more detailed information about COVID-19 infection cases would have allowed them to refine their analysis to a finer scale than the district level in South Korea. According to Fatima et al. (2021), COVID-19 data quality and access to fine-scale data are crucial and are the main limitations in spatial analysis of COVID-19 epidemics. There are also limitations in the sociodemographic data, as we were unable to obtain the most recent data for postal code areas because they were not freely available. Furthermore, we used the sociodemographic predictors that were available, but other variables may explain the variation in COVID-19 infections better. It is also worth noting that vaccinations only started in Helsinki in January 2021, and that the accumulation of population immunity in a given area, whether acquired through infection or through vaccination, could not be accounted for in this study.
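The effect of the modifiable areal unit problem can be illustrated with a small, purely hypothetical example: the same block-level counts are aggregated under two different zonings, and the apparent hot spot changes. The function `zone_rates`, the eight blocks, and both zonings are invented for this sketch and do not correspond to Helsinki postal code areas.

```python
def zone_rates(cases, population, zoning):
    """Aggregate block-level counts into zones; return cases per 1,000 residents."""
    rates = {}
    for zone in sorted(set(zoning)):
        members = [i for i, z in enumerate(zoning) if z == zone]
        zone_cases = sum(cases[i] for i in members)
        zone_pop = sum(population[i] for i in members)
        rates[zone] = 1000 * zone_cases / zone_pop
    return rates


# Eight hypothetical blocks along a street; a local outbreak in blocks 3 and 4.
cases = [2, 3, 30, 28, 4, 2, 3, 1]
population = [1000] * 8

# Zoning A keeps the outbreak inside one zone; zoning B splits it across two.
zoning_a = ["A", "A", "B", "B", "C", "C", "D", "D"]
zoning_b = ["A", "A", "A", "B", "B", "B", "C", "C"]

rates_a = zone_rates(cases, population, zoning_a)
rates_b = zone_rates(cases, population, zoning_b)

print("zoning A:", rates_a)
print("zoning B:", rates_b)
```

Under the first zoning one zone stands out clearly, whereas under the second the same outbreak is diluted across two zones with moderate rates, even though the underlying cases are identical.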

Conclusion
Our findings show that open datasets, such as the City of Helsinki's postal code level COVID-19 infection data and Statistics Finland's open sociodemographic dataset, can be used in spatial analysis to gain a better understanding of spatiotemporal patterns and sociodemographic determinants even without access to individual-level data. The holistic approach used in this study, including global and local spatial autocorrelation (Moran's I and LISA), Getis-Ord Gi* hot spot analysis, and regression models (OLS, GWR, MGWR), can be applied to any other location with similar datasets to contribute to the existing geospatial knowledge of the COVID-19 pandemic at the local scale.
However, the acquisition of real-time, fine-scale COVID-19 infection and sociodemographic data is often challenging, making spatial analyses difficult to conduct. To be better prepared for future pandemic waves and to guide policymakers and local health authorities in implementing mitigation strategies, it is critical to understand the benefits of the holistic approach in spatial epidemic analyses in Finland and elsewhere around the globe. Future research should be conducted using fine-scale COVID-19 surveillance data in collaboration with health authorities, who should be encouraged to help elucidate these complex spatiotemporal patterns to inform mitigation and control efforts for the ongoing as well as future pandemics.