Neighborhood-level inequalities and influencing factors of COVID-19 incidence in Berlin based on Bayesian spatial modelling

Numerous studies have explored influencing factors in COVID-19, yet empirical evidence on spatiotemporal dynamics of COVID-19 inequalities concerning both socioeconomic and environmental factors at an intra-urban scale is lacking. This study, therefore, focuses on neighborhood-level spatial inequalities of the COVID-19 incidences in relation to socioeconomic and environmental factors for Berlin-Neukölln, Germany, covering six pandemic periods (March 2020 to December 2021). Spatial Bayesian negative binomial mixed-effect models were employed to identify influencing factors and risk patterns for different periods. We identified that (1) influencing factors and relative risks varied across time and space, with sociodemographic factors exerting a stronger influence over environmental features; (2) as the most identified predictors, the population with migrant backgrounds was positively associated, and the population over 65 was negatively associated with COVID-19 incidence; (3) certain neighborhoods consistently faced elevated risks of COVID-19 incidence. This study highlights potential structural health inequalities within migrant communities, associated with lower socioeconomic status and a higher risk of COVID-19 incidence across diverse pandemic periods. Our findings indicate that locally tailored interventions for diverse citizens are essential to address health inequalities and foster a more sustainable urban environment.


Introduction
As the overarching aim of the United Nations Sustainable Development Goals agenda is to promote well-being for all, stark health inequities have become a major health concern in many countries, and monitoring and addressing these inequities are considered among the top priorities in the coming decade (Hosseinpoor et al., 2018). In the wake of the pandemic, this has become a key challenge, particularly in cities. They are grappling with triple challenges, encompassing the health ramifications of COVID-19, ecological and climate changes, and social and economic inequalities (European Environment Agency, 2021). Initial studies in many countries have found evidence of ethnic minorities and socioeconomically deprived communities facing a disproportionately higher risk of COVID-19 morbidity and mortality (Benita et al., 2022; Carrión et al., 2021; Das et al., 2021; Dukhovnov & Barbieri, 2022; Harris, 2020; McGowan & Bambra, 2022; Ribeiro et al., 2021). Since COVID-19 might magnify existing inequalities and problems nested within cities, investigating spatial disparities and factors that contribute to COVID-19 incidence through an intra-urban lens is crucial to gain insights into building more equitable and sustainable cities in the post-pandemic era (Acuto et al., 2020; United Nations, 2020). In Germany, such research efforts remain scarce. Straßburger and Mewes (2022) found that socially disadvantaged neighborhoods were particularly affected during the second wave of infection at the intra-urban level in the City of Duisburg, Germany. However, evidence has suggested that the influencing factors of COVID-19's health disparities and their impacts may vary, depending on local contexts and scales of analysis (Alidadi & Sharifi, 2022).
Furthermore, as the pandemic progressed, virus variants and intervention policies changed constantly across different phases. Assuming that influencing factors have a time-independent, uniform impact on COVID-19's health outcomes would result in inaccurate estimates and misleading conclusions (Maiti et al., 2021). For example, while environmental factors' health effects can differ across seasonal variations and social cycles (e.g., school terms and holidays) due to changes in the patterns and intensity of human interactions, more complex variations arise when people's actions change in response to evolving infection risks and government interventions, such as stay-at-home orders, travel restrictions, and quarantine protocols (Kwan, 2021).
Owing to limited data availability, very little research has focused on the spatiotemporal variations of COVID-19 at a finer intra-urban scale (Nazia et al., 2022). Most spatiotemporal studies on COVID-19 up to now have been conducted at the national, provincial, county, or municipal level, or they have only analyzed the pandemic's early stages (Aral & Bakir, 2022; Castro et al., 2021; Chen et al., 2021; Gaudart et al., 2021; Maiti et al., 2021; Rohleder & Bozorgmehr, 2021). Further spatiotemporal studies covering different stages of the pandemic at the neighborhood level are needed to provide a more granular view of how COVID-19 spread within urban areas, facilitating the development of targeted, evidence-based responses to mitigate its effects.
Previous studies have discussed the roles of various factors in explaining the variability of COVID-19 incidence and mortality. Individuals with a low socioeconomic status have been found to experience higher rates of COVID-19 cases, hospitalization, and death (Benita et al., 2022; McGowan & Bambra, 2022; Mena et al., 2021), and increased age is associated with heightened susceptibility to severe COVID-19 symptoms (Bartleson et al., 2021). Moreover, extensive evidence has confirmed that children have a substantial impact on the transmission of COVID-19, specifically within school and household settings (Pierce et al., 2022). COVID-19 has also been shown to take a disproportionate toll on minority communities in the US (Andersen et al., 2021; Bilal et al., 2021; Carrión et al., 2021). Since the virus that causes the disease spreads primarily through human-to-human contact, COVID-19 is also strongly associated with urban population dynamics, including social interactions and the mobility patterns of urban residents (Bönisch et al., 2020; Lai et al., 2020; Manzira et al., 2022). The COVID-19 pandemic has also drawn attention to how environmental factors contribute to health disparities (Weaver et al., 2022). Initial studies have found that the built environment and urban layouts (Frumkin, 2021; Li et al., 2021; Schmiege et al., 2023), as well as physical environmental factors (such as green-space exposure, air quality, temperature, and humidity; Azuma et al., 2020; Grigsby-Toussaint & Shin, 2022; Han et al., 2022; Kogevinas et al., 2021; Konstantinoudis et al., 2021), are associated with COVID-19 incidence. Despite a plethora of efforts to identify influencing factors, few studies have considered both socioeconomic and environmental factors simultaneously when characterizing the spatial heterogeneities of COVID-19 incidence (Sun et al., 2021). Given the multi-faceted nature of health disparities during this pandemic, a comprehensive approach that considers both environmental and socioeconomic factors is essential to understanding the complex dynamics of COVID-19.
From a methodological perspective, a systematic review of COVID-19 spatial analysis (Nazia et al., 2022) found that most existing spatial studies employed frequentist approaches, such as geographically weighted regression and spatial autoregressive regression. However, these studies tended to focus solely on spatial random effects and overlook unobserved random effects of unaccounted factors, hindering the accurate estimation of the impacts of observed variables (Wali, 2023). Bayesian methods, commonly preferred over frequentist approaches, can accommodate both spatial and unobserved effects through a hierarchical modeling scheme (Nazia et al., 2022). Additionally, Bayesian analyses can be applied to a much smaller ratio of parameters to observations without losing power while retaining precision (Van De Schoot et al., 2015) and are therefore very appropriate for small-area disease mapping (Ver Hoef et al., 2018). This approach may provide local decision-makers with a more robust and reliable basis for data-driven governance than traditional frequentist models, especially in scenarios with limited data, which is often the case in intra-urban health governance. Moreover, Bayesian methods facilitate the dynamic updating of estimates because they can take prior knowledge into account when incorporating new data, contributing to more adaptive analyses for long-term health monitoring and governance (Nazia et al., 2022; Van De Schoot et al., 2015; Ver Hoef et al., 2018).
The aim of this study is, therefore, to employ a Bayesian approach to (1) identify neighborhood-level socioeconomic and environmental influencing factors and risk patterns of COVID-19 incidence for the entire study period and separately for different pandemic periods; and (2) discern whether evidence suggests spatial, socioeconomic or environmental inequalities in COVID-19 incidence and risks in Berlin-Neukölln, Germany. This study is the first, to our knowledge, to investigate neighborhood-level spatiotemporal variations of COVID-19 incidence in relation to both socioeconomic factors and environmental factors in the context of Germany. Compared to existing studies, we utilize a longer time series and incorporate advanced hierarchical modeling approaches to promote a more comprehensive and nuanced understanding of the complex dynamics shaping health disparities in a dense and diverse global city. In practical terms, our approach could be considered as a blueprint for future pandemics because it covers various stages of the pandemic at the neighborhood level, offering practical insights that can enhance the precision and effectiveness of public health interventions, resource allocation, and policy adaptation. Furthermore, this study identified health inequalities in COVID-19 and its influencing factors and thus may provide valuable insights that can inform targeted interventions, policy formulation, and urban planning strategies to build resilient and equitable cities in the aftermath of the pandemic.

Materials and method
To address our research objectives, this study was structured into a series of steps (Fig. 1) that are explained in more detail in the following subsections. A framework was first proposed to help understand the underlying mechanisms and influencing factors of COVID-19 health inequalities. We measure COVID-19 inequalities through three perspectives of spatial disparities: 1) in incidence outcomes, using the Gini coefficient; 2) in COVID-19 incidence risks across our study area, measured by the Bayesian spatial model; and 3) in socioeconomic or environmental factors, by connecting COVID-19 risks to their influencing factors based on the model results.

Conceptual framework
In this study, we measure the inequalities in the spatial distribution of COVID-19 incidences and risks, and we try to understand these inequalities through their associations with influencing factors. Health inequalities manifest when neighborhoods facing environmental and social disadvantages exhibit higher COVID-19 risks, while those endowed with environmental resources and social privilege show lower COVID-19 risks (Zhuang et al., 2022). To properly unveil and address COVID-19 inequalities, we need to better understand the influencing factors and underlying mechanisms shaping the health inequalities.
While environmental and social inequalities have long been rooted in our society, the COVID-19 pandemic magnified these existing defects due to the disease's highly transmissible nature (Alberti et al., 2020; McGowan & Bambra, 2022). Fig. 2 presents our conceptualization of the underlying pathways of COVID-19 health inequalities. While an environmental hazard is a natural or human-induced physical event or physical impact that may cause certain negative health effects (e.g., air pollution, noise, and heat stress; European Environment Agency, 2018), environmental benefits are natural or human-induced resources that may positively affect human health (e.g., green space and blue space; Schüle et al., 2019). Social vulnerability combines sensitivity (which is primarily driven by age and health) and the capacity to avoid, manage, or adapt to environmental health hazards (which is linked to socioeconomic status, available health and social support, or risk awareness) (European Environment Agency, 2018). In the context of COVID-19 exposure, the roles of virus variants, control measures such as social distancing implemented by local governments, and people's behaviors in response to these implemented measures are considered crucial.
The presence of social vulnerability may influence COVID-19 exposure levels via disease transmission, susceptibility, and treatment (Bambra, 2022; Fu & Zhai, 2021; Huang et al., 2022), while exposure to certain environmental conditions could mitigate or exacerbate the health influence of COVID-19. The combined impacts of social vulnerability, environmental health hazards and benefits, and COVID-19 exposure have given rise to COVID-19 health inequalities.

Study setting
This study was undertaken in an intra-urban setting in Neukölln, one of the twelve municipalities of Berlin. It is located in the southeastern part of the city's metropolitan center, with a total of 327,100 inhabitants in an area of 44.9 km² (Fig. 3a). Neukölln is the third most densely populated district in Berlin, and by population it would rank as the twentieth largest city in Germany. The district's built-up structure, sociodemographics, and environmental characteristics are very heterogeneous. Generally, many areas in the northern part of Neukölln are lower in socioeconomic status and are characterized by high housing density and a higher percentage of foreign residents and migrants, whereas the
In this study, we utilized the 46 planning zones (PLRs) from the lifeworld-oriented spaces system (Lebensweltlich Orientierte Räume) as our unit of analysis (Fig. 3). These zones are the finest-scale planning units created by the City of Berlin; they contain relatively homogeneous spatial entities in terms of socioeconomic and built characteristics and have comparable population sizes across units (N = 1247-12,546, on average 7172 inhabitants) (Senatsverwaltung für Stadtentwicklung und Wohnen Berlin, 2020).
This study took a longitudinal approach from March 1, 2020 to December 26, 2021, examining six distinct pandemic periods defined by Germany's national public health institute, the Robert Koch Institute (RKI; Schilling et al., 2022). These periods are characterized by different infection rates, variants, and control strategies, as shown in Table 1 (see Schmitz et al. (2023) for more information). This classification of periods is commonly used and referred to in the context of Germany. We followed it to increase the policy relevance of our results for the health department and to allow comparisons with other studies in Germany.

Model variables

COVID-19 case data
Data on reported COVID-19 cases confirmed by PCR tests in Neukölln were provided by the Neukölln Department of Health. For this study, we received the aggregated cumulative incidence (cases per 100,000 population) over the whole study period (Fig. 3b) and separately for each of the six pandemic periods (Fig. 4) at the PLR level. Because the risk of infection is higher in facilities such as nursing homes and refugee shelters, and because these facilities are unevenly distributed in Neukölln, a total of 960 cases that occurred in such facilities had been removed from the original data to avoid distribution biases in presenting the COVID-19 cases, as had duplicates and records with missing age or spatial information.

Explanatory variables
We selected explanatory variables to represent influencing factors according to the proposed conceptual framework (Fig. 1) with adjustments based on data availability.

Social vulnerability.
At the neighborhood level, social vulnerability can be characterized by the local socioeconomic status, demographic structure, and access to health resources (see Table 1 for the details on the data). Three socioeconomic factors were chosen to represent the neighborhoods' socioeconomic statuses: the percentage of the population in needy households who received state support benefits; the percentage of the population under 15 years old who were state support beneficiaries in needy households; and the percentage of the population who were unemployed (Senatsverwaltung für Stadtentwicklung und Wohnen Berlin, 2021). Five demographic variables were used: the population's percentage of young people, aged less than fifteen years; the population's percentage of elderly people, aged over 65 years; the population's percentage of people with migrant backgrounds¹; and the percentage of foreign residents. For further analysis, the percentage of the population with migrant backgrounds was then split into the two largest ethnic groups in the study area, immigrants from the European Union (EU)² and immigrants from the Organization of the Islamic Conference (OIC)³. The number of physicians (or general practitioners) and the number of pharmacies per 100,000 inhabitants represent indicators for health resources.
We assumed the social vulnerability data had not changed markedly during the pandemic and therefore collected the data for a single point in time only.

Environmental hazards and benefits.
The average distance from a residential block⁴ to the nearest public green space and blue space was calculated based on the land use map as an indicator of access to environmental resources. Data on air pollution levels in 2019, the annual mean traffic noise (day, evening and night) with 10 m resolution in 2017, and the predicted annual mean number of hot days in 2020 for each residential block in Berlin were used. Additionally, we calculated the mean distance to the nearest transportation stop as a proxy for neighborhoods' access to transportation, and the average distance from a residential block to the nearest industrial site to account for potential industry-induced environmental hazards.

COVID-19 exposure.
We addressed this factor by analyzing incidence distribution separately for different pandemic periods since the different virus variants and control measures that characterized different pandemic periods may have affected the populations' levels of exposure to the virus. Social distancing's role in varying COVID-19 exposure was also crucial. We calculated urban-structure characteristics from the built environment that may have affected social distancing as a proxy, following earlier studies (Frumkin, 2021; Gaisie et al., 2022; Li et al., 2021). Social distancing was characterized by two variables: living density (the average inhabitants per hectare of a residential block), based on data provided by FISBroker for 2020, and the number of social sites (including restaurants, bars, cafes, pubs, and marketplaces), derived from the 2022 OSM data. A summary of all the explanatory variables is presented in Table 2.

Statistical analysis
We first calculated the Gini coefficient (Sun et al., 2021), the most commonly used measure of inequality, for COVID-19 incidences to assess the spatial inequality independently of any socioeconomic or environmental factors.

¹ According to the Statistics Office of Berlin-Brandenburg (2021), persons with migrant backgrounds are reported as foreigners and Germans with a country of birth outside Germany, a second nationality, a naturalization marker, or an option identifier (i.e., native German birth; since January 1, 2000, the children of foreign parents have initially been granted German citizenship via option regulation) and persons under the age of eighteen years who do not have their own migration characteristics but who have at least one parent with a migrant background if that person is registered at their parent's address.
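As an illustration of the Gini coefficient applied to zone-level incidence, the following minimal sketch uses the standard rank-weighted formula. It assumes an unweighted coefficient over planning zones (the text does not state whether zones were population-weighted), and the function name `gini` is hypothetical.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    0 means every zone has the same incidence; values approaching 1 mean
    the incidence is concentrated in very few zones.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n assigned after sorting in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

Applied to the 46 PLR incidence values, a function of this kind would return a single number comparable to the coefficient reported in the results, provided the same (unweighted) definition was used.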

Base models
We then applied Bayesian negative binomial regression to develop a base model for the case data from all periods. The choice of a negative binomial model was driven by its advantages in handling over-dispersed count data (Hilbe, 2011). To test for spatial autocorrelation in the outcome data, we computed the global Moran's I statistic with an empirical Bayes index modification, which is frequently used to adjust the estimates of Moran's I in situations with small sample sizes or high data variability (Assunção & Reis, 1999). Given the significant spatial correlation in the data for incidence across all periods, as evidenced by the empirical Bayes index (EBI = 0.63, p < 0.001), we first compared three potential base regression models with only response variables from all periods for the overall study period (see Table 3): (1) a negative binomial model with only a fixed intercept (Fixed); (2) a negative binomial model with nonspatial random effects to account for unobserved random effects specific to each planning zone (Random); and (3) a negative binomial Besag-York-Mollié model (Besag et al., 1991) with two components representing the spatially structured and nonspatial random effects, respectively (BYM). All models assumed a non-informative prior for hyperparameter precision and the remaining parameters, as proposed in a previous study (Bilal et al., 2021). We selected the BYM model, the one with the lowest deviance information criterion (DIC; Spiegelhalter et al., 2002), for all subsequent analyses and regressions. In this study we leveraged the Lasso algorithm, a powerful statistical tool for automatic variable selection (Sun et al., 2021; Tibshirani, 1996), with 5-fold validation to identify the most informative explanatory variables, and we further improved the selection using expert knowledge from the local health department.
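To make the spatial autocorrelation check in this subsection concrete, the sketch below computes the plain global Moran's I from a binary adjacency matrix. It deliberately omits the empirical Bayes adjustment of Assunção and Reis (1999) and any significance test, so it is a simplified stand-in rather than the statistic reported here; the function name and the tiny example neighborhood structure are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n areas.

    weights is an n x n list of lists; weights[i][j] > 0 when areas i and j
    are neighbors (for example, 1 for planning zones sharing a border).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, summed over all neighbor pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical example: four zones arranged in a row (0-1-2-3).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1, 2, 3, 4], chain)   # positive autocorrelation
checkerboard = morans_i([1, 0, 1, 0], chain)      # negative autocorrelation
```

Values near zero would indicate no spatial structure; clearly positive values, as with the reported index of 0.63, indicate that neighboring zones tend to have similar incidence, which motivates a spatially structured random effect.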
Stepwise selection was employed to preserve only significant variables and ensure a parsimonious final model. Multicollinearity tests, including the variance inflation factor (VIF) and correlation coefficients between the remaining explanatory variables, were performed to detect redundant variables in the models and ensure that multicollinearity was eliminated.
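The collinearity check can be illustrated with a small, dependency-free sketch. It relies on the identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The helper names and the example columns are hypothetical, and the cutoff of five mirrors the threshold mentioned in the results rather than a documented rule of the authors' workflow.

```python
def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """Return {name: VIF} for a dict mapping predictor names to value lists."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical predictors with a pairwise correlation of 0.8.
example = vifs({"x": [1, 2, 3, 4, 5], "z": [2, 1, 4, 3, 5]})
too_collinear = [name for name, v in example.items() if v >= 5.0]
```

A predictor whose VIF reaches the chosen cutoff would be a candidate for removal before refitting.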
Model fitting.
We used the integrated nested Laplace approximation (INLA) method and the R-INLA package (Rue et al., 2009) to obtain Bayesian estimates for all models. Although this method is approximation-based, it has been shown to be accurate and to minimize the computational time (Blangiardo & Cameletti, 2015). INLA generated the posterior distributions of the parameter estimates, and we separately presented the results with relative risk, exceedance probabilities, and 95 % credible intervals for the whole pandemic period and each sub-period. The variables were standardized before they were added to the model; thus, the estimated coefficients were comparable between variables. Stepwise selection methods were then used to further simplify the model until all the remaining variables were statistically significant (Whittle & Diaz-Artiles, 2020). The DIC was used to compare different models. Generally, a DIC difference greater than ten between two models suggested that the model with the lower value was preferable (Spiegelhalter et al., 2002). Finally, we plotted and mapped the predicted relative risk (RR) and posterior probability of exceeding an RR threshold (RR = 1.1; Rohleder & Bozorgmehr, 2021), estimated for each planning zone in our models.
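Two quantities from this step lend themselves to a short sketch: the exceedance probability P(RR > 1.1) and the DIC comparison rule. The code below approximates the exceedance probability as the share of posterior draws above the threshold, which assumes posterior samples are available (INLA itself works from posterior marginals rather than draws). The function names and all numeric inputs are hypothetical.

```python
def exceedance_probability(rr_samples, threshold=1.1):
    """Share of posterior RR draws for one planning zone above the threshold."""
    hits = sum(1 for rr in rr_samples if rr > threshold)
    return hits / len(rr_samples)


def preferred_by_dic(dic_by_model, min_gap=10.0):
    """Return (best model name, whether it beats the runner-up by > min_gap).

    Lower DIC is better; a gap above roughly ten is treated as a clear
    preference, following the rule of thumb cited in the text.
    """
    ranked = sorted(dic_by_model.items(), key=lambda item: item[1])
    best, runner_up = ranked[0], ranked[1]
    return best[0], (runner_up[1] - best[1]) > min_gap


# Hypothetical posterior draws and DIC values, for illustration only.
zone_probability = exceedance_probability([1.00, 1.05, 1.15, 1.20])
choice = preferred_by_dic({"Fixed": 520.0, "Random": 505.0, "BYM": 490.0})
```

Mapping the exceedance probability alongside the RR itself helps distinguish zones whose elevated risk is well supported from zones where a high point estimate carries wide uncertainty.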

Validation.
To check our findings' robustness to the selection of priors, we conducted sensitivity analyses by comparing the fixed effects and their 95 % credible intervals for the models with priors chosen in this study versus the default priors of the INLA package.

Period-specific models
To examine the major influencing factors during each of the six pandemic periods more closely, we fitted the negative binomial BYM model to the cumulative incidence of each pandemic period, repeating the variable selection and model-fitting process described for the all-periods model.
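For orientation, a negative binomial BYM model of the kind used for the all-periods and period-specific analyses is commonly written as below. This is a generic textbook form, not a specification taken from the study: the population offset $\log P_i$, the overdispersion parameter $\theta$, and the symbol names are assumptions. Here $y_i$ is the case count in planning zone $i$, $x_{ik}$ are the standardized covariates, $u_i$ is the spatially structured (intrinsic conditional autoregressive) effect, $v_i$ is the nonspatial effect, $j \sim i$ denotes the neighbors of zone $i$, and $n_i$ is their number.

```latex
\begin{align}
y_i \mid \mu_i, \theta &\sim \mathrm{NegBin}(\mu_i, \theta), \qquad i = 1, \dots, 46, \\
\log \mu_i &= \log P_i + \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + u_i + v_i, \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\sigma_u^2}{n_i} \right), \\
v_i &\sim \mathcal{N}\!\left( 0, \sigma_v^2 \right), \\
\mathrm{RR}_k &= \exp(\beta_k).
\end{align}
```

Under this form, $\mathrm{RR}_k$ is the multiplicative change in expected incidence per one-standard-deviation increase in covariate $k$, and dropping $u_i$ gives a model with nonspatial random effects only, while dropping both $u_i$ and $v_i$ leaves fixed effects only.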

Additional analysis
We conducted additional analysis based on the informative predictors identified in the final models to further investigate which social groups were facing COVID-19 inequalities. These health inequalities were considered to exist when disadvantaged social groups showed a positive association with a higher risk of incidence. Specifically, the percentage of the population with migrant backgrounds was split into two major ethnic groups in the study area (EU immigrants and OIC immigrants) for further analysis. These two groups were selected because they are the largest (60 % of the total population with migrant backgrounds) and, hence, the most representative migrant ethnic groups in Neukölln. In this step, we ran the final model separately for each of these two groups, and an age-standardized COVID-19 incidence was calculated and used to control for the confounding effect of age when exploring the associations between risk factors and COVID-19 incidence.
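The age-standardization step can be sketched as follows. The text does not say whether direct or indirect standardization was used, so this example assumes direct standardization against a reference age structure; the function name, age bands, and all numbers are hypothetical.

```python
def age_standardized_incidence(cases, population, standard_population,
                               per=100_000):
    """Directly age-standardized incidence for one planning zone.

    cases and population map age group -> counts in the zone;
    standard_population maps the same age groups -> reference counts.
    """
    total_std = sum(standard_population.values())
    rate = 0.0
    for group, std_count in standard_population.items():
        age_specific_rate = cases[group] / population[group]
        # Weight each age-specific rate by the reference age share.
        rate += (std_count / total_std) * age_specific_rate
    return rate * per


# Hypothetical zone with three age bands.
zone_rate = age_standardized_incidence(
    cases={"0-14": 10, "15-64": 50, "65+": 5},
    population={"0-14": 1000, "15-64": 5000, "65+": 1000},
    standard_population={"0-14": 2000, "15-64": 6000, "65+": 2000},
)
```

Because every zone is weighted with the same reference age structure, differences in the resulting rates can no longer be explained by one zone simply having an older or younger population.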

Results
We address the research objectives in the following sections as outlined: 1) the overall influencing factors and risk patterns of incidence for all pandemic periods; 2) the spatiotemporal dynamics of influencing factors and risk patterns across different pandemic periods; and 3) the neighborhood-level COVID-19 inequalities in relation to social inequalities. After addressing multicollinearity, the remaining predictors' VIFs in the final models (for all periods and the sub-periods) were all below five. The spatial distribution of the factors identified in all the final models is visualized in Supplementary Figure S1, and their correlation matrix is presented in Supplementary Figure S2.

Overall influencing factors and risk patterns of COVID-19
The Gini coefficient for all-period COVID-19 incidence across PLRs is 0.232, which implies a moderate level of spatial inequalities of COVID-19 incidence.
The all-period final model suggested that six influencing factors made statistically significant contributions to the spatial variations in the COVID-19 incidence. The percentage of elderly inhabitants had the largest effect size, with an expected 6.2 % decrease in the COVID-19 incidence (RR = 0.938, 95 % CI = 0.916-0.961). Additionally, the number of pharmacies per 100,000 inhabitants was positively associated with the COVID-19 incidence (RR = 1.033, 95 % CI = 1.018-1.047). Moreover, the percentage of the population with migrant backgrounds (RR = 1.029, 95 % CI = 1.006-1.053), the percentage of young population (RR = 1.029, 95 % CI = 1.013-1.045), and the mean distance to the nearest transit stop (RR = 1.025, 95 % CI = 1.011-1.039) were positively associated with the incidence. The annual mean traffic noise value had a negative association (RR = 0.970, 95 % CI = 0.956-0.984). The posterior RR estimates and the 95 % credible intervals are summarized in Supplementary Table S1.
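The percentages quoted above follow directly from the relative risks: because the covariates were standardized, an RR is the multiplicative change in expected incidence per one-standard-deviation increase, and (RR − 1) × 100 gives the percentage change. A minimal sketch of this arithmetic (function names are hypothetical):

```python
import math


def rr_from_coefficient(beta):
    """Relative risk implied by a coefficient on the log scale."""
    return math.exp(beta)


def percent_change(rr):
    """Percentage change in expected incidence implied by a relative risk."""
    return (rr - 1.0) * 100.0


def interval_excludes_one(lower, upper):
    """True when a credible interval for the RR lies entirely on one side of 1."""
    return lower > 1.0 or upper < 1.0


elderly_effect = round(percent_change(0.938), 1)     # share of elderly inhabitants
pharmacy_effect = round(percent_change(1.033), 1)    # pharmacies per 100,000
elderly_clear = interval_excludes_one(0.916, 0.961)
```

The same conversion applies to every RR reported in the period-specific models, which makes effect sizes directly comparable across covariates and periods.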
The PLR-specific COVID-19 incidence RR and the exceedance probability for RR > 1.1 throughout the study period show that 21 of the 46 planning zones had an RR greater than 1, and eight faced a risk higher than the 1.1 threshold, with the highest RR being 1.19 (see Fig. 5). Moreover, most of these areas were clustered in the upper north area of Neukölln.

Dynamics in influencing factors on COVID-19 over the 6 periods of the pandemic
Fig. 6 visualizes the RR from the models' posterior estimation for different periods, and the results of the model for all periods are also included for reference and comparison. The 95 % CIs indicate great uncertainty in estimations for the first wave and the two summer plateaus, possibly due to the much lower incidence during these periods and hence lower statistical power. Generally, the percentage of the population with migrant backgrounds was selected with a positive association in most periods and with the strongest one during Wave 4. The percentage of the population aged over 65 years was most strongly associated with a lower risk of COVID-19 incidence, selected by the models for all periods, Wave 2, and the two summer plateaus. The effect sizes of the covariates in the sub-periods were larger than those of the model for all periods. For example, the unit effect of a young population (under the age of 15) on the COVID-19 incidence increased from 2.9 % to 5.9 %, versus 3.0 % to 8.6 % for migrant backgrounds, during the second wave. Note that the estimated RR for the mean distance to transit exerted effects in the opposite direction in the model for all periods and the Summer 1 model, and some variables were only selected by the models for certain sub-periods but not the model for all periods, such as green space (RR = 0.864, 95 % CI = 0.747-0.980) in Wave 1, annual mean hot days in Wave 4 (RR = 1.042, 95 % CI = 1.016-1.069), and the percentage of state beneficiaries in Wave 3 (RR = 1.105, 95 % CI = 1.052-1.159). To interpret this result, we must consider both the study's sample size and real-life situations during the considered period. Fig. 7a displays PLR-specific RR for sub-period models, while Fig. 7b shows the posterior probability of RR exceeding 1.1. In the first wave, elevated RR spread across Neukölln; 35 of 46 zones had RRs > 1, with 23 exceeding 1.1 and nine surpassing 1.2. This period observed high risk in the lower southern part, but with a relatively low exceedance probability. In subsequent waves, especially during the first summer and Waves 2 and 3, the upper part of Neukölln, especially the upper east, showed greater RR with strong exceedance probability; the highest RRs were 2.06, 1.43, and 1.33, respectively. In the second summer, the risk remained in the northern part, with a slightly lower exceedance probability. In Wave 4, 11 zones had RRs > 1.1, similar to previous periods, but with a slightly elevated risk in the southern inner corner of Neukölln.

Validation of model estimation
We found no major differences in fixed effects or their uncertainty estimates with the INLA default priors, indicating that our findings are satisfactorily robust to our choice of priors. The BYM model achieved the highest prediction accuracy, with the lowest RMSE value compared to the other modelling approaches in our study, suggesting a good performance of our selected models. Table S2 in the supplementary material shows the RMSE values for the different models' predictions of the COVID-19 incidence.
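For reference, the RMSE used to compare the models' predictions is the square root of the mean squared difference between observed and predicted incidence across planning zones. A minimal sketch follows; the function name is hypothetical, and whether the authors computed it in-sample or on held-out zones is not stated.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Lower values indicate predictions closer to the observed incidence, in the same units as the incidence itself.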

Additional analysis
Although several environmental factors were found to be statistically significant in the model results, the direction of the associations did not imply that communities with a higher burden of environmental hazards or lower access to environmental resources faced higher risks of COVID-19. For example, the noise exposure level was negatively associated with COVID-19 incidence, which did not suggest higher risks for population groups exposed to higher noise levels. On the other hand, the population with migrant backgrounds consistently exhibited a positive association with the COVID-19 incidence over the three waves and in the all-periods model, indicating a potentially elevated health risk for this population and hence a social inequality in COVID-19 incidence. When splitting the migration group into further sub-groups, we found the model's results to be significant for the percentage of the population with OIC migrant backgrounds but not for the population with EU migrant backgrounds. For the overall model, a standard-deviation-unit increase in the percentage of the population with OIC backgrounds in a planning zone was associated with a 3.1 % (RR = 1.031, 95 % CI = 1.006-1.053) rise in the incidence. For the second wave, when replacing the input variable (percentage of the population with migrant backgrounds) with the percentage of the population with OIC backgrounds, the effect size rose from 8.6 % to 10.4 % (RR = 1.104, 95 % CI = 1.057-1.152). A standard-deviation increase in the OIC group was associated with a 10.9 % (RR = 1.109, 95 % CI = 1.078-1.141) greater incidence during Wave 4. The adjusted association between the OIC population and the age-standardized incidence was still significantly positive (RR = 1.009, 95 % CI = 1.002-1.012) (see Supplementary Figure S3).

Variability in influencing factors and relative risks
A key finding from our results is that socioeconomic factors exerted a stronger influence on the spread of COVID-19 than environmental factors. This finding is in line with other intra-urban studies that identified a stronger influence of social characteristics over other variables, such as studies in Chicago (Kashem et al., 2021) and Tehran (Lak et al., 2021). Among the sociodemographic features, age and ethnic minority status were identified as the most consistent factors in COVID-19 infections in our study. In this section, we discuss the role of the social and environmental factors highlighted in our results, except for ethnic minorities. We address ethnic minorities separately in the next section, as we consider them a significant factor of health inequality that warrants closer examination.

Social vulnerability
While many other studies have found that neighborhoods with a higher percentage of elderly people are associated with greater risks of COVID-19 infection and mortality (Alidadi & Sharifi, 2022; Khavarian-Garmsir et al., 2021; Lak et al., 2021; López-Gay et al., 2022), our results indicated that, in Neukölln, a higher percentage of elderly inhabitants was associated with a lower COVID-19 incidence. A similar negative association was reported by Johnson et al. (2021), who found that the population over 70 was connected to fewer COVID-19 cases in 299 local authorities in the UK. One possible reason for this finding could be that, since cases from nursing homes had been removed for this study, the percentage of elderly people was highly associated with a sparse housing density (Pearson's coefficient: −0.82), which has been consistently linked to reduced COVID-19 transmission in earlier studies (Alidadi & Sharifi, 2022). Besides, the increased vulnerability of the elderly to COVID-19 mortality (Williamson et al., 2020) was highlighted in public health guidance. Consequently, older individuals may have reduced their contact with others and exhibited greater compliance with public health measures compared to younger adults (Korn et al., 2022; Nivette et al., 2021). Furthermore, the German government prioritized the vaccination of elderly people (Brenner, 2021). Therefore, sparse housing, less social contact, and a targeted vaccination campaign might all have contributed to a lower COVID-19 risk for the elderly group. Conversely, higher risks were found to be associated with the young population. This finding is congruent with evidence from previous studies (Whittle & Diaz-Artiles, 2020; Li et al., 2022) suggesting that significant transmission was fueled by young, asymptomatic carriers. The significant role of children in the transmission of COVID-19 is likely a result of elevated exposure within the school environment (Pierce et al., 2022), which is recognized as a high-risk setting for COVID-19 transmission due to close and frequent contact among students and teachers. Moreover, the obligatory testing schemes in schools throughout the pandemic and the detrimental effects of crowded and poorly ventilated indoor spaces may further contribute to COVID-19 transmission in schools (Xu et al., 2021).
Regarding health resources, the number of pharmacies per 100,000 inhabitants was found to have a statistically significant positive association with the COVID-19 incidence in the all-periods model. A previous study confirmed pharmacies' roles in addressing current population health gaps and influencing patients' health engagement (Livet et al., 2021). The positive association between the pharmacy indicator and incidence was inconsistent with previous studies at the city scale in China (Zhang et al., 2021), but at the neighborhood level it aligned with findings in Tehran, Iran (Lak et al., 2021). This finding could suggest that the current allocation of health resources involving pharmacies is reasonable and matches the demand of at-risk populations, since areas with higher incidence were equipped with more pharmacies. On the other hand, inhabitants in areas with more pharmacies might tend to have higher health-risk awareness and easier access to self-tests; therefore, they may be more likely to take PCR tests, leading to more confirmed cases.

Environmental hazards and benefits
Generally, in this study, environmental variables exerted only a marginal, albeit statistically significant, effect. We did not find enough evidence of environmental inequalities, since no disproportionate environmental burdens had statistically significant connections to higher disease incidence. During the first wave, our model suggested that the farther a residence was from public green space, the lower the risk of infection tended to be. This finding contradicted some previous studies in the United States (Liu et al., 2021; Russette et al., 2021; Spotswood et al., 2021), where green space exposure was a negative predictor of COVID-19 morbidity or mortality. However, it aligned with some intra-urban studies that focused on the pandemic's early stage in China (Huang et al., 2020; You et al., 2020). We hypothesize that during the lockdown, with the closure of indoor workplaces and recreational spaces, public green spaces may have experienced increased human mobility as a primary recreational venue, potentially leading to more transmissions. Previous studies in various countries have supported our hypothesis and confirmed a rise in park visitation during the outbreak (Geng et al., 2021; Venter et al., 2020). We also found that the southern part of our study area included a large expanse with sparse housing and private gardens instead of public green space. This trait may also have affected green space's influence on the COVID-19 incidence.
The planning zone's mean noise value was also a statistically significant predictor of the cumulative COVID-19 incidence in the combined waves and in Wave 4. Contrary to the results of a previous spatiotemporal study (Díaz et al., 2021), planning zones with higher noise levels in our study were more likely to have a lower COVID-19 incidence. According to the metadata of the datasets, the main noise source was traffic, and the noise value was an annual mean from 2017 for each whole planning zone. As studies in London and Dublin indicated, a significant decrease in the perceived outdoor noise level was reported during the lockdown (Basu et al., 2021; Lee & Jeong, 2021). Therefore, the data we used might not fully represent the neighborhood noise level that residents experienced during the pandemic. Instead, it could indicate the area's traffic connections. This interpretation is consistent with another variable in the final model for all periods, the mean distance to transit, which suggested that easier access to transit was associated with a lower COVID-19 incidence.
While some studies have demonstrated a positive correlation between public transport utilization and COVID-19 incidence (Carrión et al., 2021; Guo et al., 2021; Huang et al., 2020; Xu et al., 2022), our result is supported by the findings of intra-urban studies in Sydney (Gaisie et al., 2022), Washington DC (Hu et al., 2021) and Dublin (Manzira et al., 2022). During the COVID-19 pandemic, fear of infection, perceived risk, and travel anxiety took precedence as the primary influencers of travel choices and contributed to a shift away from public transportation (Chen et al., 2022). As most public transport consequently operated at reduced capacity with additional hygiene measures in place, public transport users may not have been at a higher risk of contagion (Manzira et al., 2022). In this case, more transportation options in an area could further reduce crowding on public transit and, hence, disease transmission. However, note that the mean distance to public transit had the opposite effect on disease incidence during the first summer. This result might have been due to the high mobility during the summer vacation, especially after the preceding lockdown. This finding is consistent with earlier studies suggesting that the impact of public transport use is nonlinear and context-dependent (Kim et al., 2023).
In contrast to the results of a recent global multi-city study indicating a negative association between temperature and COVID-19 infections (Nottmeyer et al., 2023), our study found that the annual mean number of hot days was positively associated with COVID-19 incidence in Wave 4. Nakada and Urban (2021) also observed that temperature was inversely correlated with COVID-19 infection in their intra-urban study. Our result might indicate a potential detrimental health impact of heat stress rather than of land or air temperature. People's reluctance to wear masks on hot days could also contribute to the higher incidence (Milošević et al., 2022). On the other hand, the variable could be understood as a proxy for the urban heat island effect: the busier and denser an urban area, the higher its land temperature. As control measures were gradually lifted and economic activities recovered during this period, the result might indicate that neighborhoods with higher urbanicity recorded higher infection rates, a trend that has been observed in Melbourne, Australia (Gaisie et al., 2022), Huangzhou, China (Li et al., 2021) and multiple cities in Brazil (Viezzer & Biondi, 2021).

Social distancing urban parameters
Living density and the number of social sites, as proxies for social distancing, were found to be insignificant contributors to the spread of COVID-19 in our study. This was unexpected, as urban densities are main predictors of incidence in many studies. For example, a recent study in Tokyo observed a positive association between COVID-19 spread and not only population density but also other urban densities, such as those of commercial and healthcare facilities (Alidadi et al., 2023). Similar results were also found in Barcelona (López-Gay et al., 2022), Tehran (Lak et al., 2021), and Wuhan (Xu et al., 2022). However, Khavarian-Garmsir et al. (2021) argued that urban density alone cannot be considered a risk factor and has a double-edged effect on COVID-19 in Tehran, Iran. This was also confirmed by a study in Chicago (Kashem et al., 2021). In our case, the living density and the density of recreational and social sites were insignificant, which might be a result of relatively strong and effective health control measures. Even when the lockdown and partial lockdown measures were lifted, strict 2G and 3G policies and mask mandates in indoor public spaces and on public transport were implemented. As pointed out by Chu et al. (2021), COVID-19 spread was highly related to urban governance capacity rather than city size, denoting the importance of better urban governance to prevent and control public health risks.

Structural health inequalities within migrant communities
The percentage of the population with migrant backgrounds demonstrated a significant effect on the COVID-19 incidence across all periods and in several separate periods in our data set, indicating a persistent health disadvantage rooted in the neighborhoods. As the RR map shows, the northeastern part of Neukölln demonstrated some of the highest risks and exceedance probabilities during the first summer, Wave 2, Wave 3, Wave 4, and the whole study period. This area is also among the most socioeconomically deprived areas, with high proportions of OIC communities. This could indicate that OIC ethnic groups residing in this area might suffer socioeconomic disadvantages and a disproportionately high risk of COVID-19 infection. These findings corroborate the claims by Wulkotte and Bozorgmehr (2022) and Koschollek et al. (2023) that migrants in Germany experience higher levels of socioeconomic disadvantage and poorer health status than non-migrants. This study contributes new insights into intra-urban ethnic health inequalities within the German context, expanding beyond previous evidence primarily focused on the U.S. For example, Hu et al. (2021) observed that, across all wards in Washington DC, African American residents accounted for the highest percentage (75 %) of COVID-19 deaths at the ward level. In another study, the neighborhood proportion of Latino and Black residents was positively correlated with COVID-19 positivity in New York (Chan et al., 2021). With partial lockdown measures during the pandemic's second wave, the migrant population share became the main predictor of COVID-19 incidence with an increasing effect size, suggesting that lockdowns might worsen existing inequalities (Bajos et al., 2021; Dorn et al., 2020).
These health inequalities may result from a range of factors, including discrimination, language barriers, and differences in housing and workplace conditions, as well as access to healthcare and social support. In the context of COVID-19 health inequities, these inequalities could be associated with vaccine uptake. Marleen et al. (2023) recently pointed out that the sense of belonging to German society is associated with vaccine uptake. People who lack this sense of belonging are much less likely to be vaccinated than those who feel a strong sense of belonging in Germany. As Koschollek et al. (2023) have pointed out, it was living and working situations, not migrant status, that increased the risk of COVID-19 infection. Future health resource prioritization and policymaking could pay more attention to these migrant communities to address potential health inequalities.
This study incorporated the roles of migration and ethnicity into its analysis solely for the purpose of revealing health inequalities and their underlying mechanisms. In no way should these statistical representations entail any discrimination or misinterpretation. It is essential to examine migration groups' health situations in connection with the social determinants that shape their lives and influence their health (Kajikhina et al., 2023). Apart from ethnic minority groups, evidence of social inequalities was also found during Wave 3, when a higher percentage of state support recipients in a PLR was significantly associated with a higher COVID-19 incidence, indicating that deprived neighborhoods might face elevated risks of infection. This is in line with studies in both developing and developed countries, such as Chile, China, Brazil, India, Iran, Australia, the US and countries throughout Europe (Das et al., 2021; Gaisie et al., 2022; Han et al., 2022; Lak et al., 2021; Liu et al., 2021; Mena et al., 2021; Moosazadeh et al., 2022; Sannigrahi et al., 2020; Viezzer & Biondi, 2021). Understanding intra-urban inequalities facilitates more effective community engagement. Public health campaigns and communication strategies can be customized to resonate with the diverse populations across cities, fostering a greater understanding of preventive measures and encouraging compliance with health guidelines.

Strengths and limitations
This study considers time-dependent variations in the influencing factors of COVID-19 incidence. Although the pandemic periods do not offer the finest temporal resolution, the delineation into six distinct phases facilitates clearer comparisons of COVID-19 risk patterns and influencing factors across different stages of the pandemic. This structured approach simplifies the interpretation of results, making it easier to connect the findings to specific public health responses and derive policy implications for health departments. The fine-grained sociodemographic and built environment data for Berlin enable us to disentangle the spread of the infection within highly international and mixed-use European urban areas. The BYM model and the Bayesian approach outperform the other methods investigated in this study, validating their ability to enable more robust estimation for small-area studies with limited observations (van Zoest et al., 2022). By including both area-specific, non-structured noise and spatially structured effects, the model allows for the compensation of some unstudied confounders (e.g., face-mask measures and vaccination coverage).
However, several limitations of our study should be acknowledged. First, our response variable was based on the reported number of COVID-19 positive PCR tests, which may not reflect the actual number of cases due to asymptomatic carriers and the lack of widespread testing. Nonetheless, we contend that our findings are valuable in explaining the crucial roles of detection and prevalence in effective pandemic management (Whittle & Diaz-Artiles, 2020). Second, the spatial analysis faced methodological challenges, including the ecological fallacy and the modifiable areal unit problem inherent in aggregated data analysis. Caution is advised in interpreting established associations, as spurious correlations may result from unmeasured confounding variables common in observational studies. Additional research is essential to establish causal relationships, even when statistically significant associations are present. Third, the case data from nursing homes were removed, but the population data did not exclude the corresponding residents, since we lacked information on the demographic characteristics of individual cases in those facilities. This data exclusion might cause a slight bias when analyzing sociodemographic factors' roles in determining COVID-19 incidence. Fourth, we had a limited number of observations at the PLR level (46) with a wide range of covariates, which might limit statistical power. In this regard, we utilized the Lasso technique to reduce variable dimensions before modeling and a Bayesian approach to minimize the influence of our small sample size and to obtain statistically reliable results. Finally, due to data availability, we were only able to collect rather limited indicators of socioeconomic status, and we were unable to collect any data on housing conditions and vaccine coverage at the study level. Environmental data were aggregated at the neighborhood level with a coarse time resolution. Given these data limitations, we may not have been able to capture certain spatial variations and fully unravel potential inequalities. A future study could incorporate high-resolution longitudinal environmental data to better assess the impact of environmental factors. Additionally, conducting a gender-stratified analysis would provide insights into potential gender inequalities. A multilevel study, incorporating individual-level influencing factors and guided by Directed Acyclic Graphs (DAGs) for variable selection, would help mitigate ecological fallacies, offering deeper insights into intra-urban health inequalities. Furthermore, a broader comparison of various modeling approaches for COVID-19 outcomes is encouraged for future research.

Conclusions
In this study, we have investigated the associations between explanatory variables from all the components of a proposed health-inequality framework and the COVID-19 incidence at an intra-urban scale. As the pandemic evolved, the models identified different influencing factors. Sociodemographic factors, such as the percentage of the population over 65 years old and the percentage of the population with migrant backgrounds, were found to have stronger effects on COVID-19 incidence than the environmental factors selected in this study. The community with migrant backgrounds, especially those with OIC backgrounds, faced a higher risk of COVID-19 incidence, indicating health disparities associated with social inequalities. Persistently elevated relative risks in the upper part of Neukölln suggest structural spatial inequalities. Specific built environmental factors, including green spaces, transportation, and pharmacies, were found to influence COVID-19 incidence during certain periods. However, the direction of these associations does not indicate any COVID-19 inequalities attributable to an unequal distribution of environmental hazards and benefits. The proposed Bayesian spatial models outperform other model schemes, confirming their suitability for local small-area health monitoring and governance.
Our study uncovers a previously unrecognized link between migration background and increased COVID-19 risk in Berlin, highlighting ethnic health disparities. This finding underscores the importance of considering sociodemographic factors in understanding disease transmission and addressing health inequalities. Additionally, recognizing how influencing factors intersect with COVID-19 incidence to shape health inequalities can inform long-term urban planning efforts to build more sustainable cities, such as investing in healthcare infrastructure and social support systems to bolster resilience in vulnerable communities. Studying different pandemic periods offers valuable insights into evolving health crises and informs adaptive responses by local health departments, enhancing their preparedness for future health emergencies. Our Bayesian approach supports robust, data-driven health governance in complex urban areas, providing a promising tool for long-term health monitoring and urban planning. Decision-makers can leverage these insights to make informed choices, monitor trends, and adjust policies accordingly.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Workflow of this study's methods and data analysis procedures.

Fig. 2. Conceptual framework of COVID-19 health inequalities at the neighborhood level, adapted from the work of Zhuang et al. (2022).
S.Zhuang et al.
southern areas are dominated by more dispersed housing with a lower living density, and a higher percentage of their population is elderly (aged over 65 years) (Senatsverwaltung für Stadtentwicklung und Wohnen Berlin, 2021).

Fig. 5. Unequal distribution of COVID-19 incidence risks over the whole study period. (a) Area-specific Relative Risk (RR). (b) Exceedance probability for an RR higher than 1.1. An exceedance probability above or equal to 90 % indicates a high likelihood of exceeding the RR threshold.
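
The exceedance probability mapped in panel (b) is the posterior probability that an area's RR lies above the threshold, P(RR > 1.1 | data). The Python sketch below estimates it as the share of posterior draws above the threshold. It is a minimal illustration under stated assumptions: the draws are simulated from a log-normal distribution, the area names are hypothetical, and the function `exceedance_probability` is not taken from the study's code, so none of the printed numbers are results of this study.

```python
import numpy as np


def exceedance_probability(rr_samples: np.ndarray, threshold: float = 1.1) -> float:
    """Share of posterior RR draws above the threshold, estimating P(RR > threshold | data)."""
    return float(np.mean(rr_samples > threshold))


rng = np.random.default_rng(seed=0)

# Simulated posterior draws (assumption): log-normal RRs for two hypothetical areas
posterior_draws = {
    "area_A": rng.lognormal(mean=np.log(1.30), sigma=0.08, size=5000),
    "area_B": rng.lognormal(mean=np.log(1.00), sigma=0.08, size=5000),
}

for area, draws in posterior_draws.items():
    prob = exceedance_probability(draws)
    # Apply the 90 % cut-off used in the figure caption
    flag = "high likelihood" if prob >= 0.90 else "not flagged"
    print(f"{area}: P(RR > 1.1) = {prob:.2f} ({flag})")
```

Mapping this probability alongside the RR itself separates areas whose elevated risk is well supported by the posterior from areas where a high point estimate comes with wide uncertainty.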

Fig. 6. Estimated relative risks (mean and 95 % credible intervals) for the associations between the explanatory variables and the incidence in the all-periods and period-specific models.

Fig. 7. Unequal distribution of COVID-19 incidence risks across the six pandemic periods. (a) Area-specific RR. (b) Exceedance probability for an RR above 1.1. An exceedance probability above or equal to 90 % indicates a high likelihood of exceeding the RR threshold.

Table 1
Summary of the six pandemic periods' key features.

Table 2
Descriptive statistics and data sources for this study's variables.
* The final variables were calculated and derived from the source's original data.

Table 3
Characteristics of the three base models (no predictors).A lower deviance information criterion (DIC) represents a better trade-off between the model's fit and complexity.
To assess the final model's performance, the dataset of all-periods cases was randomly split into training (80 %) and test (20 %) data sets. After estimates were obtained based on the training data set, different models with the same variables that remained in the final model were applied to the test data set. We compared our BYM spatial model to the other two base models under the Bayesian framework, i.e., Fixed and Random. In addition to our three base model types, a frequentist model (e.g., a maximum-likelihood generalized linear model (MLE)) and a popular machine learning model (e.g., random forest (RF); 500 trees to grow and default settings for other hyperparameters in the R package randomForest) were implemented to allow for a broader comparison. The root mean square error (RMSE), one of the most commonly used measures for evaluating the quality of predictions, was calculated to validate the models' performance (van Zoest et al., 2022).
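
The validation procedure described above can be sketched as follows: a random 80/20 split, followed by the RMSE, i.e., the square root of the mean squared difference between observed and predicted values on the test set. The Python example below is a minimal sketch under stated assumptions. The data are simulated, the helper names `train_test_indices` and `rmse` are hypothetical, and a simple training-mean predictor stands in for the fitted models, so it does not reproduce the study's R-based workflow or its results.

```python
import numpy as np


def rmse(observed: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean square error between observed and predicted values."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def train_test_indices(n: int, test_fraction: float = 0.2, seed: int = 42):
    """Randomly split the indices 0..n-1 into training and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    n_test = int(round(n * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Simulated incidence values (assumption, not study data)
rng = np.random.default_rng(seed=1)
incidence = rng.gamma(shape=5.0, scale=20.0, size=100)

train_idx, test_idx = train_test_indices(len(incidence))

# Placeholder predictor (assumption): the training mean stands in for a fitted model
baseline_prediction = np.full(len(test_idx), incidence[train_idx].mean())

print(f"Test-set RMSE of the baseline: {rmse(incidence[test_idx], baseline_prediction):.2f}")
```

In a comparison of this kind, each candidate model would be fitted on the training indices and scored on the same test indices, so that differences in RMSE reflect the models and not the split.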