Regional disparities in SARS-CoV-2 infections by labour market indicators: a spatial panel analysis using German notification data

Morten Wahrendorf (wahrendorf@uni-duesseldorf.de) University Hospital Düsseldorf, Institute of Medical Sociology https://orcid.org/0000-0002-4191-1420
Marvin Reuter Universitätsklinikum Düsseldorf, Institute of Medical Sociology
Jens Hoebel Robert Koch Institute, Unit of Social Determinants of Health
Benjamin Wachtler Robert Koch Institute, Unit of Social Determinants of Health
Annika Höhmann University Hospital Düsseldorf, Institute of Medical Sociology
Nico Dragano University Hospital Düsseldorf, Institute of Medical Sociology


Introduction
After two years of the COVID-19 pandemic, the evidence on occupational determinants of SARS-CoV-2 infection risks is still limited. Among the existing studies, many focus on essential occupations and reveal that infection risks are generally higher among health care workers, transport workers, teachers, child care workers, postal service workers, kitchen staff or workers in the logistics sector, but rather low for workers in the media sector (e.g. journalists), lawyers, scientists or workers in the financial sector [1][2][3][4][5][6][7][8][9][10][11].
These studies are instrumental as they help to describe infection risks for specific occupations. However, far-reaching conclusions on high-risk groups and corresponding policy implications remain limited, mainly because most studies focus on rather homogeneous occupational groups without allowing for systematic comparisons of risks between different occupations. In addition, it is still not clear how findings based on individual-level data translate into regional differences in SARS-CoV-2 infection risks and corresponding incidence rates. Yet, knowledge on regional differences and the role of labour markets and their properties is instructive, as infections tend to cluster regionally and many decisions on pandemic intervention measures are made at regional levels [12]. Therefore, research based on individual data should be supplemented by ecological studies that examine infection risks in relation to characteristics of regional labour markets, not as a substitute for individual studies (in case individual-level data is missing), but as a helpful supplement with immediate relevance for local and targeted policy decisions [13].
In fact, ecological studies on regional differences in COVID-19 have received increased attention during the pandemic [14,15]. For example, studies from Germany [16][17][18][19], the UK [20], and the US [21,22] suggest that SARS-CoV-2 infection risks and COVID-19 mortality rates are comparatively higher in regions with high poverty rates or low income levels, or in regions that are generally socioeconomically disadvantaged. Interestingly, studies that also compare associations between different phases of the pandemic in Germany demonstrate that these socioeconomic differences were less pronounced (or even reversed) in the early first wave of the pandemic (with lower infection rates and mortality in more disadvantaged regions) [19,23,24]. Such an inverse association during the early phase of the pandemic was also found for the US [25] and France [26]. One possible explanation for these differences (and their reversal in the course of the pandemic) is the differing composition of regional workforces and the respective exposure to the virus. Higher infection rates in more advantaged regions at the beginning of the pandemic are possibly due to more employed people with international business travel, transregional commuting and more overall mobility (incl. holidays). In the course of the pandemic, though, these were exactly the occupational groups that were able to reduce their mobility through opportunities to work from home, while workers in less advantaged occupations were more exposed to the virus at their workplaces. Additionally, because of pre-existing health differences [27], workers in less advantaged occupations were then also more susceptible to an infection due to underlying health conditions.
Although regional labour markets and their properties are mentioned as potential explanations, their role remains unclear. Likewise, regional differences are usually studied for overall infection rates, but a simple comparison of infection rates among working-age populations (those most likely to be affected by labour markets) is not available. One exception is a recent study from Toronto documenting that infections and COVID-19 mortality rates (though for the overall population) are higher in neighbourhoods with a high proportion of people working in essential occupations [28]. A comprehensive study of regional differences in infection rates among the working-age population that explicitly focuses on regional labour markets and their properties (instead of socioeconomic factors) and takes changing patterns over time into account is still missing. The overall objective of this study is to fill this knowledge gap for Germany.
Using weekly notification data on SARS-CoV-2 infections at the regional level for Germany combined with regional labour market indicators, the present study investigates differences in age-standardised incidence rates (ASIRs) of SARS-CoV-2 infections according to three labour market indicators at the regional level. As previous studies show that associations between regional indicators and incidence rates can vary across phases of the pandemic, we study associations separately for the first four main pandemic waves in Germany (ranging from March 2020 until December 2021). The three labour market indicators are: the overall extent of employment in a region, the composition of the workforce by economic sector, and existing opportunities to work from home. We also explicitly focus on the working-age population and apply a statistical approach (spatial panel analysis) that allows us to address spatial autocorrelation, meaning that neighbouring regions are usually interrelated and not independent (see Methods for details).

Data sources
We combined the following data available at the German district level: (1) SARS-CoV-2 infection rates from the SurvStat@RKI 2.0 database, (2) geospatial information from the German Federal Agency for Cartography and Geodesy, (3) regional statistics on labour force participation as main exposures of interest, and (4) various regional population statistics (mainly as control variables in multivariable analyses).
Weekly age-standardised SARS-CoV-2 incidence among working-age population
Regional data on weekly notified laboratory-confirmed SARS-CoV-2 infections were extracted from the SurvStat@RKI 2.0 database of the Robert Koch Institute (data query on 10 January 2022). The data are available for Germany's 400 districts (German "Kreise" and "kreisfreie Städte", the NUTS-3 level), which is the smallest area level available in the German nationwide notification data. For the analyses, the covered time period ranged from 2 March 2020 (calendar week 10, defined as the start of the first wave [29,30]) to 19 December 2021 (end of calendar week 50 in 2021). To calculate incidence rates (notified cases per 100,000 residents of the same age), SurvStat@RKI uses population data (i.e. number of residents) from the Federal Statistical Office. To allow for meaningful comparisons of incidence rates between regions with different age compositions, we conducted direct age-standardisation and calculated age-standardised incidence rates (ASIRs). More specifically, we used age-specific incidence rates in five-year intervals (from age 20 to age 64) and weighted each age group according to its share in the revised European Standard Population [31].
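The direct standardisation described above can be sketched as follows. This is a minimal illustration and not the authors' code (the analyses were run in Stata); the function name and the input layout are assumptions. The weights are, to our understanding, the 2013 revised European Standard Population values for the nine five-year groups between 20 and 64, and should be checked against [31] before reuse.

```python
# Assumed weights: revised European Standard Population (2013),
# persons per 100,000 in each five-year age group from 20 to 64.
ESP2013_WEIGHTS = {
    "20-24": 6000, "25-29": 6000, "30-34": 6500,
    "35-39": 7000, "40-44": 7000, "45-49": 7000,
    "50-54": 7000, "55-59": 6500, "60-64": 6000,
}


def age_standardised_rate(age_specific_rates):
    """Directly standardised incidence rate (per 100,000) for ages 20-64.

    age_specific_rates maps each age-group label to the district's
    incidence rate per 100,000 residents of that age (hypothetical layout).
    """
    missing = set(ESP2013_WEIGHTS) - set(age_specific_rates)
    if missing:
        raise ValueError(f"missing age groups: {sorted(missing)}")
    # Renormalise the weights so they sum to one within ages 20-64.
    total_weight = sum(ESP2013_WEIGHTS.values())
    weighted_sum = sum(
        ESP2013_WEIGHTS[group] * age_specific_rates[group]
        for group in ESP2013_WEIGHTS
    )
    return weighted_sum / total_weight
```

Because only ages 20 to 64 enter the calculation, the weights are renormalised within that range, so a district with the same rate in every age group receives exactly that rate as its ASIR.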
We used the whole observation period (for descriptive statistics) and distinguished between the following four pandemic waves to account for possible variations of the associations between labour market indicators and incidence rates: are usually less reliable due to delayed notifications [29,30].

Geospatial information
Geospatial data came from the German Federal Agency for Cartography and Geodesy as a shapefile (© GeoBasis-DE / BKG (2021)), with details at the district level for 401 districts. To enable linkage of geospatial data with incidence rates, we took into account an administrative reform of July 2021 (in which two regions in Thuringia were merged into one) and merged the two respective regions (or "polygons") into one region using the "mergepoly" command in Stata [32]. Geospatial data served both as the basis for maps and as an essential part of the spatial regression models (see below for details).
Information on regional employment rates came from the German Federal Employment Agency ("Bundesagentur für Arbeit") [33]. The employment rate measures the proportion of working-age persons living in an area who are employed (not counting people who are unemployed or looking for a job) and refers to December 2019.
Employment by main sectors divided employment into three broad economic sectors in accordance with the European NACE classification (from the French "Nomenclature des activités économiques"). Specifically, we calculated the percentage of all workers who work in each sector. These three sectors are extraction of raw materials ("primary sector"), manufacturing ("secondary sector", all industries that produce a finished, usable product or are involved in construction), and services ("tertiary sector", all occupations that distribute and sell goods, or provide services to other businesses or to final consumers). We again used information from December 2019, provided by regional and federal offices of statistics [34].
Capacity to work from home was measured with a recently developed index that combines survey and administrative data [35]. In short, the index uses information on the reported feasibility of working from home across different occupations (from the German BIBB/BAuA Employment Survey from 2018), and combines this information with administrative data from the Federal Employment Agency on the frequency of different occupations by region. This results in an index that quantifies the existing potential for working from home in each region as a percentage of jobs in that region.
All labour market indicators were available for the 401 German districts as they existed before the administrative reform mentioned above. We therefore calculated population-weighted means of the two merged regions in the case of "capacity to work from home", and recalculated rates based on absolute values for the two other labour market indicators.
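As a rough illustration of this harmonisation step, the sketch below shows both ways of combining two districts into one. The function names and argument layout are assumptions made for illustration; they are not taken from the authors' Stata code.

```python
def population_weighted_mean(values, populations):
    """Combine district-level index values, weighting by population size.

    Intended for indicators that are only available as an index,
    such as the capacity to work from home.
    """
    if len(values) != len(populations) or not values:
        raise ValueError("values and populations must be non-empty and equal in length")
    total_population = sum(populations)
    return sum(v * p for v, p in zip(values, populations)) / total_population


def pooled_rate(numerators, denominators, per=100.0):
    """Recalculate a rate for the merged district from absolute counts.

    Intended for indicators with known counts, e.g. employed persons
    (numerator) among working-age residents (denominator).
    """
    if len(numerators) != len(denominators) or not numerators:
        raise ValueError("numerators and denominators must be non-empty and equal in length")
    return per * sum(numerators) / sum(denominators)
```

Pooling the counts is preferable whenever they are available, because averaging two rates without weights would give the smaller district the same influence as the larger one.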

Additional variables
We also included regional information on the following factors, mainly as potential confounders in multivariable analyses (after checking for possible multicollinearity between these variables): proportion of employees without professional qualification as a percentage of all employees (as an indicator of the qualification level of the active workforce), proportion of female employees (to account for the sex composition of the workforce), median salary income based on people living in the region (to consider the general income level of workers), district type based on four categories: "large city district", "urban district", "rural district with populated areas", and "sparsely populated rural district" (to consider the degree of urbanisation), settlement density measured as the number of residents per square kilometre in urban areas of settlement and transport space (accounting for residential density/proximity), average living space in square metres per accommodation (accounting for the size of housing spaces), and border region (whether a district is a direct neighbour of another country). All variables were harmonised to 400 regions.
Details of each measure, including data sources and year of measurement are summarised in Table 1. Table 2 presents a correlation matrix of all measures used.
[Tables 1 & 2 about here]

Analytical strategy
Following a simple overview of all variables under study, we present the geographical distributions of ASIRs for the calendar weeks with the highest rate throughout Germany in each of the four waves under study, together with Moran's test of residual correlation (to test for spatial autocorrelation). We then present trajectories of ASIRs for the entire observation period by different levels of regional labour market indicators. For this purpose, each labour market indicator was regrouped into "low", "medium", and "high" (based on tertiles) and we calculated mean ASIRs by group and calendar week.
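The tertile grouping and the weekly group means can be sketched as below. This is an illustrative sketch under stated assumptions: the function names and the (region, week, ASIR) row layout are hypothetical, and the cut points use the default "exclusive" method of Python's `statistics.quantiles`, which may place boundary values differently from the software used in the study.

```python
from collections import defaultdict
from statistics import mean, quantiles


def tertile_groups(indicator_by_region):
    """Label each region 'low', 'medium' or 'high' by tertile of an indicator."""
    # Two cut points split the regions into three roughly equal groups.
    lower_cut, upper_cut = quantiles(list(indicator_by_region.values()), n=3)
    groups = {}
    for region, value in indicator_by_region.items():
        if value <= lower_cut:
            groups[region] = "low"
        elif value <= upper_cut:
            groups[region] = "medium"
        else:
            groups[region] = "high"
    return groups


def mean_asir_by_group_and_week(asir_rows, groups):
    """Average ASIRs within each (group, calendar week) cell.

    asir_rows is an iterable of (region, week, asir) tuples.
    """
    cells = defaultdict(list)
    for region, week, asir in asir_rows:
        cells[(groups[region], week)].append(asir)
    return {cell: mean(values) for cell, values in cells.items()}
```

Plotting the resulting means against calendar week, one line per group, gives descriptive trajectories of the kind summarised in Figure 2.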
For an in-depth study of the associations between regional labour market indicators and regional ASIRs, we then estimated (separately for the four waves) spatial regression models for panel data (with weekly incidence rates nested in regions and calendar weeks included as dummy variables). In contrast to standard panel regression models (which assume that neighbouring regions are independent), spatial models extend standard linear models by including geospatial information, and thereby make it possible to account for spatial autocorrelation [14,36]. Spatial models have recently been used extensively to evaluate geographical differences in COVID-19 and yield less biased estimates in the presence of spatial autocorrelation (see [14,36,37] for details). Spatial models principally follow two approaches to address spatial autocorrelation (and combinations and variations thereof): a spatial lag model (also called spatial autoregressive (SAR) model) or a spatial error model (SEM). In short, the SEM reflects that neighbouring regions possibly share common characteristics (by including a spatial error component in the model), whereas the SAR model extends standard regressions by adding a "spatially lagged" dependent variable as a predictor, thus allowing the incidence rate of one region to affect the incidence rates of neighbouring regions (spillover effects). To allow for these extensions, spatial regressions require that geographical information is provided in a so-called "spatial weighting matrix" that quantifies the distances between each pair of regions (resulting in a symmetrical 400 x 400 matrix in our case). In the present study, we defined the spatial matrix as a contiguity weighting matrix (first order, queen criterion) in which direct neighbours can affect each other.
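To make the idea of a first-order queen contiguity matrix and of spatial autocorrelation concrete, the sketch below builds such a matrix for a toy rectangular grid and computes Moran's I for values on that grid. The grid stands in for the district polygons and is purely an assumption for illustration; the function names are hypothetical, the matrix is left binary (not row-standardised), and the statistic is plain Moran's I on raw values rather than the residual-based test reported in the paper.

```python
def queen_contiguity(n_rows, n_cols):
    """Binary first-order queen contiguity matrix for a regular grid.

    Two cells are neighbours if they share an edge or a corner.
    Cells are indexed row by row: index = row * n_cols + col.
    """
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Skip the cell itself and positions outside the grid.
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        w[i][rr * n_cols + cc] = 1.0
    return w


def morans_i(values, w):
    """Moran's I: positive when neighbours are similar, negative when they differ."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)
```

On a single row of four cells, an alternating pattern such as 0, 1, 0, 1 yields a strongly negative value, whereas a clustered pattern such as 0, 0, 1, 1 yields a positive one, which is the kind of clustering the maps in Figure 1 display.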
To select the best-fitting spatial model, we compared the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) statistics of the two spatial models with those of a standard linear model (SLM), and additionally compared models using likelihood ratio tests. Details are provided in the supplemental material (Table S1). On that basis, we found that the SLM is likely to be misspecified (because of obvious autocorrelation) and opted for the SEM because it showed the best model fit. The SEM is also the most widely used specification in the literature.
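The comparison criteria named above follow standard formulas, sketched here for readers who want to reproduce them from reported log-likelihoods. The function names are hypothetical, and the likelihood ratio test shown assumes that the spatial model adds exactly one parameter (the spatial coefficient) to the standard linear model, so that the test has one degree of freedom.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: penalises parameters more as n_obs grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def lr_test_one_df(ll_restricted, ll_full):
    """Likelihood ratio test of a nested model with one extra parameter.

    Returns (test statistic, p-value). For one degree of freedom the
    chi-squared upper tail equals erfc(sqrt(statistic / 2)).
    """
    statistic = 2 * (ll_full - ll_restricted)
    if statistic < 0:
        raise ValueError("the full model should not fit worse than the restricted one")
    return statistic, math.erfc(math.sqrt(statistic / 2))
```

A model is preferred when it has the lower AIC and BIC, and when the likelihood ratio test rejects the restricted (non-spatial) model.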
As to the modelling strategy, all models were estimated separately for each of the five labour market indicators and included the additional variables named above together with dummies for each calendar week. In contrast to the descriptive analyses, the labour market indicators were each treated as continuous variables (avoiding loss of information). In doing so, we also tested for possible non-linear relationships of the labour market indicators (by adding quadratic terms), but these were not included in the final models because they added no explanatory power. In the Results section, we present estimated coefficients by wave for each of the labour market indicators (resulting in 20 models) together with 95% confidence intervals and p-values. Finally, to summarise the main findings and to study whether trajectories vary by labour market indicators, we re-estimated all models with additional interaction terms between calendar week and labour market indicators. On this basis, we predicted trajectories of incidence rates (i.e. "adjusted predictions at representative values") for a high, medium and low value of each labour market indicator (based on the Stata "margins" command [38]). Specifically, trajectories were predicted at the mean value ("medium") and at plus/minus one standard deviation ("high" and "low") and are shown in Figure 3. Results of the interaction tests (comparing models with and without interaction terms) are summarised in the supplemental material (Table S3), presenting degrees of freedom (depending on the number of calendar weeks), the test statistics (Chi²) and corresponding p-values.
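The logic of predicting trajectories at representative values can be sketched with a simplified linear predictor. This is an assumption-laden illustration, not a reimplementation of Stata's "margins": the function names and coefficient layout are hypothetical, all other covariates are taken to be absorbed into the intercept, and the spatial error term is ignored because it does not shift the linear prediction.

```python
from statistics import mean, stdev


def representative_values(indicator_values):
    """Low, medium and high values: mean minus one SD, mean, mean plus one SD."""
    centre = mean(indicator_values)
    spread = stdev(indicator_values)  # sample standard deviation
    return {"low": centre - spread, "medium": centre, "high": centre + spread}


def predicted_trajectory(x, intercept, slope, week_effects, week_slopes):
    """Predicted incidence per calendar week for indicator value x.

    week_effects holds the calendar-week dummy coefficients and
    week_slopes the week-by-indicator interaction coefficients, both
    keyed by week (the reference week has 0.0 in both).
    """
    return {
        week: intercept + week_effects[week] + (slope + week_slopes[week]) * x
        for week in week_effects
    }
```

Calling `predicted_trajectory` once for each of the three representative values yields three curves; the interaction coefficients are what allow these curves to diverge or converge over the weeks rather than run in parallel.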
As part of robustness checks, models were recalculated with alternative weighting matrices (i.e. an inverse distance matrix). Further, we tested whether other specifications of the spatial models (e.g. models that combine a spatial error term and a lagged dependent variable) resulted in better model performance, but this was not the case. Additionally, for wave 3 and wave 4 (when vaccination was possible), all models were rerun with additional adjustment for a proxy measure of cumulative COVID-19 vaccination rates for each region (see Supplemental material for details). All calculations and graphs were produced with Stata 16.1, using the "sp" package for spatial analyses and maps.

[Figures 1 & 2 about here]
Descriptive findings
Table 1 shows that the proportion of employed working-age persons in a region ranges from 45 to 70 percent, with a mean value of 62 percent. We also observe that most people work in either the secondary or the tertiary sector. In other words, regional labour markets are mainly divided between the secondary and tertiary sectors. This also explains the very strong negative correlation between the latter two sectors in Table 2 (-0.99). From Table 2 it is also worth noting that regions with a large tertiary sector have greater opportunities to work from home. Turning to the maps presented in Figure 1 (and Moran's test of residual correlation), there is clear spatial autocorrelation in ASIRs in all four waves, with clustering of high rates in Southern Germany (specifically Bavaria) during the first wave and clustering of high rates in Eastern Germany (specifically Thuringia and Saxony) during the remaining waves. Figure 2 gives a first answer on how ASIRs for working-age populations differ by regional labour market indicators. The shaded areas cover the four pandemic waves and, in the case of employment sectors, the figure focuses on the secondary sector (because the primary sector is negligible and because findings for the tertiary sector are complementary). With the exception of wave 1, rates are highest in regions with a high employment rate. Also, regions with a high proportion of people working in the secondary sector (and, conversely, a less established tertiary sector) or with low capacities to work from home generally have higher ASIRs in all studied phases of the pandemic.

Results of spatial regression models
Estimates of the main analyses are presented in Table 3, with three findings worth noting: First, employment rates and incidence rates are positively associated in all four waves, with the lowest estimates in wave 1 (when weekly incidence rates were generally lower). Second, turning to employment sectors, regions with a pronounced secondary sector have higher ASIRs, again specifically in waves 2 to 4. Conversely, regions with a pronounced tertiary sector show lower rates. Third, from wave 2 to wave 4 we see that higher capacities to work from home are increasingly related to lower incidence rates. In sum, these latter findings confirm the descriptive results, even after adjusting for various factors (incl. settlement density, district type, border region, level of qualification, proportion of female employees) and considering spatial autocorrelation, with p-values providing strong evidence against the null hypothesis for all the reported associations. Findings for wave 3 and wave 4 also remain consistent in sensitivity analyses additionally adjusted for a proxy measure of vaccination rates (see supplementary Table S2 for details).
To summarise the main findings, Figure 3 presents the predicted ASIRs at given levels of labour market indicators (mean ± 1 SD) for each calendar week under study (adjusted for the same covariates as in Table 3). Again, we clearly see that rates are particularly high for regions with higher employment rates, with a high proportion of people working in the secondary sector, or with low capacities to work from home. Furthermore, for these latter regions, the increase in incidence rates (besides the overall level) appears generally steeper, a result that is further supported by the fact that interactions between labour market indicators and calendar week were all significant with p-values <0.001 (see Table S3 for details).

Discussion
This study provides evidence that regional labour markets and their properties are related to regional patterns of SARS-CoV-2 infections in the working-age population in Germany. In detail, for all four phases under study we find that regions with higher proportions of people in employment generally have higher weekly age-standardised incidence rates, and that regions where more people work in the secondary sector or with low capacities to work from home (mainly in waves 3 and 4 though) have higher rates as well. Furthermore, findings indicate that these latter regions also experience a steeper increase in infection rates in the course of the four waves under study. Findings are based on spatial models that account for spatial autocorrelation and remained stable after adjusting for potential confounders at the regional level. In the case of wave 3 and wave 4, the reported findings also held when additionally adjusting for a proxy measure of vaccination progress.
Overall, the observed associations are in line with previous studies, specifically ecological studies that investigate socioeconomic deprivation in conjunction with SARS-CoV-2 infection rates or COVID-19 mortality [23,39]. Yet, by focussing on working-age populations and conducting refined spatial panel analyses of trajectories of weekly age-standardised incidence rates that consider both regional clustering and potential regional confounders, we provide evidence that adds to existing research in at least two ways: First, by including an indicator that measures the general amount of employment at the regional level, we highlight that work and employment could be key factors for infection transmission in a region, and thus that the workplace may be an important entry point for interventions. Second, the finding of higher infection rates in regions with a large secondary sector (or conversely a small tertiary sector) adds to current knowledge that is either limited to smaller geographical areas [28], or relies on studies that use cumulative infection rates across an extended observation period as outcome without focussing on labour market factors [40,41]. Admittedly, the occupations of the secondary sector considered here (i.e. all occupations involved in the production or the construction of goods) are heterogeneous, but our findings give good reasons to believe that people who work in the secondary sector are generally more likely to be exposed to the virus (at least for the studied periods of the pandemic). Overall, this is also supported by studies based on individual data with more refined measures of occupations, which show that essential workers in the secondary sector have higher rates and that many jobs in the tertiary sector have lower rates (with the exception of health care workers) [2,8]. On the one hand, we may speculate that protective measures in the secondary sector (e.g. social distancing or use of face masks) are less established and less effective (e.g. when working in a large manufacturing hall compared with office work). Also, the number of co-workers in close proximity at work is possibly higher compared with office work. On the other hand, though, our results show that opportunities to work from home, including the possibility of reducing work-related mobility (e.g. public transport to the workplace), are much smaller in jobs of the secondary sector. The latter idea is also supported by our finding of a negative correlation between the size of the secondary sector and the capacity to work from home. The named aspects (i.e. transmission risks, mitigation measures, work from home) are also important components of recent efforts to estimate potential SARS-CoV-2 infection risks and to develop a respective job exposure matrix [42,43]. Another reason, though, may be that the sectors were affected differently by closures as part of the non-pharmaceutical interventions implemented to contain the pandemic. In fact, while large parts of the tertiary sector were affected by closures of several types of businesses (restaurants, cultural institutions, or shops), many industries of the secondary sector remained open.
On a more general note, our study illustrates the necessity of extending the growing evidence on socioeconomic differences in infection risks to factors that are considered as potential explanations.
Future ecological studies also need to focus on other potential explanations of socioeconomic differences, such as pre-existing health conditions [27], air pollution [44,45] (as two potential reasons for greater vulnerability), or on measures capturing differences in access to and use of medical care in a region (incl. adherence to NPI and vaccination coverage) [46]. Here, the measure of vaccination rates included in our sensitivity analyses must clearly be seen as a preliminary measure that deserves more methodological refinement (ensuring that we know where vaccinated people live) and more analyses.
The study has several limitations: First, we again need to consider that work and employment are possibly one factor, though not the only one, that may explain regional variations in infection rates, and other factors equally deserve attention in future studies to understand varying infection risks [47,48]. Besides those named above, these may also be more general aspects not necessarily related to socioeconomic deprivation, such as meteorological information (e.g. number of rainy days or average temperature), sanitation or hygiene, public transport systems or policy interventions at the regional level. Second, although we maintain that ecological studies are instrumental and well suited to supplement individual data (because of the direct relevance for potential interventions), we need to be very careful when drawing conclusions from the regional to the individual level. At this point, we need to remember that our study relies on rather large areas, even though they are the smallest level available. We therefore must consider the risk of an ecological fallacy, including potential heterogeneity of workers within regions. To be clear, we cannot guarantee that those who are employed in the secondary sector of a region are also those who are infected. Another limitation relates to testing strategies in the regions. While our study is based on official notification data on laboratory-confirmed infections with identical notification procedures throughout Germany, testing strategies may still vary by region, with a potential bias as some occupational groups are more likely to be tested in some regions than others (and thus to be detected and notified as cases). Yet, in the case of Germany, testing opportunities (i.e. antigen rapid tests with a subsequent PCR test in case of a positive result) were free of charge throughout the whole observation period of this study and respective policy changes were decided at the federal level (without variations between states).
Furthermore, even if changes may exist across time, all regions were equally affected by these changes, making it unlikely that the observed differences between regions are biased. Likewise, it is known that health-seeking behaviour varies by occupation [49], meaning that some infections are possibly more likely to remain undetected for some occupational groups. Another limitation relates to the generalisation of results. Although our study covers nearly two years of the pandemic and distinguishes four periods, insights into subsequent waves with other dominant virus variants are not possible. Results also need to be replicated for other countries. Finally, although our study allows, for the first time, a comparison of regional variations in infection rates focusing on working-age populations, we still need to question whether the observed differences for working-age populations also translate into differences at the population level. Because infections are likely to be transmitted within families, and because the overall regional disparities correspond to what is observed for the general population [16,23], there are good reasons to think that this is the case.
In conclusion, our study extends current knowledge by analysing regional variations in SARS-CoV-2 infection risks across four waves of the pandemic by regional labour markets and their properties. This underlines the importance of work and employment as key domains and places for transmission risks. In doing so, it points to the necessity of strengthening these factors as an essential component of pandemic preparedness plans and of improving workplace interventions, particularly among workers in the secondary sector without opportunities to work from home.

Table 3. Association between labour market indicators and age-standardised SARS-CoV-2 incidence rates for
Note. All models are calculated for each labour market indicator separately. Models are adjusted for proportion of employees without professional qualification, proportion of female employees, average income, district type, settlement density, average living space, and border region, and include dummies for each calendar week.

Figure 1
Regional distributions of weekly age-standardised SARS-CoV-2 incidence rates by wave in Germany (for the calendar week with highest rate in Germany) and results of Moran's test of residual correlation (spatial autocorrelation)

Figure 2
Trajectories of weekly age-standardised SARS-CoV-2 incidence rates (ASIRs) for working-age population (aged 20 to 64 years) by levels of regional labour market indicators (based on tertiles) in Germany

Figure 3
Predicted age-standardised SARS-CoV-2 incidence rates (ASIRs) for working-age population (aged 20 to 64 years) at given levels of labour market indicators (mean ± 1 SD) for different pandemic waves based on spatial error model for panel data (same adjustments as in Table 3)
