Economics and Human Biology

A century after the Spanish Flu, the COVID-19 pandemic has brought renewed attention to socioeconomic and occupational differences in mortality in the earlier pandemic. The magnitude of these differences and the pathways between occupation and increased mortality remain unclear, however. In this paper, we explore the relation between occupational characteristics and excess mortality among men during the Spanish Flu pandemic in the Netherlands. By creating a new occupational coding for exposure to disease at work, we separate social status and occupational conditions for viral transmission. We use a new data set based on men’s death certificates to calculate excess mortality rates by region, age group, and occupational group. Using OLS regression models, we estimate whether social position, regular interaction in the workplace, and working in an enclosed space affected excess mortality among men in the Netherlands in the autumn of 1918. We find some evidence that men with occupations that featured high levels of social contact had higher mortality in this period. Above all, however, we find a strong socioeconomic gradient to excess mortality among men during the Spanish Flu pandemic, even after accounting for exposure in the workplace.


Introduction
In 1918-1919 the Spanish Flu took the lives of an estimated 50-100 million people worldwide (Johnson and Mueller, 2002). In the Netherlands, estimates of contemporaries suggest that over 41.000 persons died, of which about 30.000 perished during the autumn wave of 1918 (CBS, 1918;Quanjer, 1921). At the time, it was thought that social differences in mortality to the disease were relatively small, both in the Netherlands and abroad. For example, the Dutch Health Council reported that there were notable differences in mortality between age groups and regions, but found no reason to conclude that occupation or wealth affected mortality (Quanjer, 1921). In recent decades, however, evidence has grown that there existed a socioeconomic gradient in Spanish Flu mortality (Mamelund, 2006;Mamelund and Dimka, 2021;Clay et al., 2019). Yet, uncertainty remains, a large part of the literature is based on aggregate data, and the mechanisms that linked together socioeconomic factors and mortality risk to the Spanish Flu remain unclear. In this paper, we use new micro-level data containing occupations and dates of death for five provinces in the Netherlands to further establish whether and how mortality risk during the Spanish Flu was linked to socioeconomic and occupational characteristics.
have contributed to risk of infection with and risk of mortality during the Spanish flu. Particularly, we are interested in the social gradient in mortality while taking into account occupation-related risk of infection. Such a social gradient may point to the relevance of resources and existing health differences in shaping mortality during the pandemic, whereas a relationship with occupational characteristics may point towards the role of infection risk in the workplace.
Our case is the Spanish Flu in the Netherlands, the history of which has largely remained unwritten (for exceptions, see De Gooyer, 1968;De Melker, 2005;Quanjer, 1921;Vugs, 2020;Mourits et al., 2021). We focus on the second wave of the Spanish Flu, in autumn 1918, which was by far the deadliest phase of the pandemic. We use occupations registered on death certificates to measure socioeconomic status of the deceased and create a new occupational classification whether work led to frequent social contact and whether it took place in enclosed spaces. We use this to isolate socioeconomic status from occupational exposure to viral transmission.
We find evidence for a social gradient in mortality among men both in strongly and more mildly affected places. Our findings suggest that part of the socioeconomic patterns in Spanish flu mortality identified by previous studies exist independently from occupational differences in the frequency of social contact with others and conditions for viral transmission at work. Other factors related to socioeconomic status are thus likely to have been important and may have included the opportunity to isolate and viral transmission away from work, and existing health differences in the population.

Historical background
The 1918 influenza pandemic has been described as ''the mother of all pandemics'' (Taubenberger and Morens, 2006). After a mild summer wave in 1918, a deadly second wave took hold during the final months of the First World War. Between September and December 1918, the pandemic took more lives than the preceding four years of warfare. The flu was notorious for its lethality, but especially feared as it heavily affected young adults between ages 20-40, who are normally least affected by infectious diseases. Especially young men and pregnant women were affected. The infected initially showed common symptoms of the flu, which could quickly develop into pneumonia. Death often followed within days. The horror is perhaps best described in the memoirs of the surgeon general of the US army Vaughan: ''I see hundreds of young, stalwart men in the uniform of their country [. . . ] placed on the cots until every bed is full and yet others crowded in. The faces soon wear a bluish cast; a cough brings up the blood stained sputum. In the morning the dead bodies are stacked about the morgue'' (Vaughan, 1926;Kolata, 2001).
In the 20th century, the Spanish Flu was mostly perceived as a socially neutral disease that affected both the poor and the rich. Immediately after the outbreak, many doctors and scientists maintained that there was no socioeconomic gradient in who was affected (Bengtsson et al., 2018;Quanjer, 1921). For the Netherlands, the Dutch Health Council (Quanjer, 1921) wrote that for influenza-related mortality: ''One cannot conclude that there is any distinction, neither by occupation, nor between the more and less affluent''. 1 The idea of the Spanish Flu as a socially neutral disease persisted in the remainder of the 20th century. Doctors and scholars thought that the influenza virus infected and killed all classes equally because the disease was so highly transmissible .

Social differences in early 20th-century mortality
The image of Spanish Flu as a socially neutral disease has changed in the past decades. Statistical evidence has since then shown that mortality to the Spanish Flu was higher in poorer countries (Johnson and Mueller, 2002;Murray et al., 2006;, even though there are recent studies arguing that Spanish Flu was unrelated to pre-pandemic economic characteristics (Brainerd and Siegler, 2003;Crosby, 2003). Furthermore, in more developed countries mortality among the poor was higher, even though there was no perfect social gradient in mortality (Bengtsson et al., 2018;Mamelund, 2006). Using micro-level data from the Swedish census, (Bengtsson et al., 2018) show that this disadvantage of the lower classes was especially notable among men and less pronounced among women The discussion about the role of socioeconomic inequalities during the Spanish flu pandemic links up to a larger debate on the origins of socioeconomic differences in mortality. Across Europe, the social gradient in health was growing in the first half of the 20th century (Bengtsson and Van Poppel, 2011). In industrialised and urbanised England, survival differences by socioeconomic group were already present around 1900 (Antonovsky, 1967;Razzell and Spence, 2006). In some regions, such as Southern Sweden, a health gradient by social class emerged relatively late (Debiasi, 2020). In the Netherlands in 1918, mortality differences after age 50 were only to a limited extent affected by social class (Mourits, 2019), but in younger age groups, which were especially affected by the Spanish Flu, there existed a social gradient in health from the end of the 19th century. Van Poppel et al. (2009) found that between ages c. 35-55, the elite in the Netherlands had a survival advantage over farmers and the middle class, whereas the working class had a survival disadvantage. Both existing differences in health as well as other social class characteristics related to knowledge and skills, income, and occupational characteristics may have played a role in excess mortality during the Spanish Flu pandemic.
According to the fundamental cause theory (Link and Phelan, 1995), a socioeconomic gradient in health persists even when the disease environment changes, as socioeconomic resources provide health advantages regardless of what the dominant diseases and causes of death are. However, the specific advantages linked to socioeconomic resources are dependent on the local context (Clouston et al., 2016). For diseases that are not well understood or cannot be treated effectively, such as Spanish Flu, socioeconomic differences in disease burden or mortality are still expected. For example, resources can help people isolate from sick relatives or co-workers, improve access to information and understanding of the appropriate behaviour to avoid exposure, and ease access to health care.
The topic of the Spanish Flu has seen renewed interest by historians and social scientists in recent years (Boberg-Fazlic et al., 2021;Colvin and McLaughlin, 2021;Devos et al., 2022;Gallardo-Albarrán and de Zwart, 2021), though few studies provide in-depth analyses of socioeconomic differences in mortality. Evidence for socioeconomic contrasts in Spanish Flu mortality comes from a number of studies, predominantly at the aggregate level and showing correlations between local conditions and excess mortality. Herring and Korol (2012) show that the poorer neighbourhoods of Hamilton, Ontario had significantly higher influenza mortality rates than the richer neighbourhoods. Tuckel et al. (2006) and Fanning (2010) found that immigrant groups from Italy and Eastern Europe in Hartford, Connecticut and Norwood, Massachusetts, often of low socioeconomic status, were harder hit by the Spanish Flu pandemic than non-immigrants. Similarly, Clay et al. (2018) find that the local share of foreign-born immigrants and prepandemic typhoid rates predict higher all-cause mortality during the pandemic. Clay et al. (2019) also find that high illiteracy rates and prepandemic infant mortality predict higher mortality during the Spanish Flu. Studies on socioeconomic gradient in Spanish Flu mortality based on individual-level data remain scarce, however.
Among the factors that may have contributed to higher risk among lower socioeconomic status groups, higher exposure to disease figures prominently. 2 Socioeconomic status is and was closely related to the degree and intensity of social contact, and households living in crowded conditions may have had a larger chance of exposure to the Spanish flu, especially in urban slums. Specific factors for socioeconomic differences in Spanish Flu mortality include hygiene and crowded housing, implying that the sick could not be cared for in a separate room. Such living conditions were also related to existing health before the arrival of the pandemic. Tuberculosis in particular may have been a risk factor, as it may have contributed to the chance of a secondary infection with bacterial pneumonia as well as reduced access to health care (Grantz et al., 2016;Mamelund, 2006). Not surprisingly, then, working class neighbourhoods and areas with small housing had higher death rates during the Spanish Flu in Oslo, Norway (Mamelund, 2006).

Occupational exposure to the Spanish flu
Exposure to disease does not only occur at home but also at the workplace. Evidence for a relation between occupational characteristics and Spanish flu mortality largely stems from macro-level studies. For Chicago it was found that local unemployment rates were related to decreased mortality rates and transmission, indicating that social contact and not poverty itself could have been a key factor in mortality to the Spanish flu (Grantz et al., 2016). From the early 20th century, working conditions in factories were recognised as a general risk factor for transmission of infectious diseases (Van Der Woud, 2010). In 1898, Dutch socialist politician Domela Nieuwenhuis (1898) wrote about the poor hygiene, and exposure to toxins, dust, and small particles among the working poor. By 1917, critique of labour conditions had become more widely accepted. Statistics Netherlands reported that working men aged 35-54 working outdoors or near furnaces had lower all-cause and tuberculosis-related mortality than those who were subjected to organic dust from tobacco, flower, and textiles, worked with chemicals and toxins, or -even more detrimental -were exposed to grinding dust from glass, metal, porcelain, or stone (CBS, 1917). For the same period in Sweden, reports by healthcare professionals addressed how poor air quality and unhygienic conditions in factories made workers more susceptible to tuberculosis (Sundin and Willner, 2007).
In a study of the Swedish case using individual level data, Bengtsson et al. (2018) provide evidence that the ability to maintain distance from others (now commonly known as ''social distancing'') may have affected mortality rates favourably. In southern Sweden, farmers were least affected by the Spanish flu, and low-skilled and unskilled workers the most. Although this points towards the importance of solitary work, the protective effect of farming was only found for men, and white collar workers had no mortality advantage compared to the working class. To understand whether social contact affected Spanish Flu mortality, social contact at the workplace needs to be measured in more detail.
Differences in mortality by social class may have been directly affected by job characteristics and working conditions that are not fully captured by social class schemes (Debiasi, 2020). Recent studies have highlighted substantial heterogeneity in mortality rates within social classes but between occupational groups (Debiasi, 2020). The decades 2 There are also studies that focus on vulnerability to infection. One way through which vulnerability could lead to higher mortality during the Spanish Flu was through air pollution, to which the poor were more exposed. Clay et al. (2018) show that pollution due to coal-fired electricity generation worsened the impact of Spanish Flu in US cities, as the air pollution probably made lungs more susceptible to infection. Similarly, Brundage and Shanks (2008) argue that secondary infections, rather than the virulence of Spanish flu itself, played a key role in the high death rates and age profile of the deceased as well as differences between occupational groups. before 1918 were dominated by infectious disease mortality, and occupational differences in mortality may have been, at least partly, related to exposure to infectious disease. For example, among white collar men in Sweden, especially among health professionals, there was a mortality disadvantage, but not among religious professionals. It is contested whether the clergy may have had a better lifestyle than the general population (Debiasi, 2020), as Bavarian monks had no survival advantage over non-cloistered men before the 1950s (Luy, 2003). More likely, the difference in mortality rates between doctors and religious professionals can be explained by exposure to infectious disease due to the nature of their occupation.
In this paper, we take an in-depth look at men's occupational characteristics and mortality during the Spanish Flu pandemic. We overcome some of the shortcomings of earlier studies by studying individual-level data for the entire population of part of the Netherlands, including social and occupational characteristics. Our primary variable of interest is socio-economic status. However, instead of analysing specific occupations, as has been done in earlier studies to allcause and infectious disease mortality, we also measure occupational risk factors for broad categories of occupations. We supplement existing coding systems for occupational skill and status with a coding for occupational risk of exposure to infectious disease. Specifically, we look into the degree of social contact at work and whether workers worked in indoor environments, at the time often in cramped and poorly ventilated conditions with workers working closely together, conditions under which viral transmission tends to be higher than outdoors.

Death certificates and the civil registry
We use death certificates from the Dutch civil registry to calculate excess mortality in 1918 in comparison to deaths in the period 1910-1917. The Dutch civil registry was introduced in 1811 and provided legal proof of birth, marriage, and death. By the turn of the twentieth century the procedure was well-established. For everyone who died in the Netherlands a death certificate was issued, in duplicate so that safekeeping was ensured Vulsma, 1988). Digitisation of these certificates has been ongoing since the 1990s by local and provincial archives. Death records list the date of death, full name, age, sex, place of residence, and the occupation of the deceased. 3 These individual-level records allow us to analyse mortality during the Spanish Influenza pandemic in more detail than before. Other historical sources, such as municipal reports or national health reports, give only aggregated deaths per year and do not split out observations by age, sex, or occupation (see for example the Historical Database Dutch Municipalities Mourits et al., 2016or CBS, 1918. The aggregated number of deaths per municipality is available by age group, but without information on occupation or other indicators of socioeconomic status. Some records of individual-level causes of death data survive in the Netherlands, but they are only available for isolated periods and municipalities. Death certificates are thus the only remaining source that combines individual-level characteristics with sufficient nationwide coverage to generate meaningful insights into mortality patterns along socioeconomic lines. Dutch death certificates contained no cause-specific mortality information. However, total numbers of deaths from death registration are sometimes deemed the most accurate way to address the impact of an epidemic. Shifts in overall mortality rates do not depend on potentially wrong or missing diagnoses of the cause of death, includes individuals who had multiple diseases at the time of death, and includes indirect deaths from the Spanish Flu (Colvin and McLaughlin, 2021). Death certificates have been digitised in all Dutch provinces, but not for all municipalities ( Some certificates were entered by more than one archive, or were digitised multiple times by the same archive, resulting in duplicates. These have been removed from the data. 4 Certificates were digitised on a per-archive basis, and therefore there is variation in what information from the certificates was digitised ( Fig. 1), and we have to drop further municipalities for this reason. We can see clear bimodal distributions of missing data, with municipalities clustering at low or high percentages. We therefore exclude municipalities where the share of reported ages, sex, and full dates is below 50%. 5 In the case of occupations, many of the deceased would not be expected to have an occupation, especially in the case of children, the elderly, and women. Because nearly half the deaths are among women, and the very young and the elderly dominate among those who died in the years 1910-1918, the local share of recorded occupations can be low. We exclude municipalities where the share of recorded occupations is below 10%. We explore the impact of using stricter thresholds in Appendix A.
The changes in the sample due to excluding further municipalities are shown in Table 1, showing that it leaves us with 220 795 certificates for 215 municipalities, concentrated in the provinces of Gelderland, Overijssel, Zeeland, Drenthe, and Zuid-Holland. The most important bias is that three of the four largest cities are missing from the data: Amsterdam, Rotterdam, and Utrecht. However, our data does contain large and medium-sized cities (The Hague, Arnhem, Nijmegen, Leiden, and Apeldoorn), including a number of textile and industrial centres. Because of this and because very small municipalities are underrepresented, the average size of a municipality in our final dataset is 7850 inhabitants, compared to 6015 for the country as a whole. Comparing the average excess mortality rate (EMR) of municipalities that were dropped from our sample due to missing variables (excluding Amsterdam as there is no data available), we find that the difference is negligible: 1.38 outside our sample, and 1.40 in our sample. Table 1 also shows the results of further selection steps, and how each step affects the composition of the included sample. As explained below, we calculate excess mortality based on the months September-December, and exclude the year 1914. We only keep certificates where the deceased is aged 16-78.
We only include men in the analyses. Female occupations were not always recorded due to norms about female labour force participation (especially in marriage records Walhout and van Poppel, 2003). Consequently, for every 11 men with a recorded occupation, we only have information about the occupation of one woman. Moreover, recorded occupations for women fall in a small number of categories, such as maids, seamstresses, and nurses. The high percentage of maids in particular makes the women in our data less skilled than the male population. In part, these are real effects, as labour market opportunities for women were limited, especially in more skilled and high-status occupations. At the same time, we believe the data contain insufficient detail about women's occupations to include them in the current work. We therefore omit women from our analyses. As we discuss below, our results are robust to including women.

Occupational coding
Standardised occupations of the deceased were matched against standardised occupational titles from the Historical Sample of the Netherlands (HSN) . 6 This dataset contains over 280 000 Dutch occupational titles, with numerous spelling variations, coded in HISCO (Historical Classification of Occupations) and HISCLASS (Historical International Social Class Scheme) (Van Leeuwen and Maas, 2011;Van Leeuwen et al., 2002). Almost 98 per cent of occupational titles on the death certificates could be coded this way. 7 To infer the skill level of each occupation, we used the skill category of its corresponding HISCLASS, ranging from high (HISCLASS 1-2), medium (HISCLASS 3-4, 6-8), low (HISCLASS 5,(9)(10), to unskilled (HISCLASS 11-13). We decided to aggregate to these skill level groupings to avoid ending up with very small totals in some of the HISCLASS categories, the higher skilled occupations in particular.
As an alternative to the HISCLASS system, we also use HISCAM occupational classification scores (Lambert et al., 2013). This is a continuous measure that infers distance between occupations in an occupational hierarchy from the social interaction of the occupations incumbents, specifically historical marriage behaviour. The advantage of HISCAM is that it is based on actual behaviour, indirectly accounting for many implicit social factors that affect social stratification, making it a worthwhile addition to the HISCLASS skill dimension. HISCAM also provides a useful alternative coding of farmers. In the HISCLASS skill scheme, farmers are coded as medium skilled workers, while in HISCAM they are viewed as a relatively low status profession. Both arguments are valid insofar farmers had to run a complicated business,  but the certificates we rely on do not always distinguish between the operators of large farms, smallholders and farm labourers. 8 To analyse exposure to infectious disease in the workplace, we used the HISCO classification to create an occupational indicator for exposure. Comparing occupations with higher and lower excess mortality could lead to ad-hoc explanations, whereas creating this new indicator allows us to systematically account for characteristics in occupations that lead to risk of exposure. Moreover, like the skill levels derived from HISCLASS above, this scheme avoids very low regional counts for certain occupations.
Exposure to infectious disease due to working conditions was coded on two dimensions: (1) working in an indoor environment (yes/no), and (2) regular social interaction (yes/no). Coding was done at the three-digit HISCO level. For example, an office clerk was considered to be working indoors, but had infrequent social contact, especially with strangers. A tailor also worked indoors but would have had frequent social contact through dealings with customers. Conversely, a farm worker worked neither indoors, nor in close contact with others. Occupations were independently coded at least two researchers and then compared. Approximately 75 per cent of occupations were coded identically in the first round, and the remaining occupational groups were discussed by the coders until an unanimous decision was reached. The remaining differences were often due to differing interpretation of the underlying occupations in the HISCO unit or the actual tasks performed in an historic occupation, which could often be solved by consulting the documentation of HISCO. 9 Although our scoring omits regional and within-occupation variance in occupational characteristics, it captures perceived conditions for exposure to and transmission of the virus on the work floor. Moreover, the scores measure something different from the skill levels we also use in our analyses. Correlations between these dummy variables range from −0.27 to +0.32 (see Table 13 in Appendix B).

Excess mortality
As death certificates in the Netherlands do not include a cause of death, we work with all-cause mortality. We use excess mortality as our outcome variable: the number of deaths in excess of what would be expected based on previous years. 10 This approach is used because 8 The Dutch term landbouwer which is frequently used on the certificates is not specific enough to make this distinction. 9 Appendix B provides our coding of the most frequent occupations in the dataset. The full coding scheme is described in Schalk et al. (2022), and the data can be found at https://hdl.handle.net/10622/TFG1DQ. 10 Unless stated otherwise, excess mortality in the results and conclusion section is used to refer to excess all-cause mortality during the second Spanish Flu wave. the 1918 municipal population at risk (by age, sex, and occupation) is not available. We calculate excess mortality rates (EMRs) for 1918, comparing the number of deaths between September 1st to December 31st in 1918 by age group, municipality, and occupational cluster to mortality in the preceding years. 11 Baseline mortality was established by calculating the average mortality of the same occupational groups (by age group) in autumn 1910-1917 (September through December). The Netherlands remained neutral during the First World War, so dramatic changes to the population at risk and mortality rates that characterised many other countries at the time are not expected (Colvin and McLaughlin, 2021). The exception was the year 1914, when the Netherlands briefly took in 1 million Belgian refugees, who largely returned before the end of the calendar year. We excluded 1914 from the baseline to make sure that our estimates of excess mortality are not biased by the events of the First World War in the Netherlands.
Our approach relies on the assumption that the population at risk is relatively stable in the period leading up to the Spanish Flu. Fig. 2 shows that relative to the huge mortality spike of the Spanish Flu, weekly death number were stable in the 2.5 years preceding the pandemic, especially in the age categories that we are interested in. Moreover, Fig. 3 shows that the annual number of deaths for the main subgroups of interest, in the selection of the data described above, were also stable over the 1910-17 period.
We also use external demographic data to investigate the possibility of strong demographic shifts in the Netherlands in the 1910-17 period.
The Historical Database Dutch Municipalities shows that across all municipalities, the coefficient of variation in population was on average only 0.04 (Mourits et al., 2016). Total Dutch population figures from the Human Mortality Database show that the coefficient of variation in population by 10-year age groups averaged to, again, 0.04 (Human Mortality Database, 0000, July 2021 version). While the total Dutch population in this period grew 1.5% per year, implying growth in the subgroups, year-to-year shifts in mortality are small compared to the events of 1918.
In Table 2, we show excess mortality rates by occupational skill level. Less skilled people had higher excess mortality. We now turn to analysing these differences further.

Analysis
Excess mortality is modelled using OLS-regressions with the ratio of deaths in September-December 1918 over the average number deaths A. Rijpma et al.  in these months for the period 1910-17 (excluding 1914) as the dependent variable. The distribution of these ratios is highly skewed, so we take the logarithm of the dependent variable. 12 However, as we calculate excess mortality per municipality by age, and social class, for 59% of our observations for 1918 there are no deaths, resulting in a ratio of zero. We do not discard these data because the fact that these groups had zero mortality during the Spanish Flu is important. To include these observations, we add one to the dependent variable. This means we estimate excess mortality in region , month , occupational group , and age group as follows: Standard errors are clustered by region (Zeileis et al., 2020). For regions we use Economic Geographic Regions (EGG): clusters of municipalities defined by Statistics Netherlands below the NUTS-3 level. 13 We prefer to work at this relatively low level of aggregation to be able to control for the spatial patterns of the pandemic and differences between urban and rural regions which the EGG regions capture, but NUTS-3 region in the Netherlands do not.
To verify the validity of our outcomes, we conduct a number of robustness checks. First, we explore whether the level of geographic aggregation affects our results. Larger geographic areas mean there are fewer regions without deaths in 1918, and, more generally, will increase precision in our data, albeit at the expense of a lower number of observations. In addition to the EGG level, we also estimate our models for municipalities, COROP 14 (NUTS-3) regions, and provinces (NUTS-2). For similar reasons we also provide separate estimates for high and low mortality regions. This allows us to focus on the regions most affected by the Spanish Flu pandemic, and again, exclude many cases of zero deaths. Additionally, this provides a check for the importance of high-mortality outliers. The cutoff was an overall excess mortality 2.5 times the mortality we would usually expect in September-December. In addition, we also look at a variety of alternative types of models to deal with the zeros in our data. We also check whether population density affect our estimates due to easier transmissibility in cities compared to the countryside. We also check whether the 1914-1918 mobilisation of troops affects our results by controlling for the presence of army bases. We also include alternative specifications for our occupational exposure measures and estimate the models including women. Finally, we estimate our models again but using stricter thresholds in the data selection process. Table 3 shows the results of this model, with progressively more controls to come to our preferred model in the rightmost column. For each estimate we report the coefficient, standard error, and significance level.

Results
In the most basic model, it can be seen that there is a skill gradient in the excess mortality rates of 1918. Compared to higher-skilled workers, medium skilled men had 55 percent higher mortality than they would experience in a normal year (exp(0.44) -1). For unskilled workers this was higher still: their excess mortality was 68 percent higher than that of higher skilled workers. Lower skilled workers also had substantially higher excess mortality than higher-skilled workers, but not as high as unskilled workers and medium-skilled workers. An important driver of this break in the skill gradient are farmers. They are a large group in our dataset with relatively high mortality who are classified as medium-skilled in the HISCLASS scheme. We return to this point in more detail below. Here we observe that if we separately control for farmers (model 2 onwards), excess mortality is still not strictly increasing with occupational skill, but closer to a full social gradient. Next, we introduce our occupational exposure variables (model 3). Here, we find that contact occupations had nearly 20 percent higher excess mortality compared with occupations that had neither of the exposure characteristics. In this specification with limited controls, the other occupational exposure variable, indoor work, is not associated with higher excess mortality. Once further controls are added it also shows a positive association, but it is not estimated with sufficient precision to draw firm conclusions. However, an F-test does show that the exposure variables are jointly significant (model 7 has the lowest F-statistic at 4.05, p = 0.006).
Of particular interest is what happens to the skill gradient when we introduce these exposure variables (model 4). We find that when controlling for exposure, the skill gradient in excess mortality remains largely unchanged, although the mortality estimate for farmers increases. 15 Overall, we find that working indoors does not predict higher excess mortality. However, occupational exposure to infectious disease 15 We also note that the increase in standard errors is limited compared to model 2 and 3, suggesting the moderate correlations between these two sets of variables discussed in Section 3.2 are not an issue in the model. due to frequent contact predicts somewhat higher excess mortality, though skill remains a more important predictor.
Our next model adds demographic controls. Adding age controls reduces the social gradient in excess mortality describe above somewhat, but it is still clearly present. This is important because we know that Spanish influenza affected younger men in particular, and we can expect them to have lower-skilled occupations due to career effects. Indeed, keeping all else constant, we find the highest excess mortality for age groups 16-30 (reference category), with progressively lower mortality predicted for older age groups. Adding month dummies to capture the phase of the epidemic also leaves the estimates for skill level largely unchanged. We finally add region fixed effects to adjust for the fact that some regions were struck harder than others (the rural, poorer North-East in particular, see Mourits et al., 2021). This increases the skill gradient again. In our preferred model, including the full set of demographic controls and region and time fixed effects medium skilled workers have 46 percent higher excess mortality compared to high skilled workers; lower skilled workers 23 percent and unskilled workers have 70 percent higher excess mortality. The predicted excess mortality due to the exposure characteristics of the occupations of the deceased is 19 percent for contact occupations compared to the baseline of occupations characterised by neither exposure risk.
To further analyse the socioeconomic gradient gradient in excess mortality during the Spanish Flu, we turn to the role of farmers in our results and an alternative measure of social status -HISCAM. These alternative models are tested in Table 4. In column 2, the preferred model from Table 3 is estimated again, but for a dataset where excess mortality was calculated without farmers. In the next column, as another alternative, we recode farmers as lower skilled, rather than medium skilled as in the HISCLASS scheme.
Omitting farmers from our dataset makes only a limited impact on the social gradient we find. Recoding farmers as lower skilled does make a difference. The N-shaped gradient, where, compared to higher A. Rijpma et al.

Table 4
Regression models of log excess mortality rate, with different treatments of farmers, and with HISCAM scores rather than HISCLASS-based skill levels. Region-clustered standard errors between parentheses. skilled workers, lower skilled workers had lower excess mortality than medium skilled workers, disappears. Medium and lower skilled workers now have similar excess mortality. In this model, our occupational exposure variables show the counter-intuitive result, where working indoors now predicts lower excess mortality. We surmise that this is because the composition of the reference group is changed strongly by farmers group who were classified as having neither an indoors nor a contact occupation. We also estimate our preferred model with HISCAM scores instead of HISCLASS. Again, we find a socioeconomic gradient gradient in excess mortality. The model using HISCAM shows that, keeping all else constant, higher occupational status predicts lower excess mortality during the Spanish Flu. The HISCAM scores in our data range from 0.40-0.99, so going from the lowest to the highest status occupation predicts a decrease in excess mortality of 23%.
To assess if high-mortality regions drive our results, Table 5 compares how our preferred model performs in regions with relatively low and high overall mortality. Making this distinction is a more direct assessment of the impact of regional Spanish Flu severity versus occupational or demographic characteristics. The cutoff was an overall excess mortality rate of 2.5, that is 2.5 times the mortality we would usually expect in September-December. As expected, we find that most estimates in the high mortality areas are higher compared to low mortality areas, and we lose some precision on the estimates due to the lower number of observations. Otherwise, the overall results are similar, suggesting that the basic occupational patterns in excess mortality during the Spanish Flu were present regardless of the presence of high-mortality clusters. Table 5 Regression models of log excess mortality rate for low (EMR below 2.5) and high (above 2.5) excess mortality regions. Region-clustered standard errors between parentheses.

Robustness checks
To verify the robustness of our results, we included a number of alternative models in the Appendix. In Table 6 we estimate the preferred model from Table 3 at different levels of geographic aggregation: municipalities, EGG regions, Corop (NUTS-3) regions, and provinces (NUTS-2). While we prefer to work at a relatively low level of aggregation, a concern could be that this results in cells in our dataset with few observations, either for our 1910-17 baseline, or for the Spanish Flu pandemic deaths. Overall, the results are similar to the preferred EGG aggregation level, though overall effect sizes tend to be larger at the higher aggregation COROP (NUTS-3) level, and we lose some precision at higher levels of aggregation.
Important factors in the spread of any infectious disease are population density and geographic mobility, as these make it easier for the disease to move from one person to the next. While the inclusion of region fixed effects in our models should capture any time invariant aspects of these factors, we explore this possibility further in Table 7, Appendix A, where we show the effect of including population density and the presence of army bases or hospitals in a municipality. Because these variables are measured at the municipality level, we aggregate our data to this level as well for these checks. As expected, the overall socioeconomic gradient is the same as these effects would also be captured by the location fixed effects included in the models in Table 6. The effect of the population variable itself is positive. However, population density does not have a significant effect on excess mortality, even if we exclude population size from our model.
In the same table, we also include a control for municipal military bases in part of our models for more reliable estimates of excess mortality. Troop movements and crowded conditions at army bases could are suspected to be important contributors to the spread of the Spanish Flu (Flecknoe et al., 2018). Although the Netherlands remained neutral during World War I, the Dutch army was mobilised from 1914 to the end of 1918. Living conditions in the barracks and other army camps were poor (Koten and Weel, 2014), and young adult men were affected by the Spanish Flu in large numbers. As many soldiers were conscripts, their occupation listed at death would not be 'soldier', but their occupation before the War. Only 0.2% of the death certificates have occupational titles that can be linked to the military between 1910 and 1918. Given that over 200.000 men out of 6.5 million inhabitants were conscripted, this is surely an underestimation of deaths among conscripts. To control for this potentially strong increase in local excess mortality around military bases, we add a set of dummy variables to indicate whether divisions of the Dutch army were present in 1918 at the places of death recorded in the certificates. These locations were obtained from a list of military bases in 1918 compiled by Ringoir (1980) and from the locations where Dutch soldiers were admitted to military hospitals in 1913, listed in annual reports (Landmacht, 1913). We used military hospitals for 1913 because no military hospital reports are available between 1914 and 1918, and the recording procedure changed after the War. The presence of army bases shows a negative, though statistically insignificant association with excess mortality, which is surprising given positive results found by other studies (Clay et al., 2019). It should be mentioned, however, that one of the most important military bases in the Netherlands, the main navy base in Den Helder, had some of the highest excess mortality in the country (see Fig. 1), but had to be excluded from our dataset because the occupations for these certificates were not digitised.
There are a number of ways in which the occupational exposure variables could have been used in the models, and in Table 8, Appendix A we explore a number of these specifications. We first look at a model that omits the interaction between indoors and contact occupations. This reduces the size of the effects we find and they are no longer statistically significant. While we think the combination of the two kinds of exposure is important to include in the model, we also note that this further shows that the occupational exposure effect is not consistent in our data. We next look at an additive measure of occupational exposure, by adding up the two dummy variables so that no risk equals zero, one of both indoors or contact risk results in a score of one, and both being present gets a score of two. We find that when we enter this variable as a linear predictor it does not have a statistically significant association with excess mortality. We prefer the specifications where occupational exposure enters as separate dummies because it does not require us to make assumptions about the size of the two variable's effect sizes as adding them up would.
In Table 9 in Appendix A we check whether the thresholds we used in our data selection procedure impact our results. Specifically, we check whether using higher coverage thresholds (at least 80% of age, sex, and date observations in a municipality have to be present rather than 50%). For presence of occupations, we use a threshold of 20% rather than 10%. The results are very similar, the main differences being that the estimates on our exposure variables have become stronger.
We also check whether including women alters our results (Table 10), applying the same model as in Table 3, with the addition of a dummy variable for sex. We find that men in this data indeed had higher excess mortality during Spanish Flu. Beyond that, our main findings are similar across these two models.
As a final check of our results, we investigate whether the way we handled the cases of zero deaths during the Spanish Flu in our model affects our results. The following alternative approaches are explored: dropping cases of zero deaths (column 2), not taking the logarithm of the dependent variable (column 3), a quasiPoisson model with the ratio of 1918 to baseline deaths as the dependent variable (Silva and Tenreyro, 2006, column 4), a quasiPoisson model of the number of deaths during the Spanish Flu pandemic with the log baseline deaths as the offset (column 5), and taking the inverse hyperbolic sine of the ratio (Bellemare and Wichman, 2020; Burbidge et al., 1988, column 6).
Appendix A, Table 11 shows that the socioeconomic gradient in excess mortality and the higher excess mortality for contact occupations we find in our preferred specification is found in the alternative models as well. The main exception here is the model where we exclude groups with zero deaths during the Spanish Flu pandemic, in which the patterns we find elsewhere are reversed. As mentioned earlier, we think the absence of deaths during the Spanish Flu pandemic is a relevant outcome that should be included in the model. The precision of our estimates is also lower in the two quasiPoisson models. Beyond that, what stands out is that the alternative models provide higher estimates of the same pattern.

Conclusion
This paper explored whether there was a socioeconomic gradient in excess mortality among male workers during the Spanish Flu in the Netherlands. Using individual level occupational information and all-cause mortality data, we calculated excess mortality and found it to be highest among farmers and unskilled labourers and lowest among higher-skilled labourers. Excess mortality for medium-skilled and lower-skilled workers was in between these groups. These findings are in line with evidence from a recent study on Sweden (Bengtsson et al., 2018) and earlier regional evidence, e.g. for Kristiania (Oslo, Norway) (Mamelund, 2006), and strongly contrast with the perceptions of contemporaries that the Spanish flu had a relatively egalitarian impact across the society (Mamelund, 2006;Quanjer, 1921).
The national death registers we relied on included individual information on the age, occupation, and date of death as well as sex and the municipality of residence of the deceased. This information allowed us to estimate the association between social class and excess mortality, while also taking occupational characteristics, age, and regional variation in excess mortality into account (Colvin and McLaughlin, 2021;Mamelund, 2011). Moreover, controlling for municipality size and presence of local army bases, or analysis of specific regions with high excess mortality resulted in similar estimates of the magnitude of socioeconomic differences in excess mortality during the Spanish flu. In other words, there was a clear socioeconomic gradient in excess mortality during the autumn wave of the 1918-19 influenza pandemic. As we study excess mortality, these estimates come on top of the existing social gradient in mortality among individuals age 35-55 in the Netherlands (Van Poppel et al., 2009), indicating that the Spanish Flu increased existing social inequalities in mortality.
Our analyses take into account whether the social differences in excess mortality during the Spanish Flu could have been driven by occupational characteristics, including social interaction at the workplace and indoor work. For COVID-19, it has been shown that occupational conditions are related to workplace infections (Godefroy and Lewis, 2022). Similarly, working together with others in an enclosed space or regular social interaction at the workplace could have been related to increased chances of infection and death with the Spanish Flu in autumn 1918. To test this hypothesis, a contact index was constructed. We found some evidence that social contact at the workplace predicted higher excess mortality during the Spanish Flu pandemic. However, contrary to our expectations, it did not seem to matter whether individuals worked together indoors or not. While the estimates of our contact index were generally robust for controlling for demographics, municipality size, local army bases, regional fixed effects, or specifically looked at regions with high excess mortality, it should be noted that the effect we find for social contact depends on how these exposure variables are included in the model.
Our findings indicates that the higher social strata were, to some degree, able to protect themselves against the Spanish flu. It is tempting to think that such differences result from differences in income, as educational divides in mortality were uncommon before the second half of the 20th century. Earlier studies show inequalities in Spanish flu at the macro level. Neighbourhoods with larger housing had lower Spanish flu mortality (Mamelund, 2006) and, across the globe, poverty was related to higher Spanish flu mortality (Murray et al., 2006). In the Netherlands in 1918, lower-skilled and medium-skilled labourers were by no means rich, but they could afford significantly better housing and food than unskilled labourers (Brooshooft, 1897;Van Der Woud, 2010). However, material resources are unlikely to fully account for the advantages that social status provides, as non-material resources -such as existing health conditions, occupational hazards, knowledge about infectious diseases, and access to health care -have been at least equally important in explaining survival differentials by social status in other social contexts (Link and Phelan, 1995;Clouston et al., 2016;Debiasi, 2020;Elo, 2009;Edvinsson and Broström, 2012) including the ongoing COVID-19 pandemic (Drefahl et al., 2020).
We found that farmers had higher excess mortality during the Spanish Flu, especially after we control for occupational skill levels, occupational social contact and age composition. This contradicts earlier work on mortality during the Spanish Flu in Sweden (Bengtsson et al., 2018). More generally, at the turn of the 20th century, farmers were known to outlive their peers in many populations (Ferrie, 2003;Gagnon et al., 2011;Mourits, 2019;Schenk and Van Poppel, 2011;Smith et al., 2009). In the Netherlands, excess mortality rates during the Spanish Flu were especially high in Drenthe and parts of Groningen and Overijssel -rural and poor regions where many people were involved in farming (Mourits et al., 2021). Locals noted that the poverty, bad living conditions, and lack of medical care might have elevated mortality in this region (Vugs, 2020). In normal circumstances, farmers were able to outlive their peers even if they were poor, as they continued to be physically active at advanced ages and had access to fresh food in a time when refrigeration was not commonplace (Mourits, 2019;Pes et al., 2013). However, there is no evidence that farmers in the Netherlands lived longer than the middle class or elite between ages 35 and 55 (Van Poppel et al., 2009), which was part of the age group hit hardest by the Spanish flu. It could be that poverty protected farmers from lifestyle diseases, enabling them to live to old age, but made them vulnerable to outbreaks of infectious disease due to poor housing, malnutrition, and lack of medical care.
Some drawbacks of our approach and the data used should be mentioned. We analyse all-cause excess mortality, while ideally we would use sufficiently detailed and reliable cause-specific mortality. High excess mortality during the Spanish Flu means we should mostly pick up Spanish Flu related deaths, but uncertainty remains. Second, incomplete digitisation of death certificates as well as our data selection steps mean we lose some of the large cities in the Netherlands, most notably Amsterdam. While our sample showed no large biases on the basis of observable characteristics, differences in the spread of the virus in high population density settings are possible, and could impact our results. Third, the specific age structure within occupational groups could affect results, if relatively more young men -who were affected hardest by the Spanish Flu -were unskilled labourers, for example. Finally, in this work we have only included men. Occupational characteristics of women were hard to reconstruct from the death certificates we use. Since both mortality and occupational structure are known to have differed between men and women, this is an important point for future research.
The Spanish flu did not hit the Dutch population evenly. Higher social classes experienced lower excess mortality rates. These differences between socioeconomic groups were not explained by occupationspecific risk factors included in the analyses in this work. Frequent social contact at the workplace was associated with increased excess mortality during the Spanish Flu pandemic, but cannot account for the socioeconomic gradient in excess mortality. Overall, the social factors that put populations at higher risk during the pandemic may also have been factors that determined existing health differences across the population, exacerbating existing social gradients in mortality.

CRediT authorship contribution statement
Auke Rijpma: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. Ingrid K. van Dijk: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. Ruben Schalk: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. Richard L. Zijdeman: Acquisition of data, Analysis and/or interpretation of data, Revising the manuscript critically for important intellectual content. Rick J. Mourits: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content.

Funding and acknowledgements
We would like to thank Svenn-Erik Mamelund, Mar Rodríguez-Girondo, Niels van den Berg, participants to OsloMet's PANSOC webinar and two anonymous referees for their comments on earlier drafts of this paper. Any remaining mistakes are our own. Auke Rijpma, Ruben Schalk, Richard Zijdeman and Rick Mourits acknowledge financial support from the Netherlands Organization for Scientific Research (NWO) for the . Ingrid van Dijk was funded through the research programme ''Landskrona Population Study'' and project ''An Age Old Advantage?'', Swedish Foundation for the Humanities and Social Sciences (Riksbankens Jubileumsfond, RJ). All authors approved the final version of the manuscript.

Table 8
Regression models of log excess mortality rate, with alternative specifications for the exposure variables. Region-clustered standard errors between parentheses.

Table 11
Regressions of log excess mortality rate, using alternative model forms to deal with observations of 0 excess mortality. Region-clustered standard errors between parentheses.