Heterogeneity in disease resistance and the impact of antibiotics in the US

We hypothesize that the impact of antibiotics is moderated by a population’s inherent (genetic) resistance to infectious disease. Using the introduction of sulfa drugs in 1937, we show that US states that are more genetically susceptible to infectious disease saw larger declines in their bacterial mortality rates once the drugs became available. This suggests that area-level genetic endowments of disease resistance and the discovery of medical technologies have acted as substitutes in determining levels of health across the US. We also document immediate effects of sulfa drug exposure on the age of the workforce and cumulative effects on educational attainment for cohorts exposed to sulfa drugs in early life.


A1 Summary Statistics
This matching is listed in the separate Matching Appendix.
Individuals in the Census can report up to two ethnicities/ancestries. For those reporting two ancestries, we simply take the average of the two matched HLA similarity scores.
• Mixed HLA Susceptibility: Our primary way of calculating state-level HLA susceptibility takes the weighted average of each reported ethnicity's HLA homozygosity. This method assumes no mixing among different ethnic groups. The other extreme considers fully admixed populations. To account for this extreme, we take the weighted average of genetic variants (or alleles) to find the frequency of the variant in the larger (mixed) population. Expected homozygosity is then calculated using the admixed allele frequencies, creating a measure of HLA susceptibility for fully integrated populations.
State-level allele frequencies are found in a manner similar to that of the base/segregated measure of HLA diversity: we simply match the reported ancestry of those born 5 years prior to the 1937 intervention to the ethnic allele frequencies of Cook (2015). We then take the weighted average of these frequencies to create a state-level allele frequency. Mixed HLA susceptibility is then the expected homozygosity calculated from these state-level allele frequencies.
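The difference between the segregated and mixed measures can be illustrated with a minimal sketch. It assumes a single locus and takes expected homozygosity to be the sum of squared allele frequencies (the standard Hardy-Weinberg expression); the group names, allele labels, frequencies, and population shares below are hypothetical toy values, not data from Cook (2015) or the Census, and the function names are ours.

```python
from typing import Dict

AlleleFreqs = Dict[str, float]  # allele label -> frequency within one group


def expected_homozygosity(freqs: AlleleFreqs) -> float:
    """Expected homozygosity at one locus: sum of squared allele frequencies."""
    return sum(p * p for p in freqs.values())


def segregated_susceptibility(shares: Dict[str, float],
                              group_freqs: Dict[str, AlleleFreqs]) -> float:
    """Weighted average of each ancestry group's own expected homozygosity
    (no mixing across groups)."""
    total = sum(shares.values())
    return sum(w / total * expected_homozygosity(group_freqs[g])
               for g, w in shares.items())


def mixed_susceptibility(shares: Dict[str, float],
                         group_freqs: Dict[str, AlleleFreqs]) -> float:
    """Expected homozygosity computed from population-weighted (admixed)
    allele frequencies (full mixing across groups)."""
    total = sum(shares.values())
    alleles = {a for g in shares for a in group_freqs[g]}
    pooled = {a: sum(w / total * group_freqs[g].get(a, 0.0)
                     for g, w in shares.items())
              for a in alleles}
    return expected_homozygosity(pooled)


# Hypothetical toy state: two equally sized ancestry groups, two alleles.
shares = {"group_a": 0.5, "group_b": 0.5}
group_freqs = {
    "group_a": {"A1": 0.8, "A2": 0.2},
    "group_b": {"A1": 0.2, "A2": 0.8},
}

seg = segregated_susceptibility(shares, group_freqs)  # about 0.68
mix = mixed_susceptibility(shares, group_freqs)       # about 0.50
print(f"segregated={seg:.2f}, mixed={mix:.2f}")
```

In this toy case the mixed measure is lower than the segregated one, because pooling allele frequencies across groups with different dominant alleles raises diversity in the combined population.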

Outcomes
• Bacterial Mortality Rate: The sum (excluding missing) of mortality rates (deaths per 100,000) from typhoid, scarlet fever, pertussis, tuberculosis, diphtheria, influenza and pneumonia, diarrhea and enteritis, maternal mortality, and syphilis. Data are given at the state-year level. The availability of data differs by year.
• Years of Schooling: An individual's reported years of schooling. Data are from the detailed educational attainment variable (EDUCD) of the 5% census samples for years 1980, 1990, and 2000 (Ruggles et al. 2020).
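As a rough illustration of the "sum (excluding missing)" rule for the bacterial mortality rate, the sketch below adds up whichever cause-specific rates are reported for a state-year. The cause keys and the numbers are hypothetical, and returning `None` when every cause is missing is our assumption; the text does not say how such state-years are treated.

```python
from typing import Mapping, Optional

# Hypothetical keys for the nine causes listed in the definition.
CAUSES = (
    "typhoid", "scarlet_fever", "pertussis", "tuberculosis", "diphtheria",
    "influenza_pneumonia", "diarrhea_enteritis", "maternal", "syphilis",
)


def bacterial_mortality_rate(
    rates: Mapping[str, Optional[float]],
) -> Optional[float]:
    """Sum the available cause-specific rates (deaths per 100,000),
    skipping causes that are absent or recorded as None."""
    observed = [rates[c] for c in CAUSES if rates.get(c) is not None]
    # Assumption: a state-year with no reported cause stays missing.
    return sum(observed) if observed else None


# Toy state-year with one cause explicitly missing and others unreported.
toy_state_year = {
    "typhoid": 2.0,
    "tuberculosis": 50.0,
    "influenza_pneumonia": 100.0,
    "syphilis": None,
}
toy_rate = bacterial_mortality_rate(toy_state_year)
print(toy_rate)
```

Because missing causes are skipped rather than treated as zero or dropped, the composition of the sum can change across years as data availability changes.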

Time Varying Controls
• Annual Average Monthly Temperature: The annual average of monthly average temperature (in Fahrenheit). These data are from https://www7.ncdc.noaa.gov/CDO/CDODivisionalSelect.jsp.

Demographic Controls
• Fraction Black: The 1936 state-level fraction of the population that is black. These data are from Lleras-Muney (2002, 2005).
• Urbanization Rate: The 1936 state-level urban fraction of the population. These data are from Lleras-Muney (2002, 2005).
• Fraction of Foreign Born: The 1936 state-level fraction of the population that is not native to the United States. These data are from Lleras-Muney (2002, 2005).
• Ethnic Fractionalization: Ethnic fractionalization is measured as a Herfindahl index over the fraction of a state's population attributed to each reported ancestry/ethnicity, using the base sample of Census respondents used to measure HLA diversity.
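A minimal sketch of a Herfindahl-type calculation over ancestry shares follows. The text does not state whether the control is the concentration index itself (the sum of squared shares) or the more common fractionalization form (one minus that sum), so both are shown. The ancestry labels are hypothetical, and the unweighted counts are a simplification that ignores Census person weights.

```python
from collections import Counter
from typing import Dict, Iterable


def ancestry_shares(ancestries: Iterable[str]) -> Dict[str, float]:
    """Share of respondents reporting each ancestry (unweighted counts)."""
    counts = Counter(ancestries)
    n = sum(counts.values())
    return {group: count / n for group, count in counts.items()}


def herfindahl(shares: Dict[str, float]) -> float:
    """Herfindahl concentration index: sum of squared group shares."""
    return sum(s * s for s in shares.values())


def fractionalization(shares: Dict[str, float]) -> float:
    """One minus the Herfindahl index: the probability that two randomly
    drawn respondents report different ancestries."""
    return 1.0 - herfindahl(shares)


# Hypothetical toy state with four respondents.
toy_sample = ["German", "German", "Irish", "Italian"]
toy_shares = ancestry_shares(toy_sample)  # German 0.5, Irish 0.25, Italian 0.25
h_index = herfindahl(toy_shares)          # 0.375
frac = fractionalization(toy_shares)      # 0.625
print(h_index, frac)
```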

Additional Census Demographic Controls
• Race: An individual's reported race. From the RACE variable of the 5% census samples for years 1980, 1990, and 2000 (Ruggles et al. 2020).
• Sex: An individual's reported sex. From the SEX variable of the 5% census samples for years 1980, 1990, and 2000 (Ruggles et al. 2020).
• Rural: An indicator denoting whether an individual does not live in a metro area. From the METAREA variable of the 5% census samples for years 1980, 1990, and 2000 (Ruggles et al. 2020).

Notes:
This table examines a differential trend break in our base DD analysis, i.e., with state and year (or year-by-census-division) FE and our full set of controls. In short, this table replicates Table 1, adding Post-1937 × Trend × Std. HLA Susceptibility to measure differences in the post-1937 linear trend tied to HLA susceptibility. As shown, this coefficient is negative and significant when regressing bacterial mortality, suggesting that the estimated level decline in Table 1 is due to a differential linear trend tied to HLA susceptibility following treatment. No such difference is estimated for residual mortality in Panel B. Standard errors are clustered by state, with *, **, and *** denoting statistical significance at the 10, 5, and 1% levels, respectively.

Figure panels (x-axis: year, 1932 to 1942): (a) Segregated HLA; (b) Mixed HLA

Effect of HLA Susceptibility on Age of Labor Force by Year
Notes: This figure plots annual coefficients relative to 1936 for the empirical specification of column (5) of Table A7. Sub-figure (a) plots the annual coefficient for the segregated measure of HLA susceptibility and shows positive effects on age following the introduction of sulfa drugs in 1937. As discussed in greater detail in the text, there is a pretrend for the segregated measure; however, we believe this is due to spillover of the treatment into those early in childhood. Sub-figure (b) plots the annual coefficient for the mixed HLA susceptibility measure, showing coefficients very similar to those of the segregated measure.

                             1932-1942  1932-1942  1932-1942  1932-1942  1932-1942  1900-1970
Mean of Dependent Variable        9.58       9.58       9.58       9.58       9.58       9.75
Pre-period (Year<1937)            9.40       9.40       9.40       9.40       9.40       9.40
Post-period (Year≥1937)           9.73       9.73       9.73       9.73       9.73      10.13

A8 Effect on Income
Summary & Notes: Instead of contemporary relationships, this table examines lifelong impacts arising from the 1937 treatment and its differential impact across states. To do so, we examine how the natural log of individual income (from the same 1980, 1990, and 2000 5% census samples used to calculate the HLA score) changes by birth cohort exposure to treatment. This table shows that individuals in states that were more exposed to infectious disease by their ancestral susceptibility also experienced relative gains in their income. Panel A uses a measure of HLA susceptibility that assumes fully segregated ancestral populations, and Panel B uses a measure of HLA susceptibility that assumes fully mixed ancestral populations. The demographic set of controls includes the fraction of a state's population that is black in 1936, the fraction of a state's 1936 population that is foreign born, the urbanization rate in 1936, the state's population in 1936, the number of state-level in- and out-migrants between 1935 and 1940, and a measure of ethnic fractionalization based on the census-level reported ethnicity. The set of infrastructure controls includes education expenditures per capita in 1936, schools per square mile in 1936, hospitals per square mile in 1936, physicians per capita in 1936, and state-level real income in 1936. Individual controls include indicators for sex, urban/rural status, age, and race. We also include an indicator for those with no income in all estimations. Standard errors are clustered by state. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.