1 Introduction

COVID-19 has had a disproportionate impact on racial and ethnic minority communities in the United States. Early in the pandemic, The New York Times noted that “The coronavirus is infecting and killing black people …at disproportionately high rates” [1]. Six months into the pandemic, as of August 2020, the Centers for Disease Control and Prevention (CDC) documented that American Indian or Alaska Native (AIAN) and Hispanic persons were experiencing the highest infection rates, at 2.8 times the rate for non-Hispanic Whites. The infection rate of Black or African Americans was 2.6 times higher than Whites. Blacks were also dying at twice the rate of Whites, followed by AIANs (1.4 times) and Hispanics (1.1 times) [2]. As of March 2021, a year into the pandemic and more than three months after the rollout of COVID-19 vaccinations, the disease burden, particularly mortality, remained markedly high in minority communities. Nationwide, mortality (deaths per 100,000 people) was highest among Blacks (178), followed by AIANs (172), Hispanics (154), Native Hawaiian or Other Pacific Islanders (NHPIs; 144), and Whites (124) [3].

It is generally recognized that racial/ethnic differences in COVID-19 mortality were broadly manifesting long-term structural inequities, whether in education, employment, housing, healthcare or the criminal justice system [4,5,6,7]. Some of the effects of these inequities were more direct and immediate. People of color are overrepresented in frontline and public-facing occupations—in healthcare and other service industries—that increased their exposure to the virus [8, 9]. They are also more likely to live in overcrowded, multi-family dwellings, which facilitated the transmission of the disease [10], and in areas that record higher levels of long-term air pollution [11, 12]. Exposure to pollution has been shown to be a significant factor in serious complications and fatality from COVID-19 [13]. But many of the structural inequities also caused these communities to have poor health generally and to be burdened by specific comorbidities. African Americans, for example, suffer disproportionately from comorbidities such as obesity, hypertension, diabetes mellitus and heart disease, all of which are associated with more severe COVID-19 outcomes [11, 14, 15].

A majority of the early reports and studies on the pandemic’s uneven impacts reported and analyzed the disparities in cases and disease outcomes (e.g. hospitalizations, deaths) by race/ethnicity (see for example, [16,17,18]). While informative and generally insightful, this type of bivariate analysis does not account for well-known socioeconomic and environmental correlates, thereby failing to shed much light on the mechanisms through which the race/ethnicity effect operates. Later studies did investigate the degree to which the race/ethnicity effect in COVID-19 outcomes is mediated by different factors, from social determinants of health [19,20,21,22] to structural racism [4, 23]. But a majority of these studies typically focused on a single racial/ethnic group (e.g. African Americans) and analyzed the race/ethnicity effect at a single point in time during the pandemic.

The purpose of this study is two-fold. First, in its approach, the paper aims to provide a more complete picture of the race/ethnicity effect in COVID-19 mortality in the U.S. To do this, it assesses the effect for all racial/ethnic groups (as classified by the U.S. Census Bureau) and over time (across four dates in the first 13 months of the pandemic). There have been calls for a more encompassing analysis of the race/ethnicity effect that considers all groups and captures the evolution of racial/ethnic disparities in COVID-19 outcomes over time [18, 19, 21, 24]. Second, in its analysis, the study estimates the race/ethnicity effect “net” of basic socioeconomic factors (henceforth SEF), namely poverty, employment, income and education. By explicitly considering the role of SEF in racial/ethnic disparities in COVID-19 mortality, the analysis yields results that are instructive for policy. If, for example, much of the race/ethnicity effect turns out to be mediated by basic SEF, addressing structural inequities in these areas through targeted policies will help avert similar outcomes in a future health crisis (not to mention the imperative of addressing each dimension of inequity in and of itself).

The data requirements for a multivariate analysis of this type are substantial. Ideally one would use a person-level dataset that maps disease outcome to clinical background and demographic and socioeconomic information [25]. Data records of this kind are unavailable in the U.S. except in some specific clinical settings (e.g. from a particular health provider; even then the socioeconomic data are typically very thin) [26]. In fact, researchers have generally lamented the lack of systematic collection and reporting of disaggregated race/ethnicity data on COVID-19 nationally [27, 28]. Except for a handful of states that report county-level data, mortality by race/ethnicity and related demographics in the U.S. are typically reported at the state level only [3]. Given the data shortcomings, this study will use county-level data in an ecological regression framework [11, 13]. Specifically, county-level COVID-19 mortality will be regressed on measures of county racial/ethnic composition, basic SEF, and a set of covariates.

2 Data

The data for the study come from various sources, all of which are publicly available. County-level cumulative deaths are obtained from The New York Times and are based on reports from state and local health agencies. The Times dataset documents cumulative cases and deaths beginning with the first reported case in the U.S. on January 21, 2020 (the first documented death was on February 29, 2020). The population and race/ethnicity data are from the Census Bureau’s Annual County Resident Population Estimates (2010–2018) and are sourced from the dataset put together by Killeen et al. [29]. The demographic and socioeconomic variables are obtained from the Social Vulnerability Index (SVI) database (2018) of the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry (CDC/ATSDR). The underlying data are based on the U.S. Census Bureau’s five-year (2014–18) American Community Survey (ACS). The environmental variable, which measures county-level pollution using estimates of fine particulate matter in the air (PM2.5), comes from Wu et al. [13].

Because one of the aims of this study is to document and analyze how the relationship between race/ethnicity and COVID-19 mortality evolved during the course of the pandemic, four dates during roughly the first year of the pandemic are chosen for analysis. Each date is associated with a milestone. The dates are: May 15, 2020 (toward the end of the first wave), August 15, 2020 (toward the end of the second wave), December 15, 2020 (a day after the rollout of the vaccination program), and March 15, 2021 (almost exactly a year after the World Health Organization declared COVID-19 a global pandemic and the U.S. declared a national emergency). All U.S. counties for which cumulative deaths are reported for at least one of the above dates and for which data on the other variables are not missing are included in the analysis. Puerto Rico does not report COVID-19 deaths at the county (municipio) level and is not part of this study. Alaska is also excluded because it lacks the air quality data. Because The New York Times dataset combined the five boroughs of New York City into a single area for reporting COVID-19 related statistics, data from the New York City Department of Health and Mental Hygiene at the borough/county level are used instead. Figure 1 presents cumulative deaths in all sample counties across the analysis timeline.
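The snapshot step described above can be sketched in a few lines. This is only an illustration: the row layout (report date, county FIPS code, cumulative deaths), the sample values, and the helper name `deaths_as_of` are assumptions made here, and carrying the latest earlier report forward when a county has no report on the exact date is a choice of this sketch, not something the study specifies.

```python
from datetime import date

# Hypothetical long-format rows: (report date, county FIPS code, cumulative deaths).
# The values are made up for illustration; they are not the study's data.
rows = [
    (date(2020, 5, 14), "01001", 3),
    (date(2020, 5, 15), "01001", 4),
    (date(2020, 8, 15), "01001", 22),
    (date(2020, 12, 15), "01001", 44),
    (date(2021, 3, 15), "01001", 95),
    (date(2020, 8, 14), "01003", 20),   # no report exactly on August 15
    (date(2020, 12, 15), "01003", 137),
]

# The four analysis dates named in the text.
ANALYSIS_DATES = [
    date(2020, 5, 15),
    date(2020, 8, 15),
    date(2020, 12, 15),
    date(2021, 3, 15),
]


def deaths_as_of(rows, fips, cutoff):
    """Latest cumulative count reported on or before `cutoff`, or None if no report exists."""
    eligible = [(d, n) for d, f, n in rows if f == fips and d <= cutoff]
    if not eligible:
        return None
    # Pick the row with the most recent report date.
    return max(eligible, key=lambda pair: pair[0])[1]


# One list of four snapshot values per county, in the order of ANALYSIS_DATES.
snapshot = {
    fips: [deaths_as_of(rows, fips, d) for d in ANALYSIS_DATES]
    for fips in ("01001", "01003")
}
```

A county whose snapshot is `None` on every date would be dropped under the inclusion rule stated above (deaths reported for at least one of the four dates).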

Fig. 1 All-county cumulative deaths across analysis timeline

Table 1 presents summary statistics on the relevant county-level variables that are used in the regression exercises. As of May 15, 2020, county COVID-19 mortality, measured as deaths per 100,000 people, averaged 10.7, which rose to 28.9 as of August 15, approximately six months into the pandemic. At the beginning of the vaccination campaign, average county deaths reached almost 100 (98.6). As of March 15, 2021, a year from the official declaration of a national emergency in the U.S., county-level mortality averaged 180.5. The distribution of deaths was highly skewed at the beginning of the pandemic, with mean mortality at five times the median. As the pandemic spread across the country, the skewness declined considerably—a year into the pandemic, the mean was only about 1.1 times the median. Even so, there was significant variation in county mortality and about 2 percent of counties recorded zero deaths.
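As a small worked illustration of the two quantities used in this paragraph (deaths per 100,000 people and the mean-to-median ratio as a rough gauge of skewness), consider the sketch below. The county figures are invented for the example and the function name `deaths_per_100k` is an assumption of this sketch.

```python
from statistics import mean, median


def deaths_per_100k(deaths, population):
    """Crude mortality rate: cumulative deaths per 100,000 residents."""
    return 100_000 * deaths / population


# Hypothetical (cumulative deaths, population) pairs; not the study's data.
counties = [(0, 12_000), (2, 40_000), (5, 55_000), (30, 90_000), (400, 800_000)]

rates = [deaths_per_100k(d, p) for d, p in counties]

# A mean well above the median signals a right-skewed distribution.
mean_to_median = mean(rates) / median(rates)

# Share of counties with no recorded deaths.
zero_share = sum(1 for d, _ in counties if d == 0) / len(counties)
```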

Table 1 Summary statistics on important variables

For the purposes of this study, county racial/ethnic composition is measured in two ways. The first is by the percentage contribution of each racial/ethnic group to county population. Following the standard classification adopted by the U.S. Census Bureau, all racial/ethnic groups are considered. These are: non-Hispanic White alone (White); non-Hispanic Black or African American alone (Black); non-Hispanic American Indian and Alaska Native alone (AIAN); non-Hispanic Asian alone (Asian); non-Hispanic Native Hawaiian and Other Pacific Islander alone (NHPI); non-Hispanic Two or More Races (mixed); and Hispanic. Looking at the racial/ethnic compositions in Table 1, in the average county, Whites comprised three-quarters of the population, followed by Hispanics and Blacks, at 9.6 and 9 percent, respectively. AIANs accounted for about 1.9 percent of county population, on average, about the same as people of two or more races (1.8 percent).

The second measure stratifies counties according to their largest racial/ethnic group (that is, plurality group) and constructs dummy (or indicator) variables. A dummy variable equals 1 if the largest share of a county’s population comes from a given racial/ethnic group, and is zero otherwise. For example, the dummy Largest_Black is equal to 1 for counties where Blacks comprise the largest racial group in a county, zero otherwise. Largest_AIAN and Largest_Hispanic are similarly defined for counties where AIANs and Hispanics comprise the largest racial category, respectively. Due to the small number of counties where Asians or NHPIs individually comprise the largest racial group, a single dummy variable is defined for counties where either group is the largest (Largest_ANHPI). No county in the U.S. has Two or More Races (mixed) as the largest racial group. The summary statistics in Table 1 show that Whites comprise the largest group in 91 percent of U.S. counties. Blacks and Hispanics each made up the largest group in roughly the same proportion of counties (4.1 and 3.9 percent, respectively). Only 1.1 percent of counties had AIANs as the largest racial/ethnic group.
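The construction of the plurality indicators can be sketched as follows. The variable names `Largest_Black`, `Largest_AIAN`, `Largest_ANHPI` and `Largest_Hispanic` come from the text; the function name `plurality_dummies`, the input layout (a mapping from group to percentage share), the sample shares, and the tie-breaking rule (first group in list order) are assumptions of this sketch.

```python
# Group labels as used in the text; White plurality is the reference category.
GROUPS = ["White", "Black", "AIAN", "Asian", "NHPI", "Mixed", "Hispanic"]


def plurality_dummies(shares):
    """Map a county's group shares (percent) to the four largest-group indicators.

    A White plurality county gets all zeros. A Mixed plurality county would
    also get all zeros, but the text notes that no such county exists.
    Ties are broken by the order of GROUPS (an assumption of this sketch).
    """
    largest = max(GROUPS, key=lambda g: shares.get(g, 0.0))
    return {
        "Largest_Black": int(largest == "Black"),
        "Largest_AIAN": int(largest == "AIAN"),
        # Asian and NHPI plurality counties share a single indicator.
        "Largest_ANHPI": int(largest in ("Asian", "NHPI")),
        "Largest_Hispanic": int(largest == "Hispanic"),
    }


# Hypothetical counties (shares in percent); not taken from the study's data.
black_plurality = {"White": 38.0, "Black": 45.5, "AIAN": 0.4, "Asian": 1.1,
                   "NHPI": 0.1, "Mixed": 1.9, "Hispanic": 13.0}
asian_plurality = {"White": 25.0, "Black": 2.0, "AIAN": 0.3, "Asian": 37.0,
                   "NHPI": 9.0, "Mixed": 17.0, "Hispanic": 9.7}
white_plurality = {"White": 80.0, "Black": 8.0, "AIAN": 1.0, "Asian": 2.0,
                   "NHPI": 0.1, "Mixed": 1.9, "Hispanic": 7.0}

dummies_black = plurality_dummies(black_plurality)
dummies_asian = plurality_dummies(asian_plurality)
dummies_white = plurality_dummies(white_plurality)
```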

To motivate the empirical analysis, Figs. 2 and 3 show the nature and evolution of the bivariate association between county racial/ethnic composition and county COVID-19 mortality. Figure 2a–d plot, respectively, cumulative county deaths per 100,000 people against the county population share of Blacks, AIANs and Hispanics as of each of the four dates (May 15, August 15 and December 15, 2020, and March 15, 2021). A visual inspection reveals that, three months into the pandemic, only the Black percentage of a county’s population showed a positive association with county deaths. As of August 15, the bivariate relationship turned positive for Hispanics as well, while the gradient on the Black share steepened further. By December 2020, each of the three groups exhibited very similar plots, with a discernible positive association between their contribution to county population and total county deaths. As of March 2021, as cumulative deaths rose, the curve relating deaths to population share became steeper for all three groups.

Fig. 2 Deaths and share of racial/ethnic group in county

Fig. 3 Mean COVID-19 deaths by county type based on largest racial/ethnic group

Figure 3 documents further evidence of this trend by plotting mean deaths by type of county, where, as noted, counties are stratified by their largest racial/ethnic group. In the first six months of the pandemic, counties where Blacks were the plurality experienced the highest death rates: 92 deaths (per 100,000 people) on average as of August 2020, nearly twice the rate of the groups with the next highest mortality (Hispanic and AIAN plurality counties). As of December 2020, however, the average death rate in AIAN plurality counties marginally surpassed that of Black plurality counties. This remained the case roughly a year into the pandemic (March 15, 2021), when counties where AIANs made up the largest racial/ethnic group had an average death rate of 273, followed by Black (267), Hispanic (248) and White (173) plurality counties. Although very few in number, counties where Asians/NHPIs comprised the largest group had by far the fewest deaths (39). In sum, these simple bivariate relationships present strong baseline evidence that during the first year of the pandemic, U.S. counties that had higher shares of minority populations on average experienced higher mortality.

3 Regression methodology

For the multivariate analysis, two types of regression models are adopted depending on how county racial/ethnic composition is defined and measured. In the first model, the percentage share of each racial/ethnic group in county population is used. Accordingly, the model is:

$$\begin{aligned} D_{i} &= \beta_{0} + \beta_{1}\,\text{Percent\_Black}_{i} + \beta_{2}\,\text{Percent\_AIAN}_{i} + \beta_{3}\,\text{Percent\_Asian}_{i} + \beta_{4}\,\text{Percent\_NHPI}_{i} \\ &\quad + \beta_{5}\,\text{Percent\_Mixed}_{i} + \beta_{6}\,\text{Percent\_Hispanic}_{i} + SEF*\gamma + X*\Theta + \varepsilon_{i} \end{aligned}$$
(1)

where \({D}_{i}\) is COVID-19 cumulative deaths in county i, the \(Percent\_\) variables represent the contribution to county population of each racial/ethnic group, \(SEF\) is the vector of basic SEF with associated parameters \(\gamma\), \(X\) is a vector of covariates with associated parameters \(\Theta\), and \({\varepsilon }_{i}\) is the regression error. The \(SEF\) vector includes two income-based measures (county poverty rate and median per capita income), employment opportunities (unemployment rate) and education (ratio of population with no high school diploma). The covariates vector \(X\) includes measures of population age distribution (ratio of county population 65 years and older, and 17 years and younger), health insurance coverage (uninsured rate), and county air quality (amount of fine particulate matter, PM2.5). Summary statistics on all the SEF and covariates are also provided in Table 1.

In this setup (model 1), an estimated race/ethnicity coefficient measures the impact on mortality of a unit (percentage point) increase in the population share of the given racial/ethnic group, all else equal, commensurate with a unit decrease in the proportion of non-Hispanic Whites. As such, the coefficient assigns an average (linear) effect for every percentage point increase in population share regardless of the level of the share. Conversely, if the size of the given racial/ethnic group has no systematic effect on COVID-19 related deaths, the estimated coefficient should be statistically indistinguishable from zero. A separate regression is estimated for each of the four dates during the first year of the pandemic.
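The reference-category reading of the coefficients can be made explicit with a short identity. As a sketch, assume the seven Census categories are exhaustive, so that a county's shares sum to 100; the notation \(s_{g,i}\) for the percentage share of group \(g\) in county \(i\) is introduced here for illustration only:

$$s_{\text{White},i} = 100 - \sum_{g \ne \text{White}} s_{g,i}$$

Under this assumption, raising one included share (say \(s_{\text{Black},i}\)) by one point while holding the other five included shares fixed necessarily lowers \(s_{\text{White},i}\) by one point, which is why each race/ethnicity coefficient in Eq. (1) is read relative to non-Hispanic Whites, the omitted group.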

In the second model, the largest group dummy/indicator variables are used to measure county racial/ethnic composition. Accordingly, the regression equation takes the form:

$$D_{i} = \beta_{0} + \beta_{1}\,\text{Largest\_Black}_{i} + \beta_{2}\,\text{Largest\_AIAN}_{i} + \beta_{3}\,\text{Largest\_ANHPI}_{i} + \beta_{4}\,\text{Largest\_Hispanic}_{i} + SEF*\gamma + X*\Theta + \varepsilon_{i}$$
(2)

where all previous notations apply. Again, a separate regression is run for each of the four dates. In this regression setup (model 2), an estimated race/ethnicity coefficient measures the differential death rate in the average county where the given racial/ethnic group comprises the largest share of the population, relative to one where non-Hispanic Whites make up the largest racial group (the reference category).

Because cumulative deaths is a count variable, a negative binomial model is adopted with county population as the offset term. A negative binomial model is preferred due to the overdispersion in the cumulative deaths distribution. Robust standard errors with clustering at the state level are employed, the latter to allow for correlated errors between counties in the same state (e.g. due to state-level policy responses). All regressors are entered in standardized form. To facilitate interpretation of estimated impact, incidence rate ratios (IRRs, or exponentiated coefficients) are reported. An IRR measures the effect on death rate as a multiplicative factor.
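One common way to write out a count model of this kind is given below. It is a sketch under stated assumptions rather than the paper's own notation: the quadratic-variance (NB2) form of the negative binomial is assumed, and the symbols \(\mu_i\) (expected deaths), \(N_i\) (county population), \(\alpha\) (dispersion parameter), \(z_i\) (the stacked vector of standardized race/ethnicity, SEF and covariate regressors, plus a constant) and \(\delta\) (the stacked coefficients) are introduced here for illustration.

$$\begin{aligned} \operatorname{E}(D_i \mid z_i) &= \mu_i, \qquad \operatorname{Var}(D_i \mid z_i) = \mu_i + \alpha\,\mu_i^{2} \\ \log \mu_i &= \log N_i + z_i'\delta \\ \text{IRR}_k &= \exp(\delta_k) \end{aligned}$$

Because the offset \(\log N_i\) enters with its coefficient fixed at one, the second line is equivalent to \(\log(\mu_i / N_i) = z_i'\delta\), so the model describes the death rate rather than the raw count. With standardized regressors, \(\text{IRR}_k\) is the factor by which the expected death rate is multiplied when regressor \(k\) rises by one standard deviation, and \(\alpha > 0\) captures the overdispersion that motivates the negative binomial over a Poisson model.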

4 Results

4.1 Racial/Ethnic composition measured by share of county population

The first set of results is from the regressions that use the percentage contribution to county population of each racial/ethnic group (model 1). These are presented in Table 2. Panel A contains the results for the first two dates (May 15 and August 15, 2020), while panel B contains the results for the latter two dates (December 15, 2020 and March 15, 2021). For each date, results from four specifications are reported. The first specification enters the racial/ethnic variables only. This gives an initial (bivariate) estimate of the race/ethnicity effect in COVID-19 mortality with respect to each racial/ethnic group. Specification (2) adds the model covariates (age, uninsured rate and pollution) and the basic SEF. Specification (3) adds state fixed effects, the latter to account for any state-level heterogeneities. In the last specification, the five counties of New York City are excluded from the analysis due to the extraordinary outbreak the city experienced early in the pandemic; this specification is used to assess the robustness of the results to excluding potential outliers (otherwise it includes the full set of regressors from specification 3).

Table 2 Covid-19 deaths and share of racial/ethnic group in county

The first half of panel A presents the results for the earliest date in this study (May 15, 2020), roughly three months into the pandemic. Because IRRs are reported, an estimate greater than one shows a positive (increasing) effect and an estimate less than one shows a negative (decreasing) effect. Seen across all specifications, the results generally indicate that early in the pandemic, counties with a larger share of Black, AIAN and Asian populations experienced higher mortality due to COVID-19. In contrast, an increase in the share of the mixed (two or more races) population was associated with decreased mortality, as was an increase in the share of NHPI (when statistically significant). Statistically, the share of Hispanics had no impact on mortality.

When only the racial/ethnic variables are included as regressors (specification 1), a 1-standard deviation (14.3 percentage point) increase in the share of the Black population was associated with a 57 percent increase in mortality. A corresponding 1-standard deviation increase in the AIAN and Asian share raised mortality by 14 percent and 68 percent, respectively. The Black coefficient more or less holds steady when covariates, basic SEF and state effects are included in specifications (2) and (3). Interestingly, however, the addition of these variables has noticeably contrasting effects on the AIAN and Asian estimates: the former effectively triples (to 44 percent in the full specification, 3) while the latter decreases by three-quarters (to just 17 percent). Once the full set of potential confounders is controlled for, excluding potentially outlier counties does not seem to affect the estimated racial/ethnic effects much (specification 4). In sum, the results show that at the beginning of the pandemic, the estimated race/ethnicity effect in COVID-19 mortality was highest for Blacks and AIANs. Furthermore, whereas the Asian effect was also large and significant, basic covariates and SEF collectively mediate about 75 percent of the effect. Geography may have played a role in the higher mortality of Asians during the early months, as the disease took a heavy toll in areas where there are large concentrations of Asian populations (U.S. Northeast, such as New York, New Jersey and Massachusetts).
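The percent effects quoted above are restatements of IRRs, and the conversion can be sketched as follows. The function names are hypothetical. The rescaling from a per-standard-deviation IRR to a per-percentage-point IRR assumes the log-linear form of the count model and is an illustration added here, not a calculation reported in the study.

```python
import math


def pct_change_from_irr(irr):
    """Percent change in the death rate implied by an incidence rate ratio."""
    return 100.0 * (irr - 1.0)


def irr_from_pct_change(pct):
    """Incidence rate ratio implied by a percent change in the death rate."""
    return 1.0 + pct / 100.0


def irr_per_point(irr_per_sd, sd_points):
    """Rescale an IRR per 1-SD increase to an IRR per 1 percentage point.

    Assumes a log-linear model, so the log of the IRR scales linearly
    with the size of the increase in the regressor.
    """
    return math.exp(math.log(irr_per_sd) / sd_points)


# Figures quoted in the text: a 57 percent increase in mortality per
# 1-SD (14.3 percentage point) increase in the Black population share.
irr_black_sd = irr_from_pct_change(57.0)
irr_black_point = irr_per_point(irr_black_sd, 14.3)
pct_black_point = pct_change_from_irr(irr_black_point)
```

The same helpers apply to the indicator-variable model, where an IRR of 2.3 corresponds to a 130 percent higher death rate relative to the reference category.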

Roughly six months into the pandemic, the race/ethnicity effect for the various groups had already evolved somewhat. The Black effect starts out high (67 percent) but, once all potential confounders are accounted for, declines to 41 percent, an effect that is comparable to that for AIANs. The Asian effect had decreased substantially: based on the full/extended specification, a 1-standard deviation (2.9 percentage point) increase in the Asian share of the county population was associated with only a 6 percent rise in mortality (p < 0.1). The share of a county’s population with Hispanic ethnicity, however, was by then associated with an elevated county death rate. A higher share of the mixed population continued to be a predictor of lower county-level COVID-19 mortality.

As of December 15, 2020 (panel B), only the Black and AIAN shares of the population were consistently associated with raised mortality. Accounting for county variables and state effects (specification 3), a 1-standard deviation increase in the Black share was, on average, associated with a 15 percent increase in county death rate, virtually the same as the corresponding estimate for AIANs. The mixed share again consistently indicated decreased mortality. Interestingly, the Asian and Hispanic shares were associated with increased mortality (p < 0.01) only in the full specification (3) and sensitivity check (4). Their estimated effects were also somewhat comparable (6.4 and 7.6 percent, respectively).

As the pandemic proceeded through various waves and enveloped the country, racial/ethnic composition, perhaps unsurprisingly, became less of a factor in county-level variation in mortality. Still, as of March 15, 2021, a year into the pandemic, Black and AIAN population shares remained statistically significant predictors, even after accounting for basic SEF and confounders. A 1-standard deviation (7.3 percentage point) increase in the share of AIANs in a county increased mortality by 10 percent. A 1-standard deviation increase in the Black population raised mortality by 6 percent. The corresponding effect for the Asian share was 3.5 percent, but the estimate was only sporadically statistically significant across specifications, while the mixed share of the population continued to be consistently predictive of decreased mortality. The Hispanic effect was statistically insignificant in the full specification.

Finally, examining the estimates on the covariates and basic SEF, the share of the older population (65 years and over) was, as expected, associated with higher mortality in a majority of specifications. The share of the youngest population (17 and younger) also indicated a higher risk of county mortality as of the latter two dates. The ratio of the uninsured population, whenever significant, was mildly associated with lower county mortality. Poor air quality, on the other hand, consistently predicted increased mortality. Among the basic SEF, lower education was the most potent predictor of higher county COVID-19 mortality. A 1-standard deviation (6.3 percentage point) increase in the share of a county’s population with no high school diploma was associated with about a 26 percent increase in mortality during the first six months of the pandemic. The effect was about half that size, on average, for the latter two dates, but it remained statistically highly significant. The result likely reflects the role of occupation, whereby the less educated populace is systematically overrepresented in frontline and essential jobs, which carry higher risk of infection and mortality from COVID-19.

A county’s per capita income was a positive predictor of deaths as of the first two milestones, but by December 2020 and March 2021 the estimated coefficient had switched sign and higher per capita income was associated with lower mortality. The pattern of results may simply be an artifact of the pattern in the geographical spread of the disease during the course of the year, from the relatively higher income Northeastern and Western United States early on to the rest of the country in later months. For the most part, higher county unemployment was also associated with slightly reduced mortality.

4.2 Racial/Ethnic composition proxied by a county’s largest racial/ethnic group

The second set of results, reported in Table 3, employs dummy/indicator variables on the largest racial/ethnic group to proxy county racial/ethnic composition (model 2). Again, results are reported for the four dates that are analyzed in this study and the table presents results from multiple specifications that, beginning with just the racial/ethnic variables, incrementally add basic covariates and SEF, and state effects. The last specification excludes New York City. White plurality counties form the reference category for the racial/ethnic estimates.

Table 3 Covid-19 deaths and largest racial/ethnic group in county

According to the results in panel A, three months into the COVID-19 pandemic, counties in which Blacks or AIANs comprise the largest racial/ethnic group had considerably higher mortality. The estimated race/ethnicity effect for these groups is quite high. In the full specification (3) that accounts for covariates, basic SEF and state effects, a typical county that has Blacks as the largest racial group had 2.3 times the mortality rate of a White plurality county (or, put differently, a 130 percent increase in mortality). The corresponding estimate for an AIAN county is even higher – a death rate that is 523 percent higher. Similar to the earlier results, incrementally accounting for potential confounders is found to be consequential, albeit in contrasting patterns for the two groups. The Black estimate is about a third smaller in the extended specification (3) than in the simplest (bivariate) specification (1). The AIAN estimate, however, does not even turn statistically significant until confounding variables are included in the regressions. These results underscore the importance of properly accounting for sociodemographic and related factors when assessing racial/ethnic disparities in COVID-19 mortality. Similar to the results that were based on county population shares, as of May 2020, being a county where Hispanics make up the largest racial/ethnic group was not associated with a statistically significant difference in the death rate.

Six months into the pandemic, mortality outcomes by race/ethnicity remained qualitatively similar. Counties where Blacks or AIANs are the largest group had disproportionately higher deaths than White plurality counties, while those where Asians/NHPIs make up the largest group had lower mortality. Relative to earlier in the pandemic, however, the Black estimate declined considerably (by about a third) while the AIAN estimate remained quite elevated. As of August 2020, the typical county in which AIANs make up the largest racial group experienced a death rate that was 460 percent higher than one in which Whites are the plurality. By comparison, Asian/NHPI counties experienced mortality rates that were on average about 30 percent lower. Again, it is worth noting that the estimated effects for Black and AIAN show quite contrasting responses to the inclusion of covariates and basic SEF. Specifically, it appears that failure to account for potential confounders leads to an upward bias in the Black estimate while it has the opposite effect on the AIAN estimate. Simple bivariate analyses therefore can be misleading when assessing the impact of race/ethnicity on COVID-19 mortality.

By December 2020, the race effect remained statistically significant for the two groups (Blacks and AIANs) but its size was much smaller (panel B). Relative to the average county where Whites make up the largest group, mortality was about 23 percent and 106 percent higher in the typical county in which Blacks and AIANs form the largest group, respectively (based on the estimates in the full specification, 3). A county in which Hispanics make up the plurality also experienced a higher death rate (about 20 percent higher). The race/ethnicity effect waned even further a year into the pandemic, as of March 15, 2021. Even so, a county in which AIANs are the plurality had a death rate 1.75 times that of a White plurality county. The corresponding ratio for Black plurality counties was 1.1. Notably, Asian/NHPI estimates showed elevated mortality, albeit only in the full/extended specification (3) and sensitivity check (4) and as of the last two dates (December 2020 and March 2021).

Again looking at the impact of basic covariates and SEF, counties with higher shares of older as well as younger populations experienced higher mortality rates, particularly in the latter half of the first year of the pandemic. In this model, higher poverty, all else equal, predicted higher mortality during most of the year. Conversely, higher unemployment and uninsured rates were each associated with a small but statistically significant decrease in mortality in the second half of the year (December 2020 and March 2021). Richer counties (by per capita income) experienced more deaths early on in the pandemic, an effect that was reversed later in the year. Lower education, however, again consistently and potently predicted higher mortality across all four dates.

4.3 Summary of regression results

To summarize, Fig. 4a collates the estimated race/ethnicity coefficients (and confidence intervals) across the four dates from the regressions that are based on population shares (Table 2). The coefficients are from the full/extended specification (specification 3). The figure visually conveys the two important trends that were highlighted earlier. First, for all groups, the race/ethnicity effect (as measured by the impact on county-level mortality of a group’s share of county population) waned as time elapsed during the first year of the pandemic. Second, the effect remained highly statistically significant (p < 0.01) across all four dates only for three groups (Black, AIAN and mixed). Black and AIAN shares were associated with elevated mortality. In contrast, a higher share of people from two or more races (mixed) in a county’s population was consistently associated with reduced mortality.

Fig. 4 a The race/ethnicity effect during the first year of the pandemic: estimated IRR on share of county population. Incidence rate ratios (IRR) and 95% confidence intervals are shown. b The race/ethnicity effect during the first year of the pandemic: estimated IRR on largest racial/ethnic group dummy. Incidence rate ratios (IRR) and 95% confidence intervals are shown. The dates are May 15, August 15, and December 15, 2020; and March 15, 2021

Figure 4b collects the estimated coefficients on the largest racial/ethnic group dummies for all four dates. These are again based on the full/extended specification (specification 3 in Table 3). Observing the pattern over time, again only the coefficients on Black and AIAN reveal clear trends. Both decreased during the course of the year, although they remained statistically significant throughout. The AIAN estimate, however, was considerably larger, especially as of the first two dates (notice the different scales for the two variables), albeit with much larger confidence intervals. As for the other two groups, the Hispanic effect was statistically indistinguishable from zero for most of the year once basic SEF were accounted for, while the Asian/NHPI coefficient oscillated between positive and negative values.

5 Conclusions

This study revisited the important issue of disparities in COVID-19 outcomes in the United States along racial and ethnic lines. Focusing on mortality during the first year of the pandemic, its aim was to: (1) provide a more complete picture of racial and ethnic disparities and their evolution; and (2) examine the extent to which the disparities could be explained by existing differences in basic socioeconomic factors. Because many of the extant studies on the topic took place relatively early during the pandemic and mostly gave a snapshot picture of disparities, a retrospective, longitudinal investigation of the kind performed in this study is valuable.

The analysis adopted an ecological regression framework and used county-level data. Regressions gauging the impact of county racial/ethnic composition on county mortality were performed as of four dates during the first 13 months of the pandemic. All major racial/ethnic classifications as identified by the U.S. Census Bureau were considered and county racial composition was flexibly defined in two ways—by each group’s contribution to county population and by indicator variables of group plurality.
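The two measures of county racial/ethnic composition described above can be sketched as follows. This is an illustrative construction only: the function name `composition_measures`, the group labels, the counts, and the choice to express shares in percent are all assumptions of the sketch rather than details taken from the study's data processing.

```python
def composition_measures(group_counts):
    """Return (shares, plurality_dummies) for a single county.

    group_counts maps a group label to its resident count.
    shares are expressed as percentages of the county total; the dummy is 1
    for the single largest group and 0 for every other group.
    """
    total = sum(group_counts.values())
    if total <= 0:
        raise ValueError("county population must be positive")
    shares = {g: 100.0 * n / total for g, n in group_counts.items()}
    # In the event of a tie, max() keeps the first label listed.
    largest = max(shares, key=shares.get)
    dummies = {g: int(g == largest) for g in shares}
    return shares, dummies


# Hypothetical county used for illustration only.
county = {"White": 4000, "Black": 5000, "Hispanic": 800, "AIAN": 200}
shares, dummies = composition_measures(county)
print(shares)
print(dummies)
```

The share measure varies continuously with a group's presence in a county, whereas the plurality indicator switches on only for the largest group, which is one reason the two sets of estimates need not move together.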

Bivariate plots indicated a positive association between county COVID-19 mortality and the relative size of the Black, Hispanic and AIAN populations. Yet, results from the multivariate ecological regressions revealed two main trends. First, when accounting for basic covariates and socioeconomic factors, the race/ethnicity effect behaved in notably different ways for the three groups. The Hispanic effect was often close to fully mediated (that is, became statistically insignificant). The Black effect often decreased (by 4–56 percent, depending on the date of analysis and specification, and especially in the model that uses group plurality as a measure of racial composition) but always remained statistically significant. The AIAN effect, however, was either largely unchanged (in the specifications that used contribution to county populations) or increased (in the specifications that used group plurality). Second, for all three groups the race/ethnicity-mortality association generally waned in magnitude during the latter months of the first year of the pandemic.

On a basic level, the question is “what is the likely cause of differences in the COVID-19 mortality risk along racial/ethnic lines?” Absent genetic or biological explanations, the answer broadly rests with the legacy and ongoing effects of structural racism—the multi-faceted ways in which racial discrimination causes inferior health outcomes in minority communities [4, 23, 28, 35]. Many of the effects of structural racism on health operate indirectly through social, economic and institutional channels, such as education, employment, housing, healthcare and the justice system. But there are direct channels as well, for instance when perceptions and experiences of discrimination become health stressors themselves or worsen the impact of environmental stressors [28, 36]. Regardless, among the main effects on health is the preponderance of various comorbidities in these communities, which has made them particularly susceptible to the worst effects of the COVID-19 pandemic.

Viewed through this lens, the almost total mediation of the Hispanic effect on mortality by basic socioeconomic factors is notable. Policy-wise, it suggests that, for Hispanic populations, addressing basic inequities in education, employment and income can be effective in preventing similarly poor outcomes in a future health crisis (and also perhaps in remedying inferior health outcomes generally). For Black and AIAN populations, however, the fact that an unexplained, statistically significant race effect remains after accounting for basic socioeconomic factors means that tackling other channels and manifestations of structural racism in health is necessary.

For Black Americans, these other channels have been identified and acknowledged in the context of COVID-19. In healthcare, they include barriers to access, inferior quality care and low utilization [36, 37]. For example, mortality has been shown to be positively associated with lack of internet access, an important means of accessing up-to-date health and safety information, not to mention an indispensable tool for learning and working remotely [19, 38]. Medical mistrust, a result of the community’s history with healthcare discrimination, scientific racism, and everyday discrimination (perceived and actual), is associated with reluctance to access care, decreased engagement in safety and protective practices, and an elevated tendency to engage with conspiracy theories and beliefs [24, 35, 39, 40]. African Americans, for instance, had significantly lower COVID-19 vaccination rates, particularly in the early phases of the vaccination campaign [41, 42]. More broadly, social, community and institutional structures have been shown to raise the mortality risk for the Black community. These include, among others, concentrated deprivation [23] and residential segregation [4].

Historically, AIAN communities have also long endured discrimination [43, 44]. And, although the evidence pertaining to their specific experiences and the mechanisms that have led to their worse outcomes during COVID-19 is more limited, some of the same channels highlighted above that afflict Black communities are also likely to apply to them. For example, in surveys conducted during the pandemic, AIANs reported levels of medical mistrust comparable to those of Blacks [24]. The results in this paper underscore that in the discussion of the pandemic’s effect on minority communities, AIANs should also be at the forefront. They have borne a disproportionate share of the mortality burden and, given that their higher mortality risk does not appear to be mediated to a great extent by socioeconomic conditions, a targeted policy response, grounded in scientific investigation and evidence, would be necessary to prevent similarly devastating outcomes for the community in the event of a future pandemic [45].

Even so, for both Black and AIAN communities, the overriding implication of the results in this study is that, in order for a policy response to effectively address the observed disparities, it needs a multi-pronged approach and a long-term commitment. It must confront many of the broader structural inequities in the social and economic arenas, including in education, employment, housing and healthcare, as well as disparities in wealth and power that have resulted in entrenched inequalities in health outcomes. But it should also be attuned to the specific cultural and historical sensitivities that have caused these communities to access and utilize healthcare at sub-optimal rates [27, 44, 45]. As such, the policy response requires careful investigation, planning, building societal awareness and consensus and, inevitably, significant mobilization of resources. Only through such a committed and deliberate approach would we be able, as a society, to eliminate racial and ethnic health inequities in the long run.

Certain limitations and caveats apply to the analyses and results in this study. First, given that county-level data and an ecological regression framework are used, direct inferences cannot be drawn about the association between race/ethnicity and COVID-19 outcomes at the individual level. As noted in the introduction, analyses of the kind conducted in this study are ideally performed with person-level data on COVID-19 outcomes, race/ethnicity and other requisite covariates, and with the necessary adjustments for age. But such data are not reported at scale in the U.S. Second, county death data may suffer from some measurement error due to, among other things, inconsistencies in definitions and reporting standards. Third, given the inherently complex nature of the issue at hand, the estimated race/ethnicity coefficients measure simple, reduced-form associations, not causal effects (in the statistical sense). Typically, the issue is further complicated by questions surrounding covariate choice, model selection and other potential sources of confounding. For instance, to the extent that the prevalence and impact of comorbidities are not adequately captured by basic SEF and environmental variables, inclusion of community disease proxies as additional covariates would be warranted. Such measures are available [19] and their inclusion could perhaps further mediate the race/ethnicity effect by capturing direct effects on mortality. Similarly, in future work, a more flexible regression model can be employed to investigate potential nonlinearities in the race/ethnicity effect, including interaction effects between the various racial/ethnic compositions.

Despite these limitations, the comprehensive nature of the study (in its analysis of the race/ethnicity effect for all minority groups and over time, its flexible measurement of county racial/ethnic composition, and its internal consistency in measurement and estimation approaches) should make for a valuable contribution and an instructive resource for policymaking.