Racial and ethnic disparities in the observed COVID-19 case fatality rate among the U.S. population

Purpose: During the initial 12 months of the pandemic, racial and ethnic disparities in COVID-19 death rates received considerable attention, but it has been unclear whether disparities in death rates were due to disparities in case fatality rates (CFRs), incidence rates, or both. We examined differences in observed COVID-19 CFRs between U.S. White, Black/African American, and Latinx individuals during this period.
Methods: Using data from the COVID Tracking Project and the Centers for Disease Control and Prevention COVID-19 Case Surveillance Public Use dataset, we calculated CFR ratios comparing Black and Latinx to White individuals, both overall and separately by age group. We also used a model of monthly COVID-19 deaths to estimate CFR ratios, adjusting for age, gender, and differences across states and time.
Results: Overall, Black and Latinx individuals had lower CFRs than their White counterparts. However, when adjusting for age, Black and Latinx individuals had higher CFRs than White individuals among those younger than 65. CFRs varied substantially across states and time.
Conclusions: Disparities in COVID-19 case fatality among U.S. Black and Latinx individuals under age 65 were evident during the first year of the pandemic. Understanding racial and ethnic differences in COVID-19 CFRs is challenging due to limitations in available data.


Introduction
In the U.S., racial and ethnic minority groups, including Black/African American and Latinx individuals, have experienced higher rates of COVID-19 infection and mortality than their White counterparts, as well as higher rates of excess mortality associated with COVID-19 [1][2][3]. Assessing differences in outcomes among those infected is challenging at the national level due to limitations in available data. Studies among subpopulations have often found little or no evidence of racial and ethnic disparities in case fatality. For example, using data from the Michigan Disease Surveillance System from March 8 through July 5, 2020, Zelner et al. found that Black individuals had substantially higher rates of COVID-19 infection and mortality. However, their case fatality rate was only modestly higher than that of White individuals, suggesting that differences in COVID-19 mortality were driven by differences in the rate of infection [4].
Several studies have examined racial and ethnic differences in hospitalization and/or case fatality rates within health care systems, where access to electronic health records permits adjusting for demographic and clinical covariates. Among individuals receiving care in the U.S. Department of Veterans Affairs who were tested for COVID-19 between February 8 and July 22, 2020, Rentsch et al. found that while Black and Hispanic patients were more likely to test positive than White patients, once infected their 30-day mortality was lower and, after adjusting for age, sex, rural/urban residence and comorbidities, was equivalent to that of Whites [5]. Similarly, Price-Haywood et al. studied COVID-19 positive patients within Ochsner Health in Louisiana and found that although Black patients had roughly twice the odds of hospitalization as White patients, their risk of in-hospital death was similar [6]. Finally, Ogedegbe et al. examined patients within New York University's Langone Health system tested for COVID-19 between March 1 and April 8, 2020 [7]. The odds of hospitalization among those positive were similar for Black, Hispanic, and White patients, while among those hospitalized, Black and Hispanic patients were less likely to die even after adjusting for demographics, comorbidities and insurance.
Griffith et al. discuss the potential of collider bias to distort understanding of COVID-19 disease risk and severity [8]. One potential source of collider bias is restricting analyses to a nonrepresentative subset of the population, such as those hospitalized for COVID-19. For example, if Black/African American individuals were more likely to seek and/or receive care later in the acute setting rather than earlier in primary or other non-urgent care settings (perhaps due to lack of access, biases on the part of providers or prior negative experiences with the health system), this may partly account for the higher rates of admission observed in some studies [9, 10]. If true, focusing on in-hospital mortality could reduce the apparent difference in case fatality between Black and White individuals due to a larger proportion of Black individuals with COVID-19 being hospitalized.
The purpose of this study was to examine racial and ethnic differences in COVID-19 case fatality rates (CFRs) at the U.S. national level during the first year of the pandemic. We used data from The Atlantic's COVID Tracking Project (CTP), whose Racial Data Tracker was widely regarded as the most complete source of information on race and ethnicity of COVID-19 cases and deaths during this period. We performed a parallel analysis using the Centers for Disease Control and Prevention (CDC) COVID-19 Case Surveillance Public Use data, an independently compiled and regularly updated individual-level data source that contains information on age. Adjusting for age is critical to obtaining an accurate understanding of differences in COVID-19 CFR [11]. We focused on Black/African American and Latinx individuals and their comparison to Whites, since these categories are reported by a large number of states and permit approximate comparability between the two datasets.

Data
The COVID tracking project (CTP)
The CTP was a volunteer-run effort started in March 2020 to compile nationwide data on the COVID-19 pandemic and ran until March 2021. During its operation, the CTP was considered an authoritative source of COVID-19 data, and its data have been used in over 1000 academic articles. CTP datasets remain available on its website and may be used under the Creative Commons CC BY 4.0 license [12].
The COVID Racial Data Tracker, part of the CTP, collected state-level data on cases and deaths separately by race and ethnicity for the purpose of examining the disproportionate impact of the pandemic on minority communities. The dataset contains twice-weekly counts of the cumulative number of cases and deaths for each state for which information was available. Separate counts by racial group (White, Black, Latinx, Asian, American Indian/Alaskan Native, Native Hawaiian/Other Pacific Islander, Multiple, or Other) and ethnicity (Hispanic or Non-Hispanic) are also provided. Because information on the joint distribution of race and ethnicity was not provided, we utilized information on racial group only (in many cases, the counts for Latinx were the same as for Hispanic). Information on age is not available.
Our analyses of CTP data used cumulative counts as of February 28, 2021, the end of the last full month for which data were provided. This allowed us to construct a corresponding dataset from the CDC data, which are recorded on a monthly basis.

CDC COVID-19 case surveillance public use data
This dataset contains individual-level data on all cases reported to the CDC [13]. The public use dataset contains 19 data elements for each case, including the state and month in which the case was reported, whether the individual died as a result of COVID-19, and individual-level characteristics such as gender (Male or Female), age group (0-17, 18-49, 50-64, and 65+), race (White, Black, Asian, American Indian/Alaskan Native, Native Hawaiian/Other Pacific Islander, Multiple/Other) and ethnicity (Hispanic or Non-Hispanic). We define the three groups for our analysis as follows: White non-Hispanic, Black, and Hispanic non-Black. For comparability with the CTP dataset, we use all cases reported through February 2021.

Statistical analysis
CFRs were computed by dividing the total number of deaths by the total number of cases reported as of February 28, 2021. Monthly rates for the CTP data were computed by dividing the total number of new deaths reported in that month by the total number of new cases. While some deaths reported in a month corresponded to cases reported in previous months, sensitivity analyses including cases from the 2-3 previous months yielded similar results. Monthly rates for the CDC data were computed as the proportion of cases reported in that month that resulted in death; this is therefore a leading indicator, since some of those deaths occurred in subsequent months. The following states or territories were excluded from all analyses of the CDC data: nine states or territories (AK, DE, HI, MO, NE, SD, TX, VI, WV) reported no deaths, with survival status missing for all or nearly all cases; three states reported either only deaths (WA), nearly all deaths (IL) or half as many deaths as cases (RI), also with survival status missing for a large fraction of cases; and one state (GA) and Puerto Rico were outliers in a log-log plot of deaths versus all cases (including those with missing survival status). These exclusions reduced our analyses of CDC data to 38 states.
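The two rate definitions above can be illustrated with a minimal Python sketch. The function names and the cumulative counts below are hypothetical and chosen only for illustration; they are not taken from the CTP or CDC data, and this is not the code used in the study.

```python
def cfr(deaths, cases):
    """Observed case fatality rate: reported deaths divided by reported cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


def monthly_cfr_from_cumulative(cumulative):
    """Monthly rates from cumulative counts (CTP-style data).

    `cumulative` is a list of (month, cumulative_cases, cumulative_deaths)
    tuples sorted by month. Each monthly rate is new deaths reported in the
    month divided by new cases reported in the same month.
    """
    rates = {}
    prev_cases, prev_deaths = 0, 0
    for month, cum_cases, cum_deaths in cumulative:
        new_cases = cum_cases - prev_cases
        new_deaths = cum_deaths - prev_deaths
        if new_cases > 0:  # skip months with no new cases
            rates[month] = new_deaths / new_cases
        prev_cases, prev_deaths = cum_cases, cum_deaths
    return rates


# Hypothetical cumulative counts for one state (illustrative only).
series = [
    ("2020-04", 1000, 50),
    ("2020-05", 3000, 110),
    ("2020-06", 6000, 140),
]

overall = cfr(series[-1][2], series[-1][1])    # cumulative deaths / cumulative cases
monthly = monthly_cfr_from_cumulative(series)  # new deaths / new cases per month

print(f"overall CFR: {overall:.4f}")
for month, rate in monthly.items():
    print(month, f"{rate:.4f}")
```

Because deaths lag cases, a monthly rate built this way mixes deaths from earlier cases with new cases from the current month, which is why the text reports a sensitivity analysis using cases from previous months.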
Since the distribution of racial and ethnic groups differs across states, state-specific differences in the CFR may bias comparisons between groups. Thus, we calculated CFR ratios comparing Black and Latinx to White individuals separately by state and estimated an overall CFR ratio using a random effects model:
$$\hat{\theta}_j = \theta + u_j + \varepsilon_j$$
where $\hat{\theta}_j$ is the natural log of the CFR ratio for state $j$, the $u_j \sim N(0, \tau^2)$ represent between-state differences in the log CFR ratio, and the $\varepsilon_j \sim N(0, \hat{\sigma}_j^2)$ represent sampling variability. The between-state variance ($\tau^2$) was estimated using the DerSimonian-Laird method [14], and an estimate of the overall log CFR ratio ($\theta$) was obtained as a weighted average of the $\hat{\theta}_j$; this estimate was then exponentiated to obtain a CFR ratio, together with the endpoints of the corresponding 95% Wald confidence interval. The $I^2$ measure of heterogeneity (representing the percentage of variability in the $\hat{\theta}_j$ due to between-state differences) is reported [15].
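The random-effects pooling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the state counts are hypothetical, and the sampling variance of each log CFR ratio is approximated with the usual delta-method formula for a log risk ratio, which the text does not specify.

```python
import math


def log_cfr_ratio(d1, c1, d0, c0):
    """Log CFR ratio (group 1 vs. group 0) and its approximate variance.

    Assumption: delta-method variance of a log risk ratio,
    1/d1 - 1/c1 + 1/d0 - 1/c0 (not stated in the text).
    """
    theta = math.log((d1 / c1) / (d0 / c0))
    var = 1 / d1 - 1 / c1 + 1 / d0 - 1 / c0
    return theta, var


def dersimonian_laird(thetas, variances):
    """Pool state-specific log ratios with a DerSimonian-Laird estimate of tau^2."""
    k = len(thetas)
    w = [1 / v for v in variances]                    # fixed-effect weights
    sw = sum(w)
    theta_fixed = sum(wi * t for wi, t in zip(w, thetas)) / sw
    q = sum(wi * (t - theta_fixed) ** 2 for wi, t in zip(w, thetas))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                # between-state variance
    w_star = [1 / (v + tau2) for v in variances]      # random-effects weights
    sws = sum(w_star)
    theta = sum(wi * t for wi, t in zip(w_star, thetas)) / sws
    se = math.sqrt(1 / sws)
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return {
        "ratio": math.exp(theta),
        "ci_low": math.exp(theta - 1.96 * se),        # 95% Wald interval
        "ci_high": math.exp(theta + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical (deaths, cases) per state: minority group, then White group.
states = [
    (40, 2000, 90, 3000),
    (15, 1500, 60, 4000),
    (70, 2500, 50, 2500),
    (25, 1000, 120, 5000),
]
pairs = [log_cfr_ratio(*s) for s in states]
result = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
print(result)
```

Pooling on the log scale and exponentiating at the end keeps the confidence interval asymmetric around the ratio, as is conventional for ratio measures.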
To estimate CFR ratios adjusting for differences in the CFR across time, gender, age group and state, we used the CDC data to compute the number of cases ($c_{ijk}$) and deaths ($d_{ijk}$) for each demographic subgroup $i$ (based on the Cartesian product of age group, gender and racial and ethnic group), state $j$, and month $k$, and fit the following mixed-effects Poisson model to the data for White, Black, and Latinx individuals age 18 and older [16]:
$$d_{ijk} \sim \mathrm{Poisson}(\mu_{ijk}), \qquad \log \mu_{ijk} = \log c_{ijk} + X_i^\top \beta + f(k; \lambda) + v_j + w_{jk}$$
where $\mu_{ijk}$ is the expected number of deaths, the log number of cases $\log c_{ijk}$ enters as an offset, $X_i$ is a vector of discrete covariates describing the $i$th group including age group, gender, racial and ethnic group and an interaction between age group and racial and ethnic group, and $f(k; \lambda)$ is a restricted cubic spline function of $k$ depending on coefficients $\lambda$ that captures the change over time (we used five knots based on Harrell's recommended percentiles) [17]. The term $v_j$ is a random effect capturing differences between states, and the term $w_{jk}$ is a random effect, nested within state, capturing remaining differences across time within state. Variance estimates were obtained using the clustered form of the robust (sandwich) variance estimator with clustering at the state level [18]. Estimated coefficients ($\beta$) were exponentiated to obtain CFR ratios, together with the endpoints of their corresponding 95% Wald confidence intervals. Estimates of the $v_j$ were obtained using empirical Bayes means and plotted on a map.

Results
The CTP dataset includes 28.4 million cases and 512,627 deaths as of February 28, 2021, corresponding to an overall CFR of 1.8% (Table 1). The CDC dataset contains 22.1 million cases over the same period; however, survival status is missing for nearly half (10.4 million). Among the other 11.7 million cases, there were 263,398 deaths, corresponding to an overall CFR of 2.3%. Removing those states and territories with problematic data (see Methods) left 235,635 deaths, with an overall CFR of 2.0%. Both datasets show a similar pattern in the CFR over time, with the CDC curves running ahead of the CTP curves by approximately 1-2 months, as expected (Fig. 1). Based on the CDC data, the CFR decreased from the start of the pandemic through September/October 2020. This decrease was followed by an increase and a subsequent decline, consistent with the estimate of $f(\cdot)$ from the mixed-effects model when fit to the CDC data below (Fig. 2).
The CFR was substantially higher for those 65 and older (13.5%) as compared to those 50-64 (0.8%) and 18-49 (0.1%) (Table 1). No [...] (Fig. 1). In the CTP dataset, CFRs for the three racial and ethnic groups were similar from July 2020 through November 2020, though Black and Latinx individuals still had lower CFRs than Whites before and after this period. [...], respectively. The proportion of cases under age 65 was higher among Black (89%) and Latinx (93%) individuals than among Whites (80%). These differences were even larger for the proportion of cases under age 50 (69% for Black and 78% for Latinx individuals as compared to 58% for Whites). When stratifying by age group, the state-specific CFRs (for states reporting any deaths for a particular age by racial and ethnic group) among those aged 18-49 and 50-64 were higher, on average, among Black and Latinx individuals than among Whites (Fig. 3). The CFR ratios comparing Black to White individuals were 3.9 for ages 18-49, 2.1 for ages 50-64, and 0.9 for ages 65 and older; corresponding CFR ratios comparing Latinx to White individuals were 9.8, 3.0, and 0.9, respectively (Table 2, Panel A).
Results from the mixed-effects model fit to deaths reported in the CDC data are shown in Table 2. We observed considerable variation between states for both datasets, both in the CFRs and in the CFR ratios comparing racial and ethnic groups (Fig. 3). The value of $I^2$ was over 95% in all cases.
The estimated variance of the $v_j$ was 0.31, corresponding to an increase in the CFR of 75% for a state one standard deviation above the mean. Similarly, the estimated variance of the $w_{jk}$ was 0.26, corresponding to a 67% increase in the CFR for a month one standard deviation above the mean; this variability within state over time is above and beyond that already accounted for by the estimated overall time trend $\hat{f}(\cdot)$ (Fig. 2). Fig. 4 plots estimates of the $v_j$ for all 38 states included in the model.
Table 2. Case fatality rate (CFR) ratios comparing minority racial and ethnic groups to Whites, estimated from state-specific log CFR ratios using random-effects models (CTP and CDC) and based on a mixed-effects Poisson regression model fit to individual-level data.
Fig. 3. State-specific CFRs, separately by age group and racial and ethnic group (CDC data). CFRs equal to 0 due to missing data are excluded.
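The percentage increases quoted for the state and month random effects follow from the log link: a random effect one standard deviation above its mean multiplies the CFR by the exponential of that standard deviation. The short Python sketch below reproduces this arithmetic from the reported variances; the function name is ours, and small differences from the published figures are expected because the variances are reported to two decimals.

```python
import math


def pct_increase_one_sd(variance):
    """Percent increase in the CFR for a unit one SD above the mean
    on the log scale: 100 * (exp(sqrt(variance)) - 1)."""
    return 100 * (math.exp(math.sqrt(variance)) - 1)


state_effect = pct_increase_one_sd(0.31)  # variance of the state random effect
month_effect = pct_increase_one_sd(0.26)  # variance of the month-within-state effect

print(f"state: {state_effect:.1f}%  month within state: {month_effect:.1f}%")
```

Both values land within a percentage point of the 75% and 67% given in the text.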

Discussion
While the overall CFR was lower among Black and Latinx individuals than among their White counterparts, their CFRs within age category were higher among those under age 65. This reversal in the direction of the association when stratifying by subgroup is a partial example of what is known as Simpson's Paradox [20], and reinforces the point that CFRs that are not age-specific "may hide more than they reveal." This observation is in contrast to the CDC's COVID-19 mortality data from the same period, which reveal higher COVID-19 mortality rates among Black and Latinx individuals than among Whites both overall and separately by age group [21]. We also found evidence of an interaction between age and racial and ethnic group such that Black and Latinx disparities in COVID-19 CFR increased at each successively younger age group. A similar interaction was reported by Bassett et al. [22] for COVID-19 mortality rates from February 1 through May 20, 2020. This finding deserves further study as it could have substantial implications for health policy, especially for a disease where public attention concerning mortality, at least during the early stages of the pandemic, was focused primarily on the oldest age groups.
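The reversal described above arises when the groups being compared have different age distributions. The Python sketch below uses hypothetical counts, chosen only to mimic the qualitative pattern (a younger case mix in the minority group), to show how a crude CFR ratio below 1 can coexist with an age-specific ratio above 1 among those under 65. The numbers are not taken from the CTP or CDC data.

```python
# Hypothetical (cases, deaths) by age group; illustrative only.
white = {"under_65": (800, 8), "65_plus": (200, 30)}
minority = {"under_65": (900, 18), "65_plus": (100, 14)}


def crude_cfr(group):
    """CFR ignoring age: total deaths divided by total cases."""
    cases = sum(c for c, _ in group.values())
    deaths = sum(d for _, d in group.values())
    return deaths / cases


def stratum_cfr(group, age):
    """Age-specific CFR for one age group."""
    cases, deaths = group[age]
    return deaths / cases


crude_ratio = crude_cfr(minority) / crude_cfr(white)
age_ratios = {
    age: stratum_cfr(minority, age) / stratum_cfr(white, age)
    for age in white
}

print(f"crude CFR ratio: {crude_ratio:.2f}")
for age, ratio in age_ratios.items():
    print(f"{age} CFR ratio: {ratio:.2f}")
```

In this example the minority group has a higher CFR under age 65, yet its crude CFR is lower because 90% of its cases fall in the low-fatality younger stratum, compared with 80% for the White group.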
Our results reflect differences in the observed CFR; that is, the number of deaths divided by the number of reported cases. For COVID-19, this differs substantially from the true, underlying CFR since many infected individuals, especially those who were asymptomatic, were never tested. Moreover, some individuals who tested positive may not appear in the datasets used here. Thus, we expect that much of the change over time, as well as of the substantial variation across states, reflects variation in the rates of testing and reporting. Several studies have found higher testing rates among Black and Hispanic individuals than among Whites, including Rentsch et al. (though the VA is a special population with lower healthcare barriers) and a study of roughly 50 million patients in the Epic health record system [5, 23-25]. While there is no way to know how many of these individuals appear in the CTP and CDC datasets analyzed here, higher testing rates among Black and Hispanic individuals would be expected, ceteris paribus, to reduce their CFR relative to Whites. Furthermore, by adjusting for differences across time and between states, our results should be robust to confounding due to differences in the racial mix of cases over time or in the distribution of minorities across states.
The CDC dataset contains only 78% of the cases in the CTP dataset, and of these, information on survival is missing for 47% of cases. In addition, among the cases with survival status for the 38 states included in the analysis, race and ethnicity were missing for 41%. Douglas et al. reviewed reporting of race and ethnicity of COVID-19 cases and deaths from April 12, 2020 through November 9, 2020 and concluded that while there were improvements during this period, significant problems with data quality persisted [26]. Such problems might be expected to introduce both bias and additional variability into our results; indeed, potential variability in reporting of race and ethnicity across states is one reason we conducted a state-level analysis. Despite these limitations, our results are consistent with prior literature showing improvements in COVID-19 survival over the first 6-8 months of the pandemic and a lower risk of death for women relative to men [27][28][29].
Finally, our study was also intended to highlight the limitations in data available on race and ethnicity of COVID-19 cases and deaths at the national level. Specifically, using information on the joint distribution of racial and ethnic group and age (available in the CDC dataset only) reveals an entirely different understanding than when not using it, and acquiring such data from representative samples of the national population is sorely needed. In addition, Chowkwanyun and Reed argue convincingly that while identifying racial and ethnic disparities in COVID-19 is important, a lack of context as provided by data on socioeconomic status (SES) and place-based risk, together with appropriate consideration of the role of stress due in part to racial discrimination, "can perpetuate harmful myths and misunderstandings that actually undermine the goal of eliminating health inequities" [30].