Assessing the association between area deprivation index and COVID-19 prevalence: a contrast between rural and urban U.S. jurisdictions

Background: The COVID-19 pandemic has affected communities unequally, with poorer and minority populations being more adversely affected. Prior rural health research suggests such disparities may be exacerbated during the pandemic and in remote parts of the U.S.
Objectives: To understand the spread and impact of COVID-19 across the U.S., county-level data on confirmed COVID-19 cases were examined by Area Deprivation Index (ADI) and by the Metropolitan vs. Nonmetropolitan designations of the National Center for Health Statistics (NCHS). These designations were the basis for comparisons between urban and rural jurisdictions.
Method: Kendall's tau-b was used to compare effect sizes between jurisdictions on selected ADI components and well-researched social determinants of health (SDH). Spearman correlation coefficients and stratified Poisson models were used to explore the association between ADI and COVID-19 prevalence in the context of county designation.
Results: The relationship between area deprivation and COVID-19 prevalence was positive and stronger for rural counties than for urban ones. Family income, property value and educational attainment were among the ADI components most correlated with prevalence, but these associations also differed by county type.
Conclusions: Although most Americans live in metropolitan areas, rural communities showed a stronger relationship between deprivation and COVID-19 prevalence. Models predicting COVID-19 prevalence by ADI and county type reinforced this observation and may inform health policy decisions.


Introduction
The 2019-2021 coronavirus pandemic has underscored many public health disparities in the United States. Minority communities and people living in poverty account for disproportionately more COVID-19 cases and fatalities [1,2]. These communities may also be more vulnerable due to underlying health conditions, poverty and lack of access to care [3][4][5]. Comparatively less attention has been given to the spread of COVID-19 in rural communities, even though recent evidence suggests rapid spread in rural areas [6].
Although rural areas account for only a fraction of total COVID-19 cases in the U.S., their greater prevalence of chronic disease and their remoteness are cause for concern [7,8]. Rural communities are more vulnerable to economic hardship and have poorer healthcare access, health literacy and health outcomes [9][10][11][12]. By extension, we may expect worse outcomes for more impoverished rural jurisdictions during the pandemic [11,12].
Past health disparities research has established a relationship between poor health outcomes and low socioeconomic status, often measured as a ranked geographic area deprivation index (ADI) [13,14]. Few researchers have used ADI to evaluate COVID-19 prevalence across U.S. geographies, but early evidence suggests a generally positive relationship between deprivation and prevalence [15,16]. The ADI also permits inspection of its individual components to better understand nuanced or subtle population effects of social determinants of health (SDH) at the county level [17]. Other indices, such as the social vulnerability index (SVI), may not be as readily amenable to county-level geography [18]. Effective disease management and policy must account for these contrasts and public health needs to combat the spread of COVID-19 [19].
ADI is well suited to this purpose because it is publicly available and identifies communities at risk of poor health outcomes (e.g., mortality, hospitalization, emergency care). Such an index could both inform and help validate effective policy. In this analysis, ADI is used as a predictor of COVID-19 prevalence that permits contrasts between diverse communities. We hypothesize that ADI and its components are predictive of COVID-19 prevalence and that this association differs, at least partially, by county type.

Data sources
Estimates of COVID-19 cases were obtained from the JHU CSSE Coronavirus tracking project [20,21]. This data repository contains county-level time series of confirmed cases reported to the U.S. Centers for Disease Control and Prevention (CDC) dating back to January 22, 2020, and is commonly used by population health researchers to model COVID-19 spread [22][23][24]. We selected cumulative COVID-19 case estimates as of August 20, 2020 for analysis. These were the latest data retrieved before a resurgence in cases through winter 2021, which may represent the start of a distinct, seasonal phase of the ongoing pandemic. County population by race/ethnicity and gender was based on 2019 estimates derived from the 2010 U.S. Census [25,26]. Case prevalence was calculated as the number of confirmed cases per 100,000 persons in each county. County data were linked across sources using their unique Federal Information Processing Standards (FIPS) geocodes.
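The prevalence calculation and FIPS linkage described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: it assumes each source has already been reduced to a mapping from county FIPS code to a count, and that FIPS codes may arrive either as zero-padded strings or as numbers with the leading zero dropped. The function names and toy counts are hypothetical.

```python
from typing import Dict, Union

FipsKey = Union[str, int, float]


def normalize_fips(code: FipsKey) -> str:
    """Return a 5-digit county FIPS string, restoring any dropped leading zeros."""
    return f"{int(float(code)):05d}"


def prevalence_per_100k(cases: int, population: int) -> float:
    """Confirmed cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding on round inputs.
    return cases * 100_000 / population


def link_prevalence(
    cases_by_fips: Dict[FipsKey, int],
    population_by_fips: Dict[FipsKey, int],
) -> Dict[str, float]:
    """Join two county tables on normalized FIPS and compute prevalence.

    Counties missing from either table are dropped (an inner join).
    """
    cases = {normalize_fips(k): v for k, v in cases_by_fips.items()}
    pops = {normalize_fips(k): v for k, v in population_by_fips.items()}
    return {
        fips: prevalence_per_100k(cases[fips], pops[fips])
        for fips in sorted(cases.keys() & pops.keys())
    }


# Hypothetical toy inputs: the same counties keyed inconsistently across sources.
toy_cases = {1001.0: 500, "06037": 20_000, "99999": 7}
toy_population = {"01001": 50_000, 6037: 1_000_000}
print(link_prevalence(toy_cases, toy_population))
# {'01001': 1000.0, '06037': 2000.0}
```

An inner join is used so that counties missing from either source drop out rather than producing spurious zero or undefined prevalence; in practice, the dropped codes are worth listing and inspecting.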

Ethical issues
COVID-19 prevalence and population characteristics are made publicly available by the US CDC and US Census Bureau, respectively. No personally identifiable or protected health information was included in this research, and no attempt was made to link cases to identifying information or protected health records. This analysis was therefore exempt from institutional review and approval.

Area Deprivation Index
We constructed county-level ADI by weighting 17 measures of poverty, income and education that are widely used in the population health literature [13,27,28]. The 2018 American Community Survey (ACS) 5-year estimates were used to calculate ADI and each of its component measures, following the approach described by Singh et al. [13,26,28]. A higher raw ADI corresponds to more deprivation and therefore lower socioeconomic status (SES). A high national percentile rank likewise corresponds to a high raw ADI and more deprivation. We used national-rank ADI to model COVID-19 prevalence.
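The two steps above, a weighted raw score followed by a national percentile rank, can be sketched as follows. This is a hedged illustration only: the three indicators and their weights are hypothetical placeholders standing in for the 17 published components and weights of Singh et al. [13]. Z-scoring each indicator before weighting is an assumption made here so that the placeholder weights are on a comparable scale; it is not necessarily part of the published procedure. The percentile convention (share of counties at or below a given score, rounded up to an integer from 1 to 100) is likewise an assumption.

```python
import math
from typing import Dict, List, Mapping, Sequence

# HYPOTHETICAL placeholder weights for 3 of the 17 indicators. These are NOT
# the published Singh et al. weights; signs only encode direction
# (positive = more deprivation, so higher income lowers the score).
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "pct_less_than_9th_grade": 0.5,
    "pct_below_150_poverty": 0.5,
    "median_family_income": -0.5,
}


def zscores(values: Sequence[float]) -> List[float]:
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        raise ValueError("indicator has no variation across counties")
    return [(v - mean) / sd for v in values]


def raw_adi(counties: Sequence[Mapping[str, float]],
            weights: Mapping[str, float]) -> List[float]:
    """Weighted sum of standardized indicators; higher means more deprived."""
    z = {name: zscores([c[name] for c in counties]) for name in weights}
    return [sum(w * z[name][i] for name, w in weights.items())
            for i in range(len(counties))]


def national_percentile_rank(scores: Sequence[float]) -> List[int]:
    """Rank 1-100: percent of counties scoring at or below this one, rounded up."""
    n = len(scores)
    return [math.ceil(100 * sum(1 for t in scores if t <= s) / n) for s in scores]


# Hypothetical toy counties, ordered from least to most deprived.
toy_counties = [
    {"pct_less_than_9th_grade": 2, "pct_below_150_poverty": 10, "median_family_income": 90_000},
    {"pct_less_than_9th_grade": 5, "pct_below_150_poverty": 20, "median_family_income": 60_000},
    {"pct_less_than_9th_grade": 8, "pct_below_150_poverty": 30, "median_family_income": 30_000},
]
scores = raw_adi(toy_counties, EXAMPLE_WEIGHTS)
print(national_percentile_rank(scores))  # [34, 67, 100]
```

With all 3,142 counties in the ranking, each percentile bin holds roughly 31 counties, so a one-unit change in national rank is a small step along the deprivation distribution.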

Urban vs. rural designation
We classified 3,142 U.S. counties as "urban" or "rural" and stratified the relationship between prevalence and ADI accordingly. This required a classification scheme developed for county-level geography. The National Center for Health Statistics (NCHS) developed such a scheme in 2001 to support accurate assessment and measurement of health differences between urban and rural residents [29][30]. The 2013 NCHS Urban-Rural Classification Scheme defines Metropolitan Statistical Areas (MSAs) as areas of at least 50,000 residents with an urban nucleus of at least 1,000 persons per square mile. Urban counties either contain an urbanized core or are surrounding counties, included in the MSA, with at least 500 people per square mile. Nonmetropolitan counties (hereafter "rural") are micropolitan or noncore geographies with fewer than 50,000 residents.
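The urban/rural split can be expressed as a collapse of the six-level 2013 NCHS scheme, in which codes 1 to 4 are metropolitan and codes 5 (micropolitan) and 6 (noncore) are nonmetropolitan. The sketch assumes each county's NCHS code is already available; the function names and county identifiers in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Levels of the 2013 NCHS Urban-Rural Classification Scheme (1 = most urban).
NCHS_2013_LEVELS: Dict[int, str] = {
    1: "Large central metro",
    2: "Large fringe metro",
    3: "Medium metro",
    4: "Small metro",
    5: "Micropolitan",
    6: "Noncore",
}


def county_type(nchs_code: int) -> str:
    """Collapse an NCHS 2013 code to the binary urban/rural split."""
    if nchs_code not in NCHS_2013_LEVELS:
        raise ValueError(f"unknown NCHS 2013 code: {nchs_code}")
    # Codes 1-4 are metropolitan ("urban"); 5-6 are nonmetropolitan ("rural").
    return "urban" if nchs_code <= 4 else "rural"


def stratify(counties: Iterable[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Group county identifiers by county type, given (id, nchs_code) pairs."""
    groups: Dict[str, List[str]] = {"urban": [], "rural": []}
    for county_id, code in counties:
        groups[county_type(code)].append(county_id)
    return groups


# Hypothetical counties and codes, for illustration only.
print(stratify([("county_a", 1), ("county_b", 4), ("county_c", 5), ("county_d", 6)]))
# {'urban': ['county_a', 'county_b'], 'rural': ['county_c', 'county_d']}
```

Keeping the six-level code alongside the binary label makes it straightforward to check later whether results are driven by a particular level (for example, large central metros) rather than by the urban/rural contrast as a whole.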

Statistical tests
Descriptive statistics for population, population density, ADI, ADI components, Census variables and COVID-19 case and mortality figures were tabulated by county type. Effect sizes for each urban-rural comparison were estimated using Kendall's tau-b and considered statistically significant at p < 0.001. Additional county-level SDH variables included percent male, percent non-Caucasian minority and percent aged 65 years or older. Only a subset of SDH variables is presented, to avoid redundancy with ADI measures while still illustrating resident demographics and the domains of the ADI.
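Kendall's tau-b adjusts the ordinary tau for tied pairs, which matters here because county type is binary and therefore produces many ties. With n_c concordant and n_d discordant pairs, n_0 = n(n − 1)/2 total pairs, and n_1 and n_2 pairs tied on the first and second variable, tau-b = (n_c − n_d) / sqrt((n_0 − n_1)(n_0 − n_2)). A minimal O(n²) sketch follows, assuming county type is coded 0 (urban) and 1 (rural); the example values are hypothetical. For real analyses, scipy.stats.kendalltau computes tau-b by default and also returns a p-value.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b via an O(n^2) scan over all pairs, with tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return (concordant - discordant) / denom


# Hypothetical example: county type (0 = urban, 1 = rural) vs. percent aged 65+.
is_rural = [0, 0, 0, 1, 1, 1]
pct_age_65_plus = [12.0, 14.5, 15.0, 16.0, 18.5, 14.5]
print(round(kendall_tau_b(is_rural, pct_age_65_plus), 3))  # 0.535
```

A positive value here means rural counties tend to have higher values of the measure; the sign flips if the coding of county type is reversed.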
Spearman rank correlation coefficients were calculated among ADI national rank, ADI components and prevalence estimates, separately for each county type. These correlations were summarized as correlation matrices for inspection. All underlying rho coefficients and p-values were calculated, but only a subset is presented in the results.
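Spearman's rho is the Pearson correlation of ranks, with tied values assigned the average of the ranks they span. The sketch below computes it separately within each county type; the function names, ADI ranks, prevalence values and labels are hypothetical toy data. scipy.stats.spearmanr would be the usual library choice and additionally reports p-values.

```python
import math
from typing import Dict, Hashable, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when either variable is constant")
    return cov / math.sqrt(vx * vy)


def spearman_by_group(x: Sequence[float], y: Sequence[float],
                      groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Compute rho separately within each group label (e.g. 'urban', 'rural')."""
    out: Dict[Hashable, float] = {}
    for g in dict.fromkeys(groups):  # unique labels in first-seen order
        idx = [i for i, label in enumerate(groups) if label == g]
        out[g] = spearman_rho([x[i] for i in idx], [y[i] for i in idx])
    return out


toy_adi = [10, 40, 70, 20, 50, 90]
toy_prev = [500, 900, 800, 300, 600, 1500]
toy_type = ["urban"] * 3 + ["rural"] * 3
print(spearman_by_group(toy_adi, toy_prev, toy_type))  # {'urban': 0.5, 'rural': 1.0}
```

Stratifying before ranking matters: ranks are recomputed within each county type, so the urban and rural coefficients are not influenced by the other group's distribution.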
Finally, five Poisson models with logarithmic link functions were fitted to explore the effect of county type (i.e., urban vs. rural) on the relationship between ADI and COVID-19 prevalence. A base comparison model (Model 1) related national-rank ADI to cumulative COVID-19 prevalence across all counties. Two pairs of test models reflect stratification by county type: Models 2 and 3 fitted national-rank ADI to COVID-19 prevalence for urban and rural jurisdictions, respectively, and Models 4 and 5 fitted the constituent variables of ADI to COVID-19 prevalence for urban and rural jurisdictions, respectively. For each model, we compared relative residual deviance and McFadden's R², used as an analogue of the OLS R² for deviance explained [31,32]. This permitted comparison of each component model (Models 4 and 5) with its corresponding ADI model (Models 2 and 3). Inspecting model effect sizes allowed us to interpret which features of ADI contributed most to differential performance by county type; these were summarized in a variable importance plot ranking the absolute t-values of the inputs to Models 4 and 5.

Descriptive statistics by county type
Table 1 summarizes common SDH by county type, including household income (in USD), percent of families below poverty, percent of households without vehicles and percent of households with more than one person per bedroom. Rural counties had significantly worse socioeconomic indicators, including lower median family income (mean = $59,097) and a higher percentage of residents under 150% of the poverty level (mean = 28%). Rural counties also had a higher percentage of male residents (mean = 50.4%), fewer non-Caucasian residents (mean = 15.4%) and more residents aged 65 or older (mean = 17.1%). No significant difference was found in the percentage of households with more than one occupant per bedroom (mean = 2.5%), percent unemployed (mean = 5.8%) or percent single-parent households (mean = 34.1%). As of August 20, 2020, rural counties had significantly fewer average COVID-19 cases, fewer cases per capita and fewer deaths.

Correlation between prevalence and ADI by county type
COVID-19 prevalence was higher in urban counties but less strongly correlated with national-rank ADI than in rural counties (ρ = 0.27 vs. 0.45, respectively; Figure 1). In urban counties, prevalence was also less strongly correlated than in rural counties with family income (ρ = −0.18 vs. −0.33), percentage of households under 150% of poverty (ρ = 0.31 vs. 0.42) and percentage of residents with a white-collar job (ρ = −0.08 vs. −0.29). Conversely, prevalence in urban counties was more strongly correlated with the percentage of residents with less than a 9th-grade education (ρ = 0.49 vs. 0.39) and the percentage of households with more than one person per bedroom (ρ = 0.39 vs. 0.22). Each of the aforementioned observations was significant at p < 0.001.

Figure 1.
Correlation matrices for COVID-19 prevalence and ADI components across county type.

Modeling prevalence by ADI and county type
The base model of overall county-level prevalence as a function of ADI (Model 1) had a large total residual deviance and explained only around 16% of deviance (Table 2). The parameter estimate for ADI was significant, but a one-unit increase in ADI rank was associated with only a 1.2% change in prevalence (Table 3). Within urban jurisdictions (Model 2), ADI was less predictive of prevalence (McFadden R² = 0.132) but yielded better-behaved deviance residuals than the rural comparison model (Model 3). The estimated change in prevalence per unit increase in ADI was around 0.9% for urban counties and more than double that (2%) for rural counties. Models 4 and 5 obtained roughly equal McFadden R² values for urban and rural jurisdictions (0.371 and 0.386, respectively). Compared with their simpler counterparts (Models 2 and 3), both Models 4 and 5 substantially improved deviance explained, but the median deviance residual remained unchanged for rural counties.
Figure 2 shows that the ADI component with the strongest effect in both Models 4 and 5 was the percentage of people with at least a high school education (t = −177.047 and −287.523, respectively). This effect was statistically significant at p < 0.001, inversely related to COVID-19 prevalence and stronger for rural jurisdictions. The least influential component was also shared: the percentage of people unemployed, whose effect was positive and larger for urban jurisdictions (t = 13.331 vs. 3.640). Otherwise, variable rankings differed considerably between jurisdictions: median house value and median rent ranked 2nd and 3rd for urban counties but only 9th and 10th for rural counties.
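The model comparisons above rest on three quantities: a Poisson fit with a log link, McFadden's R² (one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model), and the percent change in expected prevalence per unit of a predictor, 100 × (e^β − 1). The sketch below computes all three for a single predictor using Newton-Raphson. It is an illustration under stated assumptions, not the authors' code: it fits a raw count outcome with no population offset, and the function names and toy data are hypothetical. A library GLM routine, such as statsmodels' GLM with a Poisson family, would normally be used and also reports the standard errors and test statistics behind a ranking like Figure 2.

```python
import math
from typing import Sequence, Tuple


def fit_poisson_1d(x: Sequence[float], y: Sequence[float],
                   max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood.

    For the canonical log link this is equivalent to IRLS / Fisher scoring.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score, with the 2x2 inverse written out.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def poisson_loglik(y: Sequence[float], mu: Sequence[float]) -> float:
    """Poisson log-likelihood of counts y under means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def mcfadden_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - LL(fitted model) / LL(intercept-only model)."""
    b0, b1 = fit_poisson_1d(x, y)
    ll_model = poisson_loglik(y, [math.exp(b0 + b1 * xi) for xi in x])
    ll_null = poisson_loglik(y, [sum(y) / len(y)] * len(y))
    return 1.0 - ll_model / ll_null


def pct_change_per_unit(b1: float) -> float:
    """Percent change in the expected outcome for a one-unit increase in x."""
    return (math.exp(b1) - 1.0) * 100.0


# Hypothetical toy data: a predictor and integer case counts.
x_toy = [0, 1, 2, 3, 4, 5]
y_toy = [2, 3, 4, 6, 9, 13]
b0_hat, b1_hat = fit_poisson_1d(x_toy, y_toy)
print(f"slope={b1_hat:.4f}  change/unit={pct_change_per_unit(b1_hat):.1f}%  "
      f"McFadden R2={mcfadden_r2(x_toy, y_toy):.3f}")
```

Because the link is logarithmic, per-unit effects compound multiplicatively. A 1.2% change per ADI rank unit corresponds to β = ln(1.012) ≈ 0.0119, and a 10-unit increase implies a factor of 1.012^10 ≈ 1.127, roughly a 12.7% rather than a 12% increase.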

Discussion
Discrepancies between urban and rural counties were evident both in individual SDH measures and in their combined effect on prevalence estimates. The differences observed in rank correlation and variable importance appear characteristic of the communities they reflect. For example, rent and home values tend to be lower in rural jurisdictions and are inversely related to COVID-19 prevalence. In urban areas, fewer residents own private vehicles, so vehicle ownership was less predictive than in rural communities. Generally, stronger associations between ADI components and prevalence were found among rural jurisdictions. The rural Models 3 and 5 explained more deviance than their urban counterparts, and Model 3 showed more than twice the change in prevalence per unit ADI of the urban Model 2. These performance metrics indicate that national-rank ADI was more predictive of COVID-19 prevalence in rural communities than in urban ones. Together, these results suggest that (1) overall COVID-19 prevalence varies more among rural jurisdictions, and (2) the effect of socioeconomic disparity on COVID-19 prevalence is stronger in rural jurisdictions than in urban ones.
ADI and its component measures were instrumental in assessing this contrast between jurisdictions and can help lawmakers identify the regions most in need. The health policy implications are that (1) geosocial factors should be considered when identifying communities most at risk of an outbreak, (2) disparities in prevalence, morbidity and amenability to interventions can be evaluated across geographic regions and (3) interventions should account for these needs and disparities to adequately control disease spread within a geography. For example, mobile vaccination and testing centers could offset limited access to care due to low vehicle ownership or poverty.
These results require several qualifications. First, they are time-dependent and reflect an evolving pandemic. Our analysis was limited to cases reported through August 20, 2020, in order to inspect the initial phases of disease spread in relation to geographic characteristics and SDH. Other researchers have found the same general pattern of high COVID-19 incidence in rural communities during the early stages of the pandemic [33]. Temporal modeling that accounts for the timing and location of health policy measures may be required to further our understanding of the association between deprivation and spread, but was beyond the scope of this work.
Second, the granularity of both the classification scheme and the level of geography is not ideal for detecting small or nuanced effects. We expect much greater heterogeneity in ADI components within densely populated regions. Census tract or block group data may have been more appropriate, but COVID-19 testing results are not currently available nationwide at that level [34]. Third, ADI captures only a handful of SDH that, while widely used, do not account for racial disparities in COVID-19 spread. Race, age and gender should be considered in future modeling of coronavirus prevalence. Asymptomatic spread of the disease likely also undermines our understanding of differential prevalence by county type.
Finally, mortality is a parallel outcome that carries substantial weight in policy decisions. However, most efforts to understand COVID-19 mortality risk are conducted at the patient level. Mortality can be evaluated by geography but is time-lagged, limiting its usefulness for prevention and planning. Another approach might be to use geographic estimates of healthcare access and comorbidities to gauge localized risk of COVID-19 mortality. Such analyses were beyond the scope of this work but remain of interest wherever these data are available.
Additional work is also required to integrate known risk factors and SDH in order to address longstanding disparities in health outcomes and to predict which geographies will be most affected by a pandemic [35].
Rural communities face notably different challenges in accessing care than those in more densely populated areas [36,37]. During a pandemic, lack of reliable internet access and transportation may compound the effect of poverty on access to telehealth services or ambulatory care. Efforts targeting rural communities must navigate these challenges while reducing the disparate burden of poverty [38]. As more data on coronavirus cases become available, we expect geographic data at finer resolution, which will make it necessary to reevaluate and confirm these findings at smaller community levels.

Conclusions
Although the majority of COVID-19 cases and deaths have occurred in metropolitan areas, rural communities continue to struggle with highly disparate health outcomes and, in some jurisdictions, higher per capita COVID-19 prevalence. The reasons for this geographic difference in prevalence are many, but a substantial body of research implicates rural health disparity, measured here as an index of deprivation, in exacerbating the pandemic. The economic and practical burdens these communities face have affected their access to care, and effective policy to combat the virus will likely need to address these burdens.