County-level factors influence the trajectory of Covid-19 incidence

With new cases of Covid-19 surging in the United States, we need to better understand how the spread of novel coronavirus varies across all segments of the population. We use hierarchical exponential growth curve modeling techniques to examine whether community social and economic characteristics uniquely influence the incidence of Covid-19 cases in the urban built environment. We show that, as of May 3, 2020, confirmed coronavirus infections are concentrated along demographic and socioeconomic lines in New York City and surrounding areas, the epicenter of the Covid-19 pandemic in the United States. Furthermore, we see evidence that, after the onset of the pandemic, timely enactment of physical distancing measures such as school closures is imperative in order to limit the extent of the coronavirus spread in the population. Public health authorities must impose nonpharmaceutical measures early on in the pandemic and consider community-level factors that associate with a greater risk of viral transmission.


Introduction
The presence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was initially detected in Wuhan, China in December 2019 (1). Since then, the outbreak of coronavirus disease 2019 , the illness caused by SARS-CoV-2, has spread worldwide (2). As of May 3, 2020, the United States finds itself atop the list of countries most heavily impacted by the pandemic, both in terms of total diagnoses and fatalities (3). Within the United States, the residents of the State of New York are disproportionately affected by SARS-CoV-2 virus, with New York City accounting for the vast majority of cases and deaths in the country (4). Following the first confirmed New York case in March of 2020, the incidence of  in and around New York City continues its alarming exponential rise (4). The upward trend, however, is not uniformly patterned, as evidenced by substantial spatial heterogeneity of confirmed cases (5). Higher population density increases the potential for virus transmission and may, in part, account for geographic differences in numbers of COVID-19 cases (6).
In addition to the readily predictable effects of population density, communities exhibit clear stratification along dimensions of demographic and socioeconomic characteristics that associate with vulnerability such as poverty that influences infectious disease patterning (6).
Indeed, social epidemiologists and medical sociologists argue that individuals and groups possessing fewer social, educational, and economic resources may prove less able to avoid areas where infectious disease runs rampant (7). Here, we propose that variation in existing socioeconomic indicators of counties. For concentrated disadvantage we followed Sampson et al. and take the first dimension of a principle components factor analysis on all the county-level characteristics described, with the exception of population density and median age (18).
We test the impact of distinct sociodemographic environments on the relative risk of Covid-19 infection using hierarchical exponential growth curve modeling techniques that account for correlation in the number of cases by county. All models use maximum likelihood estimation with adaptive quadrature that adjust for problems that would otherwise downwardly bias estimated standard errors (19). Preliminary analyses indicate that time is most appropriately captured by a polynomial time function due to non-linearity, and thus we present only results from these models. The estimated coefficients in our models represent the estimated effect of concentrated disadvantage on change in the infection incidence rate over time. In the models, time points are nested within counties. The fully specified level-1 model fits the number of cases per day as a function of time across observations for each county, while simultaneously accounting for variation in the enactment of the physical distancing measures and a spatial lag.
The fully specified level-2 model fits the level-1 intercept and coefficients across all counties as a function of the county-level characteristics. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

Results
The copyright holder for this preprint this version posted May 9, 2020. .

age of the population varies significantly across counties, from a high of 42 years old in Nassau
County to a low of 34 years old in Bronx County. Population density varies substantially across counties, with Kings County the most concentrated (37,253). Relative to other counties, Kings County is one of the most disadvantaged, with the highest percentage of the population below poverty (17.1%), highest percentage of female-headed households (30%), and highest unemployment rate (6.3%). Bronx County has the highest percentage of Black residents (36.4%) and the highest percentage of the population receiving public assistance (8%).
The results from our Poisson hierarchical polynomial growth regression models are shown in Table 2. For ease of interpretation, estimated fixed and random effects are displayed in terms of incidence rate ratios (IRR). Model 1 estimates the average incidence rate of Covid-19 cases across counties over time. Model 2 assesses IRR, with the spatial lag and physical distancing measures included at level-1. Model 3 adds the county-level characteristics of median age and concentrated disadvantage at level-2. We control for population density in all models.  Table 2). Model 2 further indicates that the enactment of physical distancing measures led to a 79% reduction in the daily number of reported cases (0.21, p<0.001).
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 9, 2020. . https://doi.org/10.1101/2020.05.05.20092254 doi: medRxiv preprint Model 3 adds the county-level factors and indicates that as median age increases across counties, the daily number of cases increases by approximately 31% (p<0.05). After accounting for all the county-level characteristics, higher levels of concentrated disadvantage associate with a significant, nearly 3-fold, increase in the incidence rate of Covid-19 cases (2.74, p<0.01). As shown in the lower portion of Table 2, the random effects estimates indicate that the patterns of growth do indeed vary significantly across counties, both in terms of concentrated disadvantage Here, we add to this literature, and show that concentrated disadvantage is a potent ingredient in the observed pattern of detected Covid-19 infections across New York counties. We see evidence that the influence of communities on the spread of infectious disease reaches far . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 9, 2020. . https://doi.org/10.1101/2020.05.05.20092254 doi: medRxiv preprint beyond the physical characteristics of the urban built environment (22). Furthermore, if community sociodemographics exhibit elevations in median age and/or have higher prevalence of disadvantage (e.g., poverty), such factors will likely have far-reaching consequences for SARS-CoV-2 infection incidence, and, Covid-19-related case fatality rates (23). Although it is beyond the scope of our analysis to explain precisely why this variation exists, Covid-19 has exposed major social and economic disparities that already exist in the United States. Indeed, individual's risk of being infected is higher if they reside in counties of New York with more socioeconomic disadvantage. This may be important because, at least at the county level, living in areas characterized by disadvantage increases the risk of infection.
To the best of the authors' knowledge, this is the first study that employs hierarchical exponential growth curve modeling techniques to examine whether county-level demographic and socioeconomic characteristics influence the incidence of Covid-19 cases in New York City and surrounding areas. This study, however, is not without limitations. As this devastating pandemic continues to unfold in this epicenter of the outbreak in the United States, we assume that the number of Covid-19 confirmed cases will likely shift in response to improved testing and reporting practices, and thus, over time, enable the emergence of more accurate and useful data.
Still, we use the latest data available to us (8). Similarly, our sample is drawn from New York City and surrounding areas, reducing the generalizability of our findings to a particular portion of people in the New York region, through May 3, 2020. Furthermore, in order to reduce virus transmission and keep case-fatality rates as low as possible, the New York State government instituted a number of mitigation efforts aside from school closures, including social distancing, school and workplace closures, cancellation of large-scale public gatherings, and stay-at-home orders (24). In the present analysis, we use school closure dates to account for variation in time . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 9, 2020. . of enactment of physical distancing measures across counties. Thus, our inclusion of one variable is by no means comprehensive. However, if school closures were the only measure enacted, intervention effectiveness would be substantially reduced and variation in Covid-19 incidence across counties would be magnified (20,21). Similarly, although we are unable to confirm whether individuals who tested positive for SARS-CoV-2 reside in the area where testing was performed, local government agencies direct people to contact their local health department for Covid-19-related concerns (24), and since stay-at-home measures were enacted at the same time as school closures we are confident in the results reported here.
Taken together, the dire circumstances in which we find ourselves demand better understanding of how distinctive geographic spaces influence infection incidence in order to potentially isolate the community-level factors that associate with higher likelihood of Covid-19 diagnosis and disease progression. Our analysis serves as a substantial jumping off point for future researchers interested in disentangling which community factors relate to SARS-CoV-2 transmission.  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 9, 2020. . https://doi.org/10.1101/2020.05.05.20092254 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 9, 2020. . https://doi.org/10.1101/2020.05.05.20092254 doi: medRxiv preprint