Annual PM2.5 and cardiovascular mortality rate data: Trends modified by county socioeconomic status in 2,132 US counties

This article contains data on county-level socioeconomic status for 2132 US counties, together with each county's average annual cardiovascular mortality rate (CMR) and fine particulate matter (PM2.5) concentration for 21 years (1990–2010). County CMR, PM2.5, and socioeconomic data were obtained from the US National Center for Health Statistics, the US Environmental Protection Agency's Community Multiscale Air Quality (CMAQ) modeling system, and the US Census, respectively. Annual socioeconomic indices were created by factor analysis of seven county-level measures from the 1990, 2000, and 2010 US Census, and quintiles of this index were used to define categories of county socioeconomic status. This national data set captures annual changes in PM2.5 and CMR over a period of substantial reductions in US air pollutants following the enactment of the 1970 Clean Air Act. These data are associated with the article “The contribution of improved air quality to reduced cardiovascular mortality: Declines in socioeconomic differences over time” [1]. Data are stored in comma-separated value (CSV) format and can be downloaded from the USEPA ScienceHub data repository (https://doi.org/10.23719/1506014).


Specifications Table
Subject: Public Health and Health Policy
Specific subject area: Air pollution, cardiovascular mortality, social epidemiology

Data
This dataset contains 1 data dictionary and 6 data files that describe annual county cardiovascular mortality rates (CMR) and fine particulate matter (PM2.5) concentrations for 21 years (1990–2010), as well as county-level factors including socioeconomic factors, a socioeconomic status (SES) index, density of healthcare establishments, and population characteristics. Data files are available online through the EPA ScienceHub [https://doi.org/10.23719/1506014]. County_annual_PM25_CMR.csv has the annual average PM2.5 (μg/m³) and cardiovascular mortality rate (deaths/100,000 person-years) for each county for each year between 1990 and 2010. County_RAW_variables.csv includes socioeconomic, population, and healthcare variables for each county. Socioeconomic variables for the years 1990, 2000, and 2010 include civilian unemployment rate, median household (HH) income, percent female HHs with no spouse, percent owner-occupied housing units, percent of individuals aged 25 and older who graduated high school or had greater education, and percent of households and families below poverty. The population variable is county population size from the 2000 Census. Healthcare facility variables include the density of facilities for the years 1999 and 2005, identified from the US Census's County Business Patterns (CBP) files (cbp99co.txt and cbp05co.txt) using NAICS codes for healthcare establishments.
Differences between SES quintiles with respect to the three 1990 Census variables that contributed most to the index (percent high school graduate or greater education (age 25+), median household income, and percent households below poverty) are shown in Figure S2 of the related research article. Table S1 in that article describes the SES factor loadings that created the 1990 SES index.
County SES classifications based on the 1990 Census data for the 2132 counties included in the dataset are depicted in Figure S3 of the related research article. Trends in the cardiovascular mortality rate and PM2.5 between 1990 and 2010, nationally and by SES quintile, are depicted in Figure 1 of the related research article [1].
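For readers who want to tabulate quintile-level trends from the downloaded files, the following is a minimal Python sketch under stated assumptions. The column names FIPS, Year, PM25 and CMR are hypothetical placeholders (the data dictionary lists the actual headers), and each county's SES quintile is passed in as a separate mapping because this sketch does not assume which data file carries it. It computes unweighted means across counties, which may differ from the averaging used in the related research article.

```python
import csv
from collections import defaultdict
from statistics import mean

# Hypothetical column names; the real headers are listed in the data
# dictionary distributed with the data set and may differ.
FIPS, YEAR, PM25, CMR = "FIPS", "Year", "PM25", "CMR"


def read_rows(lines):
    """Parse CSV text (an open file or any iterable of lines) into dicts."""
    return list(csv.DictReader(lines))


def annual_means_by_quintile(annual_rows, quintile_of):
    """Unweighted mean PM2.5 and CMR for each (year, SES quintile) pair.

    annual_rows: county-year rows such as those in County_annual_PM25_CMR.csv.
    quintile_of: {county FIPS code: SES quintile label, e.g. "Q1"}.
    Counties without a quintile are skipped.
    """
    groups = defaultdict(lambda: ([], []))
    for row in annual_rows:
        quintile = quintile_of.get(row[FIPS])
        if quintile is None:
            continue
        pm_values, cmr_values = groups[(int(row[YEAR]), quintile)]
        pm_values.append(float(row[PM25]))
        cmr_values.append(float(row[CMR]))
    return {
        key: (mean(pm), mean(cmr))
        for key, (pm, cmr) in sorted(groups.items())
    }
```

FIPS codes are kept as text so that leading zeros (e.g. 01001) survive the join. With the real file, open it with `with open("County_annual_PM25_CMR.csv", newline="") as f:` and pass `f` to `read_rows`.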

County-level socioeconomic, population, and healthcare related characteristics
County socioeconomic data from the 1990, 2000, and 2010 US Census comprised seven variables: percent households below poverty, median household income, percent high school graduate or greater education (percent of individuals aged 25 and older who graduated high school or had greater education), civilian unemployment rate, percent female households with no spouse, percent vacant housing units, and percent owner-occupied housing units [3–5]. County SES indices were calculated from these variables by maximum-likelihood factor analysis with a standard (varimax) rotation. The first factor explained ~40.5% of the total variance, and its loadings were used to calculate SES scores. The SES index loadings for the baseline year (1990) are shown in Table S1 of the related research article [1]. Quintiles of the scores defined the SES strata: Q1, highest SES; Q2, high SES; Q3, medium SES; Q4, low SES; and Q5, lowest SES.
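The scoring and stratification steps can be illustrated with a simplified Python sketch. It assumes the first-factor loadings have already been estimated (the maximum-likelihood fit and varimax rotation are not reproduced here) and uses a loading-weighted sum of standardized variables as the score; this is a stand-in that may differ from the factor-score method the authors used. Loading signs are assumed to be oriented so that a higher score means higher SES, and any variable names or loadings passed in are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one variable across counties (population SD)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def ses_scores(data, loadings):
    """Loading-weighted sum of z-scores for each county.

    data: {variable name: [value for each county]}, same county order.
    loadings: {variable name: first-factor loading}; signs are assumed to be
    oriented so that a higher score means higher SES.
    """
    n = len(next(iter(data.values())))
    scores = [0.0] * n
    for var, loading in loadings.items():
        for i, z in enumerate(zscores(data[var])):
            scores[i] += loading * z
    return scores


def ses_quintiles(scores):
    """Label counties Q1 (highest SES) through Q5 (lowest SES) by rank."""
    n = len(scores)
    # Rank from the highest score down so the top fifth becomes Q1.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = f"Q{1 + (5 * rank) // n}"
    return labels
```

Tied scores that straddle a quintile boundary are split by sort order, so a real analysis may want an explicit tie rule.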
Data on county population were obtained from the 2000 Census. County healthcare facilities were identified from the US Census's County Business Patterns. The density of healthcare facilities (facilities per 1000 individuals) in each county was estimated using NAICS codes to identify healthcare establishments; these were the same codes used to create the healthcare component of the EPA's Environmental Quality Index [2].
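A minimal Python sketch of the density calculation follows. It assumes the CBP county files expose the columns fipstate, fipscty, naics and est (establishment count), and it uses the NAICS prefixes 621 (ambulatory health care), 622 (hospitals) and 623 (nursing and residential care) as an illustrative stand-in for the Environmental Quality Index code list, which is not reproduced here.

```python
from collections import Counter

# Illustrative stand-in only; the actual code list follows the EPA
# Environmental Quality Index healthcare component.
HEALTHCARE_NAICS_PREFIXES = ("621", "622", "623")


def healthcare_establishments(cbp_rows, prefixes=HEALTHCARE_NAICS_PREFIXES):
    """Count healthcare establishments per 5-digit county FIPS code.

    Assumes CBP county-file columns fipstate, fipscty, naics and est.
    Only 6-digit NAICS rows are counted, so coarser summary rows
    (e.g. '621///') are not double counted.
    """
    counts = Counter()
    for row in cbp_rows:
        code = row["naics"].strip()
        if not code.isdigit() or len(code) != 6:
            continue
        if code.startswith(prefixes):
            fips = row["fipstate"].zfill(2) + row["fipscty"].zfill(3)
            counts[fips] += int(row["est"])
    return counts


def facility_density(establishments, population):
    """Healthcare facilities per 1000 residents for each county."""
    return {
        fips: 1000 * establishments.get(fips, 0) / pop
        for fips, pop in population.items()
        if pop > 0
    }
```

Counting only the most detailed NAICS level is one way to avoid double counting in CBP files, which also report subtotals at coarser levels.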

Mortality data
Annual age-adjusted cardiovascular mortality rates (CMR) were calculated from individual-level records in the nationally inclusive mortality dataset maintained by the U.S. National Center for Health Statistics (NCHS) for each year between 1990 and 2010, standardized to the National Cancer Institute's 2000 standard population [6], and expressed as deaths/100,000 person-years. Cardiovascular deaths were identified using cause-of-death codes from the International Classification of Diseases, Ninth Revision (390–434 and 436–448). To comply with NCHS data privacy standards, we used data only for the 2132 counties in the contiguous U.S. with a population of at least 20,000.
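Direct age standardization weights each age-specific rate by that age group's share of the standard population: AAR = 100,000 × Σᵢ wᵢ dᵢ / pᵢ, where dᵢ and pᵢ are a county's deaths and population in age group i and wᵢ is the standard-population share. The Python sketch below implements that formula together with the ICD-9 selection rule above. Age-group labels and weights in any call are placeholders rather than the actual 2000 standard population, and ICD-9 codes are assumed to be stored as strings with or without a decimal point.

```python
def is_cardiovascular_icd9(code):
    """True if an ICD-9 cause-of-death code falls in 390-434 or 436-448."""
    category = code.replace(".", "").strip()[:3]
    if not category.isdigit():  # E- and V-codes are not in these ranges
        return False
    c = int(category)
    return 390 <= c <= 434 or 436 <= c <= 448


def age_adjusted_rate(deaths, population, standard_weights):
    """Directly age-standardized rate per 100,000 person-years.

    deaths, population: {age group: count} for one county-year.
    standard_weights: {age group: share of the standard population};
    weights are renormalized in case they do not sum exactly to 1.
    """
    total = sum(standard_weights.values())
    rate = 0.0
    for group, w in standard_weights.items():
        rate += (w / total) * deaths.get(group, 0) / population[group]
    return 100_000 * rate
```

Only the three-digit category is checked, so codes such as 4109 (a subcategory of 410) are included while 435 is excluded.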

Air quality data
Average annual fine particulate matter (PM2.5) concentrations were estimated using the Community Multiscale Air Quality (CMAQ) modeling system [7], which predicts concentrations of airborne gases and particles with a deterministic model of atmospheric chemistry and transport from anthropogenic and non-anthropogenic sources. Air quality was simulated for the entire period on a 36-km grid using internally consistent historical emissions, with lateral boundary conditions derived from hemispheric simulations. Annual PM2.5 concentrations were calculated from daily averages of hourly concentrations. CMAQ-estimated PM2.5 concentrations were first interpolated to census block population centroids and then aggregated into county population-weighted concentrations based on Census 2000 population data.
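The two averaging steps can be sketched in Python as below. The sketch assumes the CMAQ values have already been interpolated to census block population centroids (the interpolation method is not reproduced here) and that hourly values are grouped by day; the block populations stand in for the Census 2000 counts used as weights.

```python
from statistics import mean


def annual_mean_from_hourly(hourly_by_day):
    """Annual mean computed as the mean of daily means of hourly values.

    hourly_by_day: list of days, each a list of hourly PM2.5 values.
    Days with no hourly values are skipped.
    """
    daily = [mean(hours) for hours in hourly_by_day if hours]
    return mean(daily)


def population_weighted_county_mean(block_values):
    """Population-weighted county PM2.5 from census-block centroid values.

    block_values: iterable of (population, concentration) pairs for the
    census blocks in one county.
    """
    total_pop = 0.0
    weighted = 0.0
    for pop, conc in block_values:
        total_pop += pop
        weighted += pop * conc
    if total_pop == 0:
        raise ValueError("county has no population to weight by")
    return weighted / total_pop
```

Weighting by block population makes the county value reflect where residents live rather than the county's geographic area.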

Disclaimer
The views expressed in this manuscript are those of the individual authors and do not necessarily reflect the views and policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.