Reliability of the American Community Survey for unintentional drowning and submersion injury surveillance: a comprehensive assessment of 10 socioeconomic indicators derived from the 2006–2013 annual and multi-year data cycles

Background Our objective was to evaluate the reliability and predictability of ten socioeconomic indicators obtained from the 2006–2013 annual and multi-year ACS data cycles for unintentional drowning and submersion injury surveillance. Methods Each indicator was evaluated using its margin of error and coefficient of variation. For the multi-year data cycles we calculated the frequency with which estimates for the same geographic areas from consecutive surveys were statistically significantly different. Relative risk estimates of drowning-related deaths were constructed using the National Center for Health Statistics compressed mortality file. All analyses were conducted using census counties. Results Five of the ten socioeconomic indicators derived from the annual and multi-year data cycles produced high reliability CV estimates for at least 85 % of all US counties. On average, differences in socioeconomic characteristics for the same geographic areas between consecutive 3- and 5-year data cycles were unlikely to be caused by sampling error in only 17 % (range 5–89 %) and 21 % (range 5–93 %) of all counties, respectively. No indicator produced statistically significant relative risk estimates across all data cycles and survey years. Conclusions The reliability of the annual and multi-year county-level ACS data cycles varies by census indicator. More than 75 % of the differences in estimates between consecutive multi-year surveys are likely to have occurred as a result of sampling error, suggesting that researchers should be judicious when interpreting overlapping survey data as reflective of real changes in socioeconomic conditions. Although no indicator predicted disparities in drowning-related injury mortality across all data cycles and years, further studies are needed to determine whether these associations remain consistent at different geographic scales and for injury morbidity.


Background
Drowning-related deaths continue to be a major public health problem in the United States (US). Between 2009 and 2013 it was the third most common type of injury-related death among children under five and the second most common among adolescents ages five to fourteen, accounting for 3488 child and adolescent deaths (Centers for Disease Control and Prevention 2013). Over this same period it remained a top ten cause of all injury-related deaths among all persons under the age of 55.
Factors such as lower socioeconomic status, age, gender, ethnicity, alcohol use, and living in a rural environment all increase the risk of hospitalization or death from drowning (Brenner et al. 2001; Browne et al. 2003; Burrows et al. 2013; Giashuddin et al. 2009; Gilchrist & Parker 2014; Kim et al. 2007; Peek-Asa et al. 2004; Saluja et al. 2006; Thompson & Rivara 2000; Wright et al. 2013). Among rural populations, risks are attributable to greater access to open water sources, irrigation ditches, and watering troughs for livestock. Lack of adequate fencing and pool enclosures often characterizes urban incidents, whereas factors such as low income are pervasive across settings. These factors are routinely monitored and are part of numerous prevention and intervention initiatives (Cortes et al. 2006; Depczynski et al. 2009; Girasek 2011). Recent systematic reviews show that these factors remain indicative of variation in risk both in the US and abroad (Leavy et al. 2015a; Leavy et al. 2015b; Wallis et al. 2015).
However, US investigators tasked with monitoring drowning-related injuries now face a particular difficulty measuring its burden. This is because the 2010 replacement of the long-form decennial census with the American Community Survey (ACS) fundamentally changed the structure of the geographic and socioeconomic data for the population. There are two primary differences between the ACS and the decennial census. First, the ACS is a rolling survey. Each month, approximately 250,000 households are interviewed using multiple survey modes (US Census Bureau 2008). The survey responses are then aggregated into 1-year periods for annual release, with larger pools obtained prior to the release of the multi-year cycles. The structure and sample size of the ACS contrast with the decennial census, which obtained point-in-time responses from 1 out of 6 households every 10 years. Second, the ACS provides improved access to current socioeconomic information on the US population through the release of annual, 3-year, and 5-year data cycles. The multi-year cycles are considered more reliable because they contain estimates from three and five times as many responses and represent a larger number of areas (US Census Bureau 2009). This is particularly relevant for subgroup questions that affect a smaller proportion of the population (e.g. female head of household with no husband and with children at home under 18 years of age) compared to the population profile questions (e.g. % male).
Accordingly, there are now two primary challenges attributed to the shift from the decennial census to the ACS. The first pertains to the reliability of the ACS. For example, one study found that the margins of error (MOE) for certain census tracts increased by over 80 % relative to the 2000 decennial census (Alexander 2002). Another study found that the MOE for the 5-year cycle was large enough to cause census tracts to move from the highest into the lowest income quartile (Spielman et al. 2014). A more profound problem was that this variability predominated within marginalized areas. The second challenge pertains to coverage. For example, the annual data cycle is released only for geographic areas of at least 65,000 persons, whereas the 3-year cycle is available only for areas of at least 20,000 persons. This effectively eliminates approximately 75 % of what are predominantly rural and non-metropolitan counties from annual surveillance, while providing current information for approximately 58 % of the population every 3 years.
One proposed strategy to circumvent limited access to current socioeconomic information from the ACS is to use consecutive multi-year surveys to represent 'annual' socioeconomic characteristics (US Census Bureau 2009). However, the fundamental challenge of this approach is that any difference in the estimates could be driven by data from the non-overlapping survey years. For example, 80 % of the ACS estimates from the 2012 5-year data cycle (surveys 2008–2012) overlap the 2013 release file (surveys 2009–2013). Similarly, 67 % of the estimates overlap between consecutive 3-year data cycles. As such, any comparison using overlapping survey periods must determine whether the observed differences are unlikely to have occurred by chance due to sampling error.
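The overlap arithmetic above can be sketched in a few lines. This is an illustrative helper (the function name and interface are our own, not part of any Census tooling):

```python
def overlap_fraction(window_years, release_gap=1):
    """Fraction of survey years shared by two multi-year ACS data
    cycles whose release years are `release_gap` years apart.
    E.g. the 2012 and 2013 5-year cycles share 4 of their 5 survey
    years (2009-2012), so the overlap fraction is 0.8."""
    shared_years = max(window_years - release_gap, 0)
    return shared_years / window_years
```

For consecutive releases, `overlap_fraction(5)` gives 0.8 and `overlap_fraction(3)` gives roughly 0.67, matching the figures above; waiting out the full window, as with the 2009 and 2013 5-year cycles, drives the overlap to zero.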
We examined the reliability and predictability of the annual and multi-year ACS data cycles for measuring socioeconomic disparities in drowning-related injury mortality across the US between 2006 and 2013 using county-level data. Our analysis included ten indicators previously used for quantifying socioeconomic disparities in drowning-related injury rates both in the US and abroad. Our primary objective was to record the amount of variability in the indicators when constructed using the annual and multi-year data cycles. Our secondary objective was to determine whether there was a particular data cycle and indicator that could facilitate better detection of differences in injury risk over time.

Methods
This study was a retrospective cohort study of drowning-related injury mortality incidence evaluated against the annual and multi-year ACS data cycles. Data regarding the injured population were obtained from the National Center for Health Statistics (NCHS) annual detailed mortality file for all US residents for the period 2006–2013. Drowning-related fatalities were specified using International Classification of Diseases, Tenth Revision (ICD-10) cause of death codes W65–W74. Mortality and socioeconomic data files were linked using county identification numbers and evaluated as age-adjusted rates. County geographic boundaries were selected for analysis because they are the finest administrative geography available for the annual ACS data cycle. County boundaries are also robust for national injury and disease surveillance (Singh 2003; Wilkinson & Pickett 2008).
Reliability of the ACS data cycles was evaluated using the coefficient of variation (CV) for each estimate. The National Research Council defines a "reasonable standard of precision" as a CV percentage between 0 % and 12 % (National Research Council 2007). Here, we use categorical rankings of "high reliability" (0–12 %), "moderate reliability" (12–40 %), and "low reliability" (>40 %) to assess the estimates (Environmental Systems Research Institute 2014). The standard error required to construct each CV was calculated after adjusting each margin of error (MOE) to 95 % confidence. This requires multiplying the MOE for each estimate, or its numerator/denominator, by a factor equal to 1.960/1.645. We chose 95 % confidence as opposed to the 90 % confidence released by the ACS in order to reduce the likelihood of false inference given the number of comparisons of each indicator across the three survey cycles. All calculations were derived using ACS guidelines for constructing proportions and ratios (US Census Bureau 2009). The calculation steps for each indicator are listed in Table 1. The indicators we evaluated were selected based on their previous application for monitoring drowning-related injuries in the US (Brenner et al. 2001; Browne et al. 2003; Gilchrist & Parker 2014; Peek-Asa et al. 2004; Saluja et al. 2006; Thompson & Rivara 2000; Wright et al. 2013) and abroad (Burrows et al. 2013; Giashuddin et al. 2009; Kim et al. 2007). All geographic and socioeconomic files were downloaded from the American FactFinder database.
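The MOE-to-CV conversion and reliability classification described above can be sketched as follows. This is a minimal illustration, assuming the published MOE is at the ACS default 90 % confidence level; the function names are our own:

```python
Z90, Z95 = 1.645, 1.960  # critical values at 90 % and 95 % confidence

def moe_95(moe_90):
    """Rescale a published ACS margin of error (90 % confidence)
    to 95 % confidence by the adjustment factor 1.960/1.645."""
    return moe_90 * (Z95 / Z90)

def cv_percent(estimate, moe_90):
    """Coefficient of variation (%) for an ACS estimate.
    The standard error is the confidence-adjusted MOE divided by
    its critical value (equivalent to moe_90 / 1.645)."""
    se = moe_95(moe_90) / Z95
    return 100.0 * se / estimate

def reliability(cv):
    """Categorical ranking used in this study."""
    if cv <= 12.0:
        return "high"
    if cv <= 40.0:
        return "moderate"
    return "low"
```

For example, an estimate of 100 with a published MOE of 16.45 yields a standard error of 10 and a CV of 10 %, which falls in the "high reliability" band.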
Statistically valid measures of change between consecutive multi-year surveys can be calculated to determine whether ACS estimates for the same geographic areas are statistically significantly different.
The steps required to approximate the difference in standard errors between overlapping surveys are published by the ACS (US Census Bureau 2009). In this study, we report the number and percentage of census areas within each multi-year data cycle having estimates that likely reflect true changes in socioeconomic conditions. We report two different models for evaluating the reliability of overlapping surveys. In the first model, differences in the estimates for the same geographic area were compared using the maximum amount of overlap between adjacent surveys. For example, the maximum amount of overlap for consecutive 5-year surveys occurs when using the 2012 and 2013 survey cycles to represent annual change in socioeconomic conditions. This is the only method that allows for annual surveillance of socioeconomic conditions for all US counties. In the second model, we evaluated differences between the estimates using the minimum amount of overlap between consecutive surveys. The minimum amount of overlap occurs when comparing the 2009 and 2013 5-year data cycles, effectively placing a 4-year gap in annual surveillance.

We used negative binomial regression to compare relative risk estimates for each indicator using the annual and multi-year data cycles. All rates were age-adjusted and weighted using the 2000 US standard million population. For each data cycle, relative risk estimates were derived from a single calendar year of drowning-related injury deaths. For example, only drowning-related deaths recorded in 2009 were used when comparing estimates generated from the 2009 annual cycle and the 2009 3-year and 5-year data cycles. Our rationale for comparing estimates for a single calendar year was twofold. First, our objective was to determine which data cycle and socioeconomic indicator produced the greatest number of statistically significant relative risk estimates over time.
Second, if relative risk estimates are similar across survey cycles, it would support using a limited but annually updated data file for approximating injury risk on a national scale. The data analysis for this study was generated using SAS software, Version 9.4 of the SAS System for Windows.
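The direct age standardization applied to all rates can be sketched as below. The age bands here are collapsed for brevity and the weights are an illustrative subset of the 2000 US standard million (the actual analysis used the full set of standard age bands in SAS):

```python
# Illustrative, collapsed 2000 US standard million weights
# (0-4 and 5-14 are the published band totals; 15+ is the remainder).
STANDARD_MILLION = {
    "0-4":  69_135,
    "5-14": 145_565,
    "15+":  785_300,
}

def age_adjusted_rate(deaths, population):
    """Directly standardized death rate per 100,000.
    `deaths` and `population` are dicts keyed by the age bands in
    STANDARD_MILLION; each band's crude rate is weighted by its
    share of the standard population."""
    total_weight = sum(STANDARD_MILLION.values())
    rate = 0.0
    for band, weight in STANDARD_MILLION.items():
        crude = deaths[band] / population[band]   # band-specific rate
        rate += crude * (weight / total_weight)   # weight to standard
    return rate * 100_000
```

Because every county's rate is weighted to the same standard age structure, counties with very different age distributions can be compared on a single scale before entering the regression.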
Results
The proportion of high, moderate, and low CV scores for each ACS estimate for the annual and multi-year data cycles is shown in Table 3. The first column in the table represents the number of census counties for each cycle that had reportable data to build the estimate. The subsequent columns list the percentages of high, moderate, and low reliability CV estimates for each indicator. Five of the ten indicators within the annual ACS data cycle, including % college/postgraduate degree, % high school graduates, Gini coefficient, median household income, and county population, generated high reliability CV estimates for at least 98 % of all counties each year between 2006 and 2013. These same indicators generated high reliability estimates for at least 95 % of all census counties for the 2007–2013 3-year file. Within the 5-year cycle, the proportion of census areas with high reliability CV estimates was reduced, but none of the same indicators generated high reliability CV estimates for less than 85 % of all counties. Across all cycles, the proportion of 'low reliability' CV estimates was greatest for the indicator % US residency less than 1 year, affecting at least 60 % of all census areas in every data cycle.

Table 4 shows the percentage of the same geographic areas having estimates that can be considered statistically different (p < 0.05) between sequential and semi-sequential multi-year data cycles. Of all indicators and across all configurations, changes in county population were the least likely to be attributed to sampling error. Among the other indicators, the likelihood that the observed differences in the estimates were attributable to real changes in area socioeconomic conditions varied. For example, changes in area unemployment rates between consecutive 3-year surveys were statistically significantly different in as few as 13 % of all counties to as many as 89 % of all counties, depending on survey year.
For both multi-year data cycles, using the minimum number of years of overlap between survey years improved the reliability of the estimates. For the 3-year data cycle, changes in socioeconomic characteristics for approximately 25 % of all census counties (range 4–70 %) were attributed to real changes in area conditions when survey cycles were used every 2 years, compared to an average of 17 % of all counties (range 5–89 %) when survey cycles were used consecutively. Similarly, changes in socioeconomic characteristics for approximately 32 % (range 5–83 %) of all census areas in the 5-year file were attributed to real changes in area conditions when there was a 4-year lag between survey years, compared to an average of 21 % of all counties (range 5–93 %) when survey cycles were used consecutively.

Table 5 shows age-adjusted relative risk estimates for drowning-related deaths across the US from 2006 to 2013 using all ten indicators. The indicators are partitioned by survey year as well as data cycle. Because the first 5-year data cycle was released in 2009, there is 1 year of overlap in the comparison of the 5-year data cycles. Annual mortality estimates for 2005 were excluded because the same person file (DP05) used to construct the age-adjusted rates was not publicly available until 2006. Deaths occurring in census counties having no recorded population denominator were excluded.
Three results from the relative risk evaluations stand out. First, although some indicators were more predictive of injury disparities than others, including % Hispanic, % population without a high school diploma, and % residency less than one year, no indicator produced statistically significant estimates consistently over time or across data cycles. Second, the association between the indicators and drowning-related mortality rates did not consistently improve when the number of census areas included in the comparison increased: on average, the 3-year data cycle produced the greatest proportion of statistically significant estimates for all indicators, and the annual file produced more statistically significant estimates than the 5-year file. Third, the proportion of African American persons per area consistently showed a protective effect on drowning-related mortality rates. This association should be interpreted with caution given the different demographic structure of rural census areas.

Discussion
We evaluated the reliability of 10 socioeconomic indicators derived from the annual and multi-year ACS data cycles that have previously been employed for monitoring disparities in injury risk. In addition, we assessed the predictability of each indicator for facilitating the detection of differences in injury risk on an annual and semi-annual basis at a national scale using county-level injury mortality data. To our knowledge, this is the first study to evaluate the reliability and predictability of the ACS for drowning-related injury mortality surveillance.
Three key findings from this study stand out. First, only five of the ten indicators produced reliable estimates of area socioeconomic conditions, but these five indicators were consistently reliable across all three data cycles. Second, when overlapping multi-year surveys were used to represent changes in annual socioeconomic conditions, fewer than 25 % of all US counties generated estimates that were unlikely to have occurred by chance due to sampling error. Third, no socioeconomic indicator was consistently predictive of disparities in drowning-related mortality over time or across data cycles at the national scale. Taken together, the evidence uncovered by our analysis indicates that there is room for improvement in the reliability of the ACS and that consistently predicting disparities in drowning-related injury risk at the national scale using county-level census data is not possible using these ten indicators.
Although these findings can be considered somewhat discouraging, in our view the ACS data cycles nonetheless provide two opportunities to advance injury surveillance and data collection. First, the study identified five socioeconomic indicators that are consistently reliable across ACS data cycles: % college/graduate degree, % high school graduation, Gini coefficient, median household income, and county population. The majority of these indicators are known to be predictive of other injury-related outcomes (Bell et al. 2014). For these indicators, researchers could employ the annual ACS data cycle as a barometer for current socioeconomic conditions, while using the multi-year files to confirm and add precision to the changing landscape of disparities in injury risk. Second, county population was the most consistently reliable indicator across ACS survey cycles and also the least likely to be affected by sampling error in the overlap analysis. This supports using either an annual or multi-year population file to estimate yearly or semi-yearly changes in drowning-related deaths, which is one of three drowning-related indicators prioritized by the National Center for Injury Prevention and Control (NCIPC) (Thomas & Johnson 2014). By extension, the same indicator could be used to inform the Healthy People 2020 target of a 10 % reduction in drowning-related deaths over the next decade (U.S. Department of Health and Human Services, Healthy People 2020). However, additional tests are required to determine if the annual file can substitute for either of the multi-year data cycles with respect to monitoring age-adjusted trends relevant to hospitalizations or emergency department visits, both of which are monitored by the NCIPC.
That only a small portion of all census areas generated statistically significant differences between overlapping multi-year surveys suggests that researchers should be judicious if using the yearly released multi-year files to represent annual socioeconomic conditions. While socioeconomic conditions do change over time, the fact that the majority of socioeconomic indicators were not statistically significantly different between overlapping survey years suggests that any observed differences obtained from the ACS could be attributed to sampling error rather than 'real' changes in area unemployment rates, poverty levels, or other demographic characteristics. The best way to employ the multi-year files for injury surveillance therefore appears to be on a semi-annual basis, using the full 3- and 5-year gaps between release years. The results of the regression model lend credence to this finding, as the association between the indicators and drowning-related deaths did not increase as a result of increasing the number of census areas in the analysis. Notwithstanding this limitation in the data, access to current socioeconomic information, even if limited to the annual ACS data file, is a significant strength of the survey: it brings greater currency to efforts to monitor trends (Injury Surveillance Workgroup 2003; Kozak et al. 2004), improves the information available for regional policy and planning about prevention needs (Centers for Disease Control 1998; Centers for Disease Control 1999; Hiller et al. 2009; Mireles et al. 2015), and improves access to information that can inform legislation to improve civic safety (Everett et al. 1997; Helmkamp et al. 2012; Pressley et al. 2009).
Importantly, findings from this study must be viewed within the context of three key limitations. First, the evaluation of injury risk was limited to mortality records. While evidence shows that indicators of injury mortality are similarly indicative of injury morbidity (Bell et al. 2014), additional evaluations of hospitalization data against these ACS indicators are needed. Second, we included only area socioeconomic indicators in our evaluation, excluding other important covariates known to lead to disparities in drowning-related injury risk, including alcohol consumption (Petridou & Klimentopoulou 2006), the use of safety devices (Treser et al. 1997), and duration of rescue (Topjian et al. 2012). These exclusions stem from the lack of documentation in national mortality records. Although this information is obtainable from databases such as the National Trauma Registry (NTR), the evaluation would have to be completed on a state-by-state basis using hospital discharge abstracts because the NTR does not release fine-scale geographic identifiers at the patient level. Third, the study was limited to census counties as opposed to smaller geographic scales, such as zip codes, census tracts, or block faces. Previous studies have shown that differences in risk attributable to socioeconomic position, or status, are typically largest when viewed using census tracts or blocks (Hameed et al. 2010; Krieger et al. 2003). However, fine-scale evaluation of the ACS would require state-by-state evaluations as the NCHS does not release US mortality data at a geographic scale below the census county. Within the context of this study's objectives, US county boundaries are the smallest geographic unit available to assess the reliability and predictability of socioeconomic indicators derived from all three ACS data cycles.

Conclusion
Despite being highly preventable, injuries are the leading cause of death in the US among all persons under the age of 44 and also the cause of morbidity and mortality with the steepest social gradient (Steenland et al. 2004). For decades, the long-form decennial census served as the primary source of socioeconomic information on the injured patient due to the lack of contextual data contained in hospital registries. Although the full impact of the 2010 replacement of the long-form census with the ACS is still being determined, evidence thus far has shown key weaknesses of the survey. To date, there has been no report on the reliability of the ACS for injury surveillance despite considerable versatility in its application (Aytur et al. 2013;Curry et al. 2015;Hong et al. 2015;Huguet et al. 2014;Landy et al. 2011;Perry et al. 2015;Sastry & Gregory 2013). Our study adds much needed information about the reliability of the ACS and provides a guideline for appropriate monitoring strategies using the annual and multi-year ACS data cycles for drowning-related injury surveillance and data collection.