Persistent and extreme outliers in causes of death by state, 1999–2013

In the United States, state-specific mortality rates that are high relative to national rates can result from legitimate reasons or from variability in coding practices. This paper identifies instances of state-specific mortality rates that were at least twice the national rate in each of three consecutive five-year periods (termed persistent outliers), along with rates that were at least five times the national rate in at least one five-year period (termed extreme outliers). The resulting set of 71 outliers, 12 of which appear on both lists, illuminates mortality variations within the country, including some that are amenable to improvement either because they represent preventable causes of death or highlight weaknesses in coding techniques. Because the approach used here is based on relative rather than absolute mortality, it is not dominated by the most common causes of death such as heart disease and cancer.


INTRODUCTION
This paper builds upon the findings of "The Most Distinctive Causes of Death by State, 2001-2010," published in the online journal Preventing Chronic Disease in May 2015 (Boscoe & Pradhan, 2015). That paper (formally a "GIS Snapshot," consisting of a single map and accompanying short description) presented the most distinctive cause of death for each state and the District of Columbia for the 2001-2010 period. "Most distinctive" was defined as the highest ratio of state-specific death rate to national death rate among the causes of death included in the 113 Selected Causes of Death List published by the National Center for Health Statistics (2002). For example, the age-adjusted death rate due to pneumoconiosis nationwide was 0.3 per 100,000, but in Kentucky it was 1.0 and in West Virginia it was 3.9. The respective ratios of 3.3 and 12.4 were higher than for any other cause of death in these states, making them the most distinctive. The mapped causes of death can also be understood as those with the highest state-specific relative risks, the highest location quotients (Mayer & Pleeter, 1975), or as the largest outliers. In general, the identification of outliers is useful for assessing the integrity of a data set and for identifying genuinely unusual phenomena that can give rise to hypotheses (Osborne & Overbay, 2004).
In the time since the original paper was first submitted for publication, three additional years of data have become available. I incorporate these data into an alternative way of conducting the analysis that identifies what I term persistent outliers and extreme outliers. Persistent outliers were defined as those causes of death with an age-adjusted rate that was at least twice the national rate in each of the five-year time periods 1999-2003, 2004-2008, and 2009-2013. Extreme outliers were defined as those causes of death with an age-adjusted rate that was at least five times the national rate in at least one of the time periods. Identifying all of the outliers in this manner, instead of identifying exactly one per state as was done on the original map, is a more inclusive means of summarizing the data.
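The two definitions above can be illustrated with a minimal Python sketch. This is not the code used for the analysis; the function names and the example rates are hypothetical, and a suppressed state rate is represented here (by assumption) as None.

```python
# Illustrative sketch only: hypothetical names and made-up rates per 100,000.
PERIODS = ("1999-2003", "2004-2008", "2009-2013")


def rate_ratios(state_rates, national_rates):
    """Return the state/national rate ratio for each period.

    A state rate of None stands for a suppressed value and yields None.
    """
    ratios = {}
    for period in PERIODS:
        state = state_rates.get(period)
        national = national_rates[period]
        ratios[period] = None if state is None else state / national
    return ratios


def is_persistent(ratios, threshold=2.0):
    """At least `threshold` times the national rate in every period."""
    return all(r is not None and r >= threshold for r in ratios.values())


def is_extreme(ratios, threshold=5.0):
    """At least `threshold` times the national rate in at least one period."""
    return any(r is not None and r >= threshold for r in ratios.values())


national = {"1999-2003": 0.3, "2004-2008": 0.3, "2009-2013": 0.3}

# High in every period: both persistent and extreme.
state_a = rate_ratios(
    {"1999-2003": 3.9, "2004-2008": 3.5, "2009-2013": 3.0}, national
)
# Falls below twice the national rate in the middle period: neither.
state_b = rate_ratios(
    {"1999-2003": 0.7, "2004-2008": 0.5, "2009-2013": 0.9}, national
)
# One very high period plus a suppressed period: extreme but not persistent.
state_c = rate_ratios(
    {"1999-2003": 1.6, "2004-2008": None, "2009-2013": 0.8}, national
)

for name, ratios in (("A", state_a), ("B", state_b), ("C", state_c)):
    print(name, is_persistent(ratios), is_extreme(ratios))
```

Treating a suppressed period as failing the persistence test is one possible design choice in this sketch; it reflects the fact that a rate cannot be shown to exceed the threshold when it is not reported.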

METHODS
National and state-specific age-adjusted death rates for all of the causes of death included in the 113 Selected Causes of Death List for the period 1999-2013 were obtained from the Centers for Disease Control and Prevention (CDC) Wide-ranging Online Data for Epidemiologic Research (WONDER) web site (CDC, no date). This list was developed for the general analysis of mortality data and for ranking causes of death and is based on International Classification of Diseases version 10 (ICD-10) codes. The list includes some overlapping cause-of-death categories; counting these yields a total of 136 causes of death. Data were divided into three 5-year periods: 1999-2003, 2004-2008, and 2009-2013. The ratios of the state rates to the national rates for each cause of death in each period were calculated, and persistent and extreme outliers were identified, as defined above. State-level counts below 10 were suppressed by WONDER and excluded from the analysis; fewer than 0.01% of deaths fell into this category, which included causes of death so rare that no data were reported for any state (as with measles, 12 deaths nationally in 15 years) or were reported for only a few of the largest states (as with whooping cough, with data reported only for California and Texas). State-level counts between 10 and 19, marked by WONDER as "unreliable," were included in the analysis. Confidence intervals (95%) around the ratios were determined using the RELRISK option in the FREQ procedure in SAS version 9.3 (SAS Institute, Cary, North Carolina, USA). Results were tabulated for all causes of death in the list, even where the classifications overlapped, as for example with homicide, homicide by firearm, and homicide by other and unspecified means. An exception was made for "other and unspecified events of undetermined intent and their sequelae" and "events of undetermined intent," because these two categories were nearly identical: the first comprised over 99% of the second. Only the second, more inclusive category is reported here.
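For readers without access to SAS, the following Python sketch shows one common way to attach a 95% confidence interval to a ratio of two rates: a Wald interval on the log scale, computed from counts and populations. It is an approximation offered for illustration, not a reproduction of the SAS output reported here. The function name and the example counts are hypothetical, the rates are crude rather than age-adjusted, and the state and national figures are treated as independent groups, which is a simplification because each state's deaths are also part of the national total.

```python
import math


def rate_ratio_ci(state_deaths, state_pop, national_deaths, national_pop, z=1.96):
    """Return (ratio, lower, upper) for a crude state/national rate ratio.

    Uses the standard large-sample variance of log(relative risk):
    1/a - 1/n1 + 1/c - 1/n2, where a and c are the death counts and
    n1 and n2 are the corresponding populations.
    """
    state_rate = state_deaths / state_pop
    national_rate = national_deaths / national_pop
    ratio = state_rate / national_rate
    se_log = math.sqrt(
        1 / state_deaths - 1 / state_pop
        + 1 / national_deaths - 1 / national_pop
    )
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 50 deaths among 1,000,000 residents in a state versus
# 1,000 deaths among 100,000,000 people nationally.
ratio, lower, upper = rate_ratio_ci(50, 1_000_000, 1_000, 100_000_000)
print(f"ratio={ratio:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

With small state-level counts, such as those marked "unreliable" by WONDER, an interval of this kind becomes wide, which is worth keeping in mind when reading ratios based on few deaths.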

RESULTS
There were 62 persistent outliers among 28 states plus the District of Columbia (Table 1). The District of Columbia had the most persistent outliers, with 9, while there were 22 states without any. There were 38 extreme outliers among 14 states plus the District of Columbia (Table 2). The District of Columbia led with 7, while 36 states did not have any. Twelve of the persistent outliers also appeared on the list of extreme outliers: water and air accidents (Alaska), events of undetermined intent (Maryland and Utah), other acute ischemic heart disease (Oklahoma and Virginia), influenza (South Dakota), and pneumoconiosis (West Virginia), plus five in the District of Columbia: HIV, homicide, homicide by firearm, hypertensive heart disease, and atherosclerotic cardiovascular disease.

DISCUSSION
The tables highlight instances where state mortality rates exceeded national rates by substantial margins. These can be understood as either genuine phenomena, where the risk of death due to a certain cause was truly elevated, or as artifacts of state-specific coding practices. The former category includes unambiguous infectious and chronic diseases such as viral hepatitis and pneumoconiosis, and well-specified types of accidents such as accidental drowning and exposure to smoke, fire and flames. The latter category includes causes of death containing the words "other," "unspecified" and "unknown," where a state, for whatever reason, was unable to code deaths to the same level of specificity as other states. There are a number of possible explanations for this lack of specificity. The information could have truly been absent: a physician or coroner might have indicated only something like "cardiac arrest" on the death certificate, for example, with insufficient resources available to follow up and obtain something more precise. It is also possible that coding guidelines were interpreted overly strictly or literally, or were perceived as unclear, outcomes that are influenced by the experience level of the death certifier (Johnson et al., 2010). There could also have been instances of "motivated misreporting," in which the person filling out the death certificate had an incentive to be vague (Osborne & Overbay, 2004). An example of this has occurred in Maryland, where the state's chief medical examiner is on record that many "events of undetermined intent" (which include unresolved homicides, suicides, and accidents) cannot be coded more specifically without input from the legal system, even though the medical determination of intent is distinct from the legal determination (Fenton, 2012). Critics have argued that this practice substantially suppresses the official homicide rate. Indeed, Maryland's rate of "events of undetermined intent" was 6 to 7 times the national average in each of the three time periods.
For some of the reported outliers, it is not obvious whether the findings were genuine, an artifact, or some combination of the two. For example, influenza, which appeared as an outlier in 9 different states (Iowa, Maine, Minnesota, Montana, Nebraska, North Dakota, South Dakota, Vermont, and Wyoming), would seem to be a clearly defined cause of death. Yet the number of deaths due to influenza is small, totaling 3,697 in 2013 (Centers for Disease Control and Prevention, 2015). Influenza deaths are perceived as common because people tend to be more familiar with the counts of influenza and pneumonia combined (56,979 in 2013, placing it among the top ten causes of death nationwide when so grouped), or with the number of influenza-associated deaths (estimated at 20,000-30,000 annually, a number derived from mathematical models rather than death certificates; Doshi, 2008; Thompson et al., 2009; Centers for Disease Control and Prevention, 2015). Of the comparatively small number of deaths officially ascribed to influenza, a minority were confirmed with a lab test (these receive ICD-10 codes J09 and J10), while the remainder were based on observation (these receive code J11). The nine states with unusually high influenza death rates may simply have been more aggressive in ordering lab tests, or more willing to call influenza-like illness influenza, rather than having had a true excess risk.
Note that this analysis was only able to identify likely examples of substantial overreporting in certain causes of death. There have also been well-documented examples of substantial underreporting, such as with suicide (Klugman, Condran & Wray, 2013), pregnancy-related deaths (Deneux-Tharaux et al., 2005), and injuries from falls (Betz, Kelly & Fisher, 2008). In some cases, such as with "events of undetermined intent," the overreported category can imply which categories were likely underreported, but a separate analysis would be required to properly identify these negative outliers; such an analysis would be complicated by the suppression of counts less than 10. For the present analysis, the suppression of counts less than 10 might have masked some potentially interesting information (for example, if the 12 measles deaths had been concentrated in just a few states), but by definition would not have masked anything of widespread public health importance.
Each of the causes of death highlighted in the tables suggests a story about mortality disparities, mortality coding disparities, or some combination of the two that demands further investigation. In the interest of brevity, I will comment only on the dozen entries that appeared in both tables. The District of Columbia, with 5 of the 12, revealed itself as an outlier among outliers. Although not a state, its data are typically reported with the 50 states, as was done here. It is unique among "states" in having an African-American majority and being entirely urban. It also has the highest poverty rate and income inequality of any "state," making it an outlier by numerous measures. The high rates of HIV-related deaths and homicide seen here reflect the urban pathologies of intravenous drug use and crime, while hypertensive heart disease and atherosclerotic cardiovascular disease reflect DC's racial composition, although the precise reasons for greater hypertension among black Americans remain elusive (Fuchs, 2011).
Moving to Alaska, the classification "water, air and space, and other and unspecified transport accidents and their sequelae" has a straightforward explanation: travel by water and air is vastly more common there than in other states, and is the only way to reach many settlements within the state. Pneumoconiosis, more commonly known as black lung disease, has a similarly obvious association with West Virginia, the state most closely associated with coal mining. "Events of undetermined intent," with high rates in Maryland and Utah, has already been discussed, as has influenza, with particularly high rates in South Dakota.
That leaves "other acute ischemic heart disease," which appeared in both Oklahoma and Virginia. From 1999 to 2003, the rate for this cause of death was over 25 times the national average in Oklahoma, making it the most extreme outlier in this entire analysis. The rate subsequently dropped to 9 times the national average in 2009-2013, still one of the more extreme values. This is a clear example of coding imprecision, reflecting an inability to distinguish among chronic heart disease, heart attack (myocardial infarction), and a few other less common conditions in a manner not shared by other states. For any studies