Public Safety and Faulty Flood Statistics

We thank Black (2008) for his thoughtful assembly of diverse perspectives, including ours, on why another disastrous Mississippi River flood occurred so soon after 1993. We have an update that elucidates and quantifies several key points of his article, particularly that a) flood frequency and heights are increasing, b) water levels for regulatory "100-year" floods are profoundly underestimated, and c) misconceptions about risk confound appropriate responses to flooding. At many sites in Iowa and Missouri the Flood of 2008 neared or exceeded the record "200-year" or "500-year" levels attained in 1993 [National Weather Service (NWS) 2008; U.S. Army Corps of Engineers (USACE) 2008b], prompting many to wonder how two such events could occur in only 15 years. Defenders of flow-frequency predictions cite their rigorous methods and assure us that improbable outcomes are possible, even two "100-year" floods in consecutive years (U.S. Geological Survey 2005). Such arguments miss the key issue: Are the flood probabilities calculated by USACE (2004, 2008a) credible? Problems with the USACE (2004, 2008a) probabilities are exemplified by recent flooding at Hannibal, Missouri (Figure 1). The record stage set in 1993 exceeded the calculated 500-year level, whereas 2008 was a 200-year event. In addition, 2001 suffered a 50- to 100-year flood, 1986 and 1996 experienced 25- to 50-year floods, and five more years had 10- to 25-year floods. Are these calculated recurrence intervals reasonable, or is it more likely that the dice, in effect, are loaded?
Statistically, two 200-year floods would likely not occur in an interval < 330 years, but Hannibal has recently had far worse (Figure 1). Two 500-year floods would probably not occur in < 840 years, yet two such floods recently occurred at Canton, Missouri, and Burlington, Iowa. A chi-square statistical test rejects the assumed correctness of the USACE frequencies with 99.9% confidence. The 100-year flood stage at Hannibal should be realistically redefined as a 10-year flood, as reported by Black (2008).
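The intervals quoted above can be approximated with a short calculation. The sketch below is one possible reading, not necessarily the method the authors used: it assumes that exceedances of a T-year flood level arrive as a stationary Poisson process with rate 1/T per year, and it takes "likely not" to mean a probability below 50%. The function names are illustrative only. Under those assumptions, the span at which two or more exceedances become more likely than not is roughly 336 years for a 200-year flood and 839 years for a 500-year flood, close to the 330 and 840 years cited.

```python
import math


def prob_at_least_two(years: float, return_period: float) -> float:
    """P(two or more exceedances within `years`), assuming a Poisson model."""
    mu = years / return_period  # expected number of exceedances in the span
    return 1.0 - math.exp(-mu) * (1.0 + mu)


def median_span_for_two(return_period: float) -> float:
    """Span (years) at which P(>= 2 exceedances) reaches 50%, found by bisection."""
    lo, hi = 0.0, 10.0 * return_period  # the probability is ~1 at the upper bound
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if prob_at_least_two(mid, return_period) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for t in (200, 500):
        print(f"{t}-year flood: two events become likely after about "
              f"{median_span_for_two(t):.0f} years")
    # Chance of two 200-year floods within a 15-year window (e.g., 1993 to 2008)
    print(f"P(two 200-year floods in 15 years) = {prob_at_least_two(15, 200):.4f}")
```

The stationarity assumption is exactly what the letter disputes, so the calculation shows only how improbable the recent record is if the published recurrence intervals are taken at face value.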
Because floods are becoming more frequent and more severe over much of the Mississippi River basin (Black 2008; Criss and Shock 2001; Remo and Pinter 2007), statistical calculations based on the historical record are not appropriate predictors of future flooding, particularly for extreme events (Klemes 2000). The likelihood of attaining a given flood stage today is not the same as it was a century ago, and besides, the record is far too short to calculate what a 200-year or a 500-year flood might be.
As Black (2008) discussed, calculated flood probabilities are not merely academic. The Federal Emergency Management Agency (FEMA) and the National Flood Insurance Program use these calculations to delimit 100-year flood zones and to set insurance requirements and rates. Understated risk burdens everyone with debt (Crichton 2002; Davidson 2005) and places humans and property at risk from water that is not only too high but also laden with contaminants and sediment. Unfortunately, erroneous calculations (USACE 2004, 2008a) are being used in the very latest proposals (USACE 2008c) for flood protection and ecosystem restoration. We need to use realistic concepts about flooding in our management plans.
In their commentary, Huff et al. (2008) proposed that exposing experimental animals to test substances in utero and for 30 months or until their natural deaths increases the sensitivity of bioassays, avoids false negative results, and strengthens the value and validity of results. Instead, longer exposure results in increased numbers of spontaneously arising tumors, as well as increased cost and animal suffering, while failing to address the bioassay's fundamental flaws.

Perspectives | Correspondence
A 516 VOLUME 116 | NUMBER 12 | December 2008 • Environmental Health Perspectives
The correspondence section is a public forum and, as such, is not peer-reviewed. EHP is not responsible for the accuracy, currency, or reliability of personal opinion expressed herein; it is the sole responsibility of the authors. EHP neither endorses nor disputes their published commentary.

Although it is troubling when the bioassay produces false negative results, a far more pervasive problem is that of false positives. In our analysis of > 500 National Toxicology Program (NTP) bioassays [People for the Ethical Treatment of Animals (PETA) 2006], we found that more than half of the substances evaluated (259) produced evidence of carcinogenicity in at least one group of animals, but only about one-third of these (89) were subsequently classified as known or probable human carcinogens by the NTP itself. Even fewer of the substances, 40 and 16, respectively, were classified as carcinogens by the U.S. Environmental Protection Agency (EPA) and the International Agency for Research on Cancer (PETA 2006). This high false-positive rate is thought to be largely an indirect effect of increased cell proliferation in response to cell injury and death caused by the near-toxic doses of test substances used in the bioassay (Gaylor 2005). Species-specific modes of action operating in rats or mice but not in humans, such as those mediated by α2u-globulin, peroxisomes, and thyroid-stimulating hormone, also contribute to the high rate of false positives (Cohen 2004). Huff et al. (2008) asserted that one of the "well-accepted observations" upon which the "relevance of experimental bioassays to humans" rests is that "findings from independently conducted bioassays on the same chemicals are consistent." In fact, in a comparison of 121 bioassays from the NTP database with those in the published scientific literature, Gottmann et al. (2001) found that the studies produced consistent results only 57% of the time. Huff et al. (2008) cited questions about the safety of aspartame raised by 3-year bioassays conducted by the Ramazzini Foundation to support their conclusions.
Although they noted that the European Food Safety Authority (FSA) and the U.S. Food and Drug Administration (FDA) dispute these studies' conclusions, the FSA's Committee on Carcinogenicity (COC 2006) observed that "In view of the inadequacies in design of the [Ramazzini Foundation] study and the use of rats with a high concurrent infection rate, the COC considered that no valid conclusions could be derived from it."
Further, the COC noted that groups of animals fed aspartame had lower body weights and thus lived longer, which may have compromised the results by leading to an apparent increase in spontaneously arising tumors. Considering that lower body weights are typically observed among animals in the bioassay's experimental groups, this is likely to generally confound the interpretation of longer bioassays.
We must stress that animals suffer during the bioassay: They live in the barren, stressful conditions of the laboratory (often including daily forced feeding or inhalation), and many also suffer from exposure to near-toxic doses of test substances. These exposures often produce lethargy, anemia, diarrhea, weight loss, and other symptoms of sickness and distress. The proposal of Huff et al. (2008) to extend the length of the bioassay would obviously result in a proportional increase in this suffering.
Further, extending the bioassay runs counter to current trends in regulatory testing. Concern for the suffering of animals has caused regulatory agencies to review the usefulness of long-term studies, resulting in elimination of the 1-year dog toxicity test (U.S. EPA 2007) and an international effort to replace the two-generation reproductive toxicity test (Cooper et al. 2006). Huff et al.'s proposal thus clearly represents a step backward for toxicological science.
According to the NTP's own estimates, each bioassay requires 5 years to plan, conduct, and evaluate; 860 animals to be killed; and $2-$4 million. As a result, the NTP has conducted an average of only 12 bioassays/year over the past several decades. Considering that humans are thought to be exposed to approximately 80,000 environmental toxicants (Ward et al. 2003), it would take more than 32 millennia, 68 million animals, and $160 billion to test them all at this rate. Once again, extending the length of the bioassay would only increase these already ridiculous numbers.
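The scale of these totals can be checked with simple arithmetic. The sketch below uses only the per-bioassay figures quoted above; the constant and function names are illustrative. It reproduces the animal count (about 68.8 million) and the lower-bound cost ($160 billion). The time estimate depends on how throughput is modeled: dividing 80,000 substances by 12 bioassays/year is one simple reading, and it yields a shorter span than the figure given in the letter, which presumably rests on an assumption not stated in the text.

```python
CHEMICALS = 80_000            # approximate number of environmental toxicants
ANIMALS_PER_BIOASSAY = 860    # animals killed per bioassay
COST_LOW_USD = 2_000_000      # lower bound of cost per bioassay
COST_HIGH_USD = 4_000_000     # upper bound of cost per bioassay
BIOASSAYS_PER_YEAR = 12       # average NTP throughput


def testing_totals(chemicals: int = CHEMICALS) -> dict:
    """Totals for testing every substance once at the quoted per-bioassay figures."""
    return {
        "animals": chemicals * ANIMALS_PER_BIOASSAY,
        "cost_low_usd": chemicals * COST_LOW_USD,
        "cost_high_usd": chemicals * COST_HIGH_USD,
        # Assumption: a steady throughput of 12 completed bioassays per year.
        "years_at_stated_rate": chemicals / BIOASSAYS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in testing_totals().items():
        print(f"{name}: {value:,.0f}")
```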
The time has clearly come for antiquated animal tests such as the bioassay to be abandoned in favor of modern, human-relevant methods such as epidemiologic studies, high-throughput in vitro methods, and computational toxicology.

We appreciate and sympathize with Manuppello and Willett's concerns about the inappropriate use of animals in research. Significant advances in animal protection have been secured in the United States through standards that have been promulgated, and we hope that further progress will be made. Drugs today are often developed using methods that evaluate impacts in vitro and take advantage of innovations in three-dimensional structure-activity modeling. Still, because in vitro and theoretical methods are imperfect and because relying on them alone could result in considerable harm to humans, drugs and other chemicals are studied (and are required to be studied) in animals before being tested on humans or introduced into the human environment.
Epidemiologic studies, high-throughput in vitro methods, and computational toxicology would be preferable to animal research if they provided sensitive, accurate measures of human risk. Unfortunately, epidemiology studies are insufficiently sensitive, in vitro methods are insufficiently predictive, and computational toxicology is insufficiently developed. Ideally, government would provide much greater funding to further develop those and other technologies to expedite less expensive and more accurate means of risk assessments and reduce the need for animal bioassays.
The fact that positive animal bioassays have not always been mirrored in positive human findings provides an ethical public policy challenge. If we must wait for human evidence of cancer before acting to prevent future cases, then we are conducting experiments on unwitting subjects without controls, especially if animal evidence already indicated a risk.
Current standards for concluding that an agent is a human carcinogen require statistically significant proof of sufficient numbers of cases of cancer in humans with measured or estimated exposures. Human cancers can take up to several decades to become evident in populations. Surveillance of the workplace and general environmental monitoring are not being widely conducted at this time. The last national survey of workplace carcinogens was conducted in the last century, and no new survey is planned at this time. Furthermore, except in cases where the cancer risks are enormous, such as tobacco smoking and workplace exposure to asbestos, linking a chemical outside the workplace, for example, in the diet, air, or water, to human cancers is virtually impossible.
In fact, the 2.5- to 3-year bioassays of the European Ramazzini Foundation of Oncology and Environmental Sciences on toluene, benzene, radiation, and aspartame (Soffritti et al. 2004, 2007) consistently indicate that the results of 2-year studies (the normal length of rat studies) underpredict potency/carcinogenicity and that true lifetime studies more accurately reflect cumulative impacts. In addition, the paradigm-busting work of researchers on transgenerational effects, male-mediated teratogenesis, and the general critical impact of early developmental windows strengthens the case for regarding 2-year postnatal bioassays as incomplete indications of toxicity/carcinogenicity (e.g., Dolinoy et al. 2007; Newbold and McLachlan 1982; Sonne et al. 2008; Swan et al. 2006).
In our commentary (Huff et al. 2008) we indicated that 2-year results tend to be biased toward the null hypothesis. In no way do we state or imply that a positive finding in a 2-year assay would be negated by a longer test, nor do we indicate the contrary, that the absence of a positive finding in a 2-year assay should be construed as proof that there is no impact.
The current typical bioassay has served society well, but it does have several flaws: It cannot provide adequate prediction for the growing proportion of the population that is now living well into their eighties and nineties; it is not designed to evaluate the impact of prenatal exposures on later life; rodents are not a perfect model for humans; and countless animals must be sacrificed to obtain admittedly imperfect results.
By focusing improved bioassays on high-volume chemicals for which there are a priori grounds for concern, current approaches promise to yield the greatest good with the least harm. Simultaneously, more research must be done to identify fast, sensitive, accurate, and economical means of identifying chemicals that might be harmful to humans.

Brauer et al. [Environ Health Perspect 116:680-686 (2008)] identified several minor errors in their recent publication. First, in the "Results" section of the abstract, the values given for low full-term birth weight (LBW) were incorrect; "Residence within 50 m of highways was associated with … a 22% (95% CI, 0.81-1.87) increase in LBW." There were also inconsistencies in the description and application of missing data rules across all pollutants (Tables 3-7); the description of these rules in the text (bottom of p. 681) should read "For both approaches, a subject was considered missing if there was a gap of > 5 consecutive days in air monitoring data or if there were > 10 missing days within the term of the pregnancy."

M.F.J. is employed by an advocacy organiza
In Tables 2, 3, and 5, results for the small-for-gestational-age (SGA) outcome were presented for births restricted to gestational periods of ≥ 37 weeks, and not for the full cohort; in Tables 2, 3, 5, and 7, coding errors in the application of birth weight and gestational period cutoffs resulted in slight differences in the number of cases of SGA and preterm births; and in Tables 5 and 6, one road proximity measure was not labeled correctly: < 150 m from a major road/highway was actually 50-150 m from a major road/highway.
Correcting these errors results in small changes in the numbers of subjects included in specific analyses and slight differences in odds ratios (ORs) but does not alter our overall findings. The corrected tables are presented below.
The authors apologize for the errors.