Hormesis: Calabrese Responds

In his letter, Mushak revisits his criticism (Mushak 2009) of previously reported hormesis frequency estimates (Calabrese and Baldwin 2001, 2003; Calabrese et al. 2006, 2008). In my commentary (Calabrese 2009), I addressed and rebutted his arguments (Mushak 2009) in considerable detail, and no new data require me to revise that response. Here I address the three key areas raised by Mushak’s letter: two relate to the frequency of hormesis, and the third concerns the acceptance of hormesis by the scientific and regulatory communities.
 
First, a central point of Mushak’s commentary (Mushak 2009) and his letter is his assertion that the reported hormesis frequency of 37% (Calabrese and Baldwin 2001) is incorrect and should be 11%. Unfortunately, Mushak used the wrong denominator in his commentary, and he perpetuates this error in his letter. Briefly, we (Calabrese and Baldwin 2001) estimated the frequency of hormesis using a priori entry and evaluative criteria; some 668 dose responses satisfied the entry criteria. There were three independent evaluative criteria (i.e., hypothesis testing, nonoverlapping 95% confidence intervals, and alternative quantitative criteria). Of the 668 dose responses, 213 (31.8%) involved hypothesis testing. Of this total, 74 (74/213; 34.7%) satisfied the evaluative criteria for hormesis, a percentage similar to the other two evaluative approaches. When totaled, the three approaches yielded the 37% estimate. Mushak’s error is that he compared the 74 dose responses that satisfied the evaluative criteria for hypothesis testing not against the 213 dose responses that had hypothesis testing (which would have been a correct approach) but against all 668 dose responses, even though the remaining 455 dose responses that satisfied the entry criteria lacked hypothesis testing. None of these 455 dose responses could have been evaluated by the statistical criteria. Nonetheless, Mushak combined all the dose responses that satisfied the entry criteria and derived a hormesis frequency based on only dose responses with statistical significance. In so doing, he mistakenly reduced the 37% frequency to 11%. His method is the equivalent of using a raw score for the math component of the Graduate Record Examination (GRE) as the only source of correct answers, and then using all the questions on the math, verbal, and analytic components of the exam as the denominator, even though the student did not take these other components of the test. Such a calculation would give a useless GRE score. His method of hormesis calculation is clearly why he obtained the incorrect lower frequency.
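The two percentages at issue follow directly from the counts quoted above, so the arithmetic can be checked by hand or with a few lines of code. The following is a minimal illustrative sketch, not part of the original analysis; the variable and function names are assumptions chosen for readability, and only the counts 668, 213, and 74 come from the text.

```python
# Counts quoted in the text (Calabrese and Baldwin 2001).
entry_total = 668          # dose responses satisfying the entry criteria
hypothesis_tested = 213    # subset evaluated by hypothesis testing
hormetic_by_testing = 74   # tested dose responses meeting the evaluative criteria

# Dose responses that lacked hypothesis testing and therefore could not
# be evaluated by the statistical criterion.
not_tested = entry_total - hypothesis_tested


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Frequency among the dose responses that were actually hypothesis-tested.
freq_within_tested = percent(hormetic_by_testing, hypothesis_tested)

# Frequency when the same numerator is divided by all entry candidates.
freq_over_all_entries = percent(hormetic_by_testing, entry_total)

print(f"not hypothesis-tested: {not_tested}")
print(f"74/213 = {freq_within_tested}%")
print(f"74/668 = {freq_over_all_entries}%")
```

The same numerator gives roughly 35% or roughly 11% depending only on which denominator is chosen, which is the point in dispute.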
 
Second, in his letter Mushak continues to cite a letter by Crump (2007) for which there is no support in the literature; also, Crump’s letter is based on an assumption about methods that was refuted by the National Cancer Institute investigators who actually did the original work (Calabrese et al. 2007). Mushak apparently does not grasp that Crump’s exercise inappropriately introduced 8-fold more variability into the data analysis. In his letter, Mushak incorrectly and inexplicably claimed that Crump’s analysis resulted in my conceding that the hormetic responses that we reported were not different from control responses.
 
Third, Mushak’s inflexibility concerning hormesis is reflected in his comments that minimize the impact of hormesis and its growing applications. Despite the significant biomedical impact of hormesis, Mushak fails to acknowledge the reality that hormetic effects are the basis for how most anxiolytic (Calabrese 2008a), antiseizure (Calabrese 2008b), memory (Calabrese 2008c; Zoladz and Diamond 2009), Alzheimer disease (Calabrese 2008c; Congdon et al. 2009), and numerous other classes of drugs work (Kastin and Pan 2008; Mattson 2008; Sonneborn 2008; Thong and Maibach 2008), with all such drugs having to pass the regulatory oversight of the Food and Drug Administration for efficacy and safety. On the environmental side, Mushak, in both his letter and his commentary (Mushak 2009), did not acknowledge that the largest-ever rodent cancer bioassay (24,000 mice), which was designed to determine the nature of the dose response in the low-dose zone for carcinogens, revealed hormetic responses for acetylaminofluorene-induced bladder cancer and that this was affirmed by the 14-member Society of Toxicology expert panel convened to assess these findings (Society of Toxicology ED01 Task Force 1981). In both his letter and his commentary (Mushak 2009), he also failed to acknowledge that hormesis has had a meteoric rise in recognition and journal citations within the scientific community, from 15 citations per year in the 1980s to > 2,400 in 2009 alone.
 
On these grounds and those presented in my commentary (Calabrese 2009), I conclude that Mushak’s arguments are without merit.


In his commentary in Environmental Health Perspectives, Calabrese (2009) offered a number of responses to my critique of hormesis methodology (Mushak 2009). Here I will provide a counterpoint to that effort.
• Calabrese (2009) falsely asserted that I erred in calculations associated with entry and evaluatory criteria for hormesis frequency, specifically by choosing the wrong denominator for examining the proportion of entry candidates eventually found to be hormetic using the most conventional form of statistical significance. The choice of a denominator for these calculations depends on the question asked. My key question was, What proportions of 668 dose-response entry candidates from 20,285 original articles, using the three criteria identified by Calabrese and Baldwin (2001), partition into each of three hormesis categories? A total of 245 of the 668 candidate dose responses (37%) had hormetic character, but only 74 of those (30%) were derived using the typical statistical significance test, yielding 11% overall.
• Calabrese (2009) mischaracterized my statements about the reliability of the two unvalidated selection criteria (Mushak 2009). My comments addressed applying criteria to screening large databases of publications for a putative new phenomenon. I was not concerned about routine uses of statistical forms for empirical data (e.g., analyses using 95% confidence intervals on independent means).
• Calabrese (2009) misunderstood my concerns about the two tallies of dosing points (1,089 and 1,791 points) from two of his previous studies (Calabrese and Baldwin 2001, 2003b). The still unanswered question is how the 871 (80% of 1,089) control-equivalent and threshold response-compatible dosing points reported by Calabrese and Baldwin (2001) are mathematically incorporated into a high preponderance of hormetic dosing points (to a 2.5:1 ratio) they reported later (Calabrese and Baldwin 2003b).
The author's host institution, the University of Massachusetts, has received annual financial contributions from ExxonMobil to support low-dose research activities; these contributions were not used to support activities related to this manuscript. The author directs the BELLE project and two annual conferences and obtains funding for these activities from a variety of sources. These funds are processed by the host university. These contributions were also not used to support activities related to this manuscript. During the last 3 years he has also received support for travel and honoraria for seminars on hormesis delivered at Lilly and Sanofi-Aventis and several universities.

Edward J. Calabrese
Environmental Health Sciences Division
School of Public Health and Health Sciences
University of Massachusetts
Amherst, Massachusetts
E-mail: edwardc@schoolph.umass.edu
doi:10.1289/ehp.1001979

Environmental Health Perspectives, volume 118, number 4, April 2010, A 154 (Correspondence)

Lead in Drinking Water as a Public Health Challenge
In drinking water supplies, the intake of the toxic heavy metal lead is commonly due to metal corrosion in the peripheral water distribution system, especially the user's plumbing or lead service lines. Recently, the problem again received attention in the United States when testing data of drinking water at schools was published (Renner 2009). In Europe several countries are known to have significant numbers of buildings with elevated lead tap water concentrations, for example, the United Kingdom (Watt et al. 1996), Austria (Haider et al. 2002), and Germany (Becker et al. 2001).
Lead exposure from drinking water has been a topic of public health prevention programs in several parts of Germany before, for example, Hamburg (Fertmann et al. 2004) and Frankfurt (Hentschel et al. 1999). In 2005 in the northern German state of Lower Saxony, a prevention program was initiated comprising three different approaches at the same time to achieve a widespread effect. To assess the present state of drinking water contamination with lead, a free examination of lead in tap water (after nocturnal stagnation) was offered in cooperation with local public health departments for private households that included young women and families with children (Zietz et al. 2007, 2009). Along with the collection of data, the program aimed to focus public attention on this public health problem. In another part of this program, data from local public health departments on existing lead measurements, especially in public buildings, were collected and analyzed (Zietz et al. 2007, 2009). Finally, a working group on lead replacement, consisting of representatives of all relevant parties (e.g., tenant and landlord associations, craftspeople, building and health administrations), was initiated. In the screening part of the project, a total of 2,901 tap water samples from households were collected during 2005-2007. Of these, 7.5% had lead concentrations > 10 µg/L (recommended limit of the World Health Organization) and 3.3% had concentrations above the present limit of the German drinking water ordinance (25 µg/L) (Zietz et al. 2009). We found remarkable regional differences in the frequency of tap water contamination. An additional inclusion criterion in this study was that buildings must have been constructed before 1974 (after which no new lead pipes were installed); therefore, the results cannot be compared directly to other studies.
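Exceedance figures of this kind are simply the share of samples above each limit. The following is a minimal sketch of that tally, assuming a plain list of concentrations in µg/L; the function name and the sample values are hypothetical and invented for illustration, and only the two limits (10 µg/L and 25 µg/L) come from the text.

```python
WHO_LIMIT_UG_L = 10.0      # World Health Organization recommended limit (from the text)
GERMAN_LIMIT_UG_L = 25.0   # German drinking water ordinance limit (from the text)


def exceedance_shares(concentrations_ug_l):
    """Return the percentage of samples strictly above each limit."""
    n = len(concentrations_ug_l)
    if n == 0:
        raise ValueError("at least one sample is required")
    above_who = sum(c > WHO_LIMIT_UG_L for c in concentrations_ug_l)
    above_german = sum(c > GERMAN_LIMIT_UG_L for c in concentrations_ug_l)
    return {
        "above_10_ug_l_pct": 100 * above_who / n,
        "above_25_ug_l_pct": 100 * above_german / n,
    }


# Hypothetical stagnation-sample values (µg/L), invented for illustration;
# they are NOT data from the Lower Saxony program.
example_samples = [0.5, 1.2, 3.0, 4.8, 9.9, 10.0, 12.5, 26.0, 2.1, 0.8]
shares = exceedance_shares(example_samples)
print(shares)
```

A sample equal to a limit is not counted as exceeding it here, matching the "> 10 µg/L" wording; whether the program treated boundary values this way is an assumption.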
From the data, we roughly estimated that about 4.7% of all households in Lower Saxony have lead concentrations > 10 µg/L (Zietz et al. 2009). In an earlier study in southern Lower Saxony (Zietz et al. 2001a), households with mothers of newborn babies from the area around the university city of Göttingen were investigated. Of the 1,434 stagnation samples, 3.1% had lead concentrations > 10 µg/L.
A moderately higher percentage of households with elevated composite water samples was found in the geographic area of the city of Berlin using two composite water sampling methods (5.6% and 7.0%, respectively). In total, 2,109 households were tested with both methods in the federal state of Berlin (Zietz et al. 2001b). In a representative study of samples collected in all parts of Germany during 1997-1999 (Becker et al. 2001), the 90th percentile of lead concentrations in 4,761 stagnation samples was 7.6 µg/L.
Projects in association with epidemiologic investigations also provide an opportunity to design prevention programs in this field. Generally, we favor the precautionary measure of preventing exposure to lead by replacing pipes completely. The addition of anticorrosive substances to the public water supply can be effective in lowering lead concentrations. In contrast, changing water chemistry (e.g., a new water disinfectant method, as in Washington, DC, USA) can have a substantial effect in elevating lead (Renner 2009). Flushing the water pipes and using only cold water are short-term methods of decreasing exposure to lead from tap water. Using bench-top water filters can also decrease lead concentrations, but problems such as leaching of different substances into the water or microbial contamination may arise under certain conditions. Thus, lead plumbing material in buildings still poses a challenge for public health in the United States and in Europe.