Bioassay Bashing Is Bad Science

The Spheres of Influence, "Assessing Assays" (Schmidt 2002), in the May 2002 issue of EHP, criticizes the National Toxicology Program rodent bioassay (NTPRB) without discussing its importance in regulation and public health. The Spheres article (Schmidt 2002) does not express concern for the validity of alternative transgenic methods proposed to replace the NTPRB for detecting carcinogens. By focusing on the limitations of the NTPRB without mentioning limitations of transgenic alternatives, Spheres creates the impression that transgenics are superior. Attempts to supplant the rodent bioassay with various mutation tests, DNA repair tests, cell transformation tests, and many others have a history of failure (Johnson 2000, 2001; Johnson and Snell 1986; Rall et al. 1987). The Spheres article (Schmidt 2002) does not provide any evidence to persuade us that transgenics will change that history.
To illustrate bias, Spheres (Schmidt 2002) states that "rodent models used to test potential carcinogens are by their nature 'wrong' because they merely simulate the response of the real target species, humans."
However, the Spheres article (Schmidt 2002) gives no evidence to explain why rodent models are "wrong," or why by similar reasoning transgenics would not also be wrong. NTPRB results have sometimes provided, in advance, the same indication of carcinogenicity later found in humans (Huff 1993; IARC 2002; Tomatis 1979). These responses in rodents have proven to be accurate, that is, not "wrong." Also, according to a statement attributed to James MacDonald at Schering-Plough Research Institute (Kenilworth, NJ), conventional rodent models predict human cancer no better than "a flip of the coin." This statement is disingenuous. Although the NTPRB has identified human carcinogens, no one knows for sure with what accuracy it will predict future human carcinogens. If the "coin flip" notion were true, then it would be just as true for transgenics. However, no reasonable person would suggest replacing established methods to identify carcinogens with "coin flips." Importantly, few human carcinogens have been tested thoroughly in any model, and very few known human noncarcinogens are available to evaluate the negative responses in the model systems.
The Spheres article (Schmidt 2002) concludes with a quotation by Samuel Cohen: "My hope is that this is the first step toward getting rid of the two-year bioassay altogether." Cohen ignores the fact that the rodent bioassay is an accepted regulatory standard and that before we "get rid of it," an alternative method must be developed and validated.
Validation is a rigorous scientific process whereby the performance of a model is compared against the performance of accepted methods. Validation is governed by the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM 2002). Although some people may express a preference for transgenic alternatives, no new method will gain regulatory acceptance without ICCVAM. In the case of rodent carcinogenesis tests, until proposed replacements are validated, regulatory agencies will be expected to use the standard bioassay for identifying carcinogenic agents.
The NTPRB represents a serious effort to evaluate agents by essentially one standard protocol: two species (usually Fischer rats and B6C3F1 mice), both sexes, exposures lasting for two-thirds of the lifetime, and multiple doses including the maximum tolerated amount (Bucher 2000; Chhabra et al. 1990; Haseman et al. 2001; Huff 1999a, 1999b; Huff and Haseman 1991). Virtually all tissues are examined for tumors. This standard protocol has been adopted throughout the world.
Although criticism has been leveled against the bioassay (Ames and Gold 1990, 1997; Johnson 1999, 2000), some of it has been misguided (Tomatis et al. 2001). While questions of exposure levels and numbers of animals have been raised (Johnson 2002), these issues are not necessarily relevant for all of the more than 500 rodent studies that have been done. For example, methylene chloride was tested at concentrations to which workers are exposed; butadiene at levels 150 times below the Occupational Safety and Health Administration standard (Huff et al. 1985; Melnick and Huff 1992); benzene at concentrations near those found in gasoline (Huff et al. 1989); dibromoethane at or below concentrations found in grain silos (Huff 1983b); tetranitromethane at levels found in the munitions industry (Bucher et al. 1991); and dibromochloropropane at levels found in manufacturing plants (Huff 1983a). Several other chemicals may fit into this category (Huff 1999a, 1999b; NTP 2002).
When transgenic models are contemplated as replacements for standard rodent systems, many of the same uncertainties arise, and effects of dose, numbers of animals per dose group, duration, routes of administration, and genotype are generally unknown. In the case of transgenics, however, we also need to know to what extent the power of detection may be altered compared to that provided by the NTPRB. For example, would a suggested 6-month study using 15 transgenic mice per dose group be adequate to detect the rare, late-developing, or weak tumor responses we find with the standard NTPRB protocol? Even for the NTPRB, duration of exposure might be lengthened to increase sensitivity (Haseman et al. 2001; Maltoni 1995), especially for late-appearing carcinogenic effects (Soffretti et al. 2002).
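The group-size question can be made concrete with an exact power calculation. The sketch below is illustrative only: it assumes a one-sided Fisher exact test comparing a single treated group with a control group of equal size, and the tumor rates (2% in controls, 20% in treated animals) are hypothetical values chosen for the example, not data from any bioassay. The group size of 15 comes from the question above; the comparison size of 50 animals per group is an assumption about a conventional 2-year study design. The function names are ours.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k tumor-bearing animals among n, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_one_sided_p(control_tumors, treated_tumors, n):
    """One-sided Fisher exact p-value for an increase in the treated group.

    Both groups have n animals. The p-value is the hypergeometric probability
    of at least `treated_tumors` tumors in the treated group, given the total
    number of tumors observed in the two groups combined.
    """
    total = control_tumors + treated_tumors
    upper = min(n, total)
    tail = sum(comb(n, x) * comb(n, total - x) for x in range(treated_tumors, upper + 1))
    return tail / comb(2 * n, total)


def power(n, p_control, p_treated, alpha=0.05):
    """Exact probability that the test rejects at level alpha for the given true rates."""
    control_pmf = [binom_pmf(a, n, p_control) for a in range(n + 1)]
    treated_pmf = [binom_pmf(b, n, p_treated) for b in range(n + 1)]
    result = 0.0
    # Sum the joint probability of every outcome that the test would call significant.
    for a in range(n + 1):
        for b in range(n + 1):
            if fisher_one_sided_p(a, b, n) <= alpha:
                result += control_pmf[a] * treated_pmf[b]
    return result


if __name__ == "__main__":
    # Hypothetical tumor rates: 2% background, 20% in treated animals.
    for group_size in (15, 50):
        print(group_size, round(power(group_size, 0.02, 0.20), 3))
```

Under these assumed rates the smaller group gives markedly lower power. A real comparison would also have to account for study duration, multiple dose groups, and any change in background tumor rate caused by the transgene, none of which this sketch models.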
For transgenic models, standards for practically all experimental variables remain under debate, and few comparisons between the NTPRB and transgenics have been made using the same chemicals. We have much less experience with transgenic models than we have with the NTPRB. Background control tumor rates in transgenic systems are only now becoming sufficiently well known for comparative evaluation. A transgene may alter responsiveness to different classes of chemicals. Thus, a transgene may make a system more sensitive for detecting one chemical class or tumor type, and less sensitive for another. Also, statistical false positives and false negatives, investigated in the NTPRB (Haseman and Elwell 1996), have not been explored in transgenics.
The question of how to predict human carcinogens more effectively would benefit from fundamental research. The number of genes contributing to increases or decreases in susceptibility to the carcinogenic effect of one chemical or another is unknown. In order for one or a few transgenic models to be able to accurately predict which chemicals will be carcinogenic and which will not, the number of genes that actually determine susceptibility to chemical carcinogens in human populations must necessarily be quite small. However, the number and variety of genes could be large, and different genes are likely involved in carcinogenicity of different chemicals. Conceivably, one-half or more of the genome could be involved in carcinogenic responses. Until we better appreciate the number and kind of genotypes involved in carcinogenic responses, our understanding of what a carcinogen is will remain incomplete and how we manage carcinogenic risks in our environment uncertain. The need for fundamental research is not a reason to change existing prevention strategies; however, results of that research could lead to future changes. Advocates of transgenic models do not seem to realize the importance of fundamental research and thus may repeat errors made in development of conventional rodent assays.
Transgenics are not a panacea for avoiding uncertainties associated with rodent bioassays (Johnson 2001). Because of transgene-specific and chemical-specific enhanced sensitivity, negative results in transgenics may only mean that the "correct" transgene was not used. Positive results may only reflect hypersensitivity not found in humans.
Further, choices of particular transgenes are virtually limitless. Therefore, particular models, such as the mouse embryonic zetaglobin promoter fused to activated v-Ha-ras (Tg.AC), may or may not show the appropriate response to a given chemical. Furthermore, having additional responses from other similarly uncertain indicators will not help differentiate carcinogens from noncarcinogens.
Regardless of some uncertainties, the NTPRB is the accepted regulatory standard currently being used to protect worker safety and public health from carcinogens (e.g., Huff 1999a, 1999b; Maltoni 1995; Rall 2000; Tomatis et al. 2001; Tomatis and Huff 2001, 2002). Transgenic models must demonstrate equal or better performance before they can be accepted as replacements.

"Bioassay Bashing Is Bad Science": Cohen's Response
Thank you for the opportunity to respond to the comments made by Johnson and Huff regarding the article assessing alternative assays, which appeared in the May 2002 issue of EHP (Schmidt 2002). In the comments that I made to Schmidt, it was not my intent to bash the bioassay. Instead, it was my intent that the bioassay be put in perspective, which is why I used the quote from George Box indicating that all models are wrong, but that some are useful. Clearly, the 2-year bioassay has been useful. The difficulty is that, like all models, it is not perfect. There are now innumerable examples of chemicals that have been identified as carcinogenic in the rat and/or mouse but that, for a variety of reasons, are now considered not to be carcinogenic to humans, either because of qualitative differences in the mechanism of action or, more commonly, striking quantitative differences between species or between exposures. The mechanism of action for certain rodent chemical carcinogens has been identified in animal models and is not relevant to the human situation. Examples include α2u-globulin-related male rat renal tumors (d-limonene), luteinizing hormone abnormalities and the induction of breast tumors in Sprague-Dawley rats (atrazine), and calcium phosphate-containing precipitate-related bladder tumors in rats (sodium saccharin, sodium ascorbate). Others are likely to be identified. This has led to the delisting of sodium saccharin from the National Toxicology Program's (NTP) congressionally mandated list of carcinogens. Also removed from that list of carcinogens was ethyl acrylate, a rat forestomach carcinogen for which much of the research was performed at the NTP. The International Agency for Research on Cancer has also made an effort to reclassify chemicals based on mechanism of action, and has done so for a variety of chemicals and known mechanisms.
In addition to qualitative differences, numerous quantitative differences exist between carcinogenic effects in rodents and humans, such as urinary calculi (melamine, fosetyl-Al) and rat thyroid tumors (sulfamethazine), suggesting no potential carcinogenic risk to humans.
The intent of these studies is to try to predict human carcinogenicity. The same is true for the alternative models. I agree with Johnson and Huff in their statement that "the question of how to predict human carcinogens more effectively would benefit from fundamental research."
The 2-year bioassay is useful and is likely to remain so for several years to come. However, it is imperfect, and additional research is needed not only to come up with better screening models but also to provide the necessary mechanism of action information that is needed to interpret the results in the screening assays.
The alternative models involving transgenic and knockout mice have many of the same imperfections as the 2-year bioassay. Johnson and Huff are correct in indicating that protocol variables need to be more fully evaluated and defined. Nevertheless, it is already clear that some of these alternative models can provide useful information that is as reliable as the standard 2-year mouse bioassay. The U.S. Food and Drug Administration and the European authorities are already utilizing results from these assays in their evaluation of pharmaceuticals.
Environmental Health Perspectives • VOLUME 110 | NUMBER 12 | December 2002
The 2-year bioassay is nearly 40 years old. It has proven useful and will continue to do so. However, the intent is not to determine carcinogenicity in rats or mice, but ultimately to accurately predict carcinogenicity in humans. There is no doubt that sometime in the future assays will be developed that are better able to make these predictions than the 2-year bioassay. The development of the alternative animal models that were discussed in the Spheres of Influence article (Schmidt 2002) is a step in that direction. Additional research is required, but the 2-year bioassay should not be viewed as sacrosanct nor as the gold standard by which other assays are to be validated. Appropriate validation is the comparison with human carcinogenesis.

Department of Pathology and Microbiology
University of Nebraska Medical Center
Omaha, Nebraska
E-mail: scohen@unmc.edu

"Bioassay Bashing Is Bad Science": MacDonald's Response
I appreciate the opportunity to respond to the letter from Johnson and Huff on the May 2002 Spheres of Influence, "Assessing Assays" (Schmidt 2002). Several points brought up by Johnson and Huff need to be clarified and some need to be challenged.
It is disappointing that Johnson and Huff see this article as "bashing the bioassay." Criticism of the rodent bioassay as performed by the National Toxicology Program (NTP) does not imply that contributions to human hazard identification and risk assessment have not been made by this methodology over the 40 years that it has been employed, but is it "bad science" to seek ways to improve this process with new technology? It would seem that, as concerned scientists, our debate should not center around which assay is "better" but rather how we can best identify and use data from a variety of sources to improve our ability to prospectively identify chemicals that pose human risk.
The Spheres article (Schmidt 2002) described in some detail the efforts to assess the response to a list of well-characterized chemicals in several newly developed transgenic and knockout mice. Although many groups are evaluating these assays now, the efforts of the International Life Sciences Institute/Health and Environmental Sciences Institute (ILSI/HESI) Alternatives to Carcinogenicity Testing Committee were highlighted in the article. The specific intent of this effort by the ILSI/HESI was to gain experience with these newer assays in a carefully controlled experimental manner. The goal was not to replace the bioassay, as Johnson and Huff seem to suggest, but rather to see how (or if) data from these assays could be used in the process of assessing human risk of cancer from chemical exposure (Cohen et al. 2001). Research with these newer assays in the HESI initiative was focused on improving the process of human hazard identification and the use of the information from these assays in a global weight-of-evidence approach to risk assessment; it clearly was not focused on simply replacing one method with another.
Johnson and Huff reacted to my suggestion that conventional rodent models predict human cancer no better than "a flip of the coin." Perhaps this was a bit casual and even provocative, but is it really disingenuous? There is no argument that the rodent bioassay can detect human carcinogens. The problem is that it also identifies many other chemicals that are generally not regarded as human carcinogenic risks (Cohen and Ellwein 1991; MacDonald et al. 1994; McClain 1994). The data from the use of this assay are clear: approximately 50% of the chemicals tested in either the rat or mouse bioassay have yielded positive results (Contrera et al. 1997; Davies and Monro 1995; Gold et al. 1984; Van Oosterhout et al. 1997). Although this number has come down as fewer suspected carcinogens (genotoxic agents) are tested in the NTP Rodent Bioassay Program, there is still a very high rate of positive outcomes. Unless we are expecting a wave of carcinogens to be identified in the next 10-15 years from extensive human use of these agents, we can consider most of these responses as false-positive results. This may not be quite a flip of the coin, but it is not a great improvement for the significant expenditure of time and money for this assay.
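The arithmetic behind this argument can be sketched with Bayes' rule. The example below is a hedged illustration, not an analysis from the cited studies: the 50% positive rate is the approximate figure quoted above, while the prevalence of true human carcinogens among tested chemicals and the assay's sensitivity are hypothetical inputs that nobody knows with confidence. The function name is ours.

```python
def positive_predictive_value(prevalence, sensitivity, positive_rate):
    """Fraction of positive assay results that correspond to true human carcinogens.

    prevalence    -- assumed fraction of tested chemicals that are human carcinogens
    sensitivity   -- assumed fraction of human carcinogens the assay flags as positive
    positive_rate -- observed fraction of all tested chemicals with a positive result
    """
    for name, value in (("prevalence", prevalence), ("sensitivity", sensitivity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    if not 0.0 < positive_rate <= 1.0:
        raise ValueError("positive_rate must be greater than 0 and at most 1")

    true_positive_rate = prevalence * sensitivity
    if true_positive_rate > positive_rate:
        # True positives alone cannot exceed all positives; the inputs are inconsistent.
        raise ValueError("prevalence * sensitivity exceeds the overall positive rate")
    return true_positive_rate / positive_rate


if __name__ == "__main__":
    # Hypothetical scenario: 10% of tested chemicals are human carcinogens,
    # the assay detects all of them, and 50% of tested chemicals are positive.
    print(positive_predictive_value(prevalence=0.10, sensitivity=1.0, positive_rate=0.50))
```

The result depends entirely on the assumed prevalence: the closer the true fraction of human carcinogens among tested chemicals is to the observed positive rate, the fewer of the positives are false. That dependence is the point on which the two sides of this exchange disagree.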
Johnson and Huff state in their letter that alternative assays for carcinogen hazard identification such as the transgenic assays cannot be accepted for regulatory use without validation through a process such as the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM). Two important points should be mentioned here. First, the rodent bioassay, which is held up as the "gold standard," has never been validated through a rigorous process such as that proposed by ICCVAM. Given some of the concerns raised above, one wonders if even this well-established assay could satisfy our current demands for validation. Second, the International Conference on Harmonization has published guidelines (International Conference on Harmonization 1995) that clearly state that, for pharmaceutical chemicals, the appropriate use of alternative assays such as transgenic or knock-out mice can suffice to replace the use of one or another of the rodent bioassays. As this practice is occurring with pharmaceuticals in the approval process, we will come to better understand the utility (if not the validity) of these data just as we did with the rodent bioassay.
In their letter, Johnson and Huff point out an important, and appropriate, criticism of the current transgenic and knock-out mouse models: there are several protocol issues that need to be addressed. I participated in the ILSI/HESI initiative for about 6 years as this group struggled to understand how best to apply some of the then new assays to the problem of human hazard identification, specifically in the area of pharmaceutical chemicals. As the results from this effort suggest, data from these alternative assays can be used appropriately, but several important questions remain to be answered. Whether these are being adequately addressed with currently ongoing research or whether additional, focused research efforts would be appropriate is the subject of ongoing discussions with the HESI group and a broad cross-section of stakeholders. It is again important to emphasize that the focus of this effort was not and is not replacement of the rodent bioassay, but a better understanding of how data from newer models might aid in our ability to detect potential human carcinogens.
Johnson and Huff also point out appropriately that additional fundamental research on the genetics of human cancer has the potential to improve our ability to reliably detect human carcinogens before these chemicals are released into our environment. Studying transgenic animals and carefully assessing whether the nature of the transgene alters the response to particular chemicals should not be construed as failing to realize the importance of fundamental research, but rather as a central component of this important process.
Assessment of the human risk of cancer is perhaps one of the most difficult tasks in

Use of A-Bomb Survivor Studies as a Basis for Nuclear Worker Compensation
In the Spheres of Influence article in the July issue of EHP, Parascandola (2002) presented our concerns about the validity of extrapolating cancer risks from studies of A-bomb survivors to nuclear workers as a matter of differences in dose rate between the two populations. Our primary critiques of using A-bomb data, however, concern biases that arise from selective survival and dose misclassification (Wing et al. 1999), issues that are not mentioned. Stewart (1985, 1997, 2000) presented evidence of dose- and age-related selective survival in the Japanese cohort assembled for cancer studies 5 years after the nuclear bombing of Hiroshima and Nagasaki. This evidence is in concordance with basic biological principles of heterogeneity in susceptibility and may help explain the inability of A-bomb survivor studies to detect the impacts of in utero radiation exposures on childhood cancers, effects that have been demonstrated repeatedly in low-dose studies (Doll and Wakeford 1997; McMahon 1962; Stewart et al. 1956). Age-related selective survival also helps to explain the reported decrease in radiation-cancer dose response among A-bomb survivors exposed at older ages, an observation that deviates from expectations based on the increased sensitivity of older adults to other physical, chemical, and biological agents, evidence of age-related decline in DNA repair capacity, and evidence from some studies of nuclear workers (Richardson et al. 2001; Wing 2000).
Epidemiologic studies depend on accurate exposure classification for valid dose-response estimation. In addition to selective survival in a population subjected to nuclear attack and subsequent devastation of public health infrastructure, radiation-cancer dose-response estimates from A-bomb studies are further affected by a lack of individual dose measurements and the use of dose reconstruction based on interviews conducted in an occupied nation by a scientific team funded and directed by the U.S. government (Wing et al. 1999). The ability to elicit accurate information on location, position, and shielding was affected not only by traumatization of the survivors and their domestic stigmatization but by their distrust of medical teams working under occupation forces (Lindee 1994).
As Parascandola (2002) noted, we believe that findings from carefully conducted epidemiologic studies of badge-monitored nuclear workers exposed to chronic, low-level ionizing radiation should be considered in implementation of the Energy Employees Occupational Illness Compensation Program Act. Medical practices regarding exposures of pregnant women to diagnostic X rays were changed decades ago on the basis of low-dose studies, even though their findings were not predicted from studies of A-bomb survivors. The question today is, will A-bomb studies continue to dictate estimates of cancer risks in adulthood, despite evidence of bias and the availability of alternative epidemiologic data? The large number of highly exposed survivors in the study, cited as a major strength, may actually be a weakness if it encourages scientists and policy makers to confuse statistical precision with valid dose-response estimates that depend on an absence of selective survival and correct exposure classification.