Ecological risk assessment of endocrine disruptors.

The European Centre for Ecotoxicology and Toxicology of Chemicals proposes a tiered approach for the ecological risk assessment of endocrine disruptors, integrating exposure and hazard (effects) characterization. Exposure assessment for endocrine disruptors should direct specific tests for wildlife species, placing hazard data into a risk assessment context. Supplementing the suite of mammalian screens now under Organization for Economic Cooperation and Development (OECD) validation, high priority should be given to developing a fish screening assay for detecting endocrine activity in oviparous species. Taking into account both exposure characterization and alerts from endocrine screening, higher tier tests are also a priority for defining adverse effects. We propose that in vivo mammalian and fish assays provide a comprehensive screening battery for diverse hormonal functions (including androgen, estrogen, and thyroid hormone), whereas Amphibia should be considered at higher tiers if there are exposure concerns. Higher tier endocrine-disruptor testing should include fish development and fish reproduction tests, whereas a full life-cycle test could be subsequently used to refine aquatic risk assessments when necessary. For avian risk assessment, the new OECD Japanese quail reproduction test guideline provides a valuable basis for developing a test to detect endocrine-mediated reproductive effects; this species could be used, where necessary, for an avian life-cycle test. For aquatic and terrestrial invertebrates, data from existing developmental and reproductive tests remain of high value for ecological risk assessment. High priority should be given to research into comparative endocrine physiology of invertebrates to support data extrapolation to this diverse fauna.

disruptors and supports the establishment of new wildlife screening and testing protocols.
Our strategy for ecotoxicity screening and testing is discussed versus the proposed EDSP (2), based on the earlier report from the EDSTAC (1). Specifically, our critical review of the EDSTAC considers both the scientific rationale and ethical use of animals for ecotoxicity hazard assessment (5). Throughout this exercise, we support the internationally agreed definition from the 1996 Weybridge workshop, whereby an endocrine disruptor is "an exogenous substance which causes adverse effects in an organism, or its progeny, subsequent to changes in the endocrine system." (6). In vitro test systems are not addressed in our present discussion because it has been widely agreed at several international workshops that endocrine-disruptor assessments in wildlife should primarily focus on in vivo studies (7,8).

The Ecological Risk Assessment Context
Although evaluation of endocrine disruptors is a relatively new area, both field and laboratory studies have been conducted to define ecological effects, determine sources, and characterize the ecological risk of selected endocrine-active substances. For example, recent progress toward this goal is illustrated by tributyltin (9). In general, however, further work is needed to evaluate potential endocrine disruptors within the established ecological risk assessment concept (2,10,11) (Figure 1). Although much will be learned through the ongoing application of the ecological risk assessment paradigm to both natural and synthetic endocrine disruptors, we support the views of Kendall et al. (4) in that "There is no need to develop a new framework for ecological risk assessment of endocrine disrupters." What is needed, however, is a scientifically and ethically justifiable approach to prioritizing endocrine-disruptor screening and testing that effectively protects the diversity of wildlife.
The established risk assessment framework provides a robust tool with which to evaluate the impacts of natural and synthetic toxicants, endocrine disruptors, and other stressors on ecosystems. This framework balances exposure characterization versus effects characterization, taking into account the need for test validation, data acquisition, and field monitoring.
Exposure assessment for potential endocrine disruptors is important in directing specific-effects testing, such that risk assessment and risk management can proceed. Exposure may be defined as the contact between the bioavailable fraction of the compound of interest and the organisms of concern. A tiered approach to exposure assessment, whereby conservative assumptions in the estimate are progressively refined, is appropriate for endocrine disruptors as well as for compounds that may be active via other mechanisms.
Potential for exposure should drive the selection of appropriate test organisms in hazard assessment. For example, where a pesticide is sprayed directly onto crops, it is reasonable to expect potential exposure to aquatic organisms and birds via spray drift. The expected environmental concentrations should then be compared with the toxic concentration to aquatic organisms and birds to determine the potential ecological risk. For natural or synthetic substances discharged via wastewater into rivers, aquatic organisms are expected to be exposed if the substance is not degraded during wastewater treatment. In addition, the bioaccumulation and biomagnification potential of the substance should be assessed to determine if fish-eating birds and mammals might also be at risk.
Once potential for exposure is determined, suitable effects tests should then be selected and the hazard assessed. The types of tests used should address the sensitivity of the faunal populations in the context of diverse exposure scenarios (Figure 2), adapted from Solomon (12).
By comparing the degree of toxicity with the level of exposure, the risk of the compound may then be characterized and any necessary risk management can be conducted (Figure 1). Ultimately, potential impacts from synthetic substances should be considered in the context of natural stressors to help define what is an acceptable ecological impact. The concept of acceptability underpins the regulatory programs for pesticides in many countries. For example, the U.S. EPA characterizes "unacceptable" as "widespread and repeated mortality in the face of minor economic benefits to society" (11,13). Finally, it is essential that new test methods provide data that can be related directly to the field monitoring method. The successful field and laboratory measurement of vitellogenin (VTG) induction in fish illustrates the biological linkage concept (14).
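The comparison of exposure with toxicity described above can be illustrated with a minimal sketch. It assumes a simple ratio of expected environmental concentration to toxic concentration (often called a risk quotient, a term the text itself does not use); the class name, the trigger value of 1.0, and all concentrations are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One exposure scenario for one organism group (all values hypothetical)."""
    organism_group: str
    expected_env_conc_ug_per_l: float  # estimated exposure concentration
    toxic_conc_ug_per_l: float         # effect concentration from a hazard test


def risk_quotient(s: Scenario) -> float:
    """Ratio of exposure to toxic concentration (assumed screening metric)."""
    if s.toxic_conc_ug_per_l <= 0:
        raise ValueError("toxic concentration must be positive")
    return s.expected_env_conc_ug_per_l / s.toxic_conc_ug_per_l


def flag_for_refinement(s: Scenario, trigger: float = 1.0) -> bool:
    """True when exposure reaches the toxic concentration (trigger is an assumption)."""
    return risk_quotient(s) >= trigger


# Hypothetical scenarios for a substance discharged via wastewater
scenarios = [
    Scenario("fish", 0.5, 10.0),
    Scenario("aquatic invertebrates", 12.0, 8.0),
]
flagged = [s.organism_group for s in scenarios if flag_for_refinement(s)]
print(flagged)
```

In a tiered exposure assessment, a flagged scenario would typically lead to refinement of the conservative exposure estimate before any conclusion on risk is drawn.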

Overview of Conceptual Testing Framework
Although we fully support the principle of EDSTAC's tiered screening and testing program, we suggest that this can be enhanced, as shown in Figure 3, to provide more ecologically relevant and responsible use of limited animals for testing.
Key elements of our approach are, first, to focus testing needs through a greater consideration of exposure characterization data when deciding which types of chronic tests (e.g., avian, fish, or invertebrate) may be required at the higher tiers (15). Second, our scheme offers a pragmatic alternative to the technical and ethical concerns raised by the EDSTAC recommendation to move to extensive multispecies testing directly after the Tier 1 screening, without first recognizing the value of partial life-cycle (PLC) tests, which use fewer animals. The following concepts have proved useful in consideration of new wildlife protocols for endocrine-disruptor screening and testing: a) conceptual protocol, which refers to an idea for a novel protocol that has not yet been actually conducted and reported in the peer-reviewed scientific literature; b) developmental protocol, a protocol that has been published in the peer-reviewed scientific literature and forms the basis for further research (e.g., using a range of reference endocrine-disruptor substances) toward the development and prevalidation of a reliable method; and c) validated protocol, the appropriate term for a protocol that has been demonstrated to be reliable and reproducible as a result of appropriate intralaboratory and interlaboratory comparisons. Protocols need to achieve the validated status before they can be reliably employed for regulatory purposes, for example, within the Organization for Economic Cooperation and Development (OECD) test guideline program.

Screening for Interactions with Endocrine Systems
In accordance with EDSTAC, we agree on the need for in vivo screening to identify the ability of a compound to interact with the endocrine system, primarily for androgen, estrogen, or thyroid hormone activities. Importantly, the purpose of Tier 1 screening (Figure 3) should be to provide alerts to endocrine-active substances, whereas higher-tier tests should provide adverse effects data for risk assessment application. Previous international workshops have agreed that screening with laboratory mammals is sufficiently comprehensive to cover wild mammalian species; however, because of their distinct reproductive physiology, the screening battery should include oviparous species (7,8). It is also essential to undertake an efficient number and range of Tier 1 screens in order to develop a mechanistic rationale that will allow differentiation between substances which directly impact the endocrine system (primary endocrine effects) versus those substances which primarily affect other target organs before causing secondary endocrine effects (16). As argued by Purchase (17), there is a need for specific screening assays that use a minimum number of test animals, taking into account the environment compartment of concern for the risk assessment. The candidate screening assays included in the EDSTAC proposals are discussed in the subsequent section.
Frog metamorphosis screening assay. EDSTAC's main objective of the inclusion of the frog (Xenopus laevis) metamorphosis assay as a Tier 1 screen is to detect thyroid agonists and antagonists, rather than to represent amphibians as a taxonomic group (1). EDSTAC outlines a protocol for Amphibia in which the larvae (tadpoles) of Xenopus laevis are exposed to the test substance for 14 days, with the tail resorption rate as the main end point (18). Data are urgently required for a range of thyroid active and negative control substances to evaluate the specificity and sensitivity of the frog metamorphosis assay before international validation. Failure to address these needs, especially assay specificity, will have serious implications for the numbers of animals tested to detect thyroid disruption (17,19).
More broadly, thyroid hormones are critical to the normal processes of development and reproduction in many animals. The amphibian assay described by EDSTAC does not define the necessary experimental design to specifically detect thyroid disruption. We therefore recommend that thyroid hormone disruption be detected in Tier 1 mammalian screens. If (anti-)thyroidal activity were detected in mammals and if exposure characterization suggested significant exposure in amphibian habitats, then a suitably validated frog larval metamorphosis test could be usefully deployed at a higher tier (Figure 3).
In many vertebrates, a number of physiologic and morphologic processes are influenced by the thyroid (20). Mechanisms of thyroid action may vary between species and among tissues, with alpha- and beta-variant nuclear receptors having been identified in mammals (21), whereas frog thyroid hormone receptors (THRs) are highly homologous to THRs of other vertebrates. In Amphibia during normal development, thyroid hormone levels rise, resulting in growth and differentiation of the limbs, with tail resorption being the final gross morphologic change within metamorphosis. Metamorphosis is initiated by the binding of thyroid hormone to α-THR, whereas β-THR is presumed to play a later role in developmental programs such as tail resorption. Notably, steroids such as corticoids can modulate thyroid hormone activity during amphibian metamorphosis (enhancement of tail resorption) (22)(23)(24). Studies in populations of Great Lakes salmon have also demonstrated that fish can be affected by thyroid-disrupting chemicals (25,26). These observations in wild fish suggest that there is an important opportunity to also detect thyroid disruption in fish partial and full life-cycle tests. In conclusion, potential effects on the thyroid hormone system should in principle be detected by rodent mammalian Tier 1 screening. The decision to evaluate chemical effects in the frog metamorphosis assay would likely be best based on a positive result from the mammalian thyroid screen, and only if exposure characterization predicts significant potential exposure of amphibian habitats. The development of a peer-reviewed database on the sensitivity of fish and amphibians to thyroid disruptors is an important research priority.
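The two conditions proposed for deploying the frog metamorphosis test can be written as a simple conjunction. The sketch below is illustrative only; the function and parameter names are hypothetical, and in practice both inputs would come from expert judgment rather than boolean flags.

```python
from itertools import product


def frog_metamorphosis_test_warranted(
    mammalian_thyroid_screen_positive: bool,
    amphibian_habitat_exposure_significant: bool,
) -> bool:
    """Both conditions must hold before the higher-tier amphibian test is considered."""
    return mammalian_thyroid_screen_positive and amphibian_habitat_exposure_significant


# Print the full truth table for the two inputs
for screen_positive, exposure_significant in product([False, True], repeat=2):
    decision = frog_metamorphosis_test_warranted(screen_positive, exposure_significant)
    print(f"thyroid screen positive={screen_positive}, "
          f"amphibian exposure={exposure_significant} -> test warranted={decision}")
```

Only one of the four combinations leads to amphibian testing, which reflects the aim of limiting the number of animals used.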

Review • Ecological risk assessment
Environmental Health Perspectives • VOLUME 108 | NUMBER 11 | November 2000

[Figure panel labels: Exposure-related information; Effects-related information for mammals and wildlife; Ecological risk assessment (consider exposure data from box 1 to refine exposure assessment, together with effects data from wildlife screening and testing). Figure footnotes: (66). c Only required if the substance is active in the larval amphibian metamorphosis test at Tier 2. d Only required if positive in avian reproduction test at Tier 2. e Only required if the substance is active in fish PLC tests at Tier 2.]

Fish Screening Assays
Fish gonadal recrudescence assay. EDSTAC originally recommended the concept of the fish gonadal recrudescence assay (GRA), whereby breeding fathead minnows (Pimephales promelas) of both sexes would first be transferred to "winter phase" (short day length and low temperature) conditions to inhibit spawning. Once fish were synchronized in this nonbreeding phase, fish would be transferred to a "summer phase" (increasing day length and temperature regime) to stimulate gonadal maturation and development of secondary sexual characteristics. Proposed end points include assessment of gamete quality, fecundity, gross pathology of gonads, measurement of relative gonad weight (gonadal somatic index), and measurement of vitellogenin levels (a widely used biomarker of estrogen exposure in fish) (8,27). The disadvantages of the fish GRA include the absence of any published data on gonadal recrudescence in OECD test species; unlike other approaches that seek to enhance existing regulatory test guidelines for testing endocrine disruptors (28), the GRA has no clear link to an existing regulatory protocol, and the practical use of end points such as gamete quality and fecundity in such a screening assay incurs a high degree of interindividual variability because cyprinid fish have widely variable fecundity between breeding pairs, even within control populations. Also, considering extending the international use of this approach beyond the fathead minnow, other OECD fish species (e.g., medaka and zebrafish) have different breeding requirements. For example, some zebrafish populations have been reported to display a normal juvenile hermaphrodite stage (29), making baselines potentially uncertain. Although the sexes in fathead minnows are usually very distinct, in other fish species secondary sexual characteristics in males develop differently as dominant (territorial) males appear during group culturing, whereas others show less obvious secondary sexual characteristics (30). 
In zebrafish, for example, the sex of the fish is difficult to determine when the fish is not in a reproductive stage. On the other hand, zebrafish can produce eggs even at suboptimal conditions; therefore, it is difficult to maintain this species in a nonreproductive stage (31). Overall, there is little prospect that a small fish GRA could be robust or practical, or could specifically detect endocrine-active substances.
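Two of the quantities mentioned above lend themselves to a short worked example: the gonadal somatic index (relative gonad weight, conventionally gonad weight as a percentage of body weight) and the interindividual variability of fecundity, expressed here as a coefficient of variation. The choice of coefficient of variation and all numbers are assumptions for illustration, not data from any study.

```python
import statistics
from typing import Sequence


def gonadal_somatic_index(gonad_weight_g: float, body_weight_g: float) -> float:
    """Gonad weight as a percentage of total body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return 100.0 * gonad_weight_g / body_weight_g


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical control data: eggs produced per breeding pair
control_eggs_per_pair = [120, 310, 45, 260, 180]

print(f"GSI: {gonadal_somatic_index(0.3, 3.0):.1f}%")
print(f"Fecundity CV in controls: {coefficient_of_variation(control_eggs_per_pair):.0f}%")
```

A large coefficient of variation in control fecundity means that only large treatment effects could be distinguished from background variation, which is the practical concern raised for fecundity as a screening end point.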
Juvenile fish screening assay. An alternative approach using the juvenile stage of OECD fish species was originally proposed at the second Duke workshop, Screening Methods for Detecting Potential (Anti-) Estrogenic/Androgenic Chemicals in Wildlife (8). The use of juvenile fish as the basis of a screening assay has many advantages and should be seriously considered as a practical alternative to the conceptual fathead minnow GRA. Also, using juvenile fish avoids the potential problem associated with adult fish reproductive and territorial behavior, for example, where the presence of sexually dominant males (or females) may lead to behavioral and pheromone modulation of other fish in the laboratory population (30). Once endocrine baselines have been established for a given fish model, it is possible to use sensitive and specific end points (e.g., VTG induction) to screen for endocrine activity.
Using this approach, a number of research groups have shown that juvenile fish are very sensitive to estrogens and antiestrogens (32)(33)(34). Current research sponsored by the European chemical industry is now extending the evaluation of this juvenile fish assay for (anti)androgens and aromatase inhibitors (35). Adapted from the OECD test guideline 204 (36), the juvenile fish assay has significant potential for practical use, a potential that is currently being critically evaluated as part of the OECD prevalidation exercise.

Higher Tier Tests for Hazard Assessment
Higher tier tests should serve to identify effects of concern (e.g., development and reproduction) that are a consequence of endocrine disruption, supporting the effects characterization within ecological risk assessment. Working within the ecological effects and exposure characterization phases of the ecological risk assessment guideline (11), if significant exposure is predicted for a substance that is active in the Tier 1 mammalian or fish screens, then higher tier testing for potential adverse effects (endocrine disruption) is warranted. At this stage, the key terms "adverse effects" and "organisms and their progeny" become critical. Although this is not apparent within EDSTAC (1), such testing should be targeted at the specific ecosystem and wildlife populations of concern (e.g., aquatic animals or seed-eating birds) using data on exposure characterization, rather than unnecessary higher tier testing on all wildlife groups due to endocrine-disruptor concerns per se. In contrast, if the mammalian and fish Tier 1 screens are all negative, then no further testing for potential endocrine disruption is justified and the risk characterization phase should proceed, taking into account all information (both endocrine and nonendocrine related). The candidate higher-tier testing assays included in the EDSTAC proposals are discussed in the subsequent section.
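The decision logic in this paragraph can be sketched as follows. The mapping from exposed wildlife group to candidate tests is a hypothetical simplification of the tests discussed in this review, and the function name is illustrative.

```python
from typing import Dict, List

# Hypothetical mapping from exposed wildlife group to candidate higher-tier tests
HIGHER_TIER_TESTS: Dict[str, List[str]] = {
    "fish": ["fish development test", "fish reproduction test"],
    "birds": ["avian reproduction test"],
    "aquatic invertebrates": ["invertebrate life-cycle test"],
}


def plan_higher_tier(tier1_results: Dict[str, bool],
                     exposed_groups: List[str]) -> List[str]:
    """Return higher-tier tests only for groups with predicted significant exposure.

    If every Tier 1 screen is negative, no further endocrine testing is planned
    and risk characterization proceeds with the information already available.
    """
    if not any(tier1_results.values()):
        return []
    tests: List[str] = []
    for group in exposed_groups:
        tests.extend(HIGHER_TIER_TESTS.get(group, []))
    return tests


# A substance active in the mammalian screen, with exposure predicted for birds only
print(plan_higher_tier({"mammalian": True, "fish": False}, ["birds"]))
```

The design point is that a positive screen alone does not select the tests; the exposed groups do, so a substance with no predicted wildlife exposure yields an empty test plan even when a screen is positive.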
Avian reproduction testing. Since the concerns raised over pesticide poisoning of birds during the 1950s and 1960s, there has been extensive use of avian reproductive toxicity testing in several countries and regions (37)(38)(39). More recently, the OECD has taken a leading role in promoting international harmonization of avian testing, including reproductive toxicity test guidelines and the new question of endocrine disruption in birds (40). This is warranted given the distinct reproductive endocrinology of birds, whereby reproductive dysfunction may be expressed in terms of impaired egg laying or abnormal hormonal imprinting of males and females. Importantly, the "default sex" for many bird species is the male, whereas in mammals the "default sex" is female (14). Avian sex ratios may be more plastic, however, than was traditionally thought, implying that sex determination is affected by environmental factors (41,42).
The EDSTAC proposal appears to suggest that for chemicals which are positive in the Tier 1 screens, avian reproduction testing is necessary at Tier 2, although the triggers for requiring such an avian study are not defined. Specifically, EDSTAC recommends conducting reproduction tests with two avian species, northern bobwhite quail (Colinus virginianus) and mallard duck (Anas platyrhynchos), using existing protocols with possible modifications to enhance the ability to detect endocrine-related effects. The existing avian reproduction protocols are well known, and it is estimated that they have been used approximately 300-500 times (43,44). These methods are standardized and regarded as adequately reproducible; hence, protocols for both northern bobwhite quail and mallard duck have de facto acceptance by the scientific community as being valid. The EDSTAC recommendations for suggested modifications to the current one-generation avian reproduction protocol include extending the current study design to two generations, measuring circulating sex steroid and thyroid hormones, and monitoring a suite of morphologic and functional parameters in offspring.
Higher tier testing with Japanese quail. The use of Japanese quail (Coturnix coturnix japonica) promises to be a cost-effective alternative for endocrine-disruptor testing, and the draft revised OECD test guideline 206 is currently being adapted for this species (45). In general, reproductive parameters such as egg production and egg quality, fertility, embryo survival, hatching success, and growth and development are part of the current draft. Gonad histology is at present not included as a requirement for the draft revised OECD test guideline 206, but histology is mentioned in the EDSTAC proposals as a possible further end point, subject to research. Also, gonadal gross morphology and accessory organ development of offspring may be a valuable addition to the current draft in order to evaluate the potential effects on the endocrine system of birds (7).
Review • Hutchinson et al.
Two-generation avian testing. There are a number of scientific and practical challenges in moving directly to an avian two-generation study following Tier 1 screening with other species. These include the extension of the northern bobwhite quail or mallard duck test protocol to a two-generation study, which makes these tests even longer and more costly and thus increases international interest in the Japanese quail (45), and the modification mentioned in the EDSTAC report (1) for extending the current U.S. EPA protocol to include end points such as plasma steroid levels, organ weights, and histopathology. An extensive validation program with known endocrine disruptors is needed to identify the potential relevancy of recommended biochemical and neurobehavioral parameters for developmental and reproductive effects testing in birds. Extensive control data in the various avian species of potential interest are also essential to allow reliable interpretation of the many parameters in the EDSTAC proposals, and the specificity of the various endocrine, behavioral, and morphologic parameters is unknown among bird species or across chemical groups. Natural variation in blood hormone concentrations in most avian species is unknown; thus, comparisons of experimentally derived laboratory data to baseline normal values would not be possible without further research. Because the biological significance of experimentally derived effects in birds may be difficult to define, a large research effort is needed to provide a scientific context in which to interpret the regulatory meaning of new end points proposed for the avian testing tier. The key reason for using the mallard duck is the demonstrated ability to show egg-shell thinning after exposure to contaminants (46); however, this effect has only been observed with a very small percentage of test compounds.
Other end points (e.g., behavioral tests or steroid levels) may be added at a later stage, but not before fundamental research and validation efforts. Therefore, we propose that for both scientific and practical purposes, there is a useful role in higher Tier 2 for the Japanese quail reproduction test, with the multigenerational avian study being more appropriate for higher Tier 3 (Figure 3).
Fish higher-tier testing. The EDSTAC proposals include the use of a full life-cycle test on the freshwater fathead minnow as a definitive Tier 2 test for endocrine disruption in fish. Additionally, it is suggested that other species such as the sheepshead minnow should be used when estuarine or marine environments are expected to be exposed to the test compound. In the fish full life-cycle (FFLC) test as standardized by the U.S. EPA, fathead minnows are exposed to the test compound from the egg (F0 generation) to early development of the F1 generation offspring, with a test duration of approximately 9-10 months. At maturation of the F1 generation, breeding pairs or groups are formed to promote spawning (47). The end points analyzed in the existing FFLC study include spawning frequency, number of eggs produced, viability of embryos, hatching success, growth, and development. The EDSTAC recommends the addition of other parameters such as gonad weight and histopathology, sperm motility and egg maturation, and plasma VTG and steroid hormone levels.
Although multigenerational studies are the ultimate test for evaluating the effects of compounds that may be bioaccumulative or have specific modes of action posing hazards to progeny, there are also significant limitations to the current regulatory FFLC method. A major disadvantage of FFLC tests is their technical complexity. In fact, because of these problems and high cost, FFLC tests are not routinely conducted for any class of substance. FFLC tests have, for example, been used to address concerns over persistent pesticides (48,49) or endocrine-active pharmaceuticals that affect the reproductive health of fish (50). EDSTAC proposes that the FFLC test should be conducted after a trigger from either the sorting of initial data or from the screening tier (1). A direct transition to such a high level definitive test apparently ignores the practical necessity to conduct a chronic range-finding test with the same fish species before conducting an FFLC test. We therefore recommend incorporation of an intermediate testing tier (Figure 3), including a fish development (extended embryo-larval) test (51) and a fish reproduction test based on pair-breeding adults (52). Our recommendation reflects the fact that for most chemicals, fish embryo-larval tests are predictive of effects in FFLC tests (48,49); a fish development test will cover the most sensitive stage of sex determination, whereas an adult fish reproduction test will address both specific and nonspecific impacts on fecundity and related parameters. The number of substances tested could be greatly increased, taking into account the duration and technical challenges of the studies and the capacities of laboratories with experience in FFLC tests. We estimate that probably 10 times more compounds can be tested worldwide using these PLC tests versus the EDSTAC proposal for FFLC tests (approximately 200 compounds per year as opposed to 20 per year, respectively).
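The throughput estimate above is simple arithmetic, shown here as a short sketch. The two annual capacities are the estimates given in the text; the backlog of 1,000 compounds is a purely hypothetical figure chosen to show what the tenfold difference would mean in practice.

```python
import math

PLC_COMPOUNDS_PER_YEAR = 200   # estimate from the text, partial life-cycle tests
FFLC_COMPOUNDS_PER_YEAR = 20   # estimate from the text, full life-cycle tests


def years_to_test(n_compounds: int, compounds_per_year: int) -> int:
    """Whole years needed to test a backlog at a fixed annual capacity."""
    return math.ceil(n_compounds / compounds_per_year)


ratio = PLC_COMPOUNDS_PER_YEAR / FFLC_COMPOUNDS_PER_YEAR
hypothetical_backlog = 1000  # assumption for illustration only

print(f"PLC to FFLC throughput ratio: {ratio:.0f}x")
print(f"PLC tests: {years_to_test(hypothetical_backlog, PLC_COMPOUNDS_PER_YEAR)} years")
print(f"FFLC tests: {years_to_test(hypothetical_backlog, FFLC_COMPOUNDS_PER_YEAR)} years")
```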
The addition of new endocrine disruptor-relevant end points requires considerable development and validation before they can be considered useful for regulatory purposes. Subsequently, end points may best be targeted at the endocrine mechanisms of interest on a case-by-case basis, depending on results from lower tier assays. The ability to relate end points at testing Tiers 2 and 3 to those used at the Tier 1 screens would be beneficial, offering a mechanistic basis to support ecological risk assessments (Figure 3) (16). Therefore, the use of a short-term test procedure on fish using selected biochemical parameters (e.g., VTG) would be desirable to make such correlations on a clearer mechanistic basis. This approach offers a practical means to developing linkage tools for iteration between field and laboratory that can be used as necessary to support the predictive risk assessment process (11,14).
Currently, all fish early life-stage study (ELS), PLC, and FFLC tests include critical elements of growth and development, and thereby are amenable to examination of thyroid hormone-related effects (26). In the future, determination of such effects in fish may be sufficient to detect potential effects of thyroid-active compounds (with reference to the mammalian Tier 1 data for thyroid activity) and potentially negate the need for duplicative testing with Amphibia. This possibility should be reviewed as data become available; comparative research into this aspect is highly desirable. Determination of genetic sex differentiation in fish may also prove useful and would enable the sex ratio of phenotypically similar offspring to be determined. Knowledge of the genotypic sex ratio relative to the phenotypic sex ratio would be useful, but markers for genetic sex need to be developed for this purpose (53).
In conclusion, we support the conduct of higher tier tests using fathead minnows as the preferred test species for chronic studies of compounds that are active in Tier 1 and that are predicted to enter aquatic ecosystems. We suggest, however, that the EDSTAC recommendation to default to the conduct of FFLC tests at Tier 2 is neither scientifically optimal nor ethically responsible in terms of the numbers of animals involved. Instead, we recommend that a functionally equivalent approach be considered, incorporating fish ELS and PLC tests at Tier 2 and the FFLC test at Tier 3 (Figure 3). Additional end points may need to be incorporated into the protocols for these various tests, and these end points should be validated and their usefulness assessed before regulatory implementation. Subsequent measurements may be targeted at different end points if the specific mode of action of a compound is known (e.g., androgens, aromatase inhibitors, thyroid-active chemicals). To be optimal, this will require expert judgment and dialogue in the problem formulation and risk characterization phases of the ecological risk assessment paradigm (11) (Figure 1).
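The proposed escalation from Tier 2 to Tier 3 for fish and birds can be sketched as a lookup keyed on positive Tier 2 outcomes. The test names are shorthand for the tests discussed in this review, and the dictionary and function are hypothetical illustrations rather than a regulatory scheme.

```python
from typing import Dict, List

# Tier 2 test -> Tier 3 follow-up, as proposed for fish and birds in this review
TIER3_FOLLOW_UP: Dict[str, str] = {
    "fish partial life-cycle test": "fish full life-cycle test",
    "Japanese quail reproduction test": "avian multigeneration test",
}


def tier3_tests(tier2_outcomes: Dict[str, bool]) -> List[str]:
    """Tier 3 tests triggered only by a positive result in the matching Tier 2 test."""
    return [
        TIER3_FOLLOW_UP[test]
        for test, positive in tier2_outcomes.items()
        if positive and test in TIER3_FOLLOW_UP
    ]


outcomes = {
    "fish partial life-cycle test": True,
    "Japanese quail reproduction test": False,
}
print(tier3_tests(outcomes))
```

Under this arrangement the longest and most animal-intensive studies are run only when a shorter Tier 2 study has already shown activity.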
Higher tier invertebrate testing. EDSTAC (1) and the consequent EDSP proposal (2) both recommend the use of crustacean life-cycle assays for higher tier testing, with daphnids or mysid shrimps as the suggested test organisms. For daphnids, the EDSP proposes a 3-week chemical exposure period (flow-through conditions), and the end points currently evaluated include length of time for the appearance of the first brood, body length of F0 females, cumulative number of young produced per female, and effects on the F1 generation (number of offspring, offspring lengths, and cumulative mortality). For the mysid shrimp (Americamysis bahia), the test protocol involves a 6-week chemical exposure period (flow-through conditions), and the end points currently evaluated include length of time for the appearance of the first brood, sex determination (sex ratio), body length of males and females, cumulative number of young produced per female, and effects on the F1 generation (number of males and females, body length of males and females, and cumulative mortality).
Given the scope of EDSTAC and the EDSP, the scientific criteria for selecting an arthropod species to test for (anti)androgenic or estrogenic effects are unclear, although it is a pragmatic use of existing regulatory test guidelines. Nevertheless, we support the view that where aquatic invertebrates may be exposed to endocrine-active substances, and because laboratory daphnid populations consist of parthenogenetic females, an alternative species to daphnids may be necessary for the assessment of chronic effects of endocrine disruptors in sexually reproducing Crustacea. For example, the EDSTAC proposal to consider mysid shrimp may be a valid option for a chronic invertebrate species in endocrine effects testing. The advantages are that mysids have been widely used in regulatory ecotoxicology, they are a sexually dimorphic species, and there is an extensive reference database on reproductive toxicity to this species. Unfortunately, the chronic mysid test has several technical disadvantages: major knowledge gaps remain regarding the basic endocrinology of mysids, identification of the sexes is technically difficult, and parental cannibalism upon offspring can compromise assessment of reproductive output. Given these aspects, reproduction studies in other invertebrate species may prove to be more suitable for this test. Positive features of alternative invertebrate test guidelines would include much easier culturing and testing procedures than those for the saltwater mysid and use of freshwater or synthetic sea water for culturing and chronic toxicity testing.
Currently, however, there are relatively few data on the influence of endocrine disruptors on development and reproduction in Crustacea, and further research is necessary. Indeed, mysids and daphnids, like many other arthropods, have a specific hormone regulation for growth and reproduction, namely the ecdysteroid system (54). This is very different from the androgen- and estrogen-based regulation of reproduction in mammals and oviparous vertebrates. The EDSTAC suggestion that estrogen or androgen disruptors are likely to interfere with ecdysteroid activity is speculative and does not reflect the other hormonal mechanisms known to be important in arthropods. Additionally, juvenile hormones should be considered in insects and crustaceans (55).
Although changes in estrogen and androgen metabolism have been measured in Daphnia magna exposed to high doses of xenobiotics (56), the putative role of these steroids in daphnid physiology remains unclear. Baldwin et al. (56) also found that high doses of diethylstilbestrol affected the reproduction and growth of D. magna, whereas Kopf (57) found a decrease in reproduction after exposure of D. magna to 17β-estradiol in the low micrograms-per-liter range. In contrast, Schweinfurth et al. (58) found no effect of 17α-ethynylestradiol in D. magna exposed at a level of several hundred micrograms per liter. In conclusion, although daphnids may show developmental and reproductive effects associated with toxic effects of vertebrate endocrine disruptors (56,59), there is little evidence that such effects are mediated via the ecdysteroid or juvenile hormones. In addition to daphnids, a number of other aquatic invertebrate species have been used to evaluate developmental and reproductive effects of vertebrate endocrine disruptors. Recent work has shown inhibition of development and reproduction in marine crustaceans (Tisbe battagliai) by 20-hydroxyecdysone and diethylstilbestrol (60). In contrast, life-cycle (21 day) studies with this copepod species did not show significant effects after exposure at up to 100 mg/L of 17β-estradiol, estrone, or 17α-ethynylestradiol (61). Protocols for developmental and reproductive end points have also been produced for chironomids (62) and have been applied to phthalates and other important classes of industrial chemicals known to be endocrine active (63,64). As part of the ongoing research program of the European Chemical Industry, a review addressing the use of aquatic invertebrates to detect endocrine activity potential has been carried out in collaboration with the U.K. Environment Agency (65).
It was concluded that the role of sex steroid hormones in development and reproduction of arthropods and the potential sites of action of endocrine disruptors have yet to be established. In other invertebrate taxa, however, there is clear evidence of hormonal disruption. For example, in gastropod molluscs, the biocide tributyl tin is suggested to inhibit the aromatization of testosterone in female snails, thus increasing the endogenous androgen level in these organisms (9,66).
The international Society of Environmental Toxicology and Chemistry (SETAC)-OECD expert workshop on Endocrine Disruption in Invertebrates: Endocrinology, Testing and Assessment (EDIETA) held in the Netherlands in December 1998 has recently published its conclusions (67). Key aspects of invertebrate endocrinology and physiology are discussed, together with laboratory test methods and the use of aquatic and terrestrial invertebrates for environmental monitoring and assessment. Many of the standard methods for conducting toxicity tests with invertebrates were reviewed at this workshop: it was concluded that although no methods were designed to specifically evaluate endocrine disruption, many invertebrate tests include end points that may be endocrine responsive (e.g., development, growth, reproduction). It was highlighted, however, that basic research on invertebrate endocrinology for diverse taxa is urgently needed to remedy our ignorance of mechanisms of action, physiologic control, and hormone structure and function in invertebrates. Research was called for to evaluate known endocrine disruptors with a variety of invertebrate bioassays using a suite of designated reference compounds (67).
In conclusion, for Crustacea and other arthropods, juvenile and ecdysteroid hormones appear to be more important than vertebrate-type steroids in influencing sexual differentiation, growth, and reproduction (55). We therefore suggest that before implementing the proposal to employ daphnid or mysid life-cycle tests for endocrine disruptors, it is essential to address the serious gaps in our basic knowledge of endocrine function in these and other invertebrate taxa. In terms of invertebrates often used in ecological effects characterization, further activity in this area should be assisted by the recommendations contained in the EDIETA workshop report.
Amphibian extended development and reproduction. Although EDSTAC proposed a higher-tier amphibian assay incorporating extended development and reproduction, no specific assay was described (1). Subject to adequate research, we suggest that the inclusion of such an amphibian chronic test might be a technically viable option, but only if triggered by lower tier tests and by exposure characterization (Figure 3). From an ethical perspective, research efforts need to address the comparative effects of a range of endocrine disruptors on fish versus amphibian development and reproduction in order to establish potential redundancy between the higher-tier fish and amphibian test methods. The merits and objectives of a specific test with amphibian species on endocrine-disrupting effects were discussed in the "Frog metamorphosis screening assay" section.
A validated frog chronic test for developmental and reproductive effects may be justified at Tier 3 (Figure 3) if exposure is likely to be significant and if the substance is active in the Tier 2 frog metamorphosis test. For relevance to the field situation on such a case-by-case basis, this type of amphibian chronic test should consider an ecologically representative species (for the protection of wild amphibian populations) together with exposure characterization.
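The two-part trigger described above can be written out as a simple decision rule. The following is only an illustrative sketch: the class name, field names, and function name are hypothetical and are not part of any EDSTAC, EDSP, or OECD protocol, and the judgment of whether exposure is "significant" is assumed to have been made elsewhere in the exposure characterization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstanceProfile:
    """Hypothetical summary of what is known about one substance."""

    # Outcome of the exposure characterization (assumed to be decided upstream).
    exposure_significant: bool
    # Outcome of the Tier 2 frog metamorphosis test.
    active_in_tier2_metamorphosis: bool


def amphibian_chronic_test_justified(profile: SubstanceProfile) -> bool:
    """Return True only when both triggers named in the text are met."""
    return profile.exposure_significant and profile.active_in_tier2_metamorphosis


# Example: Tier 2 activity without significant exposure does not trigger Tier 3.
example = SubstanceProfile(exposure_significant=False,
                           active_in_tier2_metamorphosis=True)
print(amphibian_chronic_test_justified(example))
```

Requiring both conditions reflects the point that hazard alerts alone should not drive additional animal testing unless exposure makes the effect relevant in the field.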

Protocol Development and Validation
Validation can be defined as the input control technique used to detect any data that are inaccurate, incomplete, or scientifically unreasonable. The basic challenge is to demonstrate that a given assay is both biologically meaningful (ecologically relevant) and reproducible in the hands of the international scientific community. Screening assays should also be able to resolve whether endocrine interactions are due to the (anti)estrogenic, androgenic, or thyroid activity of a given substance. The critical objective of the Tier 1 screening assays is to detect chemicals that may be active in vivo, thereby taking into account bioavailability, comparative metabolism, and excretion of a substance in animals. Such in vivo screens should ideally avoid false positives, but it is essential that the potential for false negatives is minimized.
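As a rough illustration of how false negatives and false positives might be quantified during validation, the sketch below compares screen outcomes against reference chemicals of known activity. The function name and the input layout are assumptions made for this example only; they are not drawn from any validation protocol, and the sample data are invented placeholders rather than real results.

```python
from typing import Iterable, Optional, Tuple


def screening_error_rates(
    results: Iterable[Tuple[bool, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Compute (false_negative_rate, false_positive_rate).

    Each item is (known_active, screen_positive) for one reference chemical.
    A rate is None when there are no chemicals of the relevant class.
    """
    false_negatives = false_positives = actives = inactives = 0
    for known_active, screen_positive in results:
        if known_active:
            actives += 1
            if not screen_positive:
                false_negatives += 1  # active chemical missed by the screen
        else:
            inactives += 1
            if screen_positive:
                false_positives += 1  # inactive chemical flagged by the screen
    fnr = false_negatives / actives if actives else None
    fpr = false_positives / inactives if inactives else None
    return fnr, fpr


# Invented placeholder outcomes for five hypothetical reference chemicals.
placeholder = [(True, True), (True, False), (True, True),
               (False, False), (False, True)]
fnr, fpr = screening_error_rates(placeholder)
print(f"false-negative rate: {fnr:.2f}, false-positive rate: {fpr:.2f}")
```

Reporting the two rates separately matches the asymmetry in the text: a screen can tolerate some false positives, which higher tiers resolve, but a high false-negative rate would let active substances escape further testing.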
Interspecies comparisons. The validation of in vivo assays should define the similarities and differences between the different animal species involved and be relevant to wildlife populations of concern for a valid ecological risk assessment process (11). Wherever possible, compounds that have been selected for the validation of mammalian protocols should also be considered for the validation of protocols with egg-laying species. The development of endocrine-disruption tests with novel ecotoxicity test species, for example Amphibia, is more problematic because the baseline data for their reproductive biology and toxicology are far less developed than is the case for existing OECD test species. This basic problem should be recognized in drawing together a timetable of validation for the wildlife protocols and particularly for the protocols on thyroid-active developments. A similar challenge exists for invertebrates; arthropod tests should first establish the effects of endogenous hormones and their active metabolites before blindly testing xenobiotics as putative endocrine disruptors. This approach is being evaluated at present using 20-hydroxyecdysone in a number of crustacean systems, including daphnids (68) and harpacticoid copepods (60).
Selection of reference chemicals. A number of strategies for endocrine-disruptor screening and testing include proposals for protocols that are currently in the conceptual phase of development. To ensure scientific dialogue in protocol development from the start and to avoid any party having to go back in the future to repeat key studies, it is essential that the international scientific community agree on a list of reference chemicals for use in research and development as part of an integrated scientific effort (69). The chemical selection process should consider reference substances for each hormone end point (namely, androgens, estrogens, and thyroid hormones; both negative and positive controls), natural and manmade substances, composition and definition of reference substance purity, and verification of chemical stability in the test systems. In the current European Chemical Industry aquatic research program, the endocrine-disruptor compounds selected include diethylstilbestrol, 17α-ethynylestradiol, fadrozole, flutamide, genistein, methoxychlor, methyltestosterone, 4-tert-pentylphenol, and the pure antiestrogen ZM189,154 (33,70). These compounds were principally chosen in view of their published database from mammalian systems, and the intention is to generate comparative data for a range of animal species looking at a range of diverse endocrine-disruptor end points. For invertebrates, the recent EDIETA workshop recommended comparative research using a variety of reference compounds including methoprene, precocene, 20-hydroxyecdysone, luteolin, fadrozole, methyltestosterone, flutamide, 4-tert-pentylphenol, ethynylestradiol, and the anti-estrogen ZM189,154. In summary, the method development and validation process should be made a joint activity between governmental regulators, international harmonization organizations such as the OECD, and industry research groups. 
Such positive collaboration will help ensure that we make best use of existing data, initiated research programs, and collaborative activities in order to avoid duplication and conflicting approaches (71). Science is developing rapidly in the endocrine-disruptor arena; we all need to take such developments into account during the development of prospective hazard and risk assessment programs (72).

Conclusions
We advocate the adoption of the existing ecological risk assessment guidelines (11) for addressing the risk characterization and risk management of natural and synthetic endocrine disruptors. Within these guidelines, we propose that although the exposure characterization phase remains in principle the same for endocrine disruptors as for other substances, there are convincing scientific arguments for seeking to strengthen the effects characterization (hazard assessment) component to efficiently deal with potential endocrine disruptors and their wildlife impacts. In response to recommendations from the EDSTAC (1) and proposed EDSP (2), we suggest a refined set of screening and testing tiers to optimize scientific iteration between characterization of both exposure and ecological effects, to minimize the number of animals required for testing, and to make optimal use of public and commercial testing resources. With a scientific flexibility to allow incorporation of ideas from the OECD and other scientific groups active in the endocrine-disruptors area, the ECETOC approach (Figure 3) is considered to offer an optimal way forward for the development and validation of new endocrine-disruptor screening and testing protocols using selected reference chemicals. Once validated tests are available, the emphasis should be on the use of higher-tier data for the ecological risk assessment of both natural and synthetic endocrine-disrupting substances.