Screening and Testing for Endocrine Disruption in Fish—Biomarkers As “Signposts,” Not “Traffic Lights,” in Risk Assessment

Biomarkers are currently best used as mechanistic “signposts” rather than as “traffic lights” in the environmental risk assessment of endocrine-disrupting chemicals (EDCs). In field studies, biomarkers of exposure [e.g., vitellogenin (VTG) induction in male fish] are powerful tools for tracking single substances and mixtures of concern. Biomarkers also link field and laboratory data, and so play an important role in determining the need for, and the design of, fish chronic tests for EDCs. It is the adverse effect end points (e.g., altered development, growth, and/or reproduction) from such tests that are most valuable for calculating an adverse NOEC (no observed effect concentration) or adverse EC10 (effective concentration for a 10% response) and subsequently deriving predicted no effect concentrations (PNECs). Given current uncertainties, biomarker NOEC or biomarker EC10 data should not be used in isolation to derive PNECs. In the future, however, there may be scope to make increasing use of biomarker data in environmental decision making, if plausible linkages can be made across levels of biological organization so that adverse outcomes can be anticipated from biomarker responses. For biomarkers to fulfil their potential, they should be mechanistically relevant and reproducible (as measured by interlaboratory comparisons of the same protocol). VTG is a good example of such a biomarker in that it provides insight into a mode of action (estrogenicity) that is vital to fish reproductive health. Interlaboratory reproducibility data for VTG are also encouraging; recent comparisons (using the same immunoassay protocol) have yielded coefficients of variation (CVs) of 38–55%, comparable to published CVs of 19–58% for the fish survival and growth end points used in regulatory test guidelines.
While concern over environmental xenoestrogens has led to the evaluation of reproductive biomarkers in fish, it must be remembered that many substances act via diverse mechanisms of action, so the environmental risk assessment of EDCs is a broad and complex issue. Biomarkers such as secondary sexual characteristics, gonadosomatic indices, plasma steroids, and gonadal histology also have significant potential for guiding interspecies assessments of EDCs and for designing fish chronic tests. To strengthen the utility of EDC biomarkers in fish, we need to establish a historical control database (also taking natural variability into account) to help differentiate between statistically detectable and biologically significant responses. In conclusion, as research continues to develop a range of useful EDC biomarkers, environmental decision making needs to move forward, and we propose that the “biomarkers as signposts” approach is a pragmatic basis for the current risk assessment of EDCs.
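The interlaboratory CVs quoted above are the standard deviation of the laboratories' results divided by their mean, expressed as a percentage. The Python sketch below computes that figure for five invented laboratory results (not data from the cited comparisons) and also illustrates one possible way a historical control database could be used: flagging a response that lies outside the historical control mean ± k standard deviations. The ± 2 SD rule and the function names are illustrative assumptions, not criteria proposed in the text.

```python
import statistics


def coefficient_of_variation(values):
    """Percent CV: 100 x sample standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


def outside_historical_range(value, historical_controls, k=2.0):
    """True if value lies more than k sample SDs from the historical control mean.

    The k = 2 default is an illustrative choice, not a regulatory criterion.
    """
    mean = statistics.mean(historical_controls)
    sd = statistics.stdev(historical_controls)
    return abs(value - mean) > k * sd


# Hypothetical plasma VTG values (ug/mL) for one sample, one value per laboratory,
# all obtained with the same immunoassay protocol.
lab_results = [120.0, 85.0, 210.0, 150.0, 95.0]
print(f"Interlaboratory CV: {coefficient_of_variation(lab_results):.1f}%")

# Hypothetical historical control values (ug/mL) for male fish of one species.
controls = [1.0, 1.2, 0.8, 1.1, 0.9]
print(outside_historical_range(1.5, controls))  # outside mean +/- 2 SD
print(outside_historical_range(1.2, controls))  # within mean +/- 2 SD
```

A flag from such a rule marks a response as unusual relative to controls; whether it is biologically significant still has to be judged against adverse end points.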


Monograph
There is a powerful weight of evidence that many water bodies receive significant inputs of natural and synthetic chemicals (from both diffuse and point sources) that act as endocrine-disrupting chemicals (EDCs) and constitute a threat to the reproductive health of fish populations. The modes of action of concern include androgen and estrogen agonists and antagonists, aromatase inhibitors, and thyroid disruptors [reviewed by Fairbrother et al. (1999); Tyler et al. (1998); Vos et al. (2000)]. Sources of EDCs include municipal sewage discharges and some industrial effluents (e.g., from pulp mills), as reported by scientists in Europe (Hecker et al. 2002; Jobling et al. 1998), North America (Parks et al. 2001), and Japan (Hashimoto et al. 2000). This situation gave rise to national and international research efforts on the development of testing strategies for EDCs [U.S. Environmental Protection Agency (U.S. EPA) 1998; Fenner-Crisp et al. 2000; Huet 2000; Hutchinson et al. 2000]. Internationally, the Organisation for Economic Co-operation and Development (OECD) seeks to ensure, for example, that proposed fish test guidelines for EDCs measure biologically relevant end points and that these end points are reproducible between laboratories. To this end, the OECD has formed a special subcommittee to oversee the identification and validation of internationally harmonized test guidelines to assess EDCs in both humans and wildlife (Huet 2000). A major test method validation program was also launched in the United States following the 1996 U.S. Congress mandate for the U.S. EPA to develop and implement a screening and testing program to detect certain types of EDCs (U.S. EPA 1998a). Recommendations from an advisory body to the U.S. EPA (1998b) include expanding the scope of the program to a) include not just estrogens but also chemicals that affect any aspect of the hypothalamic-pituitary-gonadal (HPG) and thyroid axes; b) consider effects on wildlife in addition to human health; and c) extend the chemical universe of concern to all chemicals on the U.S. EPA TSCA (Toxic Substances Control Act) inventory (>80,000 chemicals). For practical reasons, in this article we focus on the fish species currently included in OECD chronic test guidelines, particularly fathead minnow (Pimephales promelas), medaka (Oryzias latipes), rainbow trout (Oncorhynchus mykiss), sheepshead minnow (Cyprinodon variegatus), and zebrafish (Danio rerio) (OECD 1992). However, the concepts derived and proposed here can be applied to other fish species.

Biomarkers for EDCs in Fish
Definitions. Biomarkers are broadly defined as a change in a biological response (ranging from molecular through cellular and physiological responses) that can be related to exposure to or toxic effects of environmental chemicals [Peakall 1994; see also Huggett et al. (1992) and van der Oost et al. (2003)]. The integrated use of biomarkers such as plasma steroid hormones, vitellogenin (VTG), and gonad histology has advanced the understanding of fish reproductive toxicology in both field and laboratory studies, as well as provided mechanistic alerts for other aquatic taxa (e.g., amphibians, echinoderms, and mollusks) that may share similar reproductive hormone systems.
Biomarkers for EDC assessments. Although potentially useful for predicting possible adverse effects of chemicals at the population level, an exclusive focus on apical (whole animal) end points can result in shortcomings in both diagnostic and predictive risk assessments. Specifically, apical end points (e.g., growth, development, and reproduction) offer little insight into causative modes of action (MOAs), which can be understood only by collecting information on biomarker responses. This principle is particularly critical to the assessment of EDCs, for which there is an explicit regulatory focus on specific MOAs. To address this issue, the fish assays being developed for EDC testing feature a suite of end points at multiple levels of biological organization that are indicative both of possible adverse outcomes and of MOA.
These measurements include adverse effect end points (i.e., survival, growth, morphological development, and reproduction) and EDC biomarkers (including secondary sexual characteristics, gonadosomatic index, plasma steroids, VTG, and gonad histology) (Ankley and Johnson 2004; Hutchinson 2004). In some cases, for example gonadal histology, microscopic observations may themselves indicate a significant potential for adverse effects. For instance, if fish exposed to a substance have no Sertoli cells as a consequence of endocrine disruption during early life, then those animals are almost certainly infertile. Similarly, if chemically exposed fish do not develop a gonadal duct, their gametes cannot be transported to the exterior and, thus, these fish cannot breed successfully. Conversely, if developing fish were exposed to an EDC during the critical life stages of sexual development such that complete functional sex reversal occurred, gonad histology might not reveal an adverse impact on the individual, even though there would be a major adverse effect on the population (e.g., through a skewed sex ratio). Moving these issues forward requires the development of a toxicological pathology atlas for fish to help better classify and understand the importance of such observations [for broader discussion of the value of fish histopathology, see Au (2004); Maack and Segner (2003)]. For examples of fish histology atlases, see van der  and Yonkos et al. (2000).
VTG as a biomarker for estrogenic disruption. Both field and laboratory studies have shown the value of VTG as a rapidly inducible biomarker for estrogens and antiestrogens in both adult and juvenile fish (Nilsen et al. 2004; Panter et al. 2002; Sumpter and Jobling 1995; Thorpe et al. 2000). VTG is normally synthesized in response to endogenous estrogen and is the precursor of egg yolk. It is usually present only in the plasma of female fish, although males possess the VTG gene, which can be readily induced by exposure to exogenous estrogens. Importantly, VTG is a large glycolipoprotein and, in some species, is particularly labile. Its susceptibility to cleavage can yield fragments with distinct antigenic profiles, which can cause problems with accurate VTG quantification by immunologically based detection methods [excellent overviews of these matters are provided by Folmar et al. (1996); Hiramatsu et al. (2005); Larsson et al. (1999); Tatarazako et al. (2004)]. Useful features of VTG induction as a biomarker are its specificity for estrogens, its sensitivity, and the magnitude of the possible response [plasma VTG may increase by up to a millionfold, from nanograms per milliliter to milligrams per milliliter (Tyler et al. 1990)]. Assays for VTG are available for a wide range of fish species. Measurement of VTG mRNA has also been used recently for quantifying the potency of estrogens (Inui et al. 2003; Larkin et al. 2003; Thomas-Jones et al. 2003) and has proved as effective in this capacity as measurement of VTG protein (Folmar et al. 2000; Hemmer et al. 2002; Schmid et al. 2002; Thomas-Jones et al. 2003). Partial- or full-length VTG cDNAs have been cloned from 20 different fish species [available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)], which opens the way to wide application of the VTG transcript(s) as an estrogen biomarker.
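The "millionfold" figure follows directly from the units: a milligram is 10^6 nanograms, so a rise from roughly 1 ng/mL to roughly 1 mg/mL is a 10^6-fold increase. The sketch below (Python; the function name and the 1 ng/mL baseline are illustrative assumptions) converts both concentrations to a common unit before taking the ratio, which is the step most easily forgotten when VTG data are reported in mixed units.

```python
import math

# Factors that convert each unit to nanograms per millilitre.
TO_NG_PER_ML = {"ng/mL": 1.0, "ug/mL": 1.0e3, "mg/mL": 1.0e6}


def fold_induction(baseline, baseline_unit, induced, induced_unit):
    """Ratio of induced to baseline plasma VTG after expressing both in ng/mL."""
    base = baseline * TO_NG_PER_ML[baseline_unit]
    peak = induced * TO_NG_PER_ML[induced_unit]
    if base <= 0:
        raise ValueError("baseline must be positive")
    return peak / base


# Illustrative values only: ~1 ng/mL in an unexposed male, ~1 mg/mL after induction.
fold = fold_induction(1.0, "ng/mL", 1.0, "mg/mL")
print(f"{fold:,.0f}-fold ({math.log10(fold):.0f} orders of magnitude)")
```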
The VTG mRNA transcript is induced rapidly (within a few hours of estrogen exposure), which offers the potential for very short-term screens. VTG protein, in contrast with VTG mRNA, can be measured nondestructively in fish, which favors its use for monitoring estrogen exposure in wild populations and in studies that require repeated sampling of the same individual. Under maximal stimulation, the fold induction of VTG protein can also be up to 1,000 times greater than that of the VTG mRNA transcript (Thomas-Jones et al. 2003).
Established relationships between VTG induction and adverse health effects in fish are limited. Very high levels of VTG synthesis in adult fish can induce kidney failure (Herman and Kincaid 1988) and disrupt blood dynamics and function (Scholz and Gutzeit 2000). The impact of lower-level VTG induction, however, is not well defined. Theoretically, precocious VTG synthesis caused by estrogen exposure could reduce the survival of young fish at a life stage when the energy budget is critically balanced (Länge et al. 2001). Field studies on wild roach in U.K. rivers have also shown increased plasma VTG in intersex fish and a correlation (not necessarily a causal relationship) between elevated VTG levels and the presence of intersex gonads. More long-term studies are needed to better define the association between VTG induction and reduced reproductive capacity in fish. The application of VTG as a biomarker of estrogen exposure and potential adverse health effects is complicated by the fact that some fish species have more than one VTG (Hiramatsu et al. 2002; Trichet et al. 2000). In the zebrafish, whose full genome has been sequenced, nine distinct VTG mRNAs have been identified. In the medaka, two VTG subtypes (1 and 2) have been shown to differ in their sensitivity to weak xenoestrogens (Hiramatsu et al. 2005; Inui et al. 2003).
Other estrogen and androgen biomarkers. A second set of biomarkers developed for detecting estrogen exposure in fish comprises the vitelline envelope proteins (VEPs) (Arukwe et al. 1997; Celius and Walther 1998). These glycoproteins are the egg envelope components that form the chorion of the developing egg. Three such proteins (VEP1, 2, and 3) have been identified in fish. The VEPs are normally synthesized only in females [during oogenesis, starting before the onset of vitellogenesis; however, as is true for VTG, male fish may synthesize VEPs following exposure to xenoestrogens (Arukwe et al. 1997, 2000)]. Unfortunately, VEPs are extremely hydrophobic proteins and are difficult to measure using conventional techniques. In contrast, VEP mRNA can be measured with relative ease where the sequences have been cloned. VEP mRNAs have been shown to respond to estrogen exposure in several fish species (Arukwe et al. 2000; Lee et al. 2002; Thomas-Jones et al. 2003). It should be noted that in the Arctic char (Salvelinus fontinalis), VTG and VEP induction are not mediated by the same mechanism, because VEP induction can be regulated by cortisol in addition to estrogens.
Androgen biomarkers in laboratory fish have to date focused on characteristics of male sexual development that are controlled by androgens. These include the facial tubercles and nuptial pads in fathead minnow (Ankley et al. , 2003; Harries et al. 2000) and induction of spiggin in sticklebacks (Gasterosteus sp.) (Katsiadaki et al. 2002). These biomarkers have been shown to be highly responsive to a range of environmental androgens and antiandrogens. Although the emphasis in this article is on OECD fish species, it is important to note that work on wild fish has shown that biomarkers of this type (secondary sex characteristics) are useful for detecting environmental contaminants. For example, early work on mosquitofish (Gambusia affinis holbrooki) demonstrated the value of morphological features for detecting environmental androgens (Bortone et al. 1989; Bortone and Davis 1994; Howell et al. 1980).

Biomarkers as signposts
Environmental Health Perspectives • VOLUME 114 | SUPPLEMENT 1 | April 2006

Additional molecular biomarkers for EDCs. Other genes central to the estrogen response pathway that have been studied with a view to unraveling the mechanisms of estrogenic disruption in fish include the estrogen receptors (esr1, esr2a, and esr2b, previously known as estrogen receptors α, β, and γ, respectively); various enzymes involved in sex hormone biosynthesis [including cytochrome P450 (CYP)19 aromatase and C17 lyase]; and the gonadotrophins (follicle-stimulating hormone and luteinizing hormone, which are secreted by the pituitary gland and control gonad development and sex steroid synthesis). Regulation of estrogen receptor (ER) β mRNA has been shown to be affected by xenoestrogen exposure in vitro in fish hepatocytes (Flouriot et al. 1996) and in vivo in medaka (Inui et al. 2003), rainbow trout (Vetillard et al. 2003), and zebrafish (Islinger et al. 2003). Filby and Tyler (2005) have also shown that fathead minnows have three estrogen receptors that are differentially regulated by estrogen exposure. In vivo studies have shown that exposure to exogenous steroid estrogens and to the aromatase inhibitor fadrozole alters gene expression of both brain and gonad aromatases (aromatase being a key enzyme in the biosynthesis of estrogens from androgens). Exposure to pharmacological doses of fadrozole during the window of sexual differentiation in the zebrafish depresses aromatase mRNA expression and subsequently results in all-male fish populations (Fenske and Segner 2004). Exposure of adult fish to steroidal estrogens alters gene expression of brain aromatase in the fathead minnow (Halm et al. 2002), medaka (Contractor et al. 2004), and zebrafish (Kishida et al. 2001), but the significance of this for reproduction has not been determined. Exposure to the estrogen mimic nonylphenol disrupts pituitary synthesis of gonadotrophin mRNA and thus disrupts gonadal development at a pivotal level in the endocrine control pathway for sexual development (Harries et al. 2001).
These examples illustrate that molecular biology, specifically the study of single gene transcripts, is increasingly important in unraveling the pathways and mechanisms of estrogenic disruption in fish. Moreover, because evidence shows that some chemicals can act at multiple targets to disrupt physiological function in fish [e.g., nonylphenol acts as an estrogen agonist and an androgen antagonist, and it can alter gonadotrophin synthesis and secretion (Scholz and Gutzeit 2000)], more comprehensive molecular approaches (beyond assessing effects at single gene targets) are needed to identify the pathways and mechanisms of endocrine disruption.
The potential of omics technologies to help unravel how EDCs interact with fish and mediate their effects is now well recognized; however, the practical science is still in its infancy. For example, with the availability of the genome sequences for zebrafish (Rasooley et al. 2003) and medaka (Henrich et al. 2003), DNA microarrays are now available for application to EDC research. A limited number of cDNA macroarrays are also available for other fish species, including fathead minnow (Miracle et al. 2003) and sheepshead minnow (Larkin et al. 2003). Larkin et al. (2003) recently used cDNA macroarrays to investigate the responses of 30 genes in the sheepshead minnow. They observed that exposure to 17β-estradiol, ethinylestradiol, diethylstilbestrol, and methoxychlor gave similar genetic signatures for the 30 (estrogen-responsive) genes studied. The gene response patterns in fish exposed to nonylphenol or endosulfan, however, differed from those produced by the steroid estrogens.
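One simple way to express whether two chemicals give "similar genetic signatures" is to correlate their gene-by-gene responses. The Python sketch below uses the Pearson correlation of log2 fold changes; the metric, the five-gene profiles, and the values are hypothetical and chosen only to mimic the qualitative pattern described above. They are not the data or necessarily the method of Larkin et al. (2003).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 fold changes (exposed vs. control) for five genes.
profiles = {
    "17beta-estradiol": [3.1, 2.4, 1.8, -0.5, 0.9],
    "ethinylestradiol": [3.4, 2.2, 2.0, -0.4, 1.1],
    "nonylphenol": [0.8, -0.6, 1.5, 0.7, -1.2],
}

reference = profiles["17beta-estradiol"]
for chemical, profile in profiles.items():
    print(f"{chemical:>18}: r = {pearson(reference, profile):.2f}")
```

A high correlation with the reference estrogen suggests a shared response pattern; a low one, as for the nonylphenol profile here, points to a partly different mechanism.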
Design of fish tests for EDCs. For more than 20 years, testing for fish reproductive effects has typically exploited one of two life stages (or, in full life-cycle tests, both) at which chemicals that affect the HPG axis produce adverse outcomes: early development (covering the window of sexual differentiation) and active reproduction. The effects of EDCs during gonadal development and sexual differentiation can be manifested as intersex gonads and/or skewed phenotypic sex ratios in fish [examples using fathead minnow, medaka, and zebrafish are given by Maack and Segner (2003); Örn et al. (2003); Seki et al. (2002); van Aerle et al. (2002)]. In species with prominent secondary sexual characteristics, such as the fathead minnow or medaka, skewed sex ratios (typically a deviation from a 50:50 male:female ratio) can be readily identified visually at sexual maturation. Identification of intersex gonads (often referred to as ova-testis) requires histological analysis. Changes in sex ratios or the occurrence of intersex gonads could clearly have consequences for the maintenance of stable populations, and both end points have some diagnostic capability for identifying an underlying endocrine-mediated MOA. Hence, early developmental tests clearly have value for screening and testing EDCs. A drawback of these tests, however, is their duration, not so much in terms of chemical exposure time but in the time the fish must be held after exposure to reach the point of maturation at which phenotypic sex can be determined and/or gonads can be examined histologically [see reviews by Ankley and Johnson (2004) and Hutchinson (2002)].
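Whether an observed sex ratio deviates from 50:50 more than chance would allow can be checked with an exact binomial test. The Python sketch below is one such test (the "minimum likelihood" two-sided form); the function name and the 45-of-60 outcome are hypothetical. It treats every fish as independent, which is an assumption: real studies with replicate tanks would normally need to account for tank effects.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test of k 'successes' (here, males) in n fish.

    The p-value sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis of a p:(1 - p) sex ratio.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance stops floating-point noise splitting ties.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical outcome: 45 males among 60 sexed fish from one treatment.
print(f"p = {binomial_two_sided_p(45, 60):.2g}")
```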
Another critical window of sensitivity to chemical disruption of the HPG axis in fish is the period of active reproduction. This period has been exploited for EDC testing through short-term reproduction assays in several small fish species (Ankley and Johnson 2004). Test design and end points vary somewhat between laboratories; however, relatively similar designs have been employed for the fathead minnow, medaka, and zebrafish (Harries et al. 2000; Seki et al. 2002). Tests are usually started with animals that have an established history of successful reproduction. After a short acclimation period, chemical exposure is initiated, generally via water and typically for 21 days. During the exposure, a number of apical (adverse) effects can be assessed, including survival and size (of the adults), fecundity (number of eggs spawned), fertility (number of fertile eggs produced), hatch (number of fertile eggs that produce larvae), and larval viability (e.g., occurrence of malformations in hatched animals). This type of information can be useful for subsequent population modeling. At the conclusion of the assay, a number of biomarker responses more diagnostic of specific EDC-related MOAs can be assessed, including the status of secondary sex characteristics, gonad histology, concentrations of VTG in plasma or liver, and plasma sex steroid (17β-estradiol, testosterone, 11-ketotestosterone) concentrations.
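The apical end points listed above reduce to simple ratios once the raw counts are in hand. The Python sketch below assumes a hypothetical `ReplicateTank` record and normalizes fecundity per female per day, a common but here assumed convention; fertility is expressed relative to eggs spawned and hatch relative to fertile eggs, following the definitions in the text. All numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class ReplicateTank:
    """Hypothetical summary of one breeding tank over a short-term reproduction assay."""
    females: int
    exposure_days: int
    eggs_spawned: int    # fecundity
    fertile_eggs: int    # fertility
    hatched_larvae: int  # hatch

    def eggs_per_female_per_day(self) -> float:
        return self.eggs_spawned / (self.females * self.exposure_days)

    def fertility_percent(self) -> float:
        return 100.0 * self.fertile_eggs / self.eggs_spawned if self.eggs_spawned else 0.0

    def hatch_percent(self) -> float:
        return 100.0 * self.hatched_larvae / self.fertile_eggs if self.fertile_eggs else 0.0


def percent_reduction(treated: float, control: float) -> float:
    """Reduction of a treated end point relative to the control, in percent."""
    return 100.0 * (1.0 - treated / control)


# Invented numbers for a control and an exposed tank (4 females, 21-day exposure).
control = ReplicateTank(4, 21, eggs_spawned=2520, fertile_eggs=2268, hatched_larvae=1701)
exposed = ReplicateTank(4, 21, eggs_spawned=840, fertile_eggs=672, hatched_larvae=504)

for label, tank in (("control", control), ("exposed", exposed)):
    print(f"{label}: {tank.eggs_per_female_per_day():.1f} eggs/female/day, "
          f"fertility {tank.fertility_percent():.0f}%, hatch {tank.hatch_percent():.0f}%")

print("Fecundity reduction: "
      f"{percent_reduction(exposed.eggs_per_female_per_day(), control.eggs_per_female_per_day()):.0f}%")
```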
A more limited number of full life-cycle studies have been conducted with EDCs in small fish models. The advantage of these tests is that they capture all potential windows of sensitivity and are suitable for quantitative risk assessments; their disadvantage is, of course, their duration and expense. Life-cycle tests with EDCs have been described for the fathead minnow (e.g., Länge et al. 2001), medaka (e.g., Yokota et al. 2001), and zebrafish (e.g., Fenske et al. 2005; Nash et al. 2004; Segner et al. 2003). Life-cycle tests can capture all the end points collected in the shorter-term developmental and reproduction assays described above; as such, they provide information that can be used for population modeling as well as for diagnosing endocrine MOA. Moreover, effects may not be evident in the F0 generation but may appear only in the F1 generation, and organizational effects of EDCs induced early in life may become detectable only during later life stages (Bigsby et al. 1999; Stahl and Clark 1998).
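Life-cycle data of this kind can be fed into an age-structured (Leslie matrix) population model, such as the one used by Miller and Ankley (2004), to translate individual-level effects into a population growth rate, λ. The Python sketch below builds a three-age-class Leslie matrix and estimates λ by power iteration. The vital rates are hypothetical and not taken from any cited study, and the model omits the density dependence that some published models add; it only illustrates how a fertility effect alone lowers λ.

```python
def leslie_matrix(fecundities, survivals):
    """Build a Leslie matrix: fecundities on the first row, survival on the sub-diagonal."""
    n = len(fecundities)
    if len(survivals) != n - 1:
        raise ValueError("need one survival rate per transition between age classes")
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0] = [float(f) for f in fecundities]
    for i, s in enumerate(survivals):
        matrix[i + 1][i] = float(s)
    return matrix


def dominant_eigenvalue(matrix, iterations=2000):
    """Population growth rate (lambda) estimated by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        lam = total / sum(v)  # growth of the total population over one step
        v = [x / total for x in w]  # renormalize to avoid overflow
    return lam


# Hypothetical three-age-class vital rates (not taken from any cited study).
FECUNDITY = [0.0, 40.0, 100.0]  # offspring per female in each age class
SURVIVAL = [0.05, 0.3]          # class 0 -> 1 and class 1 -> 2

lambda_control = dominant_eigenvalue(leslie_matrix(FECUNDITY, SURVIVAL))
# Halve fertility only, leaving survival unchanged.
lambda_exposed = dominant_eigenvalue(leslie_matrix([0.5 * f for f in FECUNDITY], SURVIVAL))
print(f"lambda control = {lambda_control:.3f}, lambda with halved fertility = {lambda_exposed:.3f}")
```

For this matrix, λ satisfies λ³ = F₁s₀λ + F₂s₀s₁, which gives an independent check on the power iteration; λ > 1 means a growing population and λ < 1 a declining one.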

Species Selection and Interspecies Comparisons
Selection of test species. Fish are the most species-rich group of vertebrates, and because of their diversity, no single species can be identified as an ideal test model. Many different fish species have been used for short-term lethality assays, selected mostly for ease of culture, ecological relevance, and occasionally economic importance. A much smaller range of species, however, has been employed in partial and full life-cycle toxicity testing, which is the type of testing needed to detect EDC effects, because endocrine disruptors are more likely to cause long-term sublethal effects than acute effects (Ankley and Johnson 2004). Salmonids such as the rainbow trout, which have received historical attention in toxicology research in North America and Europe, are not conducive to routine partial or full life-cycle testing because of their comparatively long life cycle and large size. Because of these logistic constraints, most current toxicity testing with fish uses small (usually freshwater) species such as the fathead minnow, medaka, and zebrafish (Ankley and Johnson 2004). Although current emphasis in EDC testing and research is on these small laboratory species, the importance of including other freshwater and marine fish species in EDC research should not be overlooked (Folmar et al. 2000; Hashimoto et al. 2000; Jobling et al. 1998; Katsiadaki et al. 2002).
Although small fish models offer a number of practical advantages for EDC laboratory testing, their biology differs considerably from that of some of the wild fish species that regulations seek to protect. Whereas the fathead minnow, medaka, and zebrafish are short-lived and start to spawn early in life, many wild fish species are long-lived and take 1 or 2 years to reach first spawning. The model species are fractional spawners, whereas many fish species from temperate zones are periodic annual spawners. This disparity implies important differences in the fluctuation of hormone levels, VTG synthesis, and gonad maturation. It is therefore important to exploit the value of EDC biomarkers as signposts for interspecies extrapolation from laboratory fish species to wild fish populations.
EDC responses across species. At the molecular level, the MOAs of EDCs are highly conserved among vertebrates (Vos et al. 2000). A compound that acts as an estrogen receptor ligand in mammals will often also bind to fish estrogen receptors, and available evidence suggests that binding affinities can be comparable among species (Tollefsen et al. 2002; Urushitani et al. 2003; Wilson et al. 2004). It is possible, however, for an EDC to show an identical molecular action in two species but still evoke different physiological, morphological, or other biological responses. For example, whereas xenoestrogens induce VTG in both medaka and zebrafish, they appear to evoke intersex formation in the medaka but not in the zebrafish (Gray et al. 1998; Örn et al. 2003; Seki et al. 2002). Similarly, life-cycle exposure of fathead minnow and zebrafish to ethinylestradiol at concentrations of 4 and 3 ng/L, respectively, results in 100% inhibition of reproduction; in the fathead minnow, however, this effect is caused by complete feminization of the gonads (Länge et al. 2001), whereas in the zebrafish it seems to be caused by arrested testicular development (Fenske et al. 2004). It is also necessary to consider that some hormones have wider functional roles in some fish species. For instance, although the main role of estrogen in all teleost species is to regulate sexual differentiation and reproduction, in migratory salmon it is also involved in the smoltification process (Madsen et al. 1996). Therefore, estrogen exposure during the parr-smolt transition could modify the success of smoltification and lead to reduced smolt survival in seawater (Fairchild et al. 1999). Hence, extrapolation may become more difficult at the level of metabolic, physiological, or systemic responses to EDCs (Hornung et al. 2004; James et al. 1988; Kawai et al. 2003), particularly when there is insufficient knowledge of the basic endocrinology and physiology of the model fish species.
The qualitative extrapolation of effects across fish species is one problem; another is the interspecies extrapolation of quantitative effect concentrations. When threshold concentrations for VTG induction are compared across fish species, it appears, for example, that for ethinylestradiol the lowest observed effect concentrations based on VTG (biomarker LOECs) usually fall within one order of magnitude of each other (Table 1). It should also be noted that different fish species respond to stimulation at different rates (differences in responsiveness). For example, exposure to sewage effluent induces VTG more rapidly in rainbow trout than in carp or roach; however, the effective concentration of effluent to induce VTG is essentially the same (Jobling et al. 2004; Tyler et al. 2005). In addition, some fish have a greater capacity for VTG induction than others, which probably reflects differences in their normal, basic physiology. For example, rainbow trout may produce tens of milligrams of VTG in response to an estrogen dose, whereas roach or carp may produce only hundreds of micrograms in response to the same dose. Interspecies differences in the responsiveness of VTG induction and in the resulting magnitude of VTG levels may arise for several reasons, including the underlying endocrine status of semelparous versus iteroparous species, the difference between the study temperature and the environmental temperature normally conducive to sexual maturation and spawning, and other, as yet unrecognized, factors. What is clear is that, for the species of concern (in both laboratory and field studies), we need a more comprehensive understanding of their temporal reproductive cycles, especially where VTG is used as a biomarker of estrogen exposure in fish (Hiramatsu et al. 2002).
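"Within one order of magnitude" means that the largest and smallest LOECs differ by less than a factor of 10, that is, by no more than 1 log10 unit. The Python sketch below computes that spread; the species labels and LOEC values are hypothetical placeholders, not the entries of Table 1.

```python
import math


def log10_spread(values):
    """Orders of magnitude between the largest and smallest positive values."""
    values = list(values)
    if min(values) <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(max(values) / min(values))


# Hypothetical VTG-based LOECs for ethinylestradiol (ng/L); not the Table 1 data.
biomarker_loecs = {"species A": 1.0, "species B": 3.0, "species C": 5.0}

spread = log10_spread(biomarker_loecs.values())
print(f"Spread: {spread:.2f} log10 units; within one order of magnitude: {spread <= 1.0}")
```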

Protecting Fish Populations
Extrapolating laboratory data to the field. Unless a species is considered endangered, ecological risk assessments of toxic chemicals, including EDCs, focus not on the individual but on impacts at the population level. As a result, the end points most commonly assessed in regulatory fish testing are those that can be interpreted relative to population vital statistics. They include survival, growth (generally during early life stages), and measures of reproductive success (e.g., gonadal condition, egg production, fertility, hatching success). Environmentally adverse concentrations of a given chemical have generally been defined statistically on the basis of the performance of fish in an assay. These concentrations are often summarized as the LOEC and the no observed effect concentration (NOEC), or sometimes as the EC10 (effective concentration for a 10% response) value [European Commission (EC) 2003]. However, some environmental scientists feel that this regulatory approach is overly simplistic with respect to protecting populations in the field. For example, it assumes that the sensitivity of fish tested in a laboratory adequately reflects the sensitivity of untested species in the field. One factor that challenges this assumption is the differing life-history patterns of fish, namely the K- and r-strategies (Stearns 1976). For North American fish, interspecies patterns of life histories were analyzed in detail by Winemiller and Rose (1992). Fish species with an opportunistic strategy are characterized by early maturation, frequent reproduction over an extended spawning season, rapid larval growth, and high adult mortality. In contrast, species with an equilibrium strategy are intermediate-sized fish that produce small numbers of large eggs and provide high parental investment in their young. Finally, species with a periodic strategy show low adult mortality and delay maturation until body size is sufficient for the production of large egg batches; they are usually perennial spawners that spread reproductive effort over several years.

Table 1. Examples of interspecies comparison of VTG induction in fish exposed to 17α-ethinylestradiol.
Species | Range tested (ng/L) | Exposure period (days) | Age (sex) | Biomarker LOEC or EC50 | Reference
Such fundamental differences in life strategies are likely to have a major bearing on the potential impacts of EDCs. Consequently, using fish species with an opportunistic life history (as most laboratory test species have) to predict the response of, and to protect, fish species with a periodic life history (as many temperate-zone species have) is certainly open to question. This comparison further illustrates the limitations of applying any single test system in a broad, generic risk assessment context to protect wild fish.

Population modeling. Models offer one way of dealing with uncertainties in extrapolating data generated from individuals in the laboratory to field populations (Hurley et al. 2004). Models make it possible to consider the long-term population implications of changes at the individual level in parameters such as survival or reproductive output. A number of recent studies have examined the effects of EDCs in fish from a population-modeling perspective. In one example, Grist et al. (2003) evaluated data generated in a full life-cycle test with fathead minnows exposed to ethinylestradiol (Länge et al. 2001) and concluded that reductions in population growth rate caused by the estrogen would be due to effects on fertility rather than on survival. Gleason and Nacci (2001) used a matrix model for the fathead minnow to relate a decreased population growth rate in fish exposed to estradiol to vitellogenin induction in males. Miller and Ankley (2004) expanded on this approach by developing a model based on the Leslie matrix and the logistic equation to predict the effects of the androgen 17β-trenbolone (Ankley et al. 2003) on the dynamics of a fathead minnow population in a closed system. A more advanced example of modeling EDC effects on fish populations has been described by Brown et al. (2003), who evaluated the likely effects of 4-nonylphenol and methoxychlor on fathead minnow and brook trout populations. A significant element of that analysis was the consideration of differential impacts of the estrogenic EDCs on fish species with markedly different reproductive strategies, that is, discrete (trout) versus continual (minnow) spawners. Jobling et al. (2006) recently adopted a modeling approach to identify the roach populations in U.K. rivers most at risk from the effects of steroid estrogens. Subsequent biological sampling at sites selected to test the model showed a highly significant correlation between the predictions (based on steroid estrogen exposure) and impacts on gonad histopathology.
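Matrix models of the kind cited above project an age-structured population forward in time; the asymptotic population growth rate (λ) is the dominant eigenvalue of the projection matrix, with λ > 1 indicating a growing population. The following minimal sketch, assuming a three-age-class Leslie matrix with invented vital rates (not parameters from Grist et al. 2003 or Miller and Ankley 2004) and no density dependence (so it omits the logistic term used by Miller and Ankley), shows how an individual-level reduction in fertility propagates to λ. The function name is hypothetical.

```python
def leslie_growth_rate(fecundities, survivals, iterations=1000):
    """Estimate the dominant eigenvalue (lambda) of a Leslie matrix.

    fecundities: F_i, offspring contributed to the first age class by class i
    survivals:   P_i, probability of surviving from class i to class i + 1
    Uses power iteration; assumes the matrix is primitive so the iteration
    converges (true, e.g., when two consecutive age classes reproduce).
    """
    n = len(fecundities)
    if len(survivals) != n - 1:
        raise ValueError("need one survival rate per transition (n - 1 values)")
    x = [1.0 / n] * n  # age-class distribution, kept normalized to sum to 1
    growth = 0.0
    for _ in range(iterations):
        newborns = sum(f * xi for f, xi in zip(fecundities, x))
        survivors = [survivals[i] * x[i] for i in range(n - 1)]
        nxt = [newborns] + survivors
        growth = sum(nxt)  # one-step growth ratio, because sum(x) == 1
        if growth == 0.0:
            return 0.0  # population goes extinct in one step
        x = [v / growth for v in nxt]
    return growth


# Hypothetical vital rates: baseline versus fertility halved by an exposure.
baseline = leslie_growth_rate([0.0, 4.0, 6.0], [0.3, 0.5])
reduced = leslie_growth_rate([0.0, 2.0, 3.0], [0.3, 0.5])
print(baseline > 1.0, reduced < baseline)  # True True
```

The same function could be rerun with reduced survival rates instead of reduced fecundities to compare which individual-level effect depresses λ more for a given life history, which is the type of question addressed in the population-modeling studies above.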
Interpretation of effects data. The calculation of robust population predicted no-effect concentrations (PNECs) from NOEC or EC10 values is a continual challenge in ecotoxicology, and several methods are used to tackle it (EC 2003; Selck et al. 2002). In the most widely used approach, constant application factors are applied to the available toxicity data to account for the different sensitivities of untested species in the ecosystem. In Europe, for example, application factors are used for single substances to estimate an environmental concentration that is presumed to protect wildlife species from adverse effects. In this process the lowest effect concentration measured in a laboratory test is divided by an application factor (usually between 10 and 1,000, depending on pragmatic assumptions relating to single substances versus real-world mixtures, acute-to-chronic ratios, and interspecies differences). Alternatively, species sensitivity distributions (SSDs) may be used, as in both Europe and North America to derive water quality criteria (EC 2003).
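The two approaches just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the SSD is assumed to be log-normal and is fitted by simple moments of the log-transformed data, and the NOECs are invented placeholders rather than data for any real substance. The 5th percentile of the SSD (HC5) is a commonly used benchmark; any further factor applied to it is a regulatory choice not modeled here.

```python
import math
from statistics import NormalDist, mean, stdev


def pnec_application_factor(effect_concentrations, application_factor):
    """PNEC = lowest available effect concentration / application factor."""
    if not effect_concentrations:
        raise ValueError("no effect concentrations supplied")
    if application_factor <= 0:
        raise ValueError("application factor must be positive")
    return min(effect_concentrations) / application_factor


def hc5_lognormal(species_effect_concentrations):
    """5th percentile (HC5) of a log-normal species sensitivity distribution.

    The SSD is fitted here simply from the sample mean and standard deviation
    of the log10-transformed values; regulatory SSD procedures may differ.
    """
    if len(species_effect_concentrations) < 2:
        raise ValueError("an SSD needs at least two species")
    logs = [math.log10(c) for c in species_effect_concentrations]
    ssd = NormalDist(mean(logs), stdev(logs))
    return 10 ** ssd.inv_cdf(0.05)


# Hypothetical chronic NOECs (ng/L) for three species; placeholders only.
noecs = [4.0, 12.0, 30.0]
print(pnec_application_factor(noecs, 10))  # 0.4
print(hc5_lognormal([1.0, 10.0, 100.0]))
```

The application-factor route uses only the single most sensitive value, whereas the SSD route uses the spread across species; this is why the former is favored when few toxicity data exist.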
Application factors are used most often when few toxicity data are available. We propose, in keeping with the principles of EDC assessment in mammals and the valuable concept of the NOAEL (no observed adverse effect level) (Foster and McIntyre 2002; Lewis et al. 2002), that fish chronic testing data be expressed as adverse NOEC or adverse EC10 values, which in turn should be used to derive PNECs. Similarly, biomarker responses such as VTG could be expressed as biomarker NOEC or biomarker EC10 values; however, at present it is prudent not to use these data alone to calculate PNECs for individual substances. It is currently difficult to incorporate EDC biomarker data directly into fish population models and to distinguish causation from mere association (Gleason and Nacci 2001). For complex mixtures such as municipal and industrial effluents, there is clearly a proven role for using VTG and other biomarkers (e.g., gonad histology) in fish to prioritize discharges of concern (Sumpter and Jobling 1995). As recommended by Handy et al. (2003), it is currently inappropriate to use biomarker data alone to curtail discharges routinely (a "red traffic light" scenario); their value is greater as scientific signposts that help target detailed chemical and biological analyses of water, sediment, and biota. The use of EDC biomarkers will improve as further research clarifies their functional relevance to reproductive toxicity.
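The proposed distinction between adverse and biomarker values can be made explicit in how results are recorded. The sketch below assumes, purely for illustration, that each end point has been fitted with a two-parameter log-logistic concentration-response model (a common choice, not one prescribed here). It derives EC10 values from hypothetical EC50 and slope estimates, tags each end point as adverse or biomarker, and computes a PNEC from the adverse EC10 values only. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


def ecx_log_logistic(ec50, slope, x_percent):
    """Concentration giving an x% effect under a two-parameter log-logistic model.

    Effect fraction p(c) = (c/EC50)**slope / (1 + (c/EC50)**slope), solved for c.
    """
    if ec50 <= 0 or slope <= 0:
        raise ValueError("EC50 and slope must be positive")
    if not 0 < x_percent < 100:
        raise ValueError("x_percent must lie strictly between 0 and 100")
    p = x_percent / 100.0
    return ec50 * (p / (1.0 - p)) ** (1.0 / slope)


@dataclass(frozen=True)
class EndpointEC10:
    endpoint: str  # e.g., "fecundity" or "plasma VTG"
    kind: str      # "adverse" or "biomarker"
    ec10: float    # ng/L

    def __post_init__(self):
        if self.kind not in ("adverse", "biomarker"):
            raise ValueError(f"unknown end point kind: {self.kind!r}")


def pnec_from_adverse(results, application_factor):
    """Derive a PNEC from adverse EC10 values only; biomarker EC10s are ignored."""
    adverse = [r.ec10 for r in results if r.kind == "adverse"]
    if not adverse:
        raise ValueError("no adverse end points: biomarker data alone should not set a PNEC")
    return min(adverse) / application_factor


# Hypothetical fitted curves (EC50 in ng/L and slope invented for illustration).
results = [
    EndpointEC10("plasma VTG (males)", "biomarker", ecx_log_logistic(3.0, 2.0, 10)),
    EndpointEC10("fecundity", "adverse", ecx_log_logistic(20.0, 2.0, 10)),
    EndpointEC10("juvenile growth", "adverse", ecx_log_logistic(60.0, 1.5, 10)),
]
print(round(pnec_from_adverse(results, 10), 3))  # 0.667
```

Note that the biomarker EC10 (1.0 ng/L here) is lower than any adverse EC10 but does not set the PNEC; it remains available as a signpost for test design rather than as a "traffic light."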

Standardization of End Points
As the science of EDC testing progresses, it will be necessary to evaluate the variability in different biomarker responses and adverse effect end points. A similar approach has been taken in other areas of ecotoxicology and mammalian toxicology (Dave 1993; Environment Canada 1990; Mitchell et al. 1990). Relevant to this point, the OECD (2003) guidance document on the validation of test methods defines the following terms (with the last term, comparability, added here to cover interassay comparisons):
• Relevance: the relationship of the test to the effect of interest and whether it is meaningful and useful for a particular purpose; the extent to which the test correctly measures or predicts the biological effect and species of interest
• Reliability: the extent of reproducibility of results from a test over time within and among laboratories when performed using the same protocol
• Reproducibility: the agreement among results obtained from testing the same substance using the same test protocol (see "Reliability")
• Repeatability: the agreement among test results obtained within a single laboratory when the procedure is performed on the same substance under identical conditions
• Robustness: the insensitivity of a test to departures from the specified test conditions; the ability of a test to provide similar results over the range of test conditions under which it may be used in different laboratories
• Transferability: the ability of a test method or procedure to be accurately and reliably performed in independent, competent laboratories
• Comparability: the comparison of results from laboratories measuring the same end point using different protocols
Table 2 presents data currently available for some biomarkers used in EDC assessments with fish.
Taking the example of a heterologous fathead minnow VTG immunoassay, when exactly the same protocol was used in different laboratories the coefficient of variation (CV) for interlaboratory reproducibility was 38-55% (Panter et al. 1999). A more recent study using a homologous enzyme-linked immunosorbent assay for fathead minnow gave intralaboratory repeatability and interlaboratory reproducibility CVs of 16.4 and 18.4%, respectively (Eidem et al., in press). In contrast, when a variety of different methods for measuring VTG in OECD test species were compared (reflecting interassay comparability rather than interlaboratory reproducibility), CVs reached up to 1,873% in medaka (Battelle 2003) and 263% in zebrafish (Porcher 2003). Such wide variability is an additional reason why VTG cannot yet be used easily to predict adverse effects accurately. Similar comparisons of different immunoassay methods for plasma sex steroids showed CVs of 60 and 70% for estradiol and testosterone, respectively (McMaster et al. 2001). Notably, the CVs for VTG reproducibility in Table 2 are broadly similar to those for survival, growth, and fecundity from a range of laboratory studies.
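The CVs discussed above are calculated as CV = 100 × SD/mean. A minimal sketch, assuming the sample standard deviation (n - 1 denominator) is used, shows how the same calculation gives a repeatability CV when applied to replicate results within one laboratory and a reproducibility CV when applied to results for the same protocol from different laboratories. The function name and all values are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation, CV = 100 x SD / mean (sample SD, n - 1)."""
    if len(values) < 2:
        raise ValueError("a CV needs at least two values")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / m


# Hypothetical VTG results (ng/mL) for one shared sample; placeholders only.
within_lab_replicates = [100.0, 110.0, 90.0]  # repeatability (one laboratory)
between_lab_means = [100.0, 150.0, 200.0]     # reproducibility (three laboratories)
print(round(cv_percent(within_lab_replicates), 1))  # 10.0
print(round(cv_percent(between_lab_means), 1))      # 33.3
```

Feeding in results from different protocols for the same end point would instead give a comparability CV, which is the quantity behind the much larger medaka and zebrafish values.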

Use of Biomarkers in EDC Screening and Testing
In the tiered approach to assessing possible EDCs suggested by the U.S. EPA (1998; Fenner-Crisp et al. 2000), the initial step would be prioritization of chemicals for testing based, for example, on existing toxicological data, structural characteristics, and production volume. Prioritized chemicals would then be subjected to tier 1 tests designed to provide evidence as to whether a chemical has the potential to act as an EDC in a whole animal. The U.S. EPA (1998) recommended consideration of five tier 1 tests: three with rats, one with an amphibian (to detect thyroid-active chemicals), and one with fish to identify chemicals that affect reproductive function through alterations in the hypothalamic-pituitary-gonadal (HPG) axis. Chemicals identified as positive in the tier 1 tests would proceed to more extensive tier 2 testing designed to define human health and/or ecological risk quantitatively. Tier 2 tests are full life-cycle and multigenerational tests with potentially a variety of species, including mammals (rats), birds, amphibians, and fish (U.S. EPA 1998). Outside the United States, other countries have proposed modifications to this approach, and some suggest a three-tier method that can be applied to different aquatic exposure scenarios (tier 1, fish screening; tier 2, fish partial life cycle; tier 3, fish full life cycle). Whatever tiered testing approach is used, there is a need for a toolbox of OECD test guidelines, validated using a range of substances, that encompass both biomarker and adverse effect end points (Huet 2000).
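The three-tier fish scheme just mentioned can be read as a simple escalation rule. The sketch below is a hypothetical encoding, assuming that a chemical advances only when the current step flags a concern; it does not reproduce the U.S. EPA scheme, which has two tiers and includes non-fish assays. The class and function names are illustrative.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    """Hypothetical encoding of a three-tier fish testing scheme."""
    PRIORITIZATION = 0      # existing data, structure, production volume
    SCREEN = 1              # tier 1: fish screening assay (biomarkers)
    PARTIAL_LIFE_CYCLE = 2  # tier 2: fish partial life cycle
    FULL_LIFE_CYCLE = 3     # tier 3: fish full life cycle


def next_tier(current: Tier, flagged: bool) -> Optional[Tier]:
    """Return the next tier to run, or None if testing stops.

    Assumes escalation only on a flagged concern; after the full life-cycle
    test, the adverse end points would be passed on to PNEC derivation.
    """
    if not flagged or current is Tier.FULL_LIFE_CYCLE:
        return None
    return Tier(current.value + 1)


# Walk a hypothetical chemical that is flagged at every step.
path = [Tier.PRIORITIZATION]
while True:
    nxt = next_tier(path[-1], flagged=True)
    if nxt is None:
        break
    path.append(nxt)
print([t.name for t in path])
# ['PRIORITIZATION', 'SCREEN', 'PARTIAL_LIFE_CYCLE', 'FULL_LIFE_CYCLE']
```

Keeping the screening tier separate from the life-cycle tiers mirrors the point that screening outputs act as signposts for which higher-tier test to run, not as inputs to the PNEC.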
In general, it is agreed that only through the application of a tiered strategy can testing of the huge number of candidate chemicals be practically and economically feasible. The primary purpose of screening is to have rapid assays (with efficient use of animals, laboratory resources, and money) that ideally detect all EDCs of concern, avoid false negatives, and act as signposts for the road ahead toward higher-tier testing. It is therefore not the purpose of fish screening assays to provide data for direct use in calculating PNECs. Guided by the information from screening, the higher-tier fish tests are primarily devoted to identifying adverse effects of concern and to providing data to calculate the PNEC (Figure 1). As an illustrative example for fish screening, the most rapidly assessed biomarkers (e.g., secondary sexual characteristics or VTG) could be evaluated first, and more complex biomarkers such as gonad histology could be addressed later (Figure 2).

Table 2 notes. Abbreviations: -, no data; IC25, 25% inhibitory concentration; LC50, 50% lethal concentration. (a) "Repeatability" describes the variation among repeated tests of the same protocol in the same laboratory; this variation is also called intralaboratory variability. "Reproducibility" describes the variation between repeated tests of the same protocol in different laboratories; this variation is also called interlaboratory variability (Dave 1993). "Comparability" describes the variation for the same end point measured using different protocols. CV = 100 × SD/mean; the CV is sometimes referred to as the relative standard deviation (RSD).

Figure 1. Conceptual approach to fish screening and testing for EDCs. Fish screening assays often use adult fish but can also involve juvenile fish on a case-by-case basis.

Conclusions and Recommendations
Given the exponential increase in data on screening and testing of EDCs in fish, it is important to identify key points of learning to guide future science in both the laboratory and the field, and also to aid risk managers in government and industry. At present, it is our view that this process can be summarized in five essential conclusions:
• There is a need for a validated "toolbox" of fish screening and chronic testing methods within a flexible framework. A basic requirement here is more published data on normal patterns of biomarker expression, along with exposure studies using a wide range of both potent and weakly acting EDCs. Where mechanistically appropriate, positive controls can add value to research studies [e.g., for estrogens and antiestrogens, 17α-ethinylestradiol was used by Panter et al. (2002)]. A historical control database needs to be established for the fish species with the most promise in EDC screening and testing (e.g., fathead minnow, medaka, zebrafish) for the biomarkers and adverse effect end points of interest. Our effort to apply this approach to VTG suggests that this biomarker has an interlaboratory CV for reproducibility similar to those for traditional ecotoxicology end points (survival and growth).
• Biomarkers provide valuable data for extrapolating between laboratory and field studies of the effects of EDCs on fish. It is doubtful whether the scientific understanding of environmental estrogens could have progressed this far without the use of the fish VTG biomarker for monitoring effluents in many countries; its use led to the efficient focusing of analytical chemistry on key chemical suspects (Vos et al. 2000). In parallel, the inclusion of biomarkers in screening and chronic testing with fish has provided essential data for verifying alerts from the field, and it has helped lead to a rational prioritization of natural and synthetic estrogens as key substances of concern for the reproductive health of fish populations. As more data become available for other classes of EDCs (e.g., androgen-active compounds and aromatase inhibitors), it is anticipated that the same principles will hold true and be productive in protecting and improving ecological quality (Ankley et al. 2002, 2003; Parks et al. 2001; Zerulla et al. 2002) and also in linking to mammalian systems (Gray et al. 1998).
• For prospective risk assessment, biomarkers can be used as signposts to design better and more cost-effective fish chronic tests. Many scientists recognize that a modest but detectable increase of VTG in male fish may not directly predict quantifiable impairment of reproduction, although very high levels of VTG can cause severe growth retardation associated with renal damage (Hermann and Kincaid 1988; Länge et al. 2001). In terms of reproducibility, VTG biomarker measurement appears to perform similarly to survival and growth in fish (Dave 1993). More data are needed on the repeatability and reproducibility of sex hormone measurements to further evaluate their general use as mechanistic endocrine biomarkers (McMaster et al. 2001).
• For whole effluent assessment, biomarker data can be used to design cost-effective biological and chemical monitoring programs to reduce ecological impacts in receiving waters. As observed by Handy et al. (2003), it would be inappropriate for biomarker data alone to be used to curtail effluent discharges or to justify the imposition of penalties on the municipal or industrial discharger. Instead, as EDC biomarker data improve, site-specific goals could be set as a by-product of understanding impacts from a weight-of-evidence perspective (including biomarker data), followed by dialogue involving all parties. Given the high cost of chemical analyses to quantify trace levels of toxicants in complex mixtures, biomarkers may well prove to be the most effective and economical approach (van der Oost et al. 2003).
• Calculation of PNEC values should be based primarily on adverse effect end points that are directly relevant to protecting populations. Communication of this population-relevance concept could be aided by ecotoxicologists adopting the NOAEL concept from mammalian toxicology. Based on current regulatory terminology, data could be expressed as the adverse EC10, adverse NOEC, and adverse LOEC (as distinct from biomarker EC10 or biomarker NOEC values) (Figure 3). This proposal may help achieve improved consistency in EDC risk assessment between mechanistic or mode-of-action ecotoxicology (Escher and Hermanns 2002) and mammalian toxicology (Ashby et al. 2004; Foster and McIntyre 2002; Lewis et al. 2002). As our understanding of EDC biomarkers develops, there is future scope for their increasing role in guiding environmental risk assessment, and investment in fish biomarker research should therefore remain a priority.

Figure 2. An example of biomarker interpretation for fish screening and testing of potential reproductive EDCs, applicable to single substances or to complex effluents (according to local or regional requirements). Abbreviations: ED, endocrine disruptor; GSI, gonadosomatic index; SSC, secondary sexual characteristics.
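A historical control database of the kind recommended in the first conclusion could be used, for instance, to judge whether a statistically detectable change also lies outside the range normally seen in unexposed fish. The minimal sketch below assumes that the normal range is taken as the mean ± 2 SD of historical control values, a simple convention chosen for illustration; skewed biomarkers such as VTG may need log transformation or percentile-based limits instead. Function names and values are hypothetical.

```python
from statistics import mean, stdev


def historical_control_range(control_values, k=2.0):
    """Normal range as mean +/- k x SD of historical control values."""
    if len(control_values) < 2:
        raise ValueError("need at least two historical control values")
    m, s = mean(control_values), stdev(control_values)
    return m - k * s, m + k * s


def outside_historical_range(value, control_values, k=2.0):
    """True if a treatment value falls outside the historical control range."""
    low, high = historical_control_range(control_values, k)
    return value < low or value > high


# Hypothetical historical control values for one end point; placeholders only.
controls = [10.0, 12.0, 14.0]
print(historical_control_range(controls))        # (8.0, 16.0)
print(outside_historical_range(20.0, controls))  # True
print(outside_historical_range(13.0, controls))  # False
```

A treatment mean that differs significantly from its concurrent control yet stays inside the historical range would be a candidate for the "statistically detectable but not necessarily biologically significant" category.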