The OECD program to validate the rat uterotrophic bioassay to screen compounds for in vivo estrogenic responses: phase 1.

The Organisation for Economic Co-operation and Development has completed the first phase of an international validation program for the rodent uterotrophic bioassay. This uterotrophic bioassay is intended to identify the in vivo activity of compounds that are suspected agonists or antagonists of estrogen. This information could, for example, be used to help prioritize positive compounds for further testing. Using draft protocols, we tested and compared two model systems, the immature female rat and the adult ovariectomized rat. Data from 19 participating laboratories using a high-potency reference agonist, ethinyl estradiol (EE), and an antagonist, ZM 189,154, indicate no substantive performance differences between models. All laboratories and all protocols successfully detected increases in uterine weights using EE in phase 1. These significant uterine weight increases were achieved under a variety of experimental conditions (e.g., strain, diet, housing protocol, bedding, vehicle). For each protocol, there was generally good agreement among laboratories with regard to the actual EE doses both in producing the first significant increase in uterine weights and achieving the maximum uterine response. Furthermore, the Hill equation appears to model the dose response satisfactorily and indicates general agreement based on the calculated effective doses ED10 and ED50 within and among laboratories. The feasibility of an antagonist assay was also successfully demonstrated. Therefore, both models appear robust, reproducible, and transferable across laboratories for high-potency estrogen agonists such as EE. For the next phase of the OECD validation program, both models will be tested against a battery of weak, partial estrogen agonists.

1 National Institute of Health Sciences, Tokyo, Japan; 2 Environment, Health and Safety Division, OECD, Paris, France; 3 National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA; 4 U.S. Environmental Protection Agency, Washington, DC, USA; 5 Syngenta Central Toxicology Laboratory, Macclesfield, Cheshire, UK; 6 Procter & Gamble, Cincinnati, Ohio, USA

Concern has been raised that ambient environmental levels of chemicals called environmental estrogens may be causing adverse effects in both humans and wildlife through the interaction of these chemicals with the endocrine system (1). Initial reviews of existing reports have noted limited evidence for endocrine disruption in humans but have noted several cases where local, high-level exposures have produced effects in wildlife (2)(3)(4).
To address this concern, the Organisation for Economic Co-operation and Development (OECD) initiated a high-priority activity in 1997 to a) provide information on testing and assessment activities, particularly at the national regulatory level, and coordinate these activities among member countries as appropriate; b) revise existing guidelines and develop new guidelines for screening and testing potential endocrine disrupters; and c) harmonize hazard and risk assessment approaches internationally (5). The advantage of the OECD activity is that it would produce a set of internationally recognized and harmonized screening and testing guidelines and strategies that would avoid duplication of testing resources, including animals.
The OECD activity is managed by the Task Force on Endocrine Disrupters Testing and Assessment (EDTA), the membership of which includes experts nominated by OECD member countries' regulatory authorities, international organizations, nongovernmental organizations, and industry associations. The activity is part of the OECD Test Guidelines Programme, so overall responsibility of the work lies with the Working Group of National Co-ordinators of the Test Guidelines Programme (WNT).
The OECD conceptual framework identifies short- and long-term assays of increasing complexity and detail to gather information on a chemical. The assays include a) structure-activity relationships and in vitro assays that would identify a chemical based on certain intrinsic characteristics (e.g., estrogen receptor binding affinity); b) short-term in vivo assays to demonstrate relevant activity in the intact animal (e.g., the uterotrophic assay); and c) long-term assays involving exposure to the test substance at different stages of the development of the animal (e.g., the two-generation reproductive assay). The OECD strategy aims to develop these assays as multipurpose tools rather than as a rigid scheme. The purpose and use of a bioassay could vary depending on the chemical substance and the available toxicological data on that chemical. An early screen in one case could become a means to determine a chemical's mode of action in another (5).
In this article we focus on the OECD validation program for an in vivo screen for estrogenic activity. Historically, several candidate systems are available: a vaginal cornification and keratinization response (6), a water imbibition response of the uterus after a single dose of the test compound (7), and a uterine tissue weight increase after several doses of the test compound (8)(9)(10). The EDTA reached consensus to select the latter assay, called the uterotrophic assay, for further development and validation. The uterotrophic response has been employed to evaluate estrogenic activity using a number of mammalian and avian species, although primarily laboratory rodents. Because the rat has become the preferred species for reproductive and developmental toxicity testing, we chose it as the test species for further standardization and development of the uterotrophic assay.
Two possible uterotrophic models are based on the need to have a nonfunctional hypothalamic-pituitary-gonadal axis to ensure a sensitive and consistent uterine response both to administered estrogens alone and to administered antiestrogens in combination with a reference estrogen. One model uses the immature female before significant ovarian estrogen synthesis and regulation by the hypothalamic-pituitary-gonadal axis begins; the other model uses the ovariectomized (OVX) adult female, removing the primary source of estrogen synthesis. An extensive comparison of these models across several laboratories has never been performed. However, data in the literature and available data from laboratories participating in the OECD program suggest that the two models may be equivalent.
The objective of the OECD work on the uterotrophic assay is to develop a new, validated test guideline and clearly define its purpose. OECD member countries formally agreed in a workshop held in Solna, Sweden, in 1996 on the principles of validation and criteria for the acceptability of new and revised test guidelines (animal or nonanimal). These are now commonly known as the "Solna principles and criteria" and follow extensive work by national and regional authorities on the validation and acceptability of alternative test methods, including definitions of key terms (11). Validation is defined as the process by which the reliability and the relevance of a procedure are established for a particular purpose. Reliability is defined as the reproducibility of results from an assay within and between laboratories. Relevance describes whether a test is meaningful and useful for a particular purpose. The Solna principles and criteria were originally developed by the European Centre for the Validation of Alternative Methods (ECVAM) and the U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) and can be summarized as follows: a) the test method rationale should be stated, including the scientific need and regulatory purpose; b) the relationship of the end point(s) determined by the test method to the in vivo biologic effect and to the toxicity of interest must be stated; c) the limitations of a method must be described (e.g., metabolic capability); d) a detailed protocol must be readily available with sufficient detail to enable the user to adhere to it, including data analysis and decision criteria; e) test methods and results should be publicly available and should have been subjected to independent scientific review; f) intratest variability, repeatability, and reproducibility of the test method within and among laboratories should have been demonstrated, including a description of variability with time; g) the test method's performance must have been demonstrated using a series of reference chemicals, preferably coded to exclude bias; h) the performance of test methods should have been evaluated in relation to existing relevant toxicity data; i) all data supporting the assessment of the validity of the test methods including the full data set collected in the validation study should have undergone scientific review; and j) these data should have been obtained in accordance with the OECD Principles of Good Laboratory Practice (GLP) (11). In 1998, OECD member countries reconfirmed their commitments to these validation principles and criteria and further clarified that the process of validation should allow for flexibility. However, to be able to justify a certain flexibility, a transparent standard procedure should be available that allows for the assessment of the need for and extent of the required level of validation (12).
A Validation Management Group on behalf of the EDTA of the OECD Test Guidelines Programme coordinates the work on human health test methods. The Validation Management Group is composed of experts from eight member countries nominated for their expertise in toxicology, test development and validation, endocrinology, regulatory toxicology, and biostatistics. Experts from ICCVAM and from ECVAM also participate in the Validation Management Group.

Overall Program Design and Objectives
The work on the uterotrophic assay is being performed in phases. Phase 1, now completed, was designed to test, refine, and standardize the immature and the adult OVX rat uterotrophic assays using a high-potency reference agonist compound and to provide data on intra- and interlaboratory variability with this reference compound. In addition, the feasibility of using the protocol for antagonist assays was explored using a reference antagonist. A detailed report of phase 1 of the validation work on the uterotrophic assay, including the rationale for the design of this phase, was submitted to the Validation Management Group (VMG) for final approval in March 2001 (13). At this time, further progress on the antagonist portion of the assay awaits synthesis of sufficient quantities of the reference pure antiestrogen. Few pure antagonists such as ZM 189,154 are known (14). Most known estrogen antagonists such as tamoxifen will also express low levels of an agonist response and would complicate data interpretation by responding as a positive in both agonist and antagonist sections of the assay (15). Phase 2, currently underway, is designed to demonstrate the capability of both standardized protocols against a set of test compounds comprising weak estrogen partial agonists and a known negative. Phase 2 is intended to demonstrate the repeatability and variation within and among laboratories for several compounds and over time. To properly investigate intra- and interlaboratory variability, the doses to be used in phase 2 will be specified in all cases. Twenty laboratories are participating in phase 2.
The need for additional work after phase 2 will depend on the outcome of phase 2. The results of the uterotrophic assays along with other relevant biologic and toxicologic data that may exist on the chemicals of interest will be evaluated to demonstrate the reliability and relevance of the uterotrophic screen for its intended use in detecting estrogen agonists in vivo.
Design of phase 1. The objectives of the first phase of the OECD validation work were to a) demonstrate, in immature and adult OVX rats, the dose-response relationship between uterine weight and a reference estrogen using two possible routes of administration: oral gavage and subcutaneous injection; b) investigate intra- and interlaboratory variation and identify any appropriate protocol refinements; c) compare the performance of the protocols; and d) demonstrate the feasibility of the protocols to identify potential antiestrogenic activity using a pure estrogen antagonist.
Currently, several protocols are in use for the uterotrophic assay. Three principal variables govern their differences: species, age of test animal, and route of administration. The result is eight possible test protocols, each having literature precedents for their use. The literature was reviewed and decisions made on the basis of scientific rationale and practical experience, given the intent to develop an OECD test guideline that could be transferred easily to many laboratories. The rat was chosen over the mouse because it is used most often as the preferred rodent model in toxicology testing paradigms for regulatory purposes. All protocols developed had essentially the same design and differed only in the model used and route of administration. The protocols using oral gavage in immature animals and subcutaneous injection in ovariectomized animals each had large databases of available historical information. The third protocol, subcutaneous injection of immature rat, also had a large database of historical information and was chosen as a link between the other two. A fourth protocol, extending the duration of exposure in ovariectomized animals, was carried out by some laboratories to explore whether varying this parameter had any effect on the sensitivity of the assay.
The design of phase 1 then consisted of testing four protocols: Protocol A used the immature female rat model, with administration of doses by oral gavage for 3 days at 24-hr intervals followed by humane killing 24 hr after the last administration. Protocol B also used the immature female rat model, with dosing by subcutaneous injection for 3 days at 24-hr intervals followed by humane killing 24 hr after the last administration. Protocol C used the adult OVX rat model, with administration by subcutaneous injection for 3 days at 24-hr intervals followed by humane killing 24 hr after the last administration. Protocol C´ also used the adult OVX model and extended the subcutaneous dosing to 7 days with humane killing 24 hr after the last administration.
The reference estrogen agonist was 17α-ethinyl estradiol (EE; CAS no. 57-63-6), and the reference estrogen antagonist was the pure antagonist ZM 189,154 (ZM; CAS no. 101908-22-9). The same lot of each chemical was distributed from a central repository. These chemicals were gifts of Schering (Kenilworth, NJ, USA) and Astra-Zeneca (Alderley Park, Cheshire, UK), respectively.
The lead laboratory was the National Institute of Health Sciences of Japan. Nineteen laboratories from Denmark, France, Germany, Japan, Korea, the Netherlands, the United Kingdom, and the United States participated in phase 1. Sixteen laboratories from seven nations performed protocol A, 12 laboratories from six nations performed protocol B, nine laboratories from three nations performed protocol C, and four laboratories from one nation performed protocol C´.
Because the uterotrophic assay is intended to be widely practiced, participating laboratories used their traditional rat strain, diet, vehicle, and housing procedures. Animals were to be acquired from standard animal supply sources with general instructions on acclimation and housing (e.g., immature animals transported with litters together accompanied by the dam or a foster dam, or scheduled to arrive as a litter when they are 17 days old; room temperature of 22 ± 3°C and a relative humidity of 30-70%; artificial lighting with a 12-hr light and 12-hr dark cycle; feed and tap or filtered drinking water provided ad libitum). Each laboratory recorded the specifics, and samples of vehicle and diet were retained. Individual animals were uniquely identified (e.g., by ear tags or tail tattoos), and each group was coded (e.g., by a letter and a color on housing cages). Both an untreated control and a vehicle control were included to allow detection of any significant contamination of the vehicle with phytoestrogen(s). There have been reports both in the older literature and more recently that particular lots of diet, presumably through the presence of phytoestrogens, could influence the baseline uterine weight (16)(17)(18)(19)(20). If significant variations in the control and vehicle control uterine weights were observed, the contributions of strain, diet, and so on could then be investigated further, if necessary, as retained samples of diet and vehicle were required. Details of these particulars are provided in Table 1.
All protocols were based on a group size of six animals. The total amount for subcutaneous injection per rat per day did not exceed 4 mL/kg, and the total amount for oral gavage per rat per day did not exceed 5 mL/kg. Included were daily measurement of animal body weights and adjustment of volumes to maintain the specified dose of substance for the allotted period. Body weights starting on the day of administration ranged from 26 to 57 g across laboratories for the immature animals and from 142 to 327 g for the adult OVX animals.
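The daily volume adjustment described above can be sketched as follows. This is a minimal illustration, not the procedure used by the participating laboratories: the function name, the dosing-solution concentration, and the example body weight are hypothetical, and only the 4 mL/kg and 5 mL/kg limits come from the protocol text.

```python
# Volume limits stated in the protocols (mL per kg body weight per day).
MAX_VOLUME_ML_PER_KG = {"subcutaneous": 4.0, "oral_gavage": 5.0}


def daily_dose_volume_ml(body_weight_g, dose_ug_per_kg, conc_ug_per_ml, route):
    """Return the volume (mL) that delivers the target dose at today's body weight.

    conc_ug_per_ml is the concentration of the dosing solution; it is a
    hypothetical parameter, since the source does not report concentrations.
    """
    body_weight_kg = body_weight_g / 1000.0
    volume_ml = dose_ug_per_kg * body_weight_kg / conc_ug_per_ml
    limit_ml = MAX_VOLUME_ML_PER_KG[route] * body_weight_kg
    if volume_ml > limit_ml:
        raise ValueError(
            f"{volume_ml:.3f} mL exceeds the {route} limit of {limit_ml:.3f} mL"
        )
    return volume_ml


# Hypothetical example: a 40 g immature rat given 3 ug/kg from a 1 ug/mL solution.
volume = daily_dose_volume_ml(40, 3.0, 1.0, "subcutaneous")
```

Recomputing the volume from each day's body weight keeps the dose per kilogram constant over the dosing period, which is the purpose of the daily weighing.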
The end points of interest were the wet and blotted uterine weights. The uterine weight increase is a fundamental response of the female to sufficient exposure to estrogen agonists. The response begins with the essential interaction of the estrogen with a high-affinity receptor in uterine tissues that initiates a series of responses culminating in the uterine weight increase. The weight increase is a combination of water imbibition in the tissue and the uterine lumina and a hypertrophic response of the uterine tissues. The estrous cycle in the rat is 4-5 days, so the 3-day administration of a test compound is similar to the response time to endogenous estrogen surges in the intact animal that stimulate the uterine tissue. Thus, estrogen agonists can be identified by a statistically significant increase in uterine weight in treated versus untreated or vehicle control animals. In addition, estrogen antagonists can be identified by blocking or reducing the uterine weight increase of a reference agonist when both are simultaneously administered. The estrogen specificity of the uterine weight increase or decrease can be verified, if necessary, by histologic examination of the uterus and the vagina (18,21).
Historically, most published uterotrophic results have described uterine weights after careful blotting of the uterus after its wall was nicked or split to allow the luminal contents to drain out. The rationale given for measuring blotted uterine weight usually is that the wet weights are more variable, and the variability is increased by the possible loss of luminal fluid during dissection and tissue handling. For test optimization and validation, it was decided to include both wet and blotted uterine weights and to establish their variability in the models among different laboratories using standardized procedures. The uterine nicking and blotting technique was adopted in all protocols. Both wet and blotted weights were recorded to the nearest 0.1 mg in all protocols. Because several laboratories were performing the assay for the first time, and to standardize procedures, a videotape of procedures for ovariectomy and uterine dissection and preparation was prepared and distributed to the participating laboratories.
Precaution was taken to specify the age of the animals so that treatment could commence at 19-20 days of age (day of birth counted as day 1) and to limit body weight variability. Limiting the weight variability was thought essential to limit the chances of older animals being inadvertently included in the study. Older animals could enter puberty, leading to an increase in control uterine weight and thereby adding to the variability of the results (19,22). For the adult OVX animals, ovariectomy occurred at 6 weeks of age or later, with a minimum period of 1 week after surgery before administration of the reference compounds. In all protocols, groups were randomized according to body weight.
The doses of EE and ZM administered were specified to ensure that results could be statistically compared. For EE, a series of seven doses in half-log steps from 0.01 to 10 µg/kg was specified.

Table 2. Number of laboratories observing a lowest observed effect level (LOEL) at a given EE dosing based on the first observed significant (p < 0.05) increase in uterine weight.

Participating laboratories submitted the raw data for central, independent statistical analysis. The ability to detect increased uterine weights was evaluated by an analysis of variance (ANOVA) approach, including body weight as a covariable. A variance-stabilizing logarithmic transformation was performed before the data analysis. Dunnett's test was used for making pairwise comparisons of each dosed group to vehicle controls. Dixon's outlier test was used to detect possible outliers, and Bartlett's test was used to assess homogeneity of variances. If significant heterogeneity was detected, the nonparametric Mann-Whitney U-test was used. For these data, parametric and nonparametric analyses produced similar results.
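As an illustration of half-log dose spacing, the sketch below generates a seven-dose series starting at 0.01 and rounds each value to one significant figure. It assumes the doses are in µg/kg, as for the EE doses reported elsewhere in the text; the function name is hypothetical.

```python
def half_log_doses(lowest_dose, n_doses):
    """Return n_doses values, each a factor of 10**0.5 above the previous one."""
    return [lowest_dose * 10 ** (step / 2) for step in range(n_doses)]


# Exact half-log steps are 0.01, 0.0316..., 0.1, 0.316..., and so on.
exact_doses = half_log_doses(0.01, 7)

# Nominal doses, rounded to one significant figure for labeling.
nominal_doses = [float(f"{dose:.1g}") for dose in exact_doses]
```

Rounding to one significant figure reproduces the nominal 0.1, 0.3, 1.0, 3.0, and 10 values used for the EE dose groups in the results.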

Results
All participating laboratories confirmed that the protocol was straightforward to perform. Suggested protocol refinements included additional guidance to reduce organ weight variation, such as that caused by different prosectors; improved procedures for controlling body weight in immature animals; and an increase in the age of immature animals at first administration, because some laboratories encountered weight loss in the early weanlings.
All laboratories and all protocols were successful in detecting increases in uterine weights in the higher dosed EE groups. Within each protocol, there was good agreement among laboratories in the dose-response uterine weight increases for the reference EE. This included the EE doses identified as lowest observed effect levels (LOELs), that is, the doses at which significant increases in uterine weight were first detected. The number of laboratories that observed a LOEL at a given dose for each protocol is summarized in Table 2. Blotted weights showed statistical significance at slightly lower EE concentrations than did the wet uterine weights. The data for the wet and blotted uterine weights as well as body weights for the five highest dose groups are shown for protocols A, B, C, and C´ in Tables 3, 4, 5, and 6, respectively. For protocol A, significance was generally first achieved at 1.0 µg/kg EE. For protocols B and C, significance was generally first achieved at the next lower dose of 0.3 µg/kg EE. This difference was expected given the different route of administration and previously published data for EE (18). Three of the four laboratories carrying out protocol C´ first found significant increases in uterine weight at the 0.1 µg/kg EE dose. In the higher dose groups, wet uterine weights were reduced substantially in protocol C´ relative to protocol C, whereas the reverse tended to be true for blotted weights. The reduced wet weights in protocol C´ were apparently caused by the reduction in luminal fluid content between days 3 and 7.
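The LOEL definition used here, the lowest dose at which a significant (p < 0.05) increase is first detected, can be expressed as a short sketch. The function name and the p-values below are hypothetical and serve only to illustrate the selection rule; the p-values are assumed to come from comparisons of each dosed group with vehicle controls for an increase in uterine weight.

```python
def lowest_observed_effect_level(p_value_by_dose, alpha=0.05):
    """Return the lowest dose with p < alpha, or None if no dose is significant."""
    for dose in sorted(p_value_by_dose):
        if p_value_by_dose[dose] < alpha:
            return dose
    return None


# Hypothetical p-values for one laboratory (doses in ug/kg EE).
example_p_values = {0.03: 0.81, 0.1: 0.40, 0.3: 0.03, 1.0: 0.001, 3.0: 0.001}
loel = lowest_observed_effect_level(example_p_values)
```

Tabulating this value per laboratory and protocol gives counts of the kind reported in Table 2.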
The consistency of dose-response results among laboratories was also evaluated: that is, did laboratories consistently produce a dose-response curve of approximately the same shape where the same percentage increase in uterine weight, including the maximal increase, occurred at equivalent doses of the test compound? In protocol A, 8 of the 16 laboratories produced blotted uterine weight responses that were statistically consistent at all doses evaluated. In protocol B, 5 of the 12 laboratories produced blotted uterine weight responses that were statistically consistent at all doses evaluated. In protocol C, six of the nine laboratories produced blotted uterine weight responses that were statistically consistent. In protocol C´, all four laboratories produced blotted uterine weights that were statistically consistent (after deleting one outlier). Dose-response results are shown for all protocols in Figure 1.

Articles • OECD validation of rat uterotrophic assay. Environmental Health Perspectives • VOLUME 109 | NUMBER 8 | August 2001
The sensitivity of an assay can be defined in several ways. One approach is to identify the lowest dose at which statistical significance is achieved (see Table 2). Protocol A appeared less sensitive, as expected with the oral route of administration for EE. The data suggest no notable differences among protocols B, C, and C´ in the dose first producing statistical significance. Direct comparisons of performance should be based on data from the same set of laboratories performing both protocols. For example, eight laboratories carried out both protocols B and C. Seven of these laboratories achieved statistically significant increases at identical doses in the two protocols for the wet uterine weight, and six did so for the blotted uterine weight.
At the higher EE doses, there was a significant difference between the models in the magnitude of the percentage uterine weight increase over controls. For the 12 laboratories carrying out protocol B, the range of increased blotted uterine weights over controls was 326-588% for the 1.0 µg/kg EE dose and 370-663% for the 3.0 µg/kg EE dose. For the nine laboratories carrying out protocol C, the range for the increase in blotted weights was 136-375% for the 1.0 µg/kg EE dose and 236-375% for the 3.0 µg/kg EE dose. The responses for all protocols at the 3 µg/kg EE dose are shown in Table 7. Protocol B, C, and C´ animals appeared to have reached stable maximal responses in the tested dose range. Protocol A animals did not appear to have reached their stable maximal response even at the 10 µg/kg EE dose relative to the 3 µg/kg dose.
Two procedures were performed to permit meaningful comparisons of variability in response. The first procedure was to log-transform the uterine weight data. This was followed by ANOVA for each laboratory and protocol, using body weight as a covariable. The error mean square resulting from this analysis can be regarded as a measure of intragroup variability, averaged over doses and corrected for the possible influence of body weight on the observed uterine weight response. The second procedure was to calculate the coefficient of variation (CV) in uterine weight for each dosed group for each laboratory within each protocol. The CVs were averaged over doses to obtain a representative value for each laboratory and protocol. Each procedure produced similar findings. The results for the second procedure are summarized in Table 8. These analyses revealed that a) within-group variability in response was consistently less for blotted weights than for wet weights; b) protocol A tended to show more within-group variability in both the wet and blotted measures of uterine weights, which was not unexpected given the oral route of administration; and c) the adult OVX subcutaneous protocols (C and C´) have slightly lower CVs than the immature animal subcutaneous protocol (B). Note from Table 8 that some laboratories have consistently different (lower or higher) CVs across all protocols. This suggests an important role for laboratory technique in controlling variability in both the wet and blotted uterine weight response measurements. Although body weights were controlled tightly within a laboratory, animal body weights varied widely in both the immature and the adult OVX protocols from laboratory to laboratory. Yet despite these differences in body weights, generally similar relative increases in both wet and blotted uterine weight were observed at the various laboratories for all of the protocols.
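The second procedure can be sketched in a few lines. The uterine weights below are hypothetical, and the CV is expressed as a percentage using the sample standard deviation, which is an assumption about the convention used in Table 8.

```python
from statistics import mean, stdev


def coefficient_of_variation(weights_mg):
    """Return the CV (%) of one dose group: sample SD divided by the mean."""
    return 100.0 * stdev(weights_mg) / mean(weights_mg)


def laboratory_cv(weights_by_dose):
    """Average the per-group CVs over doses for one laboratory and protocol."""
    return mean(coefficient_of_variation(w) for w in weights_by_dose.values())


# Hypothetical blotted uterine weights (mg) for two dose groups in one laboratory.
example_groups = {
    0.3: [90, 100, 110],
    1.0: [80, 100, 120],
}
representative_cv = laboratory_cv(example_groups)
```

Averaging the group CVs gives each dose equal weight regardless of the absolute uterine weight, so laboratories with very different animal sizes remain comparable.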
High EE doses reduced body weight in the adult OVX protocols, but not in the immature animal protocols. In protocols A and B, only one laboratory recorded a significantly (p < 0.05) reduced body weight at the 10 µg/kg EE dose relative to the vehicle, while one laboratory recorded a significant increase. In protocol C, the 1.0, 3.0, and/or 10 µg/kg EE doses significantly reduced body weight relative to the vehicle controls for six laboratories. In protocol C´, with extended dosing from 3 to 7 days, the four laboratories showed consistently and significantly reduced body weights in the high EE dose groups. This weight loss is characteristic of potent estrogens such as EE or diethylstilbestrol. Loss of body weight and adverse effects including mortality require further consideration when determining the doses to be tested in any proposed OECD Test Guideline. These issues will be considered further in dose selection of test substances in phase 2 of the validation program.
Body weight and uterine weight showed no consistent correlation, with less than half of the studies showing a significant (p < 0.05) correlation between these two variables. Significant associations were found more often in the immature animal protocols than in the adult animal protocols. These findings, coupled with the lack of significant body weight effects in the immature animal protocols, meant that the body weight adjustment had relatively little impact on the evaluation of uterine weights in these studies.
In the ZM antagonism dose groups, most laboratories found blotted uterine weight decreases in the ZM/EE combination groups that were statistically consistent, with the magnitude of the reduction similar across all laboratories. Interestingly, in all eight laboratories that carried out both protocols B and C, protocol B had a greater percentage reduction in uterine weight at the high-dose combination. However, there was no consistent difference in sensitivity between protocols B and C for the low-dose combination (0.1 mg/kg ZM + 0.3 µg/kg EE) group. One laboratory could not demonstrate a decreased uterine weight in the antagonist coadministration experiments. This was apparently related to an inability to induce a maximum uterine response at an EE-alone dose so that a statistically significant reduction could not be observed (see Figure 1, protocol C, lab 19, and Tables 7 and 10).
The original data analysis plan included a formal evaluation of whether the factors that varied from laboratory to laboratory could introduce variability in uterine weight response. Factors that varied from laboratory to laboratory included strain, diet, housing protocol, bedding, and vehicle. However, because of the overall consistency of the uterine weight responses across laboratories when these factors were reviewed (Table 1), a formal analysis of this aspect was judged unnecessary at this stage.
The analysis summarized in Table 9 illustrates the importance of minimizing the coefficient of variation in this type of study. The power of detecting various increases in uterine weight in the top dose group (by Dunnett's test) is analyzed as a function of the magnitude of the response (from a 25% to 40% increase in uterine weight), the number of animals per group (6 or 10), and the underlying CV (from 10.0 to 25.0). Six animals per group appear to be sufficient for detecting a 25-35% increase in uterine weight with reasonable power if the CV can be kept relatively low (e.g., in the general range of 10.0-15.0).
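The dependence of power on the CV and group size can be illustrated with a rough approximation. The sketch below is not the analysis behind Table 9: it treats a single one-sided comparison of one dosed group with control on the log scale using a normal approximation, whereas the study used Dunnett's test across several dose groups, so it will tend to overstate power. The function name and the approximation itself are assumptions made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist


def approximate_power(pct_increase, cv_pct, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-group comparison on log-scale data.

    Assumes log-normal uterine weights, equal group sizes, and a known
    variance (normal rather than t or Dunnett critical values).
    """
    effect = log(1.0 + pct_increase / 100.0)          # difference in log means
    sd_log = sqrt(log(1.0 + (cv_pct / 100.0) ** 2))   # log-scale SD from the CV
    std_error = sd_log * sqrt(2.0 / n_per_group)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().cdf(effect / std_error - z_crit)


# Compare a low and a high CV for a 25% increase with six animals per group.
power_low_cv = approximate_power(25, 10.0, 6)
power_high_cv = approximate_power(25, 25.0, 6)
```

Even under this simplified model, raising the CV from 10 to 25 sharply reduces the chance of detecting a 25% increase with six animals, which is consistent with the emphasis on keeping the CV low.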
A widely used mathematical model, the Hill equation model, generally provided a good fit to the various data sets. The Hill model was applied to the 41 individual experiments. This permits an estimate of the effective doses at various levels such as 10%, 50%, and 90% of the maximum, the ED10, ED50, and ED90, respectively. The calculated results for the ED10 and the ED90 are summarized in Table 10. The model calculations support the results previously reported based on other types of statistical analyses. For example, the estimated ED10 values in protocols B, C, and C´ are lower than those in protocol A, and no significant difference was found between the ED10 values from protocols B, C, and C´.
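For readers unfamiliar with the Hill model, a minimal sketch of one common parameterization follows. The source does not state the exact form fitted, so the four-parameter version here (baseline, maximum, ED50, and slope) and the parameter values are assumptions; the effective dose at a given fraction of the maximal increase then follows in closed form.

```python
def hill_response(dose, baseline, maximum, ed50, slope):
    """Uterine weight predicted by a four-parameter Hill equation."""
    return baseline + (maximum - baseline) * dose**slope / (ed50**slope + dose**slope)


def effective_dose(fraction, ed50, slope):
    """Dose producing the given fraction (0-1) of the maximal increase.

    Solving d**n / (ED50**n + d**n) = p for d gives
    d = ED50 * (p / (1 - p)) ** (1 / n).
    """
    return ed50 * (fraction / (1.0 - fraction)) ** (1.0 / slope)


# Hypothetical fitted parameters: weights in mg, doses in ug/kg.
ed10 = effective_dose(0.10, ed50=0.5, slope=2.0)
ed90 = effective_dose(0.90, ed50=0.5, slope=2.0)
response_at_ed10 = hill_response(ed10, baseline=20.0, maximum=100.0, ed50=0.5, slope=2.0)
```

With this form, the ratio of ED90 to ED10 depends only on the slope, so the two values reported in Table 10 together describe both the position and the steepness of each fitted curve.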

Discussion and Conclusions
All laboratories and all protocols were successful in phase 1 of the OECD validation program in detecting increases in uterine weights using EE as the reference agonist. These significant uterine weight increases were achieved in both the immature and the adult OVX models under a variety of different experimental conditions besides route of administration (e.g., strain, diet, housing protocol, bedding, vehicle). This suggests a certain robustness of the protocols, at least for the reference EE. The consistency of the results also suggests that no further specification of the strain of rat, diet, and so on is necessary for this screening assay to detect potent estrogen agonists. For each protocol, there was generally good agreement among laboratories with regard to the actual EE doses that produced increased uterine weights and the maximum response observed. The shapes of the uterine weight dose-response curves, although similar for many labs, did show some variation. The feasibility of an antagonist assay was also successfully demonstrated, but in less detail because availability of the ZM reference compound was limited.
At this time, no substantive difference or advantage in model (immature versus adult OVX) has been found. Differences in response to EE and ZM caused by the route of administration were expected and did occur. Nonetheless, these results reinforce the sense of robustness of the protocols. For example, there is a relatively consistent half-log difference between subcutaneous and oral gavage administration in observed LOEL doses and the calculated ED10 doses across protocols (see Tables 2 and 6 and Figure 1). Group sizes of six animals appear to be sufficient to detect modest percentage increases (25-35%) in uterine weight that have been observed for weak partial estrogen agonists (18,23). Both the wet and blotted uterine weight end points were sensitive in all protocols. The blotted weight was less variable and, qualified by the use of high-potency reference compounds, was first in a few instances to indicate a statistically significant difference at the lowest doses. Furthermore, the Hill equation appears to model satisfactorily the dose response to provide additional perspective on ED10, ED50, and maximal dose responses within and among laboratories. Minor protocol refinements were identified. For example, protocol A was amended to allow a wider variation in body weights so that unnecessary animal use could be avoided. A body weight variation of ± 20% of the mean body weight (e.g., 35 g ± 7 g) was proposed as sufficient for the next phase of the study. Randomization among the groups will be maintained. Additionally, the age range of immature animals at first administration was expanded to 18-20 days. Protocol C was amended to lengthen the postoperative acclimatization period to 14 days. This allows further time for uterine regression, the use of vaginal smears to confirm complete removal of the ovaries, and greater flexibility for laboratories in timing their experiments.
The current intent of the OECD program is to proceed with both the immature and the adult OVX models unless a substantive difference in the ability to detect estrogenic responses of the uterus is found. If confirmed as equivalent, they may both be considered for adoption as OECD Test Guidelines. The OECD is now implementing phase 2 of the program, which will entail demonstration of the protocols using weak estrogen partial agonists, such as genistein, o,p´-DDT, methoxychlor, nonylphenol, and bisphenol A. Phase 2 will continue to examine the performance of both immature and adult OVX animal models.