The OECD program to validate the rat uterotrophic bioassay. Phase 2: dose-response studies.

The Organisation for Economic Co-operation and Development has completed phase 2 of an international validation program for the rodent uterotrophic bioassay. The purpose of the validation program was to demonstrate the performance of two versions of the uterotrophic bioassay, the immature female rat and the adult ovariectomized (OVX) rat, in four standardized protocols. This article reports the dose-response studies of the validation program; the coded single-dose studies are reported in an accompanying paper. The dose-response study design used five selected weak estrogen agonists: bisphenol A, genistein, methoxychlor, nonylphenol, and o,p´-DDT. These weak agonists were administered in a prescribed series of doses to measure the performance and reproducibility of the protocols among the participating laboratories. All protocols successfully detected increases in uterine weights when the weak agonists were administered. Within each protocol, there was good agreement and reproducibility of the dose response among laboratories with each substance. Substance-specific variations were observed in the influence of the route of administration on the uterine response, the potency as related to the dose producing the first statistically significant increase in uterine weights, and the maximum increase in uterine weight. Substantive performance differences were not observed between the uterotrophic bioassay versions or among the standardized protocols, and these were judged to be qualitatively equivalent. It is noteworthy that these results were reproducible under a variety of different experimental conditions (e.g., animal strain, diet, housing, bedding, vehicle, animal age), indicating that the bioassay's performance as a screen is robust. In conclusion, both the intact, immature version and the adult OVX version, and all protocols, appear to be reproducible and transferable across laboratories and are able to detect weak estrogen agonists.

These animal studies were performed in accordance with the OECD's guidelines on animal care (OECD 2000) and appropriate national regulations. Animal housing temperature was 22 ± 3°C, the relative humidity was between 30 and 70%, and the lighting cycle was 12 hr light and 12 hr dark. If bedding was used, the type and supplier were recorded. Immature animals were group housed with two or three animals per cage, and OVX animals were housed one or two animals per cage. Feed and tap or filtered drinking water were provided ad libitum. The rats were fed the usual rodent diet of the participating laboratory, and the particular diet, the supplier, and the batch or lot number(s) of the diet were recorded. Laboratories did not change the diet during the validation program, and a sample of each lot was frozen and retained for phytoestrogen analyses. In those cases where multiple lots of diet may have been used in a laboratory, the same lot of diet was used for a given protocol. The dietary analyses and the relation of phytoestrogen levels to the uterotrophic bioassay's performance are reported separately.
Immature, intact animals. Immature animals, if externally supplied, were received either with dams or foster dams on approximately postnatal day (pnd) 14 (date of birth = pnd 0) or as weanlings on pnd 17. Animals were examined for overt signs of ill health and anomalies, and healthy animals were acclimatized. Animals were allocated into treatment groups of six animals by randomization, ensuring that all groups of animals had a mean weight within ± 5% probability level. Test substance administration could begin at the choice of the participating laboratory on pnd 18, 19, or 20.
Ovariectomized animals. At the time of ovariectomy, the animals were between 42 and 56 days of age. The dorsolateral abdominal wall was opened at the midpoint between the costal inferior border and the iliac crest and a few millimeters lateral to the lateral margin of the lumbar muscle. The ovaries were located, removed from the abdominal cavity, and detached by incision at the junction of the oviduct and each uterine horn. After confirming that no significant bleeding occurred, the abdominal wall was closed by suture, and the skin was closed, for example, by autoclips. The animals were allowed to recover and the uterus weight was allowed to regress for a minimum of 14 days before use.
Protocols. The individual protocols have been described previously (Kanno et al. 2001). Briefly, protocol A used intact, immature female rats as described above with dosing by oral gavage for 3 consecutive days. Protocol B used intact, immature female rats with dosing by sc injection for 3 consecutive days. Protocol C used young adult OVX rats as described above with dosing by sc injection for 3 consecutive days. Protocol D [previously called "protocol C" (Kanno et al. 2001)] also used young adult OVX rats and extended the sc injection dosing to a total of 7 days. As with phase 1, for demonstrating the basic toxicologic attribute of differences in chemical potency due to the route of administration, the VMG decided that only the immature version and the satellite study were adequate to conserve animals and resources. In all protocols, animals were humanely sacrificed 24 hr after the last dose administration.
Vehicle, test substance preparation, and dosing. Test substances were dissolved in a minimal amount of 95% ethanol and diluted to final working concentration in the test vehicle typically used by the participating laboratory (e.g., corn, arachis, sesame, or olive oil). If necessary, the test substance was dissolved with the assistance of gentle heating and vigorous mechanical assistance, for example, homogenized for several minutes in a rotor-stator homogenizer. As the literature indicated, the substances were stable, and most laboratories prepared the test substance weekly. The participating laboratories recorded the nature of the vehicle, the supplier, and lot number, and a sample of the vehicle was retained for analysis, if that became necessary.
Test substance administration was once per day for 3 consecutive days in three protocols (A, B, and C), and once per day for 7 consecutive days in a fourth protocol (D). The amount administered was calculated using the body weight (bw) of the animal recorded on the day of treatment. Treatment on each consecutive day was at approximately the same time and sequence for each animal. For oral gavage (protocol A and a single satellite study using oral gavage with OVX animals for 3 days), the total volume per rat per day did not exceed 5 mL/kg bw/day. For sc injection (protocols B, C, and D), the total amount of sc injection per rat per day did not exceed 4 mL/kg bw/day, and the maximum volume per injection site per rat did not exceed 0.2 mL. The precise method and volumes of administration by the individual participating laboratory were recorded. Animals were observed for clinical signs, the body weights were recorded daily to 0.1 g, any animals observed to be in distress were humanely sacrificed, and any animals found dead were disposed of.
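The volume limits above lend themselves to a simple arithmetic check. The following is a minimal sketch, not part of the validation program's procedures: the function name `dosing_volume`, the argument names, and the idea of counting sc injection sites by dividing by 0.2 mL are illustrative assumptions; only the limits (5 mL/kg bw/day oral, 4 mL/kg bw/day sc, 0.2 mL per sc site) come from the text.

```python
import math

# Volume limits quoted in the text (mL per kg body weight per day).
MAX_ML_PER_KG = {"oral": 5.0, "sc": 4.0}
# Maximum volume per sc injection site (mL), also from the text.
MAX_ML_PER_SC_SITE = 0.2


def dosing_volume(body_weight_g, dose_mg_per_kg, conc_mg_per_ml, route):
    """Return (volume in mL, number of administration sites) for one animal.

    Hypothetical helper: the dose is scaled to the body weight recorded on
    the day of treatment, and the volume is checked against the route limit.
    """
    ml_per_kg = dose_mg_per_kg / conc_mg_per_ml
    if ml_per_kg > MAX_ML_PER_KG[route]:
        raise ValueError(
            f"{ml_per_kg:.2f} mL/kg exceeds the {route} limit of "
            f"{MAX_ML_PER_KG[route]} mL/kg bw/day"
        )
    volume_ml = ml_per_kg * body_weight_g / 1000.0
    # Assumption: sc volumes above 0.2 mL are split over several sites.
    sites = math.ceil(volume_ml / MAX_ML_PER_SC_SITE) if route == "sc" else 1
    return volume_ml, sites


# Example with made-up values: a 250 g OVX rat, 100 mg/kg sc, 50 mg/mL solution.
volume, sites = dosing_volume(250.0, 100.0, 50.0, "sc")
print(f"{volume:.2f} mL over {sites} site(s)")
```

A working concentration that would push the volume above the route limit raises an error, which signals that a more concentrated preparation would be needed.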
Necropsy, dissection, and uterine weight. Twenty-four hours after the last treatment, the animals were humanely killed by the method routinely used by the participating laboratory in the same sequence as the test substance was administered. The uterus was carefully dissected, the ovaries of immature animals removed, and the uterus trimmed of fascia and fat to avoid loss of luminal contents. The uterus and cervix were removed by incision at the vaginal fornix to preserve the luminal fluid contents. The uterus was transferred to a marked, tared container with care to avoid desiccation. This first uterine weight (wet weight) included the luminal fluid contents and was recorded to the nearest 0.1 mg. Each uterus was then opened by piercing or longitudinal cuts into the uterine wall, and the luminal fluid was expressed with gentle pressure on moistened filter paper. The uterus was then weighed a second time (blotted weight), and the weight was recorded to the nearest 0.1 mg.
Study management and quality control. The study director was on the OECD staff, and each laboratory nominated a principal investigator as recommended by OECD Good Laboratory Practice and Study Management guidelines (OECD 2002). The laboratories were requested to perform these studies under these OECD Good Laboratory Practice guidelines and most, but not all, did so. When data were assembled and an initial statistical analysis performed, all laboratories were requested to audit these raw data and to respond to specific queries on outliers and questionable data. A small number of data corrections were made, and reporting errors on dilutions, samples, and identity of control groups were either corrected or clarified.
Statistics. The raw uterine weight and body weight data from each participating laboratory were recorded on a standardized electronic spreadsheet and submitted to an independent statistician for analysis. The uterine data were evaluated by an analysis of covariance approach with body weight at necropsy as the covariable. A variance-stabilizing logarithmic transformation was carried out on the uterine data prior to the data analysis. The Dunnett and Hsu test was used for making pairwise comparisons of each dosed group to vehicle controls and to calculate the confidence intervals. Studentized residual plots were used to detect possible outliers and to assess homogeneity of variances. The data were analyzed using PROC GLM in the Statistical Analysis System (version 8; SAS Institute, Cary, NC). In addition to the ratio of the mean uterine weights (the treated groups relative to the vehicle control groups) in Tables 2-26, the ratio of the geometric means of the uterine weights (treated relative to the vehicle control) after adjusting for the body weight of the animal at necropsy was also calculated.
[...] as demonstrated with the strong agonist EE, when testing selected substances of lower estrogenic potencies. The primary objectives of the phase 2 dose-response studies were to demonstrate that participating laboratories could detect several selected weak estrogen agonists by a statistically significant increase in uterine weights, that the results were reproducible across laboratories, and that the animals would respond in a dose-related manner. The doses producing the first significant increase in uterine weights and the magnitude of the responses of these weak agonists were also to be compared with the potent reference estrogen, EE.
Other objectives were to test whether the intact, immature version and the OVX version were generally equivalent in performance and in their ability to detect the activity of weak agonists, and to quantify the variability of the dose response among laboratories and among protocols, thereby testing the equivalence of the protocols. The statistical analyses of these performance comparisons and determinations required a series of identical, prescribed doses for each test substance. If any laboratory was unable to detect the selected weak agonists, an effort would be made to determine the responsible factors.
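The body weight-adjusted ratio of geometric means described under Statistics can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the SAS PROC GLM analysis used in the program: it fits a common (pooled within-group) slope of log uterine weight on body weight, adjusts each treated-minus-control difference for the difference in mean body weight, and back-transforms. It omits the Dunnett and Hsu comparisons and the confidence limits. The function name `adjusted_ratios`, the group labels, and the example numbers are hypothetical.

```python
import math


def adjusted_ratios(groups, control="vehicle"):
    """Body weight-adjusted ratios of geometric mean uterine weights.

    groups maps a group label to a list of
    (body_weight_g, uterine_weight_mg) pairs, one pair per animal.
    """
    means = {}
    sxx = 0.0  # pooled within-group sum of squares of body weight
    sxy = 0.0  # pooled within-group cross-product with log uterine weight
    for label, animals in groups.items():
        xs = [bw for bw, _ in animals]
        ys = [math.log(uw) for _, uw in animals]  # log transformation
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        means[label] = (mean_x, mean_y)
        sxx += sum((x - mean_x) ** 2 for x in xs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # common slope shared by all groups

    control_x, control_y = means[control]
    ratios = {}
    for label, (mean_x, mean_y) in means.items():
        if label == control:
            continue
        log_diff = (mean_y - control_y) - slope * (mean_x - control_x)
        ratios[label] = math.exp(log_diff)  # back to a ratio of geometric means
    return ratios


# Made-up data: (body weight in g, blotted uterine weight in mg).
example = {
    "vehicle": [(50.0, 28.0), (52.0, 30.5), (54.0, 31.0)],
    "dose_1": [(49.0, 55.0), (51.0, 60.0), (55.0, 66.0)],
}
print(adjusted_ratios(example))
```

Working on the log scale means that a difference between adjusted means becomes a ratio after back-transformation, which is why the adjusted result is expressed as a ratio of geometric rather than arithmetic means.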

Selection of Weak Agonists and Doses
The VMG selected five weak estrogen agonists: BPA, GN, MX, NP, and o,p´-DDT. For these substances, a) individual binding affinities to the estrogen receptor had been determined in a single laboratory, b) evidence from the literature was available for estrogen-mediated activity in other in vitro assay systems, c) evidence from the literature was available that each weak agonist displayed a positive response in the uterotrophic bioassay, and d) either subchronic or chronic testing data were available to indicate whether the compounds elicited estrogen-related effects, or such subchronic or chronic testing was in progress. Collectively, such data indicated that the selected substances were weak estrogen receptor agonists in vitro, were positive challenge substances for a validation study of the uterotrophic bioassay, and there were sufficient data for estrogen-related effects in higher tiers to assess the predictivity of the uterotrophic bioassay at the end of the validation program. These data are compliant with test substance selection recommendations to demonstrate the characteristics of a bioassay for validation studies and the relationship of a bioassay to other assays in a hierarchical, tiered approach (ICCVAM 1997; OECD 1998b). The chemical identities and estrogen-receptor binding data from Blair et al. (2000) and Branham et al. (2002) are shown in Table 1. The binding affinities of the selected weak agonists relative to 17β-estradiol cover a range of almost three orders of magnitude, for example, log -0.35 to -3.20, and even the two most potent selected agonists, GN and the metabolite of MX, are almost three orders of magnitude weaker than the reference EE agonist. Therefore, the selected weak agonists were judged to represent the range of potency that the uterotrophic bioassay would likely encounter in regulatory applications.
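The "almost three orders of magnitude" statement follows directly from the log values quoted above. The sketch below shows the arithmetic; the function names are illustrative, and the scaling of relative binding affinity to 17β-estradiol = 100 is an assumption based on common practice for competitive binding data, not something stated in this section.

```python
import math


def relative_binding_affinity(ic50_estradiol, ic50_ligand):
    """RBA of a ligand relative to 17beta-estradiol.

    Assumption: RBA is expressed on a scale where estradiol = 100,
    i.e. 100 * IC50(estradiol) / IC50(ligand), both in the same units.
    """
    return 100.0 * ic50_estradiol / ic50_ligand


def orders_of_magnitude(log_rba_high, log_rba_low):
    """Span between two log10 RBA values, in orders of magnitude."""
    return log_rba_high - log_rba_low


# Log RBA range quoted in the text for the selected weak agonists.
span = orders_of_magnitude(-0.35, -3.20)
print(f"span = {span:.2f} log units, about {10 ** span:.0f}-fold")

# Hypothetical IC50 values (mol/L), for illustration only.
rba = relative_binding_affinity(1e-9, 1e-6)
print(f"RBA = {rba:.3g}, log RBA = {math.log10(rba):.2f}")
```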
In addition, test substances were selected for expected differences in pharmacokinetic behavior to represent the variety of test substances likely to be encountered by the uterotrophic bioassay during use and to demonstrate differences between the oral and sc routes of administration observed in several pharmacokinetic studies below. Three test substances (BPA, GN, and NP) are reported to be rapidly eliminated and to undergo significant intestinal and hepatic conjugation, leading to a hypothesis of lower potency by the oral route of administration (Chang et al. 2000; Coldham and Sauer 2000; Fennell et al. 1998; Miyakoda et al. 2000; Müller et al. 1998; Pottenger et al. 2000). MX is reported to undergo hepatic activation, leading to a hypothesis of higher potency by the oral route of administration (Bulger et al. 1978). Finally, o,p´-DDT was selected because of the absence of a hydroxyl group necessary for rapid conjugation, and its persistence and bioaccumulation, leading to the hypothesis that it might display unique pharmacokinetic characteristics. Therefore, the selected weak agonists were judged to represent the range of test substance characteristics that the uterotrophic bioassay would likely encounter in regulatory applications.
As part of the overall design, five doses were recommended for each test substance. However, because of possible resource constraints, participating laboratories were required to use only the three intermediate doses. The VMG established a working group to review the scientific literature concerning each of the test substances, to consult researchers for unpublished data, and then to select the doses for each substance and route of administration. Unfortunately, much of the background literature information from both published and "gray" sources did not report all necessary protocol details, use defined and closely interspersed doses, or consistently report the data as absolute uterine weight increases. Thus, the literature studies were not strictly comparable or unambiguous for dose-selection purposes. Because of the urgency and the complex logistics of an international validation program, the VMG decided to forgo preliminary dose-setting studies. Therefore, the working group was required to rely upon its own expert judgment to recommend the dose levels, and risks were accepted that some laboratories might not achieve a complete dose-response curve.
To conserve animals and resources and to achieve a core of robust data for comparison, the VMG decided that priority in the dose-response work was to compare the results for NP and BPA. If additional laboratory resources were available, the remaining weak agonists, GN, MX, and o,p´-DDT, would be examined. The doses recommended for the oral gavage studies were as follows: for BPA, 60, 200, 375, 600, and 1,000 mg/kg/day; for GN, 20, 60, 120, 300, and 500 mg/kg/day; for MX, 20, 50, 120, 300, and 500 mg/kg/day; for NP, 15, 75, 125, 250, and 350 mg/kg/day; and for o,p´-DDT, 10, 50, 125, 300, and 600 mg/kg/day. The doses recommended for the sc injection studies were as follows: for BPA, 10, 100, 300, 600, and 800 mg/kg/day; for GN, 1, 15, 35, 50, and 80 mg/kg/day; for MX, [...], 100, 250, 500, and 800 mg/kg/day; for NP, [...], 15, 35, 80, and 100 mg/kg/day; and for o,p´-DDT, [...], 25, 50, 100, and 200 mg/kg/day. All of the above doses were lower than the standard toxicologic limit dose of 1,000 mg/kg/day except for the final oral gavage dose of BPA, which was at the limit dose.
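The rule that laboratories with limited resources used only the three intermediate doses of each five-dose series can be expressed as a short helper. This is an illustrative sketch using only the BPA series, which is stated in full above; the names `BPA_DOSES`, `intermediate_doses`, and `doses_at_limit` are hypothetical.

```python
# Standard toxicologic limit dose quoted in the text (mg/kg/day).
LIMIT_DOSE = 1000

# Recommended BPA dose series from the text (mg/kg/day).
BPA_DOSES = {
    "oral": [60, 200, 375, 600, 1000],
    "sc": [10, 100, 300, 600, 800],
}


def intermediate_doses(series):
    """Drop the lowest and highest dose of a five-dose series."""
    ordered = sorted(series)
    return ordered[1:-1]


def doses_at_limit(series, limit=LIMIT_DOSE):
    """Doses in the series that reach or exceed the limit dose."""
    return [dose for dose in series if dose >= limit]


for route, series in BPA_DOSES.items():
    print(route, intermediate_doses(series), doses_at_limit(series))
```

A laboratory restricted to the intermediate doses therefore tested neither the lowest nor the highest dose of a series, which matters when reading results in which the first significant response occurred only at the top dose.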

Results of Phase 2 Dose-Response Studies
A total of 86 dose-response studies were performed by 17 laboratories. Four other laboratories, which participated either in phase 1 (Kanno et al. 2001) or in the coded single-dose studies in phase 2 (Kanno et al. 2003), did not participate in the dose-response studies. Because the laboratory numbers were kept consistent from 1 through 21 throughout the entire program, laboratory numbers 10, 16, 17, and 19 will not appear in this paper.
Table 1 notes. Abbreviations: IC50, the concentration of ligand that reduces the binding of native 17β-estradiol by 50%; RBA, relative binding affinity of the ligand to the native 17β-estradiol. a Data modified from tables in Blair et al. (2000) and Branham et al. (2002). b The binding curves were generated in a single laboratory on the basis of a single protocol using closely interspersed concentrations and performed in triplicate.

Mortalities, decreases in body weight or body weight gain, and clinical signs.
Out of a total of 2,652 animals, there were 45 mortalities observed in 10 laboratories: 5 in BPA studies, 6 in MX studies, 19 in NP studies, and 15 in DDT studies. Forty-two of the mortalities were in protocol A treatment studies using oral gavage. A dose-related pattern of modest reductions in body weights and diminished body weight gains was often observed in the immature animal studies and in the extended dosing of the OVX studies. Decreases in body weights at terminal sacrifice approaching or greater than 10%, indicating that the dose exceeded a maximum tolerated dose, were observed at doses of 100 mg BPA/kg/day and higher in both protocol D studies, at doses of 500 mg MX/kg/day and higher in both protocol D studies, at doses of 75 mg NP/kg/day and higher in 3 of 4 protocol A studies, and at doses of 300 mg DDT/kg/day in all protocol A studies. Clinical signs were reported in conjunction with the mortalities and body weight losses, including piloerection, lethargy and reduced mobility, and labored breathing.
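The mortality figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it reads the 6 deaths as belonging to the MX studies, and the helper `body_weight_decrease` assumes that the 10% criterion compares treated terminal body weight with the concurrent vehicle control mean, which the text does not state explicitly. All names and the example body weights are hypothetical.

```python
# Mortality counts by test substance, as given in the text.
MORTALITIES = {"BPA": 5, "MX": 6, "NP": 19, "DDT": 15}
TOTAL_ANIMALS = 2652
PROTOCOL_A_MORTALITIES = 42

total_deaths = sum(MORTALITIES.values())
mortality_percent = 100.0 * total_deaths / TOTAL_ANIMALS
protocol_a_share = 100.0 * PROTOCOL_A_MORTALITIES / total_deaths

print(f"{total_deaths} deaths, {mortality_percent:.1f}% of all animals")
print(f"{protocol_a_share:.0f}% of deaths occurred in protocol A (oral gavage)")


def body_weight_decrease(control_mean_g, treated_mean_g):
    """Fractional decrease in terminal body weight versus controls.

    Assumption: the comparison is against the vehicle control mean.
    """
    return (control_mean_g - treated_mean_g) / control_mean_g


# Made-up terminal body weights (g): flags a decrease of 10% or more.
decrease = body_weight_decrease(60.0, 53.0)
print(f"decrease = {100 * decrease:.1f}%, at or above 10%: {decrease >= 0.10}")
```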
Bisphenol A. A total of 22 dose-response studies were conducted with BPA, including 4 with protocol A, 10 with protocol B, 5 with protocol C, 2 with protocol D, and a satellite study using oral gavage with OVX animals. Twenty of 21 studies were successful in detecting increases in uterine weights at one or more of the prescribed doses. In the case of laboratory 21, the required terminal body weights were not recorded for the dose-response studies. Because the statistical analysis was based upon using terminal body weights as a covariable with the uterine blotted weight data, the body weight-adjusted statistical analysis was not performed on the data from this laboratory. However, data such as the mean wet and blotted uterine weights for laboratory 21 are reported in Table 3 and Figure 1, and these have been statistically compared without body weight adjustment.
Within each protocol, there was overall agreement among different laboratories both in the magnitude of the uterine weight increases and in the BPA doses first producing a statistically significant increase in uterine weight. In protocol A using oral gavage, all four studies detected statistically significant increases in uterine weights at lowest observed effect level (LOEL) doses of 375 mg BPA/kg/day (two studies), 600 mg/kg/day (one study), and 1,000 mg/kg/day (one study) (Table 2). In protocol B, eight studies detected statistically significant increases in uterine weights at doses of 10 mg BPA/kg/day (one study), 100 mg/kg/day (three studies), 300 mg/kg/day (three studies), and 600 mg/kg/day (one study). However, in a ninth study, statistical significance was achieved at doses of 10 and 100 mg BPA/kg/day, no statistical difference was observed at 300 mg/kg/day, and statistically significant decreases in uterine weights were observed at 600 and 800 mg BPA/kg/day. Effectively, the reported dose response in this laboratory was the mirror opposite of the expectations and the results from all other laboratories (Table 3, laboratory 20). In protocol C, all five of the studies detected statistical significance at doses of 100 mg BPA/kg/day (Table 4). In protocol D, both studies detected statistical significance at doses of 100 mg BPA/kg/day (Table 5). The satellite study with OVX animals using oral gavage administration did not detect statistically significant increases in uterine weight at the highest of the three intermediate doses used in that study, 600 mg/kg/day (i.e., the highest 1,000-mg BPA/kg/day dose was not tested in this laboratory with this protocol) (Table 6).
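The LOEL as used here is the lowest prescribed dose at which a study showed a statistically significant increase in uterine weight. A minimal sketch of that rule follows; the function name `loel`, the tuple layout, and the example values are hypothetical, and the p < 0.05 threshold is the level of significance given in the table footnotes. Requiring a ratio above 1 excludes significant decreases such as those reported by laboratory 20 at the two highest BPA doses.

```python
def loel(results, alpha=0.05):
    """Lowest dose with a statistically significant increase, or None.

    results is a list of (dose_mg_per_kg_day, ratio_to_control, p_value)
    tuples for one study, with one entry per dose group.
    """
    significant_increases = [
        dose
        for dose, ratio, p_value in results
        if p_value < alpha and ratio > 1.0
    ]
    return min(significant_increases) if significant_increases else None


# Made-up study: uterine weight ratio first rises significantly at 375.
study = [
    (60, 1.02, 0.90),
    (200, 1.10, 0.30),
    (375, 1.35, 0.01),
    (600, 1.50, 0.001),
]
print(loel(study))
```

Because a laboratory using only the three intermediate doses never tested the extremes of a series, a LOEL of `None` from such a study does not rule out a response at the untested highest dose.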
The BPA results, except for the satellite study, are shown graphically in Figure 1. In protocols B, C, and D using sc injection, the ratio of the maximum mean uterine weights of the treated groups relative to the vehicle controls was generally between 3 and 4. The slope appeared to be steeper in the OVX animals, and the extension of the dosing to 7 days appeared to slightly increase the overall response. The maximum increase observed in uterine weights was considerably lower in protocol A, where the ratio of the maximum mean uterine weight to that of the vehicle controls was approximately 1.5, and there was greater variability among the studies. Comparing protocols B and C, the dose-response curves among laboratories appear somewhat more variable with the intact, immature animals, but the two versions are not appreciably different, taking into consideration the larger number of laboratories conducting protocol B (Figure 1).
Mini-Monograph | OECD uterotrophic bioassay validation: dose-response studies. Environmental Health Perspectives • VOLUME 111 | NUMBER 12 | September 2003
Genistein. A total of 14 dose-response studies were conducted with GN, including 4 with protocol A, 4 with protocol B, 3 with protocol C, 2 with protocol D, and a satellite study using oral gavage with OVX animals. All studies in all protocols were successful in detecting increases in uterine weights at one or more prescribed doses.
Within each protocol, there was overall agreement among different laboratories both in the magnitude of the uterine weight increases and in the GN doses first producing a statistically significant increase in uterine weight. In protocol A using oral gavage, two studies detected statistically significant increases in uterine weights at LOEL doses of 20 mg GN/kg/day and the other two studies at doses of 60 mg/kg/day (Table 7). In protocol B, one study detected statistically significant increases in uterine weights at a dose of 1 mg GN/kg/day and the other three studies at doses of 15 mg/kg/day (Table 8). In protocol C, two of the studies detected statistical significance at doses of 15 mg GN/kg/day and another at a dose of 35 mg/kg/day (Table 9). In protocol D, both studies detected statistical significance at doses of 15 mg GN/kg/day (Table 10). The satellite study with OVX animals using oral gavage administration detected statistically significant increases in uterine weight at the lowest of the three intermediate doses used in that study, 60 mg/kg/day (i.e., the lowest 20-mg GN/kg/day dose was not tested in this laboratory with this protocol) (Table 11).
The GN results, except for the satellite study, are shown graphically in Figure 2. In protocol A using oral gavage, the ratio of the maximum mean uterine weights of the treated groups to the controls was generally between 2.5 and 3.5. In protocol B with intact, immature animals, the ratio relative to the controls was again 2.5 to nearly 4. In protocol C, the maximum induction was less, with the ratio approaching 2. In protocol D with extended dosing to 7 days, the response in the mature OVX animals reached a maximum equivalent to that of the intact, immature animals after 3 days of dosing.
Table 4. Uterine weights, body weights, and ratio of the relative increase in uterine weights for bisphenol A in protocol C.
Methoxychlor. A total of 14 dose-response studies were conducted with MX, including 4 with protocol A, 4 with protocol B, 3 with protocol C, 2 with protocol D, and a satellite study using oral gavage with OVX animals. All studies in all protocols were successful in detecting increases in uterine weights at one or more prescribed doses.
Within each protocol, there was overall agreement among different laboratories both in the magnitude of the uterine weight increases and in the MX doses first producing a statistically significant increase in uterine weight. In protocol A using oral gavage, three studies detected statistically significant increases in uterine weights at the LOEL dose of 20 mg MX/kg/day. Laboratory 12, however, used only the three intermediate doses and detected statistically significant increases in uterine weights at its lowest dose of 50 mg/kg/day, where the ratio of relative increase in uterine weight was already approaching 4 (Table 12). In protocol B, four studies detected statistically significant increases in uterine weights at the second dose of 100 mg MX/kg/day (Table 13). In protocols C and D, all studies detected statistical significance at the second dose of 100 mg MX/kg/day (Tables 14 and 15). The satellite study with OVX animals using oral gavage administration detected statistically significant increases in uterine weight at the lowest of the three intermediate doses used in that study, 50 mg/kg/day (i.e., the lowest 20-mg MX/kg/day dose was not tested in this laboratory with this protocol) (Table 16).
The MX results, except for the satellite study, are shown graphically in Figure 3. In protocol A, all studies at the lowest dose had ratios of the maximum mean uterine weights of the treated groups to the controls of 2 to 3.5. Thus, the selected doses were unable to indicate a minimal effective dose. In the case of MX, the oral route of administration was more sensitive than sc injection (Table 12). In protocols B, C, and D, the lowest dose producing a statistically significant increase in uterine weights was similar (Tables 13-15). However, protocol B produced a somewhat higher ratio of the maximum mean uterine weights relative to the controls of 2.5 to 3.5. The extended, 7-day dosing in protocol D did not lead to any further increase in the maximum uterine weight response in the case of MX. With MX, the dose-response curves of protocol B appeared to be more variable than those of protocols C and D (Figure 3).
Figure 1. Ratio of the mean blotted uterine weight in response to doses of BPA relative to the vehicle control group. (A) Participating laboratory results for protocol A using immature female rats, dosing by oral gavage for 3 consecutive days. (B) Participating laboratory results for protocol B using immature female rats, dosing by sc injection for 3 consecutive days. (C) Participating laboratory results for protocol C using adult OVX rats, dosing by sc injection for 3 consecutive days. (D) Participating laboratory results for protocol D using adult OVX rats and extending sc injection dosing to 7 days. In all cases, animals were humanely sacrificed 24 hr after the last dose administration, the uteri were removed and trimmed, and wet and blotted weights were recorded.
Nonylphenol. A total of 22 dose-response studies were conducted with NP, including 4 with protocol A, 10 with protocol B, 5 with protocol C, 2 with protocol D, and a satellite study using oral gavage with OVX animals. Three of the 21 NP studies were unsuccessful in detecting increases in uterine weights at any of the prescribed doses. Again, laboratory 21 did not record the required terminal body weights, and these studies could not be statistically analyzed using body weight adjustment. However, the wet and blotted uterine results are included in Table 18 and Figure 4, and these have been statistically compared without body weight adjustment.
Within each protocol, there was overall agreement among different laboratories both in the magnitude of the uterine weight increases and in the NP doses first producing a statistically significant increase in uterine weight. In protocol A using oral gavage, all four studies detected statistically significant increases in uterine weights at LOEL doses of 75 mg NP/kg/day (Table 17). In protocol B, seven of nine studies detected statistically significant increases in uterine weights at doses of 35 mg NP/kg/day (one study), 80 mg/kg/day (five studies), and 100 mg/kg/day (one study). One of the two laboratories that failed to detect a significantly increased uterine weight used only the three intermediate doses and did not use the highest dose (Table 18). In protocol C, four of five studies detected statistically significant increases in uterine weights at doses of 35 mg NP/kg/day (one study), 80 mg/kg/day (one study), and 100 mg/kg/day (two studies) (Table 19). The laboratory that failed to detect a significant increase in uterine weight used only the three intermediate doses and did not use the highest dose. In protocol D, both studies detected statistical significance at a dose of 35 mg NP/kg/day (Table 20). The satellite study with OVX animals using oral gavage administration detected statistically significant increases in uterine weight at the lowest of the three intermediate doses used in that study, 75 mg/kg/day (i.e., the lowest 15-mg NP/kg/day dose was not tested in this laboratory with this protocol) (Table 21).
The NP results, except for the satellite study, are shown graphically in Figure 4. In protocol A using oral gavage, the ratio of the    Figure 2. Ratio of the mean absolute blotted uterine weight in response to doses of GN relative to the vehicle control group. (A) Participating laboratory results for protocol A using immature female rats, dosing by oral gavage for 3 consecutive days. (B) Participating laboratory results for protocol B using immature female rats, dosing by sc injection for 3 consecutive days. (C) Participating laboratory results for protocol C using adult OVX rats, dosing by sc injection for 3 consecutive days. (D) Participating laboratory results for protocol C using adult OVX rats and extending sc injection dosing to 7 days. In all cases, animals were humanely sacrificed 24 hr after the last dose administration, the uteri were removed and trimmed, and wet and blotted weights were recorded. maximum mean uterine weights of the treated groups to the controls was generally between 2 and 3 for the treated groups relative to the controls. The ratio of treated to vehicle groups was 1.5 to 3 in protocol B, a more modest 1.5 in protocol C, and 2 by extending the dosing to 7 days in protocol D. Again, dose-response curves among laboratories in protocol B appeared to be more variable than protocols C and D (Figure 4). o,p´-DDT. A total of 14 dose-response studies were conducted with o,p´-DDT, including 4 with protocol A, 4 with protocol B, 3 with protocol C, 2 with protocol D, and a satellite study using oral gavage with OVX animals. Thirteen of the 14 studies were successful in detecting increases in uterine weights at one or more prescribed doses. Within each protocol, there was overall agreement among different laboratories both in the magnitude of the uterine weight increases and in the o,p´-DDT doses first producing a statistically significant increase in uterine weight. 
In protocol A using oral gavage, one study detected statistically significant increases in uterine weights at a LOEL dose of 10 mg DDT/kg/day, and three studies at 50 mg/kg/day (Table 22). In protocol B, one study detected statistically significant increases at a dose of 100 mg DDT/kg/day, and the other three laboratories achieved statistical significance at doses of 200 mg/kg/day (Table 23). In protocol C, one study detected statistical significance at a dose of 50 mg DDT/kg/day and one study at a dose of 100 mg/kg/day (Table 24). The laboratory that did not achieve statistical significance used only the three intermediate doses and did not use the high dose of 200 mg/kg/day. In protocol D, one study detected statistical significance at a dose of 50 mg DDT/kg/day and the other at a dose of 100 mg/kg/day (Table 25).

Mini-Monograph | OECD uterotrophic bioassay validation: dose-response studies Environmental Health Perspectives • VOLUME 111 | NUMBER 12 | September 2003

a Ratio of arithmetic means of the treated blotted uterine weights relative to the vehicle control blotted uterine weights.
b Ratio of geometric means of treated blotted uterine weights relative to the vehicle control blotted uterine weights after adjusting for body weights at necropsy as a covariable.
c Lower and upper 95% confidence limits for ratio of blotted uterine weights based on body weights as a covariable.
d One animal died in 120 mg MX/kg/day group before necropsy; one animal died in 300 mg MX/kg/day group before necropsy; three animals died in 500 mg MX/kg/day group before necropsy; and one animal also died in the vehicle control group before necropsy.
*Level of significance, p < 0.05.
The satellite study with OVX animals using oral gavage administration detected statistically significant increases in uterine weight at the lowest of the three intermediate doses used in that study, 50 mg/kg/day (i.e., the lowest 10-mg DDT/kg/day dose was not tested in this laboratory with this protocol) (Table 26).
The o,p´-DDT results, except for the satellite study, are shown graphically in Figure 5. In protocol A using oral gavage, the ratio of the maximum mean uterine weights of the treated groups to the controls was generally between 2.5 and 3.5 and plateaued at the second-highest dose of 300 mg/kg/day. In protocols B, C, and D, the ratio in uterine weights was approximately 1.5. Extending the dosing to 7 days did not lead to an apparent increase in the maximum induction in uterine weights. Within the sc protocols, there was no apparent difference in variability of the dose-response curves between the intact, immature animals and the OVX animals.

Discussion and Conclusions
Reproducibility of the dose response among laboratories within a given protocol was good. It is noteworthy that this reproducibility was achieved under a variety of different study conditions (e.g., strain, diet, housing protocol, bedding, vehicle); modest differences in the age of the immature animals (pnd 18-20), age at ovariectomy, and time of regression after ovariectomy; and a wide range in laboratory experience and proficiency (Table 27). For example, some laboratories have conducted uterotrophic studies for several years, whereas a number of others were conducting the bioassay for only the first or second time. These variations and possible differences in experience would be expected to contribute some degree of variability for a given protocol. In this light, the good reproducibility observed suggests that the uterotrophic bioassay itself is robust. This reproducibility is similar to that observed in phase 1 using the potent reference estrogen EE (Kanno et al. 2001). In addition, the uterine increase is observed even under conditions of severe systemic toxicity, as evidenced by mortalities and decreases in body weights sometimes greater than 10% (Tables 2D, 4D, 5A, and 6A). This easily observed response at doses exceeding the maximum tolerated dose further supports the conclusion that the uterotrophic assay is a robust screen for detecting possible estrogen agonists. For all protocols, the blotted uterine weights appeared to show less interlaboratory and intragroup variability than uterine wet weight. This suggests that the blotted weight will provide greater power for detecting uterotrophic effects than wet weight.

Figure 3. Ratio of the mean absolute blotted uterine weight in response to doses of MX relative to the vehicle control group. (A) Participating laboratory results for protocol A using immature female rats, dosing by oral gavage for 3 consecutive days. (B) Participating laboratory results for protocol B using immature female rats, dosing by sc injection for 3 consecutive days. (C) Participating laboratory results for protocol C using adult OVX rats, dosing by sc injection for 3 consecutive days. (D) Participating laboratory results for protocol C using adult OVX rats and extending sc injection dosing to 7 days. In all cases, animals were humanely sacrificed 24 hr after the last dose administration, the uteri were removed and trimmed, and wet and blotted weights were recorded.

Lower and upper 95% confidence limits for ratio of blotted uterine weights based on body weights as a covariable.
b The recorded decrease in uterine weights from the control vehicle group was statistically significant.
c With the upper 95% confidence limit not < 1.0, the result is not statistically significant.
d Terminal body weights were not recorded by the laboratory.
e The blotted uterine weights were analyzed without body weight adjustments and were found to be statistically significant.
*Level of significance, p < 0.05.
in both phases 1 and 2 and, in most cases, with the weak agonists in phase 2 (data not shown). Combining the observations of lower variability and the intermittent limited increase in uterine weights with weak estrogen agonists, the blotted weight appears to be the metric of choice. Despite the excellent overall agreement among laboratories within protocols, there was some variability concerning the actual doses at which statistical significance was first achieved. This variability in the dose first achieving statistical significance was greatest for BPA in protocols A and B. Here, the dose range was about 3-fold for protocol A (375-1,000 mg BPA/kg/day) and 60-fold for protocol B (10-600 mg BPA/kg/day), whereas protocols C and D first achieved statistical significance at the dose of 100 mg BPA/kg/day in all studies.

Table 19. Uterine weights, body weights, and ratio of the relative increase in uterine weights for NP in protocol C.
The doses at which the weak estrogen agonists first reached statistical significance were far in excess of those determined for the potent reference estrogen EE in phase 1. By oral gavage (protocol A), 16 laboratories achieved statistical significance in phase 1 at either 0.3 or 1 µg EE/kg/day (Kanno et al. 2001). By contrast, in phase 2 the doses of the weak agonists ranged from 1,000- to 10,000-fold higher, as estimated for MX and shown for GN, respectively, to over 300,000-fold higher for BPA. Similar disparities are observed with sc injection, as statistical significance with EE was achieved at 0.1 or 0.3 µg EE/kg/day in phase 1 (Kanno et al. 2001) and from 5 to 200 mg/kg/day with the weak agonists in phase 2, including with extended dosing in protocol D.
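The fold differences quoted for BPA follow from a simple unit conversion between mg and µg. A minimal sketch in Python, using only doses reported in the text (the protocol A range of 375-1,000 mg BPA/kg/day and the phase 1 oral EE doses of 0.3 and 1 µg/kg/day); the function name `fold_difference` is illustrative and is not part of the validation program:

```python
def fold_difference(weak_dose_mg: float, ee_dose_ug: float) -> float:
    """Return how many times larger a weak-agonist dose (mg/kg/day)
    is than an EE dose (ug/kg/day), after converting mg to ug."""
    if weak_dose_mg <= 0 or ee_dose_ug <= 0:
        raise ValueError("doses must be positive")
    return (weak_dose_mg * 1000.0) / ee_dose_ug


# Most conservative comparison: lowest BPA dose reaching significance
# by gavage against the higher of the two EE doses.
bpa_vs_ee_min = fold_difference(375, 1.0)

# Opposite extreme: highest BPA dose against the lower EE dose.
bpa_vs_ee_max = fold_difference(1000, 0.3)

print(f"{bpa_vs_ee_min:,.0f}-fold to {bpa_vs_ee_max:,.0f}-fold")
```

The most conservative comparison gives 375,000-fold, which is consistent with the "over 300,000-fold" figure stated for BPA.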
As expected for weak estrogen agonists, the maximum increase in the uterine weights was also generally less than that observed for EE in phase 1 of the validation program. The maximum relative ratio responses of the EE-treated groups were 4 to 5 in protocol A, 4.5 to 6 in protocol B, 3.25 to 5 in protocol C, and approximately 4 in protocol D (Kanno et al. 2003). The maximum uterine weights reached by the weak estrogen agonists in these phase 2 studies were route, protocol, and test substance dependent, as is apparent from comparing the data in Figures 1-5.
Figure 4. Ratio of the mean absolute blotted uterine weight in response to doses of NP relative to the vehicle control group. (A) Participating laboratory results for protocol A using immature female rats, dosing by oral gavage for 3 consecutive days. (B) Participating laboratory results for protocol B using immature female rats, dosing by sc injection for 3 consecutive days. (C) Participating laboratory results for protocol C using adult OVX rats, dosing by sc injection for 3 consecutive days. (D) Participating laboratory results for protocol C using adult OVX rats and extending sc injection dosing to 7 days. In all cases, animals were humanely sacrificed 24 hr after the last dose administration, the uteri were removed and trimmed, and wet and blotted weights were recorded.

Differences were found between the routes of administration in study responsiveness, i.e., the dose producing the first statistically significant increase in uterine weight, from test substance to test substance. Although many parties might choose to use the term "sensitivity" rather than "responsiveness," validation experts have used the term sensitivity for a measure of assay performance: the proportion of all positive chemicals that are correctly classified as positive in an assay (ICCVAM 1997). Therefore, the term responsiveness is used herein. The striking observation was that these differences in responsiveness were test substance specific. For BPA, sc injection achieved statistical significance at lower doses than oral gavage, and the maximum induction was consistently higher by sc injection. Likewise for GN, most sc studies achieved statistical significance at a somewhat lower dose, 15 mg GN/kg/day, and with greater consistency than oral gavage. For NP, the majority of oral gavage and sc studies achieved statistical significance at similar doses, 75 mg NP/kg/day and 80 mg/kg/day, respectively.
For MX, the majority of oral gavage studies achieved statistical significance at the lowest doses tested and were near their maximum induction at 20 mg MX/kg/day, whereas sc injection doses were higher and the maximum uterine weight increase was lower. For o,p´-DDT, oral gavage produced statistical significance at lower doses and higher maximum responses. In the sc administration studies, an overall difference was not discernible between the intact, immature version (protocol B) and the adult OVX version (protocol C). Additionally, the satellite oral gavage studies using OVX animals produced results similar to those of the intact, immature animals in protocol A in both the maximum fold induction of the uteri and the first dose reaching statistical significance. Collectively, these results suggest that no single route of administration will be consistently the most responsive across the various agonists. The substantial equivalence of the results indicates that the choice of route of administration will then depend on the purpose for which the assay is used, such as detecting the activity of a substance at the lowest minimal effective dose or providing a route of administration relevant for human and wildlife exposure.

Five of 84 dose-response studies (not including the two studies in the laboratory that did not record terminal body weights) did not observe statistically significant increases. These involved three substances: NP (three studies), BPA (one study), and o,p´-DDT (one study). These three substances have the lowest estrogen receptor-binding affinities, once MX metabolism in the liver to dihydroxymethoxychlor (HPTE) is considered (see Table 1 for binding affinities of each substance, including HPTE). A closer examination of the circumstances and data has been made to see if these cases were approaching statistical significance or what other circumstances may have intervened to prevent detection of statistical significance (Table 28).
In these studies, statistical significance is achieved when the lower 95% confidence limit for the ratio of the treated-group uterine weight to the vehicle control uterine weight is > 1.0, i.e., greater than 1.0-fold induction.
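This decision rule can be written out explicitly. The sketch below, in Python, classifies a treated-to-control uterine weight ratio from its 95% confidence limits. The increase criterion (lower limit above 1.0) is the one stated in the text, and the decrease criterion (upper limit below 1.0) follows the table footnotes. The function name and the upper limit used in the example are hypothetical; only the lower limit of 0.91 is taken from the text (laboratory 6, NP, protocol B).

```python
def classify_ratio(lower_cl: float, upper_cl: float) -> str:
    """Classify a treated/control uterine weight ratio from its
    lower and upper 95% confidence limits."""
    if lower_cl > upper_cl:
        raise ValueError("lower limit exceeds upper limit")
    if lower_cl > 1.0:
        # Whole interval lies above 1.0-fold induction.
        return "significant increase"
    if upper_cl < 1.0:
        # Whole interval lies below 1.0, i.e., a decrease from control.
        return "significant decrease"
    # The interval includes 1.0, so no effect can be claimed.
    return "not significant"


# Lower limit 0.91 is from the text; the upper limit 1.40 is a placeholder.
print(classify_ratio(0.91, 1.40))
```

A study whose lower limit sits just under 1.0, as in this example, is classified as not significant even though it may be approaching significance.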
Given that literature data and expert judgment were used to select the doses; that no range-finding studies were performed; that the range of the doses used was sometimes only a little more than an order of magnitude; and that these laboratories did not include the highest doses in their studies, these few studies lacking statistical significance might have been anticipated as a consequence of the program design rather than of any performance shortcoming of the bioassay. In four of the five cases, the studies did not test the highest of the five prescribed doses, reducing the opportunity for detecting a statistically significant response. These four cases are examined in detail.
In the first case involving NP in protocol B, the mean control blotted uterine weight in laboratory 6 was 58.0 mg, whereas the vehicle control means in most other immature control groups were < 40 mg. This would, in theory, be expected to diminish the study's responsiveness. Despite this possible impediment, the lower 95% confidence limit for the relative ratio for uterine weight increase was 0.91, indicating that the study was approaching statistical significance. By comparison, among the other protocol B NP studies, six of nine studies achieved statistical significance with uterine weight increases at 35 or 80 mg NP/kg/day, and a seventh achieved statistical significance at the highest dose of 100 mg NP/kg/day.
Figure 5. Ratio of the mean absolute blotted uterine weight in response to doses of o,p'-DDT (DDT) relative to the vehicle control group. (A) Participating laboratory results for protocol A using immature female rats, dosing by oral gavage for 3 consecutive days. (B) Participating laboratory results for protocol B using immature female rats, dosing by sc injection for 3 consecutive days. (C) Participating laboratory results for protocol C using adult OVX rats, dosing by sc injection for 3 consecutive days. (D) Participating laboratory results for protocol C using adult OVX rats and extending sc injection dosing to 7 days. In all cases, animals were humanely sacrificed 24 hr after the last dose administration, the uteri were removed and trimmed, and wet and blotted weights were recorded.

In the second case involving NP in protocol C, laboratory 6 was again approaching