Quantitative comparisons of in vitro assays for estrogenic activities.

Substances that may act as estrogens show a broad chemical structural diversity. To thoroughly address the question of possible adverse estrogenic effects, reliable methods are needed to detect and identify the chemicals of these diverse structural classes. We compared three assays--in vitro estrogen receptor competitive binding assays (ER binding assays), yeast-based reporter gene assays (yeast assays), and the MCF-7 cell proliferation assay (E-SCREEN assay)--to determine their quantitative agreement in identifying structurally diverse estrogens. We examined assay performance for relative sensitivity, detection of active/inactive chemicals, and estrogen/antiestrogen activities. In this examination, we combined individual data sets in a specific, quantitative data mining exercise. Data sets for at least 29 chemicals from five laboratories were analyzed pair-wise by X-Y plots. The ER binding assay was a good predictor for the other two assay results when the antiestrogens were excluded (r2 = 0.78 for the yeast assays and 0.85 for the E-SCREEN assays). Additionally, the examination strongly suggests that biologic information that is not apparent from any of the individual assays can be discovered by quantitative pair-wise comparisons among assays. Antiestrogens are identified as outliers in the ER binding/yeast assay, while complete antagonists are identified in the ER binding and E-SCREEN assays. Furthermore, the presence of outliers may be explained by different mechanisms that induce an endocrine response, different impurities in different batches of chemicals, different species sensitivity, or limitations of the assay techniques. Although these assays involve different levels of biologic complexity, the major conclusion is that they generally provided consistent information in quantitatively determining estrogenic activity for the five data sets examined. The results should provide guidance for expanded data mining examinations and the selection of appropriate assays to screen estrogenic endocrine disruptors.

Evidence that certain man-made chemicals have the ability to disrupt the endocrine systems of vertebrates by mimicking endogenous hormones has, in recent years, sparked intense international scientific discussion and debate (1). A growing national concern has resulted in legislation, including reauthorization of the Safe Drinking Water Act and passage of the 1996 Food Quality Protection Act, mandating that the U.S. Environmental Protection Agency (EPA) develop a screening program for endocrine-disrupting chemicals (EDCs) (2,3). Under this requirement, at least 15,000 existing chemicals will be experimentally evaluated for their potential to disrupt activities in the estrogen, androgen, and thyroid hormone systems. A high-throughput prescreen assay that uses a reporter gene system may be used to prioritize chemicals for screening (4). The battery of in vitro and short-term in vivo screening assays should optimize hazard identification and provide guidance for subsequent longer term, more definitive in vivo tests for toxicity (5).
Although endocrine disruption can result from a variety of biologic mechanisms, more data exist for estrogens than for the other classes of activity (6,7). Because in vivo bioassays are time consuming and labor intensive, a battery of in vitro and short-term in vivo assays is proposed as a first screen for estrogenicity (4). Estrogens regulate the expression of specific genes and the secretion of certain hormones, and coordinate diverse processes such as cell proliferation, cell differentiation, and tissue organization through pleiotropic actions. Once estrogens reach the bloodstream, they may remain free or bind to serum estrogen-binding proteins like α-fetoprotein (AFP) in rodents (8,9) or sex hormone binding globulin (SHBG) in humans (9). Only the free (unbound) hormone is able to diffuse into the target cells, where it binds to the estrogen receptor (ER) to form a hormone-receptor complex. The prevailing model suggests that this complex then interacts with an estrogen response element (ERE) of a target gene and with the transcriptional machinery (10,11). Other estrogenic effects, such as the secretion of prolactin, are thought to be mediated by the ER through extranuclear mechanisms that do not involve transcription (12,13). In contrast, the mechanisms underlying the proliferative effect of estrogens are still poorly understood (14), despite the fact that this effect is considered the hallmark of estrogen action, and is the most sensitive and specific marker of in vivo estrogenic activity (15).
A series of in vitro assays has been developed for the detection of potential estrogens at several steps in the predominant mechanism of action. Most of these assays fall into one of three categories: a) ER competitive binding assays that measure the binding affinity of a chemical for ER; b) reporter gene assays that measure ER binding-dependent transcriptional and translational activity; and c) cell proliferation assays that measure the increase in cell number of target cells during the exponential phase of proliferation. Thus, these assays function at different levels of biologic complexity. The features, performance, and use of these assays in screening for estrogenic activities of endocrine disruptors have been reviewed and discussed elsewhere (16-19).
Although numerous data sets exist in the literature for various estrogenic compounds using several in vitro assays, the relationship between these assays has not been fully explored. The literature generally focuses on comparing performance of assays for individual pairs of chemicals. Few studies have investigated the relationship among assays in quantitatively detecting estrogenic activities of chemicals with a wide range of structural diversity and activity. This relationship is important, considering that the assays traverse various levels of biologic complexity and are being used to detect and characterize potential estrogens. Because the consistency and utility of the information from the various assays are unclear, data mining techniques can be used to consolidate individual data sets and examine the combined data (20).
Data mining techniques have been rapidly developed in the areas of genomics, clinical study, bioinformatics, cheminformatics, and other research areas (21). Data mining is an emerging interdisciplinary research field that intersects with computer science (database, artificial intelligence, graphics, and visualization), statistics, and numerous scientific areas for obtaining new knowledge. Generally, data mining comprises a number of processes (22): a) develop an understanding of the scientific area; b) create a target data set on which discovery is to be performed; c) evaluate, clean, and preprocess the data; d) reduce the data; e) choose a data mining task (decide whether the goal is classification, regression, clustering, description, modeling, etc.); f) choose a data mining algorithm; g) mine the data (i.e., search for patterns); h) interpret mined patterns; and i) consolidate the discovered knowledge (incorporate it into a decision system, report, etc.).
Data mining can provide new insights, such as predictability across assays, assay strengths and weaknesses, and assay specificity and sensitivity, for the detection of a variety of classes of chemical structures with estrogenic activity. In this paper we report a data mining approach to investigate the estrogenicity of structurally diverse chemicals across three levels of biologic complexity. We used four published data sets (23-26) and one new data set (27) from three in vitro assays for this study.

Materials and Methods
Criteria for selecting data sets. To draw valid conclusions from the analysis, we selected the data sets for comparison based on the following criteria (analogous to data mining steps a), b), c), and d) mentioned above). 1) To ensure the assay's applicability across a broad range of chemical structures and activity levels, data derived from the assay must comprise a minimum of approximately 30 compounds that include various estrogenic chemical classes, such as steroids, synthetic estrogens, phytoestrogens, organochlorines, alkylphenols, mixed or partial agonists/antagonists (type I antiestrogens, such as tamoxifen), complete or pure antagonists (type II antiestrogens, such as ICI 164,384) (28), and other environmental chemicals (e.g., bisphenol A, phthalates). 2) To enable a valid comparison and reach a statistically significant conclusion, the data sets from each assay should include sufficient numbers of shared chemicals (at least 10 of the above) for cross-comparison. These chemicals should represent each primary chemical class listed in 1), and the range of biologic activity as measured by relative binding affinity (RBA; ER binding assays), relative potency (RP; yeast assays), or relative proliferation potency (RPP; E-SCREEN assay) should span at least 4 orders of magnitude (10,000-fold).
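The two selection criteria can be expressed as a simple check on a candidate data set. The following is a minimal sketch, not the procedure actually used: the function name, the dictionary layout (chemical name mapped to log10 relative activity, with None for inactive chemicals), and the exact cutoffs passed as defaults are assumptions for illustration, and chemical-class coverage is not checked.

```python
def meets_selection_criteria(log_activities, shared_chemicals,
                             min_compounds=30, min_shared=10,
                             min_span_log=4.0):
    """Return True if a data set satisfies the size, overlap, and range criteria.

    log_activities   : dict, chemical -> log10 relative activity (None = inactive)
    shared_chemicals : chemicals also present in the data sets being compared
    The defaults mirror the thresholds stated in the text; the layout is hypothetical.
    """
    active = [v for v in log_activities.values() if v is not None]
    # Activity range in log10 units (4.0 corresponds to 10,000-fold).
    span = max(active) - min(active) if active else 0.0
    return (len(log_activities) >= min_compounds
            and len(shared_chemicals) >= min_shared
            and span >= min_span_log)


# Hypothetical data set: 30 placeholder chemicals spanning about 5.8 log units.
example = {f"chem_{i}": 2.0 - 0.2 * i for i in range(30)}
shared = list(example)[:12]
print(meets_selection_criteria(example, shared))
```

Passing the thresholds as parameters keeps the "approximately 30" criterion adjustable, because the text treats it as a guideline rather than a hard limit.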
Based on these criteria, we selected five data sets for analyses: the ER binding assay data from Waller et al. (23) and Kuiper et al. (24), the yeast assay data from Coldham et al. (25) and Gaido et al. (26), and the E-SCREEN assay data from Soto and colleagues (19,27,29). Part of the E-SCREEN assay data [estradiol, ethinylestradiol, testosterone, diethylstilbestrol (DES), methoxychlor, o,p'-DDT, p,p'-DDE, ICI 182,780, butylbenzylphthalate, and bisphenol A] used in this study have been reported by Andersen et al. (19) and were provided by the Soto lab. Andersen et al. (19) provided the details of the experimental procedure. Progesterone and atrazine were reported by Soto et al. (29). We obtained the remaining E-SCREEN data using the same assay conditions (27). Because the E-SCREEN assay data presented here were collected using the same previously peer-reviewed assay method, they constitute a self-consistent data set. Data sets covered each common category of in vitro assays that traverse different levels of biologic complexity.
Endpoint units. The absolute concentrations at which estradiol induced half-maximal activities were different for each assay type. To make direct comparisons between assays, we compared the relative activity of each chemical to the reference endogenous ligand 17β-estradiol (E2). Specifically, the RBA in the ER binding assay is the ratio of the molar concentration of E2 to that of the competing chemical required to decrease radiolabeled E2-receptor binding by 50%, which is then multiplied by 100. Thus, by definition, E2 has an RBA of 100. Inhibition constants reported by Waller et al. (23) were converted to RBAs using the Cheng-Prusoff equation (30). The log RBA for Waller et al. For the yeast assay, Coldham et al. (25) computed the RP of the test compounds in their data set by dividing the concentration of E2 giving 50% induction of β-galactosidase activity (EC50) by the EC50 of the test compounds, and then multiplying these values by 100. The activity log RP for the Coldham et al. data set ranges from 2.00 to -4.52. Gaido et al. (26) determined the EC50 for each ligand by fitting the dose-response data to the Hill equation and again computing the RP by dividing the E2 EC50 by the EC50 of the test compounds, multiplied by 100. The log RP value for this data set (26) ranges from 2 to -5.38. In both cases, the RP value for E2 was 100 by definition. The relative inductive efficiency (RIE) in the yeast assay is the ratio between maximal β-galactosidase activity achieved with the test compound and that of E2, multiplied by 100. By definition, E2 has an RIE of 100.
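The RBA and RP share one arithmetic form: the E2 concentration divided by the test-compound concentration, multiplied by 100. The sketch below illustrates that ratio together with the standard Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd). The function names and the concentrations are hypothetical, and the exact conversion steps applied to the Waller et al. data are not given in the text, so this should be read as an illustration of the definitions only.

```python
import math


def relative_activity(e2_conc, test_conc):
    """Relative activity (RBA or RP form): 100 * E2 concentration / test concentration.

    Both concentrations must be in the same molar units; E2 itself scores 100.
    """
    return 100.0 * e2_conc / test_conc


def log_relative_activity(e2_conc, test_conc):
    """log10 of the relative activity, the scale used for plotting."""
    return math.log10(relative_activity(e2_conc, test_conc))


def cheng_prusoff_ki(ic50, radioligand_conc, radioligand_kd):
    """Inhibition constant from an IC50 via the Cheng-Prusoff relation.

    Ki = IC50 / (1 + [L] / Kd), where [L] is the radioligand concentration
    and Kd its dissociation constant.
    """
    return ic50 / (1.0 + radioligand_conc / radioligand_kd)


# Hypothetical values: E2 IC50 of 1 nM, test compound IC50 of 1 uM.
print(relative_activity(1.0e-9, 1.0e-6))      # about 0.1
print(log_relative_activity(1.0e-9, 1.0e-6))  # about -1
```

Because the Cheng-Prusoff factor depends only on the radioligand, two compounds measured under the same conditions have the same ratio of Ki values as of IC50 values, so either quantity can be used in the relative-activity ratio.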
The RPP for the E-SCREEN assay is the ratio of the concentration of E2 needed for 50% of maximal cell yield to the dose of the test compound required to achieve a similar effect, multiplied by 100. The RPP value for E2 is thus set to 100. The log RPP value ranges from 2.05 to -4.08. The relative proliferative effect (RPE) is the ratio of the highest cell yield obtained with the test chemical to that obtained with E2, multiplied by 100. The RPE value for E2 is by definition 100. The RPE distinguishes between full agonists (RPE = 100) and partial agonists (RPE < 50) (31) and is formally analogous to the RIE in the yeast assay. New data reported here for the E-SCREEN assay were collected as previously described (19).
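The efficacy measure (RPE, and by analogy RIE) can be sketched in the same way as the potency ratios. This is a minimal illustration under stated assumptions: the function names are hypothetical, the cell yields are invented, and the single 50% cutoff separating partial from full agonists is a simplification of the distinction described above.

```python
def relative_proliferative_effect(max_yield_test, max_yield_e2):
    """RPE: 100 * highest cell yield with the test chemical / that with E2."""
    return 100.0 * max_yield_test / max_yield_e2


def agonist_class(rpe, partial_cutoff=50.0):
    """Coarse label from the RPE; the single cutoff is an assumption."""
    if rpe <= 0.0:
        return "inactive"
    return "partial agonist" if rpe < partial_cutoff else "full agonist"


# Hypothetical maximal cell yields (arbitrary units).
rpe = relative_proliferative_effect(25.0, 100.0)
print(rpe, agonist_class(rpe))
```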
United data sets. Table 1 shows the data from the five sources used for comparison. Data are shown only for those compounds for which data are available from at least two of the five different references, and are listed as log RBA, RP, or RPP to enable plotting over the observed range of about 6 orders of magnitude. To attain the maximum number of chemicals for comparison, we developed united data sets (Table 1) separately for both ER binding assays and yeast assays; it is the united data sets that we will compare across the assays. These united data sets were built by first selecting a primary data set for each assay type and then adding data from the other data set as follows:
* For each compound in the primary data set, we used the actual value. We selected the data sets of Waller et al. (23) and Coldham et al. (25) as primary data sets for the ER binding and yeast assays, respectively, because they include more chemicals and chemical classes than the other data sets.
* For each compound not in the primary data set, we calculated the value for the united data set from the correlation equations: y = 0.93x - 0.24 (Figure 1) for adding the data of Kuiper et al. (24) to the data set of Waller et al. (23) for the ER binding assay, and y = 1.14x - 0.14 (Figure 2) for adding the data of Gaido et al. (26) to the data set of Coldham et al. (25) for the yeast assay.
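The merging rule can be sketched as follows. This is an illustration only: the function name and the placeholder chemicals are hypothetical, and the sketch assumes that x in the correlation equation is the log value from the secondary data set and y is the value on the scale of the primary data set, which the text does not state explicitly.

```python
def unite(primary, secondary, slope, intercept):
    """Build a united data set of log relative activities.

    primary, secondary : dict, chemical -> log10 relative activity (None = inactive)
    Values in the primary set are kept unchanged. Chemicals found only in the
    secondary set are mapped onto the primary scale with y = slope * x + intercept
    (assumed direction: x = secondary value, y = primary-scale value).
    """
    united = dict(primary)
    for chemical, x in secondary.items():
        if chemical not in united and x is not None:
            united[chemical] = slope * x + intercept
    return united


# Placeholder chemicals with invented values; slope and intercept from Figure 1.
primary = {"chem_A": 2.0, "chem_B": -0.5}
secondary = {"chem_A": 1.9, "chem_C": 1.0}
united = unite(primary, secondary, slope=0.93, intercept=-0.24)
print(united)
```

A chemical present in both sets keeps its primary value, so the regression is applied only where it extends coverage.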

Results
Correlation between Similar Assays
ER competitive binding assay. We selected two ER binding assays for analysis: one that uses receptor transcribed from recombinant human ER-α complementary DNA (cDNA) (24), and one that uses the receptor from mouse uterine cytosol ( (active in both data sets) for inclusion in this regression. Figure 1 shows a good linear correlation (r2 = 0.88) for these chemicals, which indicates a limited range of RBA variability despite the species and ER subtype differences between the ER sources. The chemicals with the largest disparity are coumestrol and genistein, which may reflect different affinities for the human versus mouse receptors. If these two chemicals are omitted from the comparison, the correlation coefficient is much higher (r2 = 0.98).
Yeast-based reporter gene assay. For the analysis, we used two yeast assay data sets, which appear to be identical recombinant yeast cell assays (25,26); both contain an expression plasmid with a CUP1 metallothionein promoter fused to the human ER cDNA and a promoter plasmid containing two Xenopus vitellogenin EREs upstream of the structural gene for β-galactosidase. A quantitative comparison of the data sets enables evaluation of the consistency and replicability of this particular assay. There are 13 common chemicals in the data sets, but we included in the regression only the 10 chemicals that were active in both data sets. These chemicals represent diverse chemical classes and have RP values that range over 10^6-fold (log RP from -4.5 to 2.0). Figure 2 shows a high correlation coefficient (r2 = 0.91), which is a strong indicator of good reproducibility across different studies. However, we observed a large discrepancy for methoxychlor, which may be due to sources of error such as chemical impurities (32). In addition, we examined the relationship between the log RP and the RIE for the data set of Coldham et al. (25) (Figure 3). The values for the two partial agonists tamoxifen and 4-OH-tamoxifen lie close to the regression line (r2 = 0.78) and have RIEs of approximately 50%. Estradiol derivatives, DES derivatives, phytoestrogens, and polychlorinated biphenyls (PCBs), which are relatively strong estrogens, have RIEs > 50% of E2. In contrast, the lower potency chemicals, such as androgens, alkylphenols, DDTs (except methoxychlor), and phthalates, have RIEs < 50% of E2. Six of the lowest potency chemicals had RIEs of 0.8-5.3%. This indicates that a chemical with lower potency tends to have lower inductive efficiency under this particular assay condition.
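The pair-wise comparison amounts to an ordinary least-squares line fitted to the chemicals that are active in both data sets, with r2 as the summary statistic. The following is a minimal sketch using only the standard library; the function names and the toy values are hypothetical and do not reproduce any data from Table 1.

```python
def paired_active(data_a, data_b):
    """Return matched x, y lists for chemicals active (not None) in both data sets."""
    common = sorted(k for k in data_a
                    if k in data_b
                    and data_a[k] is not None
                    and data_b[k] is not None)
    return [data_a[k] for k in common], [data_b[k] for k in common]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy log relative activities for placeholder chemicals (None = inactive).
set_a = {"c1": 2.0, "c2": 0.0, "c3": -2.0, "c4": None, "c5": 1.0}
set_b = {"c1": 5.0, "c2": 1.0, "c3": -3.0, "c4": 0.5, "c6": 0.0}
xs, ys = paired_active(set_a, set_b)
slope, intercept, r_squared = fit_line(xs, ys)
print(len(xs), slope, intercept, r_squared)
```

Chemicals that are inactive in either data set have no log value and are dropped before fitting, which corresponds to restricting the regression to the chemicals active in both data sets.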
Comparison Between Different Assays
ER competitive binding assay versus yeast-based reporter gene assay. The ER binding assay directly measures the RBA of ligands for ER, whereas the reporter gene response includes effects from not only ligand-ER binding but also ER-ERE interactions, transcriptional complex effects, and translational effects. Although different in the nature and biocomplexity of their end points, both assays measure receptor-ligand interaction, for which the RBA is a direct measure and the RP is an indirect measure. Figure 4 shows a plot of log RP versus log RBA, which is constructed by using the united data sets (see "Materials and Methods") to reduce redundant data points and to increase the number of chemicals for comparison. In general, except for the five antiestrogens, the two assays correlate very well for estrogenic agonists. The five antiestrogens are conspicuous outliers with RPs 100- to 1,000-fold lower than would be expected from their RBAs. These antiestrogens traverse a wider range of RBAs (approximately 25-fold) than the 3-fold range for RP, indicating that the yeast assay has limited and relatively constant sensitivity to these antiestrogens (25). The r2 value (0.53) for the comparison of the ER binding assay and the yeast assay was much lower when the antiestrogens were included (Table 2). With the antiestrogens excluded, we obtained a good linear relationship (r2 = 0.78) between the ER binding and yeast assays across all chemical classes (Table 2 and Figure 5). Inspection of data points for individual compounds shows good agreement for steroids and synthetic estrogens, indicating that their binding to the ER is both the initiating and rate-determining mechanism for these estrogen agonists. Additionally, a reasonable, but weaker, linear correlation exists for the seven chlorinated chemicals. However, a relatively large disparity is found for several chemicals, including 4-tert-octylphenol, dihydrotestosterone (DHT), and o,p'-DDT. Although many factors could cause the disparities, chemical impurities may be a possible source of disagreement between assays.
ER competitive binding assay versus the E-SCREEN assay. In the E-SCREEN assay, estrogens recruit MCF-7 cells into the cell cycle. Estradiol exponentially increases the cell number (doubling time = 36 hr) (33). We found a very good linear relationship (r2 = 0.86) between the E-SCREEN and ER binding assays across all chemical classes and across a 10^6-fold range (log RPP from -4.08 to 2.05) of activity values (Table 2, Figure 5), which is in agreement with the observation of Weise et al. (34) for a set of steroids. The r2 of 0.86 is virtually identical to the value of 0.85 without partial agonists and antagonists. The two natural estrogens estradiol and estriol had a higher proliferative activity than the activity predicted by their receptor binding affinity, which is in agreement with previous observations (35). The partial and mixed antiestrogens (tamoxifen, 4-OH-tamoxifen, nafoxidine, and clomiphene) also had relatively high RPPs that correlated well with their RBAs in the ER binding assay, but their RPEs were much lower than that observed with estradiol (Figure 6), indicating their partial agonist effect. In contrast, the two pure antiestrogens, ICI 164,384 and ICI 182,780, have a higher binding affinity in the ER binding assay and induced no response in the E-SCREEN assay; therefore, they could not be plotted in Figure 5. Thus, a pure antiestrogen can be identified using a combination of the ER binding and E-SCREEN assays.
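The way the two assays combine to separate agonists, partial agonists, and pure antagonists can be written as a small decision rule. This is a hedged sketch: the function name, the labels, and the 50% RPE cutoff are assumptions made for illustration, and a binding-positive, proliferation-negative result is treated only as a candidate that would still need direct confirmation of antagonism.

```python
def classify_by_binding_and_escreen(log_rba, log_rpp, rpe, partial_cutoff=50.0):
    """Label a chemical from ER binding and E-SCREEN results.

    log_rba : log RBA from the ER binding assay, or None if no binding detected
    log_rpp : log RPP from the E-SCREEN assay, or None if no proliferation detected
    rpe     : relative proliferative effect (percent of E2), or None
    The cutoff and the labels are illustrative assumptions.
    """
    if log_rba is None and log_rpp is None:
        return "inactive in both assays"
    if log_rba is not None and log_rpp is None:
        # Binds the receptor but induces no proliferation.
        return "candidate pure antagonist"
    if log_rba is None:
        return "active in E-SCREEN only"
    if rpe is not None and rpe < partial_cutoff:
        return "candidate partial agonist"
    return "agonist"


# Invented values for placeholder chemicals.
print(classify_by_binding_and_escreen(1.0, None, None))
print(classify_by_binding_and_escreen(1.2, 0.9, 25.0))
print(classify_by_binding_and_escreen(2.0, 2.0, 100.0))
```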
E-SCREEN assay versus yeast-based reporter assay. The correlation between the E-SCREEN assay and the yeast assay is shown in Figure 7. The r2 was 0.56 when antiestrogens were included and 0.72 when they were excluded ( Table 2). Similar to the comparison between the ER binding assay and the yeast assay, the antiestrogens were outliers. Several organochlorines and estriol were significant outliers. The correlation between the ER binding assay/E-SCREEN pair was stronger than that for the E-SCREEN/yeast assay pair.

Discussion
Our results present quantitative comparisons between three different assay types. Each assay measures different end points at different levels of biologic complexity of estrogen action (i.e., receptor binding, expression of a reporter gene, and cell proliferation). The comparisons allow conclusions to be drawn concerning the characteristics and performance of these assays individually and in pair-wise combinations.
Environmental Health Perspectives, Volume 108, Number 8, August 2000
Chemicals that exhibit estrogen-like activity have a very broad range of structural diversity (23,25,29). A common structural feature for steroids, DES derivatives, and most phytoestrogens is the presence of two rings (one of them usually a phenolic ring) separated by two carbons. Chemicals with two rings either separated by one carbon atom (DDTs and bisphenol A derivatives), connected directly (PCBs), or possessing only one ring (alkylphenols, phthalates, kepone) typically have relatively lower binding affinities as compared to chemicals with two rings with two atoms separating them. The chemicals in this study cover all these structural features as well as activity measures based on RBA, RP, and RPP that traverse 4 to over 6 orders of magnitude.
The linear relationships for estrogen binding or activity among the three assays are generally consistent. This supports the idea that ER binding is a major determinant or rate-determining step in the assays using living cells.
A literature survey revealed a great variability in binding data for certain compounds (36). For example, an approximately 10-fold range of binding affinity has been observed for E2 in various species (7), and p,p'-DDT binds to the human ER but not to the rat ER (37). However, similar variability exists for binding affinities in assays conducted in different laboratories, even though the same species were used. For example, genistein shows a 20-fold difference for RBA values in MCF-7 cells between the observations reported by Martin et al. (38) and Zava et al. (39), whereas nonylphenol showed a 10-fold RBA difference between the findings reported by Waller et al. (23) and Shelby et al. (40) in mouse uterine cytosol. These considerations make it difficult to distinguish experimental deviations from species differences for RBAs when an individual chemical is compared. Based on the good linear correlation of log RBAs for two data sets from different species (human and mouse), we found that species-related differences in RBA are not high for most ligands examined (Figure 1). These results are consistent with the findings from a quantitative structure-activity relationship (QSAR) model used to extrapolate across species (41). However, we observed relatively larger disparities for phytoestrogens in the cross-species comparison, which may be due to binding differences between pure human ER-α used by Kuiper et al. (24) and the mixture of predominantly α and a small amount of β isoforms in rodent uterine cytosol (42) used by Waller et al. (23). Similar observations have been reported regarding species and receptor subtype sensitivity of phytoestrogens (43,44).
Antiestrogens are chemicals that antagonize the actions of estrogens through several possible mechanisms (45). Six antiestrogens examined here inhibit E2-induced responses through interactions with ER. Tamoxifen, 4-OH-tamoxifen, clomiphene, and nafoxidine are partial agonists/antagonists (type I antiestrogens), and ICI 164,384 and ICI 182,780 are pure antagonists (type II antiestrogens). Because the yeast assay does not directly measure antagonist activity, antiestrogens could be mistaken for weak agonists. Moreover, a good correlation between a chemical's log RP and RIE (Figure 3) indicates that the measurement of a chemical's efficiency in the yeast assay also cannot distinguish the partial agonist activity of type I antiestrogens from strong or weak agonists. Thus, like the ER binding assay, the yeast assay alone cannot identify antiestrogens. However, the activity of a chemical in the yeast assay as compared to the ER binding assay allows the discrimination of the partial agonist activity of type I or II antagonists from that of full agonists. Specifically, chemicals that have RBAs approximately 2 log units higher than predicted from the yeast assay have the potential to be a type I or II antiestrogen. This approach could be useful in drug discovery for screening potential antiestrogens in a high-throughput mode. It is important to note that some antiestrogenic chemicals do not act via ER binding. For example, aryl hydrocarbon receptor agonists, such as dioxins and PCBs, also act as antiestrogens (46-48) by causing down-regulation of ER (49), thus decreasing ER-DNA binding (50). Such antiestrogens cannot be identified by comparing binding and yeast assays. The E-SCREEN assay measures the RPE, whereas the same measure in the reporter gene assay is the RIE. Although the RPP value of 4-OH-tamoxifen in the E-SCREEN assay is close to its RBA value in the binding assay, its RPE is approximately 25% of estradiol, which is similar to the response in the rat uterotrophic assay (51). Thus, in the E-SCREEN assay, type I antiestrogens are detected as partial agonists (tamoxifen, 4-OH-tamoxifen, clomiphene, and nafoxidine), whereas type II antiestrogens are inactive (ICI 164,384 and ICI 182,780). Hence, type II antagonists are active in the ER binding assay and inactive in the E-SCREEN assay. Although it is possible to infer antagonistic activity by comparing the behavior of these compounds in the three assays discussed herein, antagonistic activity must be verified by demonstrating the ability of a chemical to inhibit estrogen action in vivo.

Figure 4. Comparison of the ER competitive binding assay to the yeast assay. The united data sets were used for both assays. The r2 = 0.78 (Table 2) and the regression equation is y = 0.77x + 0.83 (with antiestrogens excluded). Axis label: log RBA (competitive ER binding assay).

Figure 5. Comparison of the ER competitive binding assay with the E-SCREEN assay. The united data set was used for the ER binding assay. The r2 = 0.85 (Table 2) and the regression equation is y = 0.98x - 1.35 (without inclusion of antiestrogens). Axis label: log RBA (competitive ER binding assay).

Figure 6. The correlation of log RPP against RPE for the E-SCREEN assay. The dashed lines are the RIE at 50% of E2 and the log RP at -1.5, respectively.

Figure 7. Comparison of the E-SCREEN assay with the yeast assay (Table 2). The regression equation is y = 0.88x + 0.19 (with antiestrogens excluded).

Table 2. Summary of correlation coefficients for in vitro assay comparisons.
Cross in vitro assay comparison                  r2 without antiestrogens    r2 with antiestrogens
Binding assay vs. yeast assay (Figure 4)         0.78                        0.53
Binding assay vs. E-SCREEN assay (Figure 5)      0.85                        0.86
E-SCREEN assay vs. yeast assay (Figure 7)        0.72                        0.56
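The screening idea described above, flagging chemicals whose measured RBA exceeds the value predicted from the yeast assay by about 2 log units, can be sketched as a residual test. The function name is hypothetical, and the slope and intercept in the example are placeholders rather than fitted values; in practice they would come from a regression of log RBA on log RP for known agonists.

```python
def flag_candidate_antiestrogens(log_rba, log_rp, slope, intercept,
                                 min_excess=2.0):
    """Return chemicals whose log RBA exceeds the yeast-based prediction.

    log_rba, log_rp : dict, chemical -> log relative activity (None = inactive)
    Predicted log RBA = slope * log RP + intercept (agonist-only regression,
    supplied by the caller). A chemical is flagged when the measured log RBA
    is at least `min_excess` log units above the prediction.
    """
    flagged = {}
    for chemical, rp in log_rp.items():
        rba = log_rba.get(chemical)
        if rba is None or rp is None:
            continue  # needs a measurable value in both assays
        excess = rba - (slope * rp + intercept)
        if excess >= min_excess:
            flagged[chemical] = excess
    return flagged


# Placeholder chemicals and an identity regression, both purely illustrative.
binding = {"chem_A": 1.5, "chem_B": -1.0}
yeast = {"chem_A": -1.0, "chem_B": -1.2}
flagged = flag_candidate_antiestrogens(binding, yeast, slope=1.0, intercept=0.0)
print(flagged)
```

A flag of this kind only marks a chemical for follow-up; as noted above, antiestrogens that do not act through ER binding would not be detected this way.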
There were some apparent discrepancies in the detection of activity using the three in vitro assays. Specifically, except for the two ICI chemicals, nine chemicals were shown to be inactive in one of the three assays (indicated in bold in Table 1); these are steroidal chemicals, organochlorines, and phthalates.
Progesterone was inactive in all three assays, indicating that it could be used as a negative control for these assays. 4-Androstenedione and atrazine showed consistent undetectable activity in at least two assays. β-Sitosterol, butylbenzylphthalate, di-n-butylphthalate, p,p'-DDE, p,p'-DDT, and lindane gave inconsistent responses in at least two assays. All of the inconsistent chemicals had low activity in the assays that detected activity, and they had nondetectable activity or an activity listed as less than a cutoff value in the other assays. Among the six inconsistent chemicals, only the two phthalates showed positive activity in the ER binding assay. In contrast, three of the five inconsistent chemicals in the yeast assay (β-sitosterol, butylbenzylphthalate, and p,p'-DDT) and all five of the inconsistent chemicals in the E-SCREEN assay exhibited activity, which indicated that these two assays are more sensitive in detecting low potency estrogens than the ER binding assay. It is important to note that most of these inconsistent chemicals had marginal activity in one assay but no detectable response in the others.
In the E-SCREEN assay, only 4 of 19 (21%) low potency (log RPP < -1.5) chemicals had an RPE < 50% of estradiol (Figure 6). In contrast, 12 of 16 (75%) low potency chemicals (log RP < -1.5) in the yeast assay had an RIE < 50% of estradiol (Figure 3). Most of these chemicals are androgens, alkylphenols, DDTs, and phthalates. This finding, combined with the linear correlation of RIE with log RP, suggests that these results are inherent to the reporter gene construct and demonstrates that the yeast assay, as measured by the RIE, has lower resolving power for low potency chemicals than the E-SCREEN assay, as measured by the RPE. Recently, Harris et al. (52) reported that the RIE for phthalates increased as the incubation time proceeded; this suggests that the incubation times used in the data sets analyzed here may have contributed to the low RIEs.
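The proportions quoted above follow from a simple count over (log potency, efficiency) pairs. The sketch below shows the tally with the same cutoffs (log potency < -1.5, efficiency < 50%); the function name and the example records are hypothetical, and only the final two lines use the counts reported in the text.

```python
def low_efficiency_fraction(records, potency_cutoff=-1.5, efficiency_cutoff=50.0):
    """Count low potency chemicals whose efficiency falls below a cutoff.

    records : iterable of (log_potency, efficiency_percent) pairs, e.g.
              (log RPP, RPE) for E-SCREEN or (log RP, RIE) for the yeast assay
    Returns (hits, n_low_potency, fraction).
    """
    low = [(p, e) for p, e in records if p < potency_cutoff]
    hits = sum(1 for _, e in low if e < efficiency_cutoff)
    fraction = hits / len(low) if low else float("nan")
    return hits, len(low), fraction


# Invented records for illustration only.
demo = [(-2.0, 30.0), (-3.0, 80.0), (-1.0, 10.0), (-4.0, 45.0)]
print(low_efficiency_fraction(demo))

# Arithmetic check of the reported proportions.
print(round(100 * 4 / 19), round(100 * 12 / 16))
```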
There are several sources of error that should be examined in comparing the results of the three assays. One is the reproducibility of results from different labs performing the same assay. Because the results of the two yeast assays were analyzed differently, error could be introduced in the comparison. Gaido et al. (26) fit their results to a Michaelis-Menten equation with a Hill coefficient and estimated the EC50, whereas Coldham et al. (25) recorded the EC50 relative to the EC50 of estradiol for chemicals with an RIE > 50%. For chemicals with an RIE < 50%, they calculated the concentration of estradiol and the test chemical that gave the same activity values. However, the good linear correlation between the two assays demonstrates the reproducibility of the yeast assay, even when the data are analyzed differently. Conflicting outcomes were found for two chemicals that were not included in the correlation analysis. Coldham et al. (25) reported that butylbenzylphthalate and p,p'-DDT were marginally active, but Gaido et al. (26) reported that they were inactive. This suggests that for low potency chemicals near the limit of resolution of the assay, inconsistent results may be obtained. It is important to define the limits of assay resolution for these and other assays in order to have confidence in the activity value for low potency chemicals.
Another source of error in comparing either identical or different assays is chemical purity. Technical grade methoxychlor contains more than 50 impurities (53,54), of which monohydroxymethoxychlor olefin and monohydroxymethoxychlor are most likely the active components. Their ER binding affinities are close to that of 2,2-bis(p-hydroxyphenyl)-1,1,1-trichloroethane (HPTE) (55). In addition, Blair et al. (55) reported that 99% pure methoxychlor is inactive in binding to the ER and 95% pure methoxychlor actually competes with E2 at a 100,000-fold lower binding affinity. Nonylphenol is a mixture of congeners (56). In our study, we determined that octylphenol and nonylphenol (both technical grade from ChemService, West Chester, PA) are 50- and 40-fold more potent, respectively, than reported by Andersen et al. (19) for pure 4-n-octylphenol and 4-n-nonylphenol. This is consistent with early E-SCREEN results by Soto et al. (29) and with ER binding assay data from Blair et al. (55). The purity of chemicals may vary among batches from the same manufacturers and among diverse manufacturers. This may explain why the largest errors were found for these two chemicals in comparing the data sets of Coldham et al. (25) and Gaido et al. (26). The issue of chemical purity should be given serious consideration in both the experimental design phase and in evaluating results within and across laboratories. It would be desirable to assemble a common set of chemicals of defined source and purity for use across laboratories.
Our analysis suggests that although there is general agreement among the three assays, there are certain performance characteristics and sources of error that should be considered in the use of the assays, either alone or in combination. For purposes of prioritization, some degree of error may not be of great concern because these chemicals would be examined further at higher tiers in the test battery (4).
Assay comparisons using data mining techniques are very different from other published comparisons among estrogens reported in the literature. Most of the publications (19,25,57) have focused on an individual chemical across assays. In contrast, data mining techniques allow the use of a large database and statistical analysis methods to explore the inherent relationships and patterns between assays for a broad range of structurally diverse chemicals and activities. Although we used a simple linear regression method, the knowledge acquired and the patterns discovered are readily apparent. Furthermore, although beyond the scope of this paper, additional benefits can also be anticipated from conducting analyses such as the one we are reporting. For instance, few data sets reported in the literature cover a variety of chemical classes and/or exist for a large number of estrogens assayed under identical conditions. Appropriate quantitative comparisons allow large data sets to be built from small sets, as we did here for the united data set. Knowledge of sources of error is important for the integrity of such united data sets, which can be used to extract knowledge by meta-analysis for predicting activity or toxicity. Some computational approaches have been used for a similar purpose using large data sets. For example, QSAR models have been constructed for predicting the biologic activity of untested chemicals (58); classification models have been developed for categorizing chemicals based on their biologic activity range (59); and rule-based models can be constructed for selecting the combination of short-term assays that best predict long-term assay outcomes. The use of such computational predictive models, in conjunction with the methods of comparative analysis reported here, could greatly facilitate the process of identifying chemical compounds with endocrine-disrupting potential.