Activity of Xenoestrogens at Nanomolar Concentrations in the E-Screen Assay

Background: Certain effects induced by endocrine-disrupting chemicals (EDCs) may occur at dose levels lower than those normally tested in toxicology, but few systematic dose–response studies have been carried out in the low-dose range.

Objectives: The high statistical power afforded by a high-throughput in vitro assay such as the E-Screen assay was exploited with the aim of producing low-dose estimates for 24 estrogenic chemicals, including endogenous hormones and xenoestrogens.

Results: Unusual dose–response curves with inverted U-shapes were not observed in the low-dose range. Instead, many chemicals exhibited curves with very small gradients at low doses, and this complicated the reliable estimation of low effects. Systematic comparisons between the outcomes of hypothesis-testing procedures (lowest observed effect concentrations, LOECs; no observed effect concentrations, NOECs) and regression modeling approaches (EC01, the effective concentration causing a 1% effect; EC05, the effective concentration causing a 5% effect) produced estimates that agreed reasonably well. In many cases, NOECs were shown to be associated with proliferative responses of 1–2%. This is in contrast with the widespread perception of NOECs as values that signal complete absence of effects. For many of the tested xenoestrogens, the NOECs, EC01, and EC05 were in the nanomolar range, and comparisons with measured serum and adipose tissue levels in Europe revealed considerable overlaps in some cases.

Conclusions: Our studies illustrate the difficulties that may be encountered during the estimation of low doses in vivo. High statistical power is required when the underlying dose–response curves are shallow. Through the use of large sample sizes and numerous repeats, the experimental power of the E-Screen assay was sufficiently high to measure effect magnitudes of around 1–2% with reliability. However, such resources are usually not available for in vivo testing, with the consequence that the statistical detection limits are considerably higher. If this coincides with shallow dose–response curves in the low-effect range (which is normally not measurable in vivo), the limited resolving power of in vivo assays may seriously constrain low-dose testing.


Monograph
Certain properties and effect profiles are thought to set endocrine-disrupting chemicals (EDCs) apart from other hazardous substances. Some EDC effects may be irreversible, such as those resulting from interference with androgen action during key steps of sexual differentiation of males. Of concern with estrogens is their role in breast and ovarian cancer [see recent review by Kortenkamp (2006)]. Certain EDC-mediated effects such as weight changes of sex accessory glands have been shown to occur at dose levels lower than those normally tested in toxicology, often with unusually shaped dose-response curves such as inverted U-shapes. However, these observations could not be replicated by others [see the reviews by Ashby et al. (2004); vom Saal and Hughes (2005); vom Saal et al. (2005)], and this has provoked an unusually heated controversy in the field, with claims of bias due to sources of research funding (vom Saal and Hughes 2005).
However, few systematic dose-response studies have been carried out with EDCs, and this has been highlighted as a deficiency by the National Toxicology Program (NTP) Low-Dose Peer Review Panel (NTP 2001). Most EDC low-dose studies conducted to date have employed only one or two different dose levels and have used statistical hypothesis-testing procedures to compare the effects in treated groups with those observed in controls. These methods are commonly drawn on to derive no observed effect levels (NOELs) but have been sharply criticized by statisticians due to insufficient control for type II errors (Moore and Caux 1997; Slob 1999). Type II errors occur when experimenters arrive at the conclusion that there is no effect when in fact there is one. Increasingly, regression-based methods such as the benchmark approach [Crump 2002; U.S. Environmental Protection Agency (U.S. EPA) 1995] are promoted as alternatives to hypothesis testing. One of the strengths of regression methods lies in the fact that the statistical power contained in the entirety of experimental data is accessible for low-effect dose estimations. This is not the case with hypothesis-testing methods where the pair-wise comparisons between controls and one dose group leave the information available from other dose groups unused. Regression analyses have rarely been employed for the estimation of low-effect doses of EDCs. Systematic comparisons of low-dose estimates derived from hypothesis testing with those obtained by regression modeling are missing for EDCs.
The E-Screen assay. The protocol described previously (Rajapakse et al. 2004; Soto et al. 1995) was adapted to a miniaturized format using 96-well microtiter plates. MCF-7 BOS cells were seeded into the central 48 wells of 96-well plates (Falcon; BD Biosciences, Oxford, UK) at a density of 2,500 cells per well in a volume of 200 µL and allowed to attach for 24 ± 2 hr. Peripheral wells on the microtiter plate were filled with sterile water.
The media change into experimental conditions was carried out on a plate-by-plate basis and with two lanes at a time. This was to control and minimize the time the cells were left in rinse media, without FBS. We found that cells kept for too long without seeding medium grew suboptimally, and this introduced errors leading to poor reproducibility.
The seeding medium of the top two lanes was gently aspirated and the attached cells rinsed with 200 µL phenol red-free DMEM (Invitrogen). The rinse medium was then replaced with 200 µL experimental medium [charcoal-dextran (CD)-DMEM] consisting of phenol red-free DMEM supplemented with 1% (v/v) sodium pyruvate, 1% MEM-NEAA, and 10% CD-stripped FBS, with the appropriate concentration of the test compound. These top two rows contained eight increasing concentrations of the test chemical solubilized in ethanol (final ethanol concentration: 0.5%) and tested in duplicate. The next row, as well as one row between positive and negative controls, was left untreated to avoid "creeping" of the test chemical to adjacent wells. The remaining two rows were treated in the same way as the first two. One contained negative controls (CD-DMEM + 0.5% ethanol in 8 wells) and the other positive controls [CD-DMEM + E2 (20 nM) in 8 wells].
To minimize the number of cells being lost from the bottom of the wells during media changes, rinsing, and treatment, all pipetting was carried out using a low-ejection force electronic multichannel pipettor, and great care was taken to avoid long direct contact of the pipettor tip with the bottom of the well, as this would have dislodged cells.
After 120 hr, the assay was terminated by placing the plates on ice for 1 min before gently removing the experimental media and replacing it with 200 µL ice-cold 10% (wt/vol) trichloroacetic acid. The plates were left on ice for 25 min, then rinsed gently 5 times with water and allowed to air dry. Cells were then stained with 0.4% sulforhodamine B (SRB) in 1% (vol/vol) acetic acid for 10 min. The bound dye was solubilized with 100 µL Tris-base and the optical density (OD) read at 510 nm directly in the same plate on a microplate reader (Labsystems Multiskan; VWR International, Ltd., Leicestershire, UK). It had been established previously that there is a direct linear relationship between cell number and the OD values of the Tris-SRB solution, and experimental readings were in the linear range of the standard curve (data not shown).
To reduce intraexperimental variability, data were normalized on a plate-by-plate basis. Data were scaled between 0 (ethanol controls) and 1 (positive controls). A detailed description of the data normalization procedure has been published previously (Rajapakse et al. 2004).
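The plate-wise scaling described above can be illustrated with a minimal sketch. It assumes that the mean of the ethanol controls defines 0 and the mean of the positive controls defines 1 on each plate; the published procedure (Rajapakse et al. 2004) may differ in detail, and the function name and OD readings below are hypothetical.

```python
from statistics import mean


def normalize_plate(od_values, negative_controls, positive_controls):
    """Scale raw OD readings of one plate to the 0-1 range.

    Assumption: 0 corresponds to the mean of the solvent (ethanol) controls
    and 1 to the mean of the positive (E2) controls of the same plate.
    """
    neg = mean(negative_controls)
    pos = mean(positive_controls)
    if pos == neg:
        raise ValueError("positive and negative controls must differ")
    return [(od - neg) / (pos - neg) for od in od_values]


# Hypothetical OD readings, for illustration only.
plate = normalize_plate(
    [0.30, 0.45, 0.90],
    negative_controls=[0.28, 0.32],
    positive_controls=[0.88, 0.92],
)
print([round(v, 3) for v in plate])
```

Because every plate carries its own controls, readings from different plates and experiments end up on a common effect scale before pooling.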
With the exception of hexestrol, dienestrol, 4-MBC, and OMC, which were tested twice on two plates each, all compounds were tested in at least four independent experiments run on up to three plates, with each plate containing eight increasing concentrations of the test chemical in duplicate.
Statistical analysis and regression modeling. Statistical dose-response regression analyses were carried out by applying a best-fit approach (Scholze et al. 2001). Various nonlinear regression models (logit, probit, Weibull, generalized logits I and II), which all describe monotonic sigmoidal dose-response relationships, were fitted independently to the same data set, and the best-fitting model was selected on the basis of a statistical goodness-of-fit criterion, the information criterion of Schwarz (Schwarz 1978). High-dose ranges for which the effect data showed a down-turn trend (U-shape) were excluded from data analysis. Results are shown in Table 1. Data analysis was always performed on pooled data from all the repeat studies. To account for the intra- and interstudy variability associated with this nested data scenario, the generalized nonlinear mixed modeling approach was used, in which both fixed and random effects are permitted to have a nonlinear relationship with the effect end point (Vonesh and Chinchilli 1996). Two potential sources of random effects were identified for the normalized end point. First, dose-response data from different studies varied in their curve steepness, which was dealt with by including an additional random effect on the steepness model parameter. Second, slight shifts of the whole curves along the log10-transformed concentration scale were observed, which was accounted for by including an additional shift parameter as a random effect in the nonlinear regression model. The random effects were assumed to follow a Gaussian distribution with an expectation of zero and thus were not included in Table 1.
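The model-selection step can be sketched as follows. This is only an illustration of choosing among candidate fits by the Schwarz criterion; it assumes ordinary least-squares fits with Gaussian errors, does not reproduce the mixed-model fitting used here, and the function names and residuals are hypothetical.

```python
import math


def schwarz_criterion(residuals, n_params):
    """Schwarz (Bayesian) information criterion for a least-squares fit.

    Assumes Gaussian errors, so the criterion reduces to
    n * ln(RSS / n) + k * ln(n), up to an additive constant.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


def select_best_model(fits):
    """Return the model name with the lowest criterion, plus all scores.

    `fits` maps a model name to (residuals, number of fitted parameters).
    """
    scores = {name: schwarz_criterion(res, k) for name, (res, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical residuals: both models fit equally well, but the generalized
# logit spends one extra parameter and is therefore penalized.
fits = {
    "logit": ([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1], 2),
    "generalized logit": ([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1], 3),
}
best, scores = select_best_model(fits)
print(best)
```

The penalty term k * ln(n) is what prevents the more flexible generalized models from being selected unless they reduce the residual error appreciably.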
The effect concentrations shown in Table 2 were selected for three low-response levels (10, 5, and 1% normalized cell proliferation) and were calculated from the functional inverse of the best-fitting model. Statistical uncertainties for the estimated effect doses were expressed as 95% confidence belts and approximately determined by applying the bootstrap method (Efron and Tibshirani 1993).
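To illustrate what "functional inverse" means here, the sketch below inverts a two-parameter logit model on the log10 concentration scale. The parameterization and the parameter values are assumptions chosen for illustration, not the fitted values of Table 1, and the bootstrap step for the confidence belts is not shown.

```python
import math


def logit_effect(conc, theta1, theta2):
    """Normalized effect predicted by a logit model on log10(concentration)."""
    return 1.0 / (1.0 + math.exp(-(theta1 + theta2 * math.log10(conc))))


def logit_effect_concentration(effect, theta1, theta2):
    """Concentration producing `effect` (0 < effect < 1): the model's inverse."""
    log10_conc = (math.log(effect / (1.0 - effect)) - theta1) / theta2
    return 10.0 ** log10_conc


# Hypothetical parameters: location 0 and slope 2, i.e. an EC50 of 1 (arbitrary units).
theta1, theta2 = 0.0, 2.0
ec10 = logit_effect_concentration(0.10, theta1, theta2)
ec05 = logit_effect_concentration(0.05, theta1, theta2)
ec01 = logit_effect_concentration(0.01, theta1, theta2)
print(ec01, ec05, ec10)
```

Feeding an estimated effect concentration back into the forward model should return the target effect level, which is a convenient check on any inverse of this kind.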
NOEC and lowest observed effect concentration (LOEC) values were derived by testing for a trend in concentration effects against the control, using nonparametric multiple contrast tests (Neuhaeuser et al. 2000). This method is considered a very powerful and robust test [see Neuhaeuser et al. (2000) for more details].

Results
During our studies with the E-Screen assay, we encountered three prototypical dose-response relationships, characterized by specific features of shape, gradient, and position (Figure 1). As an example of a typical steroidal estrogen, estriol exhibited the full range of effects, producing the maximal proliferative response observed with E2, which was routinely used as a positive control. With a median effect concentration of 0.1 nM, its potency fell in the range of other steroidal estrogens. The phytoestrogen coumesterol was about 100 times less potent than estriol and provoked only 90% of the maximal effect. At the highest tested concentrations, there was a noticeable down-turn in responses, giving rise to an inverted U-shape. This phenomenon became much more pronounced with the pesticide β-endosulfan, which produced a maximal effect of only 70%, followed by a decline of the response with rising concentrations of the pesticide. Because of the nature of the E-Screen assay, it is not possible to delineate whether this reduction in response is the result of cell toxicity or cell proliferation arrest. As is typical for the E-Screen, data variation increased with effect magnitude and was lowest around negative control responses. Toward the lower range of responses, the curves for estriol and coumesterol were slightly shallower than the curve for β-endosulfan. The best-fitting regression models used for these three agents are shown in Table 1 together with those employed for all other tested chemicals.
Application of hypothesis-testing procedures (nonparametric multiple contrast test) allowed us to estimate LOECs (Table 2), and these were 4.0 × 10⁻⁴ nM, 0.55 nM, and 410 nM for estriol, coumesterol, and β-endosulfan, respectively. Consequently, the next lower tested concentrations could be designated as NOEC values (depicted as blue circles in Figure 1), and these were 3.6 × 10⁻⁴ nM for estriol, 0.24 nM for coumesterol, and 150 nM for β-endosulfan. Regression analysis yielded low-dose estimates that differed slightly from the NOECs (Table 2). For β-endosulfan, the concentration estimated to produce a 1% effect (EC01) was lower than the NOEC (140 nM vs. 150 nM). The EC01 values for coumesterol (0.47 nM) and estriol (9.7 × 10⁻⁴ nM) were higher than the NOECs for these chemicals.
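The rule that the NOEC is simply the next lower tested concentration below the LOEC can be written out as a short sketch. The dilution series below are hypothetical; only the coumesterol LOEC (0.55 nM) and NOEC (0.24 nM) are taken from the text.

```python
def noec_from_loec(tested_concentrations, loec):
    """Return the highest tested concentration below the LOEC, or None.

    By definition the NOEC is the next lower tested concentration, so its
    value depends on which concentrations happened to be tested.
    """
    lower = [c for c in tested_concentrations if c < loec]
    return max(lower) if lower else None


# Hypothetical narrowly spaced series (nM) around the coumesterol LOEC.
narrow = noec_from_loec([0.05, 0.1, 0.24, 0.55, 1.2], loec=0.55)

# Hypothetical widely spaced series with the same LOEC.
wide = noec_from_loec([0.01, 0.55, 5.0], loec=0.55)

print(narrow, wide)
```

With identical underlying data, the widely spaced design returns a much lower NOEC, which shows that the value reflects the test design as much as the chemical.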
By far the most extensive low-dose studies were carried out with E2 and the chlorinated hydrocarbon β-HCH, a waste product of lindane production. For reasons that remain to be clarified fully, we encountered considerable response variations with E2. Curiously, this was restricted to doses corresponding to low effects but did not extend to the median-effect range. Comparable response variations also did not occur with the other tested steroidal estrogens (estrone, estriol, dienestrol, hexestrol). Figure 2 compares the outcome of dose-response studies carried out in 2004 (gray circles) with those obtained from more recent experiments where different E2 stock solutions were used. Although the variations in the median-effect range were relatively low, even among studies, the proliferative response induced by the hormone varied strongly at effect levels below 0.3. In this low-effect range, two repeat studies carried out with a dilution series prepared from the same stock solution (black and light blue circles in Figure 2) yielded higher responses than those in an experiment conducted with a different E2 stock solution (dark blue circles), which in turn agreed very well with the historical data set (gray circles). Because of the low gradient of the dose-response curves, the EC05 (concentration estimated to produce a 5% effect) estimates for E2 that can be derived from these studies cover the range between 3 × 10⁻⁶ nM and 2.2 × 10⁻⁴ nM. Regression analysis of the pool of all data gave an EC05 of 8 × 10⁻⁵ nM and an EC01 of 4 × 10⁻⁶ nM.

Environmental Health Perspectives • VOLUME 115 | SUPPLEMENT 1 | December 2007

… Figure 2, namely, data from the EC01 confirmation study is not included.
Because of the shallow gradient of the dose-response curve in this low-effect range, the 95% confidence intervals (CIs) for these effect concentrations were very large (Table 2). The variability associated with different stock solutions of E2 was only observed for low concentrations of the hormone and was not observed for any of the other tested compounds. This rules out the possibility of experimental errors during the preparation of stock solutions and subsequent dilutions.
In contrast to our studies with E2, the experiments carried out with β-HCH proved to be very reproducible. Regression analysis of an initial low-dose study produced an EC01 estimate of 40.2 nM (black open circles in Figure 2). In a second experiment, we decided to assess the validity of this low-dose estimate by using a hypothesis-testing approach where the 40 nM concentration was retested with a large number of replicates and compared with control readings without further dose-response analysis. This study confirmed the original EC01 estimate, with a statistically significant proliferative effect of 1.22% and a 95% CI of 0.06-1.8% (n = 16, p = 0.007, t-test; data not shown in Figure 2). In an attempt to probe the predictive value of this estimate by regression analysis, multiple concentrations of β-HCH between 1 and 100 nM were retested with a high and equal number of replicates and controls (light blue open circles in Figure 2). The outcome of this third study was in good agreement with that of the initial experiment. Regression analysis of the pooled data set from all three experiments gave a revised EC01 estimate of 88 nM; the corresponding NOEC was 52 nM.
The remaining xenoestrogens gave results that were generally very reproducible. Their low-dose estimates, including EC10 (concentration estimated to produce a 10% effect), EC05, EC01, LOEC, and NOEC, are listed in Table 2. For most compounds, the LOECs (estimates derived from hypothesis-testing procedures) were equivalent to effects of between 1 and 5% and in five cases even below 1% (hexestrol, estrone, and p,…). NOECs often equated to responses of around 1%; in four cases (estriol, estrone, propyl paraben, p,p´-DDT) they were even significantly below the EC01 (i.e., outside the corresponding 95% confidence belt).
Judged by their high EC10/EC01 ratios, some chemicals exhibited extremely shallow dose-response curves in the low-effect range. E2 represents the most extreme case, with a ratio of 72.5. Many of the steroidal estrogens also produced rather shallow curves, a characteristic not observed with many of the synthetic xenoestrogens. Small gradients may increase the uncertainty associated with low-dose estimates, as reflected by the larger CIs for the respective effect concentrations (Table 2).
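How the EC10/EC01 ratio relates to curve steepness can be made explicit with a small sketch. It assumes a two-parameter logit model on the log10 concentration scale, which is not necessarily the best-fitting model for any given chemical in Table 1; under that assumption the ratio depends only on the slope parameter. The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def ec_ratio_from_slope(slope, upper=0.10, lower=0.01):
    """EC10/EC01 ratio implied by a logit model with the given slope."""
    return 10.0 ** ((logit(upper) - logit(lower)) / slope)


def slope_from_ec_ratio(ratio, upper=0.10, lower=0.01):
    """Logit slope (per log10 unit) that would produce the given ratio."""
    return (logit(upper) - logit(lower)) / math.log10(ratio)


# The ratio of 72.5 reported for E2, read as a logit slope (assumption).
e2_slope = slope_from_ec_ratio(72.5)
print(round(e2_slope, 2))
```

A smaller slope gives a larger ratio, so a high EC10/EC01 ratio is simply another way of stating that the curve is shallow between the 1% and 10% effect levels.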
The often surprisingly small numeric values of the low-dose estimates for the tested agents prompted us to relate these readings (NOEC, EC01, and EC05) to the range of levels found in human tissues. Where available for the tested chemicals, concentrations in serum and in adipose tissue measured in European countries were chosen as comparators (Figure 3). The reference studies used in the preparation of Figure 3 are listed in Table 3. It includes only those that reported the highest and lowest concentrations of the contaminants in either tissue. A large number of other publications from several European countries, including Belgium, Denmark, Germany, and Holland, were also analyzed. They all reported levels between the extremes presented in Figure 3 (… Koppen et al. 2002; Link et al. 2005; Raaschou-Nielsen et al. 2005).

Figure 1 (caption, beginning missing): … Table 1) are shown as blue lines with the corresponding 95% confidence belt for the mean effect as dotted blue lines. Blue circles refer to the NOECs, derived by a nonparametric contrast test.

Figure 2 (caption, beginning missing): Colors represent data from different independent studies: same stock solution (black and light blue) or different stock solution (dark blue), or data from 2-year-old studies (gray circles). The best-fitting regression models (see Table 1) are shown as lines with the corresponding 95% confidence belt for the mean effect as dotted lines, with colors of lines corresponding to colors of data. For E2, the range of EC05 values (EC05 = 3 × 10⁻⁶ to 2.2 × 10⁻⁴) is pictured, as obtained from data from different studies.
For those organochlorine pesticides without sufficient available data (α-endosulfan, endosulfan o,p´-DDD), conversion from adipose tissue levels to serum levels and vice versa was carried out as described by Lopez-Cervantes et al. (2004).
For many of the chemicals tested in the E-Screen, low-dose estimates were removed by a factor of between 5 and 100 from the highest measured levels in human serum. However, there were notable exceptions: All low-dose measures for bisphenol A and o,p´-DDT fell near the median of serum levels measured in Europe, and toward the high end of serum levels, there were overlaps with the estimates derived for dieldrin and p,p´-DDT (Figure 3B). With adipose tissue levels, the low-dose estimates for all chemicals except dieldrin and aldrin covered the range of measured values.

Discussion
The focus of most of the E-Screen studies carried out with xenoestrogens in the past was on defining potencies in relation to endogenous hormones, and presumably for this reason, information about low-dose effects is scarce. Systematic attempts to titrate doses down into the range at the "threshold" between effect and no-effect are missing. The experiments presented here were intended to fill this gap and enable us to draw the following conclusions: Apart from a down-turn of responses near the high end of tested concentrations, inverted U-shapes in the range of low-effect doses were generally not observed under our experimental conditions, and this may be specific for the end point investigated in the E-Screen. Instead, detailed low-dose-response analyses revealed that many of the tested agents exhibited quite shallow curves in the low-effect range, and this resulted in low-dose estimates with often surprisingly small numerical values. In terms of small gradients, high potency, and correspondingly low effect dose estimates, the steroidal estrogens stood out. It is remarkable that this feature was less pronounced with all the synthetic xenoestrogens, where low responses returned to control levels far more rapidly as the doses decreased. There are two possible reasons for the slow leveling of effects observed with E2. One hypothesis is that it is due to the secretion of messenger substances via autocrine or paracrine loops, which serve to induce small proliferative responses at very low concentrations. Hamelers et al. (2003) have shown that E2-responsive MCF-7 cell lines release a factor capable of activating the insulin-like growth factor (IGF) receptor when treated with E2, and that this factor synergizes with small concentrations of E2. However, little information is available about the concentration range of E2 effective in triggering the release of such factors.
The transcription of other E2-inducible autocrine factors such as transforming growth factor (TGF)-α and stromal cell-derived factor-1 (SDF-1) is suppressed, not stimulated, by low concentrations of E2 (Coser et al. 2003). An alternative explanation for the shallow concentration-response curve of E2 may be sought by invoking an inhibitory effect of the hormone on apoptosis rather than a proliferative effect. A study by Hur and colleagues (2004) has shown that low concentrations of E2 block the transcription of Bik, a proapoptotic protein, which is expressed in MCF-7 BOS cells in the absence of estrogens.
It is striking that the dose-response curves observed with xenoestrogens were noticeably less shallow in the low-dose range. We speculate that xenoestrogens may lack the ability to induce signaling loops or antiapoptotic effects similar to E2, but experimental evidence to support this suggestion is lacking at present.
Dose-response curves with small gradients give rise to complications during the estimation of low-effect doses. As illustrated by our experiments with E2, high statistical power is necessary to arrive at valid estimates, and in this sense, the E-Screen serves as an illustrative example for the resources that are needed for the demonstration of effects with small magnitudes. Through using large sample sizes and numerous repeats, the experimental power of the E-Screen was sufficiently high to measure effect magnitudes of around 1-2% with reliability. However, such resources are usually not available for in vivo testing, with the consequence that the statistical detection limit is often considerably higher. If this coincides with shallow dose-response curves in the low-effect range (which is normally not measurable in vivo), the limited resolving power of in vivo assays may seriously constrain low-dose testing. This aspect of the EDC low-dose issue has not been appreciated sufficiently in the past.
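The link between effect size and required resources can be illustrated with a textbook sample-size approximation for comparing a treated group with controls. This is a generic normal-approximation sketch, not the power analysis used in this study; the standard deviation, significance level, and power are assumed values chosen for illustration.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate replicates per group needed to detect a mean difference.

    Uses the normal approximation for a two-sided, two-sample comparison:
    n = 2 * ((z_(1 - alpha/2) + z_power) * sd / effect) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) * sd / effect) ** 2)


# Assumed between-replicate standard deviation of 0.02 on the normalized scale.
n_small_effect = replicates_per_group(effect=0.01, sd=0.02)
n_large_effect = replicates_per_group(effect=0.10, sd=0.02)
print(n_small_effect, n_large_effect)
```

Because the required number of replicates grows with the inverse square of the effect size, resolving a 1% effect demands roughly 100 times the replication needed for a 10% effect at the same variability, which is feasible in a microtiter-plate assay but rarely in animal studies.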
In the examples presented here, the low-dose estimates derived from hypothesis testing agreed reasonably well with those obtained by regression modeling. To a large extent, this was because of our narrow spacing of tested concentrations in the low-dose range. Because NOECs are defined in relation to LOECs (they are the next lower tested concentrations), their numeric value depends heavily on the choice of concentrations selected for testing. With shallow gradients of the underlying response curves, tight spacing will tend to yield higher NOECs, and under such conditions they are likely to be similar to regression-based estimates such as EC01 or EC05, as in our case. Depending on the experimental power and the chemical tested, NOECs were close to the estimated EC01 and thus associated with proliferative effects of around 1% (Table 2). The resolving power of the E-Screen was not sufficient to say with certainty whether these concentrations provoked proliferative effects, nor could such effects be ruled out with certainty (as indicated by the model estimation). This is in contrast with the widespread perception of NOECs (and NOELs) as values that signal complete absence of effects. When effect variation is high, and experimental power comparatively low, NOECs can be associated with effect magnitudes as high as 10-20% (Moore and Caux 1997), and this has led to sharp criticism of thoughtless use of the terms NOEL and NOEC ["one of the most misunderstood concepts in ecotoxicology" (Moore and Caux 1997)].
The realization that even the statistical power afforded by a high-throughput assay such as the E-Screen is insufficient to resolve effect magnitudes smaller than 1% raises the issue whether such small effects, although statistically relevant, also have biological meaning. Thus, if it is difficult to derive a zero effect level for xenoestrogens in the E-Screen, would not a solution to this dilemma present itself by defining a proliferative effect of biological significance that should be avoided to protect the exposed organism? Concentrations associated with such "critical" effect sizes could then be used to derive better defined quality standards.
However, our knowledge about the role of estrogens, both steroidal and man-made, in the normal development of the breast as well as in the induction of neoplasia is too fragmentary to provide conclusive answers to this question. As yet, there is no consensus about the way in which steroidal estrogens promote cell division in the mammary gland [see discussions by Cheng et al. (2004); Clarke (2003); Smalley and Ashworth (2003)]. According to one widely held view, estrogens provide stimuli for the clonal expansion of precancerous cell populations (Smalley and Ashworth 2003). If this is true, then even small proliferative effects, over decades, may contribute to the clonal expansion of precancerous cells. Viewed from such a long-term perspective, any attempts to establish a critical effect size below which risks are negligible may be problematic. MCF-7 cells are used widely as a model to represent estrogen-responsive breast cancer cells (Spink et al. 2006), but it is unclear whether their sensitivity to estrogens is representative of the situation in vivo. Bearing this proviso in mind, it may nevertheless be of interest to compare E-Screen low-dose estimates with the tissue levels determined in European citizens. In making such comparisons, it is important to reflect on the dose metric used as a basis. It is difficult to define the target doses of these chemicals received locally by cells in the human mammary gland, but regardless of this complication, blood serum levels are often regarded as reasonable measures of such internal exposure. Pointing to the high levels of some xenoestrogens in adipose tissue and the close proximity of epithelial cells that line the milk ducts of the female breast with the surrounding adipocytes, Shekhar and colleagues (1997) have argued that epithelial cells may be exposed to higher levels of xenoestrogens than suggested by serum levels. 
Although this may be the case, it also appears plausible that xenoestrogens are more easily available to mammary cells from blood serum, which would mean that serum levels are a better measure of the "dose at target." Because a lack of evidence currently precludes a firm decision on these matters, we decided to take a pragmatic course and compare E-Screen low-effect dose estimates with both serum and adipose tissue levels.
Considering the 5- to 100-fold margin between our E-Screen low-dose estimates and the high end of the range of measured serum levels, it appears unlikely that the majority of the tested chemicals individually are able to induce biologically significant degrees of cell proliferation at these exposure levels. This conclusion needs to be tempered in view of the likelihood of possible combination effects of these chemicals (Rajapakse et al. 2004), but this awaits experimental confirmation. To our surprise, this reasoning could not be extended to bisphenol A and o,p´-DDT. In both these cases, our low-dose estimates were placed in the mid-range of measured serum levels. For o,p´-DDT, the span of measured serum levels became extended toward the high end because of the high measured values in parts of the Canary Islands and Portugal. The high levels of o,p´-DDT and corresponding metabolites in these countries were attributed to high consumption of contaminated foods from Asia and Latin America, where DDT is still in use (Cruz et al. 2003; Zumbado et al. 2005). When adipose tissue levels were chosen as the basis for comparisons, all low-dose estimates with the exception of dieldrin and aldrin fell within the span of measured values in Europe. With α-endosulfan, β-endosulfan, and methoxychlor, the overlap was toward the high end of adipose tissue concentrations. Notable are β-HCH, p,p´-DDT, and p,p´-DDE, where the mid-range of levels was shown to elicit low effects in the E-Screen. The outcomes of the comparisons with human tissue levels are not biased because of inconsistent application of low-dose estimation procedures.
By conducting extensive dose-response analyses with high experimental power, we were able to show that estimates of low-effect doses for xenoestrogens overlapped with some tissue levels found in humans. Investigations of the toxicologic relevance of these observations require more urgency than perhaps thought previously. The usefulness of human biomonitoring, animal experiments, and in vitro assays could be enhanced by efforts to explore the relationships between target doses in vivo and effective concentrations in vitro.