Predictive features of ligand‐specific signaling through the estrogen receptor

Abstract Some estrogen receptor‐α (ERα)‐targeted breast cancer therapies such as tamoxifen have tissue‐selective or cell‐specific activities, while others have similar activities in different cell types. To identify biophysical determinants of cell‐specific signaling and breast cancer cell proliferation, we synthesized 241 ERα ligands based on 19 chemical scaffolds, and compared ligand response using quantitative bioassays for canonical ERα activities and X‐ray crystallography. Ligands that regulate the dynamics and stability of the coactivator‐binding site in the C‐terminal ligand‐binding domain, called activation function‐2 (AF‐2), showed similar activity profiles in different cell types. Such ligands induced breast cancer cell proliferation in a manner that was predicted by the canonical recruitment of the coactivators NCOA1/2/3 and induction of the GREB1 proliferative gene. For some ligand series, a single inter‐atomic distance in the ligand‐binding domain predicted their proliferative effects. In contrast, the N‐terminal coactivator‐binding site, activation function‐1 (AF‐1), determined cell‐specific signaling induced by ligands that used alternate mechanisms to control cell proliferation. Thus, incorporating systems structural analyses with quantitative chemical biology reveals how ligands can achieve distinct allosteric signaling outcomes through ERα.


Introduction
Many drugs are small-molecule ligands of allosteric signaling proteins, including G protein-coupled receptors (GPCRs) and nuclear receptors such as ERα. These receptors regulate distinct phenotypic outcomes (i.e., observable characteristics of cells and tissues, such as cell proliferation or the inflammatory response) in a ligand-dependent manner. Small-molecule ligands control receptor activity by modulating the recruitment of effector enzymes to regions of the receptor that are distal to the ligand-binding site. Some of these ligands achieve selectivity for a subset of tissue- or pathway-specific signaling outcomes, which is called selective modulation, functional selectivity, or biased signaling, through structural mechanisms that are poorly understood (Frolik et al, 1996; Nettles & Greene, 2005; Overington et al, 2006; Katritch et al, 2012; Wisler et al, 2014). For example, selective estrogen receptor modulators (SERMs) such as tamoxifen (Nolvadex®; AstraZeneca) or raloxifene (Evista®; Eli Lilly) (Fig 1A) block the ERα-mediated proliferative effects of the native estrogen, 17β-estradiol (E2), on breast cancer cells, but promote beneficial estrogenic effects on bone mineral density and adverse estrogenic effects such as uterine proliferation, fatty liver, or stroke (Frolik et al, 1996; Fisher et al, 1998; McDonnell et al, 2002; Jordan, 2003).
ERα contains the structurally conserved globular domains of the nuclear receptor superfamily, including a DNA-binding domain (DBD) that is connected by a flexible hinge region to the ligand-binding domain (LBD), as well as unstructured AB and F domains at its amino and carboxyl termini, respectively (Fig 1B). The LBD contains a ligand-dependent coactivator-binding site called activation function-2 (AF-2). However, the agonist activity of SERMs derives from activation function-1 (AF-1), a coactivator recruitment site located in the AB domain (Berry et al, 1990; Shang & Brown, 2002; Abot et al, 2013).
In the canonical model of the ERα signaling pathway (Fig 1C), E2-bound ERα forms a homodimer that binds DNA at estrogen-response elements (EREs), recruits NCOA1/2/3 (Metivier et al, 2003; Johnson & O'Malley, 2012), and activates the GREB1 gene, which is required for proliferation of ERα-positive breast cancer cells (Ghosh et al, 2000; Rae et al, 2005; Deschenes et al, 2007; Liu et al, 2012; Srinivasan et al, 2013). However, ERα-mediated proliferative responses vary in a ligand-dependent manner; thus, it is not known whether this canonical model applies across diverse ERα ligands.
Molecular Systems Biology
Features of estrogen receptor signaling
Jerome C Nwachukwu et al

Our long-term goal is to predict the proliferative or antiproliferative activity of a ligand in different tissues from its crystal structure, by identifying the structural perturbations that lead to specific signaling outcomes. The simplest response model for ligand-specific proliferative effects is a linear causality model, in which the degree of NCOA1/2/3 recruitment determines GREB1 expression, which in turn drives ligand-specific cell proliferation (Fig 1D). Alternatively, a more complicated branched causality model could explain ligand-specific proliferative responses (Fig 1E). In this signaling model, multiple coregulator binding events and target genes (Won Jeong et al, 2012; Nwachukwu et al, 2014), LBD conformation, nucleocytoplasmic shuttling, the occupancy and dynamics of DNA binding, and other biophysical features could contribute independently to cell proliferation (Lickwar et al, 2012).
To test these signaling models, we profiled a diverse library of ERα ligands using systems biology approaches to X-ray crystallography and chemical biology, including a series of quantitative bioassays for ERα function that were statistically robust and reproducible, based on the Z'-statistic (Fig EV1A and B; see Materials and Methods). We also determined the structures of 76 distinct ERα LBD complexes bound to different ligand types, which allowed us to understand how diverse ligand scaffolds distort the active conformation of the ERα LBD. Our findings indicate that specific structural perturbations can be tied to ligand-selective domain usage and signaling patterns, providing a framework for the structure-based design of improved breast cancer therapeutics and for understanding the different phenotypic effects of environmental estrogens.
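The Z'-statistic is conventionally computed from positive- and negative-control replicates as Z' = 1 - 3(σp + σn)/|μp - μn|, where μ and σ are the means and standard deviations of the positive (p) and negative (n) controls. A minimal sketch follows; treating E2 as the positive control and vehicle as the negative control is our assumption for illustration, and the control readings are hypothetical rather than data from this study.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z'-factor of an assay from positive- and negative-control replicates.

    Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|
    Z' >= 0.5 is conventionally read as a robust separation of the controls.
    """
    if len(positive) < 2 or len(negative) < 2:
        raise ValueError("need at least two replicates per control")
    window = abs(mean(positive) - mean(negative))
    if window == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / window


# Hypothetical control readings (arbitrary units), not data from this study.
e2_wells = [100.0, 104.0, 96.0, 100.0]   # assumed positive control: E2
vehicle_wells = [10.0, 12.0, 8.0, 10.0]  # assumed negative control: vehicle
print(round(z_prime(e2_wells, vehicle_wells), 3))  # prints 0.837
```

The statistic rewards both a wide signal window and tight replicates, which is why it is a convenient single number for deciding whether an assay can resolve differences between ligands.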

Results
Strength of AF-1 signaling does not determine cell-specific signaling
To compare ERα signaling induced by diverse ligand types, we synthesized and assayed a library of 241 ERα ligands containing 19 distinct molecular scaffolds. These include 15 indirect modulator series, which lack a SERM-like side chain and modulate coactivator binding indirectly from the ligand-binding pocket (Fig 2A-E; Dataset EV1) (Muthyala et al, 2003; Seo et al, 2006; Min et al, 2013; Liao et al, 2014). We also generated four direct modulator series with side chains designed to directly dislocate helix 12 (h12) and thereby completely occlude the AF-2 surface (Fig 2C and E; Dataset EV1) (Kieser et al, 2010). Profiling with our quantitative bioassays revealed a wide range of ligand-induced GREB1 expression, reporter gene activities, ERα-coactivator interactions, and proliferative effects on MCF-7 breast cancer cells. This wide variance enabled us to probe specific features of ERα signaling using ligand class analyses, and to identify signaling patterns shared by specific ligand series or scaffolds.
We first asked whether direct modulation of the receptor with an extended side chain is required for cell-specific signaling. To this end, we compared the average ligand-induced GREB1 mRNA levels in MCF-7 cells with 3×ERE-Luc reporter gene activity in Ishikawa endometrial cancer cells (E-Luc) or in HepG2 liver cells transfected with wild-type ERα (L-Luc ERα-WT) (Figs 3A and EV2A-C). Except for OBHS-ASC analogs, which had similarly low agonist activity in all three cell types, direct modulators showed significant differences in average activity between cell types. The other direct modulators had low agonist activity in Ishikawa cells, no or inverse agonist activity in MCF-7 cells, and more variable activity in HepG2 cells. While direct modulators such as tamoxifen were known to drive cell-specific signaling, these experiments reveal that indirect modulators also do so, since eight of fourteen classes showed significant differences in average activity.
Tamoxifen depends on AF-1 for its cell-specific activity (Sakamoto et al, 2002); therefore, we asked whether the cell-specific signaling observed here reflects a similar dependence on AF-1 (Fig EV1). To test this idea, we compared the average L-Luc activities of each scaffold in HepG2 cells co-transfected with wild-type ERα or with ERα lacking the AB domain (ERα-ΔAB) (Figs 1B and EV1). While E2 showed similar L-Luc ERα-WT and ERα-ΔAB activities, tamoxifen showed complete loss of activity without the AB domain (Fig EV1B). Deletion of the AB domain significantly reduced the average L-Luc activities of 14 scaffolds (Student's t-test, P ≤ 0.05) (Fig 3B). These "AF-1-sensitive" activities were exhibited by both direct and indirect modulators, and were not limited to scaffolds that showed cell-specific signaling (Fig 3A and B). Thus, the strength of AF-1 signaling does not determine cell-specific signaling.

Identifying cell-specific signaling clusters in ERα ligand classes
As another approach to identifying cell-specific signaling, we determined the degree of correlation between ligand-induced activities in the different cell types. Here, we compared ligands within each class (Fig 3C), instead of comparing average activities (Fig 3A and B). For each ligand class or scaffold, we calculated Pearson's correlation coefficient, r, for pairwise comparisons of activity profiles in breast (GREB1), liver (L-Luc), and endometrial cells (E-Luc). The value of r ranges from −1 to 1 and measures how closely the data fit a straight line; it approaches 1 when compounds show similar relative agonist/antagonist activity profiles in two cell types (Fig EV3A). We also calculated the coefficient of determination, r², which describes the fraction of variance in a dependent variable, such as proliferation, that can be predicted by an independent variable, such as GREB1 expression. We present both calculations as r² so that signaling specificities can be compared readily on a heat map, on which the red-yellow palette indicates significant positive correlations (P ≤ 0.05, F-test for nonzero slope), while the blue palette denotes negative correlations (Fig 3C-F).
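To make the heat-map statistics concrete, the sketch below computes r, r², and the P-value of the F-test for a nonzero slope in simple linear regression, where F = r²(n - 2)/(1 - r²) on 1 and n - 2 degrees of freedom. It is a dependency-free illustration with our own function names; the incomplete beta routine follows the standard continued-fraction algorithm. In practice, scipy.stats.linregress returns r and a two-sided P-value for the slope that is equivalent to this F-test. The example profiles are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def slope_f_test(x, y):
    """r, r^2, F and P for the F-test of a nonzero slope in y = a + b*x.

    F = r^2 (n - 2) / (1 - r^2) on (1, d) degrees of freedom, d = n - 2,
    and the upper-tail probability is P(F > f) = I_{d/(d+f)}(d/2, 1/2).
    """
    r = pearson_r(x, y)
    r2 = r * r
    d = len(x) - 2
    if r2 >= 1.0:
        return r, r2, math.inf, 0.0
    f = r2 * d / (1.0 - r2)
    return r, r2, f, reg_inc_beta(d / 2.0, 0.5, d / (d + f))


# Hypothetical activity profiles of one ligand class in two cell types.
greb1 = [1.0, 2.0, 3.0, 4.0]
e_luc = [1.0, 3.0, 2.0, 4.0]
r, r2, f, p = slope_f_test(greb1, e_luc)
print(f"r = {r:.2f}, r^2 = {r2:.2f}, F = {f:.2f}, P = {p:.3f}")
```

For these four hypothetical ligands, r = 0.8 (r² = 0.64) but P = 0.2, so the pair would not pass the P ≤ 0.05 threshold; small ligand classes need strong correlations to register as significant.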
This analysis revealed diverse signaling specificities that we grouped into three clusters. Scaffolds in cluster 1 exhibited strongly correlated GREB1 levels, E-Luc, and L-Luc activity profiles across the three cell types (Fig 3C lanes 1-4), suggesting that these ligands use similar ERα signaling pathways in breast, endometrial, and liver cells. This cluster includes WAY-C, OBHS, OBHS-N, and triaryl-ethylene analogs, all of which are indirect modulators. Cluster 2 contains scaffolds whose activities were positively correlated in only two of the three cell types, indicating cell-specific signaling (Fig 3C lanes 5-12). This cluster includes two classes of direct modulators (cyclofenil-ASC and WAY dimer) and six classes of indirect modulators (2,5-DTP, 3,4-DTP, S-OBHS-2, S-OBHS-3, furan, and WAY-D). In this cluster, the correlated activities varied by scaffold. For example, 3,4-DTP, furan, and S-OBHS-2 drove positively correlated GREB1 levels and E-Luc activity, but not L-Luc ERα-WT activity (Fig 3C lanes 5-7). In contrast, WAY dimer and WAY-D analogs drove positively correlated GREB1 levels and L-Luc ERα-WT activity, but not E-Luc activity (Fig 3C lanes 8 and 9). The last set of scaffolds, cluster 3, displayed cell-specific activities that were not correlated between any two of the three cell types (Fig 3C lanes 13-19). This cluster includes two direct modulator scaffolds (OBHS-ASC and OBHS-BSC) and five indirect modulator scaffolds (A-CD, cyclofenil, 3,4-DTPD, imine, and imidazopyridine). These results suggest that adding an extended side chain to an ERα ligand scaffold is sufficient to induce cell-specific signaling, in which the relative activity profiles of the individual ligands change between cell types. This is demonstrated by directly comparing the signaling specificities of matched OBHS (indirect modulator, cluster 1) and OBHS-BSC analogs (direct modulator, cluster 3), which differ only in the basic side chain (Fig 2E). The activities of OBHS analogs were positively correlated across the three cell types, but the side chain of OBHS-BSC analogs was sufficient to abolish these correlations (Figs 3C lanes 1 and 19, and EV3A-C).

Figure caption (partial). A, Structure of the E2-bound ERα LBD in complex with an NCOA2 peptide (PDB 1GWR). B-D, Structural details of the ERα LBD bound to the indicated ligands. Unlike E2 (PDB 1GWR), TAM is a direct modulator with a BSC that dislocates h12 to block the NCOA2-binding site (PDB 3ERT). OBHS is an indirect modulator that dislocates the h11 C-terminus to destabilize the h11-h12 interface (PDB 4ZN9).
Figure caption (partial). The average activities of ligand classes are shown (mean + SEM). C-F, Correlation and regression analyses in a large test set. The r² values are plotted as a heat map. In cluster 1, the first three comparisons (rows) showed significant positive correlations (F-test for nonzero slope, P ≤ 0.05). In cluster 2, only one of these comparisons revealed a significant positive correlation, while none was significant in cluster 3. +, statistically significant correlations gained by deletion of the AB or F domains; −, significant correlations lost upon deletion of the AB or F domains. Source data are available online for this figure.
The indirect modulator scaffolds in clusters 2 and 3 showed cell-specific signaling patterns without the extended side chain typically viewed as the primary chemical and structural mechanism driving cell-specific activity. Many of these scaffolds drove similar average activities of the ligand class in the different cell types (Fig 3A), but the individual ligands in each class had different cell-specific activities (Fig EV2A-C). Thus, examining the correlated patterns of ERα activity within each scaffold demonstrates that an extended side chain is not required for cell-specific signaling.
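The three-cluster grouping amounts to a rule on the three pairwise comparisons among GREB1, E-Luc, and L-Luc activities. The sketch below encodes that rule as we read it from the text and figure legend (cluster 1: all three comparisons significant and positive; cluster 2: exactly one, i.e., correlated in only two cell types; cluster 3: none). The input format is an assumption, and a scaffold with exactly two significant comparisons, a case the text does not describe, is rejected rather than assigned.

```python
from itertools import combinations

# Readouts for the three cell types: breast (GREB1), endometrial (E-Luc), liver (L-Luc).
READOUTS = ("GREB1", "E-Luc", "L-Luc")


def assign_cluster(significant_positive):
    """Assign a scaffold to cluster 1, 2 or 3 from its pairwise correlation tests.

    significant_positive maps frozenset({readout_a, readout_b}) -> True when that
    pairwise comparison gave a significant positive correlation (P <= 0.05).
    Missing pairs count as not significant.
    """
    hits = sum(bool(significant_positive.get(frozenset(pair), False))
               for pair in combinations(READOUTS, 2))
    if hits == 3:
        return 1  # consistent signaling in all three cell types
    if hits == 1:
        return 2  # correlated in only two of the three cell types
    if hits == 0:
        return 3  # no correlation between any two cell types
    # Two significant comparisons out of three is not described in the text.
    raise ValueError("pattern not covered by the cluster definitions")


# Hypothetical example: only GREB1 vs E-Luc is significant, as described for 3,4-DTP.
print(assign_cluster({frozenset({"GREB1", "E-Luc"}): True}))  # prints 2
```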

Modulation of signaling specificity by AF-1
To evaluate the role of AF-1 and the F domain in ERα signaling specificity, we compared the activity of truncated ERα constructs in HepG2 liver cells with endogenous ERα activity in the other cell types. The positive correlations between the L-Luc and E-Luc activities or GREB1 levels induced by scaffolds in cluster 1 were generally retained without the AB domain or the F domain (Fig 3D lanes 1-4). This demonstrates that the signaling specificities underlying these positive correlations are not modified by AF-1. OBHS analogs showed an average L-Luc ERα-ΔAB activity of 3.2% ± 3 (mean ± SEM) relative to E2. Despite this nearly complete lack of activity, the pattern of L-Luc ERα-ΔAB activity was still highly correlated with E-Luc activity and GREB1 expression (Fig EV3D and E), demonstrating that very small AF-2 activities can be amplified by AF-1 to produce robust signals. Similarly, deletion of the F domain did not abolish the correlations between L-Luc and E-Luc activities or GREB1 levels induced by OBHS analogs (Fig EV3F). These similar patterns of ligand activity in the wild-type receptor and the deletion mutants suggest that AF-1 and the F domain purely amplify the AF-2 activities of ligands in cluster 1.
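The conclusion that AF-1 and the F domain "purely amplify" AF-2 activity relies on a property of Pearson's r: multiplying one variable by a positive constant, or adding an offset, leaves r unchanged. A weak ERα-ΔAB signal that is a scaled-down copy of the wild-type pattern therefore correlates with other readouts exactly as well as the wild-type signal does. A minimal sketch with hypothetical numbers, assuming a purely linear amplification:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical activities (% of E2) for five ligands of one class.
e_luc = [20.0, 35.0, 50.0, 62.0, 80.0]
delta_ab = [0.5, 2.0, 3.0, 4.0, 6.5]  # weak AF-2-only signal (no AB domain)
# Assumed purely amplifying AF-1: a 12-fold gain plus a constant offset.
wild_type = [12.0 * v + 4.0 for v in delta_ab]

# Same r for the weak and the amplified signal.
print(math.isclose(pearson_r(delta_ab, e_luc), pearson_r(wild_type, e_luc)))  # True
```

By contrast, an AF-1 contribution that differs from ligand to ligand would generally change r, in line with the altered correlations described for cluster 2.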
In contrast, AF-1 was a determinant of signaling specificity for scaffolds in cluster 2. Deletion of the AB or F domain altered correlations for six of the eight scaffolds in this cluster (2,5-DTP, 3,4-DTP, S-OBHS-3, WAY-D, WAY dimer, and cyclofenil-ASC) (Fig 3D lanes 5-12). Comparing Fig 3C and D, the + and − signs indicate where the deletion mutant assays led to a gain or loss of statistically significant correlation, respectively. Thus, in cluster 2, AF-1 substantially modulated the specificity of ligands with cell-specific activity (Fig 3D lanes 5-12). For ligands in cluster 3, we could not rule out a role for AF-1 in determining signaling specificity, since this cluster lacked positively correlated activity profiles (Fig 3C), and deletion of the AB or F domain rarely induced such correlations (Fig 3D). The exceptions were A-CD and OBHS-ASC analogs, for which deletion of the AB or F domain led to positive correlations with E-Luc activity and/or GREB1 levels (Fig 3D lanes 13 and 18). Thus, ligands in cluster 2 rely on AF-1 for both activity (Fig 3B) and signaling specificity (Fig 3D). As discussed below, this cell specificity derives from alternate coactivator preferences.
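The + and − annotations follow from comparing the significance calls for the wild-type receptor with those for each deletion mutant. A minimal sketch, assuming significance has already been called for each comparison (for example, with the F-test for a nonzero slope); the comparison labels are hypothetical.

```python
def deletion_marks(wild_type, mutant):
    """Return '+' or '-' for comparisons whose significance changes in a deletion mutant.

    wild_type and mutant map a comparison label to True when that correlation
    was significant and positive (P <= 0.05); missing labels count as False.
    '+' marks a significant correlation gained, '-' one lost.
    """
    marks = {}
    for label in sorted(set(wild_type) | set(mutant)):
        before = wild_type.get(label, False)
        after = mutant.get(label, False)
        if after and not before:
            marks[label] = "+"
        elif before and not after:
            marks[label] = "-"
    return marks


# Hypothetical scaffold: the AB deletion loses one correlation and gains another.
wt = {"L-Luc vs E-Luc": True, "L-Luc vs GREB1": False}
dab = {"L-Luc vs E-Luc": False, "L-Luc vs GREB1": True}
print(deletion_marks(wt, dab))  # {'L-Luc vs E-Luc': '-', 'L-Luc vs GREB1': '+'}
```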

Ligand-specific control of GREB1 expression
To determine whether ligand classes control expression of native ERα target genes through the canonical linear signaling pathway, we performed pairwise linear regression analyses using ERα-NCOA1/2/3 interactions in the mammalian two-hybrid (M2H) assay as independent predictors of GREB1 expression (the dependent variable) (Figs EV1 and EV2A, F-H). In cluster 1, recruitment of NCOA1 and NCOA2 was highest for WAY-C, followed by the triaryl-ethylene, OBHS-N, and OBHS series, while for NCOA3, OBHS-N compounds induced the most recruitment and OBHS ligands were inverse agonists (Fig EV2F-H). The average induction of GREB1 by cluster 1 ligands showed greater variance, ranging from ~25 to ~75% for OBHS and from full agonist to inverse agonist for the other cluster 1 series (Fig EV2A). GREB1 levels induced by OBHS analogs were determined by recruitment of NCOA1 but not NCOA2/3 (Fig 3E lane 1), suggesting that different classes may use these coactivators alternately or preferentially. However, in cluster 1, NCOA1/2/3 recruitment generally predicted GREB1 levels (Fig 3E lanes 1-4), consistent with the canonical signaling model (Fig 1D).

Ligand-specific control of cell proliferation
To determine mechanisms for ligand-dependent control of breast cancer cell proliferation, we performed linear regression analyses across the 19 scaffolds using MCF-7 cell proliferation as the dependent variable and the other activities as independent variables (Fig 3F). In cluster 1, E-Luc and L-Luc activities, NCOA1/2/3 recruitment, and GREB1 levels generally predicted the proliferative response (Fig 3F lanes 2-4). With the OBHS-N compounds, NCOA3 and GREB1 showed near perfect prediction of proliferation (Fig EV3G), with unexplained variance similar to the noise in the assays. The lack of significant predictors for OBHS analogs (Fig 3F lane 1) reflects their small range of proliferative effects on MCF-7 cells (Fig EV2I). The significant correlations with GREB1 expression and NCOA1/2/3 recruitment observed in this cluster are consistent with the canonical signaling model (Fig 1D), in which NCOA1/2/3 recruitment determines GREB1 expression, which then drives proliferation. Ligands in clusters 2 and 3 showed a wide range of proliferative effects on MCF-7 cells (Fig EV2I). Despite this phenotypic variance, proliferation was not generally predicted by correlated NCOA1/2/3 recruitment and GREB1 induction (Figs 3F lanes 5-19, and EV3H). Of the 15 ligand series in these clusters, only 2,5-DTP analogs induced a proliferative response that was predicted by GREB1 levels, and these GREB1 levels were not determined by NCOA1/2/3 recruitment (Fig 3E and F lane 10). 3,4-DTP, cyclofenil, 3,4-DTPD, and imidazopyridine analogs had NCOA1/3 recruitment profiles that predicted their proliferative effects without determining GREB1 levels (Fig 3E and F, lanes 5 and 14-16). Similarly, S-OBHS-3, cyclofenil-ASC, and OBHS-ASC had positively correlated NCOA1/2/3 recruitment and GREB1 levels, but none of these activities determined their proliferative effects (Fig 3E and F lanes 11-12 and 18).
For ligands that show cell-specific signaling, ERα-mediated recruitment of other coregulators and activation of other target genes likely determine their proliferative effects on MCF-7 cells.
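One way to make "unexplained variance similar to the noise in the assays" operational is to compare the residual variance of a least-squares fit with the variance expected from replicate scatter alone. The sketch below does this under our own assumptions (equal numbers of replicates per ligand, fitting per-ligand means, and pooling within-ligand variances); it is not the procedure described in Materials and Methods, and the values are hypothetical.

```python
from statistics import mean, variance


def ols_fit(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def unexplained_variance(x, y):
    """Residual variance of the fit: residual sum of squares / (n - 2)."""
    a, b = ols_fit(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)


def noise_of_means(replicates):
    """Expected variance of a per-ligand mean: pooled replicate variance / replicates."""
    pooled = mean(variance(reps) for reps in replicates)
    return pooled / mean(len(reps) for reps in replicates)


# Hypothetical per-ligand GREB1 levels and proliferation replicates (% of E2).
greb1 = [10.0, 30.0, 50.0, 70.0, 90.0]
prolif_reps = [[12.0, 16.0], [33.0, 29.0], [49.0, 53.0], [72.0, 68.0], [88.0, 92.0]]
prolif_mean = [mean(reps) for reps in prolif_reps]
print(unexplained_variance(greb1, prolif_mean), noise_of_means(prolif_reps))
```

With these numbers the residual variance of the fit (about 0.9) is below the variance expected for a two-replicate mean (4.0), so the line leaves no more scatter than measurement noise alone would.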
NCOA3 occupancy at GREB1 did not predict the proliferative response
We also asked whether a quantitative chromatin immunoprecipitation (ChIP) assay of coactivator occupancy at a target promoter is statistically robust and reproducible enough for ligand class analysis, and whether it has better predictive power than the M2H assay. ERα and NCOA3 cycle on and off the GREB1 promoter. Therefore, we first performed a time-course study and found that E2 and the WAY-C analog AAPII-151-4 induced recruitment of NCOA3 to the GREB1 promoter in a temporal cycle that peaked after 45 min in MCF-7 cells (Fig 4A). At this time point, other WAY-C analogs also induced recruitment of NCOA3 to this site to varying degrees (Fig 4B). The Z' for this assay was 0.6, showing statistical robustness (see Materials and Methods). Biological replicates, prepared from cells at different passage numbers and processed separately, showed r² = 0.81, demonstrating high reproducibility (Fig 4C).
The M2H assay for NCOA3 recruitment broadly correlated with the other assays and was predictive for GREB1 expression and cell proliferation (Fig 3E). However, the ChIP assays for WAY-C-induced recruitment of NCOA3 to the GREB1 promoter did not correlate with any of the other WAY-C activity profiles (Fig 4D), although the positive correlation between the ChIP assay and NCOA3 recruitment in the M2H assay showed a trend toward significance (r² = 0.36, P = 0.09; F-test for nonzero slope). Thus, the simplified coactivator-binding assay showed much greater predictive power than the ChIP assay for ligand-specific effects on GREB1 expression and cell proliferation.
ERβ activity is not an independent predictor of cell-specific activity
One difference between MCF-7 breast cancer cells and Ishikawa endometrial cancer cells is the contribution of ERβ to the estrogenic response, as Ishikawa cells may express ERβ (Bhat & Pezzuto, 2001). When overexpressed in MCF-7 cells, ERβ alters E2-induced expression of only a subset of ERα target genes (Wu et al, 2011). This raises the possibility that ligand-induced ERβ activity contributes to E-Luc activities, and thus underlies the lack of correlation between E-Luc activity and L-Luc ERα-WT activity or GREB1 levels induced by cell-specific modulators in clusters 2 and 3 (Fig 3C).
To test this idea, we determined the L-Luc ERβ activity profiles of the ligands (Fig EV1). All direct modulator scaffolds and two indirect modulator scaffolds (OBHS and S-OBHS-3) lacked ERβ agonist activity, whereas the other ligands showed a range of ERβ activities (Fig EV2J). For most scaffolds, L-Luc ERβ and E-Luc activities were not correlated; the exceptions were 2,5-DTP and cyclofenil analogs, which showed moderate but significant correlations (Fig EV4A). Nevertheless, the E-Luc activities of both 2,5-DTP and cyclofenil analogs were better predicted by their L-Luc ERα-WT activities than by their L-Luc ERβ activities (Fig EV4A and B). Thus, ERβ activity was not an independent determinant of the observed activity profiles.
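The comparison above asks which of two predictors better explains E-Luc activity. A standard way to ask whether ERβ activity carries information independent of ERα activity is the partial correlation of E-Luc with L-Luc ERβ activity after controlling for L-Luc ERα-WT activity. The sketch below is offered as that alternative formalization, not as the analysis reported here; a partial r near zero would indicate that ERβ activity adds little once ERα activity is known.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z perfectly predicts x or y; partial r is undefined")
    return (r_xy - r_xz * r_yz) / denom


def erb_given_era(e_luc, l_luc_erb, l_luc_era):
    """Partial r of E-Luc with L-Luc ERβ activity, controlling for L-Luc ERα-WT."""
    return partial_r(pearson_r(e_luc, l_luc_erb),
                     pearson_r(e_luc, l_luc_era),
                     pearson_r(l_luc_erb, l_luc_era))


print(round(partial_r(0.5, 0.5, 0.5), 4))  # 0.3333
```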

Structural features of consistent signaling across cell types
To overcome barriers to crystallization of ERα LBD complexes, we developed a conformation-trapping X-ray crystallography approach using the ERα-Y537S mutation (Nettles et al, 2008; Bruning et al, 2010; Srinivasan et al, 2013). To further validate this approach, we solved the structure of the ERα-Y537S LBD in complex with diethylstilbestrol (DES), which bound identically in the wild-type and ERα-Y537S LBDs, demonstrating again that this surface mutation stabilizes h12 dynamics to facilitate crystallization without changing ligand binding (Appendix Fig S1A and B) (Nettles et al, 2008; Bruning et al, 2010; Delfosse et al, 2012). Using this approach, we solved 76 ERα LBD structures in the active conformation, bound to ligands studied here (Appendix Fig S1C). Eleven of these structures have been published, while 65 are new, including the DES-bound ERα-Y537S LBD. We present 57 of these new structures here (Dataset EV2), while the remaining eight new structures, bound to OBHS-N analogs, will be published elsewhere (S. Srinivasan et al, in preparation). Examining many closely related structures allows us to visualize subtle structural differences, in effect using X-ray crystallography as a systems biology tool.
The indirect modulator scaffolds in cluster 1 did not show cell-specific signaling (Fig 3C), but shared common structural perturbations that we designed to modulate h12 dynamics. Based on our original OBHS structure, the OBHS, OBHS-N, and triaryl-ethylene compounds were modified with h11-directed pendant groups (Zhu et al, 2012; Liao et al, 2014). Superposing the LBDs according to the class of bound ligand provides an ensemble view of the structural variance and clarifies which part of the ligand-binding pocket is differentially perturbed or targeted.
The 24 structures containing OBHS, OBHS-N, or triaryl-ethylene analogs showed structural diversity in the same part of the scaffolds (Figs 5A and EV5A) and in the same region of the LBD, the C-terminal end of h11 (Figs 5B and C, and EV5B), which in turn nudges h12 (Fig 5C and D). The OBHS-N analogs displaced h11 along a vector away from Leu354, in a region of h3 that is unaffected by the ligands, and toward the dimer interface. For the triaryl-ethylene analogs, the displacement of h11 was in a perpendicular direction, away from Ile424 in h8 and toward h12. Remarkably, these individual inter-atomic distances showed a ligand class-specific ability to significantly predict proliferative effects (Fig 5E and F), demonstrating the feasibility of developing a minimal set of activity predictors from crystal structures.
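Inter-atomic distances of this kind can be read directly from deposited coordinates. A minimal sketch, assuming standard fixed-column PDB files, α-carbon atoms, and chain A; the file name in the usage comment is hypothetical. Per-structure distances obtained this way could then be regressed against proliferation with the F-test for a nonzero slope.

```python
import math


def read_ca_coords(pdb_text, chain="A"):
    """Map residue number -> (x, y, z) of CA atoms in one chain of a PDB file.

    Parses fixed-column ATOM records: atom name in columns 13-16, alternate
    location in 17, chain ID in 22, residue number in 23-26, and x, y, z in
    31-38, 39-46 and 47-54. Only blank or 'A' alternate locations are kept.
    """
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if line[16] not in (" ", "A"):
            continue
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(resseq, xyz)  # keep the first conformer seen
    return coords


def ca_distance(coords, res_i, res_j):
    """Euclidean distance (in Å) between the CA atoms of two residues."""
    return math.dist(coords[res_i], coords[res_j])


# Usage, with a hypothetical file name:
# with open("era_lbd_complex.pdb") as handle:
#     coords = read_ca_coords(handle.read())
# print(ca_distance(coords, 347, 525))  # h3 Thr347 to h11 Leu525
```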
As visualized in four LBD structures, WAY-C analogs were designed with small substitutions that slightly nudge h12 Leu540 without exiting the ligand-binding pocket (Fig 5G and H). Thus, modulating h12 dynamics in this way maintains the canonical signaling pathway defined by E2 (Fig 1D), supporting AF-2-driven signaling and the recruitment of NCOA1/2/3 for GREB1-stimulated proliferation.
Ligands with cell-specific activity alter the shape of the AF-2 surface
Direct modulators such as tamoxifen drive AF-1-dependent cell-specific activity by completely occluding AF-2, but it is not known how indirect modulators produce cell-specific ERα activity. Therefore, we examined another 50 LBD structures containing ligands in clusters 2 and 3. These structures demonstrated that cell-specific activity derived from altering the shape of the AF-2 surface without an extended side chain.
Ligands in clusters 2 and 3 showed conformational heterogeneity in parts of the scaffold that were directed toward multiple regions of the receptor, including h3, h8, h11, h12, and/or the β-sheets (Fig EV5C-G). For instance, S-OBHS-2 and S-OBHS-3 analogs (Fig 2) had similar ERα activity profiles in the different cell types (Fig EV2A-C), but the 2- versus 3-methyl substituted phenol rings altered the correlated signaling patterns in different cell types (Fig 3B lanes 7 and 12). Structurally, the 2- versus 3-methyl substitutions changed the binding positions of the A- and E-ring phenols by 1.0 Å and 2.2 Å, respectively (Fig EV5C). This difference in ligand positioning altered the AF-2 surface via a shift in the N-terminus of h12, which directly contacts the coactivator. Because this shift is about 1 Å in magnitude, it is evident in a single structure (Fig 6A and B). The shifts in h12 residues Asp538 and Leu539 led to rotation of the coactivator peptide (Fig 6C). Thus, cell-specific activity can stem from perturbation of the AF-2 surface without an extended side chain, which presumably alters the receptor-coregulator interaction profile.
The 2,5-DTP analogs perturbed h11 as well as h3, which forms part of the AF-2 surface. These compounds bind the LBD in an unusual fashion because they have a phenol-to-phenol length of 12 Å, which is longer than steroids and other prototypical ERα agonists that are ~10 Å in length. One phenol pushed further toward h3 (Fig 6D), while the other pushed toward the C-terminus of h11 to a greater extent than A-CD-ring estrogens, which are close structural analogs of E2 that lack a B-ring (Fig 2). To quantify this difference, we compared the distance between the α-carbons of h3 Thr347 and h11 Leu525 in the set of structures containing 2,5-DTP analogs (n = 3) or A-CD-ring analogs (n = 5) (Fig 6E). The difference of 0.4 Å was significant (two-tailed Student's t-test, P = 0.002) because of the very tight clustering of the 2,5-DTP-induced LBD conformation. The shifts in h3 suggest that these compounds are positioned to alter coregulator preferences.
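The t-test above pools the variance of the two groups of structures, giving n1 + n2 - 2 = 6 degrees of freedom for n = 3 and n = 5. The sketch computes the pooled two-sample t statistic; the two-tailed P-value is the corresponding tail area of Student's t distribution, which scipy.stats.ttest_ind (equal_var=True by default) returns directly. The distances in the example are hypothetical, not the values in Fig 6E.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical CA-CA distances (Å) for two ligand classes.
dtp_25 = [14.10, 14.20, 14.15]
acd = [13.60, 13.90, 13.70, 13.80, 13.75]
t, df = pooled_t(dtp_25, acd)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Pooling is what lets a very tightly clustered small group, such as three structures, still yield a significant difference: the small within-group spread shrinks the standard error of the mean difference.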
The 2,5-DTP and 3,4-DTP scaffolds are isomeric, but with aryl groups at obtuse and acute angles, respectively (Fig 2). The crystal structure of ERα in complex with a 3,4-DTP analog has not been determined; however, we solved two crystal structures of ERα bound to 3,4-DTPD analogs and one structure containing a furan ligand, all of which have a 3,4-diaryl configuration (Fig 2; Datasets EV1 and EV2). In these structures, the A-ring mimetic of the 3,4-DTPD scaffold bound h3 Glu353 as expected, but the other phenol wrapped around h3 to form a hydrogen bond with Thr347, indicating a change in binding epitopes in the ERα ligand-binding pocket (Fig 6F). The 3,4-DTPD analogs also induced a shift in h3 positioning, which again translated into a shift in the bound coactivator peptide (Fig 6F). Therefore, these indirect modulators, including S-OBHS-2, S-OBHS-3, 2,5-DTP, and 3,4-DTPD analogs, all of which show cell-specific activity profiles, induced shifts in h3 and h12 that were transmitted to the coactivator peptide via an altered AF-2 surface.
To test whether the AF-2 surface changes shape in solution, we used the microarray assay for real-time coregulator-nuclear receptor interaction (MARCoNI) (Aarts et al, 2013). Here, the ligand-dependent interactions of the ERα LBD with over 150 distinct LxxLL motif peptides were assayed to define structural fingerprints for the AF-2 surface, similar to the use of phage display peptides as structural probes (Connor et al, 2001). Despite the similar average activities of these ligand classes (Fig 3A and B), 2,5-DTP and 3,4-DTP analogs displayed remarkably different peptide recruitment patterns (Fig 6H), consistent with the structural analyses.
Hierarchical clustering revealed that many of the 2,5-DTP analogs recapitulated most of the peptide recruitment and dismissal patterns observed with E2 (Fig 6H). However, there was a unique cluster of peptides that were recruited by E2 but not by the 2,5-DTP analogs. In contrast, 3,4-DTP analogs dismissed most of the peptides (Fig 6H). Thus, the isomeric attachment of diaryl groups to the thiophene core changed the AF-2 surface from inside the ligand-binding pocket, as predicted by the crystal structures. Together, these findings suggest that, without an extended side chain, cell-specific activity stems from different coregulator recruitment profiles due to unique ligand-induced conformations of the AF-2 surface, in addition to differential usage of AF-1. Indirect modulators in cluster 1 avoid this by perturbing the h11-h12 interface and modulating the dynamics of h12, without changing the shape of AF-2 when it is stabilized.
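Hierarchical clustering of MARCoNI fingerprints can be sketched as average-linkage (UPGMA) agglomeration, using 1 - Pearson r between the ligands' peptide-binding profiles as the distance. The metric and linkage are our assumptions, since the settings used for Fig 6H are not given in this section, and the ligand names and values are hypothetical. Ligands whose profiles resemble that of E2 merge with it early.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Average-linkage (UPGMA) agglomerative clustering with distance 1 - Pearson r.

    profiles maps a ligand name to its peptide-binding values, with peptides in
    the same order for every ligand. Returns the merges, closest first, as
    (cluster_a, cluster_b, linkage_distance) with clusters as sorted name tuples.
    """
    names = sorted(profiles)
    dist = {frozenset(pair): 1.0 - pearson_r(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs of ligands.
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [tuple(sorted(c1 + c2))]
    return merges


# Hypothetical fingerprints over four peptides (positive = recruited, negative = dismissed).
profiles = {
    "E2": [1.0, 2.0, 3.0, -1.0],
    "DTP25-a": [1.1, 2.0, 2.9, -0.8],
    "DTP34-a": [-1.0, -2.0, 0.5, 1.0],
}
for merge in average_linkage(profiles):
    print(merge)
```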

Discussion
Our goal was to identify a minimal set of predictors that would link specific structural perturbations to the ERα signaling pathways that control cell-specific signaling and proliferation. We found a very strong set of predictors: ligands in cluster 1, defined by similar signaling across cell types, indirectly modulated h12 dynamics via the h11-h12 interface or through slight contact with h12. This perturbation determined proliferation that correlated strongly with AF-2 activity, recruitment of NCOA1/2/3 family members, and induction of the GREB1 gene, consistent with the canonical ERα signaling pathway (Fig 1D). For ligands in cluster 1, deletion of AF-1 reduced activity to varying degrees but did not change the underlying signaling patterns established through AF-2. In contrast, an extended side chain designed to directly reposition h12 and completely disrupt the AF-2 surface results in cell-specific signaling. This was demonstrated with direct modulators in clusters 2 and 3. Cluster 2 was defined by ligand classes that showed correlated activities in two of the three cell types tested, while ligand classes in cluster 3 did not show correlated activities between any of the three cell types. Compared with cluster 1, the structural rules are less clear in clusters 2 and 3, but a number of indirect modulator classes perturbed the LBD conformation at the intersection of h3, the h12 N-terminus, and the AF-2 surface. Ligands in these classes altered the shape of AF-2 to affect coregulator preferences. For direct and indirect modulators in clusters 2 and 3, the canonical ERα signaling pathway involving recruitment of NCOA1/2/3 and induction of GREB1 did not generally predict proliferative effects, indicating an alternate causal model (Fig 1E). The principles outlined above provide a structural basis for how the ligand-receptor interface leads to different signaling specificities through AF-1 and AF-2.
It is noteworthy that indirect regulation of h12 dynamics through h11 can virtually abolish AF-2 activity yet still drive robust transcriptional activity through AF-1, as demonstrated with the OBHS series. This can be explained by the fact that NCOA1/2/3 contain distinct binding sites for interaction with AF-1 and AF-2 (McInerney et al, 1996; Webb et al, 1998), which allows ligands to nucleate the ERα-NCOA1/2/3 interaction through AF-2 and reinforce it with additional binding to AF-1. Completely blocking AF-2 with an extended side chain, or altering the shape of AF-2, shifts the control of GREB1 levels and breast cancer cell proliferation away from NCOA1/2/3. AF-2 blockade also allows AF-1 to function independently, which is important because AF-1 drives tissue-selective effects in vivo. This was demonstrated with AF-1 knockout mice, which show E2-dependent vascular protection but not uterine proliferation, highlighting the role of AF-1 in tissue-selective or cell-specific signaling (Billon-Gales et al, 2009; Abot et al, 2013).
One current limitation of our approach is the identification of statistical variables that predict ligand-specific activity. Here, we examined many LBD structures and tested several variables that were not predictive, including ERβ activity, the strength of AF-1 signaling, and NCOA3 occupancy at the GREB1 gene. Likewise, we identified structural patterns by visual inspection. Many systems biology approaches could contribute to the unbiased identification of predictive variables for statistical modeling. For example, phage display was used to identify the androgen receptor interactome, which was cloned into a mammalian two-hybrid (M2H) library and used to identify clusters of ligand-selective interactions (Norris et al, 2009). We have also used siRNA screening to identify a number of coregulators required for ERα-mediated repression of the IL-6 gene. However, using larger datasets to identify such predictor variables has its own limitations, a major one being the probability of false positives from multiple hypothesis testing. If we calculated inter-atomic distance matrices containing 4,000 atoms per structure × 76 ligand-receptor complexes, we would have 3 × 10⁵ predictions. One way to address this issue is cross-validation, in which hypotheses are generated on a training set of ligands and tested on a separate set of ligands.
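The cross-validation idea can be sketched as follows: candidate predictors (for example, inter-atomic distances) are ranked on a training set of ligands, and only the winner is scored on held-out ligands. The candidate names, the train/test split, and all values are hypothetical. The Bonferroni line is an added illustration of how strict a per-test threshold would become at the scale of hypotheses quoted above; it is not a correction applied in this study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def select_and_validate(candidates, response, train_idx, test_idx):
    """Pick the candidate with the best training r², then score it on the test ligands."""
    def score(name, idx):
        return r_squared([candidates[name][i] for i in idx], [response[i] for i in idx])

    best = max(candidates, key=lambda name: score(name, train_idx))
    return best, score(best, train_idx), score(best, test_idx)


# 4,000 atoms x 76 complexes, as in the text: about 3 x 10^5 hypotheses.
N_HYPOTHESES = 4_000 * 76
BONFERRONI_ALPHA = 0.05 / N_HYPOTHESES  # roughly 1.6e-7 per test

# Hypothetical data for eight ligands; "d2" fits the training ligands by chance.
response = [1, 2, 3, 4, 5, 6, 7, 8]
candidates = {
    "d1": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 8.0],
    "d2": [1, 2, 3, 4, 3, 1, 4, 2],
}
best, train_r2, test_r2 = select_and_validate(candidates, response, [0, 1, 2, 3], [4, 5, 6, 7])
print(best, round(train_r2, 2), round(test_r2, 2))  # d2 1.0 0.0
```

Although the hypothetical "d2" wins on the training ligands, it explains none of the variance in the held-out ligands, whereas "d1" would have generalized; catching this failure mode is the purpose of the separate test set.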
Based on this work, we propose several testable hypotheses for drug discovery. First, we identified atomic vectors for the OBHS-N and triaryl-ethylene classes that predict ligand response (Fig 5E and F). These cluster 1 ligands drive consistent, canonical signaling across cell types, which is desirable for generating full antagonists. Indeed, the most anti-proliferative compound in the OBHS-N series had a fulvestrant-like profile across a battery of assays (S. Srinivasan et al, in preparation). Second, our finding that WAY-C compounds do not rely on AF-1 for signaling efficacy may derive from the slight contacts with h12 observed in crystal structures (Figs 3B and 5H), unlike other compounds in cluster 1 that dislocate h11 and rely on AF-1 for signaling efficacy (Figs 3B and 5C, and EV5B). Third, we found ligands that achieved cell-specific activity without a prototypical extended side chain. Some of these ligands altered the shape of the AF-2 surface by perturbing the h3-h12 interface, providing a route to new SERM-like activity profiles by combining indirect and direct modulation of receptor structure. Incorporating statistical approaches to understand relationships between structure and signaling variables moves us toward predictive models for complex ERα-mediated responses, such as in vivo uterine proliferation or tumor growth, and more generally toward structure-based design for other allosteric drug targets, including GPCRs and other nuclear receptors.
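To make the idea of an atomic-distance predictor concrete, the sketch below computes one inter-atomic distance per ligand-bound structure and reports its r² against a proliferation readout. The atom labels, coordinates, and proliferation values are hypothetical placeholders; in practice the coordinates would be read from the refined ERα LBD structures.

```python
import math


def interatomic_distances(structures, atom_a, atom_b):
    """Distance between two named atoms in each structure (same units as the coordinates, e.g. Å)."""
    return [math.dist(s[atom_a], s[atom_b]) for s in structures]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical: one structure per ligand, coordinates in Å.
structures = [
    {"atom_1": (0.0, 0.0, 0.0), "atom_2": (3.0, 0.0, 0.0)},
    {"atom_1": (0.0, 0.0, 0.0), "atom_2": (3.0, 4.0, 0.0)},
    {"atom_1": (1.0, 1.0, 1.0), "atom_2": (1.0, 1.0, 7.0)},
    {"atom_1": (0.0, 0.0, 0.0), "atom_2": (0.0, 8.0, 0.0)},
]
proliferation = [12.0, 30.0, 36.0, 55.0]  # hypothetical readout per ligand

d = interatomic_distances(structures, "atom_1", "atom_2")
print(d)  # [3.0, 5.0, 6.0, 8.0]
print(round(r_squared(d, proliferation), 3))
```

Because a single distance is one scalar per complex, it can be screened, and then cross-validated, with the same r² machinery used for any other predictor variable.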

Statistical analysis
Correlation and linear regression analyses were performed using GraphPad Prism software. For correlation analysis, the degree to which two datasets vary together was calculated with the Pearson correlation coefficient (r). We report r² rather than r to facilitate comparison with the linear regression results, for which we calculated and reported r² (Fig 3C-F). Significance for r² was determined using the F-test for nonzero slope. High-throughput assays were considered statistically robust if they showed Z' > 0.5, where $Z' = 1 - \frac{3(\sigma_p + \sigma_n)}{|\mu_p - \mu_n|}$, and μ and σ are the means and standard deviations of the positive (p) and negative (n) controls (Fig EV1A and B).
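The two statistics in this section can be computed directly. The sketch below implements Z' from the control means and standard deviations and, for a simple linear regression, the F statistic for a nonzero slope, F = r²(n − 2)/(1 − r²) with 1 and n − 2 degrees of freedom. It uses sample standard deviations (an assumption, since the exact Prism settings are not stated), and all numbers are hypothetical.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' = 1 - 3(sd_p + sd_n) / |mean_p - mean_n| for positive/negative control wells."""
    return 1 - 3 * (stdev(positive) + stdev(negative)) / abs(mean(positive) - mean(negative))


def r2_and_f(x, y):
    """r² of y on x and the F statistic (df 1, n - 2) testing for a nonzero slope."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)
    return r2, r2 * (n - 2) / (1 - r2)


# Hypothetical control wells (e.g. luciferase counts) and a small regression.
zp = z_prime([100, 102, 98], [10, 12, 8])
print(round(zp, 3), zp > 0.5)  # 0.867 True
r2, f = r2_and_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 2), round(f, 2))  # 0.6 4.5
```

A p-value would then come from the upper tail of the F(1, n − 2) distribution, for example `scipy.stats.f.sf(f, 1, n - 2)` if SciPy is available. For simple linear regression, the Pearson r² and the regression r² are the same number, which is why reporting r² lets the two analyses be compared directly.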

ERa ligand library
The library of compounds examined includes both previously reported and newly synthesized compound series (see Dataset EV1 for individual compound information, and Appendix Supplementary Methods for synthetic protocols).

Luciferase reporter assays
Cells were transfected with FugeneHD reagent (Roche Applied Sciences, Indianapolis, IN) in 384-well plates. After 24 h, cells were stimulated with 10 μM compounds dispensed using a 100-nl pintool on a Biomek NXP workstation (Beckman Coulter Inc.). Luciferase activity was measured 24 h later (see Appendix Supplementary Methods for more details).

Cell proliferation assay
MCF-7 cells were plated in 384-well plates in phenol red-free medium plus 10% FBS and stimulated with 10 μM compounds using a 100-nl pintool on a Biomek NXP workstation (Beckman Coulter Inc.). Cell numbers were determined 1 week later (see Appendix Supplementary Methods for more details).

Quantitative RT-PCR
MCF-7 cells were steroid-deprived and stimulated with compounds for 24 h. Total RNA was extracted and reverse-transcribed. The cDNA was analyzed using TaqMan Gene Expression Master Mix (Life Technologies, Grand Island, NY), GREB1 and GAPDH (control) primers, and hybridization probes (see Appendix Supplementary Methods for more details).
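The section does not state how GREB1 Ct values were converted to relative expression. A common choice with a GAPDH reference is the comparative Ct (2^−ΔΔCt) method, sketched below under the assumption of about 100% amplification efficiency (a doubling per cycle). The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl, efficiency=2.0):
    """Relative expression of a target gene (e.g. GREB1) versus a control condition.

    Each target Ct is normalized to the reference gene (e.g. GAPDH) in the same
    sample (ΔCt); the treated-minus-control ΔCt difference (ΔΔCt) is converted to
    a fold change assuming `efficiency`-fold amplification per cycle.
    """
    delta_treated = ct_target - ct_reference
    delta_control = ct_target_ctrl - ct_reference_ctrl
    return efficiency ** -(delta_treated - delta_control)


# Hypothetical Cts: ligand-treated versus vehicle-treated MCF-7 cells.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

Here GREB1 reaches threshold three cycles earlier after treatment while GAPDH is unchanged, which corresponds to an eightfold induction under the doubling assumption.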

MARCoNI coregulator-interaction profiling
This assay was performed as previously described with the ERα LBD, 10 μM compounds, and a PamChip peptide microarray (PamGene International) containing 154 unique coregulator peptides (Aarts et al, 2013) (see Appendix Supplementary Methods for more details).
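Coregulator recruitment and dismissal in MARCoNI data are often summarized per peptide as a modulation index, the log ratio of binding with ligand to binding with apo receptor. The sketch below assumes a log10 ratio and a fixed, hypothetical cutoff for calling a peptide recruited or dismissed; the peptide signals are placeholders, and a real analysis would rely on replicate-based statistics rather than a single cutoff.

```python
import math


def modulation_index(ligand_signal, apo_signal):
    """Per-peptide log10 ratio of binding with ligand to binding with apo ERα LBD."""
    return [math.log10(lig / apo) for lig, apo in zip(ligand_signal, apo_signal)]


def call_peptides(mi_values, cutoff=0.1):
    """Label each peptide as recruited, dismissed or unchanged (hypothetical cutoff)."""
    labels = []
    for mi in mi_values:
        if mi > cutoff:
            labels.append("recruited")
        elif mi < -cutoff:
            labels.append("dismissed")
        else:
            labels.append("unchanged")
    return labels


mi = modulation_index([200.0, 50.0, 100.0], [100.0, 100.0, 100.0])
print(call_peptides(mi))  # ['recruited', 'dismissed', 'unchanged']
```

Vectors of such per-peptide values, one per ligand, are the kind of input that hierarchical clustering of recruitment and dismissal patterns operates on.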

Protein production and X-ray crystallography
ERα protein was produced as previously described (Bruning et al, 2010). New ERα LBD structures (see Dataset EV2 for data collection and refinement statistics) were solved by molecular replacement using PHENIX (Adams et al, 2010) and refined using ExCoR as previously described, with COOT (Emsley & Cowtan, 2004) used for ligand docking and model rebuilding.
Expanded View for this article is available online.