QSARMethods to Screen Endocrine Disruptors

The identification of endocrine disrupting chemicals (EDCs) is one of the important goals of environmental chemical hazard screening. We report on in silico methods addressing toxicological studies about EDCs with a special focus on the application of QSAR models for screening purpose. Since Estrogen-like (ER) activity has been extensively studied, the majority of the available models are based on ER-related endpoints. Some of these models are here reviewed and described. As example for their application, we screen an assembled dataset of candidate substitutes for some known EDCs belonging to the chemical classes of phthalates, bisphenols and parabens, selected considering their toxicological relevance and broad application, with the general aim of preliminary assessing their ED potential. The goal of the substitution processes is to advance inherently safer chemicals and products, consistent with the principles of green chemistry. Results suggest that the integration of a family of different models accounting for different endpoints can be a convenient way to describe ED as properly as possible and allow also both to increase the confidence of the predictions and to maximize the probability that most active compounds are correctly found.


Introduction
Endocrine Disruptors (EDs), also known as endocrinedisrupting chemicals (EDCs), are a diverse group of "chemicals of emerging concern" which have attracted much interest from the research community since the 1990s. Found in many household and industrial products, EDCs can be released into the environment and have the potential to interfere with normal, hormonally regulated biological process to adversely affect the development and reproductive functions of living species. EDCs are considered to be a serious health threat by contributing to major diseases in humans [1,2].
The identification and safety assessment of potential EDCs is complicated by the observed low-dose effects and non-monotonic dose responses as well as the often long-term exposure to chronic doses of complex mixtures [3]. Today there is still no definitive risk assessment tool for EDCs and their overall effective impact on human health is even controversial [4].
All the major international definitions of what is an EDC [5][6][7] stipulate, with slightly differences, that it is an exogenous agent that cause adverse effects via an endocrine mediated mechanism in an intact organism or its progeny. The toxicity is due to the interference at various levels with the natural hormones in the body (i.e., altered production, release, transport, metabolism, binding, action or elimination), which are responsible for the homeostasis and the regulation of developmental processes.
Despite endocrine disruption is believed to be the result of a series of complex mechanisms it can be almost in part explained by the interaction with specific target proteins. The classical EDC targets are nuclear hormone receptors (NRs), a superfamily of transcriptional factors that can bind to DNA and influence the expression of nearby genes.
The majority of the research on EDCs has examined the consequences of their interactions with estrogen receptors alpha and beta (ER and ER ) and with the androgen receptor (AR). Other NRs investigated are mineralcorticoid-(MR), glucocorticoid-(GR), retinoic acid-(RAR), thyroid hormone-(TA), and peroxisome proliferator-activated receptors (PPAR), just to name a few. In addition, some enzymes related to hormone synthesis and metabolism have been proposed to be affected by EDCs and to impact on NR responses by altering the availability of active hormones [8].
Often substances have been reported to be possible EDC based on the results of screening tests. Indeed, a considerable number of in vitro (sub-cellular or cellular) and in vivo (animal) screening tests for hormone-like activities of substances have been developed [9].
Since biological screening and testing is time-and costintensive, it is crucial to set priorities to ensure that compounds with the highest measured or predicted hormonal activities are prioritized for entry into a screening procedure [10]. More rational approaches to help to identify potentially harmful chemicals in a fast way are urgently needed.
In this context, computational methods that are already a well established and massively used tool in drug discovery and development, can also support toxicological studies related to xenobiotics either in the identification of new EDCs (screening) or pointing into the right direction when focusing on mechanisms of action and on a set of biological targets (profiling) related to toxicity events.
A series of successful and inspiring examples of computational predictive approaches applied to EDCs include pharmacophore modeling [11,12], quantitative structureactivity relationships (QSAR) [13][14][15], molecular dynamics simulations [16], docking studies [17][18][19] and virtual screening [20]. Most of these tools only enable hazard classification of chemicals (e.g., as either receptor binders or non-binders of ER subclasses or other receptors), while only a few provide quantitative estimation of the relative binding (e.g., the Endocrine Disruptor Knowledge Base, EKDB [21]) or of other parameters. In this paper, the focus is on (Q)SAR models addressing EDCs.
IUPAC defines QSAR as mathematical relationships that link chemical structure and pharmacological activity in a quantitative manner for a series of compounds [22]. QSAR methods rely on the theory that similar chemical descriptors of the compounds (e.g., molecular weight, number of heteroatoms etc.) are related to their similar biological activities (and toxicities) [23]. These descriptors are analyzed using various regression, classification and pattern recognition methodologies aiming to find relationship between the descriptors and considered biological activity. If a significant correlation is found, the QSAR model can be used for predicting biological activity of further compounds. However, the model should be first validated and its predictive power (i.e., sensitivity/specificity metrics) assessed.
QSAR models are usually restricted in their applicability within the chemical space in which they are trained [24]. In fact the reliability of the results is highly dependent on the starting data set (i.e., quality and availability of data) and on the similarity between compounds.
Although other modeling methods (e.g., docking) enable to explore the interactions of more diverse chemicals with a specific target (if some structural data are available), QSAR models are generally faster and less computationally intensive. Moreover, QSAR may account for more general toxic effects, also not directly related to receptor binding or deriving from a series of different processes, without having a detailed description of the specific targets and molecular mechanisms involved.
The concerns about ED and the use of suspicious compounds have led consumers and companies to search for less harmful alternatives (i.e., compounds similar in the effectiveness for specific applications and uses). In fact, "free-from" products, with the so-called clean labels (e.g., BPA-free, paraben-free etc.), have been growing exponentially across industries from food and beverage to cosmetics products, driven by consumer pressure because they are perceived to be safer and healthier. However, most of the available alternatives are not well studied with regard to the potential effects on human health and the environment and while being potentially less harmful, still carry a degree of risk. Furthermore, from the standpoint of producers, are material costs and properties that matter, not simply replacing one specific molecule with another.
The main goal of this study is to preliminary assess with the use of a battery of (Q)SAR models whether or not some candidate alternatives are indeed preferable considering their ED potential. The target compounds whose we focus on for substitution are bisphenol A and some representative phthalates and parabens (see Table 1) that are under intense scientific scrutiny for being EDCs. We clarify that this study was done neither for a full risk assessment (i.e., to predict precisely the properties of substances per se) using in silico models as single source of data, nor for prioritization purpose, which would require taking into account several other criteria (e.g., manufacturing and mechanical properties of the material, costs etc.) and not only a predicted ED-effect. At the basis of our strategy there is the assumption that in silico methods provide a rapid screening tool to identify suitable candidate substances to replace chemicals of possible concern within the REACH framework.

The target compounds for substitution.
A key first step in identifying appropriate alternatives is to determine the functions, uses and processes associated with the target molecules, as potential feasible and safer alternatives are often different particularly where a substance has numerous disparate applications. In this section the chemical classes of phthalates, bisphenols and parabens are briefly described for their applications, uses and toxicological relevance. The rationale behind the selection of these five target compounds is briefly explained below.
Phthalates are diesters of phthalic acid employed mainly as plasticizers in plastics manufacturing since the 1930s to increase flexibility and durability of polyvinyl chloride (PVC). Besides general-purpose PVC applications (e.g., wire and cables, flooring, packaging, synthetic leather, automotive etc.) phthalates are also used in medical devices (e.g., tubing and blood bags), rubber products and as fixatives and solvents for adhesives, inks and personal-care products. Phthalate exposure may be through direct use or by indirect means because of leaching from plastic products to food [25,26] and general environmental contamination [27]. Despite a paucity of experimental animal studies for several phthalates, there is sufficient evidence to suggest that some of them are reproductive and developmental toxicants [28][29][30] and have also the potential to be teratogenic, mutagenic and carcinogenic [31,32], though their toxicity varies somewhat depending on the specific structure.
Di(2-ethylhexyl) phthalate (DEHP) and dibutyl phthalate (DBP) were chosen as target molecules for our substitution exercise. DEHP is one of the most abundantly produced phthalate [33] used as general purpose PVC-plasticizer [34] and is already blacklisted as priority water pollutants by the U.S. Environmental Protection Agency (EPA) [35]. DBP is used extensively in the adhesives industry, as solvent and as a fibre lubricant in textile manufacturing. The use of DEHP and DBP is not permitted in toys, childcare articles or in cosmetics in the European Union, The United States, Japan and Canada (EU/2005/84/EC; CPSIA, 2008; HPA, 2010; [36]) and is subject to certain restrictions in food contact applications. DEHP and DBP are respectively classified as presumed (Class 1B) and suspected (Class 2) human reproductive toxicants according to Regulation EC 1272/2008 (CLP). However, phthalates are unregulated and continue to be used in toy making in many other parts of the world, such as China and India.
Bisphenols are a group of chemicals with two hydroxyphenyl functionalities. Bisphenol A (BPA), often simply called "bisphenol", the most popular representative of this group being one of the highest-volume chemicals produced worldwide, was chosen as target molecule for our substitution exercise. BPA is a versatile and valuable chemical building block used to make certain plastics and protective epoxy resins linings inside cans and is present into a variety of common consumer goods, such as water bottles. Moreover, it is used as a developer in thermal paper for cash register receipts. Residual amounts of BPA can leach into can contents, becoming an unwanted source of human exposure to the chemical [37]. The widespread exposure of individuals to BPA is suspected to affect a variety of physiological functions, including reproduction, development, and metabolism and its estrogenic activity has been well documented in the last 15 years [38,39]. BPA is classified as presumed (Class 1B) human reproductive toxicant according to Regulation EC 1272/2008 (CLP).
Parabens are alkyl esters of p-hydroxybenzoic acid widely used as preservatives in cosmetic and pharmaceutical products and suitable for many types of formulas (e.g., shampoos, topical/parenteral pharmaceuticals, toothpaste etc.). Their antimicrobial preservation efficacy against bacteria and molds in combination with their low cost and the long history of their use, probably explains why parabens are so commonplace. The estrogenic properties of parabens have been a focus of concern since the 1990s [40], although they have also been shown to have multiple endocrine-disrupting properties [41,42]. In a review of studies on paraben toxicology, the authors discussed whether "parabens should be termed 'weak oestrogens"' and suggested that "the extent to which parabens can be labelled as 'weak oestrogens' needs further consideration" [43]. Methyl paraben (MP) and ethyl paraben (EP) were chosen as target molecules for our substitution exercise because they are commonly used in a broad range of products being the only parabens allowed also as foods additives.
For a matter of consistency and simplicity both the target compounds and the collected candidate substitutes are identified with an univocal ID constituted in the following way: the first letter refers to the chemical domain of the target molecule and is retained also for relative substitutes (i.e., F for phthalates, B for bisphenols and P for parabens); next a progressive number was assigned; finally a S letter identify the target compounds to be substituted.

The screened dataset of alternatives.
Starting from the five target molecules chosen as case study a series of candidate alternatives (see Annex Table I, II, and III in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/) were identified and collected based on an extensive literature review. The main criteria for the selection of these compounds were the following, where applicable: i) chemicals already considered in previous studies on the substitution of the target molecules; ii) compounds with similar range of use respect to the target molecules to be substituted; iii) structural analogues of the target molecules known to have a reduced ED potential; iv) the

P2_S
Ethyl paraben (EP) likely feasibility of the synthesis of these molecules. These are quite complex criteria and, instead of screening and processing all the possible variables, we preferred to identify some credible candidates, which may cover most possible applications.
The initial list of candidate alternatives was pruned to avoid critical chemicals not suitable to be addressed by in silico models such as complex mixtures, polymers and polypeptides. Salts were neutralized and for simplicity spatial isomerism was not considered (SMILES without @, \ and / symbols).
With respect to phthalates (DEHP and DBP) a series of general-purpose plasticizers and solvents [44] have been considered, searching also for the ones with PVCspecific [45][46][47] and medical applications [48] (see Annex Table I in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/). The majority of them are esters of organic acids (e.g., butyrates, citrates, adipates etc.) or esters of phosphoric acid. Other high molecular weight phthalates and terephthalates with a reduced ED potential [49] were also considered.
A list of alternative chemicals to BPA have been retrieved focusing both on plastics [50,51] and on thermal papers applications [52] ((see Annex Table  II in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/)). Among BPA-free alternatives being explored only a few are not structurally related to bisphenols, some examples are the bio-based isosorbide diglycidyl ether (B19) and the three different components (B15, B16 and B18) of Eastman Chemical's Tritan TM copolyester.
The final data set (see Annex, Tables I, Tables II, and  Tables III in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/) consists of 58 chemical structures: 19 possible alternatives for phthalates, 23 for bisphenol A and 16 for parabens. This final list does not pretend to be exhaustive but can be seen as a collection of the most common alternatives today available.
2.3. The (Q)SAR models. The list of chemicals described above in the data set section has been screened with several in silico models (see Table 2) with the purpose of classification. The models here considered are fundamentally different among each other: i) they are built using disparate approaches and they work differently with the input data (e.g., fragments detection, molecular descriptors calculation etc.); ii) the data subjected to the modeling procedure are focused on some specific biological events, mainly addressing ER-mediated effects, that are not superimposable among models; iii) some models refer to experimental evidences generally ascribable to an ED effect (e.g., an endpoint such as the binding to NRs etc.) while others are based on authorities' opinions (i.e., collection of substances defined as candidate EDC by authorities and listed as structural rules); iv) some models are ready to use and implemented into a software program (e.g., CART-VEGA and OECD QSAR Toolbox profilers) while others are based simply on the structural matching of rules sets (statistics, expert driven or obtained from the literature).
A structural rule (or structural alert, SA), usually coded as SMARTS strings, represents a molecular structure with the extension of generalization. In fact, SMARTS can be used for automatic matching and a software program can quickly find what compounds satisfy the structural requirements expressed by rules. Istant JChem software was used for structure database management and for SMILES/SMARTS matching, Istant JChem 15.11.16.0 ChemAxon (http://www.chemaxon.com).

CART-VEGA model.
We used a previously developed model [55] based on a classification and regression tree (CART) algorithm for the binding to human estrogen receptor alpha (hER-). In that classification model any level of activity was labeled as active. The training set is a subgroup of the Japanese METI database [56] and consists of 806 compounds with experimentally determined values for relative binding affinity (RBA), considering 17 -Estradiol as reference. The CART model allows to perform a binary classification of submitted molecules based on their calculated DRAGON descriptors. In this tree structure, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Following a series of simple 'if/then' rules, from top to bottom along the branches, each molecule is classified as ER binder or non-ER binder when one of the 11 terminal nodes has been finally reached.
The Cooper statistics [57] show that the application of this CART model in predicting a group of compounds never used before (external test set of 150 compounds, 54 actives and 96 inactives) gives very good performance (accuracy, specificity and sensitivity are respectively 0.853, 0.854 and 0.852 [55]).
This CART model is implemented (as Relative Biding Affinity, RBA model) into a Java based platform named VEGA, and is freely available through the internet: http://www.vega-qsar.eu. VEGA includes several other QSAR models addressing a series of endpoints relevant for REACH legislation: toxicological (e.g., Mutagenicity, Skin Sensitization models etc.), ecotoxicological (e.g., Fish Acute Toxicity models etc.), environmental (e.g., BCF, Ready Biodegradability models etc.) and physicochemical (e.g., LogP models). Since for some endpoints multiple models are present (based on different approaches both for their building and application), it is of fundamental importance to score and discriminate between more reliable results and those which should be viewed with particular caution. In this context, VEGA offers a tool to measure the reliability of the prediction, through the applicability domain index (ADI). The assessment of the applicability domain is very important also for the evaluation of the results of a single model, and is requested by the EC regulation on chemical substances, REACH. Within VEGA, when available the original experimental value of the property of interest is also given. The ADI, that ranges from 0 (worst case) to 1 (best case), identifies reasons of possible concern, guiding the user to a careful evaluation of the results. Conceptually it summarizes a series of more specific indices that express a similarity check between the queried substances with those used to develop the model and to verify how accurate their predicted values were [58].

CERAPP rules sets.
Within the framework of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) [59], that involves 17 groups in the USA and Europe, a large quantity of data related to ER signaling has been made available. In particular, this data set is a curated collection of high-quality ER signaling data, deriving from 1677 chemicals screened across 18 high-throughput in vitro test from the US EPA's ToxCast program [60]. A score of 0.01, roughly analogous to an IC 50 of 100 M, was selected to discriminate the estrogenic chemicals (237 compounds, 14.13%) from the non-estrogenic ones (1440 compounds, 85.87%) having very low or no ER activity. On the basis of this CERAPP data set two groups of rules, encoded as SMARTS, have been generated by means of SARpy software [61,62] and by visual inspection, with the purpose of categorization, to detect the structural determinants responsible for the interference with the ER signaling.
SARpy is a tool suitable for building SAR models to classify chemicals according to a given property and is freely available through the internet: http://www.vegaqsar.eu/research.html.
SARpy breaks the chemical structures of the compounds in the training set into fragments of a desired size and it identifies the ones which are statistically related to the property of interest ('positive' or 'negative' experimental activity). For evaluation purpose each generated fragment is matched against every molecular structure in the training data. Having the experimental labels, fragment matches can be divided: either against structures with 'positive' experimental activity, called 'true positives' (TP), or with 'negative' experimental activity, called 'false positives' (FP) in the case we are searching for SAs for positive activity. The likelihood ratio (LR), a parameter that expresses the precision of each potential SA to predict the target activity label, is computed dynamically by the software with the aim to identify the fragments that best generalize the concept of biophores. The following equation has been used: The LR ranges from 1 to infinite and the higher is the value the higher is the predictive power and the relevance for a specific SA. If the number of FP equals zero then the LR value goes to infinite; from a practical point of view this means that the SA extracted is highly representative because all the subset of compounds with that SA share the target property. Table 2: Overview of the models addressing endocrine disruptors whose a detailed description will follow.

Models Endpoint(s) considered
CART-VEGA 1 Relative Binding Affinity, hER CERAPP rule sets 2 Various, related to ER signaling UBA EDC-scan 3 Binding to hER , hER or hAR OECD QSAR Toolbox profilers 1 Various, related to ER binding EC priority list rule set 2 Various, addressing any kind of mechanism both in vitro and in vivo and considering different species P&G DART scheme 3 Binding to ER, AR, RAR, AhR and prostaglandin receptor agonists 1 Models directly implemented into a software program; 2 Rule sets, statistics and/or expert driven, extracted from a set of selected compounds; 3 Rule sets obtained from the literature or as part of more complex architecture.
The first rule set -called hereafter "CERAPP rule set A/I" -have been obtained from a reduced group of compounds consisting of 1529 molecules (89 actives and 1440 inactives) once eliminated weakly active compounds (148 compounds in the score range 0.01-0.1). The rules set, deriving from the output of SARpy software, is composed of a total of 56 rules (see Annex  Tables IV in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/), 13 for active compounds and 43 for inactive ones. The prediction is made simply through a structure matching. In the case for a compound both Active and Inactive SMARTS were found the prediction cannot be made and it will be classified as not predicted (NP).
The second rule set -called hereafter "CERAPP rule set A" -is based on active compounds only and is made up of 17 SMARTS (see Annex Tables V in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/). The starting 89 active compounds have been clustered according to chemical similarity and then the underlying chemical feature describing the chemical class was drawn. When applying a rule set all chemicals not fired by any rules or SMARTS are assigned to the inactive class so, unlike for the rules set A/I, there are no unpredicted compounds.
To validate our rules sets we employed part of the Japanese METI database (used to build the CART-VEGA model previously described) as external test set, excluding the compounds already used to extract the rules. The performance, expressed by the Cooper statistics, for both the training and the test set (571 compounds, 227 actives and 344 inactives) are shown in Table 3 for the two rules sets.
2.6. Rules from UBA EDC-scan. The German Federal Environment Agency (UBA) developed a tool, called EDC-scan to screen and identify substances of very high concern (SVHC), and particularly addressing endocrine disrupting chemicals. It consists of 41 structural alerts encoded by SMILES, identifying structural features for the potential binding to ER and to AR. The tool currently is designed as a first indicator screen based on in vitro data only, and SAs cannot differentiate the receptor type (ER-, ER-or AR) or the effect (agonistic or antagonistic). The rules set (see Annex  Table VI in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/) has been used to screen our data set of possible substitute compounds, however to our knowledge it has not been fully validated yet. The majority of these rules can be clustered based on common scaffold and summarized by a lower number of simple structures like alkyl-phenols, diphenyls, diphenyl ethers, methylene diphenyls, phthalates, parabens and sterane-based. Other rules, less frequent, are based on indanes, norbornanes, phenantrenes, succinimides, calcones, cumarines and dibenzo-p-dioxyns.

OECD QSAR Application Toolbox profilers.
Two profilers are freely available in the Organization for Economic Cooperation and Development (OECD) (Q)SAR Application Toolbox software to investigate endocrine disruptors: the rtER Expert System ver.1 -USEPA and the Estrogen Receptor Binding profilers. Their use allows to highlight the presence of chemical characteristics associated to endocrine mode of action (estrogenic one). Even though the presence of an alert of a profiler is used for categorization purpose and not for predicting activity per se, in our exercise in a precautionary manner all chemicals flagged by at least one of the profilers are considered as positive.
The first profiler is the rtER Expert System ver.1 -USEPA [63,64]. It consists of molecular definitions mimicking the structural criteria of chemical classes for potential estrogen receptor-binders covered by US EPA Estrogen Receptor Expert System (ERES). This expert system, based upon binding to the rainbow trout ER (rtER), is considered to have good concordance with the human and rat ER-. The ERES profiler is a logic rule-based decision tree that encodes the experts' mechanistic understanding with respect to both the chemical and biological aspects of well-defined endpoints of the ER bioassay domain. The ERES was built to be mechanistically transparent and meet the needs of a specific application, i.e., predict for all chemicals within two well-defined inventories (industrial chemicals used as pesticide inert substances and antimicrobial pesticides). A chemical class-based approach was designed to allow extrapolating from a limited number of well-characterized training set compounds to a broader inventory of chemicals by employing effects-based chemical category and readacross concepts. Effects-based chemical categories are based upon a mechanistic hypothesis of how these non-steroidal chemicals of seemingly dissimilar structure to 17ß-estradiol (E2) could interact with the ER via two distinct binding modes. The automated version of the ERES enables users to compare the predicted chemical to training set chemicals within each chemical group (i.e., the decision tree node). Chemicals falling outside the applicability domain of the model (whether active or inactive) were considered by the Expert System to have "Unknown Binding Potential" (UnkBP). The second profiler incorporated in the Toolbox is the Estrogen Receptor Binding profiler. It is based on structural and parametric rules extracted from the findings reported in the scientific literature and supported by experimental data [65][66][67]. The ER-profiler classifies chemicals within five main groups (see Table 4) according to their binding propensities, ranging from Very strong binder, Strong binder, Moderate binder, Weak binder to Non binder. Molecular weight (MW) and structural characteristics such as hydroxyl or amino groups were also considered. This tool has been recently evaluated considering its predictive performance [68] against two large datasets of chemicals (binding affinity to rat and human ERs), and the reported Cooper statistics indicated that the binding affinities for the majority of them could be correctly predicted.

EC priority list rule set.
Two lists of ED compounds have been retrieved from official and reliable sources: the European Commission (EC) and the World Health Organization (WHO). The EC list arises from a Communication from the Commission to the Council and the European Parliament on a Community Strategy for Endocrine Disruptors published in December 1999. According to the Communication, a priority list of substances has been established for further evaluation of their role in endocrine disruption. From a total of 564 chemicals that had been suggested by various organizations or in published papers or reports as being suspected EDs, 147 were considered likely to be either persistent in the environment or produced at high volumes (HPV). Of these, however, in a first assessment clear evidence of ED activity was noted for only 66 (assigned Category 1 using the criteria adopted in the study). A further 52 chemicals showed some evidence suggesting potential activity (Category 2). Of the 66 chemicals in Category 1, humans were considered likely to be exposed to 60  Each list was cleaned and checked in order to obtain reliable structures (as SMILES strings) to be used as input for the following computer-based processing: retrieval and check of structures; removal of duplicates; pruning and removal of compounds with unclear structure. Each cleaned list has been processed with in-house software based on the similarity index as calculated by VEGA, and clustered. The resulting clusters are manually examined and modified when necessary, in order to obtain a chemical-meaning driven grouping. Matching the two lists, it can be observed that the EC list covers the 12,5% of the WHO's one (22 molecules are in both lists). Apart from these 22 compounds, some clusters of both lists share a certain degree of similarity. Each cluster of the EC priority list has been studied in order to extract the relevant chemical features responsible for the grouping. These chemical moieties are coded as SMARTS strings (see Annex Table VII in Supplementary Material available online at http://www.agialpress.com/journals/nurr/2016/101203/) and used for structure matching. Basically, they correspond to chloro alkyl (except trichloro group), chlorinated and/or Table 4: Structural criteria for the characterization of the binding potency of chemicals according to ER-profiler of the OECD (Q)SAR Application Toolbox.

Prediction Criteria
Very strong binder Chemicals with two non-impaired OH groups attached to two different five or six carbon-atom rings and a MW between 200 and 500 Da.

Strong binder
Chemicals with a non-impaired OH or NH2 group attached to five or six carbon-atom ring and a MW between 200 and 500 Da.

Moderate binder
Chemicals with an unhindered OH or NH2 group (one in the para-or meta-position of the ring) attached to a single five or six carbon-atom ring and a MW between 170 and 200 Da.

Weak binder
Chemicals with a non-impaired OH or NH2 group attached to a five or six carbon-atom ring and a MW<170 Da.

Non binder
Chemicals that do not follow the above mentioned criteria: with a MW≤ 500 Da and five or six carbon-atom ring structure but with impaired or lacking OH or NH2 groups; with a MW≤ 500 Da but that do not present cyclic structures; with a MW> 500 Da regardless of structure and shape.
brominated biphenyl, chlorinated and/or brominated biphenyl ether, chlorinated and/or brominated methylene biphenyl, dithiocarbamate, phthalate, trialky/aryl tin, orthoalkylphenol and bisphenol A core. Also the de-halogenated forms of biphenyl, biphenyl ether and methylene biphenyl were considered for the screening, in a precautionary way. Compared to the other models here used for endocrine disruptors this is the most general one since it is based on experts' opinions considering effects observed in higher tier studies, addressing any kind of mechanism causing the toxic effect.

Rules from the P&G DART scheme.
A framework for identifying chemicals with structural features associated with the potential to act as developmental or reproductive toxicants (DART) was developed by The Procter & Gamble (P&G) Company and is reported in the literature [70]. It consists of an empirically-based decision tree for determining whether or not a chemical has structural features that are consistent with chemical structures known to have toxicity for DART endpoints. It was built on the basis of the combination of known modes of action (MOA) and associated structural features, as well as an empirical association of structural fragments within molecules of DART chemicals. The design of this tool is based on a detailed review of 716 chemicals (664 positive, 16 negative, and 36 with insufficient data) that have been evaluated for their DART potential. These chemicals were grouped into 25 different categories, and 129 sub-categories, based on defined receptor binding, chemical properties and, when known, their MOA. The decision tree is not intended to be used as a stand-alone tool, and by design is expected to broadly capture chemicals with features that are similar to chemicals with reported effects for developmental and/ or reproductive toxic effects. This tool can be used both as a component of a screening system to identify chemicals of potential concern, and as part of weight of evidence decisions based on SAR, to fill data gaps without generating additional test data.
The original data set of the P&G DART scheme has been implemented into the VEGA platform as a Developmental/Reproductive Toxicity virtual library. This model implements all the 25 categories, and tries to find an exact match between the given compound and any of the virtual compounds in the library. If a match is found, a prediction of "Toxicant" is given, otherwise a "NON Toxicant" prediction is provided.
Since ED-related toxicity is only partially included into DART endpoints, for our screening purpose only two out of 25 categories were considered. In particular, those more closely related to binding with NRs and endocrine effects are exploited: Category 2 (Table 5) for Estrogen receptor (ER) and androgen receptor (AR) binding compounds and Category 3 (Table 6) for Retinoic acid receptor (RAR), aryl hydrocarbon receptor (AhR) binders and prostaglandin receptor agonists.

Results and Discussion
Most of the models employed identify as problematic the five target compounds that are under investigation for their possible substitution (F1_S, F2_F, B1_S, P1_S and P2_S). Several alternatives to bisphenol A maintain reason for concern probably due to ability of the models to catch more strongly the moiety related to ER interaction (presence of aromatic hydroxyl group) and since the majority of the substitutes are quite homogeneous and structurally related to bisphenols. Conversely, fewer compounds are flagged if we consider parabens and phthalates. One reason can be that parabens show generally low estrogenicity and depending on the model a higher level of activation is required to be included among the list of positive chemicals. It is believed that phthalates act mainly inhibiting testosterone synthesis [72] while most of the models are focused on ER effects. Moreover, the substitutive compounds for parabens and phthalates are more heterogeneous and different than those for bisphenols.  Since we used models really different among each other and based on various collections of data, the reliability of the predictions (e.g., ADI or other indices) can be not easy to assess. In some cases (e.g., CART-VEGA), if the model has a proper statistical basis, the AD has been checked. In the case of models which are expert systems, gathering collections of rules derived from the manual work of the human expert, the modeling procedure does not involve the statistical scheme of a set of chemicals used as a training set. Thus, in this situation the definition of the AD is not suitable, and the only way to assess if a certain rule apply to the target chemical is simply through the match between the structures. For this reason, we reported information concerning the reliability of the prediction only for CART-VEGA and P&G DART models as provided by the VEGA platform.
Among substitute compounds of phthalates (Table 7) five compounds (F5, F7, F9, F11 and F17), apart from the target F1_S and F2_S, are flagged as possible EDCs by at least one model. F5 is a phosphate ester and is predicted with a low ADI by CART-VEGA, while F7, F9, F11 and F17 candidates have in common the phthalic group. Interestingly F18, a trialkyl phosphate ester really similar to F5, is not flagged as ED-positive by any of the models. CART-VEGA model reported some compounds (F1_S, F7 and F11) to be experimentally ED-positive, they share all the phthalic group and the predictions performed by the other models are in agreement. Conversely, other similar compounds (F9 and F17) are predicted by some models as ED-positive but these predictions are not in agreement with the experimental data reported by CART-VEGA. They should be considered anyway as a warning because while CART-VEGA takes into account only ER binding the other models consider also other ED related effects.
It seems that both the CERAPP rule sets A/I and A are not able to flag any compounds, even the targets. This is due to the fact that no rules relative to short chain phthalates are present and probably this class of compounds is underrepresented in the starting CERAPP data set.
From an overall view of the results the candidates alternatives suitable for F1_S and F2_S substitution that are probably safer considering endocrine disruption are 11 and the majority of them are esters of organic acids: F1, F2, F3, F4, F6, F8, F10, F12, F13, F15, F18.
Considering substitute compounds of bisphenol A (Table  8), as already noted, the majority of them are flagged and according to models only five (B15, B16, B18, B19, B21) are not predicted as possible ED. In particular, all these five compounds are lacking the aromatic hydroxyl group and hence are probably poor ER binders. B15, B16 and B18 are all components of Eastman Chemical's Tritan TM copolyester, B19, isosorbide diglycidyl ether (isosorbide epoxy), is a biobased plastic building material and B21 Pergafast 201 ® is a color developer for use in thermal paper manufacture.
For substitute compounds of parabens (Table 9) only three compounds (P3, P4 and P16), apart from the targets P1_S and P2_S, are flagged by any models as possible EDs and are respectively bronopol, chlorexidine and trichlosan. However, since the reliability of these prediction is unknown they should be view only as an indication. The other substitutive compounds with preservative activity are boric acid, hexamethylentetramine, various organic acids and sugar-based Table 7: Results of the screening for possible alternatives of phthalates (compounds F1_S and F2_S). Compounds detected as potential EDs are red-colored. Prediction with a low ADI (and hence low reliability) are highlighted with a question mark (?) while experimental values with an asterisk (*). chemicals that, based on existing knowledge, are unlikely to show toxicity related to endocrine system disruption.

Conclusions
QSAR models can be a valuable tool to describe various toxicity endpoints, included EDC-related ones and to facilitate the prioritization of not yet tested compounds for future screening. Unfortunately, in the literature QSAR studies on EDCs presenting experimental evidence for good predictive power are often lacking [73]. This is probably due to the extreme complexity of the endocrine disruption phenomena and on the scarcity of experimental data available.
Since the toxicity related to endocrine system disruption occurs through many different mechanisms and involves a high number of different target sites, it is probably very difficult, if not even implausible, to predict this endpoint by a single computational model. Moreover, the use of multiple models can be of help in assessing the reliability of the predictions by evaluating the concordance among different tools employing different reasoning schemes. A convenient way to describe endocrine disruption as properly as possible should be through the integration of a family of different endpoints accounting for diverse NRs-pathways (e.g., receptor binding, receptor dimerization, chromatin binding of the mature transcription factor, gene transcription, changes in NR-induced cell growth kinetics etc.) and possibly considering also in vivo endpoints.
The models reviewed in this study are fundamentally different among each other, from both a theoretical and an applicative point of view. They are focused on some specific biological events that are not superimposable among models and can be viewed as different facets of the same complex toxicological process (named generically endocrine disruption). In our evaluations we assumed that compounds not labelled as EDCs by the majority of the models are more likely to be safe. In this way a small group of candidates suitable for the substitution of di-2-ethylexyl phthalate, dibutyl phthalate, bisphenol A, methyl paraben and ethyl paraben, with a low ED potential, has been selected.
It is important to note that there is no a winning model, but rather a case-by-case evaluation should be made taking into account also the applicability domain of each model when possible. And it is clear that further work has to be done concerning the integration of different models.
Finally, it also should be considered that any in silico method provides just predictions, therefore all the results should be confirmed and supported by further experimental studies. In fact, the effect relative to the endocrine disruption of the chosen substitutes will be experimentally tested in the following phases of the project. Table 8: Results of the screening for possible alternatives of bisphenol A (B1_S). Compounds detected as potential EDs are red-colored. Prediction with a low ADI (and hence low reliability) are highlighted with a question mark (?) while experimental values with an asterisk (*). Table 9: Results of the screening for possible alternatives of phthalates (compounds P1_S and P2_S). Compounds detected as potential EDs are red-colored. Prediction with a low ADI (and hence low reliability) are highlighted with a question mark (?) while experimental values with an asterisk (*).