Identification of Potential Human Protein Targets for Soybean Isoflavones

Soy isoflavones have been reported as endocrine disruptors due to their ability to modulate the activity of estrogen receptors (ERs) in mammals; however, its ability to modulate other metabolic pathways is not entirely clear, which makes it necessary to identify new pharmacological targets that interact with these compounds present in soybean. In this work, a virtual screening was executed to identify potential targets of nine soy isoflavones, employing human proteins target from PharmMapper. The best 25 fit scores were selected and prepared for AutoDock Vina docking protocols. The results suggest that equol, daidzein and biochanin A, have the potential to interact with targets such as phenylethanolamine N-methyltransferase, sex hormone-binding globulin and vitamin D3 receptor, respectively. The validations of docking protocols showed good pose reproducibility (root-mean-square deviation (RMSD) ranged 0.001-3.854 Å) and a modest correlation between binding affinities and agonist concentration, AC50 (correlation coefficient (R) = 0.643, p < 0.001). Protein interaction network revealed that predicted targets for soy isoflavones are involved in different pathways, including neurotransmission, metabolism, and cancer remarking the need of a better understanding of the effects of these compounds on human health.


Introduction
The endocrine disruptors (EDs) are substances or exogenous mixtures that alter the function of the endocrine system, and consequently cause adverse health effects in an organism or its progeny. [1][2][3] Those include a wide variety of anthropogenic and naturally occurring compounds that work through several mechanisms, from mimicking hormones to interaction with downstream signals. 4 Many EDs are commonly found in the human diet, including fruits, vegetables and beans, among others. 5,6 Soybean is a common food for adults and infants that is particularly rich in phytoestrogens, in particular isoflavones. 7 These natural products are derivatives of heterocyclic phenols that exhibit similar structure to estrogens. As typical phenolic compounds, they are potent antioxidants. 8 Isoflavones include genistein, daidzein, equol, glycitein, biochanin A, among others. 9,10 Most of these compounds are known as selective estrogen receptor modulators (SERM), and modulator of critical pathways for growth and cell proliferation, an event that affects multiple organs. 11,12 The genistein inhibits the activity of protein tyrosine kinase (PTK). 13,14 In vitro assays have shown that, although phytoestrogens activate transcription of genes dependent on both estrogen receptors, ERα and ERβ, usually have a selective binding affinity for ERβ. [15][16][17][18] It has been reported that phytoestrogens may reduce estrogen-dependent prostate cancer, 19 and related to the occurrence of cervical cancer, cardiotoxicity, polycystic ovarian syndrome, infertility, among others. 5,20,21 In addition to binding to estrogen receptor, isoflavones may be targeting many other biochemical pathways. One way to discover those signaling routes is to use computational methods to find molecular targets. These include: the prediction based on ligand similarity with drugs or evaluated molecules, 22,23 network prediction linked to known proteins, 24 methods based on adverse effects of phenotypes generated by drugs, molecular docking, pharmacophore mapping, and comparison of binding site or fingerprints. 25 Artificial intelligence methods have emerged as a powerful tool for identifying new targets in drug discovery. However, these methods have limitations when it comes to undeclared proteins as targets, since these methods are based on the principle that similar ligands exert similar biological effects, sometimes presenting erroneous predictions for targets with little information or noisy data available. 26 The identification of new protein targets for small molecules remains a major challenge in drug discovery, as well as, the study of their adverse effects. In recent years, the tireless search for reliable computational strategies has been growing, since experimental predictions require great expenditure of resources and time.
Virtual screening has made possible the identification suitable biological targets for particular compounds, mainly when conclusive experimental data are not available. There are many web servers that have been used for this purpose, including SwissTargetPrediction, 27 idTarget, 28 INVDOCK, 29 TarFisDock, 30 among others. However, most do not generate the coordinate for protein-ligand binding site, and others base their search on molecular docking, but with a limited number of human target proteins. In this work we combined different criteria of computational chemistry. PharmMapper server 31 was selected as the first filter, which is based on the search for targets based on the receptor pharmacophore mapping, with 23236 proteins, covering 16159 pharmacophore models as druggable binding sites, and then combined with molecular docking and protein interaction network methods.
PharmMapper server is offered as a friendly tool whose purpose is finding potential targets for small molecules. PharmMapper fit score generates a nearly 7,000 × 7,000 score matrix. After this, when a new molecule is submitted, the fit score to each pharmacophore will be calculated first, and then every fit score of a specific pharmacophore will be compared to the fit score matrix to measure its score level among all the scores of the pharmacophore. 32 On the other hand, protein-ligand molecular docking is a computational method, which binds flexible molecules within a rigid or flexible representation of a receptor, allowing to obtain the best energetically-possible conformation and geometric orientation. The program AutoDock Vina 33 has been widely used for this purpose. 34,35 In this study, combined computational approach PharmMapper server, 31 AutoDock Vina, 33 pharmacophore mapping and protein interaction network were used to perform the virtual screening to identify potential target proteins for soybean isoflavones, with subsequent validation of in silico methods. Additionally, it is discussed the interactions and possible role of these compounds on the different metabolic pathways for the identified targets.

Ligands optimization
Nine isoflavones present in soybean were evaluated as input ligands to find their plausible targets: genistein, 36 daidzein, equol, 37

Identification of target proteins for isoflavones with PharmMapper
A three-step approach was applied to identify target proteins for soybean isoflavones. The process includes the optimization of the ligands, virtual screening of new targets based on pharmacophores, and the calculation of the protein-ligand affinity by molecular docking. First, the ligand structures were optimized with the density functional theory (DFT) method at the Becke, 3-parameter, Lee-Yang-Parr (B3LYP)/6-31G (d',p') basis set. The calculations were performed using the Gaussian 09 program. 41 The resulting geometry was converted to Mol2 format with the Open Babel program. 42 The optimized isoflavone structures were submitted to the PharmMapper server using the option "Human Protein Targets Only". This allows the identification of potential human protein targets for isoflavones using a pharmacophore mapping approach. 43 This server contains 23236 proteins (covering 16159 druggable, 459 of which are human protein targets), and 51431 ligandable pharmacophore models. 44,45 The selection of potential pharmacological targets was carried out using the fit score value, employing the best 25 human proteins targets displayed in the target list (5.4% of human protein targets). The verification of protein binding sites coordinates was carried out visually with Sybyl-X 2.0, 46 in order to ensure that the ligand remained in the cavities of the protein.
The molecular interaction between isoflavones and target proteins selected by PharmMapper was verified by molecular docking protocols with AutoDock Vina. Nine isoflavones optimized structures were docked on first 25 target proteins according to the fit score value using AutoDock Vina program. 33 3D protein structures were downloaded from Protein Data Bank 47 and prepared with SYBYL-X 2.0 package. 46 The preparation process consisted of removing water molecules and other co-crystallized ligands, repairing and fixing of amides in amino acid side chains, then hydrogens were added to amino acids residues (protonation). Optimization using Powell conjugate gradient method; with dielectric constant value of 1.0, gradient convergence fixed to 0.001 kcal mol −1 , maximum number of iterations at 1000, and Kollman united/all-atoms force fields with AMBER charges. 48 The protein-ligand molecular docking was defined by establishing a cube with the dimensions 24 × 24 × 24 Å, fixed at the geometric centre of each evaluated soy isoflavone, and grid point spacing of 1.0, covering the ligand-binding site provided by PharmMapper. Molecular docking calculations for isoflavones included 20 number modes, an energy range of 1.5, and exhaustiveness equal to 25, and all docking run analyses were performed in triplicate; and their binding affinities (kcal mol -1 ) reported as mean ± standard deviation.

Identification of interacting protein residues and validation of protocols
The identification of amino acids in target proteins that interact with the isoflavones was performed employing LigandScout 3.0. 49 This tool creates simplified pharmacophores models to detect the number and type of interaction between the ligand and the residue in the binding site of the protein. Images of each protein-ligand complex were displayed using PyMOL. 50 Protein interaction network (PIN) of potential targets predicted for soy isoflavones Protein interaction network analysis and pathway enrichment for the potential target of soy isoflavones were performed with STRING 11.0 web server, 51 in order to identify and illustrate the protein-protein interactions and neighbor genes. Names of predicted PharmMapper protein targets for soy isoflavones were searched in Uniprot database and gen codes were extracted. Pathway analyses were processed according to the Kyoto Encyclopaedia of Genes and Genomes (KEGG) and Reactome database. 52 Validation of docking protocols performed using two computational approaches The first computational approach, re-docking calculations, was performed with co-crystallized ligand for the hits proteins, in total 100 AutoDock Vina docking runs were performed for each evaluated complex. The RMSD (root-mean-square deviation of atomic positions) values were calculated with Sybyl-X 2.0 46 for the best AutoDock Vina predicted ligand pose, as well as value for experimentally reported pose and calculated RMSD values as histogram graph of co-crystallized ligands.
A second validation approach included a correlation analysis, which was performed in order to establish a relationship between the calculated binding affinity values and the biological data. In this work, only the hormone binding globulin (SHBG) bioassay from PubChem assay ID: 318680 53 was used for validation, because it contained a higher number of compounds evaluated, and unlike the assays recorded for the other targets predicted by PharmMapper, these have structural similarity with soy isoflavones. In addition, this benchmark data set of compounds has been used for validation of popular molecular field-based quantitative structure-activity relationship (QSAR) and molecular docking techniques. 54 Biological data for a representative set of 60 active ligands of SHBG were extracted from PubChem database. 53 The biological information consisted in the displacement of [3H]-5-alpha-dihydrotestosterone from the human SHBG. Theoretical binding affinities for these active compounds on the SHBG (Protein Data Bank (PDB): 1LHO) were obtained using the molecular docking protocols previously described.

Results and Discussion
Virtual screening using PharmMapper and AutoDock Vina An inverse virtual screening was carried out to predict potential targets of soy isoflavones ( Figure 1). The results from PharmMapper, verified using AutoDock Vina are shown in Table 1. According to PharmMapper fit score, most protein targets for isoflavones are nuclear receptors, being the ER highly frequent as a target among the best 25 main hits (Tables S1-S9, Supplementary Information (SI) section). The calculated AutoDock Vina affinity values were considerably poor for those targets above number 25 according to the fit score.
Soy-derived compounds also had good theoretical binding preferences for different enzymes such as transferases, reductases, kinases, phosphatases and dehydrogenases, including also globulin-related proteins. Selected proteins as better targets based on AutoDock Vina calculations were the phenylethanolamine N-methyltransferase (PNMT, PDB: 1YZ3), 55 sex hormonebinding globulin, (SHBG, PDB: 1LHO) 56 and vitamin D3 receptor (VDR, PDB: 1DB1) 57 for equol, daidzein and biochanin A, respectively. Soy isoflavones are phenolic/non-steroidal bioactive compounds classified as phytoestrogens. The exposure to these chemical agents occurs mainly through the intake of soy and its derivatives, these are being consumed at varying rates according to the region and dietary preferences. Moreover, these phytoestrogens are also provided in food products for cattle. 43 Isoflavones present in soybeans have been widely described as phytoestrogens, 58-60 as a result of their ability to modulate hormonal function via ER, 19,61 making them well-known endocrine disruptors. However, additional pathways by which these chemical agents can exert various health effects are still unknown. The results showed that soy-derived products could have health effects through their interaction with potential molecular targets. The best binding affinity for tested molecules was found for equol (-10.8 kcal mol -1 ) on PNMT, an enzyme important on the catecholamine biosynthesis. This protein participates on the conversion of norepinephrine to epinephrine. 62 The latter is related to the sympathetic nervous system and adrenal medulla, involved in many physiological functions, including blood pressure, vasoconstriction, cardiac stimulation and regulation of blood glucose level. 63 The protein-ligand interaction takes place near a binding site occupied by the cofactor S-adenosyl-L-homocysteine. In fact, equol also interacts with Phe-102 and Val-187 (Figure 2), residues that also bind to the purine ring in the cofactor. 55 Daidzein, another recognized soy phytoestrogen, presented an affinity value of -10.4 kcal mol -1 when binding on the sex hormone binding globulin. SHBG is a glycoprotein, biosynthesized mainly in the liver, which participates in the regulation of free estradiol and testosterone in plasma. 64,65 Moreover, it has been reported 66 that isoflavones or phytoestrogens increase the serum levels of SHBG. In addition, in vitro studies 14,18 have showed that isoflavones displace testosterone and 17β-estradiol from the binding sites of SHBG, potentially altering the bioavailability of free steroids and the androgen/estrogen   balance. Likely, as shown by the docking results, the potential interaction of daidzein with SHBG may result on hormone displacement from SHBG. The docking SHBG-daidzein complex displayed the same binding site reported for the native ligand, 5α-androstan-3β,17β-diol (1LHO). 56 This site includes Thr-40, Ser-42, Gly-58, Asp-65, Phe-67, Asn-82, Leu-131 and Met-107 (Figure 3). The DHD, a derivative of daidzein, presented an affinity value of -10.2 kcal mol -1 , occupying the same pocket in the protein, with small differences in protein-ligand interaction residues.
Molecular docking for biochanin A and genistein on VDR, a ligand-dependent transcription regulator, presented an affinity value of -10.1 and -10.0 kcal mol -1 , respectively. This receptor is involved in the transcription of a wide variety of genes like miRNAs, MIR181a, CYP19A1, among others. In addition, it controls the transcription of messenger ribonucleic acid (RNA), the regulation of the amount of translated proteins, the differentiation of hematopoietic cells and the production of aromatase, a key enzyme on the biosynthesis of estrogens. 67 The interaction VDR-biochanin A (Figure 4) involves residues Ser-237 and Trp-286, amino acids that have been reported 68,69 to have a critical role in the positioning of their natural ligand 1α,25-dihydroxyvitamin D3 (1,25 (OH) 2 D 3 ) on VDR. Trp-286 is also important as it allows an adjustment of the small conjugated chain (C=C-C=C) that connects the rings A and C of 1,25 (OH) 2 D 3 , facilitating the processing of the ligand binding through a network of hydrogen bonds. 57,68,70 Protein-protein interactions (PPI) for predicted soy isoflavones target proteins (Table 1) were evaluated again by using STRING 11.0 51 and these seed proteins were integrated into the total PIN with nodes 25, number of edges was 70, average node degree was 5.6, average clustering coefficient of 0.676, expected number of edges = 24, and PPI enrichment p-value = 3.83 × 10 -14 . Pathways of hub neighbors were obtained from GO (Gene Ontology) proteins involved in the same pathway except NRH: quinone oxidoreductase 2 (NQO2), dihydroorotate dehydrogenase (DHODH), transthyretin (TTR), estrogen-related receptor gamma (ESRRG) ( Figure 5). However, TTR and ESRRG participate in hormonal regulation processes. 71 The KEGG analysis revealed the participation of these proteins in different pathways including cyclin-dependent protein serine/threonine kinase activity, hormone binding, kinase binding, nuclear receptor activity, among others (Table S10, SI section). On the other hand, Reactome pathways analysis showed the participation of these proteins in critical routes including transcription of genes involved in G1 cell cycle arrest, G1/S deoxyribonucleic acid (DNA) damage checkpoints, generic transcription pathway, and diseases of signal transduction (Table S11, SI section). These findings suggest that isoflavones could modulate different signaling pathways, explaining some of the effects observed for the evaluated compounds. 72 However, some KEGG signaling pathways revealed the close relationship of these proteins in cancer-related mechanisms such as CDK2, CCEN1 and RB1, opening a door for the isoflavones present in soy to be used as a platform for the design of anticancer agents. 73

Validation of molecular docking protocols
Two approaches were carried out to validate docking protocols. The first one included a superposition using the ligand in its X-ray complex and that predicted employing AutoDock Vina. For several complexes available in the PDB containing isoflavones as ligands or substrates, 100 re-docking runs were carried out using the co-crystallized ligands previously extracted from the protein complex (Table S12, SI section). The capability of the molecular docking protocols to reproduce the crystallographic pose of the ligands were examined by RMSD calculation (Å). 74 Calculated mean RMSD values ranged from 0.001 to 3.864 Å ( Table 2). The maximum RMSD value (3.864 Å) was observed for the complex estradiol 17-beta-dehydrogenase 1/EM-1745 (C 37 H 51 N 5 O 7 ), probably as a result of its bulky nature with a large number of rotatable bonds. 33 Mean RMSD distributions are presented in Figure 6 for PDB crystallographic structures, showing the reproducibility of poses by the AutoDock Vina, where trends are also presented with RMSD values for pose replication analyses, as well as some values far from the mean value. The histograms were drawn using PAST, 75 adjusted to eight class marks, with kernel density estimation (KDE), used to estimate the probability density function of a nonparametric random variable. 75 Mean affinity values obtained for co-crystallized ligands in the analyzed proteins (VDR, PNMT and SHBG) showed similar magnitude to those presented for soy isoflavones, these affinity values ranged between -6.2 and 11.7 kcal mol -1 . However, for the PNMT protein, the affinity with the co-crystallized ligand was much lower (-6.2 kcal mol -1 ) than with the soy isoflavone equol (-10.1 kcal mol -1 ). It is important to clarify that in most cases the binding sites of the co-crystallized ligands were different from those predicted by PharmMaper server for the soy isoflavones.  A second validation approach consisted of a correlation between experimental and predicted biological data for SHBG agonists based on their binding affinities. Experimental agonist concentration 50 (AC 50 ) were recovered from PubChem BioAssay Database for 60 compounds. 3D structures were docked on receptor sex hormone-binding globulin structure, and PDB: 1LHO with AutoDock Vina program. The PubChem chemical structure identifier (CID), AC 50 , AutoDock Vina affinity and logarithm of biological activity (log AC 50 ) for compound set evaluated are shown in the Supplementary Information (Table S13).
Linear correlation analysis showed the acceptable prediction capacity of protein-ligand interaction and biological activity of SHBG agonists with molecular docking protocols used, as shown in Figure 7. Linear Pearson correlation obtained for AutoDock Vina was 0.643 (p < 0.001), with a positive relation between binding affinity values (kcal mol -1 ) and log AC 50 . Details of statistical parameters for correlation analysis are described in Supplementary Information (Table S14).
The internal validation of molecular docking protocols generated RMSD values, mostly lower than 2 Å (Figure 6), as it has been reported for other studies using AutoDock Vina; 76,77 supporting the reliability of this protein-ligand docking protocols. Moreover, external validation showed that binding affinity values from AutoDock Vina displayed a good correlation with biological data (Figure 7), suggesting the simulated protein-ligand dockings are likely to occur at the molecular level. The correlation coefficients obtained during the validation are similar to those generated in similar theoretical studies. 48,78,79 The identification of potential targets for natural products using PharmMapper and AutoDock Vina, combining pharmacophore mapping, molecular docking and protein interaction network provides a theoretical strategy to detect molecular targets for small natural compounds. The results showed that soy isoflavones can interact with proteins present in different routes of hormonal metabolism, which agrees with the reported evidence that classifies these compounds as phytoestrogens. The advantages of these methodologies combine in a complementary way, including the identification of potential targets and the exploration of potential molecular mechanisms associated with bioactive natural compounds. 22,60,80

Conclusions
Inverse virtual screening and molecular docking protocols were employed to discover theoretical human protein targets for isoflavones found in soybean such as equol, daidzein and biochanin A which have the potential to interact with proteins such as PNMT, SHBG, and VDR. These interactions were mostly of type aromatic, hydrophobic and hydrogen bonds; which is favored by the structure of isoflavones. Predicted macromolecules are present in critical pathways including neurotransmission, cell differentiation metabolism, cancer and homeostasis of estradiol/testosterone, among others. AutoDock Vina binding affinities for soy isoflavones were validated using in silico and experimental data. This information should promote additional pharmacological testing for isoflavones found in many food products, especially those recommended for children.

Supplementary Information
Supplementary information (fit score values and binding affinities for the best 25 protein targets predicted by PharmMapper, KEGG and Reactome pathways for soy isoflavones targets using STRING 11.0, data set for validation of docking protocols with experimental data, interactions for SHBG-DHD and VDR-genistein complexes) is available free of charge at http://jbcs.sbq.org.br as PDF file.