Predicting In Silico Which Mixtures of the Natural Products of Plants Might Most Effectively Kill Human Leukemia Cells?

The aim of this analysis of just 13 natural products of plants was to predict, from the 364 possible mixtures of 2 or 3 products, the artificial mixtures most likely to be effective against leukemia cells. The natural products selected included resveratrol, honokiol, chrysin, limonene, cholecalciferol, cerulenin, aloe emodin, and salicin, and together they had over 600 potential protein targets. Target profiling used the Ontomine set of tools for literature searches of potential binding proteins, binding constant predictions, binding site predictions, and pathway network pattern analysis. The analyses indicated that 6 of the 13 natural products were predicted to bind proteins that were important targets for established cancer treatments. Improvements in effectiveness were predicted for artificial combinations of 2 or 3 natural products. That effect might be attributed to drug synergism rather than to increased numbers of binding proteins bound (dose effects). Among the natural products, the combination of aloe emodin with mevinolin and honokiol was predicted to be the most effective for AML-related predicted binding proteins. Therefore, in some cases, plant extracts may in the future provide more effective medicines than the single purified natural products of modern medicine.


Introduction
Plant-derived secondary metabolites have been used to treat acute infections, health disorders, and chronic illness for tens of thousands of years. Only during the last 100 years have natural products been largely replaced by synthetic drugs [1]. However, important anticancer agents still have to be extracted from plants, because their complex structures often contain several chiral centers. Further, some patients show resistance to known treatments [2]. Therefore, new treatments with different modes of action are constantly sought. Plants are an abundant source of new natural products. Estimates of 200,000 natural products in plant species have been revised upward as mass spectrometry techniques have developed [3]. New databases, omics methods, and good practice standards promise to deliver many new medicines based on plant natural products [4].
Several studies have demonstrated that mixtures in extracts from herbal medicines had anticancer potential in vitro or in vivo [5][6][7]. Among many other studies, aqueous extracts from willow (Salix sp.; Salicaceae) leaves prevented proliferation of three cancer cell types: acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), and Ehrlich ascites carcinoma cells [7]. Therefore, the complex mixtures in crude extracts may be more effective than single purified natural products.
Leukemia was among the most common cancers throughout human history [8]. However, the greater prevalence of leukemia in the modern world may be due to the reduction of infectious diseases, which has increased life spans in most human populations. Unfortunately, as of 2012, cancer treatments were expensive, with no assurance that even simple leukemia could be cured.
Evidence-Based Complementary and Alternative Medicine
For developing countries, the identification and use of indigenous medicinal plants as cures against leukemia and other cancers have become attractive [9]. In developed countries, the use of in silico analysis to predict useful new treatments and potential side-effects has risen to prominence [4,5,10]. Here, the two approaches were combined.
Single natural products with well-known bioactivities [11,12] were used in this study. The objectives here were to identify potential antileukemic compounds, to predict effective combinations of natural products, and to rank them on the basis of putative antileukemic activity. In addition, the in silico analysis sought to identify target protein(s) for potentially antileukemic natural products; to predict the modes of action of those compounds; to predict potential adverse drug reactions (toxicity); and to predict the absorption, distribution, metabolism, and excretion (ADME) profiles.

Databases and Software.
The reference databases and software of Ontomine were used for predictive analysis. Ontomine was chosen because it provided an innovative chemoinformatics prediction tool based on the presence or absence of chemical group(s) in a set of related natural products. Ontomine searches were performed against large, manually curated databases. They included (i) literature searches based on experimentally determined properties of around 100,000 diverse small molecules, collected from databases, encyclopedias, and other literature and then hand-curated by experts; (ii) a BioAssay Knowledgebase compiled from over 500 bioassay records found at NCBI-PubChem; (iii) a Target Protein Knowledgebase compiled by curation of the ∼1,500 proteins from DrugBank at NCBI-PubChem (details given in Figure 1); (iv) pathway analysis, using KEGG pathways as references (ftp://ftp.genome.jp/pub/kegg/); and (v) docking algorithms, used to identify molecular binding sites and predict ligand binding constants. Ontomine databases and tools are among those used widely in this field [4].

Natural Products Selected for Prediction of the Basis of Antileukemia Activity.
Thirteen commercially available, purified natural products of plants were selected to be tested for their antileukemic (AML) properties (Table 1). The natural products had been shown to cause some cell death when incubated with primary AML cell lines for 24 hrs at low concentrations (Supplemental Table 1; see Supplementary materials available online at http://dx.doi.org/10.1155/2013/801501). Mortality rate and dose dependence were known [7], but bioassays had not clearly identified the best therapeutics, and in vivo analysis of all mixtures would have been costly. Therefore, in silico prediction was used to identify the best candidates for later testing in vivo. A parallel analysis was made with established drugs for AML treatment so that the predictions could be compared.

Ontomine-Based Analyses of Functional Groups.
Ontomine was used to transform the structural information for chemically, biologically, or pharmacologically related molecules into a hierarchical schema of functional groups. It was used to discover patterns in the related schema and to predict biological activity, toxicity, and side-effects using rules inferred from analyzing the patterns. The basic algorithms underlying Ontomine-based predictions were as follows.
For rule or pattern detection, cluster formation was based on a similarity threshold (ST) that was calculated for each pair of molecules using the formula
$$\mathrm{ST} = \frac{\text{No. of common functional groups in molecules 1 and 2}}{\text{Maximum functional group count in either molecule}}. \qquad (1)$$
Once ST was calculated, clusters formed if the ST was less than or equal to 0.7. Two clusters could be merged if each contained the same molecules or a subset thereof. Results were generated with confidence levels of high, medium, and low. The Ontomine algorithm took into account the presence and absence of functional groups among sets of related molecules (with similar bioactivity or toxicity) to derive rules for making predictions. This approach differed from traditional maximum common subgraph (MCS) approaches, in which only part of a molecule is taken into consideration for decision making.
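A minimal Python sketch of Eq. (1) follows, assuming each molecule is annotated as a multiset of functional-group labels. The molecule names and group labels are hypothetical, and this is not Ontomine's implementation. Using `collections.Counter` lets a repeated group (for example, two hydroxyls) count more than once, and `Counter & Counter` keeps the minimum count of each shared group.

```python
from collections import Counter
from itertools import combinations


def similarity_threshold(groups_1: Counter, groups_2: Counter) -> float:
    """ST = common functional groups / max functional-group count in either molecule."""
    common = sum((groups_1 & groups_2).values())  # multiset intersection (min counts)
    largest = max(sum(groups_1.values()), sum(groups_2.values()))
    return common / largest if largest else 0.0


# Hypothetical functional-group annotations, for illustration only.
molecules = {
    "mol_A": Counter({"phenol": 3, "alkene": 1, "aromatic_ring": 2}),
    "mol_B": Counter({"phenol": 2, "ketone": 1, "aromatic_ring": 2, "ether": 1}),
    "mol_C": Counter({"alkene": 2, "methyl": 2}),
}

for (name_1, g1), (name_2, g2) in combinations(molecules.items(), 2):
    print(f"{name_1}-{name_2}: ST = {similarity_threshold(g1, g2):.2f}")
```

Here mol_A and mol_B share two phenols and two aromatic rings out of six groups each, giving ST = 4/6. The cluster-forming comparison against 0.7 would then be applied to these pairwise values as described in the text.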

Drug Target Analysis.
The database used contained bioassay data for ∼500 predicted binding proteins (Supplemental Figure 1). Natural product target identification was done by mining two specific knowledgebases: the "Drug Target KB" and the "BioAssay KB". The BioAssay KB was generated from 493 bioassay records from NCBI, and the Drug Target KB (1,346 records) was generated from PubChem, DrugBank, and TTD [4]. Binding proteins for the 13 natural products were predicted at three confidence levels: high, medium, and low. The predicted bioassays were validated using scientific journals, and some of the bioassay targets were found in the literature to be active against AML [13][14][15][16][17][18][19][20][21][22][23][24].

Reverse Docking to Predict Binding Affinities for Proteins.
Docking is a computational method used to estimate binding strength between biomolecules (protein-ligand and protein-protein interactions). Traditionally, docking was used as a computational tool for screening databases of natural products to mine a few candidate drug-like compounds. Reverse docking is a comparatively new application of docking in which a database of proteins (drug targets) is docked against a set of natural products (potential anticancer compounds) to predict binding affinities. AutoDock4.0 software was used to carry out reverse docking over a database of ∼1,100 protein structures available in the Potential Drug Target Database (PDTD) [10]. The PDTD contained more than 1,100 protein entries with known 3D structures. The PDTD covered diverse information on more than 830 known or potential drug targets, including protein and active site structures, related diseases, biological functions, and their associated regulating (signaling) pathways. Taking every ligand as a probe, reverse docking was carried out against the entire PDTD. The respective ligands were also prepared using AutoDock4.0. A grid of each protein's binding pocket (site) was constructed using the AutoGrid module of AutoDock4.0. Every ligand was separately docked into the binding site of each protein. The interaction energies between the ligand and the proteins were calculated in the form of docking scores. AutoDock4.0 also yielded the inhibition constant (Ki) for every docking calculation. The predicted binding proteins were screened using a cutoff value of <10 μM for the inhibition constant (Ki).
Figure 1: Subnetwork dysregulated in AML versus normal white blood cells. Interaction-type annotations from KEGG are shown as the letters above the arrows: E, enzymatic; T, transcription, with subscript + showing activation and − showing inhibition; B, protein-to-protein binding. Subscripts for the predicted protein-to-protein interactions were c, compound interaction; +, activation; −, inhibition; i, indirect effect; s, state change; p+, phosphorylation; p−, dephosphorylation; m, methylation; u, ubiquitination; g, glycosylation; and "none" for missing information.

Pathway Analyses.
Pathway analyses were performed on predicted binding proteins to provide insights into potential mechanisms of natural product activity. Three statistical confidence levels were considered for pathway analyses. Statistical analysis used Fisher's exact test to identify overrepresented pathways.
… disease. Networks can include transcript and protein coexpression data. Here, AML transcript abundance was compared with that of normal white blood cells using data from published studies [12,25] available in the NCBI Gene Expression Omnibus (GEO) database (accessions GSE17054 and GSE9476). The algorithm described in [25] for finding significant subnetworks/modules in gene interaction networks was used to find transcripts that were altered in abundance in AML samples compared with normal samples. This algorithm focused on finding small networks, which would be easier to interpret and validate. It computed a score for each subnetwork, which helped identify significant subnetworks. This analysis produced lists of significant subnetworks (Supplemental Table 2).
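Pathway overrepresentation with Fisher's exact test reduces to a one-sided hypergeometric tail. The sketch below is a standard-library illustration, not the pipeline used in the study; the counts are hypothetical. With SciPy available, `scipy.stats.fisher_exact([[k, n - k], [K - k, N - K - n + k]], alternative="greater")` should return the same one-sided p-value.

```python
from math import comb


def fisher_upper_tail_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (overrepresentation) p-value.

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n), where
    N = proteins in the background set,
    K = background proteins annotated to the pathway,
    n = predicted binding proteins for a natural product,
    k = predicted binding proteins that fall in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 8 of 30 predicted binding proteins fall in a pathway
# that contains 50 of 1,000 background proteins.
print(f"p = {fisher_upper_tail_p(k=8, n=30, K=50, N=1000):.2e}")
```

A small p-value indicates that more of the predicted binding proteins fall in the pathway than random sampling from the background would suggest.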

Final Protein Target Selection.
The objective of protein target selection was to analyze the data generated through chemoinformatics and structural informatics methods and to select relevant and important predicted binding proteins, which would be used for selecting a potential drug or drug combination. Detailed literature searches were conducted for the predicted binding proteins generated by the earlier analyses (target discovery/identification). Disease pathway annotations from KEGG were used, along with additional literature searches, to select cancer-related predicted binding proteins. The literature searches were used to support annotations from KEGG and also to provide additional information about target proteins whose roles in carcinogenesis have been established recently [13][14][15][16][17][18][19][20][21][22][23][24]. Predicted binding proteins were selected based on the following criteria:
(1) The target protein should be predicted to be related to the compounds of interest (natural products, as estimated by reverse docking or Ontomine).
(2) The predicted binding protein should be related to AML/cancer (KEGG annotation and/or literature).
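The two selection criteria amount to a simple conjunction, which the following sketch makes explicit. The record layout and target names are hypothetical; in practice the flags would come from the reverse docking output, the Ontomine predictions, and KEGG or literature annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedTarget:
    name: str
    from_reverse_docking: bool  # criterion (1): predicted by reverse docking ...
    from_ontomine: bool         # ... or by Ontomine
    aml_or_cancer: bool         # criterion (2): KEGG annotation and/or literature


def select_final_targets(candidates: list[PredictedTarget]) -> list[str]:
    """Keep candidates that satisfy both selection criteria."""
    return [
        t.name
        for t in candidates
        if (t.from_reverse_docking or t.from_ontomine) and t.aml_or_cancer
    ]


# Hypothetical candidates, for illustration only.
candidates = [
    PredictedTarget("target_A", True, False, True),
    PredictedTarget("target_B", False, True, False),
    PredictedTarget("target_C", False, False, True),
    PredictedTarget("target_D", True, True, True),
]
print(select_final_targets(candidates))
```

target_B fails criterion (2), and target_C fails criterion (1) because no method predicted it as a binding protein.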

Validation of Existing Drugs for AML.
Seventeen existing drugs for AML were included in the analysis to (i) understand the modes of action of the drugs; (ii) predict activity profiles; and (iii) use them as benchmarks for the natural products analysis. The following drugs were used as benchmark sets in the analysis: the natural products 6-mercaptopurine, 6-thioguanine, L-malate, and vincristine; and the synthesized drugs amonafide, belinostat, clofarabine, cytarabine, daunorubicin, etoposide, fludarabine, gemcitabine, idarubicin, mitoxantrone, paclitaxel, prednisone, and tipifarnib. Reverse docking for the 13 natural products along with the 17 benchmark drugs was carried out with a target database (PDTD) of ∼1,060 protein structures. Each docking run was performed using 50,000 energy evaluations for 5 conformational searches per ligand, with a 60 × 60 × 60 grid box at 0.375 Å spacing covering the whole active site. The predicted binding proteins were further screened using a cutoff value of 10 μM for the inhibition constant (Ki).
Figure 2: Aloe emodin was predicted to be bound at the center, surrounded by the amino acid residues from the active site of 2GDZ. Aloe emodin forms 4 hydrogen-bonded interactions with residues of the active site. These hydrogen-bonded interactions are partly responsible for stabilizing the ligand-protein complex. In addition, the hydrophobic interactions between the ligand and the protein are displayed as wireframe spheres.
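The grid settings above fix the size of the search box, and the run settings map onto AutoDock4 parameter-file keywords. The sketch below computes the box edge and emits the relevant keyword lines. It assumes that the "5 conformational searches" correspond to 5 Lamarckian genetic-algorithm runs (`ga_run`), which the text does not state, and it shows only these few keywords, not complete .gpf or .dpf files.

```python
NPTS = 60          # grid points per axis, as stated in the text
SPACING_A = 0.375  # grid spacing in angstroms, as stated in the text


def box_edge_angstrom(npts: int = NPTS, spacing: float = SPACING_A) -> float:
    """Edge length of a cubic AutoGrid box spanning npts intervals of the given spacing."""
    return npts * spacing


def parameter_lines(npts: int = NPTS, spacing: float = SPACING_A,
                    energy_evals: int = 50_000, runs: int = 5) -> dict[str, list[str]]:
    """Keyword lines for the grid (.gpf) and docking (.dpf) parameter files.

    Assumes the 5 conformational searches map to 5 Lamarckian GA runs.
    """
    return {
        "gpf": [f"npts {npts} {npts} {npts}", f"spacing {spacing}"],
        "dpf": [f"ga_num_evals {energy_evals}", f"ga_run {runs}"],
    }


print(f"box edge = {box_edge_angstrom()} Angstrom per side")
for file_type, lines in parameter_lines().items():
    print(file_type, lines)
```

A 60-point grid at 0.375 Å spacing spans 22.5 Å per side, which is large enough to enclose a typical active site.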
For scoring, Ac was the number of predicted binding proteins related to cancer; Am was the number of predicted binding proteins related to AML; Ah was the number of predicted binding proteins detected as hubs in interaction networks from the gene-expression network analysis; comm was the number of common target protein(s) shared by the constituent drugs in a combination; and sp was the specificity score, defined as (the number of cancer-related predicted binding proteins minus the number of proteins not related to cancer)/(total number of predicted binding proteins for the drug/combination). Dscore was derived by analyzing the target profile obtained through docking analysis for drug combinations, as a weighted combination of terms built from Ac, Am, Ah, and comm (weights 0.2, 0.4, 0.3, and 0.1, respectively), normalized by N, the number of proteins used for forward docking (N = 59).
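The specificity score sp and the shared-target count comm can be computed directly from per-product target sets. The sketch below follows the definitions above and assumes, as an illustration only, that a combination's targets are pooled by set union; the target sets are hypothetical.

```python
def specificity_score(n_cancer_related: int, n_total: int) -> float:
    """sp = (cancer-related targets - targets not related to cancer) / total targets."""
    if n_total <= 0:
        raise ValueError("a drug or combination needs at least one predicted target")
    n_not_related = n_total - n_cancer_related
    return (n_cancer_related - n_not_related) / n_total


def combination_profile(target_sets: list[set[str]], cancer_related: set[str]) -> dict:
    """Ac, comm and sp for a combination, pooling targets by set union (an assumption)."""
    pooled = set().union(*target_sets)
    comm = len(set.intersection(*target_sets))  # targets shared by every constituent
    ac = len(pooled & cancer_related)
    return {"Ac": ac, "comm": comm, "sp": specificity_score(ac, len(pooled))}


# Hypothetical target sets for a two-product combination.
profile = combination_profile(
    [{"t1", "t2", "t3"}, {"t3", "t4"}],
    cancer_related={"t1", "t3", "t4"},
)
print(profile)  # {'Ac': 3, 'comm': 1, 'sp': 0.5}
```

sp ranges from −1 (no cancer-related targets) to 1 (all targets cancer-related), so it rewards combinations whose extra targets are relevant rather than merely numerous.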
Combiscore was the statistic derived by combining Cscore and Dscore. In computing the scores, weights were associated with parameters to alter their relative importance. For example, predicted binding proteins related to AML were given more weight than proteins associated with cancers in general. For single natural products, the process began by computing the Combiscore for each of the 13 natural products. The mean Combiscore (1.36) was calculated, and natural products with a Combiscore greater than or equal to 1.36 were selected as potentially useful.
For pairs of natural products, binary combinations were considered under the constraint that each combination started from a selected singlet. A Combiscore was computed for each pair, and the pairs were ranked. The mean Combiscore was calculated, and finally the combinations with a Combiscore greater than or equal to the mean (4.25) were reported.
For sets of three natural products, the same process was followed, with the Combiscore threshold set at greater than or equal to the mean Combiscore (6.15). Higher Combiscores indicated better drug combinations.
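The stepwise singles, pairs, and triples search with a mean-score cutoff at each stage can be sketched as below. Two readings are assumptions: a pair "starts from a selected singlet" if at least one member was selected, and a triple extends a selected pair by one more product. The scoring function is a hypothetical stand-in, because the exact Combiscore formula is not reproduced here.

```python
from itertools import combinations
from math import comb
from statistics import mean


def keep_at_or_above_mean(scored: dict[frozenset, float]) -> dict[frozenset, float]:
    """Keep combinations whose score is >= the mean score of the candidates."""
    if not scored:
        return {}
    threshold = mean(scored.values())
    return {c: s for c, s in scored.items() if s >= threshold}


def stepwise_search(products: list[str], combiscore) -> tuple[dict, dict, dict]:
    """Singles -> pairs -> triples, each filtered at its own mean score."""
    singles = keep_at_or_above_mean({frozenset([p]): combiscore(frozenset([p])) for p in products})
    selected = {p for single in singles for p in single}
    # Pairs "starting from a selected singlet": at least one member was selected.
    pair_pool = {frozenset(pr) for pr in combinations(products, 2) if selected & set(pr)}
    pairs = keep_at_or_above_mean({c: combiscore(c) for c in pair_pool})
    # Triples: assumed to extend a selected pair by one more product.
    triple_pool = {pair | {p} for pair in pairs for p in products if p not in pair}
    triples = keep_at_or_above_mean({c: combiscore(c) for c in triple_pool})
    return singles, pairs, triples


# Toy additive score, for illustration only (the real Combiscore is not additive).
values = {"a": 4, "b": 3, "c": 1, "d": 0}
singles, pairs, triples = stepwise_search(list(values), lambda c: sum(values[p] for p in c))
print(sorted("".join(sorted(c)) for c in pairs))    # ['ab', 'ac']
print(sorted("".join(sorted(c)) for c in triples))  # ['abc', 'abd']
print(comb(13, 2) + comb(13, 3))                    # 364 candidate mixtures of 2-3 products
```

The last line reproduces the count of 364 possible 2- and 3-product mixtures of 13 products (78 pairs plus 286 triples). The staged filter scores only a fraction of these.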

Bioassays.
Ontomine analysis identified probable targets for each of the 13 natural products among proteins that were potential bioassay targets (Table 2). In total, 618 proteins were predicted to be bioassay targets at various confidence levels (157 at high confidence, 91 at medium confidence, and 370 at low confidence). Table 1 lists the four proteins previously reported to be bioassay targets of the natural products. Multiple (4-8) natural products were predicted to interact with the bioassay targets guanine nucleotide-binding protein alpha activating polypeptide O, multidrug resistance protein 1, myeloid/lymphoid or mixed-lineage leukemia protein, and runt-related transcription factor 1 isoform AML1c. Ontomine therefore identified a larger set of bioassay targets than had been previously reported. Those bioassay targets could contribute to the effects or side-effects of the natural products.

Reverse Docking.
Reverse docking was performed with structures of natural products docked against protein databases. Using the structures of the natural products to interrogate the database of protein structures identified 17 proteins predicted to bind natural products with a Ki of less than 1 μM. All 17 were predicted to be related to AML by network analysis. They were ALOX12, encoding arachidonate 12-lipoxygenase; AR, the androgen receptor; BCL2L1 and BCL2L2, the B-cell lymphoma 2-like proteins; CDC42, encoding cell division cycle 42 (GTP-binding protein, 25 kDa); DUSP3, dual specificity phosphatase 3; GSK3B, encoding glycogen synthase kinase 3 beta; IGF1R, the insulin-like growth factor 1 receptor; KLF5, Krüppel-like factor 5 (intestinal); MAPK1, mitogen-activated protein kinase 1; MMP14, matrix metallopeptidase 14 (membrane-inserted); NFKB1, encoding nuclear factor of kappa light polypeptide gene enhancer in B cells 1; RHOA, ras homolog gene family, member A; RUNX1, runt-related transcription factor 1; SMAD3, mothers against decapentaplegic homolog family member 3; and STAT1 and STAT3, signal transducer and activator of transcription 1 (91 kDa) and 3 (acute-phase response factor), respectively. In Figure 1, three of these proteins (STAT3, MAPK1, IGF1R) were found within a small subnetwork of just fifteen interacting proteins. Further, two of the proteins were involved in B-cell regulation (BCL-2, NFKB1).
In total, reverse docking of the 13 natural products and 17 drugs against ∼1,060 protein structures yielded 92 targets with inhibition constants (Ki) < 10 μM. The reverse docking results were then analyzed in conjunction with the Ontomine results. The final set included targets that either belonged to the reverse docking subset or were identified as important targets by Ontomine. A total of 95 targets were finally identified (Supplemental Table 3).

Docking.
Among the 95 binding proteins identified by reverse docking, protein structures suitable for docking were available for 59 targets. These 59 structures were used for docking against the 14 ligand structures under consideration, along with the benchmark ligands. The docking parameters were the same as those used for reverse docking.
Among the significant ligand-protein binding interactions, aloe emodin was predicted to bind to human 15-hydroxyprostaglandin dehydrogenase type 1 (2GDZ; Figure 2). Aloe emodin was predicted to bind at the center of the structure, surrounded by amino acid residues from the active site of 2GDZ, where it forms four hydrogen bonds with active-site residues. These hydrogen bonds were predicted to stabilize the ligand-protein complex. Human 15-hydroxyprostaglandin dehydrogenase type 1 has been reported to be elevated in abundance and activity in AML cell lines [7].

Combinations of Natural Products and Drugs for AML Treatments.
The Cscore, Dscore, and Combiscore were all significant for aloe emodin, chrysin, honokiol, mevinolin, resveratrol, L-ascorbic acid 6-palmitate, and cholecalciferol (Table 3(a)). Considering pairs of natural products with predicted synergistic interactions, 25 combinations were found (Table 3(b)). Twelve pairs included aloe emodin, suggesting that this natural product would work well in many mixtures. There were 7 pairs that included honokiol and 6 that included chrysin. Even natural products not predicted to be effective alone, like limonene, could be effective in paired mixtures.
Considering sets of three natural products with predicted synergistic interactions, 15 combinations were found (Table 3(b)). Eleven combinations included aloe emodin, again suggesting that this natural product would work well in many mixtures. There were 5 mixtures that included honokiol but only 3 that included chrysin. Again, natural products not predicted to be effective alone, like limonene, could be effective in multiple mixtures (Figure 3).

Discussion
Analysis of the 13 natural products suggested that some interactions might be useful and novel. Target profiling by Ontomine, docking, and gene-expression network analysis indicated that the natural products were predicted to bind proteins that were important targets for cancer treatments. Pathway analyses indicated statistical overrepresentation of cancer-related pathways among the drug targets of aloe emodin, cerulenin, chrysin, honokiol, mevinolin, and resveratrol.
Mixtures of natural products were predicted to be more effective than single products, as reported experimentally [7,8,26]. Improvements in predicted effectiveness for mixtures of natural products could be attributed to drug synergism arising from an increase in relevant targets or improved specificity of the drug constituents. This increase in predicted effectiveness was not likely to derive from random effects, such as pooling the results of individual drugs, since the scores used to rank drug combinations accounted for important factors such as target relevance to cancer/AML, specificity, and common targets among drug constituents.
The parallel analysis of benchmark drugs already on the market for AML treatment was important, since the efficacy of these drugs was supported by many publications [2,8,10,27]. Combination analysis of these drugs also recovered well-known combinations such as amonafide plus cytarabine and daunorubicin plus prednisone. That aloe emodin plus mevinolin plus honokiol was identified as the best combination has interesting clinical implications. Analysis of this mixture with respect to drug specificity, targeting of AML-related proteins, and targeting of cancer-related hubs will be a priority for future laboratory and clinical research. Several other combinations have the potential to treat drug-resistant cancers in the future. In future in silico analyses, more complex drug-combination search algorithms could be applied to incorporate dose, absorption, and excretion, as suggested in [27].
Conflict of Interests
The authors of the paper do not have any conflict of interests or any direct financial relation with the National CFIDS Foundation, 103 Aletha Road, Needham, MA 02492, USA.