New strategy for drug discovery by large-scale association analysis of molecular networks of different species

The development of modern omics technology has not significantly improved the efficiency of drug development; precise and targeted drug discovery remains an unsolved problem. Here a large-scale cross-species molecular network association (CSMNA) approach for targeted drug screening from natural sources is presented. The algorithm integrates molecular network omics data from humans and 267 plants and microbes, establishing the biological relationships between them and extracting evolutionarily convergent chemicals. This technique allows the researcher to assess targeted drugs for specific human diseases based on specific plant or microbe pathways. In a prospective validation, connections between the plant Halliwell-Asada (HA) cycle and the human Nrf2-ARE pathway were verified, and the manner by which the HA cycle molecules act on the human Nrf2-ARE pathway as antioxidants was determined. This shows the potential applicability of this approach in drug discovery. The current method integrates evolutionarily disparate species into chemico-biologically coherent circuits, suggesting a new cross-species omics analysis strategy for rational drug development.

Despite considerable progress in genome- and proteome-based high-throughput screening methods used for rational drug design, the pharmaceutical industry is not producing new drugs as rapidly as before 1,2 . The hope of the rapid translation of 'genes to drugs' has foundered on the reality that disease biology is complex, and that drug development must be driven by insights from biological responses 3 . Although the 'post-genome era' has resulted in a significant increase in the number of targets of therapeutic interest, most of these targets have no known small-molecule modulators 4 . There is thus an urgent need to select active molecules for specific human diseases from large libraries of chemical molecules.
"Omics" approaches to systems biology have led to the elucidation and analysis of multiple cellular networks representing transcriptional regulation, genetic interactions, protein-protein interactions, and metabolism 5,6 . Interestingly, the architecture of these molecular networks is significantly conserved across species during evolution, an insight that is being used to better define and understand mammalian molecular networks based on homology with their better-defined and better-studied counterparts in lower organisms 3 . Therefore, cross-species molecular network homologies may suggest key conserved areas that can be exploited for chemical intervention 3 . For example, previous studies indicated that there are many evolutionarily conserved and functionally convergent molecular subnetworks between humans and model organisms (e.g., mouse, yeast, etc.) [7][8][9] , some of which are disease-related and can be targeted by many bioactive compounds 10 . In particular, numerous molecules from disparate species such as plants and microorganisms can improve human health and treat various diseases by regulating human signaling networks 11 . Thus, it has been hypothesized that different species have evolved conserved modules with similar biological functions and that chemicals may affect them similarly [12][13][14][15][16] .
For example, a comparative genome analysis revealed that 70% of human cancer genes have orthologues in Arabidopsis 17 , and plants and humans can generate structurally similar fatty acid oxidation products in response to wounding (e.g., jasmonic acid in plants and prostaglandins in humans) 15 . However, the underlying mechanisms of these phenomena are unclear 16 . Therefore, determining chemical and biological network associations among different species may help clarify the mechanisms of action of natural products (NPs) and aid the discovery of new chemical entities from an NP library.
Here we present a cross-species molecular network association (CSMNA) profile to define chemical and biological connections between humans and 267 other species of plants, fungi and bacteria, and to identify pharmacologically active chemicals from a massive collection of NPs. CSMNA is based on the hypothesis that humans and plants/microbes have similar signaling chemicals and biochemical networks in specific functional categories, so that molecules produced by different species may overlap functionally (Fig. 1). This technique can be used for targeted screening of bioactive chemicals that regulate certain human pathways by identifying chemico-biologically associated pathways in a specific plant or microbe. CSMNA offers reliable prediction of novel pharmacologically active NPs. With systematic validation, we verified the chemical and biological relationships between the plant Halliwell-Asada (HA) cycle and the human Nrf2-ARE pathway, and showed how HA cycle-related natural chemicals modulate the human Nrf2-ARE pathway against oxidative damage.

Results
Molecular networks of disparate evolutionary species are structurally and functionally related. Downstream of the transcriptome and proteome, the metabolome is closely related to diversified biological functions 18,19 , and is thus well suited to establishing the chemico-biological association of different species 20 . We manually collected 13,109 functional metabolic network modules for human and 267 other species including 50 plants, 110 fungi, and 107 bacteria 21 (see §Materials and Methods, Supplementary Table S1). The obtained modules represent tight metabolic units involving 1 to 29 biological reactions (Supplementary Fig. S1).

Figure 1. An illustration of the hypothesis that the intrinsic chemico-biological link between humans and plants/microbes explains why natural products have pharmacological functions. Natural products (NPs) are produced by specific metabolic pathways of plants/microbes, and some of them can be used as drugs to modulate human molecular networks. We hypothesize that these NPs can serve as drugs because their corresponding biosynthesis pathways are chemico-biologically associated with the human molecular networks. Here, a module chemico-biological similarity (MChS) is defined to evaluate the degree of association between the human and plant/microbe modules. The similarity between NPs and drugs was assessed with both chemical structure similarity and pharmacological similarity.

To search for the chemico-biologically related modules in other species for each human module, we calculated the module chemico-biological similarity (MChS) by integrating both metabolic reaction similarities and network topology similarities. This approach can reduce bias caused by genome divergence among different species, and reveal deep inner biological relationships 22,23 . A normalization step is then applied to eliminate the influence of module size on MChS scoring ( §Materials and Methods).
The resulting cross-species MChS matrix represents the association degree of modules between humans and other species (Fig. 2a). Interestingly, modules from different species that carry out the same biological processes tend to cluster together. For example, the "glutathione biosynthesis" modules of 135 plant/microbe species are located in one group (Supplementary Fig. S2). The average MChS score for each module (Fig. 2b) shows that M50, guanine ribonucleotide biosynthesis, has the highest module similarity in both humans and plants/microbes. In addition, human M52 (pyrimidine ribonucleotide biosynthesis) and plant M28 (ornithine biosynthesis) also have high module similarity with other modules (Fig. 2b).
Next, we calculated the ratio of highly associated module pairs (MChS > 0.6) to all module pairs for every plant/microbe and human combination. Fungi have the closest association with humans compared with plants and bacteria (Fig. 2c). Average ratios between humans and plants, fungi, and bacteria are 0.0392, 0.0417, and 0.0392, respectively (P-value < 0.01, Student's t-test, Fig. 2c), suggesting that molecular networks are more evolutionarily conserved between fungi and humans. This result is consistent with previous phylogenetic analyses indicating that animals and fungi are closely related and plants constitute an independent evolutionary lineage 24,25 . This result also suggests that CSMNA can extract evolutionary relationships among organisms from different kingdoms, supporting our hypothesis that there are conserved molecular modules that provide convergent structures and functions in different species.
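The ratio used in this comparison can be illustrated with a minimal sketch. The function name and the toy scores below are hypothetical and are not taken from the study; only the MChS > 0.6 cut-off comes from the text.

```python
def high_association_ratio(mchs_scores, threshold=0.6):
    """Fraction of module pairs whose MChS score exceeds the threshold."""
    scores = list(mchs_scores)
    if not scores:
        raise ValueError("no module pairs supplied")
    return sum(s > threshold for s in scores) / len(scores)


# Toy MChS scores for all module pairs of one hypothetical
# human/species combination (illustrative values only).
toy_scores = [0.12, 0.65, 0.30, 0.71, 0.05, 0.44, 0.61, 0.20]
print(high_association_ratio(toy_scores))
```

Computing this ratio once per species and then averaging within plants, fungi, and bacteria would give the kind of per-kingdom averages compared in Fig. 2c.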

Natural chemical molecules from disparate evolutionary species show similar biological functions.
To determine which plant/microbe modules produce or are involved in producing the natural chemicals, we performed attribution analysis of these molecules based on existing biosynthesis information ( §Materials and Methods). The most available 2,067 NPs were assigned to the 13,023 plant/microbe modules, each of which contained 8 to 403 NPs ( Supplementary Fig. S3). Similarly, all 1,713 drugs were mapped to 76 human metabolic modules ( Supplementary Fig. S3) based on the available drug-target interactions 26,27 ( §Materials and Methods).
We first assessed chemical similarity between the NP and drug sets to uncover potential connections between their respective modules. A weighted-ensemble similarity approach was employed to calculate the overall similarity for any two sets of small molecules 28 ( §Materials and Methods). We observed that many chemical molecules corresponding to modules from humans and other species were chemically similar (P-value < 0.01) (Fig. 3a, Supplementary Fig. S4). Drug-like NPs tend to be concentrated in specific modules of a plant or microbe rather than distributed randomly (Fig. 3b). For example, 4 of 57 NP sets in Ricinus communis are chemically similar (P-value < 0.01) to drug sets targeting the 'glycosaminoglycan biosynthesis (linkage tetrasaccharide)' module in humans, and these NPs are mainly related to 'nucleotide metabolism' modules (Supplementary Fig. S4). In addition, different species have different abilities to produce bioactive NPs. For example, plants, fungi and bacteria have an average of 60, 25, and 9 drug-like NP sets, respectively, suggesting that plant NPs are the greatest potential source of new drugs (Fig. 3c,d).
These data indicate a close relationship between NPs and drug molecules in two respects. Chemical molecules produced by a module of a particular plant or microorganism can be closely related to drug molecules interacting with a specific human module: the two sets of molecules are structurally similar and are pharmacologically and functionally consistent. These findings again suggest that natural chemical molecules with functionally convergent characteristics across evolutionarily disparate species may perform similar biological functions.

New drug discovery based on the chemico-biological association between different species.
To investigate whether the chemical similarity between NP and drug sets can be attributed to the chemico-biological association between modules of human and plants/microbes, a box plot analysis was first performed on the two score sets (Fig. 4a). As MChS scores increased from 0.1 to 0.6, the chemical similarity of NP and drug sets did not change significantly. However, the chemical similarity increased dramatically, by 10⁴-fold, as the MChS score increased from 0.6 to 0.8. We calculated the ratio of the number of highly related module pairs (MChS ≥ 0.6) that have similar chemicals to the number of all highly related module pairs (MChS ≥ 0.6). A hypergeometric test was applied to estimate the probability of observing, by chance, module pairs with both MChS ≥ 0.6 and highly similar chemicals. We found that 37% of highly related human-plant/microbe module pairs have similar chemical sets (P-value ≪ 0.01, hypergeometric test). Further, we calculated the correlation coefficient between the chemical similarity and MChS score and found that the chemical similarity between NP and drug sets was highly correlated with MChS (R² = 0.9, P-value < 0.01, Fig. 4b). Thus, for plant/microbe modules that are highly chemico-biologically associated with a certain human module, their associated NPs can interact with the human module.
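A hypergeometric test of this kind can be sketched as follows. The mapping of quantities onto the test is an assumption made for illustration (population: all module pairs; successes: pairs with significantly similar chemical sets; draws: pairs with MChS ≥ 0.6), and the counts are toy values, not the study's data.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are taken without replacement from a
    population containing `successes` items of interest."""
    lower = max(k, 0)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lower, upper + 1)
    )
    return favourable / comb(population, draws)


# Toy counts (hypothetical): 1,000 module pairs in total, 100 with similar
# chemical sets, 50 with MChS >= 0.6, of which 19 also have similar chemicals.
p_value = hypergeom_upper_tail(k=19, population=1000, successes=100, draws=50)
print(f"P(X >= 19) = {p_value:.3e}")
```

A small upper-tail probability indicates that the overlap between highly associated pairs and chemically similar pairs is larger than expected by chance.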
Based on this, we then searched for NPs that can be used to treat a particular disease. In total, 12,007 module pairs consisting of 43 human modules and 5,000 non-human modules were extracted with MChS score > 0.6 and with significant chemical similarity (P-value < 0.01) (Fig. 4c). Relationships between modules and diseases were obtained from disease-gene associations ( §Materials and Methods). The 2,463 NPs were mapped to 387 human disease entities through 43 human functional modules. These NPs mainly included alkaloids (31%), terpenoids (17%), and flavonoids (9%) (Fig. 4d). As shown in Fig. 4e, human modules were associated with 21 disease types. For instance, neoplasms (C04) were associated with 6 different molecular modules/networks. Interestingly, modules of carbohydrate metabolism were connected to the largest number of NP categories and were involved in the most disease types. For example, the 233 NPs produced by the "glycolysis" module of Thiobacillus denitrificans were enriched in N07X (nervous system drugs). Associated human modules have been confirmed to involve neurological symptoms as well 30 .
These data suggest that chemical molecules corresponding to chemically and biologically conserved molecular networks between different species are highly related. Investigation into the relationships of molecular networks between humans and other species and natural molecules corresponding to molecular networks of other species will assist us with drug discovery for compounds in great demand.
Natural products related to the plant HA cycle can modulate the human Nrf2-ARE module. To verify the reliability of this method, we selected a representative human module, Nrf2-ARE, and its associated plant module, the HA cycle, which have a strong chemico-biological association (MChS score = 0.61) and a pharmacological relationship (P-value = 4 × 10⁻³) (Fig. 5a). The HA cycle is the main metabolic pathway in plants for resisting oxidative damage caused by various abiotic and biotic stresses via hydrogen peroxide (H₂O₂) detoxification 31 . The human Nrf2 module, in turn, is needed to protect cells from oxidative stress, and dysfunction of this pathway is noted in cancers, neurodegenerative diseases and cardiovascular diseases 32 .
Surveying the literature, we collected 155 NPs whose synthases are regulated by the HA cycle (Supplementary  Table S3). Structural similarity between chemicals related to the plant HA cycle and the human Nrf2-ARE module is significantly greater than random (P-value < 0.01, Fig. 5b). To investigate which of these NPs can directly target the human Nrf2-ARE pathway, we predicted multiple ligand-target interactions for the 24 proteins involved in the Nrf2 pathway using systems drug targeting (SysDT) 33 and weighted ensemble similarity (WES) 28 . This process revealed 16 druggable proteins interacting with 95 HA cycle-related NPs (Supplementary Table S4). Then a cellular thermal shift assay (CETSA) was established to validate the 41 drug-target interactions between 10 hub  Table S5) 34 . We observed that 24% (10/41) of the predicted drug-target interactions were valid (Supplementary Fig. S5).
To determine whether resveratrol and α-viniferin can modulate the Nrf2-ARE pathway against oxidative damage in K562 cells, intracellular glutathione (GSH) and reactive oxygen species (ROS) were measured with CMF and DCF fluorescence; both increased with increasing resveratrol in a dose-dependent manner (Fig. 5e). Although pre-treatment with N-acetylcysteine (NAC) or SOD for 12 h increased intracellular GSH by 360% and 250%, respectively, resveratrol increased GSH significantly further, by 8- or 9-fold. In contrast, resveratrol reduced damage from treatment of cells with BSO (an inhibitor of GCL), as demonstrated by GSH content variations from 90 to 230% between resveratrol-treated and untreated samples (Fig. 5e). Similarly, α-viniferin also altered intracellular GSH and ROS (Fig. 5e). Thus, NPs related to the plant HA cycle can modulate the Nrf2-ARE pathway in human K562 cells.
Taken together, the plant HA cycle and the human Nrf2-ARE pathway may share similar anti-oxidant mechanisms, which may explain why NPs related to the HA cycle can modulate the human Nrf2-ARE pathway.

Discussion
Although vast amounts of biological data are available, there is still the matter of how to use them to promote and accelerate drug development. To develop new drugs, one can start from a disease and look for natural products that can be used as drugs to treat it, or can start from a compound and try to identify the disease that it can target by understanding its pharmacological activity. However, because of limited availability of disease models, the technical difficulties of compound extraction, and other related technical problems, investigators are largely working in a blind fashion during the drug development process. Numerous natural products derived from plants and microbes have been successfully developed into drugs, but it is not known why products from such remotely related species can both modulate human physiological networks and benefit human health.
Herein, we established the chemico-biological associations among molecular networks of humans and 267 plants/microbes for targeted drug screening from natural sources. Approximately 1,400 functional module pairs were identified as being significantly associated, and these associated pathway pairs have significant biochemical and functional similarities, chiefly involving carbohydrate, lipid, nucleotide, and amino acid metabolism in different species. Of the human pathways, most were attributed to carbohydrate metabolism (50%), nucleotide metabolism (25%), and amino-acid metabolism (20%), which are implicated in cardiovascular diseases 37 , cancer 38 , and obesity 39 . Likewise, many plants have a myriad of primary/secondary metabolites related to these metabolic processes, and these NPs mediate plant defenses against pathogens and stress through signaling and inducing production of "pathogenesis-related" proteins 40 .
By mapping NPs and drugs into functional modules, we further found correlations between pharmacological relevance of NP and drugs and functional associations between humans and other species. This analysis suggests that many NPs (based on CSMNA profiles) may be promising candidates for treating a variety of diseases. These data also confirm that CSMNA, as a systems-based approach, can dissect complex regulatory circuits that govern secondary metabolism responsible for producing specific bioactive substances.
In addition to providing information about pathways associated with the synthesis of NPs, CSMNA can be used to explore drug mechanisms of action and target proteins of associated human disease networks, offering a direct cross-species associated molecular network to identify compounds with the most novel target selectivity. The greatest benefit of this method is the targeting of specific NPs and search for molecular pathways to infer possible treatments for the human body as we elucidate NP activity and pharmacological functions.
Specifically, we analyzed both the human Nrf2-ARE pathway and the plant HA cycle to test whether HA-related NPs could target the human Nrf2-ARE pathway. Biologically, the network topology architecture of the plant HA cycle is similar to that of the human Nrf2-ARE module; enzymes involved in the human Nrf2-ARE pathway share high homology and GO function similarities with plant HA cycle enzymes. Chemically, nearly two-thirds of stress-induced compounds (including both substrates and products) in the plant HA cycle also overlapped or had high structural and functional similarities with known human Nrf2-ARE pathway activators/inhibitors. In addition, and most importantly, 76% of HA-related NPs could target the enzymes involved in the human Nrf2-ARE pathway. Finally, evidence of target validation indicates that plant NPs such as sulforaphane, curcumin, epigallocatechin-3-gallate, resveratrol, cafestol, and kahweol can also target the human Nrf2-ARE pathway and may treat diseases. For example, the human Nrf2-ARE target short transient receptor potential channel 3, which is highly expressed in the skin, tongue, and nose 41 , is also a target for plant-derived aromatic agents for skin sensitization such as eugenol 42 , which is produced by the plant HA cycle under UV stress.
The limitation of CSMNA in discovering new drugs is that it cannot capture all scenarios of drug action, but only chemicals with conserved biological functions between humans and other species. It is true that many naturally derived drugs exert a similar function in both humans and their original plants/microbes. For example, vincristine, an alkaloid produced by Catharanthus roseus plants in response to stress conditions such as salinity and UV-B light, has proven effective in the treatment of leukemia and lymphoma 43,44 . Nevertheless, there are many other drugs whose mechanisms of action do not follow this scenario. Therefore, the scenario-specific applicability of CSMNA lies in discovering potentially effective NPs from plants/microbes whose molecular networks are associated with those of humans.
In summary, CSMNA offers a new way to discover effective NPs from certain living organisms. At present, however, the method is still limited by insufficient data. With more data available in the future, CSMNA will provide a new opportunity for synthetic biologists to manipulate and re-engineer the molecular networks in plants/microbes to yield novel NPs and pharmaceuticals for human health benefit. In addition, given that traditional Chinese medicine (TCM) is a great resource of bioactive NPs, CSMNA will also help us elucidate the essence of TCM biological effects from an evolutionary and ecological perspective and facilitate the modernization of TCM.

Methods
Mining and compiling functional modules. Data for 666 biological modules from humans, plants, fungi, and bacteria were downloaded from the KEGG MODULE database 21 . This biological module represents a tight functional unit of molecules that generally correspond to a specific function in the KEGG pathway map. There are four types of KEGG modules: pathway modules, structural complexes, functional sets, and signature modules. We only used data for pathway modules because the others did not include metabolic reactions. In addition, we collected from the literature pathway modules that are absent from the database. The final dataset comprises 13,109 metabolic modules from human and 267 other species. Each module is described by a combination of enzyme ortholog entries (identified by K number in the KEGG Orthology (KO) database) and a set of biochemical reaction entries (identified by R number in the KEGG REACTION database). For enzymes without species annotations in the KEGG database, we added species information from other databases or from the literature.

Chemico-biological association between modules of human and plants/microbes. To find chemico-biological associations between modules of human and plants/microbes, we calculated the homological similarity between two metabolic pathway modules via the SUBMAP model 23 . Given two metabolic pathways P and P′ and an upper bound k on the size of the connected subnetworks, their homological similarity was taken as that of the consistent mapping between the subnetworks of P and P′ with the maximum similarity. This process was then transformed to an eigenvalue problem. The solution to this eigenvalue problem produced a good integration of metabolic reaction and topological similarities of the subnetworks.
For the metabolic reaction similarity of subnetworks, we first constructed three sets for each of the two reaction sets ($R_i$ and $R_j$): the input compounds ($I_i$), the output compounds ($O_i$), and the enzymes ($E_i$) of the reactions in each subnetwork $R_i$. The similarity of reaction sets $R_i$ and $R_j$ was then defined in terms of $W(A, B, SimX)$, which represents the similarity between two sets A and B with respect to the similarity score SimX (SimE or SimC), where W is calculated as the sum of the similarities of the pairs returned by their maximum weight bipartite matching. SimE and SimC are two well-known measures: information-content similarity for enzyme pairs and SIMCOMP for compound pairs, respectively. The topological similarities of the subnetworks were calculated by following the intuition of IsoRank that if the subnetwork $R_i$ is mapped to $R_j$, their neighbors in the corresponding pathways should also be similar. The topological similarity was then used to favor mappings of subnetworks that induce similar topologies. The neighborhood definition of reactions was first expanded to reaction subnetworks. Then, the notion of support was generalized to include subnetwork mappings.
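The quantity W can be made concrete with a small sketch. The brute-force search below is only practical for small sets and stands in for a proper maximum weight bipartite matching algorithm; the function name, the compound labels, and the similarity table are hypothetical. How W over enzymes, inputs, and outputs is weighted into a single reaction-set similarity is not specified here and is left out.

```python
from itertools import permutations


def max_weight_matching_score(set_a, set_b, sim):
    """W(A, B, Sim): total similarity of the best one-to-one pairing of A and B.

    Brute force over all pairings, so only suitable for small sets.
    """
    a, b = list(set_a), list(set_b)
    if len(a) > len(b):
        # Always enumerate assignments of the smaller set into the larger one.
        a, b = b, a
        pair_sim = lambda x, y: sim(y, x)
    else:
        pair_sim = sim
    best = 0.0
    for candidate in permutations(b, len(a)):
        total = sum(pair_sim(x, y) for x, y in zip(a, candidate))
        best = max(best, total)
    return best


# Hypothetical compound similarities (a stand-in for SIMCOMP scores).
TOY_SIM = {
    ("c1", "d1"): 0.9, ("c1", "d2"): 0.8,
    ("c2", "d1"): 0.7, ("c2", "d2"): 0.1,
}


def toy_sim(x, y):
    return TOY_SIM.get((x, y), 0.0)


print(max_weight_matching_score(["c1", "c2"], ["d1", "d2"], toy_sim))
```

In the toy table the greedy choice (c1 with d1) is not optimal; pairing c1 with d2 and c2 with d1 gives the larger total, which is why a matching rather than a per-element maximum is used.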
Using these similarity values, a vertex-weighted graph that connects conflicting mappings with an edge was constructed. The alignment problem was then transformed into finding the maximum weight independent set (MWIS) of this graph. A heuristic method was employed to solve the MWIS problem. The result of this method provided an alignment that has no conflicting pair of mappings (i.e., it is consistent). Finally, to eliminate the effects of module size on the association score, we normalized association scores by dividing them by the geometric mean of the scores obtained from aligning each module against itself.
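The two steps above can be sketched as follows. The greedy rule shown is one simple MWIS heuristic chosen for illustration and is not necessarily the heuristic used in the study; the function names and toy values are hypothetical.

```python
from math import sqrt


def greedy_mwis(weights, conflicts):
    """Greedy heuristic for a maximum weight independent set.

    weights   : dict mapping each candidate mapping to its similarity weight.
    conflicts : set of frozenset({u, v}) pairs that cannot both be selected.
    """
    chosen = []
    for v in sorted(weights, key=weights.get, reverse=True):
        if all(frozenset((v, u)) not in conflicts for u in chosen):
            chosen.append(v)
    return chosen


def normalized_score(raw_ab, self_a, self_b):
    """Divide a raw alignment score by the geometric mean of the two
    self-alignment scores, removing the effect of module size."""
    return raw_ab / sqrt(self_a * self_b)


# Toy example: mappings m1 and m2 conflict, m3 is compatible with both.
toy_weights = {"m1": 0.9, "m2": 0.8, "m3": 0.5}
toy_conflicts = {frozenset(("m1", "m2"))}
print(greedy_mwis(toy_weights, toy_conflicts))
print(normalized_score(3.0, 4.0, 9.0))
```

With this normalization a module aligned against itself scores exactly 1, so scores of modules with very different numbers of reactions become comparable.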
Drugs acting on human modules. Chemical, pharmaceutical, and biological information of drugs from DrugBank 26 , KEGG 21 , PubChem 45 and therapeutic target database (TTD) was retrieved 27 . In total, 1,713 drugs and related targets were collected; and drugs were then mapped to human biological modules if the drug targeted the enzymes in the module.
Associating NPs with plant/microbe modules. To determine plant/microbe module-related NPs, we first constructed a link between plant/microbe modules and NP biosynthesis pathways based on pathway annotation in KEGG MODULE databases. We then assigned NPs in the biosynthesis pathway to the modules.
Chemical structure similarity between NP and drug sets. To determine whether NPs assigned to a specific module in plants and microbes were structurally similar to drugs targeting human modules, the weighted ensemble similarity approach was used to calculate the overall similarity between any two compound sets 28 . This technique judges similarities between two sets of ligands even though they share no identical ligand. The parallel approach was demonstrated to be highly accurate in drug targeting and repositioning 28,46 .
Scientific Reports | 6:21872 | DOI: 10.1038/srep21872
The similarity between NP sets and drug sets was first approximated by summing the pairwise molecule similarity scores across the ligand sets using CDK fingerprints. Given two sets of compounds, $C = (c_1, c_2, \ldots, c_n)$ and $C' = (c'_1, c'_2, \ldots, c'_m)$, we first define a raw similarity score between them, $rs(C, C')$, as the sum of the pairwise molecule similarity scores across the two sets:

$$rs(C, C') = \sum_{i=1}^{n} \sum_{j=1}^{m} s_{ij}$$

where $s_{ij}$ is the similarity between $c_i$ and $c'_j$, and a pair contributes to the similarity of the two compound sets only when its $s_{ij}$ passes the similarity threshold $S_{cut}$. However, $rs(C, C')$ may be biased by the product of the compound set sizes ($ps(C, C') = n \times m$). To correct for this bias, we then convert the raw score to a size-bias-free z-score using the mean and standard deviation of raw scores modeled from sets of random molecules:

$$z(C, C') = \frac{rs(C, C') - \mu(ps(C, C'))}{\sigma(ps(C, C'))}$$

where $\mu(ps(C, C'))$ and $\sigma(ps(C, C'))$ represent the expected mean and standard deviation of 50 raw scores of random compound set pairs with product size $ps(C, C')$.
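The raw score and its z-score conversion can be sketched in a few lines. Fingerprints are represented here as sets of on-bit indices and compared with the Tanimoto coefficient, which is an assumption made for illustration; the function names, the inclusive threshold test, and the toy fingerprints are likewise hypothetical, and the background mean and standard deviation are simply supplied as numbers.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def raw_score(set_c, set_c_prime, s_cut):
    """Sum of pairwise similarities that reach the threshold s_cut."""
    total = 0.0
    for fp_a in set_c:
        for fp_b in set_c_prime:
            s = tanimoto(fp_a, fp_b)
            if s >= s_cut:  # assumed inclusive cut-off
                total += s
    return total


def z_score(rs, mu, sigma):
    """Size-corrected score given the random-background mean and std."""
    return (rs - mu) / sigma


# Toy fingerprints (hypothetical on-bit indices).
np_set = [{1, 2, 3}, {4, 5}]
drug_set = [{1, 2, 3, 4}, {7}]
rs = raw_score(np_set, drug_set, s_cut=0.51)
print(rs, z_score(rs, mu=0.25, sigma=0.25))
```

Only the first pair of toy fingerprints (similarity 0.75) passes the threshold, so weakly similar pairs do not inflate the raw score of large sets.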
The z-score procedure is as follows: (1) For one product size, construct 50 random compound set pairs and calculate the mean and standard deviation of the raw scores at different similarity thresholds ($S_{cut}$), where $0 \le S_{cut} < 1$ with a step size of 0.01; (2) Sample at least 5,000 product sizes from the range 1 (1 × 1) to 90,000 (300 × 300), where 1 and 300 are the minimum and maximum set sizes, and repeat step (1) 5,000 times, so that for each $S_{cut}$ we obtain 5,000 values of μ and σ at different product sizes; (3) For each $S_{cut}$, plot all μ and σ against the product size $x$ and fit $y_\mu = \alpha_1 x + \beta_1$ and $y_\sigma = \alpha_2 x^{r} + \beta_2$ to determine the equations of μ and σ, respectively; using the fitted $y_\mu$ and $y_\sigma$, transform all random raw scores to random z-scores with the z-score formula; construct a histogram of these z-scores and fit it to an extreme value type I distribution (EVD). Specifically, the threshold is chosen so that the scores fit an extreme value distribution, which also forms the basis of the BLAST algorithm for sequence similarity; (4) Choose the $S_{cut}$ such that the histogram best fits an EVD based on goodness of fit. $S_{cut}$ equals 0.51 here.
Finally, we expressed the chemical similarity score between two sets as the probability of a given z-score being higher than that obtained from random data (P-value).
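A minimal sketch of this last conversion is given below, assuming the random z-scores follow a standardized extreme value type I (Gumbel) distribution with mean 0 and variance 1. Under that assumption the scale is √6/π and the location is −γ√6/π, where γ is the Euler-Mascheroni constant. The function name is hypothetical, and the study's fitted distribution parameters may differ.

```python
from math import exp, expm1, pi, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def evd_p_value(z):
    """P(Z > z) for a Gumbel distribution with mean 0 and variance 1."""
    exponent = -z * pi / sqrt(6.0) - EULER_GAMMA
    if exponent > 700.0:
        # Guard against overflow for very negative z; the tail is 1 there.
        return 1.0
    # 1 - exp(-x), computed stably for small x.
    return -expm1(-exp(exponent))


for z in (0.0, 2.0, 5.0, 10.0):
    print(z, evd_p_value(z))
```

Because the extreme value distribution has a heavier upper tail than a normal distribution, the same z-score maps to a larger, more conservative P-value.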

Anatomical therapeutic chemical (ATC) code prediction for NPs. The Anatomical Therapeutic Chemical (ATC) Classification System established by the WHO is used for the classification of drugs. This pharmaceutical coding system categorizes drugs according to the organ or system on which they act and their therapeutic, pharmacological and chemical characteristics. To predict the ATC codes for NPs, we applied a similarity-based predictor of ATC codes known as SPACE 47 , which was designed to predict drug-ATC class (ATC code) associations. SPACE uses a logistic regression framework to integrate multiple heterogeneous data sources, including chemical structures, target proteins, side-effects, drug-induced gene expression, and chemical-chemical associations, to construct the prediction model. It is suitable for ATC code prediction of new compounds with structural information only. For each compound, SPACE offers predicted candidate ATC codes with a score measuring the probability of the compound-ATC code association. Compounds with ATC codes whose probability scores were > 0.8 were selected.
Pharmacological similarity between NP and drug sets. We calculated the ATC similarity between the NP and drug sets to evaluate their pharmacological similarity. Considering the hierarchical structure of ATC codes, we calculated the similarity scores between NPs and drugs using the semantic similarity algorithm 48 . Specifically, the similarity between two ATC codes was calculated from $d(t_i, t_j)$, the shortest distance between ATC codes $t_i$ and $t_j$ in the hierarchical structure of the ATC classification system, and a pre-defined parameter γ (set to 0.25 in this study) 48 . The ATC similarity between two compounds was then derived from the similarities of their ATC codes, and the ATC similarity between any two compound sets was calculated using the weighted-ensemble similarity approach described above. The similarity score of ATC codes with the same first class code was set as the threshold.
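One way such a hierarchy-based similarity can work is sketched below. The exponential decay `exp(-γ·d)` and the use of the best-matching code pair for two compounds are assumptions made for illustration, since the exact formulas are not reproduced here; the function names are hypothetical. The five ATC levels correspond to code prefixes of 1, 3, 4, 5, and 7 characters.

```python
from math import exp

ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)  # prefix lengths of the five ATC levels


def atc_path(code):
    """Ancestors of an ATC code from level 1 down to the code itself."""
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


def atc_distance(code_i, code_j):
    """Shortest path length between two codes in the ATC tree."""
    path_i, path_j = atc_path(code_i), atc_path(code_j)
    shared = 0
    for a, b in zip(path_i, path_j):
        if a != b:
            break
        shared += 1
    return (len(path_i) - shared) + (len(path_j) - shared)


def atc_similarity(code_i, code_j, gamma=0.25):
    """Assumed form: similarity decays exponentially with tree distance."""
    return exp(-gamma * atc_distance(code_i, code_j))


def compound_similarity(codes_a, codes_b, gamma=0.25):
    """Assumed form: best similarity over all ATC code pairs of two compounds."""
    return max(atc_similarity(a, b, gamma) for a in codes_a for b in codes_b)


# N02BA01 and N02BE01 share the first three ATC levels (N, N02, N02B).
print(atc_distance("N02BA01", "N02BE01"))
print(atc_similarity("N02BA01", "N02BE01"))
```

Under this assumed form, two fifth-level codes that share only their first-level class are eight steps apart, so their similarity is exp(−2).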
Associating human modules with diseases. To characterize module-disease associations, a comprehensive disease-associated gene dataset was built by collecting genes known to be associated with various diseases from the DisGeNET 49 and CTD 50 databases. Disease and gene information was mapped to MeSH and Entrez Gene ID, respectively, for normalization. Then, normalized disease-gene interactions were integrated and duplicate interactions were manually removed. Finally, we obtained 28,437 disease-gene interactions including 8,184 genes and 6,337 diseases. We next linked a disease-associated gene to a human module if that gene was a member of that module. Given the two sets of genes in a module and a disease, we counted the number of genes in each of them and the number of their overlapping genes. Then, we calculated the P-value for overrepresentation with Fisher's exact test and corrected the P-value for multiple testing. For modules related to diseases assigned to a specific disease category, we selected the minimum P-value to reflect the strength of the module-disease category relationship.
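The correction and category-level summary can be sketched as follows. The multiple-testing procedure is not named in the text, so the Benjamini-Hochberg adjustment used here is an assumption; the function names, P-values, and category labels are toy values for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def strongest_per_category(adjusted, categories):
    """Minimum adjusted P-value observed for each disease category."""
    best = {}
    for p, category in zip(adjusted, categories):
        best[category] = min(p, best.get(category, 1.0))
    return best


# Toy module-disease P-values and their (hypothetical) disease categories.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_categories = ["C04", "C04", "C10", "C10"]
toy_adjusted = benjamini_hochberg(toy_p)
print(toy_adjusted)
print(strongest_per_category(toy_adjusted, toy_categories))
```

Taking the minimum per category means that a single strongly enriched disease is enough to link a module to the whole category.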
Cell line and cell culture. The human leukemic K562 cell line was obtained from Cancer Cell Repository (Shanghai Cell Bank, Shanghai, China). The K562 cells were cultured in RPMI-1640 (Gibco-BRL; cat no. 31800-022) containing 10% fetal bovine serum (FBS; Israeli; cat no. 1413865) and 100 U/ml each of penicillin and streptomycin. Cells were grown and maintained at 37 °C in a 5% CO 2 humidified atmosphere.
Determination of intracellular ROS and GSH. Cells were treated with the indicated chemicals for 2, 4, and 8 h, washed with PBS, and incubated with 20 mM H2DCFDA (Ex/Em = 488 nm/525 nm) or 5 mM CMFDA (Ex/Em = 492 nm/517 nm) at 37 °C for 30 min. After incubation, the cells were washed twice with PBS, and the relative fluorescence intensity was then measured using flow cytometry. The ROS and GSH levels were calculated as the mean fluorescence intensity (MFI) per 1,000 non-necrotic cells.