A Modularity-Based Method Reveals Mixed Modules from Chemical-Gene Heterogeneous Network

For a multicomponent therapy, molecular network is essential to uncover its specific mode of action from a holistic perspective. The molecular system of a Traditional Chinese Medicine (TCM) formula can be represented by a 2-class heterogeneous network (2-HN), which typically includes chemical similarities, chemical-target interactions and gene interactions. An important premise of uncovering the molecular mechanism is to identify mixed modules from complex chemical-gene heterogeneous network of a TCM formula. We thus proposed a novel method (MixMod) based on mixed modularity to detect accurate mixed modules from 2-HNs. At first, we compared MixMod with Clauset-Newman-Moore algorithm (CNM), Markov Cluster algorithm (MCL), Infomap and Louvain on benchmark 2-HNs with known module structure. Results showed that MixMod was superior to other methods when 2-HNs had promiscuous module structure. Then these methods were tested on a real drug-target network, in which 88 disease clusters were regarded as real modules. MixMod could identify the most accurate mixed modules from the drug-target 2-HN (normalized mutual information 0.62 and classification accuracy 0.4524). In the end, MixMod was applied to the 2-HN of Buchang naoxintong capsule (BNC) and detected 49 mixed modules. By using enrichment analysis, we investigated five mixed modules that contained primary constituents of BNC intestinal absorption liquid. As a matter of fact, the findings of in vitro experiments using BNC intestinal absorption liquid were found to highly accord with previous analysis. Therefore, MixMod is an effective method to detect accurate mixed modules from chemical-gene heterogeneous networks and further uncover the molecular mechanism of multicomponent therapies, especially TCM formulae.

where n c is the number of modules in a given partition; module c is virtually separated into three submodules, c = (Ac) ∪ (Πc) ∪ (Bc), Ac is the submodule with links from E AA , Πc is from E AB and Bc is from E BB ; l Ac is the number of edges in submodule Ac, m A is the size of subnetwork G A , d Ac is the sum of degrees of all Ac nodes in G A ; k Πc is the sum of degrees of A nodes of Πc in subnetwork G Π , d Πc is the sum of degrees of B nodes of Πc in G Πc . With mixed modularity as the optimal function, the Louvain strategy is then adopted to search for an optimal partition. We name such a method MixMod for convenience. The procedure of MixMod method is illustrated in Figure 1. Most of MixMod is similar to the process of Louvain method [1], except a little change in the second phase. At first, each node in the 2-HN is assigned to individual modules. So, the initial partition consists of modules as many as the number of nodes. In the first phase, we calculate the gain of mixed modularity by removing node i from its module and by placing i in the module of its neighbor j. The node i is then placed in the module for which the gain is maximum and positive. If all gains are less than zero, we keep node i in its original module. Such process is performed repeatedly and sequentially for all nodes until no positive gain is achieved. In the second phase, a new network is constructed according to the partition in the first phase that reaches the local maxima of mixed modularity. If a module in previous partition includes nodes of both group A and group B, two differnt nodes are added into the new network to represent A nodes and B nodes in the module respectively. Thus these two nodes belong to a same module ( Figure 1). The weights of links in the new network are given by the sum of weights of links between two sets of nodes in the original network. It is obvious that this new network is also a 2-HN with nodes from A and B group. We use this new network as input and then repeat previous two phases until the mixed modularity is maximum. In the end, mixed modules could be identified corresponding to the final partition.

Benchmark evaluation
One important measure to evaluate the precision of benchmark tests is normalized mutual information (NMI). It quantifies the similarity between "real" and "found" modules from a given network. The normalized mutual information is calculated base on a confusion matrix, where the rows correspond to the "real" modules and the columns correspond to the "found" modules [2]. The NMI is defined as follows.
where the number of "real" modules is denoted c X and the number of "found" modules is denoted c Y ; N ij is the number of common nodes between the "real" module i and the "found" module j; N i· is the sum over row i and N ·j is the sum over column j. The NMI ranges from 0 to 1, and large NMI indicates a good partition similar to the "real" one.
Another measure to evaluate the performance of module detection methods is classification accuracy (CA). To calculate CA, we firstly find the best matching with the largest overlap between pairs of modules from "real" and "found" partition and then measure the faction of overlapping nodes over all nodes in a network [3,4]. The classification accuracy (CA) is defined as follows.
where n is the number of all nodes in a 2-HN; c X and c Y are the numbers of "real" modules and "found" modules respectively; f (·) is an one-to-one mapping between "real" and "found" partitions. The difficult part in calculating CA is to find the optimal matching f (·) with maximum overlaps. We use Kuhn-Munkres algorithm to solve this problem [5]. Thus, CA could be computed fast and quantify the similarity between detected modules and "real" ones.

Randomized drug-target network
Based on the drug-target network, we use a "self-contained" test to verify the hypothesis that disease clusters in the network partly correlates with its module structure. A "shuffle" process is performed on the drugtarget network to generate random 2-HNs with various module structure. Three subnetworks are shuffled independently and then combined into a random 2-HN. The procedure of "shuffle" operation is as follows.
For every link (u, v) in each subnetwork, we randomly choose a pair of nodes (u ′ , v ′ ) to replace (u, v), that is, to remove link (u, v) and add link (u ′ , v ′ ) with the same weight as (u, v). Thereinto, u ′ is randomly chosen from the neighbors of node v with a probability proportion to the weight of link (u ′ , v), and so is v ′ . Note that the weights of reduplicate links are added together. After the drug-target 2-HN is randomized, we evaluate the correlation between disease clusters and module structure of random 2-HNs by NMI and CA, because the module structure can be represented by detected modules using different methods mentioned above. Such process is repeated for 1000 times. We define P NMI as the fraction of random 2-HNs with larger NMIs than the original drug-target network with respect to disease clusters, and P CA as the fraction of random 2-HNs with larger CAs than the original. In previous test of four methods on real drug-target system, we made an assumption that disease clusters in the drug-target 2-HN correlated with its module structure to some extent. Here, we further verified this assumption using random test. Random 2-HNs were generated by shuffling links of the real drug-target network. NMI and CA were employed to estimate the correlation between true disease clusters and the module structure of random 2-HNs, since the module structure could be represented by detected modules using different methods. The random process was repeated for 1000 times. P NMI and P CA were p-values for this random test. We excluded CNM in random test since the implementation of CNM was unable to deal with link weights. P NMI of MCL and MixMod were smaller than 10 −3 . However, P NMI of Infomap and Louvain were not significant enough (larger than 0.05). On the other hand, P CA of four methods were all less than 0.05. Thus, the p-values were generally significant for the random test. As a consequence, we can conclude that disease clusters in the drug-target network partly correlated with its module structure (Table 1).

Correlation test
The detection of mixed modules from chemical-gene heterogeneous network was based on an assumption that chemicals with similar structures usually act on same group of genes, between which there are plenty of interactions. To verify this assumption, we investigated the correlation between chemical similarity and the number of target interactions. The structural similarity was calculated by the Tanimoto coefficient of fingerprints of two chemicals using OpenBabel toolkit [6]. The Tanimoto coefficient was defined as follows.
where c 1 and c 2 are two chemicals; a and b are the bit lengths of c 1 and c 2 fingerprints, respectively; and c is the number of common bits between c 1 and c 2 fingerprints. For a pair of chemicals, we then computed the number of interactions between targets of two chemicals. If two chemicals had identical targets, the interaction number was set to 2 for each identical target. For example, chemical c 1 had target TNF and c 2 also had target TNF, we specified that there were 2 interactions between TNF of c 1 and TNF of c 2 . Since different chemicals always regulated different number of targets, we normalized the interaction number by dividing the geometric mean of target numbers of two chemicals. Thus, the number of interactions between target sets of chemical c 1 and c 2 was defined as follows.
where T 1 is the target set of c 1 , t i is one of its targets; there is an interaction between t i and t j , otherwise 0; |T 1 | is the number of targets in set T 1 . For each pair of chemicals, we could compute the structural similarity and the number of interactions between them. We performed correlation test using this data to investigate whether similar chemicals act on same group of genes. We used two examples to verify previous assumption. From real drug-target 2-HN, 277 drugs and their targets were selected for correlation test. The interactions between targets were curated from HPRD, Bi-oGRID, and IntAct database. Statistic analysis indicated that there were more interactions between targets of two drugs if they had more similar structures (Spearman ρ=0.117, p-value<2.2e-16). Besides, we collected 95 chemicals and their associated genes from the 2-HN of BNC for correlation test. All interactions were extracted from HPRD, BioGRID and IntAct database. We could draw a same conclusion as previously stated (Spearman ρ=0.136, p-value<2.2e-16). Therefore, the assumption that chemicals with similar structures usually act on same group of genes was correct according to statistical analysis.

Gene network of BNC
Although the predicted results derived from the BNC 2-HN were generally proved by in vitro experiments, we performed more tests to show the advantage of using the 2-HN of BNC. At first, the whole set of gene targets associated with chemicals of BNC were employed to discover enriched GO terms using DAVID tool [7]. Top 40 enriched entries were listed in Table 2. Most of these GO terms were related to signal pathways concerning anti-apoptosis and oxidative stress. However, in vitro experiments using BNC intestinal absorption liquid showed that BNC could protect H9c2 cardiomyocytes by enhancing antioxidative ability, activating ERK1/2 signaling pathways, inhibiting signal transduction pathways related to apoptosis and increasing mitochondrial membrane potential [8]. We found that no terms concerning mitochondrial membrane permeability were ranked in top 40 (Table 2). It implied that the prediction using all gene targets of BNC was partially accurate and important information was omitted. In fact, the term GO:0051881 (regulation of mitochondrial membrane potential) were ranked 313 in the result of enrichment analysis.
On the other hand, we used gene network of BNC to investigate potential functions of the constituents of BNC. The gene network was composed of 981 gene targets of BNC and 3612 interactions between them. Since all nodes of this gene network belonged to single class, MixMod was not applicable to this situation. Hence, we employed MCL to identify significant gene modules. As a result, 130 gene modules were detected by MCL. The DAVID tool was then employed to analyze the enriched GO terms for modules with no less than 10 genes. After enrichment analysis, we could discover the potential functions of each chemical of BNC based on the overlapping between its gene targets and detected modules. A chemical was thought to regulate a gene module if most of its targets were found in the module. Thus, it may achieve certain functions by mediating the biological processes in which its targets participated. Since eight bioactive chemicals were identified in BNC intestinal absorption liquid, predicted results with regard to these chemicals were presented in Table 3.