Constructing module maps for integrated analysis of heterogeneous biological networks

Improved methods for integrated analysis of heterogeneous large-scale omic data are direly needed. Here, we take a network-based approach to this challenge. Given two networks, representing different types of gene interactions, we construct a map of linked modules, where modules are genes strongly connected in the first network and links represent strong inter-module connections in the second. We develop novel algorithms that considerably outperform prior art on simulated and real data from three distinct domains. First, by analyzing protein–protein interactions and negative genetic interactions in yeast, we discover epistatic relations among protein complexes. Second, we analyze protein–protein interactions and DNA damage-specific positive genetic interactions in yeast and reveal functional rewiring among protein complexes, suggesting novel mechanisms of DNA damage response. Finally, using transcriptomes of non–small-cell lung cancer patients, we analyze networks of global co-expression and disease-dependent differential co-expression and identify a sharp drop in correlation between two modules of immune activation processes, with possible microRNA control. Our study demonstrates that module maps are a powerful tool for deeper analysis of heterogeneous high-throughput omic data.

graphs contain an embedded module-map of 20 modules in a tree structure. In addition, random cliques and bicliques are embedded in the graphs. Module, clique, and biclique size is chosen uniformly at random between 10 and 20. Each edge is replaced by a non-edge with probability p, and vice versa. The results show that using k=5 gives better performance than k<5, and that k>5 does not improve performance. Running times are very similar for k>3. Based on these results, since we expect biological data to contain both large and small modules, we concluded that using k=5 gives a good balance of quality and considering small modules, and used it in subsequent analyses.

Initiators
We tested five different initiators. Three are based on previous methods and two are novel. The three extant initiators contained the DICER algorithm (23) and two clustering algorithms. We developed two additional initiators. The first is a modification of the DICER initiator, and is called DICER k . The second utilizes an exhaustive solver for the maximal biclique problem (24,25) and is called MBC-DICER.

The DICER k initiator
The DICER initiator (23) starts from a positive edge (u,v) in G, and defines two node sets (U,V), where U is the set of (high weight) neighbors of u in H, and V is the set of neighbors of v in H.
The goal is to remove nodes from U and V such that the resulting sets will constitute heavy subgraphs of H and the weight of edges between U and V will be high in G. A simple example is shown in Supplementary Figure 1. Nodes that appear both in U and V are removed. In the next step, nodes in U and V are removed if this improves the score of the module map link in G or the module scores in H.
DICER works greedily, by iteratively removing a "bad" node, that is, a node that either has a negative sum of edge weights in H with its own group, or has a negative sum of weights in G with the other group. The total score of a node is the sum of the two scores. Nodes for which both G and H scores are negative are removed first, followed by other bad nodes, sorted by their total score. The process ends when there are no bad nodes. The resulting node sets U' and V' are accepted as modules only if each of them contains at least k nodes. In that case the nodes of U' and V' are removed from the graphs, and the process is repeated until no new module pair is found. In the original DICER algorithm we used k=2. Here we used k=5, which provided better results on real and simulated data (see Results).

The MBC-DICER initiator
We now describe an alternative method for constructing initial node sets U and V. Define an unweighted graph G'= (V,E') with the same node set as G, and edge (u,v) E' if and only if W G (u,v)>0. Two disjoint node sets (U,V) are called fully connected or a biclique in G' if every u U is connected to every v V. A biclique (U,V) is maximal if it is not a proper subset of another biclique. We search for maximal bicliques in G' using an exhaustive solver (24,25), restricting the search to maximal bicliques (U,V) such that |U| k and |V| k. Each such pair (U,V) is then subjected to the node removal procedure.
Since the number of maximal bicliques can be exponential (24,25) we use only the first 50,000 discovered bicliques as candidates for the node removal stage of the DICER k algorithm. Let S be a heap that contains the current set of candidate bicliques. We select the biclique (U,V) in S of maximal size |U|+|V| as the next candidate. The node removal stage produces from (U,V) a module pair (U',V'). If the latter is accepted, we remove the nodes of U' and V' from G' and from all bicliques in S, and remove bicliques whose new size is less than 2k. When S is empty we try to run the solver again. If the solver fails to find additional bicliques then the process is terminated.

Clustering algorithms
We included in our tests two clustering algorithms. Both look for clusters in H and disregard information from G. The first is complete-linkage hierarchical clustering (26). The second, which we call NodeAddition, starts with all nodes as modules, and repeatedly adds a singleton (a module with a single node) to a module if the sum of edge weights between them is the largest among all singleton-module pairs. This process is repeated until no singletons remain or until the best sum is negative.

Proof of the guaranteed improvement during the iterations of the global improver
Notations: The input to the problem is a pair of networks H=(V,E,W H ) and G=(V,F,W G ) defined on the same set of vertices. These networks can be weighted or un-weighted. The goal is to find a module-map that summarizes both networks. While the asymptotic worst-case running time of this procedure is similar to performing a single merge at a time, we discovered that in practice many merge steps are performed per iteration. For example, in the lung cancer differential correlation analysis the maximal number of merges per iteration was 20, and the average was 4.

Comparison of scores for module links in the global improver
Our global improver used a statistical score to determine if two modules are linked.

Comparison to other weighted approaches
In our analysis in the main text we used un-weighted PPI and GI networks and included algorithms that are akin to previous methods (1)(2)(3)(4). Other extant methods make use of the probabilistic scores of each GI edge, and incorporate both positive and negative GIs (5,6).
Leiserson et al. (6,7) developed a method called Genecentric, which looks for locally maximum cuts in the GI graph. On the data of Collins et al (8), this method was reported to outperform other methods, including algorithms that integrate GI and PPI information (9,10). We compared the performance of our methods to Genecentric and the graph compression method of Kelley and Kingsford (5)

Differential correlation cross-validation analysis
Our tests on human data utilized expression profiles of lung cancer and Alzheimer's disease and matching controls in each dataset. The first tested dataset, GSE13255 (11), contained 256 peripheral blood mononuclear cells gene expression profiles of patients with non-small cell lung cancer (NSCLC, n=150) and controls (n=106). The second tested dataset, GSE15222 (12), contained 363 post mortem cortex gene expression profiles of Alzheimer's disease (AD) patients (n=176) and controls (n=187). Since the networks used in this analysis were completely different from these used in the yeast studies, we first re-evaluated the different algorithms on them, based on the ability to reveal major changes in co-expression between sick and healthy individuals.
We used the method of Amar et al. (4) to compute two log-likelihood ratio scores for each gene pair: the consistent correlation (CC) score is positive if the gene pair is consistently correlated across phenotypes, and the differential correlation (DC) score is positive if the correlation difference between the cases and controls is higher than expected by chance. These scores were then used as edge weights in networks H and G, respectively, on which a module map was learned.
Given a module map constructed on a set of profiles (the training set) and a disjoint set of samples (the test set), the quality of the map prediction was evaluated on the test set as follows.
For each pair of modules we calculated the absolute average DC between the modules on the test set data, and compared the DC values for links and non-links (i.e., two modules in the map that are not linked) using the Wilcoxon rank-sum test, where the null hypothesis is that there is no difference in DC between links and non-links. This measure is parameter-free and reflects all DC changes. As an additional test, in order to focus on major DC changes, we ignored all links with DC < 0.4, removed unlinked modules and calculated the proportion and number of remaining modules, links and the gene coverage. These parameters reflect the overall predictive quality of each reported map, and its ability to find strong DC signals. We used 2-fold cross-validation, that is, half of the data served as the training set, and the other half served as the test set. The process was repeated with the roles of test and training set switched and results were averaged.
An important parameter in calculating the DC LLR scores is the prior probability of real DC changes. In (4) a parameter K controlled this prior probability. Given a value of K, the prior probability was set such that only DC scores that are distant from the mean of the random distribution by at least K standard deviations (of the random distribution) will get a positive LLR score. Informally, this process guarantees that if the difference between the real and random distributions is minor, all LLR scores will be negative. In (4) a stringent approach was taken and the K parameter was set to 2. In this study we take a different, more direct approach to set the prior probability, using the following simple procedure: given a fixed threshold η>0 we set the prior to the maximal probability for which the LLR of η is negative. The intuition is that only DC of at least η receives a positive LLR score. Thus, unlike the K parameter, our approach is easily interpretable: we are guaranteed that absolute correlation changes lower than η will be assigned a non-positive LLR score. We used η= 0.4, which was equivalent to K2.3 on the tested datasets.
Thus, our criterion was even slightly more conservative than (4).
An important parameter of the global improver is α, which is used to determine if the link between two modules is significant. The full cross-validation results for α=1E-6 are shown in Supplementary Table 11 (NSCLC   data) and Supplementary Table 12 (AD data). The maps produced by the local improver received a very low p-value in the Wilcoxon rank-sum test between DC of map links and nonlinks, but suffered from low coverage. For example, for the MBC-DICER initiator, the local improver achieved a p-value of 4.43E-4 in the NSCLC data, and 3.31E-11 in the AD data.