Association study based on topological constraints of protein–protein interaction networks

The non-random interaction pattern of a protein–protein interaction network (PIN) is biologically informative, but its potential has not been fully utilized in omics studies. Here, we propose a network-permutation-based association study (NetPAS) method that gauges the observed interactions between two sets of genes by comparing the empirical network with permutation null models. This enables NetPAS to evaluate relationships, constrained by network topology, between gene sets related to different phenotypes. We demonstrated the utility of NetPAS on 50 well-curated gene sets and compared association studies based on Z-scores, modified Zʹ-scores, p-values and Jaccard indices. Using NetPAS, a weighted human disease network was generated from the association scores of 19 gene sets from OMIM. We also applied NetPAS to gene sets derived from gene ontology and pathway annotations and showed that NetPAS uncovered functional terms missed by DAVID and WebGestalt. Overall, we show that NetPAS can take topological constraints of molecular networks into account and offer perspectives that complement existing methods.


Results
Association Z and Zʹ-score of two gene sets.
As illustrated in Fig. 1, we can use Z-scores to evaluate the over- or under-representation of interactions between two gene sets A and B, where Set A is a group of genes (e.g., genes associated with colorectal cancer) and Set B is another group of genes (e.g., genes associated with breast cancer). The gene IDs for both sets are obtained from OMIM 35. The two sets share 3 genes. NetPAS first calculates the total number of edges (interactions) between Set A and Set B that appear in the original network, the human InWeb_IM PIN 8 used in the present work (Fig. 1b). Then, by comparing this number with the numbers of edges from null network models (one example is in Fig. 1d), a Z-score is calculated (see "Methods"). Between the two sets, 51 interactions are observed in the PIN (Fig. 1c), compared to 25.4 ± 4.8 in 10,000 null network models (one example is in Fig. 1e), yielding an association Z-score of (51 − 25.4)/4.8 = 5.3. In Fig. 1b, very few isolated interactions can be seen. In contrast, many isolated interactions can be seen in the example of a permuted null network model in Fig. 1d. The contrast suggests that, in the PIN, genes with a single interaction tend to interact with genes with more connections. Figure 1c illustrates the importance of topological constraints in association tests. Moreover, we tested the modified Zʹ-scores based on the interquartile range (IQR) from the null models; in this example, the median (25) and IQR (6) yield a Zʹ-score of 4.3 (Fig. 1).

Application of NetPAS in hallmark gene sets.
We used 50 hallmark gene sets from the molecular signature database (MSigDB) 36. These hallmark sets, each of which represents a well-defined biological process with coherent expression 37, can be considered "refined" benchmarks on top of the > 20,000 gene sets in MSigDB (version 7). The names and details of these hallmark sets are listed in Table S1 of the Supporting Information (SI). The gene set names can also be found in Fig. 2a, the boxplots of Z-score distributions of all hallmark sets. We calculated association Z-scores, one-tailed p-values and Jaccard indices (see "Methods") between all pairs of gene sets (including self-interactions). Figure 2b shows the heatmap of the association Z-scores calculated from all pairwise associations among the 50 hallmark gene sets using 10,000 MS02 null models compared with the original PPI. In this heatmap, a positive Z-score (red) indicates over-representation, whereas a negative Z-score (blue) indicates under-representation.
The Z-score approach (Eq. 2 in "Methods") has an implicit assumption that the interaction numbers from the null models follow the normal distribution. We found that most null distributions can pass the normality test, as shown in Figure S1a in SI. We also compared the modified Zʹ-scores (Eq. 3 in "Methods"). For the hallmark gene sets, the estimated Z-scores and Zʹ-scores are highly correlated, and a comparison of the heatmaps derived from both Z- and Zʹ-scores is presented in Figure S1b.
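The two scores can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the use of the sample standard deviation and of the default quartile method of `statistics.quantiles` are assumptions, and the Zʹ form (median and IQR in place of mean and standard deviation) is inferred from the worked example in Fig. 1.

```python
import math
import statistics


def z_score(e_pin, null_counts):
    """(observed - mean of null counts) / standard deviation of null counts."""
    mean = statistics.fmean(null_counts)
    sd = statistics.stdev(null_counts)  # sample sd; an assumption of this sketch
    return (e_pin - mean) / sd          # raises ZeroDivisionError if sd == 0


def z_prime_score(e_pin, null_counts):
    """(observed - median of null counts) / IQR of null counts."""
    median = statistics.median(null_counts)
    q1, _, q3 = statistics.quantiles(null_counts, n=4)
    iqr = q3 - q1
    if iqr == 0:
        # The IQR can vanish when null models contain very few interactions.
        if e_pin == median:
            return math.nan
        return math.copysign(math.inf, e_pin - median)
    return (e_pin - median) / iqr


def score_from_summary(observed, center, spread):
    """Same arithmetic, starting from already summarized null statistics."""
    return (observed - center) / spread


# Summary values reported for the colorectal/breast cancer example (Fig. 1).
print(round(score_from_summary(51, 25.4, 4.8), 1))  # 5.3
print(round(score_from_summary(51, 25, 6), 1))      # 4.3
```

The guard for a zero IQR mirrors the numerical issue that arises for very small gene sets, where the standard deviation can still be non-zero.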
We directly estimated one-tailed p-values for associations from the PPI using 10,000 MS02 null models. A heatmap of the p-values (−log10 scale) is plotted in Fig. 2c, and the p-value distribution is highly correlated with that of the Z-scores, with a Pearson's correlation coefficient (PCC) of 0.794 (P < 2.2 × 10^-16). A comparison of the heatmaps based on the p-values and q-values (−log10 scale) is shown in Figure S2 of the SI.
We also calculated the Jaccard indices (see "Methods") between the pairs of gene sets, with a heatmap shown in Fig. 2d. A general agreement was also observed between association Z-scores and Jaccard indices, with PCC = 0.48 (P < 2.2 × 10^-16).
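For reference, the Jaccard index can be sketched as below. The standard definition (shared genes divided by the union of both sets) is assumed here, and the gene identifiers are placeholders that only reproduce the set sizes of the Fig. 1 example (26 and 22 genes, 3 shared).

```python
def jaccard_index(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for two empty sets
    return len(a & b) / len(union)


# Placeholder identifiers with the same sizes as the Fig. 1 example.
shared = {"S1", "S2", "S3"}
set_a = shared | {f"A{i}" for i in range(23)}  # 26 genes
set_b = shared | {f"B{i}" for i in range(19)}  # 22 genes
print(jaccard_index(set_a, set_b))             # 3 / 45
```

Because the index depends only on shared membership, two sets with no common genes always score 0, no matter how densely their genes interact in the network.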
For comparison, we constructed networks to illustrate association patterns among the 50 hallmark sets using the association Z-scores, p-values, and Jaccard indices, respectively. All networks use the gene sets as nodes and association scores as edge weights. Figure 2e-g show parts of the three networks. In the Z-score network (Fig. 2e), the top 5% over-represented (red) interactions have Z larger than 11.8, and the top 5% under-represented (blue) interactions have Z smaller than −5.8. The p-value network (Fig. 2f) shows 326 associations (out of all 1,225 pairs of gene sets, excluding self-interactions) with p-value < 1 × 10^-4 (i.e., more interactions observed in the PPI than in all null models), and a uniform edge weight is applied to these interactions. For comparison, the Z-score network in Fig. 2e has 76 positive and 61 negative interactions, the p-value network (Fig. 2f) has 326 interactions, and the Jaccard network (Fig. 2g) has only 40 interactions. Note that all networks would possess more interactions with looser cutoffs. For instance, a criterion of |Z| > 2 for the Z-score network would lead to 509 positive and 254 negative interactions; a cutoff of p-value < 1 × 10^-3 results in 427 interactions; and a cutoff of J > 0 for the Jaccard network, similar to a previous human disease network 38 in which two diseases are connected if they share at least one gene, would lead to 871 interactions.
Estimations of p-values are limited by the number of null models used. With the 10,000 null models applied in the present work, we cannot estimate a p-value smaller than 1 × 10^-4, which is roughly equivalent to Z = 3.72 for a one-tailed test under a normal distribution. Therefore, with a limited number of null models, it is difficult to rank the strengths of interactions that have low p-values; for this reason, interactions with p-value < 1 × 10^-4 are visualized with a uniform weight in Fig. 2f. In contrast, the Z-scores (Fig. 2e) span a considerably wide range even with a limited number of null models. In addition, a Z-score heatmap can show both enriched (red) and suppressed (blue) associations. A one-tailed p-value analysis can address only one of the two at a time, depending on the null hypothesis used (such as enriched associations in Fig. 2c,f), although enriched and suppressed associations can be analyzed separately. Like p-values, the Jaccard indices cannot describe under-representation, i.e., how gene sets 'avoid' interacting with each other, and are informative only on enrichment. Using Z-scores, we can identify both enriched and suppressed interactions with a relatively small number of null network models. In addition, the standard deviations of the Z-scores are similar between using 10,000 and using 1,000 null models (see below in the discussion of random models). In this work, the number of null models is empirically set to 10,000 because it gives more accurate p-values.
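The resolution limit of permutation p-values can be made concrete with a short sketch. It assumes that the one-tailed enrichment p-value is the fraction of null models whose interaction count reaches or exceeds the observed count; the exact counting rule (>= versus >) and the function names are assumptions of this example.

```python
from statistics import NormalDist


def empirical_p_enriched(e_pin, null_counts):
    """Return (p, is_upper_bound) for a one-tailed enrichment test.

    When no null model reaches the observed count, only the bound
    p < 1/len(null_counts) can be stated, so the bound is returned
    together with a flag.
    """
    n = len(null_counts)
    hits = sum(1 for count in null_counts if count >= e_pin)
    if hits == 0:
        return 1.0 / n, True
    return hits / n, False


def equivalent_z(p_one_tailed):
    """Z-score with the given upper-tail probability under a normal distribution."""
    return NormalDist().inv_cdf(1.0 - p_one_tailed)


print(round(equivalent_z(1e-4), 2))           # resolution limit of 10,000 null models
print(round(1.0 - NormalDist().cdf(2.0), 3))  # one-tailed p for Z = 2
```

The first printed value corresponds to the Z = 3.72 quoted above; any association stronger than that is indistinguishable by its empirical p-value alone, whereas its Z-score still varies.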
Interestingly, for all the 326 hallmark-hallmark interactions with p-value < 1 × 10^-4, the Z-scores are 9.59 ± 6.36 with a minimum of 3.78, which is equivalent to p-value < 1 × 10^-4 under a one-tailed test with a normal distribution. Moreover, 11 of these 326 interactions have a Jaccard index of 0: although these 11 pairs show positive interactions (p-value < 1 × 10^-4 and Z = 6.11 ± 1.97 with Z_min = 4.24), no shared genes could be found within any pair. One example is gene sets 6 and 24 (full names in Table S1 of SI), which have J = 0, p-value < 1 × 10^-4, and Z = 10.3. This reflects a significantly over-represented interaction number (344) in the PIN compared to null models (204.5 ± 13.5), as shown in Figure S3 in the SI.
A negative Z-score calculated by NetPAS reflects under-represented interactions between two gene sets. In the box plot of Z-scores between gene sets (self-interactions are excluded, Fig. 2a), some hallmark sets have a negative mean Z-score and appear to have 'avoided' interactions with most of the other hallmark sets. For example, the hallmark set 12 (full name in Table S1 of SI) has a mean Z-score of −3.6 for interactions with all other hallmark sets. Figure S3 in the SI shows interactions between set 12 and set 25 observed from the PIN and a representative null model.

Recommended cutoffs for application in practice based on background Z-scores.
To find recommended Z-score cutoffs for applying NetPAS in practice, we constructed random gene sets with sizes comparable to those in MSigDB (Fig. 3a; Figure S2 of SI). The association Z-scores among these random sets are narrowly centered around zero (color bar in Fig. 3a). In contrast, association Z-scores of the 50 hallmark sets have a long-tailed distribution with a skewed peak at the positive upper bound (color bar in Fig. 2a). The associations between the random gene sets (Fig. 3a) are far fewer and weaker than those between the hallmark gene sets. Moreover, as randomly constructed networks reflect the genetic background, distributions of the Z-scores among these random gene sets can be used to validate the cutoffs for quantifying associations of gene sets. Of all 15,000 association Z-scores between random gene sets, 451 have Z > 2 and 160 have Z < −2, corresponding to p-value = 0.030 and p-value = 0.011, respectively (Fig. 3b). Self-associations are excluded in Fig. 3b, although they do not show noticeable differences from non-self-associations for the random sets. For the random gene sets, using |Z| > 2 as a cutoff, we observed a limited number of enriched (red, 30) and suppressed (blue, 12) associations (Fig. 3c), far fewer than for the hallmark gene sets. Similar trends are found in random networks of different sizes (Figure S4 of SI).
The association Z-score between two gene sets (say, sets A and B) reflects how strongly the genes in set A favor (Z > 0) or avoid (Z < 0) interactions with those in set B, and vice versa. Note that for normal distributions, Z > 2 is roughly equivalent to p < 0.023 in a one-tailed test; however, different cutoffs in the Z-scores may lead to different interpretations. Nevertheless, we observed that, compared to randomly constructed gene sets, a cutoff of |Z| > 2 is appropriate for a single enrichment test (Fig. 3b).
To further understand how to interpret association Z-scores, we selected two randomly constructed gene sets, each comprising 100 genes, that have no apparent association (Z = −0.14). In this example, the number of bootstrap combinations is $\binom{200}{100} \approx 9 \times 10^{58}$. We did not sample all bootstraps. Instead, using 1,000 bootstraps, the Z-scores are distributed as −0.47 ± 0.66, as shown in Fig. 3d. The fraction of bootstraps with |Z| > 2 is 0.016 (i.e., p = 0.016 for a two-tailed test). Therefore, we suggest that in practice a cutoff of |Z| > 2 is appropriate, in line with the discussion of the randomly constructed gene sets shown in Fig. 3b.
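The size of the bootstrap space and one way to sample from it can be sketched as follows. The resampling scheme (pooling the genes of both sets and reassigning them to two sets of the original sizes) is our reading of the description above, and the function names are hypothetical; the two input sets are assumed to be disjoint.

```python
import math
import random


def n_affiliation_splits(total_genes, size_a):
    """Number of ways to choose which pooled genes are affiliated with set A."""
    return math.comb(total_genes, size_a)


def resample_affiliations(set_a, set_b, n_boot, seed=0):
    """Yield n_boot random reassignments of the pooled genes to two sets."""
    rng = random.Random(seed)
    pool = sorted(set_a) + sorted(set_b)  # assumes set_a and set_b are disjoint
    for _ in range(n_boot):
        rng.shuffle(pool)
        yield set(pool[:len(set_a)]), set(pool[len(set_a):])


print(f"{n_affiliation_splits(200, 100):.2e}")  # about 9e58

# Placeholder gene names standing in for two random sets of 100 genes each.
genes_a = {f"GA{i}" for i in range(100)}
genes_b = {f"GB{i}" for i in range(100)}
samples = list(resample_affiliations(genes_a, genes_b, n_boot=5))
```

Each resampled pair would then be scored against the null models in the same way as the original pair, which gives the distribution of background Z-scores.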
The above analysis indicates that the background association of gene sets has relatively small Z-scores and it is an appropriate practice to use a cutoff such as |Z|> 2 to infer an association between two gene sets.For multiple comparisons, we would recommend the use of false-discovery rates to control multiple statistical tests.
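One common way to control the false-discovery rate is the Benjamini-Hochberg procedure, sketched below. This is only an illustration: the text above does not state which FDR method is used for the q-values, so the choice of Benjamini-Hochberg is an assumption of this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(value, 3) for value in q])  # [0.02, 0.04, 0.04, 0.02]
```

Associations would then be reported when their q-value falls below the chosen FDR level, instead of applying the single-test cutoff to every pair.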

Constructing a weighted human disease network.
A previous work 38 analyzed more than a thousand human disorders with associated genes maintained by OMIM 35. This work produced the "human disease network" (HDN), assuming that two disorders are connected if they share at least one gene, i.e., the Jaccard index > 0. It was shown that genes associated with the same disorder are tenfold more likely to interact with each other than those that are not 38.
Here, we use NetPAS to estimate the association Z-scores of 19 descriptive entries from OMIM. These entries are associated with different disorders, and each contains at least 5 associated genes. They include 13 cancers, 3 mental disorders and 3 other disorders (Table S2 of the SI). Although there is no association between certain diseases, such as Alzheimer's and colorectal cancer shown in Fig. 3D, the associations between some diseases are significant. The Z-score heatmap and the resulting weighted human disease network (wHDN) are shown in Fig. 4. This wHDN has several isolated nodes, including esophageal cancer, renal cell carcinoma (RCC, a type of kidney cancer), pheochromocytoma (Pheoch, a rare cancer related to the adrenal gland), Alzheimer's and Parkinson's diseases. Each isolated node contains 5-8 genes. However, some nodes of similar size are strongly associated with other diseases, such as ovarian cancer (6 genes), non-Hodgkin lymphoma (NHL, 5 genes) and meningioma (6 genes). Therefore, the strength of associations between gene sets (disorders in this example) is not determined by the number of genes. Instead, the association strength of two gene sets is determined by the direct interactions between the genes of the two sets (disorders), compared with those observed in null network models.
The wHDN shown in Fig. 4b indicates that 10 out of 13 cancers (all except the three isolated cancers mentioned above) have strong associations with each other. The mental disorders Schizophrenia and Major Depression Disorder (MDD) are highly associated with certain cancers, whereas Alzheimer's is not. Type-II Diabetes is also associated with cancer as well as with Schizophrenia. Obesity is not directly associated with cancer but is associated with Type-II Diabetes. This result may be useful for understanding disease-disease relationships.

GO and pathway enrichment analyses using NetPAS.
A GO term or a pathway functional term can be regarded as a gene set affiliated with this term. Because NetPAS can be used to estimate the association strength between any two gene sets, it is straightforward to extend the NetPAS approach to GO and pathway enrichment analysis. For a given target gene set, its association Z-scores with all gene sets related to the GO/pathway functional terms can be separately calculated and ranked, from which the enriched or suppressed functional terms can be inferred. To demonstrate this utility, we performed the GO 16 term and KEGG 17 pathway enrichment analysis of the 50 hallmark gene sets (see above), and compared the results with those obtained by a traditional enrichment method, DAVID 39. In this analysis, the association Z-scores are calculated between the target gene set and the 18,033 gene sets derived from 17,715 GO terms and 318 KEGG pathways (see "Methods"). All GO terms and KEGG pathways are then ranked to infer both enriched and suppressed functional terms.
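The ranking step described above can be sketched as follows. This is a toy illustration and not the NetPAS code: the function names, the rule for counting an edge (one endpoint in each set, counted once), and the choice to skip terms whose null counts have zero variance are assumptions, and the tiny edge lists stand in for the PIN and its degree-preserving null models.

```python
import statistics


def count_cross_edges(edges, set_a, set_b):
    """Count edges with one endpoint in set_a and the other in set_b."""
    a, b = set(set_a), set(set_b)
    return sum(1 for u, v in edges
               if (u in a and v in b) or (u in b and v in a))


def rank_terms(target, term_sets, pin_edges, null_edge_lists):
    """Return (term, Z) pairs sorted from most enriched to most suppressed."""
    ranked = []
    for term, genes in term_sets.items():
        observed = count_cross_edges(pin_edges, target, genes)
        null_counts = [count_cross_edges(edges, target, genes)
                       for edges in null_edge_lists]
        sd = statistics.stdev(null_counts)
        if sd == 0:
            continue  # Z is undefined when all null counts are identical
        z = (observed - statistics.fmean(null_counts)) / sd
        ranked.append((term, z))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy data: every null edge list keeps the node degrees of the toy PIN.
pin = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
nulls = [
    [("a", "x"), ("a", "z"), ("b", "x"), ("c", "y")],
    [("a", "y"), ("a", "z"), ("b", "x"), ("c", "x")],
    [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")],
]
terms = {"T1": {"x", "y"}, "T2": {"z"}}
ranking = rank_terms({"a", "b"}, terms, pin, nulls)
print(ranking)
```

In this toy case T1 ranks first with a positive Z-score and T2 last with a negative one, which mirrors how enriched and suppressed terms appear at opposite ends of the ranked list.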
The top 10 enriched terms by both NetPAS and DAVID for one example, HALLMARK_OXIDATIVE_PHOSPHORYLATION, are shown in Fig. 5a. Consistency between the two methods can be seen: 9 out of 10 BP terms, 10 out of 10 CC terms, 8 out of 10 MF terms, and 8 out of 10 KEGG terms predicted by NetPAS are also predicted by DAVID. However, some functional terms detected by NetPAS are missed by DAVID and other enrichment tools, such as the BP term GO:0015990 ("electron transport coupled proton transport"). In this example, the target hallmark gene set has 94 interactions, observed in the PIN, with genes that carry the term GO:0015990. In contrast, there are only 4.7 ± 2.1 interactions in the 10,000 null models, leading to a large Z-score of 42.9 for this GO term. For all 50 hallmark sets and the top-10 enriched GO terms by NetPAS, 73.4% of BP, 70.2% of CC and 55.0% of MF terms were verified by DAVID. For all functional terms suggested by NetPAS but missed by DAVID, the enrichment signals come from the fact that more interactions between the target set and the functional annotation term have been observed in the PIN than in random null models. Figure 5b shows the subnetworks for interactions between the hallmark set exemplified in Fig. 5a and the gene sets affiliated with the top-10 BP terms by NetPAS.
As a network-permutation-based approach, NetPAS is sensitive to the subnetwork configuration within the gene sets, including its global cluster coefficient, maximal cluster size, and maximal clique degree, summarized in Table S3 of SI. Consequently, NetPAS can yield substantial differences when the subnetwork under study is weakly connected. To illustrate the sensitivity of NetPAS to subnetwork topology, four synthetic gene sets, SynGS-1a, SynGS-1b, SynGS-2a and SynGS-2b, have been constructed based on the hallmark sets 20 and 28 (Figure S5 in the SI). Both SynGS-1a and SynGS-2a contain genes that are highly connected, whereas SynGS-1b and SynGS-2b contain genes that do not interact with other genes in these gene sets (Figure S5). For both SynGS-1a (Fig. S5c) and SynGS-2a (Fig. S5g), NetPAS showed more enriched BP terms than for the original hallmark sets 20 (Fig. S5b) and 28 (Fig. S5f), respectively. In contrast, the number of enriched terms obtained by DAVID decreased. These contrasting changes support that NetPAS is more sensitive to highly connected cliques than DAVID, and also suggest that DAVID is more sensitive to the sizes of input gene sets than NetPAS. Fewer shared GO terms are found for both SynGS-1b and SynGS-2b, which may be attributed to the fact that these two synthetic sets contain less-connected nodes than SynGS-1a and SynGS-2a. Interestingly, for SynGS-2a, both network-based tools NetPAS and WebGestalt uncovered more enriched terms (668:504 for NetPAS and 23:7 for WebGestalt, respectively) and there were more shared terms (10:1) among all three methods. For the hallmark set 28, NetPAS, DAVID and WebGestalt showed one shared BP term, GO:0008015 ("blood circulation"). For SynGS-2a, this BP term was not found by either DAVID or WebGestalt. However, it was scored Z = 14.049 by NetPAS. The difference between NetPAS and WebGestalt can be illustrated by SynGS-1b and SynGS-2b (Fig. S5d and S5h). Because NetPAS looks for interactions and WebGestalt looks for traversal paths, NetPAS can detect enriched BP terms through associated genes, whereas WebGestalt gave zero enriched terms.
Overall, these results show that NetPAS can serve as a useful complementary tool to DAVID and WebGestalt.

Discussion
No gene or protein functions alone 40. The cellular functions can be regarded as being conducted by functional modules or communities 40 of genes/proteins in the interactome 25. The concept of disease module 9,41 has also been proposed based on the fact that the genes associated with the same disease are more likely to interact with each other (the "local" and "disease module" hypotheses in 9). This principle can also be applied to other curated gene sets such as those in MSigDB 36,42. Indeed, for the 50 hallmark sets (Fig. 2), the mean association Z-score excluding self-interactions (Fig. 2a) is 2.0. However, the self-interactions for all gene sets have a significantly higher mean Z-score of 17.8. This trend holds for the 19 diseases shown in Fig. 4. For the random sets shown in Fig. 3, however, the mean Z-score is −0.04 and the mean self-association Z-score is −0.20. Therefore, our results indicate over-represented interactions among genes in curated gene sets, such as those from MSigDB or those related to diseases, in contrast to what is expected by random chance.
A biological network such as a PIN is scale-free, with the degrees of all nodes following a power law. Because all node degrees are preserved in the null models of the present work, the null models have the same power-law distribution as the original PIN. In biological networks, low-degree nodes tend to connect to high-degree nodes, or hubs 13. For example, there are 1,004 nodes in the PIN with degree k = 1, i.e., each of them has only a single interaction. In the PIN, only two interacting pairs (CLEC2A:KLRF2 and REC114:MEI4) are formed by such nodes, constituting two isolated interacting pairs. However, in 10,000 null models, there are 113.3 ± 47.6 isolated pairs with a minimum value of 13. Figure 6 in the Methods shows the histogram of the number of isolated pairs in all 10,000 null models, and an example is shown in Fig. 1d.
A modified Zʹ-score approach using the interquartile ranges (IQRs) may also be useful for the association study (Figure S1 and Methods). We found that for well-connected gene sets such as the hallmark sets shown in Fig. 2, both Z- and Zʹ-scores yielded quantitatively consistent results (Figure S1). We noticed that the Zʹ-score approach may encounter a numerical challenge: the IQR may be zero when few interactions are observed in the null models, which leads to an infinite Zʹ-score even though the standard deviation is non-zero. For example, this kind of numerical error often happens when the gene set related to a GO term contains only a handful of genes.
Several limitations of NetPAS need to be emphasized, however. The first limitation is the incompleteness of the resources, including the interactomes, the coverage of genes in different gene sets, and gene annotations in GO or pathway knowledge databases. In addition, protein-protein interactions are dynamic 43, and may vary significantly among different tissues or cell types 44,45. These limitations may be addressed in future studies by the integration of tissue-specific or cell-type-specific interactomes to further our understanding of the biological significance of different gene sets.
In summary, we show that NetPAS can quantify the association between two different gene sets by taking network constraints into account. We demonstrate the utility of using Z-scores in NetPAS compared to using p-values or Jaccard indices. NetPAS is useful in classifications of gene sets, including those associated with different diseases. We also show that NetPAS can be applied in GO and pathway enrichment analysis, in which every single GO or pathway functional term is regarded as an affiliated gene set. The NetPAS approach can be applied to infer the biological association between different gene sets, such as potential relationships between gene sets behind different phenotypes and diseases. NetPAS can also be applied to other types of networks to estimate the association strengths between network subsets.

Methods
MS02 null permutation of the PPI network. The permutation-based network null model is based on the work of Maslov and Sneppen in 2002 13 (hence named the MS02 null model in the present work). The human PIN used in the present work contains 592,685 edges connecting 16,641 nodes. This PIN is treated as a simple graph, i.e., it is undirected and does not contain self-interactions (self-loops) or multi-interactions.
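A network is regarded as a graph G = (V, E) with order |V| = N, where the vertices (or nodes) are $V = \{v_1, v_2, \ldots, v_N\}$. An MS02 null model is a graph $G' = (V, E')$ that satisfies

$$k'(v_i) = k(v_i), \quad i = 1, \ldots, N \qquad (1)$$

where $k(v_i)$ is the degree (the number of edges, also known as the connectivity) associated with $v_i$ in the original network and $k'(v_i)$ is the corresponding degree in the null model; i.e., the null model uses the same vertex set, and the degrees of all vertices are preserved as in the original network.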
It is worth noting that all MS02 null network models follow the power law because their node degrees are the same as in the observed PIN. However, there are significant topological differences between the null models and the PIN. For example, in the PIN there are only two isolated pairs (CLEC2A:KLRF2 and REC114:MEI4) formed by nodes with degree k = 1. Here we define two k = 1 nodes that interact with each other as an "isolated pair", because they are not connected to any other nodes in the network. However, in 10,000 null models, there are 113.3 ± 47.6 isolated pairs formed by k = 1 nodes, with a minimum value of 13 (Fig. 6). The abundance of isolated interactions in MS02 null models indicates that the power-law distribution of node degrees does not by itself give rise to the tendency of low-degree vertices to connect to hub vertices, as suggested previously 11. Instead, the tendency of low-degree vertices to interact with high-degree vertices may be a unique feature of natural networks such as PINs, compared to MS02 null network models. We observed that the number of isolated pairs in the null models may not follow a normal distribution. Nevertheless, we did observe normality of the interaction numbers between hallmark gene sets or between the random gene sets (see Figure S1a in the SI for an example).
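A degree-preserving permutation of this kind is commonly implemented with repeated double-edge swaps, as sketched below for a small undirected simple graph. This is a minimal illustration and not the code used in this work: the function names, the number of swaps, and the rejection rules are assumptions of the example.

```python
import random
from collections import Counter


def node_degrees(edges):
    """Degree of every node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Swap edge ends (a-b, c-d -> a-d, c-b) while keeping the graph simple."""
    rng = random.Random(seed)
    edge_list = [tuple(edge) for edge in edges]
    edge_set = {frozenset(edge) for edge in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible pairings at random
        if a == d or c == b:
            continue     # the swap would create a self-loop
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if new_1 in edge_set or new_2 in edge_set:
            continue     # the swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def count_isolated_pairs(edges):
    """Number of edges whose two endpoints both have degree 1."""
    degree = node_degrees(edges)
    return sum(1 for u, v in edges if degree[u] == 1 and degree[v] == 1)


toy = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"),
       ("e", "f"), ("e", "g"), ("f", "g")]
null_model = rewire_preserving_degrees(toy, n_swaps=20)
print(count_isolated_pairs(toy), count_isolated_pairs(null_model))
```

In the toy graph every degree-1 node is attached to the hub, so the original network has no isolated pair; rewiring can create such pairs while leaving every node degree unchanged, which is the effect summarized in Fig. 6.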
Z-score, modified Zʹ-score, p-value, q-value and Jaccard indices. The Z-score calculation follows the original analysis based on MS02 models 13:

$$Z = \frac{E_{\mathrm{pin}} - E_{\mathrm{null}}}{sd_{\mathrm{null}}} \qquad (2)$$

where $E_{\mathrm{pin}}$ is the edge number between two gene sets based on the PIN, and $E_{\mathrm{null}}$ and $sd_{\mathrm{null}}$ are the mean and standard deviation of the edge numbers based on the MS02 null models (10,000 models are used in the present work). A modified Zʹ-score is also used:

$$Z' = \frac{E_{\mathrm{pin}} - \mathrm{median}_{\mathrm{null}}}{\mathrm{IQR}_{\mathrm{null}}} \qquad (3)$$

where $\mathrm{median}_{\mathrm{null}}$ and $\mathrm{IQR}_{\mathrm{null}}$ are the median and interquartile range of the edge numbers based on the MS02 null models (see Fig. 1 for a worked example).

Figure 6. Topological differences between the original PIN and null models, exemplified by the number of isolated pairs (interactions between k = 1 nodes). Only 2 isolated pairs are observed in the PIN (red arrow), whereas the number of isolated pairs in 10,000 null models ranges from 13 to 289 with a median value of 115 (blue histogram). For comparison also see Fig. 1b,c. The number of isolated pairs in the null models is significantly larger than that in the PIN (p < 1 × 10^-4, for 10,000 null models).

Figure 1. An example of calculating association Z and Zʹ scores of two gene sets. The two gene sets are selected from OMIM for colorectal cancer (MIM entry: 114500, set A) and breast cancer (MIM entry: 114480, set B). (a) Venn diagram showing that Set A has 26 and Set B has 22 genes. There are three overlapping genes (AKT, PIK3CA, and TP53). (b) The human PIN. (c) 51 interactions between Set A and Set B observed in the PIN. (d) An example of the null network model. (e) 23 interactions between Set A and Set B observed in the null model shown in d. The interaction numbers from 10,000 null models are 25.4 ± 4.8 (mean ± sd), leading to a Z-score of 5.3; the median (25) and IQR (6) values yield a Zʹ-score of 4.3. Hence there is an enriched association between the two cancers.

Figure 2. Association studies of 50 hallmark gene sets from MSigDB. (a) The boxplots of association Z-scores of each of the 50 gene sets with the others; self-interactions are not considered in this plot. The names of the hallmark gene sets are listed along with their serial numbers. Heatmaps of (b) association Z-scores, (c) p-values, and (d) Jaccard indices of the 50 gene sets, labeled by their serial numbers. The names of the gene sets can be found in Table S1 of the SI. Both enriched (red) and suppressed (blue) interactions can be revealed by Z-scores. One-tailed p-values (−log10 scale is used in the heatmap) are calculated for enriched interactions in the PIN compared to MS02 null models. When the observed interactions in the PIN are more than those in each of the 10,000 null models, we can only infer that P < 1 × 10^-4 instead of P = 0. We used P = 1e−5 (or −log10 P = 5) for these situations. Networks of the hallmark gene sets have been generated for (e) Z-scores, where the top 5% enriched (Z > 11.8, red) and top 5% suppressed (Z < −5.8, blue) interactions are shown; (f) p-values, where only those with P < 1 × 10^-4 are shown; and (g) Jaccard indices.

Figure 3. Association Z-scores of random gene sets and statistical validation of cutoffs. (a) Heatmap of association Z-scores among 50 randomly constructed gene sets with gene numbers in the range of [15, 200]. (b) Histograms of association Z-scores calculated from five sets of random networks (see Methods and Figure S2 in the SI). Among all 15,000 Z-scores between random gene sets, 451 have Z > 2 (P = 0.030) and 160 have Z < −2 (P = 0.011), indicating that using |Z| > 2 as the cutoff would be appropriate. (c) Network of the random gene sets weighted by the association Z-scores. A cutoff of |Z| > 2 is used for both enriched (red) and suppressed (blue) interactions. (d) Bootstrapping of the affiliations of all genes from two randomly constructed gene sets, each containing 100 genes. The association Z-score of the original pair is −0.14 (red arrow), whereas the Z-scores of the 1,000 bootstraps are −0.47 ± 0.66.

Figure 4. A weighted human disease network (wHDN) generated by NetPAS. (a) The association Z-score heatmap of 19 diseases, which include 13 cancers (grey), 3 mental disorders (orange) and 3 other diseases (red). (b) A weighted human disease network constructed from the Z-score matrix indicates that most cancers are highly associated with each other. Some mental disorders, including Schizophrenia and Major Depression Disorder (MDD), are associated with certain cancers, and Type II Diabetes (Diabetes2) is associated with ovarian cancer. Note that in this wHDN all interactions are positive, and no negative or suppressed interactions among these diseases (i.e., Z < −2) have been observed.

Figure 5. An example of GO and pathway enrichment performed by NetPAS using the hallmark set HALLMARK_OXIDATIVE_PHOSPHORYLATION. (a) The top ten enriched gene ontology terms, including biological process (BP), cellular component (CC) and molecular function (MF) (left), and enriched KEGG pathway terms (right) using NetPAS (left) and DAVID (right). The magnitude of enrichment is scaled by colors from white (Z = 0) to red (Z = Z_max). The p-values estimated by DAVID were converted to Z-scores using a two-tailed normal distribution for coloring. (b) Interaction sub-networks between the target gene set (red nodes) and the genes affiliated with the top 10 biological process (BP) GO terms (blue nodes); genes that are both affiliated with the functional term and belong to the target gene set are shown as yellow nodes. Formulas used for calculating the Z-scores for each BP term are written on top of each subnetwork.