Optimizing miRNA-module diagnostic biomarkers of gastric carcinoma via integrated network analysis

Several microRNAs (miRNAs) have been suggested as novel biomarkers for diagnosing gastric cancer (GC) at an early stage, but the single-marker strategy may ignore the co-regulatory relationships and lead to low diagnostic specificity. Thus, multi-target modular diagnostic biomarkers are urgently needed. In this study, a Zsummary and NetSVM-based method was used to identify GC-related hub miRNAs and activated modules from clinical miRNA co-expression networks. The NetSVM-based sub-network consisting of the top 20 hub miRNAs reached a high sensitivity and specificity of 0.94 and 0.82. The Zsummary algorithm identified an activated module (miR-486, miR-451, miR-185, and miR-600) which might serve as diagnostic biomarker of GC. Three members of this module were previously suggested as biomarkers of GC and its 24 target genes were significantly enriched in pathways directly related to cancer. The weighted diagnostic ROC AUC of this module was 0.838, and an optimized module unit (miR-451 and miR-185) obtained a higher value of 0.904, both of which were higher than that of individual miRNAs. These hub miRNAs and module have the potential to become robust biomarkers for early diagnosis of GC with further validations. Moreover, such modular analysis may offer valuable insights into multi-target approaches to cancer diagnosis and treatment.


Introduction
Gastric cancer (GC) is one of the most common malignancies and the second cause of cancer mortality worldwide, posing a major risk to public health [1][2]. Due to the difficulties in early diagnosis, the overall five-year survival rate after surgery is only 20%-30% in advanced-stage GC patients [3]. Although chemotherapy may improve the survival rate of GC patients after surgery, the efficacy of chemotherapy is limited by metastasis and drug resistance [4][5][6]. Several novel chemotherapeutic and molecular targeted agents, including irinotecan (CPT-11), taxanes, oxaliplatin, trastuzumab, sunitinib, and bevacizuma, have been used to improve the outcomes of GC patients, but the prognosis of advanced or recurrent GC still remains unsatisfactory [7]. Therefore, it is crucial to identify novel biomarkers for diagnosing GC at an early stage.
As with many other complex diseases, the occurrence of GC also owes to the disturbance of multiple genes at the global molecular network level [8][9]. The current single-target diagnostic or therapeutic strategies may ignore the interactions between several molecular targets and lead to a low efficacy. Therefore, multi-target modular network research may contribute significantly to more rational and effective diagnostic and therapeutic strategies [10]. A lot of network-based studies have attempted to find module biomarkers or targets of GC at gene expression, mRNA transcription or microRNA (miRNA) levels [11][12], but how to quantificationally identify GCrelated activated modules as early diagnostic biomarkers still remains challenging.
MiRNAs are small non-coding RNAs of 18-24 nucleotides that regulate gene expression by mediating mRNA degradation or repression of mRNA translation [13]. It has been shown that miRNAs can serve as biomarkers, drug targets, or tumor suppressors by regulating the expression of cancer-related genes [14]. Many miRNAs have been found to be related with GC [15][16], and miRNAs may represent the bridge between Hp-related gastritis and GC development [17]. Although GC-related miRNA networks have been analyzed [18], there is still a lack of diagnostic miRNA module biomarkers for GC.
In this study, based on miRNA co-expression networks, Z summary and Support Vector Machine (SVM) algorithms were used to systematically identify the hub miRNAs and activated modules that could potentially be used as GC biomarkers.

Normal and GC miRNA expression datasets
The miRNA expression data of GC were obtained from Gene Expression Omnibus database (GEO) with accession number of GSE7390, which were originated from a study by Kim et al [19]. The custom-designed Agilent microarray contained 1,667 unique miRNA sequences across all species and each probe had 4 replicates. In this study, human-related unique miR-NAs were selected. The miRNA expression data of 90 GC patients were used as the reference group (GC group, S1 Table), another replicated datasets of the same samples were used as the test group (Rep group, S2 Table), and the miRNA expression data from 34 healthy volunteers were considered as the normal control group (Norm group, S3 Table).

Detection of miRNA co-expression modules
The construction of miRNA co-expression networks was implemented in the WGCNA R package [20]. A matrix of pairwise correlations was constructed between all pairs of miRNA probes by using appropriate soft-thresholding. And then, topological overlap measure (TOM) and Dynamic Hybrid Tree Cut algorithm [21] were used to detect the miRNA co-expression modules, and each module was assigned a color. Appropriate soft-thresholds were selected for each dataset when the network met the best scale-free topology criterion, and the minimum module size was set at 3.

Module-based comparison among different groups
In order to compare the miRNA co-expression patterns at network module level, we used the module-based consensus ratio (MCR) to illustrate the module similarities across different groups [22]. The MCR was defined as the ratio of significantly overlapped module pairs to all the module pairs between two groups. The significantly overlapped module pairs were determined based on a Fisher's exact test (p < 0.05). The MCR is defined by Eq 1, where NM represents the number of modules, a and b represent two different groups, and NM overlap is the number of module pairs with a p < 0.05.

Identification of miRNA module biomarkers
We adopted a Z summary statistic to identify the preserved and activated modules related to GC. The Z summary value is composed of 4 statistics related to density and 3 statistics related to connectivity, which can quantitatively assess whether the co-expression patterns of a specific module in the reference group is preserved or disrupted in the test group. The equation of Z summary is listed below Eq 2, where a Z summary ! 2 indicates preservation or reproducibility, while a negative Z summary value indicates disruption or variation [23]. Compared with the Norm group, modules with a negative Z summary value in the GC group would be considered activated modules, which might potentially be used as the miRNA module biomarkers of GC.

SVM-based identification of core miRNAs and sub-network
In order to identify the core miRNAs and sub-network as potential GC biomarkers, the NETwork constrained Support Vector Machines (NetSVM) Cytoscape App was used in our study [24]. As an extension of the conventional SVM, NetSVM exploits the decision hyperplane to predict the classification of genes, which can be used to identify biologically meaningful network biomarkers from network and gene expression data. Along with the core nodes and subnetwork, NetSVM also reports the sensitivity, specificity, ROC curve and AUC values for the classification. In our study, the GC miRNA expression data and network data were entered into the NetSVM Cytoscape App, and the top miRNAs and sub-network with higher weights were identified.

Evaluation of the diagnostic performance of module biomarkers
Based on the expression level of the miRNAs in the GC and Norm groups, the area under the ROC curve (AUC) was used to evaluate the diagnostic performance of the miRNA module biomarkers. All analyses were performed using SPSS 19.0 software. For the modular biomarkers, the original single miRNAs' expression value were combined, and the mean value was used for the ROC curve analysis. The NetSVM-based weights were taken into account when calculating the mean expression value of miRNA modules. The weighted value was used as the summary score for the diagnostic tests. Optimal cut-off value was set as the threshold with the highest Youden's index (Sensitivity + Specificity -1).

Prediction of the target genes of miRNA modules
After the identification of GC activated miRNA modules, we predicted their target genes based on three independent databases, i.e. Targetscan Human 7.1 (http://www.targetscan.org/), miRDB (http://www.mirdb.org/miRDB/), and miRTarBase (http://mirtarbase.mbc.nctu.edu. tw/). The GC activated miRNA modules were loaded into the three databases and the background species was selected as "human". Genes that were predicted by all of the three databases were considered as target genes for further analysis.

Functional annotation of miRNA modules
To characterize the functions of activated miRNA modules and further investigate the mechanisms underlying the development of GC, we performed GO and KEGG pathway enrichment analysis by using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [25]. For each module, an over-representation of a functionally relevant annotation was defined by a modified Fisher's exact p-value with an adjustment for multiple tests using the Benjamini method. GO terms and pathways with a p < 0.05 were considered significant.

GC miRNA co-expression network and modules
As described in the Methods section, the WGCNA analysis was performed to construct the GC miRNA co-expression network and detect modules. Based on the scale-free topology model computation, the soft threshold power was set at β, of which was appropriate for the network construction (S1 Fig). The hierarchical clustering procedure found 16 miRNA coexpression modules, and each module corresponded to a branch of the resulting clustering tree and was labeled a unique color (Fig 1A). The network hotmap of all miRNAs is provided in S2 Fig. The detailed information about the miRNA module membership labeled by colors and numbers can be found in S1 Table. The average size (the number of miRNAs) of GC modules was 20, ranging from 3 to 120.

Module similarities
To investigate the miRNA co-expression patterns of GC, Rep and Norm groups at network module level, we compared their module similarities based on MCR. There were 20 and 33 modules in the Rep and Norm groups, respectively. Compared with the Rep and Norm groups, the consistencies in miRNA composition and module pairs with a certain number of overlapping miRNAs in the GC group are presented in Fig 1B and 1C. It showed that the number of GC-Rep overlapping module pairs was much larger than that of GC-Norm overlapping module pairs. Almost all the miRNA modules in the GC group overlapped with those in the Rep group, particularly Mod_8 completely overlapped with Mod_rep_6, reflecting the reproducibility of modules. The MCRs for pairwise comparisons were 5.0% (GC vs. Rep) and 2.8% (GC vs. Norm), respectively, indicating significant variation of the miRNA co-expression pattern in GC patients.

Z summary -based module preservation
In order to assess whether the miRNA modules in the GC group were preserved, the Z summary values of each module were calculated with Rep as the test group. As mentioned above, a Z summary ! 2 indicates module preservation, and a Z summary ! 10 represents strong preservation [23] or module reproducibility. Fig 2A lists

Z summary -based module biomarker
Compared with the Norm group, disruption or rewiring of the miRNA co-expression modules in the GC group might reflect the pathogenesis of GC. We still used Z summary to assess the disruption of miRNA modules. Modules with a negative Z summary were considered as the activated modules of GC. Fig 2B lists (Fig 3D) was chosen as the activated module of GC v.s. Norm, which was viewed as a potential miRNA module biomarker for GC. Modules with major variations of the Z summary values when the GC group was compared with the Norm group or Rep group are also visualized in Fig 3.

NetSVM-based core miRNAs and sub-network
Based on the miRNA expression levels and network interactions, the NetSVM [24] was used to identify the core miRNAs and sub-network as the potential GC biomarkers. Compared with the Norm group, the weights of all the miRNAs in networks were obtained (S4 Table). The top 20 up-or down-regulated miRNAs and their attributed modules are listed in Table 1.
Among the top 40 miRNAs, MiR-424 and MiR-146a had the largest weights in the up-and down-regulated miRNAs, respectively; and 70% of these miRNAs belonged to a specific module. The members of the biomarker module (Mod_14), miR-486 and miR-451, ranked 14 and 20 of the up-regulated miRNAs, respectively (Fig 3). Taken the up-and down-regulated miRNAs together, a sub-network of the top 20 miRNAs (13 up-regulated and 7 down-regulated) was constructed (Fig 4), which had a high sensitivity and specificity of 0.94 and 0.82, respectively.

The diagnostic performance of the GC module biomarker
The diagnostic performance of the biomarker Mod_14 and its miRNA members were evaluated based on the ROC AUC value (Fig 5A, S5 Table). The AUC value of the biomarker Mod_14 was 0.838, with optimal specificity of 0.941 and sensitivity of 0.689 (Youden's index = 0.63) at the cut-off value of 0.241, which was much higher than that of any individual miRNAs. Besides, we also evaluated the diagnostic performance of the combinations of certain Mod_14 labeled with a color below the dendrogram. B. Similarity of modules between GC and Rep groups; each row of the  (Fig 5B, S5 Table). We found that the combination of miR-451 and miR-185 reached an even higher AUC value of 0.904 (Youden's index = 0.782) at the cut-off value of 0.26, with a sensitivity of 0.811 and a specificity of 0.971.

Target genes and biological functions of the miRNA module biomarker
The target genes of the biomarker Mod_14 were predicted based on three databases. As mentioned in the Methods section, only those that were predicted by all of the three databases were considered as target genes. A total of 24 target genes of Mod_14 were obtained, of which 4, 1, 6 and 13 genes belonged to miR-486, miR-151, miR-600 and miR-185, respectively. The network of Mod_14 miRNAs and their target genes are shown in Fig 6. To characterize the biological functions of the identified biomarker module Mod_14, we obtained its GO functions and pathways through the target gene enrichment analysis. The Identification of miRNA-module diagnostic biomarkers for gastric carcinoma  significantly enriched GO terms and pathways with their p values are listed in Fig 6. As shown in Fig 6, the most significantly enriched pathway and Go term were focal adhesion and cytoplasm, and pathways directly related to cancer were also found, such as the viral carcinogenesis pathway.

Discussion
There is increasing evidence that the occurrence of complex diseases, including cancers, may be attributed to the perturbation of complex molecular networks, and both diseases and drug actions have a modular basis [26][27]. Module-based methods have promoted the discovery of cancer biomarkers and drug targets [28][29]. In this study, with the use of a Z summary -based method, a GC miRNA module biomarker was identified from the miRNA co-expression network, and the hub miRNAs and sub-network related to GC patients were also determined by the NetSVM algorithm. The target genes of the GC miRNA module biomarker and its functions were predicted based on multiple databases. By way of the modular analysis of GC miRNA and their targeted genes, we may light the further insight of GC diagnostics and therapy at multi-target level.
Within the identified miRNA module biomarker Mod_14 (consisting of miR-486, miR-451, miR-185, and miR-600), three of its miRNA members were over-expressed in GC patients [19], and thus this module was an up-regulated module for GC. Previous miRNA microarray studies revealed that miR-486 could regulate tumor progression and the OLFM4 antiapoptotic factor in GC [30]. Several studies indicated that miR-486 and miR-451 might act as novel prognostic biomarkers and potential therapeutic targets in human GC [31][32]. Moreover, a prior study reported that miR-451 and miR-486 showed consistently elevated levels in the plasma of GC patients as compared with controls [33]. As for miR-185, it has been shown that it may suppress tumor metastasis, regulate chemotherapeutic sensitivity, and serve as an independent prognostic factor for GC [34][35]. Therefore, three out of the four miRNA nodes in this module have been reported to have close relationship with GC, and these findings were largely consistent with our results. In terms of the biological functions, pathways and GO terms directly related to cancer were significantly enriched in this module, such as the viral carcinogenesis pathway.
In addition to the activated module biomarker, the hub miRNAs and sub-network for GC were also determined based on the NetSVM algorithm. The activated module nodes miR-486 and miR-451 were both included in the list of the top 20 up-regulated miRNAs. MiR-424 which had the largest weight in the up-regulated miRNAs was reported to promote the proliferation of gastric cancer by targeting Smad3 via TGF-u signaling pathway [36]. MiR-146a that had the largest weight in the down-regulated miRNAs may acts as a metastasis suppressor in GC by targeting WASF2 [37]. Besides, most of the top ranked miRNAs were members of a specific module, indicating that these miRNAs not only have great weights for GC, but also have close interactions with one another. Thus, this integrated network analysis may be effective in screening cancer-related miRNAs or sub-networks.
In conclusion, based on integrated network analysis, this study identified an activated miRNA module biomarker of GC, which was composed of miR-486, miR-451, miR-185, and miR-600. The hub miRNAs and sub-network of GC were also detected by a NetSVM-based method. These identified hub miRNAs and modules have the potential to become robust biomarkers for the diagnosis of early stage GC, although further validations are still required. Moreover, such a modular analysis of miRNA networks may offer valuable insights into multitarget approaches to cancer diagnosis and treatment.