WBNPMD: weighted bipartite network projection for microRNA-disease association prediction

Background: Recently, numerous biological experiments have indicated that microRNAs (miRNAs) play critical roles in the pathogenesis of various human diseases. Since traditional experimental methods for detecting miRNA-disease associations are costly and time-consuming, there is an urgent need for efficient and robust computational techniques that identify undiscovered associations.
Methods: In this paper, we propose a computational framework named weighted bipartite network projection for miRNA-disease association prediction (WBNPMD). In this method, transfer weights are constructed by combining the known miRNA and disease similarities, and the initial information is properly configured. A two-step bipartite network algorithm is then implemented to infer potential miRNA-disease associations.
Results: The proposed WBNPMD was applied to the known miRNA-disease association data, and leave-one-out cross-validation (LOOCV) and fivefold cross-validation were implemented to evaluate its performance. Our method achieved AUCs of 0.9321 in LOOCV and 0.9173 ± 0.0005 in fivefold cross-validation, and outperformed four other state-of-the-art methods. We also carried out two kinds of case studies on prostate neoplasms, colorectal neoplasms, and lung neoplasms, and most of the top 50 predicted miRNAs were confirmed to be associated with the corresponding diseases based on the dbDEMC, miR2Disease, and HMDD V3.0 databases.
Conclusions: The experimental results demonstrate that WBNPMD can accurately infer potential miRNA-disease associations. We anticipate that the proposed WBNPMD can serve as a powerful tool for uncovering potential miRNA-disease associations.
Electronic supplementary material The online version of this article (10.1186/s12967-019-2063-4) contains supplementary material, which is available to authorized users.


Background
MiRNAs are a class of short endogenous non-coding RNAs (ncRNAs) about 20-25 nucleotides in length [1]. They can bind to specific target messenger RNAs (mRNAs), triggering regulated degradation or suppressing translation [1][2][3][4]. In this way, miRNAs influence various important biological processes, including cell development [5], proliferation [6], apoptosis [7], differentiation [8], metabolism [9,10], aging [9,10], and signal transduction [11]. In 2005, Croce and Calin discovered that the differential expression of miRNAs has a great influence on the development of various cancers [12], such as breast cancer [13], lung cancer [14], and prostate cancer [15]. Therefore, scientists have devoted themselves in recent years to mining disease-associated miRNAs in order to better understand disease mechanisms at the molecular level and thus improve disease diagnosis and treatment [16][17][18]. In the early stage of miRNA research, disease-miRNA associations were identified by biological experiments, which are rather expensive and time-consuming. Therefore, an increasing number of computational methods have been developed in the field of bioinformatics. Guided by their predictions, biological experiments can concentrate on miRNA-disease pairs with high potential, which makes experimental discovery far more efficient than before.

Open Access. Journal of Translational Medicine. *Correspondence: syp@gdut.edu.cn
According to previous research, functionally similar miRNAs tend to regulate similar diseases, and vice versa [19,20]. Various computational methods for uncovering potential miRNA-disease associations have been developed based on this assumption. So far, methods for miRNA-disease association prediction can be roughly divided into two categories: machine learning methods and complex network-based methods.
Generally, machine learning methods use the biological features of miRNAs and diseases to train classifiers for miRNA-disease association prediction. So far, supervised and semi-supervised methods have been widely employed for association identification; they differ in whether negative samples are required in the training stage. In the supervised method presented by Xu et al., a support vector machine (SVM) classifier was trained on the topological information of the miRNA target-dysregulated network (MTDN) to identify positive associations [21]. However, high-confidence negative samples are very hard to obtain, which significantly limits the accuracy of a supervised classifier. Considering this factor, many semi-supervised methods were proposed by later studies. For example, Chen and Yan [19] proposed a global method named RLSMDA based on regularized least squares, which can predict novel miRNA-disease associations without negative samples. Later, the GRMDA method proposed by Chen et al. [22] performed graph regression in three different latent spaces to infer potential miRNA-associated diseases. Recently, IMCMDA, proposed by Chen et al. [23], completed the missing miRNA-disease associations based on the known miRNA and disease similarity information. Another method, NRLMFMDA, proposed by Zhao et al. [24], maps miRNAs and diseases to a shared low-dimensional latent space for prediction. MKRMDA, proposed by Chen et al. [25], obtained high prediction accuracy by using L2 regularization to produce an optimized non-sparse combination of multiple base kernels. Although these semi-supervised methods no longer require negative samples, their performance is unstable. Overall, machine learning methods have obtained excellent results in miRNA-disease association prediction.
By extracting information from the known miRNA-disease association network, complex network-based methods offer an alternative approach in this field. Two factors are key when proposing a network-based method: the introduction of novel similarity information and the network construction technique. With the fast development of biological research, more and more miRNA and disease similarity information has become available, and an increasing number of studies have introduced this information into their methods. Prediction accuracy can be improved if the similarity information is put to good use, and the key lies in how the miRNA-disease association network is constructed. Considering that the prediction accuracy of similarity measurement in local networks was unsatisfactory [16], later studies introduced many global network methods [26][27][28][29]. By implementing a random walk with restart on the miRNA functional similarity network, Chen et al. developed the RWRMDA method for association prediction [30]. Starting from a given seed node, it simulates a walker moving from the current node to its neighbors. However, the drawback of RWRMDA is that it cannot predict new miRNA-disease pairs. The HDMP method proposed by Xuan et al. [31] employed the K-nearest neighbors technique to complete the prediction, which inspired many later methods. Later, Liu et al. [32] calculated miRNA similarity based on miRNA-target and miRNA-lncRNA associations, and then constructed a heterogeneous network by integrating known miRNA and disease information. Similarly, Luo and Xiao [33] implemented an unbalanced bi-random walk on a heterogeneous network. The HLPMDA method proposed by Chen et al. also constructed a heterogeneous network and implemented heterogeneous label propagation to infer possible associations [34]. By incorporating miRNA and disease similarity information, Jiang et al. [35] proposed an improved collaborative filtering algorithm. Recently, Chen et al. proposed a bipartite network projection model named BNPMDA [36]. By integrating known miRNA and disease similarity information, BNPMDA constructed a weighted bipartite network, on which a two-round resource allocation was implemented to uncover miRNA-disease associations.
According to previous work, network-based methods generally yield higher prediction accuracy than machine learning methods, and the appropriate use of miRNA and disease similarities can further improve performance. In addition, the technique of assigning transfer weights in a bipartite network model is widely employed in many research fields, and according to the study of Zhou et al. [37], optimizing the initial information in the bipartite network can further improve prediction accuracy. Inspired by these observations, we propose a novel method called weighted bipartite network projection for miRNA-disease association prediction (WBNPMD). In WBNPMD, the transfer weights in the bipartite network are assigned by combining known miRNA and disease similarities, and the initial information is configured by reducing the recommendation power of popular nodes. Unlike the supervised machine learning methods above, our method does not need negative samples. With the assignment of transfer weights and the configuration of initial information, our method achieves better results than other network-based methods. To evaluate the prediction accuracy of WBNPMD, we implemented leave-one-out cross-validation (LOOCV) and fivefold cross-validation on our dataset downloaded from HMDD V2.0 [38], obtaining AUCs of 0.9321 and 0.9173 ± 0.0005, respectively. For further validation, we conducted two types of case studies on three important human diseases. These results indicate that the proposed method is a powerful tool for uncovering potential miRNA-disease associations.

Human miRNA-disease associations
In this article, we downloaded the known human miRNA-disease associations from the HMDD v2.0 database, comprising 5430 associations between 383 diseases and 495 miRNAs. The numbers of miRNAs and diseases are denoted as nm and nd, respectively. To formalize these associations, an nm by nd adjacency matrix A is constructed: if disease $d_j$ has a confirmed association with miRNA $m_i$, then $A_{ij}$ is set to 1, otherwise 0.
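As a minimal sketch of this construction, the snippet below builds the 0/1 adjacency matrix from a list of known pairs. The function name `build_adjacency` and the toy identifiers (`m1`, `d1`, ...) are illustrative assumptions; they are not HMDD records or part of the original method description.

```python
def build_adjacency(pairs, mirnas, diseases):
    """Return the nm x nd 0/1 matrix A, with A[i][j] = 1 for each known pair."""
    m_index = {m: i for i, m in enumerate(mirnas)}
    d_index = {d: j for j, d in enumerate(diseases)}
    A = [[0] * len(diseases) for _ in mirnas]
    for mirna, disease in pairs:
        A[m_index[mirna]][d_index[disease]] = 1
    return A


# Toy identifiers for illustration only (assumption, not HMDD data).
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
known_pairs = [("m1", "d1"), ("m1", "d2"), ("m3", "d2")]

A = build_adjacency(known_pairs, mirnas, diseases)
for name, row in zip(mirnas, A):
    print(name, row)
```

Rows correspond to miRNAs and columns to diseases, matching the row and column interaction profiles used later for the Gaussian kernel.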

MiRNA functional similarity
Based on the assumption that functionally similar miRNAs tend to be associated with phenotypically similar diseases, Wang et al. [39] proposed a method for calculating miRNA functional similarity; the resulting scores were obtained from http://www.cuilab.cn/files/images/cuilab/misim.zip. An nm by nm matrix FS is constructed to represent miRNA functional similarity, and the similarity score between two miRNAs $m_i$ and $m_j$ is denoted as FS(i, j).

Disease semantic similarity model 1
Here, we introduce two models for calculating disease semantic similarity. The first model was developed by Wang et al. [39] based on Medical Subject Headings (MeSH) descriptors. A given disease S can be represented by a directed acyclic graph (DAG), i.e. DAG(S) = (S, T(S), E(S)), where T(S) and E(S) denote the node set and the edge set, respectively. The contribution value of disease t in DAG(S) is defined as follows:
$$D1_S(t) = \begin{cases} 1, & \text{if } t = S \\ \max\{\Delta \cdot D1_S(t') \mid t' \in \text{children of } t\}, & \text{if } t \neq S \end{cases} \tag{1}$$
where $\Delta$ is the semantic contribution decay parameter. The semantic value of disease S is defined as follows:
$$DV1(S) = \sum_{t \in T(S)} D1_S(t) \tag{2}$$
where T(S) contains all ancestor nodes of S and S itself. The more of their DAGs two diseases share, the higher their semantic similarity score. Thus an nd by nd semantic similarity matrix SS1 is constructed, whose entry SS1(A, B), the semantic similarity score between diseases A and B, is defined as follows:
$$SS1(A, B) = \frac{\sum_{t \in T(A) \cap T(B)} \left(D1_A(t) + D1_B(t)\right)}{DV1(A) + DV1(B)} \tag{3}$$
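The recursion behind model 1 can be illustrated on a toy DAG. The sketch below assumes the usual form of the Wang et al. model, in which a disease contributes 1 to itself and each ancestor receives the largest decayed contribution among its children. The DAG, the labels `A`, `B`, `C`, `root`, the function names and the decay value 0.5 are assumptions chosen for illustration; they are not taken from MeSH or from the text above.

```python
def contributions(dag_parents, s, delta=0.5):
    """Contribution of every node in DAG(s): 1 for s itself, otherwise
    the maximum over children t' of delta * contribution(t')."""
    d1 = {s: 1.0}
    frontier = [s]
    while frontier:
        next_frontier = []
        for child in frontier:
            for parent in dag_parents.get(child, []):
                value = delta * d1[child]
                if value > d1.get(parent, 0.0):
                    d1[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return d1


def semantic_similarity(dag_parents, a, b, delta=0.5):
    """Shared contributions divided by the sum of both semantic values."""
    da = contributions(dag_parents, a, delta)
    db = contributions(dag_parents, b, delta)
    shared = set(da) & set(db)
    numerator = sum(da[t] + db[t] for t in shared)
    return numerator / (sum(da.values()) + sum(db.values()))


# Toy hierarchy (assumption): A and B are both children of C, C is a child of root.
toy_parents = {"A": ["C"], "B": ["C"], "C": ["root"]}
print(semantic_similarity(toy_parents, "A", "B"))
```

In this toy hierarchy the two diseases share their ancestors `C` and `root` but not themselves, so the score lies strictly between 0 and 1, while any disease compared with itself scores 1.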

Disease semantic similarity model 2
In disease semantic similarity model 1, different ancestor diseases on the same layer of DAG(S) have the same semantic contribution value. Considering that a more specific disease, which appears in fewer DAGs, should contribute more to the semantic similarity of disease S, another disease semantic similarity model was proposed by Xuan et al. [31]. The contribution value of disease t in DAG(S) is defined as follows:
$$D2_S(t) = -\log\left(\frac{\text{the number of DAGs including } t}{\text{the number of diseases}}\right) \tag{4}$$
Based on model 2, the semantic similarity matrix SS2 is computed from DV2(A) and DV2(B), which are calculated in the same way as in Eq. (2). The semantic similarity score SS2(A, B) between diseases A and B is then calculated as follows:
$$SS2(A, B) = \frac{\sum_{t \in T(A) \cap T(B)} \left(D2_A(t) + D2_B(t)\right)}{DV2(A) + DV2(B)} \tag{5}$$
Finally, the two semantic similarity matrices SS1 and SS2 are combined into the final semantic similarity matrix SS as follows:
$$SS = \frac{SS1 + SS2}{2} \tag{6}$$

Gaussian interaction profile kernel similarity
As another approach to measuring miRNA similarity and disease similarity, Gaussian interaction profile kernel similarities were also constructed using radial basis functions. In the adjacency matrix A, the ith row records whether miRNA $m_i$ is associated with each disease, and the jth column records whether disease $d_j$ is associated with each miRNA. The vectors $IP(m_i)$ and $IP(d_j)$ denote the ith row and the jth column, respectively, and serve as feature vectors for the Gaussian kernel. We denote the Gaussian interaction profile kernel similarity between diseases $d_i$ and $d_j$ as KD and that between miRNAs $m_i$ and $m_j$ as KM; they are calculated as follows:
$$KD(d_i, d_j) = \exp\left(-\beta_d \lVert IP(d_i) - IP(d_j) \rVert^2\right) \tag{7}$$
$$KM(m_i, m_j) = \exp\left(-\beta_m \lVert IP(m_i) - IP(m_j) \rVert^2\right) \tag{8}$$
Here, the kernel bandwidths $\beta_d$ and $\beta_m$ are defined as follows:
$$\beta_d = \beta'_d \Big/ \left(\frac{1}{nd} \sum_{i=1}^{nd} \lVert IP(d_i) \rVert^2\right) \tag{9}$$
$$\beta_m = \beta'_m \Big/ \left(\frac{1}{nm} \sum_{i=1}^{nm} \lVert IP(m_i) \rVert^2\right) \tag{10}$$
where the original kernel bandwidth parameters $\beta'_d$ and $\beta'_m$ are set to 1.
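A minimal sketch of the Gaussian interaction profile kernel follows. It assumes the standard form exp(-β‖x - y‖²), with β equal to the original bandwidth parameter divided by the mean squared norm of the profiles. The function name `gip_kernel` and the toy adjacency matrix are illustrative assumptions.

```python
import math


def gip_kernel(profiles, beta_prime=1.0):
    """Gaussian interaction profile kernel over a list of 0/1 profile vectors."""
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    beta = beta_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-beta * sq_dist)
    return kernel


# Toy adjacency matrix (assumption): 3 miRNAs (rows) x 2 diseases (columns).
A = [[1, 1], [0, 0], [0, 1]]
KM = gip_kernel(A)                                # miRNA profiles are rows
KD = gip_kernel([list(col) for col in zip(*A)])   # disease profiles are columns
print(KM[0][2], KD[0][1])
```

The sketch divides by the mean squared norm without a guard, so it assumes that at least one association exists; an all-zero adjacency matrix would need separate handling.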

Integrated similarity for miRNAs and diseases
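In the previous sections, we constructed several similarity matrices: miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity. Here, we combine them into integrated similarity matrices for miRNAs and diseases, denoted here by $S_{\mathrm{miRNA}}$ and $S_{\mathrm{disease}}$. Concretely, if miRNAs $m_i$ and $m_j$ have a functional similarity score, their integrated similarity equals $FS(m_i, m_j)$; otherwise it equals $KM(m_i, m_j)$. The integrated disease similarity is obtained in the same way from SS and KD:
$$S_{\mathrm{miRNA}}(m_i, m_j) = \begin{cases} FS(m_i, m_j), & \text{if } m_i \text{ and } m_j \text{ have functional similarity} \\ KM(m_i, m_j), & \text{otherwise} \end{cases} \tag{11}$$
$$S_{\mathrm{disease}}(d_i, d_j) = \begin{cases} SS(d_i, d_j), & \text{if } d_i \text{ and } d_j \text{ have semantic similarity} \\ KD(d_i, d_j), & \text{otherwise} \end{cases} \tag{12}$$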
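The fallback rule can be sketched in a few lines. The snippet assumes that "having functional (or semantic) similarity" means a non-zero entry in the primary matrix; this reading, the function name `integrate` and the toy values are assumptions for illustration.

```python
def integrate(primary, kernel):
    """Use primary[i][j] where it is non-zero, otherwise fall back to kernel[i][j]."""
    return [
        [p if p != 0 else k for p, k in zip(p_row, k_row)]
        for p_row, k_row in zip(primary, kernel)
    ]


# Toy matrices (assumption): functional similarity is missing for some pairs.
FS = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.0], [0.6, 0.0, 1.0]]
KM = [[1.0, 0.2, 0.4], [0.2, 1.0, 0.3], [0.4, 0.3, 1.0]]

S_mirna = integrate(FS, KM)
for row in S_mirna:
    print(row)
```

The same helper would apply to diseases by passing the semantic similarity matrix as `primary` and the disease kernel as `kernel`.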

WBNPMD
In this paper, we present a bipartite network-based method for miRNA-disease association prediction named WBNPMD. The data preparation process for WBNPMD has been presented in the previous six sections, and the flowchart of WBNPMD is shown in Fig. 1. Based on the assumption that similar miRNAs are more likely to be associated with similar diseases and vice versa, we use the integrated similarities of miRNAs and diseases to assign a transfer weight to every edge in the miRNA-disease bipartite network. The transfer weights are denoted $wr$ and $wd$, where $wr(m_j, d_i)$ is the transfer weight of the edge from miRNA $m_j$ to disease $d_i$, and $wd(m_j, d_i)$ is the transfer weight of the edge from disease $d_i$ to miRNA $m_j$. The transfer weight $wr$ represents the recommendation power of every miRNA to different diseases, while $wd$ represents the recommendation power of every disease to different miRNAs, so that larger weights indicate miRNA-disease pairs with higher potential.
We used known miRNA and disease similarity information to construct a more accurate bipartite network. Concretely, we separately implemented a disease-based bipartite network and a miRNA-based bipartite network. In the first, all miRNAs are recommended to diseases, while in the second, all diseases are recommended to miRNAs. The final recommendation score is obtained by averaging the two resulting information matrices.
Next, we describe the implementation of the disease-based bipartite network in detail. According to the study of Zhou et al. [37], reducing the initial information of popular nodes may lead to higher prediction accuracy. Therefore, the initial information between miRNA $m_j$ and disease $d_i$ is stored in the initial information matrix $S_{ini}$, whose definition (Eq. 15) depends on $k_i$, the number of miRNAs associated with disease $d_i$, and on the parameter $\beta \in (-1, 0)$.
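Since the defining equation for the initial information is not reproduced in this text, the following is only a sketch under a stated assumption: a degree-dependent configuration in the spirit of Zhou et al. [37], in which each known association receives the initial value $k_i^{\beta}$ and every unknown pair receives 0. The function name `initial_information`, this functional form and the toy matrix are assumptions, not the authors' confirmed formula.

```python
def initial_information(A, beta=-0.1):
    """Assumed form: S_ini[j][i] = A[j][i] * k_i ** beta, where k_i is the
    number of miRNAs associated with disease i (column sum of A)."""
    nd = len(A[0])
    degree = [sum(row[i] for row in A) for i in range(nd)]
    # The conditional skips 0 ** beta for diseases without any association.
    return [
        [degree[i] ** beta if row[i] else 0.0 for i in range(nd)]
        for row in A
    ]


# Toy adjacency matrix (assumption): 3 miRNAs x 2 diseases.
A = [[1, 1], [0, 0], [0, 1]]
S_ini = initial_information(A, beta=-0.1)
for row in S_ini:
    print(row)
```

With a negative exponent, a disease linked to many miRNAs starts with less information per association than a rarely linked one, which is the popularity penalty the paragraph describes.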
After the initial information of all miRNAs and the transfer weight of every edge in the bipartite network have been set, the information propagation process begins in order to obtain the final recommendation score. The process consists of two steps. In the first step, the initial information is propagated from every miRNA to disease $d_i$. In the second step, the information that the diseases gathered in step one is propagated back to the miRNAs to obtain the recommendation score, which forms the disease-based recommendation score matrix $S_M$. Equivalently, $S_M$ can be expressed through an nm by nm propagation matrix P, whose entry $P(m_j, m_k)$ represents the information gathered by miRNA $m_j$ from $m_k$; $S_M$ is thus the recommendation score gathered by the two-step information propagation on the weighted miRNA-disease bipartite network, and Eq. (18) can be rewritten in terms of P.
Fig. 1 The basic idea of WBNPMD. In the first step, integrated similarity matrices are constructed by combining known miRNA-disease associations with miRNA and disease similarity information. Next, after transfer weight assignment and initial information configuration, two bipartite networks are constructed. Finally, the disease-based and miRNA-based bipartite networks are implemented separately, and the final prediction result is obtained by averaging their recommendation scores.
Equations (15) to (22) give the details of the disease-based bipartite network. We similarly implemented the miRNA-based bipartite network to recommend diseases to miRNAs, and obtained the recommendation score matrix $S_D$, which represents the information propagated from diseases to miRNAs. Lastly, we calculated the final recommendation score matrix $S_{fin}$ for every miRNA-disease pair by averaging $S_M$ and $S_D$ as follows:
$$S_{fin} = \frac{S_M + S_D}{2} \tag{23}$$
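The two-step idea can be sketched with a generic weighted resource-allocation scheme. The exact propagation equations of WBNPMD are not reproduced in this text, so the normalization below (each node splits what it holds in proportion to its edge weights) is an assumption borrowed from standard bipartite network projection, and the names `two_step_scores`, `propagation_matrix`, `W` and `s` are hypothetical.

```python
def two_step_scores(W, s):
    """Spread resource s (one value per miRNA) to diseases and back along weights W."""
    nm, nd = len(W), len(W[0])
    row_sum = [sum(W[j]) for j in range(nm)]
    col_sum = [sum(W[j][i] for j in range(nm)) for i in range(nd)]
    # Step 1: every miRNA splits its resource among diseases in proportion to W.
    on_disease = [
        sum(s[j] * W[j][i] / row_sum[j] for j in range(nm) if row_sum[j] > 0)
        for i in range(nd)
    ]
    # Step 2: every disease returns what it gathered to the miRNAs, again in proportion to W.
    return [
        sum(on_disease[i] * W[j][i] / col_sum[i] for i in range(nd) if col_sum[i] > 0)
        for j in range(nm)
    ]


def propagation_matrix(W):
    """P[j][k]: share of miRNA k's resource that reaches miRNA j after both steps."""
    nm, nd = len(W), len(W[0])
    row_sum = [sum(W[k]) for k in range(nm)]
    col_sum = [sum(W[j][i] for j in range(nm)) for i in range(nd)]
    return [
        [
            sum(
                W[j][i] * W[k][i] / (col_sum[i] * row_sum[k])
                for i in range(nd)
                if col_sum[i] > 0 and row_sum[k] > 0
            )
            for k in range(nm)
        ]
        for j in range(nm)
    ]


# Toy edge weights (assumption): 3 miRNAs x 2 diseases, 0.0 means no edge.
W = [[0.8, 0.4], [0.0, 0.6], [0.5, 0.0]]
s = [1.0, 0.0, 1.0]

scores = two_step_scores(W, s)
P = propagation_matrix(W)
via_P = [sum(P[j][k] * s[k] for k in range(len(s))) for j in range(len(s))]
print(scores)
print(via_P)
```

Both routes give the same scores, which illustrates why the two propagation steps can be folded into a single miRNA-by-miRNA matrix. Under this particular normalization the total amount of resource is preserved; whether the authors' weights share that property cannot be determined from the text.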

Evaluation metrics
To evaluate the performance of WBNPMD in identifying miRNA-disease associations, LOOCV and fivefold cross-validation were performed on the collected dataset. In LOOCV, each known miRNA-disease association was treated as a test sample in turn, while the rest were taken as training samples. The receiver operating characteristic (ROC) curve was plotted to visualize the performance of WBNPMD, and the area under the ROC curve (AUC) was computed to quantify it. In fivefold cross-validation, all known miRNA-disease associations were randomly divided into 5 groups of equal size. Each group was left out as the test set in turn, while the other 4 groups were used for training. To avoid bias from a particular random split, the fivefold cross-validation was repeated 100 times, and the average AUC value was computed.
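The two building blocks of this evaluation, a random fivefold split and the AUC, can be sketched as follows. The AUC is computed in its rank-based (Mann-Whitney) form, which equals the area under the ROC curve. The function names and toy values are assumptions, and how candidate pairs are selected in each trial is not specified in the text and is therefore left out.

```python
import random


def auc(scores, labels):
    """Probability that a random positive is scored above a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def five_folds(items, seed=0):
    """Shuffle the known associations and deal them into 5 groups."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::5] for i in range(5)]


# Toy scores and labels (assumption): 1 marks a held-out known association.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))

folds = five_folds(range(10), seed=42)
print([len(f) for f in folds])
```

Repeating the split with different seeds and averaging the resulting AUC values mirrors the repeated fivefold procedure described above.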

Effect of parameter
The WBNPMD method introduces one parameter, $\beta$. According to Eq. (15), $\beta$ configures the initial information of every node in the bipartite network. To study the effect of $\beta$, LOOCV was implemented on the miRNA-disease association dataset to observe how different $\beta$ values influence the AUC. LOOCV was repeated with $\beta$ ranging from −1 to 0 in steps of 0.1. As shown in Fig. 2, the AUC fluctuates only slightly over this range. The optimal $\beta$ is chosen as the value with the highest AUC in the figure. In this paper, we set $\beta$ to −0.1.

Performance comparison
To demonstrate the reliability of WBNPMD, we compared it with four other state-of-the-art methods, including RWRMDA, RLSMDA, GRMDA, and

Case studies
For further evaluation, WBNPMD was verified on three important human diseases through two types of case studies based on three miRNA-disease databases: dbDEMC, miR2Disease and HMDD v3.0. We recorded the number of experimentally confirmed miRNAs among the top 10, top 20, and top 50 predictions for each of the three diseases. In addition, the prediction results for all candidate miRNAs were publicly released for further experimental verification (see Additional file 1). Prostate neoplasms are among the most frequently diagnosed malignant tumors in men, with morbidity and mortality increasing with age [40,41]. According to previous studies, some miRNAs could serve as diagnostic biomarkers for prostate neoplasms and may even be helpful in treatment. For example, previous studies showed that miR-20 is vital to the regulation of prostate neoplasms [42], and that upregulated expression of miR-483-5p promotes prostate cancer cell growth [43]. As shown in Table 1, 10 of the top 10, 20 of the top 20, and 47 of the top 50 predicted miRNAs were experimentally confirmed to be associated with prostate neoplasms based on dbDEMC or miR2Disease.
Colorectal neoplasms are the third most common cancer type in both men and women and have a high mortality rate, causing about 700,000 deaths every year. Only about 10% of colorectal neoplasm cases are hereditary, while most of the rest are acquired rather than inherited. Studies have confirmed that several factors may cause colorectal neoplasms, including alcohol consumption, smoking, and physical inactivity [44]. Various miRNAs have been confirmed to be related to colorectal neoplasms in recent research. For example, miR-10a, which is differentially expressed in the SW480 and SW620 cell lines, could suppress the metastasis of colorectal cancer [45]. The proposed WBNPMD was applied to colorectal neoplasms and verified through dbDEMC and miR2Disease. As shown in Table 2, 10 of the top 10, 19 of the top 20, and 46 of the top 50 miRNAs were experimentally confirmed.
In the second type of case study, we evaluated the prediction accuracy of WBNPMD on lung neoplasms based on the HMDD V2.0 database, and our results were validated in HMDD V3.0, dbDEMC and miR2Disease. As the most [46]. According to Table 3, 10, 20 and 47 of the top 10, 20 and 50 miRNAs, respectively, were confirmed to be associated with lung neoplasms by the aforementioned three databases. Taken together, these case studies indicate that WBNPMD performs very well in uncovering potential miRNA-disease associations.

Discussion
The results above show that WBNPMD outperforms the other methods compared in terms of AUC, both in LOOCV and in fivefold cross-validation. In addition, the two types of case studies further confirmed the performance of the proposed method. This performance can mainly be attributed to two factors: the construction of transfer weights in the bipartite network and the adjustment of the initial information. By combining known miRNA similarities and disease similarities, the weighted bipartite network captures more information than an unweighted one and thus yields more precise results. Meanwhile, decreasing the initial information of popular nodes further improves the prediction accuracy.
However, our method still has some limitations. First, the completeness of the adjacency matrix A has a heavy impact on the performance of WBNPMD. Moreover, the bipartite network projection model that we employ for predicting potential miRNA-disease associations cannot deal with isolated nodes; thus WBNPMD is not suitable for predicting associations for a miRNA without any known associated disease, or vice versa.

Conclusions
In this paper, we proposed the weighted bipartite network projection for miRNA-disease association prediction (WBNPMD) method. LOOCV and fivefold cross-validation were implemented to evaluate the performance of WBNPMD on our collected dataset. The AUC values of WBNPMD were 0.9321 in LOOCV and 0.9173 ± 0.0005 in fivefold cross-validation. Also, two types of case studies were conducted by implementing