Elsevier

Signal Processing

Volume 190, January 2022, 108312
MiRNA-Disease association prediction via non-negative matrix factorization based matrix completion

https://doi.org/10.1016/j.sigpro.2021.108312

Highlights

  • We propose a new matrix completion model for miRNA-disease association prediction.

  • We decompose the association matrix into a known part and an unknown part.

  • Disease similarity and miRNA similarity are well embedded into the proposed model.

  • A non-negative matrix factorization model is used to assist the prediction process.

Abstract

A large number of biological studies have shown that microRNAs (miRNAs) are closely related to the occurrence and development of various human diseases, and a growing body of research explores the relationship between miRNAs and human diseases. However, known associations are sparse, and it is not easy to predict potential miRNA-disease associations accurately from large amounts of biological data. Hence, how to predict these associations effectively remains an open research topic. In this work, we propose a new matrix completion algorithm based on non-negative matrix factorization (NMFMC) to infer potential miRNA-disease associations. In NMFMC, we decompose the miRNA-disease association matrix into a known part and an unknown part. In this manner, the experimentally validated associations are preserved, and the potential associations can be better recovered. In addition, both disease similarity and miRNA similarity are embedded into the proposed model to assist the association recovery process. As a result, non-negative matrix factorization, matrix completion and graph regularization constraints are integrated into a unified framework for miRNA-disease association prediction. We validate our method with global and local leave-one-out cross-validation, where it achieves AUCs of 0.9165 and 0.8512, respectively, an improvement over previous methods. Furthermore, case studies on three widespread human diseases show that NMFMC is also applicable in practice: for Colon Neoplasms, Prostate Neoplasms, and Breast Neoplasms, 45, 44, and 50 of the top 50 predictions based on existing associations are confirmed by experimental reports.

Introduction

MicroRNAs (miRNAs) are RNA molecules of approximately 21 to 23 nucleotides that are widely found in eukaryotes. It is estimated that miRNAs can regulate nearly one-third of human genes [1], [2], [3]. In addition, miRNAs are closely related to many complex human diseases [4], [5], [6], [7]. miRNAs function mainly by inhibiting downstream gene expression, which attenuates or eliminates the function of downstream genes and thereby indirectly regulates physiological and pathological conditions. A large number of biological studies have confirmed the associations between certain miRNAs and diseases. For example, in colorectal cancer, the level of miRNA-143 is decreased, which increases the level of its target gene KRAS and promotes the proliferation of cancer cells [8]. Another important tumor suppressor miRNA, miR-16, also inhibits the proliferation of colorectal cancer by inhibiting KRAS [9]. miR-145, which is expressed in the same cluster as miR-143, synergizes with miR-143 to inhibit IGF1R, thereby inhibiting colorectal cancer cell proliferation [10]. In addition, miRNAs can regulate gene expression at the post-transcriptional level. For example, miRNA-29 can down-regulate intracellular Mcl-1 protein levels, which leads to the expression of Bim and Puma proteins, making cancer cells sensitive to the cytotoxicity of TRAIL and prone to apoptosis [11].

In the years since the first miRNA was discovered [8], thousands of miRNAs have been found in humans, mice, rats, zebrafish, fruit flies, rice, and almost all groups of animals and plants [1], [12], [13], opening up a new field of scientific research. Therefore, it is urgent to develop powerful computational models to predict new potential disease-miRNA associations [14], [15], [16]. Records from the Web of Science and miRBase databases show that the number of research works related to miRNAs and the number of miRNAs discovered have increased year by year since 2001 [12], [17], [18], [19], [20]. In addition, considerable efforts have been made to clarify the relationship between miRNAs and diseases. In earlier years, based on the observation that microRNAs related to certain diseases are more phenotypically similar and the diseases associated with certain microRNAs are more functionally related, an integrated phenome-microRNAome network [21], [22] was constructed from known experimentally validated microRNA-disease associations to describe the relationship between miRNAs and diseases. Since the target genes of two microRNAs might be functionally related, rather than simply overlapping, when they are present in the same cellular pathway or functional module, this method did not achieve good performance. In [23], Lee et al. came up with a statistical approach for the explicit simulation of uncertainties in the relevance between disease genes and target genes. In addition, the uncertainties of the seed set are merged into the GBA framework to predict the priority of candidate genes. Xu et al. [24] proposed the miRNA-target dysregulated network (MTDN) method, which integrates computed target predictions with the expression levels of miRNA and mRNA in tumor and non-tumor tissues.
Moreover, an SVM classifier was built on these characteristics and miRNA expression for the miRNAs in the MTDN, and disease-positive and disease-negative miRNAs were distinguished according to topological features. Because negative samples are generally constructed by hand, they are often not true negative samples, which reduces the prediction accuracy of the SVM-based model. Shi et al. [25] suggested a framework that exploits the functional link between miRNA targets and disease genes in protein-protein interaction (PPI) networks, combined with random walk analysis. It is worth mentioning that a binary miRNA-disease network was also built to identify miRNA-disease co-regulation modules. In [26], Qin et al. proposed an effective method for mining disease-associated miRNAs based on domains rather than proteins or genes. In addition, the structure and function of the protein are also fully considered. Pasquier et al. [27] proposed the MiRAI model, which represents miRNA and disease distribution information in a high-dimensional vector space, since miRNA and disease information can be revealed by distributional semantics, and defines the association between a miRNA and a disease based on their vector similarity. All of the above methods have achieved promising results in potential association prediction. However, due to the high false-positive and false-negative rates in these target prediction databases [28], [29], the miRNA-target-based models are not very effective in practice.

As a basic hypothesis, miRNAs with similar functions are thought to be associated with similar diseases, and vice versa [20], [30], [31], [32]. Based on this hypothesis, Chen et al. [33] proposed a new approach (RWRMDA) that uses global network similarity and identifies disease-associated miRNAs by running random walks with restart on miRNA similarity networks. However, this method cannot work well for new diseases with no known related miRNAs. Xuan et al. [34] proposed a computational model (HDMP) which combines the distribution of disease-related miRNAs in the k nearest neighbors with miRNA functional similarity to predict potential disease-related miRNAs. Since HDMP only considers local network similarity, it does not achieve decent performance. In [35], Mørk et al. predicted the potential associations between diseases and miRNAs by combining the linkages among miRNAs, proteins and diseases. Xuan et al. [36] made full use of the characteristics of miRNAs in the constructed miRNA network and proposed a framework called MIDP. Their framework assigns different weights to nodes of different categories and uses a random walk model to predict disease-related miRNAs. Recently, Luo and Xiao [37] developed a Kronecker regularized least squares-based method that integrates heterogeneous omics data to identify disease-related miRNAs. Xiao et al. [38] suggested a graph regularized non-negative matrix factorization method for identifying miRNA-disease associations (GRNMF). However, most of the above methods rely strongly on known association information, and obtaining experimentally validated interactions is laborious, so predicting miRNA-disease associations remains a challenging problem. Chen et al. [17] proposed a new matrix completion model that embeds a neighborhood constraint for miRNA-disease association prediction.
The low-rank property is also added to constrain matrix completion in [19], under the assumption that the miRNA-disease relation matrix is low-rank. By integrating the predicted association probability obtained from matrix decomposition through a sparse learning method, Chen et al. developed a computational model of matrix decomposition and heterogeneous graph inference for miRNA-disease association [39]. Chen et al. [40] proposed a novel computational method named ensemble of decision tree based miRNA-disease association prediction, which integrates ensemble learning and dimensionality reduction. In [41], Chen et al. combined kernel-based nonlinear dimensionality reduction, matrix factorization and binary classification to construct a Bayesian model for potential miRNA-disease association prediction. In order to preserve the local structures of the training data, feature selection and Laplacian graph regularization [42], [43] are combined by sparse subspace learning for this prediction task [44]. By constructing bias ratings for miRNAs and diseases using agglomerative hierarchical clustering on different types of networks, Chen et al. proposed a novel computational model of bipartite network projection for miRNA-disease association prediction [45]. To improve efficiency and scalability on large-scale datasets relative to previous methods, Zhao et al. [18] developed an adaptive boosting model to predict potential associations between diseases and miRNAs. There are also many other machine learning based methods proposed for this task, such as deep-belief networks [46], graph learning [47], logistic model trees [48] and gradient boosting [49].
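
The random-walk-with-restart idea mentioned above for RWRMDA can be illustrated with a minimal sketch. This is the generic textbook iteration, not the published RWRMDA implementation; the function name, the restart probability, and the toy similarity matrix are all assumptions made for illustration.

```python
import numpy as np

def random_walk_with_restart(S, seeds, restart=0.2, tol=1e-8, max_iter=1000):
    """Generic random walk with restart on a similarity graph.

    S       : symmetric, non-negative similarity matrix (toy data here)
    seeds   : indices of nodes already known to be related to the disease
    restart : probability of jumping back to the seed set (assumed value)
    """
    W = S / S.sum(axis=0, keepdims=True)      # column-normalized transition matrix
    p0 = np.zeros(S.shape[0])
    p0[seeds] = 1.0 / len(seeds)              # start uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:    # L1 change small enough: converged
            return p_next
        p = p_next
    return p

# Toy 4-node miRNA similarity network: nodes 0-1 and nodes 2-3 form two groups.
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
p = random_walk_with_restart(S, seeds=[0])
print(p)  # stationary scores used to rank candidate miRNAs
```

Candidates are then ranked by their stationary score. The sketch also makes the limitation noted in the text visible: with an empty seed set (a disease with no known miRNAs) there is nothing to restart from.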

In the past years, a large number of studies have shown that matrix completion is an effective tool for miRNA-disease association prediction and other bioinformatics research [50], [51], [52], [53], [54], [55], [56]. By modelling experimentally validated miRNA-disease associations as a binary matrix, potential association prediction can be regarded as the problem of recovering the missing values in the matrix. Based on the partially known miRNA-disease association matrix, Li et al. [53] proposed a matrix completion algorithm (MCMDA) to update the adjacency matrix of known miRNA-disease associations and thereby predict the potential associations. By combining matrix completion with the similarity feature matrix to optimize the MCMDA algorithm, Chen et al. [54] proposed an improved version named IMCMDA, which obtains better results. In [57], Tang et al. proposed a dual Laplacian regularization term to regularize the association matrix completion process, which takes full account of both the miRNA functional similarity and the disease semantic similarity.
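
To make the binary-matrix view concrete, the following minimal sketch builds an association matrix from a list of validated pairs. The identifiers are made up for illustration and are not HMDD records; the helper name `build_association_matrix` is likewise hypothetical.

```python
import numpy as np

def build_association_matrix(pairs, mirnas, diseases):
    """Return a binary matrix A with A[i, j] = 1 for each known (miRNA, disease) pair."""
    m_idx = {m: i for i, m in enumerate(mirnas)}
    d_idx = {d: j for j, d in enumerate(diseases)}
    A = np.zeros((len(mirnas), len(diseases)))
    for m, d in pairs:
        A[m_idx[m], d_idx[d]] = 1.0
    return A

# Toy, made-up identifiers (assumption: not taken from any database).
mirnas = ["miR-a", "miR-b", "miR-c"]
diseases = ["disease-x", "disease-y"]
known_pairs = [("miR-a", "disease-x"), ("miR-b", "disease-x"), ("miR-c", "disease-y")]

A = build_association_matrix(known_pairs, mirnas, diseases)
density = A.sum() / A.size   # fraction of entries that are known associations
print(A)
print(f"known fraction: {density:.2f}")
```

A zero entry in such a matrix means "not yet validated" rather than "validated as unrelated", which is why prediction is framed as filling in missing values rather than as ordinary binary classification.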

Although previous matrix completion based methods have been successful for miRNA-disease association prediction, most of them complete the miRNA-disease relation matrix directly, which makes them sensitive to noisy information. Because decomposing the original data matrix into a basis matrix and a coefficient matrix yields a robust data representation [58], [59], we integrate non-negative matrix factorization into the original matrix completion model to improve miRNA-disease association prediction performance. In detail, we introduce a new matrix completion algorithm based on non-negative matrix factorization to infer potential miRNA-disease associations. Specifically, our goal is to recover a complete low-rank matrix from a partially known association matrix, where the low-rank matrix can be decomposed into two non-negative matrices. In addition, the local manifold structure of the original data is preserved by a graph regularization term. To assess the effectiveness of our method, both global and local leave-one-out cross-validation (LOOCV) are performed on the known miRNA-disease association dataset downloaded from HMDD v2.0 [60]. In addition, case studies are conducted on three common human diseases to further evaluate the prediction accuracy and reliability of the proposed model. Experimental results show that the proposed NMFMC model performs reliably, outperforms previous methods, and can contribute to potential miRNA-disease association prediction. It is worth noting that the latest version of the HMDD database is v3.2. Since the compared methods in our experiments all used v2.0, we also use v2.0 in the experimental comparison section to remain consistent with previous works. The detailed implementation process of NMFMC is shown in Fig. 1.
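
The combination described above (a non-negative low-rank factorization fitted on the observed entries, plus graph regularization from the two similarity matrices) can be sketched generically as follows. This is not the authors' NMFMC objective or update rule, which are not given in this excerpt. It is a standard masked, graph-regularized NMF with multiplicative updates, and the function name, the objective, the hyperparameters, and the toy data are all assumptions.

```python
import numpy as np

def masked_graph_nmf(A, mask, S_m, S_d, rank=2, lam=0.1, n_iter=200, seed=0):
    """Illustrative sketch (assumed objective, not the paper's exact model):

        min_{U,V >= 0} ||mask * (A - U V^T)||_F^2
                       + lam * (tr(U^T L_m U) + tr(V^T L_d V)),

    where L = D - S is the graph Laplacian of a symmetric similarity matrix S.
    """
    rng = np.random.default_rng(seed)
    n_m, n_d = A.shape
    U = rng.random((n_m, rank)) + 0.1          # strictly positive initialization
    V = rng.random((n_d, rank)) + 0.1
    D_m = np.diag(S_m.sum(axis=1))             # degree matrices
    D_d = np.diag(S_d.sum(axis=1))
    MA = mask * A
    eps = 1e-12                                # guards against division by zero
    losses = []
    for _ in range(n_iter):
        R = mask * (U @ V.T)
        U *= (MA @ V + lam * (S_m @ U)) / (R @ V + lam * (D_m @ U) + eps)
        R = mask * (U @ V.T)
        V *= (MA.T @ U + lam * (S_d @ V)) / (R.T @ U + lam * (D_d @ V) + eps)
        fit = np.sum((mask * (A - U @ V.T)) ** 2)
        reg = np.trace(U.T @ (D_m - S_m) @ U) + np.trace(V.T @ (D_d - S_d) @ V)
        losses.append(fit + lam * reg)
    return U, V, losses

# Toy data: 4 miRNAs x 3 diseases, with entry (1, 2) treated as unobserved.
A = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0]], dtype=float)
mask = np.ones_like(A)
mask[1, 2] = 0.0
S_m = np.array([[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
                [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]])
S_d = np.array([[1.0, 0.1, 0.9], [0.1, 1.0, 0.1], [0.9, 0.1, 1.0]])

U, V, losses = masked_graph_nmf(A, mask, S_m, S_d)
scores = U @ V.T    # completed matrix; unobserved entries act as predictions
print(np.round(scores, 2))
```

The multiplicative form keeps both factors non-negative without an explicit projection step. How the observed set is chosen (all entries, or only the validated ones) is a modelling decision that this sketch leaves to the caller through `mask`.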

Section snippets

Related work

In this section, we give a brief review of some matrix completion based miRNA-disease association prediction methods that are most related to our proposed work. Before that, we first present some notation for the binary miRNA-disease association matrix. It is well known that a large number of miRNA-disease associations have been validated through accumulating biological experiments [12]. Similar to many previous studies, we use the human disease-miRNA association dataset obtained from

Proposed NMFMC

In this section, we give the details of our proposed method, i.e., NMFMC. Since we need the miRNA similarity and disease similarity to regularize our model, we also describe how the two kinds of similarity matrices are generated, for completeness.

Cross validation

Cross-validation is a standard and frequently used method to evaluate miRNA-disease association prediction models. This scheme has been used in many previous studies [13], [34], [77]. We validate the proposed model using leave-one-out cross-validation (LOOCV) based on known miRNA-disease associations. In the experiments, each known miRNA-disease association was in turn treated as a test sample, and the other known associations were considered as the training samples. Those miRNA-disease
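
The hold-one-out procedure described above can be sketched as follows. Because the snippet is truncated, the candidate sets used here are an assumption based on common practice in this literature: the global variant ranks the held-out pair against all unvalidated pairs, and the local variant against the unvalidated miRNAs of the same disease only. The baseline predictor `co_association_scores` is a hypothetical stand-in, not NMFMC, and the returned per-sample values are a simple rank summary rather than the AUC reported in the paper.

```python
import numpy as np

def loocv_rank_scores(A, predict, mode="global"):
    """For each known association, hide it, re-predict, and report the fraction
    of candidate (unvalidated) pairs that the hidden pair outscores (ties count half)."""
    A = np.asarray(A, dtype=float)
    results = []
    for i, j in zip(*np.nonzero(A)):
        A_train = A.copy()
        A_train[i, j] = 0.0                    # hide the test association
        S = predict(A_train)
        if mode == "global":
            candidates = S[A == 0]             # all unvalidated pairs
        else:
            candidates = S[A[:, j] == 0, j]    # unvalidated miRNAs of disease j
        if candidates.size == 0:
            continue
        better = np.mean(S[i, j] > candidates)
        ties = np.mean(S[i, j] == candidates)
        results.append(better + 0.5 * ties)
    return np.array(results)

def co_association_scores(A_train):
    """Hypothetical baseline: score pairs by paths through shared diseases and miRNAs."""
    return A_train @ A_train.T @ A_train

A = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0]], dtype=float)
global_scores = loocv_rank_scores(A, co_association_scores, mode="global")
local_scores = loocv_rank_scores(A, co_association_scores, mode="local")
print(global_scores.mean(), local_scores.mean())
```

Any predictor that maps a training matrix to a score matrix of the same shape can be passed in place of the baseline.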

Conclusion

Identification of disease-related miRNAs has important practical value and helps explain the underlying pathogenesis of human diseases. In this study, we propose a computational method called NMFMC for miRNA-disease association prediction. The main contribution of our work is to combine matrix completion with non-negative matrix factorization. Considering the impact on the known associations during the matrix completion process, the known association itself is used as the

Declaration of Competing Interest

None.

Acknowledgment

The work was partly supported by the National Natural Science Foundation of China (No. 62076228).

References (91)

  • J. Luo et al.

    A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network

    J. Biomed. Inform.

    (2017)
  • X. Chen et al.

    Potential miRNA-disease association prediction based on kernelized Bayesian matrix factorization

    Genomics

    (2020)
  • C. Tang et al.

    Robust unsupervised feature selection via dual self-representation and manifold regularization

    Knowl. Based Syst.

    (2018)
  • C. Tang et al.

    Unsupervised feature selection via latent representation learning and manifold regularization

    Neural Netw.

    (2019)
  • P. Ding et al.

    Human disease miRNA inference by combining target information based on heterogeneous manifolds

    J. Biomed. Inform.

    (2018)
  • S. Bandyopadhyay et al.

    Development of the human cancer microRNA network

    Silence

    (2010)
  • X. Chen et al.

    HAMDA: hybrid approach for miRNA-disease association prediction

    J. Biomed. Inform.

    (2017)
  • J.N.J. Buie et al.

    The role of miRNAs in cardiovascular disease risk factors

    Atherosclerosis

    (2016)
  • Y. Yao et al.

    IMDAILM: inferring miRNA-disease association by integrating lncRNA and miRNA data

    IEEE Access

    (2019)
  • X. Chen et al.

    MicroRNAs and complex diseases: from experimental results to computational models

    Brief. Bioinformatics

    (2019)
  • M. Heidari et al.

    MicroRNA profiling in the bursae of Marek’s disease virus-infected resistant and susceptible chicken lines

    Genomics

    (2020)
  • H. MotieGhader et al.

    mRNA and microRNA selection for breast cancer molecular subtype stratification using meta-heuristic based algorithms

    Genomics

    (2020)
  • A.M. Cheng et al.

    Antisense inhibition of human miRNAs and indications for an involvement of miRNA in cell growth and apoptosis

    Nucleic Acids Res.

    (2005)
  • X. Chen et al.

    RBMMMDA: predicting multiple types of disease-microRNA associations

    Sci. Rep.

    (2015)
  • K. Che et al.

    Predicting miRNA-disease association by latent feature extraction with positive samples

    Genes

    (2019)
  • Y. Qu et al.

    KATZMDA: prediction of miRNA-disease associations based on KATZ model

    IEEE Access

    (2017)
  • X. Chen et al.

    NCMCMDA: miRNA-disease association prediction through neighborhood constraint matrix completion

    Brief. Bioinformatics

    (2021)
  • Y. Zhao et al.

    Adaptive boosting-based computational model for predicting potential miRNA-disease associations

    Bioinformatics

    (2019)
  • J. Xu et al.

    LRMCMDA: predicting miRNA-disease association by integrating low-rank matrix completion with miRNA and disease similarity information

    IEEE Access

    (2020)
  • Y. Tang et al.

    Identifying potential miRNA-disease associations based on an improved manifold learning framework

    IEEE Access

    (2020)
  • Q. Jiang et al.

    Prioritization of disease microRNAs through a human phenome-microRNAome network

    BMC Syst. Biol.

    (2010)
  • G. Li et al.

    Predicting microRNA-disease associations using network topological similarity based on DeepWalk

    IEEE Access

    (2017)
  • I. Lee et al.

    Prioritizing candidate disease genes by network-based boosting of genome-wide association data

    Genome Res.

    (2011)
  • J. Xu et al.

    Prioritizing candidate disease miRNAs by topological features in the miRNA target–dysregulated network: case study of prostate cancer

    Mol. Cancer Ther.

    (2011)
  • H. Shi et al.

    Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes

    BMC Syst. Biol.

    (2013)
  • C. Pasquier et al.

    Prediction of miRNA-disease associations with a vector space model

    Sci. Rep.

    (2016)
  • W. Ritchie et al.

    Predicting microRNA targets and functions: traps for the unwary

    Nat. Methods

    (2009)
  • L. Zhu et al.

    A two-stage geometric method for pruning unreliable links in protein-protein networks

    IEEE Trans. Nanobiosci.

    (2015)
  • P. Ding et al.

    A path-based measurement for human miRNA functional similarities using miRNA-disease associations

    Sci. Rep.

    (2016)
  • Z. Gao et al.

    Graph regularized L2,1-nonnegative matrix factorization for miRNA-disease association prediction

    BMC Bioinformatics

    (2020)
  • X. Chen et al.

    RWRMDA: predicting novel human microRNA–disease associations

    Mol. Biosyst.

    (2012)
  • P. Xuan et al.

    Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors

    PLoS ONE

    (2013)
  • S. Mørk et al.

    Protein-driven inference of miRNA–disease associations

    Bioinformatics

    (2014)
  • P. Xuan et al.

    Prediction of potential disease-associated microRNAs based on random walk

    Bioinformatics

    (2015)
  • Q. Xiao et al.

    A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations

    Bioinformatics

    (2018)