Recent advances on the machine learning methods in predicting ncRNA-protein interactions

  • Review
Molecular Genetics and Genomics

Abstract

Recent transcriptomics and bioinformatics studies have shown that ncRNAs can affect chromosome structure and gene transcription, participate in epigenetic regulation, and contribute to disease processes such as tumorigenesis. Most ncRNAs function by interacting with their corresponding RNA-binding proteins, so ncRNA-protein interactions have become an active research topic in both biology and medicine. Because manual laboratory experiments have practical limitations, researchers increasingly favor machine learning methods for predicting ncRNA-protein interactions. In this review, we summarize several machine learning models for predicting ncRNA-protein interactions published over the past few years and briefly describe their characteristics. We close by outlining some promising computational directions for improving the performance of these models.

Abbreviations

ncRNA:

Non-coding RNA

rRNA:

Ribosomal RNA

tRNA:

Transfer RNA

miRNA:

MicroRNA

snRNA:

Small nuclear RNA

lncRNAs:

Long non-coding RNAs

SVM:

Support vector machine

RF:

Random forest

LOOCV:

Leave-one-out cross-validation

K-CV:

K-fold cross-validation

ROC:

Receiver operating characteristic

AUC:

Area under the ROC curve

PPSNs:

Protein–protein similarity networks

SNF:

Similarity network fusion

XGB:

Extreme gradient boosting

SAN:

Stacked autoencoder networks

PSSM:

Position-specific scoring matrix

SVD:

Singular value decomposition

PZM:

Pseudo-Zernike moment

PCC:

Pearson correlation coefficient

GBDT:

Gradient boosting decision tree

Extra tree:

Extremely randomized trees

LMs:

Legendre moments

PWM:

Position weight matrix

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 11805091.

Author information

Contributions

QZ designed and conceived the project. LZ, JS and QZ wrote the manuscript. MZ designed the figures and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Qi Zhao.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest and in particular: Lin Zhong declares that she has no conflict of interest, Meiqin Zhen declares that she has no conflict of interest, Jianqiang Sun declares that he has no conflict of interest, Qi Zhao declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhong, L., Zhen, M., Sun, J. et al. Recent advances on the machine learning methods in predicting ncRNA-protein interactions. Mol Genet Genomics 296, 243–258 (2021). https://doi.org/10.1007/s00438-020-01727-0

  • DOI: https://doi.org/10.1007/s00438-020-01727-0
