Prediction of human microRNA hairpins using only positive sample learning

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that play important roles in post-transcriptional regulation in animals and plants. They are typically 21-25 nucleotides (nt) long and are derived from 60-90 nt RNA hairpin structures, called miRNA hairpins. A large number of sequence segments in the human genome have been computationally identified as forming such 60-90 nt hairpins; however, the majority of them are not miRNA hairpins. Most computational methods for predicting miRNA hairpins to date are based on a two-class classifier that distinguishes miRNA hairpins from other sequence segments with hairpin structures. The difficulty with these methods is how to select hairpins as negative examples for the classifier-training datasets, since only a few miRNA hairpins are available. As a result, the classifier may be mis-trained because some of the negative examples in the training dataset are false negatives. In this paper, we introduce a one-class support vector machine (SVM) method to predict miRNA hairpins among hairpin structures. Unlike existing methods for predicting miRNA hairpins, the one-class SVM model is trained only on information from the miRNA class. We also illustrate examples of predicting miRNA hairpins in human chromosomes 10, 15, and 21, where our method overcomes the above disadvantages of existing two-class methods.
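To make the idea of learning from positive samples only more concrete, the following is a minimal sketch of a one-class SVM with a Gaussian kernel, trained on known miRNA hairpins alone. It is not the authors' implementation: the solver is a toy pairwise coordinate-descent routine written for illustration, and the three features per hairpin (GC content, fraction of paired bases, normalized folding energy), the values in `KNOWN_HAIRPINS`, and the settings `nu=0.2` and `gamma=10.0` are all hypothetical assumptions.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def train_one_class_svm(X, nu=0.2, gamma=10.0, max_passes=200, tol=1e-10):
    """Fit a one-class SVM using positive examples only.

    Solves the dual problem
        min 1/2 * a^T K a   s.t.  0 <= a_i <= 1/(nu*n),  sum(a) = 1
    with simple pairwise (SMO-style) coordinate updates.
    """
    n = len(X)
    upper = 1.0 / (nu * n)
    K = [[rbf_kernel(a, b, gamma) for b in X] for a in X]
    alpha = [1.0 / n] * n  # feasible start: sums to 1 and lies inside the box
    for _ in range(max_passes):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta < 1e-12:
                    continue  # (near-)duplicate points: no curvature along this pair
                g_i = sum(alpha[k] * K[i][k] for k in range(n))
                g_j = sum(alpha[k] * K[j][k] for k in range(n))
                pair_sum = alpha[i] + alpha[j]
                # Newton step along alpha_i + alpha_j = const, then clip to the box.
                new_i = alpha[i] - (g_i - g_j) / eta
                low = max(0.0, pair_sum - upper)
                high = min(upper, pair_sum)
                new_i = min(max(new_i, low), high)
                if abs(new_i - alpha[i]) > tol:
                    alpha[i], alpha[j] = new_i, pair_sum - new_i
                    changed = True
        if not changed:
            break
    # The offset rho is the kernel score shared by the unbounded support vectors.
    scores = [sum(alpha[k] * K[i][k] for k in range(n)) for i in range(n)]
    eps = 1e-8
    free = [i for i in range(n) if eps < alpha[i] < upper - eps]
    if not free:  # fallback: average over all support vectors
        free = [i for i in range(n) if alpha[i] > eps]
    rho = sum(scores[i] for i in free) / len(free)
    return {"X": X, "alpha": alpha, "rho": rho, "gamma": gamma}

def is_mirna_like(model, x):
    """Return True if x falls inside the region learned from the positive class."""
    score = sum(a * rbf_kernel(xi, x, model["gamma"])
                for a, xi in zip(model["alpha"], model["X"]))
    return score >= model["rho"]

# Hypothetical feature vectors for known miRNA hairpins (illustrative values only):
# (GC content, fraction of paired bases, normalized folding energy).
KNOWN_HAIRPINS = [
    (0.42, 0.70, 0.38), (0.48, 0.74, 0.42), (0.45, 0.68, 0.40),
    (0.50, 0.72, 0.45), (0.40, 0.66, 0.36), (0.47, 0.76, 0.41),
    (0.44, 0.71, 0.39), (0.52, 0.69, 0.44), (0.46, 0.73, 0.43),
    (0.43, 0.67, 0.37), (0.49, 0.75, 0.40), (0.41, 0.72, 0.42),
]

model = train_one_class_svm(KNOWN_HAIRPINS)
typical_candidate = (0.46, 0.71, 0.41)   # resembles the training hairpins
atypical_candidate = (0.85, 0.20, 0.10)  # far from every training hairpin
print(is_mirna_like(model, typical_candidate))
print(is_mirna_like(model, atypical_candidate))
```

No negative examples appear anywhere in the training step, so the model cannot be mis-trained by hairpins wrongly labeled as non-miRNA. The parameter `nu` bounds the fraction of training hairpins that may be treated as outliers, which is the main setting to tune in such a model.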

Share and Cite:

Tran, D., Pham, T., Satou, K. and Ho, T. (2008) Prediction of human microRNA hairpins using only positive sample learning. Journal of Biomedical Science and Engineering, 1, 141-146. doi: 10.4236/jbise.2008.12023.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.