ABSTRACT
Classifying samples in expression microarray experiments is a crucial task in bioinformatics and biomedicine. This paper addresses the task with a particular class of statistical approaches called topic models. These models, first introduced in the text mining community, extract from a set of objects (typically documents) a rich and interpretable description based on an intermediate representation called topics (or processes). The paper casts expression microarray classification into this probabilistic framework, drawing a parallel with the text mining domain and providing an interpretation. Two topic models are investigated: Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). An experimental evaluation on three standard datasets confirms the effectiveness of the proposed methods, including in comparison with other classification methods.
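To make the text mining parallel concrete, the following is a minimal sketch of PLSA fitted by expectation-maximisation. It assumes a mapping the abstract does not spell out: each sample plays the role of a document, each gene the role of a word, and a discretised, non-negative expression level the role of a word count. The function name `plsa`, its parameters, and the toy matrix are hypothetical and only illustrate the general technique, not the authors' implementation.

```python
import random


def _normalise(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total == 0.0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit PLSA by EM on a samples-by-genes matrix of non-negative counts.

    Assumed mapping (not stated in the abstract): sample = document,
    gene = word, discretised expression level = word count.

    Returns (p_z_d, p_w_z):
      p_z_d[d][z] = P(topic z | sample d)
      p_w_z[z][w] = P(gene w | topic z)
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random, strictly positive starting distributions.
    p_z_d = [_normalise([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalise([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0.0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    r = n_dw * post[z] / norm
                    acc_z_d[d][z] += r
                    acc_w_z[z][w] += r
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [_normalise(row) for row in acc_z_d]
        p_w_z = [_normalise(row) for row in acc_w_z]
    return p_z_d, p_w_z


# Toy data (invented for illustration): 4 samples x 4 genes, two clear groups.
toy_counts = [
    [9, 8, 0, 0],
    [8, 9, 0, 0],
    [0, 0, 9, 8],
    [0, 0, 8, 9],
]
mixtures, topics = plsa(toy_counts, n_topics=2)
for row in mixtures:
    print([round(p, 3) for p in row])
```

Each row of `mixtures` is a low-dimensional description of one sample in terms of topics. One plausible way to use it for classification, again an assumption rather than something the abstract states, is to pass these rows as feature vectors to any standard classifier.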
Index Terms
- Expression microarray classification using topic models