DOI: 10.1145/1458449.1458455
research-article

Biological pathways as features for microarray data classification

Published: 30 October 2008

ABSTRACT

Classification using microarray gene expression data is an important task in bioinformatics. Because microarray data are characterized by high dimensionality and small sample sizes, there has recently been a drive to incorporate any available information beyond the expression data into the classification process. As a result, much work has begun on using the gene expression data to select biological pathways that are closely related to a clinical outcome of interest, and incorporating this pathway information opens up new avenues for classification. In contrast to previous approaches that consider individual genes as features, we propose a new approach that treats biological pathways as features. Each pathway found to be significantly related to an outcome of interest is treated as a feature and is mapped to a feature value. We define several methods for mapping pathways to features, and compare the performance of several classifiers using our feature transformations with that of the classifiers using individual genes as features, under different feature selection methods.
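
The idea of mapping a pathway to a single feature value can be illustrated with a minimal sketch. The abstract does not say which mappings the paper defines, so the mean of member-gene expression used here is only an assumed, illustrative aggregation; the function name `pathway_features`, the gene and pathway names, and the toy data are all hypothetical.

```python
from statistics import mean


def pathway_features(expression, pathways):
    """Collapse gene-level expression into one value per pathway.

    expression: dict mapping sample id -> {gene: expression value}
    pathways:   dict mapping pathway name -> list of member genes
    The mean is an assumed aggregation for illustration only.
    """
    names = sorted(pathways)  # fixed feature order across samples
    features = {}
    for sample, genes in expression.items():
        row = []
        for name in names:
            # Use only the member genes that were actually measured.
            values = [genes[g] for g in pathways[name] if g in genes]
            row.append(mean(values) if values else 0.0)
        features[sample] = row
    return names, features


# Hypothetical toy data: two samples, three measured genes.
expression = {
    "s1": {"g1": 2.0, "g2": 4.0, "g3": 1.0},
    "s2": {"g1": 0.0, "g2": 2.0, "g3": 5.0},
}
pathways = {"pathway_A": ["g1", "g2"], "pathway_B": ["g2", "g3", "g4"]}

names, features = pathway_features(expression, pathways)
print(names)
print(features)
```

Each sample is then described by one value per selected pathway rather than one value per gene, so the resulting rows can be passed to any standard classifier in place of the gene-level matrix.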


Published in

DTMBIO '08: Proceedings of the 2nd international workshop on Data and text mining in bioinformatics
October 2008
92 pages
ISBN: 9781605582511
DOI: 10.1145/1458449

Copyright © 2008 ACM


Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 41 of 247 submissions, 17%
