ABSTRACT
Classification using microarray gene expression data is an important task in bioinformatics. Because microarray data are characterized by high dimensionality and small sample sizes, there has recently been a drive to incorporate any available information beyond the expression data into the classification process. As a result, much work has begun on using gene expression data to select biological pathways that are closely related to a clinical outcome of interest, and incorporating this pathway information opens up new avenues for classification. In contrast to previous approaches that treat individual genes as features, we propose a new approach that treats biological pathways as features. Each pathway found to be significantly related to the outcome of interest is treated as a feature and mapped to a feature value. We define several methods for mapping pathways to features and, for different feature selection methods, compare the performance of several classifiers using our feature transformations with that of the same classifiers using individual genes as features.
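To make the idea of mapping a pathway to a feature value concrete, the following is a minimal sketch and not the method defined in the paper. The abstract does not say which mappings are used, so the per-sample mean over a pathway's member genes is purely an illustrative assumption, and the function name `pathway_features`, the gene identifiers, and the pathway names are all hypothetical.

```python
import numpy as np


def pathway_features(expr, gene_names, pathways, reduce=np.mean):
    """Collapse a samples-by-genes matrix into a samples-by-pathways matrix.

    expr       : 2-D array, one row per sample, one column per gene.
    gene_names : gene identifiers, in the same order as the columns of expr.
    pathways   : dict mapping a pathway name to a list of member genes.
    reduce     : summary applied across each pathway's genes (assumption:
                 the mean; the paper's own mappings are not given here).
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    names, columns = [], []
    for name, members in pathways.items():
        # Keep only the member genes that were actually measured.
        cols = [index[g] for g in members if g in index]
        if not cols:
            continue  # no measured genes, so no feature for this pathway
        columns.append(reduce(expr[:, cols], axis=1))
        names.append(name)
    if not columns:
        return names, np.empty((expr.shape[0], 0))
    return names, np.column_stack(columns)


# Toy data: 2 samples, 4 genes, 2 hypothetical pathways.
expr = np.array([[1.0, 3.0, 5.0, 7.0],
                 [2.0, 4.0, 6.0, 8.0]])
genes = ["g1", "g2", "g3", "g4"]
pathways = {"pathway_A": ["g1", "g2"],
            "pathway_B": ["g3", "g4", "g_missing"]}

names, feats = pathway_features(expr, genes, pathways)
```

The resulting `feats` matrix has one column per pathway and can be passed to any classifier in place of the gene-level matrix. Passing a different summary through `reduce` (for example `np.median`) gives an alternative mapping under the same interface.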
Index Terms
- Biological pathways as features for microarray data classification