ABSTRACT
One of the main challenges in classifying microarray gene expression data is the small sample size relative to the large number of genes, so feature selection is an essential step for removing genes that are not relevant to the class labels. Most feature selection methods rely solely on expression values to determine the discriminative power of genes and to remove redundancy. However, due to the characteristics of microarray technology, some values may not be measured accurately, which may reduce the effectiveness of these methods. To cope with this problem, we integrate Gene Ontology (GO) annotations into gene selection. The novelty of our work is that it evaluates genes based not only on their individual discriminative power but also on the power of the GO terms that annotate them. This strategy implicitly verifies the accuracy of the measurements and reduces redundancy. Experimental results on four public datasets demonstrate the effectiveness of the proposed method.
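The idea of scoring a gene by both its own discriminative power and that of its annotating GO terms can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the signal-to-noise gene score, the term score taken as the mean score of the genes a term annotates, the blending weight `alpha`, and the names `signal_to_noise` and `combined_scores` are all hypothetical and are not the authors' method.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Assumed gene score: |mean difference| / (sum of class std devs).

    `labels` holds 0/1 class labels; each class needs at least two samples.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = stdev(a) + stdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0


def combined_scores(expr, labels, annotations, alpha=0.5):
    """Blend each gene's own score with the best score among its GO terms.

    expr:        {gene: [expression value per sample]}
    annotations: {gene: [GO term ids]} (hypothetical mapping)
    alpha:       assumed weight on the gene's individual score
    """
    gene_score = {g: signal_to_noise(v, labels) for g, v in expr.items()}

    # Collect the genes annotated by each term.
    members = {}
    for gene, terms in annotations.items():
        if gene in gene_score:
            for term in terms:
                members.setdefault(term, []).append(gene)

    # Assumed term score: mean individual score of its annotated genes.
    term_score = {
        term: mean([gene_score[g] for g in genes])
        for term, genes in members.items()
    }

    combined = {}
    for gene, score in gene_score.items():
        terms = annotations.get(gene, [])
        if terms:
            best_term = max(term_score[t] for t in terms)
            combined[gene] = alpha * score + (1 - alpha) * best_term
        else:
            # Unannotated genes fall back to their own score.
            combined[gene] = score
    return combined
```

In this sketch, a gene whose own score is modest but which shares a GO term with strongly discriminative genes is ranked higher, which is one way a term-level score could compensate for an inaccurately measured expression value.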
Index Terms
- Integrating gene ontology into discriminative powers of genes for feature selection in microarray data