Semi-Supervised Clustering for the Identification of Different Cancer Types Using the Gene Expression Profiles

Manuel Martín-Merino
ISBN13: 9781466618039 | ISBN10: 1466618035 | EISBN13: 9781466618046
DOI: 10.4018/978-1-4666-1803-9.ch004
MLA

Martín-Merino, Manuel. "Semi-Supervised Clustering for the Identification of Different Cancer Types Using the Gene Expression Profiles." Medical Applications of Intelligent Data Analysis: Research Advancements, edited by Rafael Magdalena-Benedito, et al., IGI Global, 2012, pp. 50-66. https://doi.org/10.4018/978-1-4666-1803-9.ch004

APA

Martín-Merino, M. (2012). Semi-Supervised Clustering for the Identification of Different Cancer Types Using the Gene Expression Profiles. In R. Magdalena-Benedito, E. Soria-Olivas, J. Martínez, J. Gómez-Sanchis, & A. Serrano-López (Eds.), Medical Applications of Intelligent Data Analysis: Research Advancements (pp. 50-66). IGI Global. https://doi.org/10.4018/978-1-4666-1803-9.ch004

Chicago

Martín-Merino, Manuel. "Semi-Supervised Clustering for the Identification of Different Cancer Types Using the Gene Expression Profiles." In Medical Applications of Intelligent Data Analysis: Research Advancements, edited by Rafael Magdalena-Benedito, et al., 50-66. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-4666-1803-9.ch004

Abstract

DNA microarrays allow the expression levels of thousands of genes to be monitored simultaneously across a collection of related samples. Supervised learning algorithms such as k-NN or SVM (Support Vector Machines) have been applied to the classification of cancer samples with encouraging results. However, classification algorithms cannot discover new subtypes of a disease from the gene expression profiles. In this chapter, the author reviews several supervised clustering algorithms suitable for discovering new subtypes of cancer. Next, he introduces a semi-supervised clustering algorithm that learns a linear combination of dissimilarities from the a priori knowledge provided by human experts. This a priori knowledge is formulated as equivalence constraints. The error function is minimized with a quadratic optimization algorithm. A norm regularizer is included that penalizes the complexity of the family of distances and avoids overfitting. The proposed method has been applied to several benchmark data sets and to complex human cancer problems using the gene expression profiles. The experimental results suggest that a linear combination of heterogeneous dissimilarities improves on both classification and clustering algorithms based on a single similarity.
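The idea of learning a linear combination of dissimilarities from equivalence constraints can be illustrated with a small sketch. This is not the chapter's algorithm, whose exact error function and regularizer are not given in the abstract. It assumes a squared-error objective with target 0 for pairs marked similar and 1 for pairs marked dissimilar, a squared L2 penalty on the weights, non-negative weights, and projected gradient descent as the solver. The function name `learn_weights`, the parameter `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def learn_weights(dissims, similar, dissimilar, lam=0.1, n_iter=2000):
    """Fit non-negative weights for a linear combination of dissimilarities.

    Assumed objective (illustrative only):
        (1 / 2n) * ||A @ beta - t||^2 + (lam / 2) * ||beta||^2,  beta >= 0
    where each row of A holds the base dissimilarities of one constrained
    pair, and t is 0 for similar pairs and 1 for dissimilar pairs.
    """
    pairs = list(similar) + list(dissimilar)
    t = np.array([0.0] * len(similar) + [1.0] * len(dissimilar))
    # One row per constrained pair, one column per base dissimilarity.
    A = np.array([[D[i, j] for D in dissims] for i, j in pairs])
    n = len(pairs)
    H = A.T @ A / n + lam * np.eye(A.shape[1])  # Hessian of the quadratic
    g = A.T @ t / n                             # linear term
    step = 1.0 / np.linalg.eigvalsh(H).max()    # 1 / Lipschitz constant
    beta = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(n_iter):
        # Gradient step, then projection onto the non-negative orthant.
        beta = np.maximum(beta - step * (H @ beta - g), 0.0)
    return beta


# Toy data: feature f0 separates the two groups, feature f1 does not.
f0 = np.array([0.0, 0.1, 0.2, 0.1, 1.0, 0.9, 1.1, 1.0])
f1 = np.array([0.0, 1.0, 0.5, 0.2, 0.1, 0.9, 0.4, 0.6])

# One base dissimilarity per feature: squared difference between samples.
dissims = [(f[:, None] - f[None, :]) ** 2 for f in (f0, f1)]

# Equivalence constraints, as an expert might supply them.
similar = [(0, 1), (2, 3), (4, 5), (6, 7)]
dissimilar = [(0, 4), (1, 5), (2, 6), (3, 7)]

beta = learn_weights(dissims, similar, dissimilar)
combined = sum(b * D for b, D in zip(beta, dissims))
print("weights:", beta)
```

The resulting `combined` matrix could then be passed to any clustering method that accepts a precomputed dissimilarity matrix. In this toy setup the weight on the informative feature should dominate, because that dissimilarity is the one consistent with the constraints.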