Kernel Independent Component Analysis for Gene Expression Data Clustering

Jin, Xin; Xu, Anbang; Bie, Rongfang; Guo, Ping

doi:10.1007/11679363_57

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3889)

Included in the following conference series:

International Conference on Independent Component Analysis and Signal Separation

Abstract

We present the use of kernel independent component analysis (KICA) for clustering gene expression data. We compare KICA with two other feature extraction methods, principal component analysis (PCA) and independent component analysis (ICA). We also evaluate three clustering algorithms (weighted graph partitioning, k-means, and agglomerative hierarchical clustering) and two similarity measures (Euclidean distance and Pearson correlation). The results indicate that KICA is an efficient feature extraction approach for gene expression data clustering. Our empirical study shows that clustering with the extracted components instead of the original variables improves cluster quality. In particular, the first few KICA components capture most of the cluster structure. Clustering with components also affects different algorithms and different similarity measures differently. Overall, we recommend applying KICA before clustering gene expression data.
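
The two similarity measures named in the abstract can rank the same pair of expression profiles very differently. The following minimal sketch, which is not taken from the paper, illustrates this with plain Python. The function names and the three toy profiles are hypothetical and chosen only for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation: compares profile shape, ignoring scale and offset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy profiles (not data from the paper).
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [10.0, 20.0, 30.0, 40.0]  # same shape as profile_a, ten times the magnitude
profile_c = [4.0, 3.0, 2.0, 1.0]      # opposite trend, same magnitude as profile_a

dist_ab = euclidean_distance(profile_a, profile_b)
dist_ac = euclidean_distance(profile_a, profile_c)
corr_ab = pearson_correlation(profile_a, profile_b)
corr_ac = pearson_correlation(profile_a, profile_c)

print(f"Euclidean a-b: {dist_ab:.3f}, a-c: {dist_ac:.3f}")
print(f"Pearson   a-b: {corr_ab:.3f}, a-c: {corr_ac:.3f}")
```

In this toy case Euclidean distance places profile_a closer to profile_c, while Pearson correlation treats profile_a and profile_b as perfectly similar. This is one reason a clustering result can depend on the similarity measure as well as on the feature extraction step.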

Author information

Authors and Affiliations

Department of Computer Science, Beijing Normal University, Beijing, 100875, China
Xin Jin, Anbang Xu, Rongfang Bie & Ping Guo

Editor information

Editors and Affiliations

Siemens Corporate Research, 755 College Road East, 08540, Princeton, NJ, USA
Justinian Rosca
Department of CSEE, Oregon Health and Science University, Portland, Oregon, USA
Deniz Erdogmus
Dep. of Electrical and Computer Engineering, University of Florida, Gainesville, Florida, USA
José C. Príncipe
McMaster University, 1280 Main Street West, L8S 4K1, Hamilton, Ontario, Canada
Simon Haykin

Cite this paper

Jin, X., Xu, A., Bie, R., Guo, P. (2006). Kernel Independent Component Analysis for Gene Expression Data Clustering. In: Rosca, J., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds) Independent Component Analysis and Blind Signal Separation. ICA 2006. Lecture Notes in Computer Science, vol 3889. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11679363_57

DOI: https://doi.org/10.1007/11679363_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32630-4
Online ISBN: 978-3-540-32631-1
eBook Packages: Computer Science, Computer Science (R0)
