
Kernel Independent Component Analysis for Gene Expression Data Clustering

  • Conference paper
Independent Component Analysis and Blind Signal Separation (ICA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3889)

Abstract

We present the use of kernel independent component analysis (KICA) to cluster gene expression data. We compare KICA with two other feature extraction methods, principal component analysis (PCA) and independent component analysis (ICA). We also evaluate three clustering algorithms (weighted graph partitioning, k-means, and agglomerative hierarchical clustering) and two similarity measures (Euclidean distance and Pearson correlation). The results indicate that KICA is an efficient feature extraction approach for gene expression data clustering. Our empirical study showed that clustering with the extracted components instead of the original variables does improve cluster quality; in particular, the first few KICA components capture most of the cluster structure. We also showed that clustering with components affects different algorithms and different similarity metrics differently. Overall, we recommend applying KICA before clustering gene expression data.
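The pipeline described above (extract components, then cluster on them under either a Euclidean or a Pearson similarity) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the KICA formulation, so we assume one common construction: whiten the data with kernel PCA and then run symmetric FastICA (tanh nonlinearity) on the whitened kernel scores. The RBF kernel, `gamma=0.1`, `n_components=3`, the farthest-point k-means initialisation, the helper names (`kernel_whiten`, `fastica`, `kmeans`, `pearson_rows`), and the toy data are all hypothetical choices for illustration. Pearson correlation is handled by centring and unit-normalising each row, so that squared Euclidean distance equals `2 * (1 - r)`.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_whiten(X, n_components, gamma):
    """Kernel PCA scores rescaled to zero mean and unit variance (assumed KICA step 1)."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre the data in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    # The score of point i on kernel PC k is sqrt(lambda_k) * v_k[i], with variance
    # lambda_k / n, so sqrt(n) * v_k[i] has unit variance.
    return np.sqrt(n) * vecs[:, order]


def _sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fastica(Z, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with tanh nonlinearity on whitened data Z (n x m)."""
    n, m = Z.shape
    X = Z.T
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((m, m)))
    for _ in range(max_iter):
        g = np.tanh(W @ X)
        g_prime = 1.0 - g**2
        W_new = _sym_decorrelate((g @ X.T) / n - np.diag(g_prime.mean(axis=1)) @ W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return (W @ X).T  # n x m matrix of independent components


def pearson_rows(S):
    """Centre and unit-normalise rows: ||a - b||^2 then equals 2 * (1 - Pearson r)."""
    C = S - S.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    return C / np.where(norms > 0, norms, 1.0)


def kmeans(S, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [S[0]]
    for _ in range(1, k):
        d = np.min([np.sum((S - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(S[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Hypothetical toy "expression matrix": 40 genes x 6 conditions, two groups.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (20, 6)), rng.normal(3.0, 0.5, (20, 6))])
S = fastica(kernel_whiten(X, n_components=3, gamma=0.1))
labels_euclid = kmeans(S, k=2)                 # Euclidean similarity
labels_pearson = kmeans(pearson_rows(S), k=2)  # Pearson-correlation similarity
```

Because the components come from an orthogonal rotation of whitened scores, they are uncorrelated with unit variance; the clustering step then runs on this small set of components rather than on the original variables. Weighted graph partitioning and hierarchical clustering, the other two algorithms evaluated, are not shown here.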




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, X., Xu, A., Bie, R., Guo, P. (2006). Kernel Independent Component Analysis for Gene Expression Data Clustering. In: Rosca, J., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds) Independent Component Analysis and Blind Signal Separation. ICA 2006. Lecture Notes in Computer Science, vol 3889. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11679363_57


  • DOI: https://doi.org/10.1007/11679363_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32630-4

  • Online ISBN: 978-3-540-32631-1

  • eBook Packages: Computer Science, Computer Science (R0)
