Detecting Stable Clusters Using Principal Component Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 224)

Abstract

Clustering is one of the most commonly used tools in the analysis of gene expression data (1,2). Its use in grouping genes rests on the premise that coexpression is a result of coregulation. It is often a preliminary step in extracting gene networks and inferring gene function (3,4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3,5,6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in Chapter 13 (see also ref. 9). Although this chapter focuses on gene expression data, the methodology presented here is applicable to other types of data as well.

References

  1. Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14,863–14,868.

  2. Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427.

  3. Ross, D., Scherf, U., Eisen, M., et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet. 24, 227–235.

  4. D’haeseleer, P., Liang, S., and Somogyi, R. (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16, 707–726.

  5. Alon, U., Barkai, N., Notterman, D., Gish, G., Ybarra, S., Mack, D., and Levine, A. J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96, 6745–6750.

  6. Alizadeh, A. A., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511.

  7. Scherf, U., Ross, D. T., Waltham, M., et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nat. Genet. 24, 236–244.

  8. Getz, G., Levine, E., and Domany, E. (2000) Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. USA 97, 12,079–12,084.

  9. Shamir, R. and Sharan, R. (2001) Algorithmic approaches to clustering gene expression data, in Current Topics in Computational Biology (Jiang, T., Smith, T., Xu, Y., and Zhang, M., eds.), MIT Press, Cambridge, MA, pp. 269–299.

  10. Milligan, G. (1996) Clustering validation: results and implications for applied analysis, in Clustering and Classification (Arabie, P., Hubert, L., and Soete, G. D., eds.), World Scientific, River Edge, NJ, pp. 341–374.

  11. Tibshirani, R., Walther, G., and Hastie, T. (2001) Estimating the number of clusters in a dataset via the gap statistic. J. Roy. Stat. Soc. B 63, 411–423.

  12. Ben-Hur, A., Elisseeff, A., and Guyon, I. (2002) A stability based method for discovering structure in clustered data, in Pacific Symposium on Biocomputing (Altman, R., Dunker, A., Hunter, L., Lauderdale, K., and Klein, T., eds.), World Scientific, River Edge, NJ, pp. 6–17.

  13. Levine, E. and Domany, E. (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comp. 13, 2573–2593.

  14. Fridlyand, J. (2001) Resampling methods for variable selection and classification: applications to genomics, PhD thesis, University of California at Berkeley.

  15. Fowlkes, E. and Mallows, C. (1983) A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78, 553–584.

  16. Bittner, M., Meltzer, P., Chen, Y., et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536–540.

  17. Yeung, K., Haynor, D., and Ruzzo, W. (2001) Validating clustering for gene expression data. Bioinformatics 17, 309–318.

  18. Milligan, G. (1980) An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika 45, 325–342.

  19. Yeung, K. and Ruzzo, W. (2001) An empirical study of principal component analysis for clustering gene expression data. Bioinformatics 17, 763–774.

  20. Jackson, J. (1991) A User’s Guide to Principal Components, John Wiley & Sons, New York, NY.

  21. Jolliffe, I. (1986) Principal Component Analysis, Springer-Verlag, New York, NY.

  22. Jolliffe, I. (1972) Discarding variables in principal component analysis I: artificial data. Appl. Stat. 21, 160–173.

  23. Jolliffe, I. (1973) Discarding variables in principal component analysis II: real data. Appl. Stat. 22, 21–31.

  24. Hastie, T., Tibshirani, R., Eisen, M., Alizadeh, A., Levy, R., Staudt, L., Chan, W., Botstein, D., and Brown, P. (2000) “Gene shaving” as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol. 1, 1–21.

  25. Alter, O., Brown, P., and Botstein, D. (2000) Singular value decomposition for genomewide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10,101–10,106.

  26. Raychaudhuri, S., Stuart, J., and Altman, R. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symp. Biocomp. 5, 452–463.

  27. Hilsenbeck, S., Friedrichs, W., Schiff, R., O’Connell, P., Hansen, R., Osborne, C., and Fuqua, S. (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J. Natl. Cancer Inst. 91, 453–459.

  28. Wen, X., Fuhrman, S., Michaels, G., Carr, D., Smith, S., Barker, J., and Somogyi, R. (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. USA 95, 334–339.

  29. Milligan, G. and Cooper, M. (1988) A study of variable standardization. J. Classif. 5, 181–204.

  30. Jain, A. and Dubes, R. (1988) Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ.

  31. Kaufman, L. and Rousseeuw, P. (1990) Finding Groups in Data, Wiley Interscience, John Wiley & Sons, New York, NY.

  32. Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C., Ares, M., and Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA 97, 262–267. http://www.cse.ucsc.edu/research/compbio/genex.

  33. Ben-Hur, A., Horn, D., Siegelmann, H., and Vapnik, V. (2001) Support vector clustering. J. Machine Learn. Res. 2, 125–137.

  34. Anderberg, M. (1983) Cluster Analysis for Applications, Academic, New York.

Copyright information

© 2003 Humana Press Inc.

About this protocol

Cite this protocol

Ben-Hur, A., Guyon, I. (2003). Detecting Stable Clusters Using Principal Component Analysis. In: Brownstein, M.J., Khodursky, A.B. (eds) Functional Genomics. Methods in Molecular Biology, vol 224. Humana Press. https://doi.org/10.1385/1-59259-364-X:159

  • DOI: https://doi.org/10.1385/1-59259-364-X:159

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-291-9

  • Online ISBN: 978-1-59259-364-4

  • eBook Packages: Springer Protocols
