Detecting Stable Clusters Using Principal Component Analysis

Ben-Hur, Asa; Guyon, Isabelle

doi:10.1385/1-59259-364-X:159

Part of the book series: Methods in Molecular Biology (MIMB, volume 224)

Abstract

Clustering is one of the most commonly used tools in the analysis of gene expression data (1,2). Its use in grouping genes rests on the premise that coexpression is a result of coregulation, and it is often a preliminary step in extracting gene networks and inferring gene function (3,4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3,5,6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in Chapter 13 (see also ref. 9). Although we focus on gene expression data in this chapter, the methodology presented here is applicable to other types of data as well.

References

Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14,863–14,868.
Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427.
Ross, D., Scherf, U., Eisen, M., et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet. 24, 227–235.
D’haeseleer, P., Liang, S., and Somogyi, R. (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16, 707–726.
Alon, U., Barkai, N., Notterman, D., Gish, G., Ybarra, S., Mack, D., and Levine, A. J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96, 6745–6750.
Alizadeh, A. A., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511.
Scherf, U., Ross, D. T., Waltham, M., et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nat. Genet. 24, 236–244.
Getz, G., Levine, E., and Domany, E. (2000) Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. USA 97, 12,079–12,084.
Shamir, R. and Sharan, R. (2001) Algorithmic approaches to clustering gene expression data, in Current Topics in Computational Biology (Jiang, T., Smith, T., Xu, Y., and Zhang, M., eds.), MIT Press, Cambridge, MA, pp. 269–299.
Milligan, G. (1996) Clustering validation: results and implications for applied analysis, in Clustering and Classification (Arabie, P., Hubert, L., and Soete, G. D., eds.), World Scientific, River Edge, NJ, pp. 341–374.
Tibshirani, R., Walther, G., and Hastie, T. (2001) Estimating the number of clusters in a dataset via the gap statistic. J. Roy. Stat. Soc. 63, 411–423.
Ben-Hur, A., Elisseeff, A., and Guyon, I. (2002) A stability based method for discovering structure in clustered data, in Pacific Symposium on Biocomputing (Altman, R., Dunker, A., Hunter, L., Lauderdale, K., and Klein, T., eds.), World Scientific, River Edge, NJ, pp. 6–17.
Levine, E. and Domany, E. (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comp. 13, 2573–2593.
Fridlyand, J. (2001) Resampling methods for variable selection and classification: applications to genomics, PhD thesis, University of California at Berkeley.
Fowlkes, E. and Mallows, C. (1983) A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78, 553–584.
Bittner, M., Meltzer, P., Chen, Y., et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536–540.
Yeung, K., Haynor, D., and Ruzzo, W. (2001) Validating clustering for gene expression data. Bioinformatics 17, 309–318.
Milligan, G. (1980) An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika 45, 325–342.
Yeung, K. and Ruzzo, W. (2001) An empirical study of principal component analysis for clustering gene expression data. Bioinformatics 17, 763–774.
Jackson, J. (1991) A User’s Guide to Principal Components, John Wiley & Sons, New York, NY.
Jolliffe, I. (1986) Principal Component Analysis, Springer-Verlag, New York, NY.
Jolliffe, I. (1972) Discarding variables in principal component analysis I: artificial data. Appl. Stat. 21, 160–173.
Jolliffe, I. (1972) Discarding variables in principal component analysis II: real data. Appl. Stat. 21, 160–173.
Hastie, T., Tibshirani, R., Eisen, M., Alizadeh, A., Levy, R., Staudt, L., Chan, W., Botstein, D., and Brown, P. (2000) “Gene shaving” as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol. 1, 1–21.
Alter, O., Brown, P., and Botstein, D. (2000) Singular value decomposition for genomewide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10,101–10,106.
Raychaudhuri, S., Stuart, J., and Altman, R. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symp. Biocomp. 5, 452–463.
Hilsenbeck, S., Friedrichs, W., Schiff, R., O’Connell, P., Hansen, R., Osborne, C., and Fuqua, S. (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J. Natl. Cancer Inst. 91, 453–459.
Wen, X., Fuhrman, S., Michaels, G., Carr, D., Smith, S., Barker, J., and Somogyi, R. (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. USA 95, 334–339.
Milligan, G. and Cooper, M. (1988) A study of variable standardization. J. Classif. 5, 181–204.
Jain, A. and Dubes, R. (1988) Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ.
Kaufman, L. and Rousseeuw, P. (1990) Finding Groups in Data, Wiley Interscience, John Wiley & Sons, New York, NY.
Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C., Ares, M., and Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA 97, 262–267. http://www.cse.ucsc.edu/research/compbio/genex.
Ben-Hur, A., Horn, D., Siegelmann, H., and Vapnik, V. (2001) Support vector clustering. J. Machine Learn. Res. 2, 125–137.
Anderberg, M. (1983) Cluster Analysis for Applications, Academic, New York.

Author information

Authors and Affiliations

Department of Biochemistry, Stanford University School of Medicine, Palo Alto, CA
Asa Ben-Hur
Clopinet, Berkeley, CA
Isabelle Guyon

Editor information

Editors and Affiliations

Laboratory of Genetics, NIMH/NHGRI, National Institutes of Health, Rockville, MD
Michael J. Brownstein
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, St. Paul, MN
Arkady B. Khodursky

Cite this protocol

Ben-Hur, A., Guyon, I. (2003). Detecting Stable Clusters Using Principal Component Analysis. In: Brownstein, M.J., Khodursky, A.B. (eds) Functional Genomics. Methods in Molecular Biology, vol 224. Humana Press. https://doi.org/10.1385/1-59259-364-X:159

DOI: https://doi.org/10.1385/1-59259-364-X:159
Publisher Name: Humana Press
Print ISBN: 978-1-58829-291-9
Online ISBN: 978-1-59259-364-4
eBook Packages: Springer Protocols
