Abstract
Clustering is one of the most commonly used tools in the analysis of gene expression data (1,2). Its use in grouping genes rests on the premise that coexpression results from coregulation, and it often serves as a preliminary step in extracting gene networks and inferring gene function (3,4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3,5,6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in Chapter 13 (see also ref. 9). Although this chapter focuses on gene expression data, the methodology presented here applies to other types of data as well.
References
Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868.
Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427.
Ross, D., Scherf, U., Eisen, M., et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet. 24, 227–235.
D’haeseleer, P., Liang, S., and Somogyi, R. (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16, 707–726.
Alon, U., Barkai, N., Notterman, D., Gish, G., Ybarra, S., Mack, D., and Levine, A. J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96, 6745–6750.
Alizadeh, A. A., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511.
Scherf, U., Ross, D. T., Waltham, M., et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nat. Genet. 24, 236–244.
Getz, G., Levine, E., and Domany, E. (2000) Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. USA 97, 12079–12084.
Shamir, R. and Sharan, R. (2001) Algorithmic approaches to clustering gene expression data, in Current Topics in Computational Biology (Jiang, T., Smith, T., Xu, Y., and Zhang, M., eds.), MIT Press, Cambridge, MA, pp. 269–299.
Milligan, G. (1996) Clustering validation: results and implications for applied analysis, in Clustering and Classification (Arabie, P., Hubert, L., and De Soete, G., eds.), World Scientific, River Edge, NJ, pp. 341–374.
Tibshirani, R., Walther, G., and Hastie, T. (2001) Estimating the number of clusters in a dataset via the gap statistic. J. Roy. Stat. Soc. B 63, 411–423.
Ben-Hur, A., Elisseeff, A., and Guyon, I. (2002) A stability based method for discovering structure in clustered data, in Pacific Symposium on Biocomputing (Altman, R., Dunker, A., Hunter, L., Lauderdale, K., and Klein, T., eds.), World Scientific, River Edge, NJ, pp. 6–17.
Levine, E. and Domany, E. (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comp. 13, 2573–2593.
Fridlyand, J. (2001) Resampling methods for variable selection and classification: applications to genomics, PhD thesis, University of California at Berkeley.
Fowlkes, E. and Mallows, C. (1983) A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78, 553–584.
Bittner, M., Meltzer, P., Chen, Y., et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536–540.
Yeung, K., Haynor, D., and Ruzzo, W. (2001) Validating clustering for gene expression data. Bioinformatics 17, 309–318.
Milligan, G. (1980) An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika 45, 325–342.
Yeung, K. and Ruzzo, W. (2001) An empirical study of principal component analysis for clustering gene expression data. Bioinformatics 17, 763–774.
Jackson, J. (1991) A User’s Guide to Principal Components, John Wiley & Sons, New York, NY.
Jolliffe, I. (1986) Principal Component Analysis, Springer-Verlag, New York, NY.
Jolliffe, I. (1972) Discarding variables in principal component analysis I: artificial data. Appl. Stat. 21, 160–173.
Jolliffe, I. (1973) Discarding variables in principal component analysis II: real data. Appl. Stat. 22, 21–31.
Hastie, T., Tibshirani, R., Eisen, M., Alizadeh, A., Levy, R., Staudt, L., Chan, W., Botstein, D., and Brown, P. (2000) “Gene shaving” as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol. 1, 1–21.
Alter, O., Brown, P., and Botstein, D. (2000) Singular value decomposition for genomewide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10101–10106.
Raychaudhuri, S., Stuart, J., and Altman, R. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symp. Biocomp. 5, 452–463.
Hilsenbeck, S., Friedrichs, W., Schiff, R., O’Connell, P., Hansen, R., Osborne, C., and Fuqua, S. (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J. Natl. Cancer Inst. 91, 453–459.
Wen, X., Fuhrman, S., Michaels, G., Carr, D., Smith, S., Barker, J., and Somogyi, R. (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. USA 95, 334–339.
Milligan, G. and Cooper, M. (1988) A study of variable standardization. J. Classif. 5, 181–204.
Jain, A. and Dubes, R. (1988) Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ.
Kaufman, L. and Rousseeuw, P. (1990) Finding Groups in Data, Wiley Interscience, John Wiley & Sons, New York, NY.
Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C., Ares, M., and Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA 97, 262–267. http://www.cse.ucsc.edu/research/compbio/genex.
Ben-Hur, A., Horn, D., Siegelmann, H., and Vapnik, V. (2001) Support vector clustering. J. Machine Learn. Res. 2, 125–137.
Anderberg, M. (1973) Cluster Analysis for Applications, Academic Press, New York, NY.
Copyright information
© 2003 Humana Press Inc.
About this protocol
Cite this protocol
Ben-Hur, A., Guyon, I. (2003). Detecting Stable Clusters Using Principal Component Analysis. In: Brownstein, M.J., Khodursky, A.B. (eds) Functional Genomics. Methods in Molecular Biology, vol 224. Humana Press. https://doi.org/10.1385/1-59259-364-X:159
DOI: https://doi.org/10.1385/1-59259-364-X:159
Publisher Name: Humana Press
Print ISBN: 978-1-58829-291-9
Online ISBN: 978-1-59259-364-4
eBook Packages: Springer Protocols