Summary
Biological experiments often produce enormous amounts of data, which are commonly explored by cluster analysis. Cluster analysis refers to statistical methods that assign data with similar properties to a smaller number of more meaningful groups. Two commonly used techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA transforms the original, possibly correlated variables into a few uncorrelated linear combinations, the principal components (PCs), which are orthogonal to each other and ordered by the amount of variance they explain. Hierarchical clustering, in its agglomerative form, starts by treating the data as many small clusters and then repeatedly merges the most similar clusters. Here, we use the example of human leukocyte antigen (HLA) supertype classification to demonstrate the use of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. The reader should bear in mind, however, that both methods are also available in other software, such as SIMCA, statistiXL, and R.
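As a rough illustration of the two methods summarized above, the following Python sketch runs PCA and average-linkage agglomerative clustering on a small synthetic data set. It is not the GOLPE or Sybyl workflow used in the chapter: the function names pca_scores and agglomerative, the toy data, the Euclidean distance, and the choice of average linkage are all assumptions made for the example, and only NumPy is required.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the first principal components.

    Hypothetical helper for illustration; uses the SVD of the
    mean-centred data matrix.
    """
    Xc = X - X.mean(axis=0)                      # centre each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance fraction per PC
    components = Vt[:n_components]               # orthonormal PC directions
    scores = Xc @ components.T                   # coordinates on the PCs
    return scores, components, explained[:n_components]


def agglomerative(X, n_clusters):
    """Average-linkage agglomerative clustering (illustrative only).

    Starts with one cluster per row and repeatedly merges the two
    clusters with the smallest mean pairwise Euclidean distance.
    """
    clusters = [[i] for i in range(len(X))]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Synthetic toy data (an assumption, not HLA data): two separated groups
# of five items, each described by four variables.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.3, size=(5, 4))
group_b = rng.normal(loc=3.0, scale=0.3, size=(5, 4))
X = np.vstack([group_a, group_b])

scores, components, explained = pca_scores(X)
labels = agglomerative(X, n_clusters=2)
```

Plotting the two columns of scores against each other gives the usual PCA score plot, and the labels array gives the cluster assignment obtained by stopping the merging at two clusters. In R, the functions prcomp and hclust play comparable roles.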
Copyright information
© 2007 Humana Press Inc.
About this protocol
Cite this protocol
Guan, P., Doytchinova, I.A., Flower, D.R. (2007). The Classification of HLA Supertypes by GRID/CPCA and Hierarchical Clustering Methods. In: Flower, D.R. (eds) Immunoinformatics. Methods in Molecular Biology™, vol 409. Humana Press. https://doi.org/10.1007/978-1-60327-118-9_9
DOI: https://doi.org/10.1007/978-1-60327-118-9_9
Publisher Name: Humana Press
Print ISBN: 978-1-58829-699-3
Online ISBN: 978-1-60327-118-9
eBook Packages: Springer Protocols