Journal of Molecular Biology
Volume 406, Issue 2, 18 February 2011, Pages 228-256

A New Clustering of Antibody CDR Loop Conformations

https://doi.org/10.1016/j.jmb.2010.10.030

Abstract

Previous analyses of the complementarity-determining regions (CDRs) of antibodies have focused on a small number of “canonical” conformations for each loop. This is primarily the result of the work of Chothia and coworkers, most recently in 1997. Because of the widespread utility of antibodies, we have revisited the clustering of conformations of the six CDR loops with the much larger amount of structural information currently available. In this work, we were careful to use a high-quality data set by eliminating low-resolution structures and CDRs with high B-factors or high conformational energies. We used a distance function based on directional statistics and an effective clustering algorithm with affinity propagation. With this data set of over 300 nonredundant antibody structures, we were able to cover 28 CDR–length combinations (e.g., L1 length 11, or “L1–11” in our CDR-length nomenclature) for L1, L2, L3, H1, and H2. The Chothia analysis covered only 20 CDR-lengths. Only four of these had more than one conformational cluster, of which two could easily be distinguished by gene source (mouse/human; κ/λ) and one could easily be distinguished purely by the presence and the positions of Pro residues (L3–9). Thus, using the Chothia analysis does not require the complicated set of “structure-determining residues” that is often assumed. Of our 28 CDR-lengths, 15 have multiple conformational clusters, including 10 for which the Chothia analysis had only one canonical class. We have a total of 72 clusters for non-H3 CDRs; approximately 85% of the non-H3 sequences can be assigned to a conformational cluster based on gene source and/or sequence. We found that earlier predictions of “bulged” versus “nonbulged” conformations based on the presence or the absence of anchor residues Arg/Lys94 and Asp101 of H3 have not held up, since all four combinations lead to a majority of conformations that are bulged. 
Thus, the earlier analyses have been significantly enhanced by the increased data. We believe that the new classification will lead to improved methods for antibody structure prediction and design.

Research Highlights

► Clustering of CDR conformations from over 300 antibody structures.
► Modern clustering method using affinity propagation in dihedral angle space.
► Most of Chothia's 25 canonical conformations confirmed.
► A total of 72 clusters identified for use in antibody structure prediction and design.

Introduction

Prediction of the three-dimensional structure of antibodies is an important step in improving their affinity, stability, and suitability as therapeutics. Given the conserved structure of the frameworks of the heavy-chain variable (VH) domain and the light-chain variable (VL) domain, much of the attention in structural bioinformatics has focused on complementarity-determining regions (CDRs) involved in binding antigens. Studies by Chothia, Lesk, Thornton, and others in the 1980s and 1990s centered around the idea of identifying a small number of “canonical structures” for six CDR loops [H1, H2, and H3 of the VH domain; L1, L2, and L3 of the VL domain] of various lengths.1, 2, 3, 4 The central hypothesis, first stated in 1987,1 was that “most of the hypervariable regions in immunoglobulins have one of a small discrete set of main-chain conformations that we call ‘canonical structures,’” and that a small number of key residues could be used to predict to which conformational class a new CDR sequence might belong. In further studies, Al-Lazikani et al.,2 Martin and Thornton,3 Oliva et al.,5 Wilmot and Thornton,6 Shirai et al.,7 and Kuroda et al.8 defined canonical structures based on loop length and, in some cases, different conformations for certain loop lengths. Residues at some positions—in particular glycine, proline, aromatic residues, and hydrogen-bond donors and acceptors—were proposed to be responsible for differences in conformation. In their 1997 study, Chothia and coworkers found a total of 25 canonical classes due to the larger number of structures available.2

Chothia et al. used a manual clustering of antibody loops and sequences to define their canonical classes. Martin and Thornton in 1996 used a quantitative clustering approach for an automated classification scheme.3 They performed a cluster analysis in internal coordinate space, followed by a postcluster merging of groups of structures in Cartesian coordinate space [using root-mean-square deviation (RMSD)] to classify the observed CDRs. In some instances, they observed that although a loop might be closer in sequence to one of the Chothia canonical classes, it structurally belonged to another. They noted this as a limitation of the more sequence-based analyses of previous studies.

There have been a number of studies that focused specifically on the structural motifs found in the structurally diverse heavy-chain H3 CDR.3, 9, 10, 11, 12, 13 Morea et al. divided the H3 hypervariable region into a “torso” region and the “head” of the loop.4 They found that the torso typically takes on one of two conformations, either bulged or extended β-sheet, and the possible conformations of the head region are then limited by the structure of the torso residues. Oliva et al. also divided H3 loops into groups based on structure.5 They defined loop conformations using a geometric alphabet, as described by Wilmot and Thornton.6 Shirai et al. identified, through inspection, a series of sequence–structure relationships that they then transformed into a set of rules to classify H3 structures.7 In particular, they believed that the presence or the absence of salt bridges in the ‘torso’ region, as defined by Morea et al., leads to either bulged or extended conformations in that region.4 Kuroda et al. later revised their list of H3 sequence–structure rules with the availability of more H3 structures.8

For non-H3 loops, the most recent comprehensive analyses of their conformations were performed in 1996–1998. With the large increase in the number of available antibody structures, we decided to revisit the analysis of the conformations of antibody CDRs to see whether the canonical classes based on 17 structures2 or fewer than 60 structures3 have held up and whether new ones may be identified. In this article, we update the classification of all six CDRs based on the current Protein Data Bank (PDB). We filtered out low-resolution structures, loops with high B-factors or high conformational energies, and redundant sequences. A total of 337 unique heavy chains and 311 unique light chains were used to construct a structural database of antibody loops. In contrast to Chothia's analysis, we found it most intuitive to group CDRs by CDR type (L1, L2, etc.) and loop length. We refer to these as “CDR–length combinations” or simply “CDR-lengths” for short. For instance, a common loop length for CDR L1 is 11, and we designate this as “L1–11.” We then applied clustering to the conformations of all loops of a particular CDR–length combination using an affinity propagation clustering method9 with a dihedral-angle distance function. We found that most of the canonical conformations found by Chothia et al. occur in many of the 300+ antibody structures now available. We have identified a total of 72 clusters of conformations, most of which are observed in two or more antibody structures. We provide a detailed comparison of our results to previous antibody loop classifications based on smaller data sets.
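The grouping and distance steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the per-angle distance 2(1 − cos Δθ) is one common choice from directional statistics and is assumed here, as are the restriction to (φ, ψ) pairs, the averaging over angles, and all function names. The key uses an ASCII hyphen ("L1-11") in place of the en dash of the nomenclature.

```python
import math
from collections import defaultdict


def angle_distance(a_deg, b_deg):
    """Periodic distance between two angles in degrees.

    Assumed form: 2 * (1 - cos(a - b)), which is 0 for equal angles,
    4 for opposite angles, and unaffected by the -180/180 wrap-around.
    """
    return 2.0 * (1.0 - math.cos(math.radians(a_deg - b_deg)))


def loop_distance(loop_a, loop_b):
    """Mean angle distance over all backbone dihedrals of two loops.

    Each loop is a list of (phi, psi) tuples in degrees; both loops must
    have the same number of residues (the same CDR-length).
    """
    if len(loop_a) != len(loop_b):
        raise ValueError("loops must have the same length")
    total, count = 0.0, 0
    for res_a, res_b in zip(loop_a, loop_b):
        for x, y in zip(res_a, res_b):
            total += angle_distance(x, y)
            count += 1
    return total / count


def cdr_length_key(cdr, loop):
    """Build a CDR-length label such as 'L1-11' (hypothetical helper)."""
    return f"{cdr}-{len(loop)}"


def group_by_cdr_length(loops):
    """Group (cdr_name, loop) pairs by their CDR-length label."""
    groups = defaultdict(list)
    for cdr, loop in loops:
        groups[cdr_length_key(cdr, loop)].append(loop)
    return dict(groups)


def distance_matrix(group):
    """Pairwise loop distances within one CDR-length group."""
    n = len(group)
    return [[loop_distance(group[i], group[j]) for j in range(n)]
            for i in range(n)]
```

Each CDR-length group is clustered separately. Under these assumptions, the negated entries of `distance_matrix` could be passed as similarities to any affinity propagation implementation that accepts a precomputed similarity matrix.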

Data set

As described in Materials and Methods, we used manually curated multiple-sequence alignments to construct hidden Markov models (HMMs) of the VH and VL domains. We used these models to search the entire set of PDB sequences to identify all PDB chains with antibody variable domains. There were a total of 923 antibody PDB entries that contain at least one hypervariable loop with all backbone atom positions defined. Since the asymmetric units of many PDB entries contain more than one copy of the

Discussion

In this work, we have revisited the problem of clustering the structures of the six CDR loops of antibodies. A thorough analysis such as this has not been accomplished since the work of Chothia et al. and Martin and Thornton in 1996–1997. The number of antibody structures is at least 5-fold larger now than it was then (and 15-fold larger than the set used by Chothia). Because of this, we have been able to remove questionable structures (those of low resolution or high-energy backbone

Hidden Markov models of the V domains of heavy and light chains

PSI-BLAST32 was used to search a database of all sequences in the PDB (the nonredundant sequence file pdbaanr available on our PISCES Web site),33, 34 using the variable domain regions of the antibody structure in PDB entry 1Q9R.14 Only sequences with more than 35% identity and E-values better than 1.0 × 10⁻²⁰ were kept, such that only antibody domains remained (e.g., excluding T-cell receptors and other Ig sequences). The resulting heavy-chain and light-chain sequences were culled at 90% identity using

Acknowledgements

This work was supported by National Institutes of Health grants P20 GM76222 and R01 GM84453 (R.L.D., principal investigator) and National Institutes of Health training grant T32 CA009035.

References (40)

  • Tramontano A. et al. Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins. J. Mol. Biol. (1990)
  • Shirai H. et al. Structural classification of CDR-H3 in antibodies. FEBS Lett. (1996)
  • Morea V. et al. Antibody structure, prediction and redesign. Biophys. Chem. (1997)
  • Morea V. et al. Antibody modeling: implications for engineering and design. Methods (2000)
  • Wilmot C.M. et al. Beta-turns and their distortions: a proposed new nomenclature. Protein Eng. (1990)
  • Kuroda D. et al. Structural classification of CDR-H3 revisited: a lesson in antibody modeling. Proteins Struct. Funct. Bioinf. (2008)
  • Frey B.J. et al. Clustering by passing messages between data points. Science (2007)
  • James L.C. et al. Antibody multispecificity mediated by conformational diversity. Science (2003)
  • Nguyen H.P. et al. Germline antibody recognition of distinct carbohydrate epitopes. Nat. Struct. Biol. (2003)
  • Ting D. et al. Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model. PLoS Comput. Biol. (2010)