The structure of protein-protein recognition sites.

The ability of certain proteins to form specific stable complexes is fundamental to biological existence. The structures of some 15 protease-inhibitor complexes (1-19) and 4 antibody-protein antigen complexes (20-26) have been determined by x-ray crystallography, and we review the information that they provide on the structural basis of protein-protein recognition. We describe first the general features of the recognition sites. We then discuss whether recognition sites are regions of high mobility, the nature of the conformational changes that occur on the association of proteins, and the implications of these structural results for the kinetics and thermodynamics of association. Analysis of the first few protein-protein complexes to have their atomic structures determined by x-ray crystallography suggested that sites involved in the association of folded proteins would usually have similar structural properties (27, 28). This view is largely confirmed by the many new structures now available.


Structural Features
Structural features of the protein-protein recognition sites in the complexes for which the data are available are listed in Table I.
Contact Residues-In the protease-inhibitor complexes 10-15 residues in the inhibitors form contacts with 17-29 residues in the proteases. This asymmetry arises from the shapes of the surfaces involved. An extended loop in the inhibitors fills the long groove formed by the active site and specificity pocket in the surface of the proteases (1-19); see Fig. 1a. In the antibody-antigen complexes, the interacting surfaces of both components are formed by several regions of peptide, and they are much flatter, with similar numbers of residues on each side (20-26); see Fig. 1b and Table I.
Although the shapes of the recognition sites in the complexes vary, the total number of residues involved in the recognition sites is similar across the different complexes: 34 ± 7 residues (Table I).
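The legend to Table I defines contact residues as those with atoms within the sum of the van der Waals radii plus 0.5 Å of atoms in residues across the interface. A minimal sketch of that criterion follows. The radii, the atom tuple layout, the residue labels, and the toy coordinates are all assumptions made for illustration; the review does not give the radii or the software used.

```python
import math

# Illustrative van der Waals radii in angstroms (assumed values, not from the review).
VDW_RADII = {"C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8}
TOLERANCE = 0.5  # angstroms added to the sum of the two radii


def contact_residues(atoms_a, atoms_b, radii=VDW_RADII, tol=TOLERANCE):
    """Return the sets of residues of A and of B that are in contact.

    Each atom is a tuple (residue_id, element, (x, y, z)), coordinates in angstroms.
    Two atoms are in contact when their distance is at most r_a + r_b + tol.
    """
    res_a, res_b = set(), set()
    for rid_a, el_a, xyz_a in atoms_a:
        for rid_b, el_b, xyz_b in atoms_b:
            cutoff = radii[el_a] + radii[el_b] + tol
            if math.dist(xyz_a, xyz_b) <= cutoff:
                res_a.add(rid_a)
                res_b.add(rid_b)
    return res_a, res_b


# Hypothetical toy coordinates, not taken from any crystal structure.
inhibitor = [("I1", "N", (0.0, 0.0, 0.0)), ("I2", "C", (10.0, 0.0, 0.0))]
enzyme = [("E1", "O", (3.0, 0.0, 0.0)), ("E2", "O", (0.0, 20.0, 0.0))]

print(contact_residues(inhibitor, enzyme))  # ({'I1'}, {'E1'})
```

The all-pairs loop is adequate for a sketch; a real interface with thousands of atoms would normally use a spatial grid or tree to avoid the quadratic cost.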
Contact Areas-The surfaces that the individual proteins bury in the recognition sites are between 600 and 1000 Å² in area (Table I). These interface areas form some 20% of the total accessible surface area of the smaller proteins, such as pancreatic trypsin inhibitor (28), and some 5% of the surface of the larger proteins, such as carboxypeptidase (6). The total areas buried in the protease-inhibitor and antibody-antigen recognition sites are similar: 1600 ± 350 Å²; see Table I.
The Chemical Character of the Interfaces-The chemical character of the accessible surface (29) of an average protein is 55% non-polar, 25% polar, and 20% charged (30). Most of the recognition sites have chemical compositions fairly similar to this, except that the proportion of charged surface is usually somewhat lower. Only the surfaces buried by α-chymotrypsin, eglin, and the Kazal inhibitors have a significantly higher proportion of non-polar atoms, about 70%. Overall, no particular amino acid composition is found for the protease, inhibitor, and antigen recognition sites. In the antibodies, however, nearly half the contact residues are aromatic: the 71 contact residues in the four complexes include 17 Tyr, 9 Trp, 3 His, and 2 Phe (20, 23, 31). This preference for aromatic residues is often found in the recognition sites of antibodies (31, 32).

Hydrogen Bonds and Electrostatic Interactions-The recognition sites in protease-inhibitor and antigen-antibody complexes involve similar numbers of intermolecular hydrogen bonds: 8-13, with an average of 10 (Table I). The only exception is the carboxypeptidase complex, where there are just six hydrogen bonds. In this complex, however, a carbonyl oxygen in the inhibitor is coordinated to the zinc ion in the active site (6).
Proteases bind the inhibitors in the same manner as substrates. This involves hydrogen bonds between main chain atoms, and two-thirds of the hydrogen bonds across the interface are of this type. In the antigen-antibody complexes the hydrogen bonds mostly involve side chain atoms.
One-quarter of the protease-inhibitor hydrogen bonds and half those in the antigen-antibody complexes involve one charged group. Most of these charged groups are buried in the recognition sites. Half come from Arg residues and a quarter from Lys residues. Hydrogen bonds between two charged groups are rare.
In addition to direct hydrogen bonds, a number of polar atoms interact across the interface through bridging water molecules (4, 5, 7, 12, 13, 15, 16, 20). Between 6 and 12 such water molecules are seen in well determined structures (7, 13, 15). Usually they are found around the periphery of the contact; occasionally a few are buried within the interface.
Residue Packing-Visual inspection of the recognition sites shows that cavities are rare. Quantitative calculations on the trypsin inhibitor complexes and the Fab HyHEL-5-lysozyme complex show that the interfaces are close packed: the contact residues occupy volumes that are the same as those they occupy in crystals of amino acids (28).
Steric Strain-In the trypsin-pancreatic trypsin inhibitor complex, the trigonal carbon in the "scissile" peptide of the inhibitor has been described as tetrahedrally distorted (1). This distortion is not seen in some other protease-inhibitor complexes (8). It is present in the very high resolution structure of the subtilisin-eglin complex, but it is small, and its energy cost has been estimated at only 0.5 kcal·mol⁻¹ (13).
Comparison with Interfaces between the Subunits of Oligomeric Proteins-Subunit interfaces in oligomeric proteins are usually much larger than those in the complexes discussed here. The smallest interfaces in dimeric proteins of known structure are of the same size as in the complexes, but the largest bury up to 5000 Å² per monomer (33-35). The larger interfaces stabilize not only the association of the subunits but also the tertiary structures of the individual subunits (33, 36). The interfaces in oligomeric proteins, particularly the large ones, are usually more hydrophobic than those involved in functional recognition. They also involve fewer hydrogen bonds (35).

Structure of Protein-Protein Recognition Sites
Fab is the fragment of the antibody molecule formed by the VL, VH, CL, and CH1 domains. Contact residues have atoms within van der Waals radii plus 0.5 Å of atoms in residues across the interface. Interface areas are the difference between the solvent accessible surface areas (29) [...]
Fig. 1. [...] subtilisin-eglin-C (11, 12, 15) (a) and part of the antibody HyHEL-5-lysozyme complex (24) (b). α-Helices are represented by red cylinders, strands of β-sheet by yellow ribbons, and the linking peptides are traced in blue. a, eglin-C is shown in the top part of the figure and subtilisin in the lower part. The inhibitor's contacts to the enzyme are made through residues in a loop that packs into a long groove in subtilisin. This loop makes contacts with residues from seven different regions of the enzyme (11, 12, 15). In making contact via a single loop, eglin-C is unusual, because most of the other inhibitors have a small second region of contact with the proteases. b, lysozyme is shown in the top part of the figure and the VL-VH dimer of the antibody HyHEL-5 is shown in the lower part. Lysozyme's contacts to the antibody are made through residues in two strands of β-sheet, a loop, and the end of a helix. They make contacts with six loops (hypervariable regions) in the antibody (24). Although the structures of these two recognition sites are very different, they result in similar amounts of surface becoming buried, 1500 Å² in the first complex and 1600 Å² in the second, and in similar numbers of intermolecular hydrogen bonds, 12 and 11, respectively.

Mobility
Mobility, or the lack of it, could be an important feature of protein-protein complexes. Efficient inhibition of proteases has been attributed to the presence of rigid preformed binding loops on the inhibitors (1, 28). As opposed to this, it has been proposed that antigenic sites or epitopes are mobile regions of the protein surface, more easily adaptable to the binding sites of antibodies (37, 38).
A measure of the amplitude of atomic movements about their equilibrium positions is given by the Debye-Waller temperature factors (B factors) derived from crystal structures.
In Table II, we compare B factor averages taken over all main chain atoms in the pancreatic trypsin (PTI), the ovomucoid (OMTKY3), and the chymotrypsin (CI2) inhibitors (14, 39, 40) and in lysozyme (41) (2, 8, 9, 11, 14-16, 39, 40). B values are also known for antigen lysozyme (41) [...] the same. In the kallikrein-PTI and the protease-OMTKY3 or CI2 complexes, the same set of contact residues has lower B factors than residues not involved in contacts. Thus, the binding loops are neither more rigid nor more flexible than the rest of the polypeptide chain when free, but they do become partly immobilized in the complexes (14). Average main chain B factors are not significantly lower for residues of the three regions of the lysozyme surface recognized by the antibodies than for the rest of the molecule (Table II). Thus the mobile region hypothesis does not apply to these epitopes (20, 24, 26).
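The comparison underlying Table II, average main chain B factors for contact residues against those for the rest of the chain, can be sketched as follows. The function name, the input layout, and the numbers are hypothetical and serve only to illustrate the averaging; they are not values from Table II.

```python
from statistics import mean


def mean_main_chain_b(b_factors, contact_ids):
    """Average main chain B factors (in Å²) for contact and non-contact residues.

    b_factors maps a residue id to the B factors of its main chain atoms.
    contact_ids is the set of residue ids located in the recognition site.
    Returns (mean over contact residues, mean over all other residues).
    """
    contact = [b for rid, bs in b_factors.items() if rid in contact_ids for b in bs]
    other = [b for rid, bs in b_factors.items() if rid not in contact_ids for b in bs]
    return mean(contact), mean(other)


# Hypothetical B factors for three residues, four main chain atoms each.
toy_b = {
    1: [10.0, 12.0, 11.0, 13.0],
    2: [20.0, 22.0, 21.0, 23.0],
    3: [14.0, 16.0, 15.0, 17.0],
}

contact_mean, other_mean = mean_main_chain_b(toy_b, contact_ids={2})
print(contact_mean, other_mean)  # 21.5 13.5
```

Running the same function on the free and the complexed form of a protein gives the two pairs of averages needed to tell whether the contact residues become immobilized on association.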

Conformational Change
The structures of the constituents of several protease-inhibitor complexes have been determined at high resolution in their unassociated forms. The comparison of the associated and unassociated structures shows that, although recognition sites are not regions of high mobility, protein-protein association does involve small conformational changes of low energy (4-11, 13-16, 20, 22, 24-26). These changes facilitate close packing and hydrogen bond formation at the interfaces.
For the enzyme in the α-chymotrypsin-ovomucoid complex the only significant change is the rotation of two side chains (16). The most extensive changes are found in the loop that forms the binding site of chymotrypsin inhibitor 2. This loop extends out from the inhibitor like that in eglin-C (Fig. 1a). Association with subtilisin produces small changes throughout the main chain (the root mean square change in position of its atoms is 0.56 Å) and different conformations for five side chains (14). Binding of the streptomyces subtilisin inhibitor to subtilisin involves a shift of ~2 Å in a helix (10). Quite different changes are seen on the formation of the trypsinogen-inhibitor complexes. In trypsinogen the region that forms the specificity pocket is partially disordered. Proteolytic activation of the zymogen to trypsin results in this region becoming fully ordered (42). It is also ordered in the trypsinogen-secretory inhibitor and PTI complexes (4, 7). These complexes are much less stable (by a factor of 10⁶ in Kd) than those with trypsin, even though the structures of the interfaces are almost identical. The lower stabilities reflect the cost of the changes in conformation induced in the zymogen.
The structure of the unassociated antigen lysozyme is known (41), as is the unassociated form of Fab D1.3 (21). Of the lysozyme residues that form the interfaces to the three antibodies, only a few have small differences in position (~2 Å) or in side chain conformation (20, 24, 25). The structure of unassociated Fab D1.3 shows that the binding of the antigen produces at most only very small changes in tertiary and quaternary structure (21). Thus, antibody-antigen complexes behave like protease-inhibitor complexes.

Binding Kinetics
Kinetic studies have been performed on several of the complexes studied here. Association rate constants kon have been found to be in the range 10⁵-10⁷ M⁻¹ s⁻¹ (43-48). Thus kon varies little, and affinity changes appear mostly as variations of the dissociation rate constants koff, from 1 s⁻¹ for low affinity complexes to a half-life of more than 3 months (10⁻⁷ s⁻¹) for those with high affinity (43). No kinetic data have been published for the antibodies discussed here, but other monoclonal antibody-protein complexes have association rate constants in the range 10[?]-10⁶ M⁻¹ s⁻¹ (48).
These association rate constants should be compared with the rate of collision of molecules the size of trypsin, PTI, or lysozyme: about 10⁹ M⁻¹ s⁻¹, based on diffusion coefficients of about 10⁻⁶ cm² s⁻¹ in water. One in 100 of the collisions leads to association in the most efficient complex, and one in 10⁴ in the least efficient. As interfaces cover only 5-20% of each protein surface, this implies that most collisions at the target surface lead to stable association. It also suggests that initially a fraction of the possible interactions forms a loose complex which then isomerizes to the stable structure with correct packing at the interface. Evidence for a two-step mechanism has been found in some of the kinetic studies (44, 48).
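The arithmetic behind this argument can be sketched in a few lines. The collision rate of 10⁹ M⁻¹ s⁻¹ is taken as the value consistent with the one-in-100 and one-in-10⁴ figures quoted above. The geometric estimate, in which the chance of a collision bringing both recognition sites together is the product of the two surface fractions, is a simplifying assumption introduced here and is not a calculation from the review.

```python
import math

K_COLLISION = 1e9  # M^-1 s^-1, assumed collision rate for proteins of this size


def collision_efficiency(k_on, k_collision=K_COLLISION):
    """Fraction of collisions that lead to stable association."""
    return k_on / k_collision


def site_to_site_fraction(frac_a, frac_b):
    """Naive chance that a random collision involves both recognition sites.

    Assumes random, independent orientations, so the two surface fractions
    simply multiply. This is a simplification made for illustration only.
    """
    return frac_a * frac_b


# Observed efficiencies for the fastest and slowest association rate constants.
fastest = collision_efficiency(1e7)
slowest = collision_efficiency(1e5)

# Geometric estimates for interfaces covering 20% and 5% of each surface.
largest_sites = site_to_site_fraction(0.20, 0.20)
smallest_sites = site_to_site_fraction(0.05, 0.05)

print(f"observed efficiency: {slowest:.0e} to {fastest:.0e}")
print(f"site-to-site collisions: {smallest_sites:.2%} to {largest_sites:.2%}")
```

Under these assumptions the fraction of collisions that involve both sites is of the same order as the observed efficiency of the fastest complexes, which is the sense in which most collisions at the target surface are productive.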

Stability
Thermodynamic stability is given by the value of the dissociation constant Kd or the standard Gibbs free energy of dissociation: ΔGd = -RT ln Kd. With the exception of the trypsinogen complexes, Kd values for the complexes discussed here are in the 10⁻⁸-10⁻¹³ M range, hence ΔGd = 11-18 kcal·mol⁻¹.
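The conversion between a dissociation constant and a free energy of dissociation can be checked numerically. The sketch below assumes the standard relation ΔGd = -RT ln Kd with a 1 M standard state and a temperature of 25 °C, neither of which is stated in the text, and takes Kd values of 10⁻⁸ and 10⁻¹³ M as illustrative inputs.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def dissociation_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of dissociation in kcal/mol from Kd in mol/L.

    Uses delta_G = -R * T * ln(Kd) with a 1 M standard state (assumed).
    """
    return -R_KCAL * temperature_k * math.log(kd_molar)


for kd in (1e-8, 1e-13):
    print(f"Kd = {kd:.0e} M -> {dissociation_free_energy(kd):.1f} kcal/mol")
# Kd = 1e-08 M -> 10.9 kcal/mol
# Kd = 1e-13 M -> 17.7 kcal/mol
```

Each tenfold decrease in Kd adds RT ln 10, about 1.4 kcal·mol⁻¹ at this temperature, to the free energy of dissociation.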
ΔGd is a balance of several large terms favoring or opposing complex formation. The major terms opposing protein association are the losses it produces in the translational, rotational, and internal degrees of freedom. The major terms favoring association are hydrophobic energy, gained from the surfaces buried in the recognition sites, and electrostatic energy, from the hydrogen bonds.
New theoretical estimates of the translational and rotational entropy lost on formation of a dimer (52, 58, 59) suggest it is about 15 kcal·mol⁻¹ (59). Most side chains on the surfaces of proteins have only limited flexibility (60), so their immobilization in recognition sites will only add a few kcal to the cost of association.
The contribution of hydrophobicity has been estimated from the correlation between the accessible surface of residues and the energies associated with their transfer from water to organic solvents (61, 62). For non-polar surfaces there is an energy gain of approximately 25 cal/Å². The effect of burying