Critical Functional Role of the COOH-terminal Ends of Longitudinal Hydrophobic Strips in α-Helices of T4 Lysozyme*


The sensitivity of bacteriophage T4 lysozyme function to amino acid substitutions at defined positions in and around the longitudinal, hydrophobic strips of 9 α-helices was assessed after systematic replacement of each residue in the protein with a series of 13 amino acids. The hydrophobic strips were defined by identifying the longitudinal sectors in the helices with the highest mean residue hydrophobicities. Sensitivity to mutation (the percentage of replacements leading to loss of function) was calculated for each residue in the following positions: whole protein, helices, hydrophobic strips, other positions within the helices, and various positions within the hydrophobic strips as well as their extensions beyond the helices. Substitutions at positions in the hydrophobic strips led more frequently to loss of function than substitutions in the protein as a whole. One subset, the COOH-terminal hydrophobic strip residues, is apparently critical; substitutions of these residues (but not of their NH2-terminal counterparts) led at least as frequently to loss of function as substitutions of solvent-inaccessible residues, and nearly as frequently as substitutions of the most highly conserved residues.
Proteins are generally tolerant of amino acid substitutions. Studies of natural variants, as well as of proteins subjected to intensive mutagenesis, have revealed that many, possibly most, single amino acid substitutions are tolerated. Moreover, it appears that few, if any, residues in a protein are irreplaceable (1-5). If combinations of substitutions are permitted, even the hydrophobic core of a protein can be packed in many different ways (6).
Against this background of tolerance, certain positions in proteins stand out as intolerant of substitutions. These critical residues are ones whose replacement with other residues frequently results in a loss of function. For purposes of understanding and engineering proteins, it would be useful to be able to pick out such critical residues in a protein of known structure, without having to carry out systematic mutational studies. A few general rules for doing this exist. For instance, residues identified by structural or chemical data as involved in substrate binding and/or catalysis usually are critical for function. In addition, residues with buried side chains tend to be sensitive to substitutions (2, 3). Finally, residues that are conserved among members of families of related proteins are more sensitive to substitutions than non-conserved residues (7).

* This research was supported by Grant IM-582 from the American Cancer Society, Grant AI-24083 from the National Institutes of Health, and by a grant from the scientific council of the University of Massachusetts Medical School for the Molecular Graphics Facility. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. To whom correspondence should be addressed: Dept. of Pharmacology, University of Massachusetts Medical School, 55 Lake Ave. North, Worcester, MA 01655. Tel.: 508-856-3327; Fax: 508-856-5080.
In an attempt to elucidate further general rules that might be of use in picking out critical residues, and to understand the structural basis of those residues' importance, we have investigated the applicability of a structure-predicting algorithm. This algorithm, described previously (8), identifies a longitudinal hydrophobic strip in a surface α-helix (see Fig. 1). Like some other methods for predicting α-helices, it is based on the observation that α-helices on the surfaces of proteins tend to be amphipathic (1). Typically, one longitudinal sector of the helix has predominantly hydrophobic residues, which pack into the hydrophobic core of the protein.
In a previous study (9), it was observed that, in the helices of 55 proteins, Leu, Ile, Val, Phe, and Met were found at high frequency in the longitudinal hydrophobic strip, relative to other positions; they were not found at high frequency in the first "virtual" hydrophobic strip positions (identified by extension of the hydrophobic strip beyond the helices). It was suggested that the formation of the longitudinal hydrophobic strip was critical to helix propagation and that the absence of hydrophobic residues in strip positions beyond a helix dictated helix termination.
We now find that application of the hydrophobic strip-of-helix algorithm to the α-helices of T4 lysozyme picks out a small set of residues, associated with the hydrophobic strip, that are critical to function. Their apparent functional importance equals or exceeds that of the least solvent-accessible residues and nearly equals that of the residues in T4 lysozyme that are the most highly conserved among phage-encoded lysozymes. These critical residues are at the COOH-terminal ends of their respective helices' hydrophobic strips. The corresponding NH2-terminal residues are significantly less sensitive. These observations are discussed in terms of an apparent pattern in the structural organization of surface α-helices in T4 lysozyme.

FIG. 1. Sheet projection of the strip-of-helix hydrophobicity template. 0 positions in the linear sequence are seen to fall in one longitudinal quadrant of an α-helix. The strip-of-helix hydrophobicity index is the mean hydrophobicity of 0 positions in the fitting of the repeating template pattern to a known or putative helix maximizing the index. The template may then be extended beyond the helix to define adjacent (virtual helix) 0 and 0 positions.

EXPERIMENTAL PROCEDURES
For each of the α-helices of T4 lysozyme found in the crystal structure (10, 11), the longitudinal hydrophobic strip-of-helix was defined by fitting a circular template, . . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . . . (joined at the ends), to the amino acid sequence in such a way as to maximize the mean hydrophobicity of residues at 0 positions (8).
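The fitting step can be illustrated with a short Python sketch. It is not the published implementation, and several details are assumptions. The template printed above is not legible in this copy, so the hypothetical 18-residue template below marks offsets 0, 4, 7, 11, and 14 as strip positions. At roughly 100° of rotation per residue, these offsets fall within about ±40° of one face, i.e. one longitudinal quadrant. The Kyte-Doolittle hydropathy scale stands in for the hydrophobicity scale of Ref. 8. The sketch slides the template through all 18 phases and keeps the phase with the highest mean hydrophobicity at strip positions. Then, as in the legend of Fig. 1, it extends the same template beyond each end of the helix to locate the first virtual strip positions.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy scale, used here only as an assumed stand-in
# for the hydrophobicity scale of the original method.
KYTE_DOOLITTLE: Dict[str, float] = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Hypothetical circular template: 18 residues (5 turns at 3.6 residues/turn),
# with strip positions at offsets that point toward one face of the helix.
PERIOD = 18
STRIP_OFFSETS = frozenset({0, 4, 7, 11, 14})


def fit_strip(helix: str) -> Tuple[int, List[int], float]:
    """Return (phase, strip indices, mean hydrophobicity) for the best-scoring phase."""
    best_phase, best_idx, best_mean = -1, [], float("-inf")
    for phase in range(PERIOD):
        idx = [i for i in range(len(helix)) if (i + phase) % PERIOD in STRIP_OFFSETS]
        if not idx:  # very short helices can miss the strip entirely at some phases
            continue
        mean = sum(KYTE_DOOLITTLE[helix[i]] for i in idx) / len(idx)
        if mean > best_mean:  # strict ">" keeps the first phase on ties
            best_phase, best_idx, best_mean = phase, idx, mean
    return best_phase, best_idx, best_mean


def virtual_positions(helix_len: int, phase: int) -> Tuple[int, int]:
    """Nearest template strip positions before index 0 and after the helix end.

    Negative indices lie before the first helix residue; indices >= helix_len
    lie beyond the last one.
    """
    before = -1
    while (before + phase) % PERIOD not in STRIP_OFFSETS:
        before -= 1
    after = helix_len
    while (after + phase) % PERIOD not in STRIP_OFFSETS:
        after += 1
    return before, after


if __name__ == "__main__":
    helix = "LEEELEELEEELEEI"  # toy sequence, not a T4 lysozyme helix
    phase, strip, mean = fit_strip(helix)
    print(strip, round(mean, 2))  # [0, 4, 7, 11, 14] 3.94
    print("NH2-terminal strip index:", strip[0], "COOH-terminal strip index:", strip[-1])
    print(virtual_positions(len(helix), phase))  # (-4, 18)
```

In this sketch, the first and last entries of the fitted strip correspond to what the paper calls the NH2- and COOH-terminal hydrophobic strip residues.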
In a previous study (3), amber mutations were systematically introduced into the bacteriophage T4 lysozyme gene borne by a bacteriophage P22 hybrid in which the normal P22 lysozyme gene is replaced by its T4 homolog; this hybrid consequently depends upon T4 lysozyme for its ability to form a plaque. The resulting amber mutants were tested for plaque formation on amber suppressor strains of Salmonella typhimurium. A collection of amber mutants covering 163 of the 164 codons of T4 lysozyme (omitting the initiating AUG) was plated on 13 suppressor strains, each of which inserts a different amino acid (Ala, Arg, Cys, Gln, Glu, Gly, His, Leu, Lys, Phe, Pro, Ser, or Tyr) in response to the amber codon. This tested the effects of multiple single amino acid substitutions at every position in the protein except the first. The suppressors employed in these studies included 4 naturally occurring (12) and 9 synthetic amber suppressors (13-16).
Of the resulting 2015 single amino acid substitutions in T4 lysozyme, 328 were sufficiently deleterious to inhibit plaque formation at 37 °C; 74 of the 163 residues were sensitive to 1 or more substitutions, 60 of these to 2 or more, 53 to 3 or more, 43 to 4 or more, and 31 to 5 or more. Previous experiments have shown that, to score as deleterious in this system, a mutation must reduce the total activity of lysozyme to less than 3% of that produced by the wild type hybrid phage (3, 17).
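The totals quoted here are mutually consistent, as a few lines of Python show. The sketch uses only numbers given in this paper; the split of 59 positions with 13 substitutions and 104 with 12 is stated in the Discussion. It assumes, as seems likely, that the 12-substitution positions are those whose wild-type residue is itself one of the 13 inserted amino acids, so one suppressor simply restores the wild type there.

```python
# Counts quoted in the text for positions 2-164 of T4 lysozyme.
N_SUPPRESSORS = 13       # amino acids inserted by the suppressor panel
POSITIONS_WITH_13 = 59   # assumed: wild-type residue not in the panel
POSITIONS_WITH_12 = 104  # assumed: wild-type residue in the panel (one insertion is silent)
DELETERIOUS = 328        # substitutions that blocked plaque formation at 37 °C

# Total distinct substitutions tested across all positions.
total_substitutions = (POSITIONS_WITH_13 * N_SUPPRESSORS
                       + POSITIONS_WITH_12 * (N_SUPPRESSORS - 1))

# Whole-protein sensitivity score: percentage of substitutions that were deleterious.
whole_protein_score = 100 * DELETERIOUS / total_substitutions

print(POSITIONS_WITH_13 + POSITIONS_WITH_12)  # 163
print(total_substitutions)                    # 2015
print(round(whole_protein_score, 1))          # 16.3
```

The result, about 16%, is the whole-protein score used as the baseline in Table I.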
Cited solvent accessibilities refer to the calculated side-chain surface area that can be contacted by a sphere of radius 1.5 Å, expressed as a percentage of the accessible surface area of the same residue in the unfolded state (11).¹ Coordinates and temperature factors (B values) of atoms in T4 lysozyme and other proteins were obtained from the Brookhaven Protein Data Bank.
The statistical significance of observed high frequencies of deleterious substitutions in groups of defined positions was assessed by a one-tailed test. Frequencies were converted to standardized normal random variables according to the formula, where fo was the observed frequency and fe was the frequency expected on the basis of a random uniform distribution of deleterious substitutions, adjusted for the particular amino acid residue composition of the group. Absolute values of Z greater than 3.1 correspond to p < 0.001; greater than 2.3, p < 0.01; greater than 1.6, p < 0.05.
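The formula itself is missing from this copy. A common way to standardize an observed proportion against an expected one is the normal approximation to the binomial, Z = (fo - fe)/√(fe(1 - fe)/n), where n is the number of substitutions tested in the group. The sketch below assumes that form; it is not necessarily the authors' exact expression. For reference, the exact one-tailed cutoffs are about 3.09, 2.33, and 1.64, so the thresholds of 3.1, 2.3, and 1.6 quoted above are rounded.

```python
import math


def z_score(f_obs: float, f_exp: float, n: int) -> float:
    """Standardize an observed loss-of-function frequency against an expected one.

    Assumes a normal approximation to the binomial with n substitutions tested.
    """
    return (f_obs - f_exp) / math.sqrt(f_exp * (1.0 - f_exp) / n)


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal variable, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Illustration only: observed 44% vs. expected 24% for the COOH-terminal
    # strip group. n = 111 assumes 12 substitutions at each of its 6 Leu/Phe
    # positions and 13 at its Ile, Val, and Trp positions.
    z = z_score(0.44, 0.24, 111)
    print(round(z, 2), one_tailed_p(z) < 0.001)
```

Because the input percentages are rounded, the Z of roughly 4.9 from this illustration should not be read as the value in Table I. It does indicate that a 20-point excess over expectation is well beyond the p < 0.001 threshold for a group of this size.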
Expected frequencies of deleterious substitutions for groups of amino acid residues were calculated as number-weighted averages of the frequencies for the individual residue types. The frequency for residue type X is defined as the fraction of all substitutions of all X (for example, Ala) residues in T4 lysozyme that result in loss of function.
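The expected frequency for a group is a weighted average, with each residue type weighted by the number of residues of that type in the group. The Python sketch below computes it. The composition in the example is the COOH-terminal hydrophobic strip group described in Results (4 Leu, 2 Phe, 1 Ile, 1 Val, 1 Trp). The per-residue-type frequencies are invented placeholders, not the T4 lysozyme values, so the output does not reproduce the 24% reported for that group.

```python
from typing import Dict


def expected_frequency(composition: Dict[str, int], residue_freq: Dict[str, float]) -> float:
    """Number-weighted average of per-residue-type loss-of-function frequencies.

    composition: residue type -> number of residues of that type in the group.
    residue_freq: residue type -> fraction of all substitutions of that type
                  (protein-wide) that caused loss of function.
    """
    n_residues = sum(composition.values())
    weighted = sum(count * residue_freq[aa] for aa, count in composition.items())
    return weighted / n_residues


if __name__ == "__main__":
    cooh_strip_group = {"Leu": 4, "Phe": 2, "Ile": 1, "Val": 1, "Trp": 1}
    # PLACEHOLDER frequencies for illustration only.
    placeholder_freq = {"Leu": 0.30, "Phe": 0.12, "Ile": 0.12, "Val": 0.12, "Trp": 0.12}
    print(round(expected_frequency(cooh_strip_group, placeholder_freq), 3))  # 0.2
```

Weighting by residue count, rather than by the number of substitutions tested at each position, follows the "number-weighted" wording of the text.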

RESULTS
Application of the hydrophobic strip-of-helix-identifying algorithm to the 9 α-helices of T4 lysozyme generated the result shown in Fig. 2, where hydrophobic strip residues are marked by squares and others by circles. Extension of the hydrophobic strips beyond the ends of the helices then defined the first virtual hydrophobic strip positions. The frequency of occurrence of Leu, Ile, Val, Phe, or Met in hydrophobic strip positions was higher than at other positions in the protein, as one would expect. In contrast, the frequency of these residues' …

¹ S. Dao-Pin and L. Weaver, personal communication.

In that study, lack of extension of a longitudinal hydrophobic strip-of-helix with Leu, Ile, Val, Phe, or Met correlated with the termination of the helix.
In the survey of Vazquez et al. (9), the crossing region of a helix was defined as the 3 residues closest to a neighboring helix in terms of minimal interhelical Cα distances. It was observed that helices generally crossed through their longitudinal hydrophobic strips-of-helix. Furthermore, the smallest residue in the longitudinal hydrophobic strip-of-helix was usually in the crossing region, i.e. helices crossed through "notches" in the longitudinal hydrophobic strip-of-helix. Charged residues that occurred in the longitudinal hydrophobic strip-of-helix were excluded from crossing regions. The sides of such notches in the longitudinal hydrophobic strip-of-helix were composed predominantly of Leu, Ile, Val, Phe, and Met. It was suggested that the structure of the longitudinal hydrophobic strip influenced the rotational and longitudinal positioning of crossing helices.
The relative sensitivities to substitutions of all residues in the α-helices of T4 lysozyme are indicated in Fig. 2. Residues of the hydrophobic strip were generally sensitive; the COOH-terminal strip residues were particularly sensitive. The sensitivities to substitutions of groups of residues in T4 lysozyme, including those occupying the structural positions described above, are compared in Table I. The protein as a whole scored 16; that is, 16% (328/2015) of the substitutions tested were deleterious. Buried residues, as a group, were more sensitive to substitutions. Loss of function resulted from 38% of substitutions for residues whose side chains have less than 12% of their surface area accessible to solvent; this sensitivity increased to 42% when the group was restricted to residues with completely inaccessible side chains. These observations establish a criterion for judging the performance of a scheme to pick out critical residues. As shown previously, residue conservation, too, is correlated with sensitivity to substitutions; 47% of substitutions at the 14 positions in T4 lysozyme that are fully conserved among 5 known bacteriophage-encoded lysozymes resulted in loss of function (7). T4 lysozyme has 9 standard α-helices, which account for 59% of its amino acid residues, and 2 other helices as well, described as a,, and "distorted α" structures, which we did not evaluate (10, 11). In terms of sensitivity to substitutions, α-helical residues were typical of the protein as a whole; 16% of substitutions were deleterious. Within this group, though, residues designated as belonging to the hydrophobic strip were more sensitive (26%), while non-hydrophobic strip residues were less sensitive (13%). Within the hydrophobic strips, the NH2-terminal residues scored 26%, while the COOH-terminal residues scored an extraordinarily high 44%. The smallest residues in the hydrophobic strips scored 31%.
The NH2- and COOH-terminal "shoulders" (those strip residues on either side of the smallest) scored 14 and 31%, respectively. The COOH-terminal and virtual COOH-terminal strip positions together were much more sensitive to substitutions than were their NH2-terminal counterparts. Residues in the hydrophobic strip were generally sensitive to substitutions; COOH-terminal strip residues were extremely sensitive.
The character of acceptable substitutions at the hydrophobic strip positions partly mirrored their restricted amino acid compositions. For instance, Leu and Phe substitutions were universally tolerated at the COOH termini of hydrophobic strips (not shown). An apparent exception to the strip-of-helix template predictive rules actually supported the general theory of the functional importance of that strip in stabilizing the helix, and thus in protein function. In one α-helix, consisting of residues 3-10, the COOH-terminal residue, Asp10, was found to be especially sensitive to substitutions. Asp10 is in a position in which it would have been designated part of the hydrophobic strip if it were not charged. Inspection of the structure in this particular case was instructive; the side chain of Asp10 is solvent-inaccessible and participates in a buried salt bridge.
Sensitivity scores of small groups of amino acid residues could be biased by the limited compositions of the groups. For example, if the Phe residues in T4 lysozyme were especially critical, then a small group that happened to include Phe might be expected on this basis to have a high sensitivity score. To compensate for such a bias, we compared the sensitivity score of each group identified above with an "expected" score based on the group's amino acid composition. For example, the group of 9 COOH-terminal hydrophobic strip residues consists of 4 Leu, 2 Phe, 1 Ile, 1 Val, and 1 Trp. The expected sensitivity score of this group was 24%, the highest such value found (Table I). However, the actual score of the COOH-terminal hydrophobic strip residues (44%) exceeded the expected score by 20 percentage points. The previously identified groups of functionally critical residues, buried and conserved, exhibited scores 18-25 points above their respective expected values. Two groups of residues, the smallest residues in the hydrophobic strip and the virtual extensions of the COOH termini of the hydrophobic strips, exhibited scores 13 points above their expected values. The latter of these was a special case, for reasons probably having little to do with structural principles; the group included Glu11, the key catalytic residue of T4 lysozyme. Without Glu11, this group's score would have been only 5 points above its expected value. Similarly, the group of smallest residues in the hydrophobic strip was skewed by inclusion of Trp138, which, although large, was counted as a member of this group by virtue of being the only (and thus, coincidentally, the smallest) residue in the hydrophobic strip of its short α-helix. Without Trp138, the score of this group would have been only 8 points above its expected value. All the other groups deviated from their expected scores by 11 points or less.
A statistical test (described under "Experimental Procedures"; see Table I) indicates that almost all subgroups of the hydrophobic strip residues were significantly more sensitive to substitution than one would have expected based simply on their amino acid compositions. A residue's designation as a member of any group other than the COOH-terminal hydrophobic strip residues, though, was significantly less informative than its solvent accessibility in predicting sensitivity to substitutions.

DISCUSSION
The importance of large hydrophobic residues in stabilizing protein structure is generally recognized (see Ref. 18 for a review). It has been confirmed and measured in recent studies of mutant proteins (for examples, see ...). We therefore expected, and found, that residues identified as parts of the hydrophobic strips of α-helices are, on average, more sensitive to substitutions than others in the protein. Indeed, the pattern of sensitivity to substitutions of residues in a protein appears to be a useful predictor of secondary structure (19).
The unexpected result of our study was the identification of a set of especially critical, hydrophobic amino acid residues in T4 lysozyme: COOH-terminal hydrophobic strip residues in α-helices. A number of substitutions for any of the 9 members of this group resulted in loss of function. The minimum number of deleterious substitutions (out of 12 or 13 tested at each position) in this group was 4 (see Table II).
The significance of this group of residues is apparent when one considers that only 74 of the 163 tested positions in T4 lysozyme were sensitive to any substitutions; only 43 were sensitive to 4 or more. Thus, the probability of picking, at random, a group of 9 residues in which all are sensitive to 4 or more substitutions is approximately 0.000006. The apparent functional significance of residues identified in this manner equals or exceeds that of residues identified on the basis of solvent inaccessibility and approaches that of the set of most highly conserved residues.
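The probability quoted above can be checked with a few lines of Python. The text does not say how it was computed. The value of about 0.000006 matches treating the 9 picks as independent draws, each with probability 43/163 of landing on a position sensitive to 4 or more substitutions. Requiring 9 distinct positions, a hypergeometric draw without replacement, gives roughly half that. Either way, the chance is very small.

```python
import math

TESTED_POSITIONS = 163    # positions 2-164 of T4 lysozyme
SENSITIVE_TO_4_PLUS = 43  # positions sensitive to 4 or more substitutions
GROUP_SIZE = 9            # COOH-terminal hydrophobic strip residues

# Independent draws (with replacement): each pick is "sensitive" with p = 43/163.
p_with_replacement = (SENSITIVE_TO_4_PLUS / TESTED_POSITIONS) ** GROUP_SIZE

# Nine distinct positions drawn without replacement (hypergeometric).
p_without_replacement = (math.comb(SENSITIVE_TO_4_PLUS, GROUP_SIZE)
                         / math.comb(TESTED_POSITIONS, GROUP_SIZE))

print(f"{p_with_replacement:.1e}")     # 6.2e-06
print(f"{p_without_replacement:.1e}")  # 3.2e-06
```

`math.comb` requires Python 3.8 or later.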
This test of the relative functional importance of groups of amino acid residues in T4 lysozyme was nearly bias-free. The same set of 13 amino acids was tested at essentially every position. Of course, a consequence of the use of 13 particular amino acids was that at some positions (59), 13 substitutions were tested, while at others (104), 12 were tested. A perfect test would examine the effects of placing all 20 amino acids at each position. On the other hand, the set of 13 amino acids represented in the collection of amber suppressors employed (Gly, Ala, Ser, Cys, Pro, Leu, Gln, Glu, His, Phe, Tyr, Lys, and Arg) contained members of most of the generally recognized classes of amino acids. The use of amber suppression as the means of effecting single amino acid substitutions introduces complexities into the interpretation of the data. Amber suppressors differ in efficiency, and the efficiency of suppression varies among amber (UAG) codons due to what are called context effects. On the other hand, all amber mutants in this particular collection formed large, healthy plaques on at least 1 amber suppressor strain (on several, in almost all cases); conversely, all of the amber suppressor strains plated many
Examination of the T4 lysozyme molecule reveals a structural correlate of the functional importance of COOH-terminal hydrophobic strip residues. These residues are more tightly packed than NH2-terminal hydrophobic strip residues. The average side-chain solvent inaccessibility of COOH-terminal hydrophobic strip residues is 99%; that of NH2-terminal hydrophobic strip residues is 79%. Moreover, the average temperature factors of atoms in these two groups of residues are 15.6 and 20.3 Å², respectively. These observations are based on 7 of T4 lysozyme's 9 α-helices. The other 2 are omitted. One of them (residues 93-106) runs through the interior of the large domain of T4 lysozyme and is thus not a surface α-helix. The other (residues 137-141) has only 1 residue that fits the description of a hydrophobic strip residue (as defined by either the algorithm or structural criteria), so its NH2-terminal and COOH-terminal hydrophobic strip residues are the same.
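Group-averaged temperature factors of this kind can be computed directly from a Protein Data Bank coordinate file. The Python sketch below reads the fixed-column ATOM records and pools the B values of every atom in a chosen set of residues. The function name and its arguments are hypothetical. The residue lists for the NH2- and COOH-terminal strip groups are not given in the text, so whatever residue numbers are passed in are up to the user.

```python
from typing import Iterable, List, Optional, Set


def mean_b_factor(pdb_lines: Iterable[str], residue_numbers: Set[int],
                  chain: Optional[str] = None) -> float:
    """Mean temperature factor over all ATOM records of the given residues.

    Uses the fixed-column PDB layout: chain ID in column 22, residue number
    in columns 23-26, temperature factor in columns 61-66 (1-based).
    HETATM records (e.g. waters, ligands) are skipped.
    """
    b_values: List[float] = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if chain is not None and line[21] != chain:
            continue
        if int(line[22:26]) in residue_numbers:
            b_values.append(float(line[60:66]))
    if not b_values:
        raise ValueError("no matching ATOM records")
    # Pool atoms from all residues in the group, as in "average temperature
    # factors of atoms in these two groups of residues".
    return sum(b_values) / len(b_values)
```

Pooling over atoms weights large side chains more heavily than small ones. Averaging per residue first would be an equally defensible alternative. Biopython's `Bio.PDB` parser, whose `Atom.get_bfactor()` returns the same field, is a more robust option for real files.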
The foregoing observations point to a feature in the structural organization of T4 lysozyme: surface α-helices are more tightly packed into the hydrophobic core at their COOH-terminal ends than at their NH2-terminal ends. We hypothesize that this structural feature accounts for the high sensitivity of T4 lysozyme function to substitutions specifically at the COOH-terminal ends of the hydrophobic strips. The observation of such a structural feature in T4 lysozyme raises the question of whether other proteins are similarly organized. A small-scale survey suggests that this polarity in the anchoring of α-helices is not general. We compared the average temperature factors of atoms in the NH2- and COOH-terminal hydrophobic strip residues of 43 α-helices in 5 proteins (thermolysin, sperm whale myoglobin, erythrocruorin, carboxypeptidase A, and beef catalase). None of these proteins exhibited as great a polarity as T4 lysozyme in this measure; in 4 cases out of 5, the COOH-terminal residues had slightly higher average temperature factors than the NH2-terminal residues (data not shown).
Rules for picking out critical residues are not completely reliable. Many substitutions are tolerated at buried and conserved positions, as well as at COOH-terminal hydrophobic strip positions. When examined in detail, proteins with unexpectedly tolerated substitutions turn out to have altered their structures in subtle ways, creating a context into which the altered residue fits (for a review, see Matthews (22)). It is apparent that the acceptability of amino acid substitutions is governed by details of structural context that cannot be captured in a simple scalar quantity like solvent accessibility.
The foregoing considerations may limit, but do not rule out, the utility of simple formulas in identifying key structural determinants of function. Even so, using a structure-predicting algorithm is perhaps not an obvious strategy for studying a protein for which a high resolution, crystallographically determined structure is available. However, at least in this case, the algorithm was helpful in picking out a pattern (a class of critical residues) that the investigators did not notice by studying the structure in its full complexity. The usefulness of certain simplifying schemes, such as topology diagrams, in picking out structural relationships is well established (23).
This analysis introduces potentially severe restraints on the engineering of therapeutic proteins to decrease their immunogenicity (24) or of vaccines to increase potency or broaden the range of major histocompatibility complex restriction (25). A scavenger site (S site) motif identified with the strip-of-helix hydrophobicity template (26) may lie in or near the T cell-presented epitope (T site), which may be identified by allele-specific motifs (27-31). Decreasing the hydrophobicity of residues in 0 positions of the longitudinal hydrophobic strip may decrease scavenging or processing of an excised fragment bearing a T site. Likewise, increasing the hydrophobicity of such residues in engineered immunogens may increase potency and possibly broaden the apparent range of major histocompatibility complex restriction, if priming of low responders leads to a memory cell population capable of clinical protection upon later challenge with wild type antigen (25). Our current study shows, however, that the residues one would change to alter the potency of T cell presentation of an S site-associated T site are the very residues required to maintain the structure on which the protein's function depends. In the case of a vaccine, such loss of function may be irrelevant. But in the case of a therapeutic protein (enzyme, lymphokine, hormone), substitutions that eliminate T cell immunogenicity are more likely to alter local structure in protein folding. To address this issue, and so engineer still-functional proteins with abolished T cell immunogenicity, one would want to discriminate the biophysical differences between the folding and docking of helices in nascent proteins and the scavenging of excised fragments during antigen processing.