Substitutions of proline 76 in yeast iso-1-cytochrome c. Analysis of residues compatible and incompatible with folding requirements.

Fine-structure genetic mapping previously revealed numerous nonfunctional cyc1 mutations having alterations at or near the site corresponding to amino acid position 76 of iso-1-cytochrome c from the yeast Saccharomyces cerevisiae. DNA sequencing of the alterations in four of these cyc1 mutations indicated that the normal Pro-76 was replaced by Leu-76. Revertants containing at least partially functional iso-1-cytochromes c were isolated, and the alterations were analyzed by DNA sequencing and protein analysis. Specific activities of the altered iso-1-cytochromes c were estimated in vivo by growth of the strains in lactate medium; compared to normal iso-1-cytochrome c with Pro-76, the following activities were associated with the following replacements: approximately 90% for Val-76, approximately 60% for Thr-76, approximately 30% for Ser-76, approximately 20% for Ile-76, and 0% for Leu-76. In order to develop an understanding of the factors that determine whether or not an altered iso-1-cytochrome c will function, we undertook a theoretical analysis which led to the conclusion that the activity of the proteins was dependent on both short- and long-range interactions. Short-range interactions were estimated from studies on known protein structures which gave the likelihood that various amino acids would be found in a local backbone configuration similar to the native protein; long-range interactions with the rest of the molecule were analyzed by considering the size of the side chain. We believe this approach can be used to analyze a wide variety of mutant proteins.

Equally important is that the overall structure of yeast iso-1-cytochrome c can be inferred by comparisons to well-characterized structures of related cytochromes c (Dickerson and Timkovich, 1975; Dickerson, 1981a, 1981b). We believe that analysis of altered iso-1-cytochromes c from nonfunctional mutants and from functional revertants, together with the analysis of various amino acid replacements at various positions, will shed light on the structural requirements and folding properties of the protein.
As shown in previous studies, amino acid changes in functional revertants generally can be determined by protein analysis. However, because of protein instability, nonfunctional iso-1-cytochromes c and even certain functional iso-1-cytochromes c cannot readily be isolated and purified. In order to extend our studies on missense mutations, and because of the relative ease of analyzing DNA, we have developed expedient procedures for retrieving and sequencing DNA segments encompassing the CYC1 locus.
In this investigation, we have systematically examined alterations at position 76 in iso-1-cytochrome c (corresponding to position 71 in vertebrate cytochrome c). The position is occupied by proline in all of the 89 eukaryotic cytochromes c so far analyzed; this invariant Pro-76 appears to be required because it is located at the beginning of a helical structure (Takano and Dickerson, 1981a).
The results of our analysis of mutant forms of iso-1-cytochrome c showed that the Leu-76 protein is nonfunctional, whereas the Val-76, Thr-76, Ser-76, and Ile-76 proteins are functional but less so than the Pro-76 protein. We have attempted to explain these results with an analysis based on the size of the various side groups and on the compatibility of the residues with a specific conformation of the region of the protein encompassing position 76.

EXPERIMENTAL PROCEDURES
Genetic Nomenclature and Yeast Strains-The symbol CYC1 or CYC1+ denotes the wild-type structural gene for iso-1-cytochrome c in the yeast Saccharomyces cerevisiae. Mutations of this gene that cause reduced growth on lactate medium and that cause deficiency in either the amount or activity of iso-1-cytochrome c are designated by the mutant gene symbol cyc1 followed by the allele number, e.g. cyc1-24, cyc1-58, etc. Intragenic revertants of specific cyc1 mutants that grow on lactate medium and that have at least partially functional iso-1-cytochrome c are designated by the mutant strain from which they are derived and by capital letters to distinguish independent revertants, e.g. CYC1-24-A, CYC1-24-B, etc. The cyc1 mutants, which were obtained from the normal laboratory strain D311-3A (MATa CYC1+ lys2 his1 trp2), and the genetic techniques and media especially used with the cyc1 mutants have been described, including procedures for mutagenic treatments, reversion, and testing of revertants (Prakash and Sherman, 1973). Standard yeast genetic procedures (Sherman et al., 1983) were used to construct the strains ER42-6D (MATa cyc1-365 ura3-52 SUP4-o can1-100 ilv3 his3-Δ1 lys1-1), D1113-4C (MATa ura3-52 leu2-3,112 ilv3 SUP4-o can1-100), and D1113-10B (MATa ura3-52 his3-Δ1 leu2-3,112 ilv3 SUP4-o can1-100). The low-reverting alleles ura3-52, leu2-3,112, and his3-Δ1 have been described by Botstein et al. (1979). The cyc1-365 mutation is a deletion encompassing the CYC1, OSM1, and RAD7 loci (Singh and Sherman, 1978) and is subsequently referred to in this paper as cyc1-Δ. SUP4-o is a UAA suppressor closely linked to the CYC1 locus (Lawrence et al., 1975). The can1-100 allele is suppressible by UAA suppressors, including SUP4-o.
Growth Curves-Growth of cells in 10 ml of liquid lactate medium was determined in 125-ml side-arm flasks that were vigorously shaken at 30 °C, using the conditions previously described (Schweingruber et al., 1979).
Protein Analysis-The amounts of iso-1-cytochrome c in the mutants and the revertants were estimated by low-temperature (-196 °C) spectroscopic examinations of intact cells (Sherman and Slonimski, 1964). Low-temperature spectrophotometric recordings made with a Cary model 14 spectrophotometer were used for more quantitative measurements (Sherman et al., 1968). Yeast strains were grown and the iso-1-cytochromes c were extracted and purified by methods described previously (Sherman et al., 1968). Amino acid analysis of 6 N HCl hydrolysates of the proteins was performed with the Beckman-Spinco Model 120C amino acid analyzer according to published procedures (Stewart et al., 1971; Stewart and Sherman, 1973). The values were calculated by dividing the observed molar amount of each amino acid by the molar amount of protein analyzed. For neutral and acidic residues, the amount of protein was calculated by dividing the sum of the observed molar amounts of neutral and acidic residues, except for methionine and half-cysteine (which had high variance) and except for tryptophan (which decomposes), by 79, which is the number of such neutral and acidic residues in one molecule of iso-1-cytochrome c. For basic residues, the amount of protein analyzed was calculated similarly, using a value of 23 residues/molecule of normal protein. Peptide maps of chymotryptic digests of cytochromes c were prepared as previously described (Sherman et al., 1968; Stewart et al., 1971), with Whatman No. 3MM paper by electrophoresis in pyridine acetate, pH 6.5, and chromatography in n-butyl alcohol/pyridine/acetic acid/water (15:10:3:12), followed by detection with ninhydrin and with reagents selective for tryptophan, tyrosine, and histidine. Cyanogen bromide cleavage of cytochrome c in formic acid followed by digestion in 0.1 N HCl and fractionation of the digests by column gel filtration on Sephadex G-50 has been described (Stewart and Sherman, 1973; Liebman et al., 1977).
Procedures for manual sequential Edman degradation and thin-layer chromatographic identification of the dansyl¹ and phenylthiohydantoin derivatives are detailed in Stewart and Sherman (1973).
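The normalization used for the amino acid compositions can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the toy protein, and the observed amounts are hypothetical and are not data from this study. Only the rule itself (divide the summed reference residues by their known count per molecule, then divide every observed amount by the resulting protein amount) and the divisors 79 and 23 come from the text.

```python
# Divisors quoted in the text for iso-1-cytochrome c.
NEUTRAL_ACIDIC_RESIDUES = 79  # excludes Met, half-Cys, and Trp
BASIC_RESIDUES = 23


def residues_per_molecule(observed_nmol, reference_residues, residues_per_protein):
    """Convert observed molar amounts into residues per protein molecule.

    observed_nmol        -- dict of residue name -> observed amount (nmol)
    reference_residues   -- residues whose summed amount estimates the protein amount
    residues_per_protein -- known number of those reference residues per molecule
    """
    protein_nmol = sum(observed_nmol[r] for r in reference_residues) / residues_per_protein
    return {name: amount / protein_nmol for name, amount in observed_nmol.items()}


# Hypothetical toy protein with 24 reference residues (12 Gly, 8 Ala, 4 Pro).
toy_observed = {"Gly": 24.0, "Ala": 16.0, "Pro": 8.0, "Met": 3.0}
toy_result = residues_per_molecule(toy_observed, ["Gly", "Ala", "Pro"], 24)
print(toy_result)
```

For iso-1-cytochrome c the same function would be called once with the neutral and acidic residues and `NEUTRAL_ACIDIC_RESIDUES`, and once with the basic residues and `BASIC_RESIDUES`; which residues are passed in each call is left to the caller.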
Retrieving and DNA Sequencing of cyc1 Mutations-In order to sequence the cyc1 mutations used in this and other studies, we have developed expedient methods for retrieving and sequencing DNA segments containing the CYC1 locus. The DNA sequences of most of the cyc1 mutations reported in this study were determined with a procedure that was developed by extending previously described procedures (Stiles et al., 1981; Ernst et al., 1981, 1982) and by employing certain standard techniques. The steps used for retrieving cyc1 mutations are outlined in Fig. 1. The strains used in various steps and their relevant genotypes are presented in Fig. 2. The procedure requires the introduction of either the plasmid pAB61 or pAB178 into strains containing cyc1 mutations (see below); pAB61 was used initially, whereas pAB178 was used at a later stage of the study.
Because the selection and detection of pAB178 (or pAB61) are carried out in ura3 strains, the first step requires the coupling of the cyc1 mutation to the ura3-52 marker. Most of the original cyc1 mutants were isolated from the strain D311-3A (MATa CYC1+ lys2 his1 trp2), which lacks the appropriate ura3 marker. A convenient method for coupling the markers involves first crossing each of the cyc1 mutants to the strain D1113-10B (MATa CYC1+ ura3-52 can1-100 SUP4-o) (step 1 in Fig. 2). A similar strain, D1113-4C, is used with MATa cyc1 mutants. The diploid strains are sporulated, the sporulated cultures are plated on canavanine medium, and the canavanine-resistant colonies are subcloned and tested. The majority of the canavanine-resistant colonies are haploid strains containing the desired cyc1 alleles for the following reasons: can1-100 is recessive, and the heterozygous diploid strains can1-100/CAN1+ are sensitive to canavanine; can1-100 is suppressible by the UAA suppressor SUP4-o, and therefore can1-100 SUP4-o haploid strains are sensitive to canavanine; and SUP4 is closely linked to CYC1. The cyc1 mutant allele also can be conveniently distinguished from the CYC1+ wild-type allele by examining the level of iso-1-cytochrome c in the haploid strains. Because MAT and URA3 are unlinked to CYC1, approximately one-fourth of the cyc1 haploid strains also should contain the desired MATa ura3-52 markers.
FIG. 1. ... cyc1-x mutations. The translated portion of the CYC1 locus is indicated by a filled-in box; the chromosomal region absent in the cyc1-Δ chromosome is indicated by a dotted line; the adjacent chromosomal regions are indicated by open parallel lines; the replicating plasmids pAB178 are indicated by circles, and the corresponding integrated plasmids are indicated by straight lines; and the BamHI-restricted sites in the chromosomal region adjacent to the CYC1 locus and in plasmids pAB178 and pABcyc1-x are indicated by B. The Cross, Integration, and Retrieval steps are described in the text and in the legend to Fig. 2. The relative sizes of the regions are not drawn to scale.
¹ The abbreviation used is: dansyl, 5-dimethylaminonaphthalene-1-sulfonyl.
The two plasmids constructed for this study, pAB61 and pAB178, are shown in Fig. 3. The plasmid pAB61 was constructed by insertion of the TRP1-ARS1 region into the plasmid pAB30 (Stiles et al., 1981). The plasmid YRp7 (Botstein et al., 1979) (10 µg) was digested with EcoRI, and the 1.45-kilobase pair EcoRI fragment containing the TRP1 and ARS1 genes was isolated by electrophoresis in a 0.7% agarose gel followed by electroelution into a dialysis bag (Zaret and Sherman, 1981). The isolated fragment and 50 ng of the pAB30 plasmid that was cut with EcoRI and treated with bacterial alkaline phosphatase were ligated together with T4 ligase. After amplification in Escherichia coli, the pAB61 plasmid was used to transform the yeast strain ER42-6D according to the method of Hinnen et al. (1978). This transformed strain with the replicating pAB61 plasmid was designated B-5680. The plasmid pAB178 was constructed from the parent plasmid YIp5 by fusion of the 627-base pair XhoI fragment containing ARS2 from pGT25 (Hsiao and Carbon, 1979; Tschumper and Carbon, 1980) into the SalI site (pAB108)² and by ligation of the 1.9-kilobase pair HindIII-EcoRI fragment from pAB61 into its respective site. The yeast strain transformed with the replicating pAB178 plasmid was designated B-5865.
² S. Baim, unpublished results.
The second step, illustrated in Fig. 2, is crossing each of the MATa cyc1-x ura3-52 can1-100 strains to strain B-5865 ([pAB178] MATa cyc1-Δ ura3-52 can1-100 SUP4-o), which contains the replicating plasmid pAB178, or to strain B-5680 ([pAB61] MATα cyc1-Δ ura3-52 can1-100 SUP4-o), which contains the replicating plasmid pAB61. As shown in Fig. 3, the pAB178 and pAB61 plasmids contain the following relevant regions and corresponding functions: the ARS and ori regions that allow replication in, respectively, yeast and E. coli; the URA3+ and ampR regions that allow selection of the plasmid in, respectively, ura3- yeast strains and ampS bacterial strains; and the EcoRI-HindIII region that allows integration adjacent to the chromosomal CYC1 locus as shown in Fig. 4. The pAB178 and pAB61 plasmids cannot integrate adjacent to the CYC1 locus in cyc1-Δ strains, such as B-5865 and B-5680, which lack this EcoRI-HindIII region. The pAB178 and pAB61 plasmids contain the additional yeast segments encompassing the URA3 and ARS loci. However, the ura3-52 mutation (Botstein et al., 1979) lowers the occurrence of integration of URA3 plasmids at the chromosomal URA3 locus. Thus, the pAB178 and pAB61 plasmids can be maintained in yeast as a replicating form or as a form integrated at either the CYC1 or ARS regions.
The diploid strains [pAB178] cyc1-x/cyc1-Δ and [pAB61] cyc1-x/cyc1-Δ are grown repeatedly in a nonselective complete medium until the plasmid becomes stably integrated (step 3 in Fig. 2). In contrast to cell populations with a replicating form of plasmid, populations with integrated forms are composed of almost all Ura+ cells. Approximately 1-cm² areas on YPD (1% Bacto-yeast extract, 2% Bacto-peptone, 2% dextrose, and 2% agar) plates were inoculated with the diploid strains, and the plates were incubated overnight at 30 °C. Thin layers of cells were transferred by replica-plating the first master YPD plate to a second YPD plate and then replica-plating the second YPD plate to a third plate. The third plate was incubated overnight at 30 °C. The diploid cells were transferred in this manner for 5-10 cycles. In some cases, the integration of the plasmid was stimulated by irradiating the third YPD plate with 10 J m⁻² of ultraviolet light.
The diploid cells were replica-printed from the final YPD plate onto minimal medium lacking uracil. Approximately one-half of the replicas contained single Ura+ colonies. In cases where it was not possible to pick single colonies from the replica print, one further subcloning step was carried out on minimal medium lacking uracil. In order to establish that a chosen Ura+ diploid strain had pAB178 or pAB61 integrated stably into the genome, single colonies were isolated on YPD medium and tested for Ura+. When the pAB178 and pAB61 plasmids had integrated, then all single colonies grown on YPD were Ura+; if pAB178 and pAB61 had not integrated, then only 10-20% of the single colonies grown on YPD were found to be Ura+.
The sites of integration of the pAB178 and pAB61 plasmids can be conveniently determined by step 4, as depicted in Fig. 2. The stable Ura+ diploid strains are sporulated, and the sporulated cultures are streaked on canavanine medium. After incubation of the plates, the canavanine-resistant colonies are picked and tested on uracil-deficient medium. If the pAB178 and pAB61 plasmids, which carry the URA3+ allele, integrated adjacent to the CYC1 locus, then the vast majority of the canavanine-resistant colonies should be Ura+ haploid strains with the genotype cyc1-x pAB178 ura3-52 can1-100 or cyc1-x pAB61 ura3-52 can1-100. On the other hand, if pAB178 or pAB61 integrated at a site unlinked to CYC1, such as the ARS regions, then the canavanine-resistant colonies should be composed of approximately one-half Ura+ and one-half Ura- strains. In practice, approximately 60% of the stable Ura+ diploid strains contained integration of the pAB178 or pAB61 plasmids at the desired position adjacent to the CYC1 locus. The diploid strains containing the appropriately integrated pAB178 or pAB61 plasmids were grown to stationary phase in 40 ml of YPD medium. DNA was isolated from the cells according to the rapid procedure described by Sherman et al. (1983), except that isopropyl alcohol was used instead of ethanol for DNA precipitations. Without further purification, 50 µg of DNA in 300 µl of digestion buffer was cut with BamHI. The restricted DNA was extracted once with phenol/chloroform/isoamyl alcohol (12:12:1) and then precipitated with 0.1 × volume of 3 M sodium acetate plus 2.5 × volume of ethanol. Following centrifugation, the DNA pellet was washed three times with cold 70% ethanol, dried, and resuspended in 100 µl of water. A solution of 2 µg of the DNA in a total volume of 270 µl of water was made, to which were added 34 µl of 10 × ligation buffer, 34 µl of 0.5 mg/ml bovine serum albumin, and 3.4 µl of 100 mM ATP. Ligation was carried out overnight at 14 °C using 400 units of T4 ligase.
This solution was used directly for transformation by adding 90 µl of the DNA to 10 µl of 10 × transformation buffer (100 mM Tris-Cl, pH 7.1, 100 mM CaCl2, 100 mM MgCl2) and 200 µl of freshly thawed, competent cells of E. coli strain HB101 (Mandel and Higa, 1970), followed by plating on ampicillin medium. Up to 50 bacterial transformants were obtained with this procedure. The size of the plasmids in the transformants was estimated by the rapid procedure described by Barnes (1977); over 90% of the plasmids were approximately 10 kilobase pairs, the size expected for plasmids containing the desired cyc1 region. The plasmids were designated pABcyc1-x.
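The volumes quoted for the dilute ligation mixture can be checked with a short calculation. This is a sketch only: the helper name is hypothetical, and the volume contributed by the T4 ligase itself is not stated in the text and is assumed here to be negligible.

```python
def final_concentration(stock_concentration, stock_volume_ul, total_volume_ul):
    """Concentration of a component after dilution into the total volume."""
    return stock_concentration * stock_volume_ul / total_volume_ul


# Component volumes (µl) as given in the text; ligase volume assumed negligible.
volumes_ul = {
    "DNA in water": 270.0,
    "10x ligation buffer": 34.0,
    "0.5 mg/ml bovine serum albumin": 34.0,
    "100 mM ATP": 3.4,
}
total_ul = sum(volumes_ul.values())

buffer_x = final_concentration(10.0, 34.0, total_ul)     # fold concentration of buffer
bsa_mg_per_ml = final_concentration(0.5, 34.0, total_ul)  # mg/ml
atp_mM = final_concentration(100.0, 3.4, total_ul)        # mM
dna_ug_per_ml = 2.0 / total_ul * 1000.0                   # 2 µg of DNA in the total volume

print(f"total volume: {total_ul:.1f} µl")
print(f"buffer: {buffer_x:.2f}x, BSA: {bsa_mg_per_ml:.3f} mg/ml, "
      f"ATP: {atp_mM:.2f} mM, DNA: {dna_ug_per_ml:.1f} µg/ml")
```

Under these assumptions the buffer ends up at close to 1 × and the ATP at close to 1 mM, while the DNA is diluted to a few micrograms per milliliter, a low concentration of the kind generally used to favor circularization of single fragments over joining of separate fragments.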
DNA sequences were determined with the dideoxy terminator method (Sanger et al., 1977; Smith, 1980) using one or more of the synthetic oligonucleotides, shown in Fig. 5, as primers and denatured double-stranded DNA as templates (Smith et al., 1979). Primer B was synthesized by Szostak et al. (1977); primers A, C, D, and E were custom-made by BIOLOGICALS (Toronto, Canada). The cyc1 plasmids were linearized by cutting with HindIII in a solution containing 100 mM Tris-HCl, pH 7.2, 5 mM MgCl2, 2 mM 2-mercaptoethanol, and 50 mM NaCl. A total of 10 µl of the reaction mixture containing 1 µg of the cut plasmid and 50 ng of the oligonucleotide primer in sequencing buffer (Smith, 1980) was added to a 10-µl capillary micropipette; the micropipette was sealed, immersed in boiling water for 3 min, and then chilled in an ice bath. One µl of [α-32P]dATP (approximately 400 Ci/mmol) was then added to the contents of the micropipettes, with subsequent distribution to four capillary tubes. The sequencing reaction and the running of the sequencing gels were carried out essentially as described by Smith (1980).
The DNA sequences of the mutations corresponding to amino acid position 76 were determined with primer E.
... et al., 1975). The cyc1-11 and cyc1-211 mutants were devoid of iso-1-cytochrome c, whereas the remaining six mutants appeared identical and contained nearly one-half the normal amount of iso-1-cytochrome c when grown at 22 °C. The lack of growth of all of these mutants on lactate medium suggested that the iso-1-cytochromes c in the six missense mutants are completely or nearly completely nonfunctional.

Mutations at Position
The cyc1-11 mutation was previously shown by DNA sequencing to contain a UAA nonsense codon corresponding to Pro-76 (Ernst et al., 1981). The lack of recombination of cyc1-11 with the other members of the set indicated that they all have alterations at or near amino acid position 76. In addition, DNA sequencing demonstrated that the cyc1-211 mutation was the result of an insertion of a G-C base pair at the Pro-76 site (Ernst et al., 1985).
DNA sequencing of four of the cyc1 mutations revealed alterations at the site corresponding to amino acid position 76. The results, shown in Table I, established that the leucine codon UUA accounted for the cyc1-24 and cyc1-185 mutations, whereas the leucine codon CUA accounted for the cyc1-58 and cyc1-110 mutations.
The five mutants cyc1-24, cyc1-58, cyc1-104, cyc1-110, and cyc1-169 were subjected to mutagenic treatments with nitrous acid, UV, or diethyl sulfate, and the revertants were selected by growth on lactate medium at 30 °C. Low-temperature spectroscopic examination and genetic tests of the revertants (Sherman et al., 1983) allowed the identification of numerous intragenic revertants, all of which contained 80-100% of the normal amount of iso-1-cytochrome c; 29 of these, listed in Table I, were chosen for further studies.
Iso-1-cytochromes c prepared from these 29 revertants were examined by chymotryptic peptide mapping (Fig. 6), which revealed four types of revertant iso-1-cytochromes c. One type appeared normal and three were altered, as shown in Fig. 6, only in the overlapping peptides C-9 and C-9', which correspond to residues 73-79. Amino acid compositions of 23 of the proteins, presented in Table II, disclosed five types of revertant proteins: one normal type and four types having a residue of proline replaced, respectively, by a residue of valine, threonine, serine, or isoleucine. As shown in Table I, there was a correspondence between the types of peptide maps and types of compositions; proteins with replacements of serine and isoleucine yielded, respectively, type 1 and type 3 peptide maps, whereas proteins with replacements of threonine and valine both yielded approximately the same type 2 peptide map. Because Pro-76 is the sole proline residue in the altered peptide, position 76 appeared to be the site of the replacements. Cyanogen bromide cleavage and gel filtration yielded the peptide containing residues 70-85 from representative altered iso-1-cytochromes c; confirmatory amino acid compositions of 6 N HCl hydrolysates (Table III) and amino acid sequences (Table IV) of the peptides also established that Pro-76 is replaced appropriately (Table I) by serine, valine, or isoleucine in the representatives that were analyzed. The CYC1 gene of at least one representative of each type of revertant was cloned, and the alterations were determined by DNA sequencing. The results, presented in Table I, confirmed the assignments based on protein analysis. The mutational events leading to the nonfunctional cyc1 mutations and the functional CYC1 revertants are summarized in Table V.
FIG. 6. ... Table I. The closed curves represent ninhydrin-stained peptides of normal iso-1-cytochrome c. The normal type and three abnormal types of peptide maps were observed with the 29 proteins. The abnormalities of types 1-3 consisted of loss of peptides C-9' and C-9 and gain of a corresponding pair of peptides indicated by the broken curves. The C-9' peptide and its three related abnormal peptides were stained initially yellow and then blue with the ninhydrin reagent, whereas the C-9 peptide and its three related abnormal peptides were stained only blue; C-9' and C-9 and the three pairs of abnormal peptides all stained brown with the Pauly reagent, which is specific for tyrosine. The following types of peptide maps were obtained with the following types of proteins: type 1, Ser-76; type 2, Thr-76 or Val-76; and type 3, Ile-76.
Activity of the Altered Iso-1-cytochromes c-Biological activities of the altered iso-1-cytochromes c can be estimated by the relative growth of the mutant strains in lactate medium (Schweingruber et al., 1978, 1979). Schweingruber et al. (1978, 1979) previously demonstrated that more consistent results could be obtained with an isogenic series of diploid strains that were constructed by crossing each of the mutant haploid strains to a haploid strain lacking iso-1-cytochrome c. Interfering recessive mutations, possibly present in the cyc1 mutants and CYC1 revertants, are not manifested in the diploid strains. The growth of the strains containing the altered iso-1-cytochromes c was compared to growth of standard diploid strains containing approximately 100, 50, 10, and 0% of the normal activity of iso-1-cytochrome c (Schweingruber et al., 1979). The growth curves of the mutant and standard strains, shown in Fig. 7, and the concentrations of iso-1-cytochrome c, presented in Table VI, allow us to estimate the relative specific activities of the altered iso-1-cytochromes c, which are listed in Table VI. For example, the heterozygous strain CYC1-58-A/cyc1-239 has a growth curve, denoted 1, similar to the growth curve of the heterozygous control strain cyc1-84/CYC1-239-K. Because the CYC1-58-A/cyc1-239 strain contains 50% of the normal amount of iso-1-cytochrome c, and because the cyc1-84/CYC1-239-K strain contains approximately 10% of a presumably fully active iso-1-cytochrome c (Schweingruber et al., 1979), the Ile-76 protein in the CYC1-58-A strain is inferred to have a specific activity of approximately 20%. The other growth curves, 2-4, fall between this curve 1 and curve 5, which corresponds to the heterozygous control strain CYC1+/cyc1-239. These growth curves therefore indicate that the specific activities can be ranked as follows: Pro-76 (100%) > Val-76 > Thr-76 > Ser-76 > Ile-76 (20%).
Although there are no quantitative standards for the region between growth curves 1 and 5, the close behaviors of the Pro-76 and Val-76 strains on one hand and the Ser-76 and Ile-76 strains on the other hand suggest the following approximate specific activities: Pro-76, 100%; Val-76, ~90%; Thr-76, ~60%; Ser-76, ~30%; and Ile-76, 20%. The lack of growth of the cyc1-24 and similar strains indicates that the Leu-76 iso-1-cytochrome c completely or almost completely lacks function.
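The inference for the Ile-76 protein is a simple ratio, sketched below. The function name is hypothetical; the two input values (growth matching a control with about 10% of normal, fully active protein, and a measured level of 50% of the normal amount) are those quoted in the text.

```python
def relative_specific_activity(matched_control_activity, altered_protein_amount):
    """Specific activity relative to the normal protein.

    matched_control_activity -- fraction of normal activity in the control strain
                                whose growth curve matches the mutant strain
    altered_protein_amount   -- fraction of the normal amount of iso-1-cytochrome c
                                present in the mutant strain
    """
    if altered_protein_amount <= 0:
        raise ValueError("protein amount must be positive")
    return matched_control_activity / altered_protein_amount


# Ile-76: growth equivalent to ~10% of normal activity, protein present at 50%.
ile76 = relative_specific_activity(0.10, 0.50)
print(f"Ile-76 relative specific activity: {ile76:.0%}")
```

The same ratio cannot be applied directly to the Val-76, Thr-76, and Ser-76 strains, because no quantitative standard strains fall between growth curves 1 and 5; their values are interpolated estimates.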

DISCUSSION
DNA sequencing of nonfunctional cyc1 mutants and DNA sequencing of at least partially functional CYC1 revertants, along with protein analysis of revertant iso-1-cytochromes c, have been used to identify the mutational changes leading to amino acid replacements at position 76. Surprisingly, half of the mutants and a substantial number of the revertants were formed by substitutions of two adjacent base pairs (Table V). These mutational events, along with multiple base pair changes of other cyc1 mutations, will be discussed in a separate paper.
The growth curves of the cyc1 mutants and CYC1 revertants and the levels of iso-1-cytochromes c in the intact cells have been used to estimate the specific activities of the abnormal proteins. These results, summarized in Table V, indicate that the following activities are associated with the following residues at position 76: 100% for Pro-76, ~90% for Val-76, ~60% for Thr-76, ~30% for Ser-76, ~20% for Ile-76, and 0% for Leu-76.
Although we have directly determined that only Leu-76 replacements cause nonfunction, we can reasonably infer from other mutational studies that other specific replacements also would result in nonfunctional iso-1-cytochromes c. Ernst et al. (1981) reported that the cyc1-11 mutation resulted in a UAA nonsense codon corresponding to amino acid position 76 and that only Ser-76 replacements were recovered among the functional intragenic revertants. We wish to emphasize that A·T → C·G transversions giving rise to serine replacements are rare mutational events compared to other single base pair substitutions of UAA (and UAG) mutants (Sherman et al., 1979). The low frequency of reversion of cyc1-11 and the finding of revertant proteins exclusively with Ser-76 replacements suggest that Tyr-76, Glu-76, Gln-76, and Lys-76 replacements, which could arise by single base pair substitutions, would be nonfunctional. Furthermore, as summarized in Table I, only Ile ... revertants have been obtained from the missense mutants; the lack of recovery of Phe-76 and Arg-76 revertants, which could arise by single base pair substitutions of UUA and CUA codons, respectively, suggests that these replacements would result in a nonfunctional iso-1-cytochrome c. Thus, we suggest that iso-1-cytochromes c would be nonfunctional not only with Leu-76 but also with Tyr-76, Glu-76, Gln-76, Lys-76, Phe-76, and Arg-76. The replacements at position 76 have served as a basis for a theoretical analysis of requirements at specific sites for proper folding of the protein.
S. Baim, D. Pietras, D. Eustice, and F. Sherman
B-3541 B-3542 B-1432 B-3551 B-3561 B-1501 B-1503 B-1508 B-
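The replacements that are accessible by single base pair substitutions from the UAA, UUA, and CUA codons can be enumerated with the standard genetic code. The sketch below is illustrative only: the function and variable names are hypothetical, and the table is the standard code written in the usual U, C, A, G order, with `*` denoting a termination codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_neighbors(codon):
    """Map each codon reachable by one base substitution to its encoded residue."""
    neighbors = {}
    for position, original in enumerate(codon):
        for base in BASES:
            if base != original:
                mutant = codon[:position] + base + codon[position + 1:]
                neighbors[mutant] = CODON_TABLE[mutant]
    return neighbors


for codon in ("UAA", "UUA", "CUA"):
    residues = sorted(set(single_base_neighbors(codon).values()))
    print(codon, "->", " ".join(residues))
```

In one-letter notation, the neighbors of UAA include Y, E, Q, K, S, and L; Phe (F) appears only among the neighbors of UUA and Arg (R) only among those of CUA, in agreement with the replacements discussed above. Threonine codons (ACN) are not reachable from either leucine codon by a single substitution.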

The details of this theoretical analysis are presented in the "Appendix," whereas the physical reasoning underlying the analysis is briefly summarized here. Two factors are considered to govern the ability of a given amino acid replacement to function in the protein. The first is the ability of the altered amino acid to impart the native conformation of the backbone found in its immediate neighborhood in the normal protein.
(This ability is referred to as the short-or medium-range interaction). The second factor is the ability of the altered side chain to fit into the space occupied in the native structure by the original side chain, a factor corresponding to one aspect of the long-range interaction. Both of these factors, as stated, are simplifications of the detailed interactions responsible for the correct folding and function of the protein. In order to explore the utility of such simplified concepts, we have developed quantitative parameters for characterizing each of the factors. The ability of a given residue to impart the appropriate local conformation is measured by p, a parameter arising from a detailed statistical analysis of protein structures (see "Appendix"). The ability to fit into the overall protein structure is measured by r,, the radius of gyration of the amino acid side chain. Fig. 8 shows the values of p and r,, which correspond to the occurrence of each of the 20 amino acids in position 76 of iso-1-cytochrome c and in the native local configuration of the backbone (see the "Appendix" for details). It will be seen that those amino acids which give some degree of functionality cluster in the lower right region of the plot, near the native residue, proline. Those residues which are nonfunctional when substituted for Pro-76 fall in the upper left region of the plot. The physical significance of this result is that functional residues at position 76 tend to both induce the native conformation in their neighborhood and have a small side chain. Residues with a larger side chain and/or less tendency to

TABLE I11 Amino acid compositions of the shortest cyanogen bromide peptide from the iso-1-cytochromes c
The normal composition is from Liebman et al. (1977) and was obtained under conditions that did not resolve the two forms of lysine. The theoretical composition, from the normal amino acid sequence, does not account for the conversion of methionine by cyanogen bromide to homoserine and homoserine lactone.

Amino-terminal sequences of the shortest cyanogen bromide fragment
from the iso-1-cytochromes c Lines immediately below the residues indicate that phenylthiohydantoins were identified, except for trimethyllysine where they indicate that no phenylthiohydantoins were obvious. Lines in the second level below residues indicate that dansyl derivatives were identified. The CYCl sequence is from Stewart and Sherman (1973). Altered residues are denoted in italics. The corresponding strain numbers are listed in Table I Ser-Glu-Tyr-Leu-Thr-Asn-Pro-Lys-Lys-Tyr-(CH3)3

-K
"-- impart the correct conformation generally give nonfunctional mutants. On the basis of the experimental results, a boundary can be drawn between regions which are expected to be functional and nonfunctional; this is shown by the broken line in Fig. 8. We are thus able to predict the functionality of those 76 in Iso-1-cytochrome c
Can be the indicated base pair change or recombination with the CYC7 gene (see Ernst et al., 1981).  Table VI were grown in lactate medium at 30 "C, and the turbidity of the cultures was measured with a Klett-Summerson colorimeter. The single-line curves designate single strains, and the filled-in areas designate the range for the similarly behaving strains indicated in Table VI. residues for which experimental results are not yet available.
Note that these considerations readily explain the difference in functionality between the Ile-76 (20%) and Leu-76 (0%) mutants. Although their side chains are virtually the same size, the two residues have significantly different values of p, the tendency to induce the observed structure. Ile-76, with the larger value of p, gives some functionality.
It is to be expected that the considerations we have outlined may not be adequate to classify the activity of mutant proteins having alterations at sites where the original residue enters into specific interactions, rather than nonspecific interactions as is the case with Pro-76. Although p will still be the proper measure of local interactions, a parameter other than rg may be the proper classifier for long-range interactions in such cases. Thus, plots of the type described in terms of various other long-range parameters will reveal, for any given set of mutations, precisely which types of long-range interactions are crucial for proper protein function. In forthcoming work, this approach will be developed in detail.

The diploid strains were constructed by crossing one of the MATa haploid strains, which constituted D311-3A (CYC1), or one of its derivatives as shown in Table I, to one of the MATa haploid strains, which constituted B-2111 (cyc1-239), or one of its derivatives, CYC1-239-N or CYC1-239-K; the MATa haploid strain CYC1+ contains the normal wild-type iso-1-cytochrome c, and the derivative cyc1-84 is an amber mutant lacking iso-1-cytochrome c. The mutant cyc1-239 is a frame-shift mutant lacking iso-1-cytochrome c, the revertant CYC1-239-N contains the normal sequence and normal amount of iso-1-cytochrome c, and the revertant CYC1-239-K contains iso-1-cytochrome c with the abnormal sequence -His-Leu- at positions 5 and 6. The level of iso-1-cytochrome c in the CYC1-239-K haploid strain is only approximately 20% of the normal level; it is assumed that the specific activity of the CYC1-239-K iso-1-cytochrome c is equal to that of the normal protein (Schweingruber et al., 1979; see Footnote 3).
* The amounts of iso-1-cytochrome c were estimated by low-temperature spectroscopy. The heights of the Cα peak were compared to the heights in strains having known amounts of iso-1-cytochrome c.
e The 0-6 refer to the growth curves shown in Fig. 7. Estimated from the per cent iso-1-cytochrome c and the growth in lactate medium compared with the growth of diploid strains B-5841, B-5842, and B-5843. For the missense mutant protein, cyc1-24, which is low in amount during growth at 30 °C, estimates of specific activity are based on failure of haploid strains to grow on solid lactate medium at 22 °C when they contained over 50% of the normal cytochrome c level (see the text).

APPENDIX
Protein folding is governed by the interplay between short- and long-range interactions. Short-range interactions are those between a given residue and its near neighbors along the backbone, whereas long-range interactions are conventionally described as those involving interactions of residues separated by more than four units along the backbone. It has been established that the principal contributions to the energy of the native conformation arise from short-range interactions (Nemethy and Scheraga, 1977). These local interactions are mainly responsible for the assumption of the correct conformational state, i.e. the occurrence of the (φ, ψ) dihedral angles in the observed region of the (φ, ψ) plane. Long-range interactions fine tune the conformation and contribute to the precise values of (φ, ψ) that occur in the native conformation. Although the long-range contributions to the (φ, ψ) values of a given residue may not be large, the global effect of these interactions may be very large, since small local errors resulting from the absence of long-range interactions would accumulate over the length of the backbone to give a predicted conformation very different from the native structure (Burgess and Scheraga, 1975).
In trying to understand the consequences of single amino acid replacements in a protein, the effects of both types of interaction must be considered. It is clear that the replacements preserving the highest degree of function in the protein will be those that cause the least change in both categories of interaction.
The Pro-76 series of iso-1-cytochrome c mutations provides an opportunity to relate these theoretical considerations on protein folding to actual data. We wish to propose a physical criterion for functionality which is capable of explaining the data currently available on the Pro-76 replacements and which makes specific predictions as to the functionality of mutants not yet observed. The method utilizes a graphical representation of the interplay between local and long-range interactions, requiring a discussion of the quantitative characterization of these two interactions. The local interactions are discussed first. Because these determine the conformational state of the backbone, we would like to construct a measure of the tendency for a small segment of backbone containing a given amino acid to assume a specific conformational state. This is done using concepts developed in a series of papers (Rackovsky and Scheraga, 1978, 1980, 1981, 1984) on the differential geometric representation of protein structure. The differential geometric representation operates on the four-Cα length scale. It describes the conformation of a four-Cα segment of the virtual bond backbone by using two parameters, κi, the curvature, and τi, the torsion, at residue i. By convention, (κi, τi) describe the structure of the segment Cα(i-1) ... Cα(i+2). (See Rackovsky and Scheraga (1984) for an introduction to the differential geometric approach and a summary of its applications.)
Statistical analysis of the (κ, τ) distribution of a sample of known protein x-ray structures has revealed the existence of six well-defined structural types of four-Cα units, which are denoted Ax and Ex (x = R, 0, L). A and E denote four-Cα helical-bend and extended structures, respectively. The subscripts R, 0, and L denote the parity (handedness) of the four-Cα unit: right-handed, nearly flat, and left-handed, respectively.
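The notion of parity for a four-Cα unit can be illustrated with a minimal Python sketch. It does not reproduce the curvature and torsion formulas of Rackovsky and Scheraga; as a simpler stand-in it computes the signed virtual-bond dihedral angle of four consecutive Cα positions and assigns a parity label from its sign. The function names and the 15° flatness tolerance are hypothetical choices made only for this illustration.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def virtual_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) of four consecutive C-alpha positions.

    This is a stand-in for the torsion of the four-C-alpha unit, not the
    differential geometric torsion itself.
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def parity(angle_deg, flat_tol_deg=15.0):
    """Label a unit 'R', '0', or 'L' from its signed dihedral angle.

    Angles near 0 or 180 degrees are planar and are labelled '0'.
    The tolerance is an arbitrary illustrative value (assumption).
    """
    folded = abs(angle_deg) % 180.0
    if min(folded, 180.0 - folded) <= flat_tol_deg:
        return "0"
    return "R" if angle_deg > 0 else "L"
```

With real data, the four positions would be the Cα coordinates of residues i-1 through i+2 taken from an x-ray structure.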
Further analysis indicated that each of the 20 amino acids produced a characteristic distribution of conformations of four-Cα units in which it was incorporated and that this distribution also depended on whether the amino acid in question was in the second or third position in the four-Cα unit. It was possible to construct a scale for each four-Cα conformational region, in which each amino acid was ranked according to its tendency to produce the four-Cα structure in that region, relative to the average for all amino acids. The ranking parameter is called the average relative fractional occurrence (Rackovsky and Scheraga, 1982). Those amino acids that are much more likely than the average to produce a given type of structure were regarded as probable nucleators for that structure (Rackovsky and Scheraga, 1982).
We will use these average relative fractional occurrence scales to define a measure of the preference of a given residue for a specific local conformation. A residue at position i can be thought of as occupying the second position of four-Cα unit i, (Cα(i-1) ... Cα(i+2)), which occurs in conformational region Si, and the third position of unit i-1, which occurs in region Si-1. We define p(x,Si-1Si), the tendency of the two four-Cα units containing amino acid x to occur in the specified regions, by

\[ p(x, S_{i-1}S_i) = \frac{P(x,3,S_{i-1})\,P(x,2,S_i)}{P^{\max}(3,S_{i-1})\,P^{\max}(2,S_i)} \tag{1} \]

Here P(x,m,Sj) is the average relative fractional occurrence of amino acid x when it is at the mth position of the jth four-Cα unit, which is in conformation Sj. Note that in Equation 1 amino acid x is located at position i of the protein. Pmax(m,Sj) represents the maximum value of P(x,m,Sj) as a function of x. From this we conclude

\[ 0 \le p(x, S_{i-1}S_i) \le 1. \tag{2} \]
A value of p(x,Si-1Si) = 1 or 0 would indicate that amino acid x is very likely or very unlikely, respectively, to occur at the third position of a five-Cα unit whose two nested four-Cα units are in conformational regions Si-1 and Si (see Fig. 9). Thus, p measures the tendency to observe a specified structure on a five-Cα scale due to the presence of a given residue, x; it reflects the conformational consequences of the interaction of residue x with its near neighbors.
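The normalization that confines p to the interval from 0 to 1 can be sketched in Python. The sketch assumes that p is the product of the two average relative fractional occurrences, each divided by its maximum over all amino acids, which is the form suggested by the normalization constants quoted in this Appendix. The function name, the table layout, and every number in `TOY_P` are hypothetical; the real values are those of Table I of Rackovsky and Scheraga (1982) and are not reproduced here.

```python
def local_preference(P, x, prev_region, curr_region):
    """Return p(x, S_{i-1} S_i) under the assumed product-of-ratios form.

    P maps (amino_acid, position_in_unit, region) to the average relative
    fractional occurrence. Residue x is third in unit i-1 and second in unit i.
    """
    def p_max(position, region):
        # Maximum of P over all amino acids for this position and region.
        return max(value for (_, pos, reg), value in P.items()
                   if pos == position and reg == region)

    numerator = P[(x, 3, prev_region)] * P[(x, 2, curr_region)]
    denominator = p_max(3, prev_region) * p_max(2, curr_region)
    return numerator / denominator

# Made-up table for two placeholder residues (assumption, not published data).
TOY_P = {
    ("aa1", 3, "E0"): 1.74, ("aa1", 2, "AR"): 1.58,
    ("aa2", 3, "E0"): 0.87, ("aa2", 2, "AR"): 0.79,
}
```

In this toy table, `aa1` holds the maximum on both scales and therefore scores 1, while `aa2`, at half the maximum on each scale, scores about 0.25.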
The long-range interactions are now discussed. Although many of the long-range contacts in proteins are specific in nature, depending on special properties of both side chains involved, one would like to find the simplest, most general property that characterizes a given amino acid. The most elementary requirement for a particular residue to occur in a given structure is that its side chain fit into that structure.
This depends on the size and conformational flexibility of the side chain, properties which are reflected in its radius of gyration. We therefore select rg, the radius of gyration of the side chain of amino acid x (Levitt, 1976), as the long-range interaction parameter.
The situation to be analyzed is one in which the structure of the naturally occurring protein is known or can be conjectured with confidence from homology data. This information, by giving x, Si-1, and Si, provides the necessary calibration for the approach in the form of a point on the (p,rg) plot corresponding to the native, fully functional protein. As implied by the above discussion, our hypothesis on the requirements for functionality of an altered protein is that it produces the least possible perturbations in both short- and long-range interactions. This translates into the statement that functional and nonfunctional altered proteins should fall into different regions of the (p,rg) plane.
The appropriate data for iso-1-cytochrome c mutations are shown in Fig. 8. Although there is no x-ray structure available for iso-1-cytochrome c from S. cerevisiae, the sequence of this cytochrome c is highly homologous to that of the tuna cytochrome c, particularly in the 75-85 region (corresponding to 70-80 in the tuna numbering system), which is rigidly conserved in all cytochromes c. Since the structure of this region is invariant in a wide variety of species, the tuna x-ray structure was used to determine Si-1 and Si. Pro-76 (corresponding to Pro-71 in tuna) is present at the amino-terminal end of a short length of α-helix. Therefore, Si = AR. Also, the preceding four-Cα segment is extended, so that Si-1 = E0. The values of p(x,E0AR) are calculated using data from Table I of Rackovsky and Scheraga (1982) and the formula

\[ p(x, E_0 A_R) = \frac{P(x,3,E_0)\,P(x,2,A_R)}{(1.74)(1.58)} \]

where 1.58 and 1.74 are Pmax(2,AR) and Pmax(3,E0), respectively.
It is apparent from Fig. 8 that our hypothesis is satisfied for mutations at the Pro-76 site. There is a clear concentration of points which correspond to functional proteins in the neighborhood of the native protein, i.e. of the point corresponding to proline, and a distinctly separate group of points corresponding to nonfunctional proteins. This is a physically reasonable result because proline is the most likely of all the amino acids to form E0AR structure, which we have shown to be slightly disfavored relative to the overall frequencies of occurrence of E0 and AR (Rackovsky and Scheraga, 1981); in addition, proline possesses a relatively small, constrained side chain. In the tuna structure, this side chain, although at the surface of the molecule, is so oriented that it interacts with a number of residues, either through their backbones or side chains. It is therefore to be expected that residues capable of replacing proline and maintaining function will cluster roughly in the lower right-hand region of the plot. Residues which are either too bulky or not likely to maintain the backbone structure, i.e. those which fall in the upper left-hand region of the plot, are not expected to result in functional proteins. This is what is observed.

FIG. 9. Schematic representation of the two four-Cα units that are considered in p(x,Si-1Si).
On the basis of the known functional and nonfunctional mutants, it is possible to draw an approximate boundary between the regions of functionality and nonfunctionality in the (p,rg) plane. By plotting the points corresponding to amino acids not yet studied, the functionality of the corresponding mutants can be predicted. This procedure, the results of which are also plotted in Fig. 8, leads to the prediction that alanine and cysteine should give functional mutants at position 76, whereas methionine, arginine, tryptophan, histidine, asparagine, and glycine should not. Glycine differs from the other amino acids predicted not to be functional in that its "side chain" is too small, rather than too large; in addition, it is the least likely amino acid to be found in the E0AR five-Cα conformation.
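The boundary-based prediction can be sketched in Python. The shape and position of the broken line in Fig. 8 cannot be recovered from the text, so the sketch assumes a straight line rg = slope × p + intercept, with functional residues lying below it (larger p, smaller rg). The class name, the coefficients, and all sample points used with it are hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinearBoundary:
    """Assumed straight-line boundary in the (p, rg) plane (illustration only)."""
    slope: float
    intercept: float

    def is_functional(self, p, rg):
        # A residue is predicted functional if it lies below the line,
        # i.e. toward the lower right region of the plot.
        return rg < self.slope * p + self.intercept

    def separates(self, functional, nonfunctional):
        """Check that the line is consistent with known (p, rg) observations."""
        return (all(self.is_functional(p, rg) for p, rg in functional)
                and not any(self.is_functional(p, rg) for p, rg in nonfunctional))

def predict(boundary, candidates):
    """Map each untested residue name to its predicted functionality."""
    return {name: boundary.is_functional(p, rg)
            for name, (p, rg) in candidates.items()}
```

A positive slope expresses the compensation between the two parameters: a larger side chain is tolerated only when accompanied by a larger value of p. A line chosen this way should first be checked with `separates` against the replacements whose activities are known before it is used for prediction.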
The fact that the functionality data are so well classified and explained by the criteria outlined above gives important insight into the role of Pro-76 in the folding of iso-1-cytochrome c. This analysis suggests that the principal role of this residue is to direct the local folding of the backbone into the proper conformation. From the long-range point of view, Pro-76 is essentially a space filler whose effect is not dependent on specific interactions. Thus, any residue with roughly the same space requirements can replace proline, as long as its short-range propensities are appropriate. It is therefore understood, for example, why the valine and isoleucine mutants show substantial activity, while leucine, whose side chain is roughly the same size as that of isoleucine, but which is not nearly as likely to be found in the E0AR configuration, is completely nonfunctional. It is also clear from the (p,rg) plot that there is a compensatory relationship between the short- and long-range parameters. An increase in side-chain size, as with isoleucine, can be compensated for by an increase in p(x,E0AR), whereas a less favorable p value can be accommodated by a decrease in rg.