A comparison of the heme binding pocket in globins and cytochrome b5.

Of the 85 three-dimensionally characterized residues of cytochrome b5, 51 are found to be structurally and topologically equivalent to the globin fold. When these proteins are superimposed, the heme irons are found to be less than 1.4 Å apart and the heme normals are inclined to each other by less than 9.5°. The proximal histidine of the globins and two adjacent helices are equivalent to the sixth iron ligand and adjacent helices of cytochrome b5. Larger differences in structure are observed on the distal side of the heme, coinciding with the most changeable part of the globin structures. The heme itself is rotated by 53° about its normal, but such a change is energetically minimal and conservative, as the heme side groups are not directly involved in the function of the molecules. The β-sheet of cytochrome b5 is inserted into a corresponding cavity of the globins, forming an additional lining to the heme pocket. The roughly 50 residues missing at the carboxy end of the known cytochrome b5 fragment could correspond in part to the H helix of the globins. While it seems probable that these similarities represent divergent evolution from a primordial heme-binding protein, the possibility of structural convergence to a functionally satisfactory protein cannot be excluded.

From the Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907
There are now close to 50 known protein structures (1). Some of these belong to families in which the three-dimensional structure has been retained and only the specificity has been altered, such as the group of serine proteases exemplified by chymotrypsin (2). In other cases, domains with a given function occur within a structure as part of a longer polypeptide chain. Examples are the nucleotide-binding proteins (3, 4) and the calcium-binding proteins (5). In still other cases, structural similarities exist where a functional relationship is not so clear: for example, the superoxide dismutase fold and the immunoglobulin domain.¹ Where there is both a strong correlation between amino acid sequences and a common functional role, there can be little doubt that the two different enzymes or protein domains arose from a common precursor. The primordial gene was duplicated, permitting each copy to evolve independently and separately, possibly being subsequently fused with another gene. Where there is only a structural resemblance associated with a less well-defined common function, convergent evolution may be the reason for the similarities. In general, if the number of similar characters far exceeds the number of dissimilarities, it is improbable that each has converged independently, which suggests a divergent evolutionary process. Schulz and Schirmer (6) have quantified this argument for the case of the divergence of structure among nucleotide-binding proteins.

* This work was supported by National Science Foundation Grant GB 29596x and National Institutes of Health Grant GM 10704.

‡ Present address, Department of Physics, Southern Illinois University, Edwardsville, Illinois 62025.

¹ D. C. Richardson, 1974, private communication.
This paper shows that there is a structural relationship between a subunit of hemoglobin and the cytochrome b5 fragment, both of which share the function of heme binding. No definite conclusions can, however, be drawn concerning their divergent or convergent evolution.

EXPERIMENTAL PROCEDURES AND RESULTS
The β chain of horse oxyhemoglobin (7) was used for the comparisons shown in this paper. Coordinates were kindly supplied by Dr. M. F. Perutz (MRC Laboratory of Molecular Biology, Cambridge, England). The structural nomenclature used is that described for myoglobin by Dickerson (8). As the differences in tertiary structure among the known globins (9-11) are quite small compared with those between the globins and cytochrome b5, the specific use of the horse hemoglobin β chain as the standard will not significantly affect the results reported here.
Coordinates for the structure of calf liver cytochrome b5 were obtained from Dr. F. S. Mathews (Washington University School of Medicine, St. Louis, Missouri 63110). The amino acid sequence (12, 13) from residue 3 to 87 is shown in Table I together with the secondary structural elements (helices α1 to α6 and the β-strands). Residues 1 to 2 and 88 to 93 were not visible in the electron density map (14, 15). In addition, there are roughly 50 amino acids at the COOH terminus which have not yet been sequenced; they represent the portion of the molecule anchored in the membrane, which is easily cleaved off during extraction (16, 17).
The initial comparison was made by visual inspection of stereoscopic diagrams. The 29 residues which were thought to be comparable are shown as "Fit 1" in Table I. Using a modification of the method of Rao and Rossmann (18) and an orientation matrix obtained from "Fit 1," 52 residues could now be equivalenced sequentially (Table I, "Fit 2"), which implies a similarity of fold (see "Appendix").
Further refinement of the orientation and relative position of the 2 molecules yielded 51 equivalenced residues, given as "Fit 3" in Table I.
The superposition of the proteins with "Fit 3" resulted in a rather accurate superposition of the heme groups, which had not been included in the refinement. The iron atoms were separated by … Thus a weight of 10 (rather than the computed probability, which lies in the range 0 to 1) was given to the superposition of the Cα atoms of F8 in hemoglobin and His 63 in cytochrome b5.
The result is shown as "Fit 4" in Table I. To improve the relationship of the hemes further, a weight of 10 was placed on the superposition of the heme iron atoms ("Fit 5"). This, however, still left the iron atoms 1.3 Å apart.
In "Fit 6," by using a weight of 100 for the iron atoms, the refinement converged to a superposition of 48 residues and an iron atom separation of 0…

The above results are based upon an initial superposition obtained by visual inspection. While it seems improbable that the sequential superposition of roughly 50 of the 85 known residues in cytochrome b5 is a chance event, a more systematic approach would be desirable. However, since it would not be easy to explore all six variables (three rotational angles and three translational distances) involved in the superposition of two structures, a well-chosen line was selected as a rotation axis, such that one molecule is superimposed onto the other after a rotation of κ about this line. For each value of κ, the Cα atoms of the proximal histidines (F8 in hemoglobin β and His 63 in cytochrome b5) were superimposed, thus determining the three translational parameters. The results of this one-dimensional search, with the associated distances between the heme iron atoms, are shown in Fig. 5. It will be seen that when the number of sequentially superimposed residues is at a maximum, the distance between the heme iron atoms is at a minimum.

DISCUSSION

A comparison of the globins with cytochrome b5 was previously attempted by Ozols and Strittmatter (12), based entirely upon a comparison of amino acid sequences. Although they found some weak homologies, these are totally different from those found here by three-dimensional comparison of structure. In contrast, the amino acid sequence of cytochrome b562 has been shown to be reasonably homologous to that of sperm whale myoglobin (19), though without an obvious analogy to the cytochrome b5 sequence. Yet Itagaki and Hager (20) have shown that cytochrome b562 and cytochrome b5 exhibit similar physical and functional properties, thus suggesting a relationship among the three proteins.

The protein comparisons shown in Table I and Figs. 1 and 2 are remarkable not only for the close superposition of the heme groups but also for the conservation of the functionally critical parts of the structures.
The proximal histidines (His 63 in cytochrome b5, F8 in hemoglobin β) and the two helices linked to them (E and F in hemoglobin β, and the corresponding pair in cytochrome b5) are structurally equivalent. Furthermore, a cytochrome b5 helix on the distal side corresponds to hemoglobin β helix E, yet there is no equivalence between the distal histidine of hemoglobin β (E7) and the fifth iron ligand of cytochrome b5 (His 39). The necessary presence of the bound oxygen in the globins destroys structural equivalence on the distal side of the heme. Furthermore, the CD corner, which corresponds structurally to the position of His 39 in cytochrome b5, is the site of maximum change between the α and β chains of hemoglobin. The D helix is absent in the α chain of mammalian hemoglobins and in the single-chain Glycera hemoglobin (9). This independently suggests that structural change is easily accommodated on the distal side of the heme group.
Although the orientation of the heme planes is maintained to within 9.5°, there is a 53° rotation of the heme groups relative to each other in the 2 molecules. The heme side chains have been implicated in the mechanism neither of hemoglobin (21) nor of cytochrome b5 (22). The only functional necessity is to keep the propionic acid groups mostly outside and the vinyl groups inside the heme pocket. Apparently, therefore, what the heme-binding function conserves is the orientation of the heme plane relative to the protein backbone, particularly with respect to the proximal histidine, which makes the rotation of the heme about its normal less critical. This rotation will be controlled primarily by the hydrophobic interactions within the heme pocket and by hydrogen bonds between the polar propionate groups and the protein main chain or side chains.

… Labels for the hemoglobin residues are smaller than those for cytochrome b5. Superposition corresponds to "Fit 5" of Table I.

FIG. 4. The heme environment. The hemoglobin β heme and side chains are shown in dark outline, while the cytochrome b5 heme and side chains are shown in light outline. Labels for the hemoglobin residues are smaller than those for cytochrome b5. Superposition corresponds to "Fit 6" of Table I.
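The two angular measures used for the hemes (the inclination between heme normals and the rotation of a heme about its own normal) can be made concrete with a short sketch. It assumes each heme is represented by a few in-plane atom positions; here four hypothetical pyrrole-nitrogen-like points 2 Å from an iron at the origin. The helper names are hypothetical, and this is an illustration of the geometry, not the authors' procedure.

```python
import numpy as np


def plane_normal(points: np.ndarray) -> np.ndarray:
    """Unit normal of the least-squares plane through an (N, 3) array of atom positions."""
    centred = points - points.mean(axis=0)
    # The right-singular vector with the smallest singular value is normal to the best plane.
    _, _, vt = np.linalg.svd(centred)
    return vt[-1]


def inclination_deg(n1: np.ndarray, n2: np.ndarray) -> float:
    """Angle between two plane normals in degrees, ignoring their sign (0 to 90)."""
    cos_angle = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return float(np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0))))


def rotation_about_normal_deg(v1: np.ndarray, v2: np.ndarray, normal: np.ndarray) -> float:
    """Signed angle from v1 to v2 about `normal`, after projecting both onto the plane."""
    n = normal / np.linalg.norm(normal)
    p1 = v1 - np.dot(v1, n) * n
    p2 = v2 - np.dot(v2, n) * n
    return float(np.degrees(np.arctan2(np.dot(np.cross(p1, p2), n), np.dot(p1, p2))))


def rot_z(deg: float) -> np.ndarray:
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])


def rot_x(deg: float) -> np.ndarray:
    t = np.radians(deg)
    return np.array([[1.0, 0.0, 0.0], [0.0, np.cos(t), -np.sin(t)], [0.0, np.sin(t), np.cos(t)]])


# Hypothetical heme "A" lying in the xy plane.
heme_a = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [-2.0, 0.0, 0.0], [0.0, -2.0, 0.0]])
heme_b = heme_a @ rot_z(53.0).T   # rotated 53 degrees about its own normal only
heme_c = heme_b @ rot_x(9.5).T    # additionally tilted by 9.5 degrees

tilt = inclination_deg(plane_normal(heme_a), plane_normal(heme_c))
spin = rotation_about_normal_deg(heme_a[0], heme_b[0], plane_normal(heme_a))
print(f"inclination {tilt:.1f} deg, rotation about normal {abs(spin):.1f} deg")
```

The sign of the in-plane rotation depends on which way the fitted normal points, so only its magnitude is compared; the same atom must be tracked in both hemes for the rotation to be meaningful.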
One significant difference between cytochrome b5 and myoglobin is the insertion in cytochrome b5 of an antiparallel β-pleated sheet (residues 21 to 32) between two of its helices. This sheet is situated in an empty region between helices E and G in the corresponding globin structure, forming an extra lining to the heme pocket (Figs. 1 and 3). Corresponding residues lining the heme pocket in cytochrome b5 (22) and in the hemoglobin β chain (9) are shown in Table II.

Another large difference is the deletion of half of helix G and all of helix H from cytochrome b5. However, around 50 COOH-terminal residues bound to the endoplasmic reticulum are cleaved off in the extraction of cytochrome b5 (17, 23-25). This has recently been confirmed by a CD study (26) which shows that cytochrome b5 is a two-domain protein: the soluble domain is cleaved from the membrane, while the other remains firmly bound.
It is also shown that a link of around 15 residues associated with the soluble part uncoils after cleavage, and that the membrane domain is likely to turn around in the bilayer and emerge on the same surface, exposing the negative charge at the COOH terminus. The link peptide must thus be helix α6, which might correspond to the end of helix G in the native protein, while the membrane-bound domain might correspond in part to helix H, which turns around and runs antiparallel to helix G.
In considering a possible evolutionary relationship between liver cytochrome b5 and the globins, the established homology between liver cytochrome b5 and yeast cytochrome b2 (yeast L-(+)-lactate dehydrogenase, EC 1.1.2.3) should also be discussed. The latter is a tetramer, each subunit (molecular weight 58,000) containing a single polypeptide chain associated with one heme and one flavin (FMN) moiety (27). The heme-binding fragment is obtained as a tryptic hydrolysate of the active molecule (28). It has been sequenced and found to bear a reasonable homology to the cytochrome b5 fragment (29). Cytochrome b5 is a membrane-bound microsomal molecule which catalyzes the transfer of electrons from an NADH-linked, FAD-containing reductase to a non-heme, cyanide-sensitive factor. Cytochrome b2, on the other hand, is a mitochondrial protein which catalyzes the transfer of electrons from the FMN-containing reductase (which is in fact part of the same polypeptide chain) to cytochrome c. It has been suggested (3) that many nucleotide-binding proteins have similar structures. Thus the cytochrome b5 reductase-cytochrome b5 system may bear not only a functional but also a structural resemblance to the yeast cytochrome b2 molecule. The role of cytochrome b2 in the mitochondrial respiratory chain might thus suggest a possible common functional origin for the globin oxygen carriers and the cytochrome b electron carriers, at least 1.5 × 10⁹ years ago (30).
Finally, it is necessary to consider divergent as opposed to convergent evolution of the heme-binding proteins represented by cytochrome b5 and the hemoglobin β chain. On the basis of the criteria described in the "Appendix," 51 of the 85 (60%) structurally known residues of cytochrome b5 are equivalent to those of the hemoglobin β chain. Similar comparisons of the nucleotide-binding domains of lactate dehydrogenase and glyceraldehyde-3-phosphate dehydrogenase show 92 of 144 (64%) residues in lactate dehydrogenase to be structurally equivalent to glyceraldehyde-3-phosphate dehydrogenase. Thus the percentage of structurally equivalent residues is essentially the same, although the globin structure is significantly less complex. Divergent evolution of the nucleotide-binding domains has been proposed (3, 4, 31). However, as divergent evolution must be established from the number of similar independent characters (e.g. amino acids) in relation to the total number of characters, the comparison with nucleotide-binding proteins cannot by itself establish divergence. Similarity of function can also be taken as another character set (6). In this respect the remarkable similarity in the binding of the prosthetic group to cyto…

Table I shows slightly larger values, between 1.24 and 1.37 minimum base changes per codon, for the heme-binding proteins considered here. However, Table III shows that the greater the certainty of equivalencing residues, the lower the minimum base change per codon, approaching the value found among the nucleotide-binding domains of dehydrogenases. No good estimates exist, however, of how much the random value (≈1.45) might be lowered by the convergence of two protein structures from different ancestors to a similar fold, as a consequence of certain amino acid types being structurally required in the folding process.

In summary, the similarity of the protein structures of cytochrome b5 and the globins, together with the striking similarity of heme position and orientation, shows the operation of either a divergent or a convergent evolutionary process. The proportion of structurally equivalent residues and the relatively low minimum base changes per codon approach the situation found in the NAD-binding domains of dehydrogenases. Nevertheless, as the structure of the globins is somewhat simpler, the case for divergence, while substantial, cannot be considered proven.

APPENDIX

The procedure described here is a modification of the method of Rao and Rossmann (18). The modified procedure is described in some detail (as opposed to the outline given earlier) because of frequent requests for this information.

The first step is to devise an initial rotation matrix [C] and a translation vector d which orient the 2nd molecule (identified by the subscript 2) similarly to the 1st molecule (identified by the subscript 1). Thus the new position $\mathbf{x}'_2 = (x'_2, y'_2, z'_2)$ of a point $\mathbf{x}_2 = (x_2, y_2, z_2)$ is given by $\mathbf{x}'_2 = [C]\,\mathbf{x}_2 + \mathbf{d}$.

The next step is to find the three Eulerian angles (θ1, θ2, θ3) which best fit the nine linearly determined rotation matrix elements. Using the Eulerian rotation matrix given by Rossmann and Blow (32), each of the nine observational equations can be linearized as $c_{ij}(\mathrm{obs}) - c_{ij}(\mathrm{calc}) = \sum_{k=1}^{3} \frac{\partial c_{ij}(\mathrm{calc})}{\partial \theta_k}\,\Delta\theta_k$, where $c_{ij}(\mathrm{obs})$ and $c_{ij}(\mathrm{calc})$ refer to the "observed" and "calculated" values. The observed values are those from the linear determination above, while the calculated values are obtained by evaluating the trigonometric expressions (32) with the current Eulerian angles. The three normal equations (i = 1 to 3) can be derived from the nine observational equations, and the shifts Δθj (j = 1 to 3) may be computed. The ith normal equation is of the form shown in Equation 1. This procedure is applied iteratively until the values of Δθi are less than a preset value (e.g. 0.01°). Expressions for the derivatives $\partial c_{ij}(\mathrm{calc})/\partial \theta_k$ are easily obtained from the expressions for $c_{ij}(\mathrm{calc})$.
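The iterative fit of three Eulerian angles to nine "observed" matrix elements can be sketched as a linearized (Gauss-Newton) least-squares loop. This is a minimal sketch under stated assumptions: it uses a z-y-z Eulerian convention rather than the Rossmann and Blow matrix, central-difference derivatives in place of the analytical expressions, and hypothetical function names.

```python
import numpy as np


def rz(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def ry(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def euler_matrix(theta) -> np.ndarray:
    """c(calc) for angles theta in radians; ASSUMED z-y-z convention."""
    return rz(theta[0]) @ ry(theta[1]) @ rz(theta[2])


def fit_euler(c_obs, theta0, tol_deg=0.01, max_cycles=50, h=1e-6):
    """Refine Eulerian angles against nine observed matrix elements."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_cycles):
        # Nine observational equations: c_ij(obs) - c_ij(calc) = sum_k dc_ij/dtheta_k * dtheta_k
        resid = (c_obs - euler_matrix(theta)).ravel()
        jac = np.empty((9, 3))
        for k in range(3):
            step = np.zeros(3)
            step[k] = h
            # Central differences stand in for the analytical derivatives.
            jac[:, k] = ((euler_matrix(theta + step) - euler_matrix(theta - step)) / (2 * h)).ravel()
        # Three normal equations: (J^T J) dtheta = J^T resid
        shift = np.linalg.solve(jac.T @ jac, jac.T @ resid)
        theta = theta + shift
        if np.all(np.abs(np.degrees(shift)) < tol_deg):   # stop once all shifts < 0.01 degrees
            break
    return theta


true_angles = np.array([0.3, 0.8, -0.5])
rng = np.random.default_rng(0)
# A linearly determined matrix is generally not exactly orthogonal; mimic that with noise.
c_obs = euler_matrix(true_angles) + 0.01 * rng.standard_normal((3, 3))
fitted = fit_euler(c_obs, theta0=[0.2, 0.7, -0.4])
print(np.degrees(fitted))
```

The z-y-z parameterization is singular when the middle angle is 0 or π, so a different convention (or a different starting point) would be needed near those values.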
Reasonable values of the three Eulerian angles and the three translational components are now available, but it is still necessary to refine these directly in order to minimize the sum of the squares of the distances between equivalenced atoms. The quantity to be minimized is $F = \sum_{n=1}^{N} w_n d_n^2$, where $d_n$ is the distance between the nth pair of equivalenced atoms. The weighting factor, w, may be taken as unity in every case, arbitrarily increased on certain equivalenced atoms to ensure their good superposition, or set equal to the probability, P, with which the given pair of atoms is equivalenced. N represents the number of equivalenced atoms.
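The effect of the weights (for example, the weights of 10 and 100 placed on the iron atoms in "Fits 5 and 6") can be illustrated with a closed-form weighted superposition. This is a swapped-in technique: the sketch uses the weighted Kabsch solution (SVD of a weighted covariance matrix), which minimizes the same weighted sum of squared distances, rather than the iterative refinement of Eulerian angles described in the text. Coordinates and names are hypothetical.

```python
import numpy as np


def weighted_superposition(x1, x2, w):
    """Rotation r and translation d minimising sum_n w_n |x1_n - (r @ x2_n + d)|^2.

    Closed-form weighted Kabsch solution; x1, x2 are (N, 3) arrays of equivalenced atoms.
    """
    w = np.asarray(w, dtype=float)
    c1 = (w[:, None] * x1).sum(axis=0) / w.sum()   # weighted centroids
    c2 = (w[:, None] * x2).sum(axis=0) / w.sum()
    h = (w[:, None] * (x2 - c2)).T @ (x1 - c1)      # weighted 3 x 3 covariance
    u, _, vt = np.linalg.svd(h)
    sign = np.sign(np.linalg.det(vt.T @ u.T))       # exclude reflections
    r = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    d = c1 - r @ c2
    return r, d


def pair_distances(x1, x2, r, d):
    """Distance d_n between each pair of equivalenced atoms after moving molecule 2."""
    return np.linalg.norm(x1 - (x2 @ r.T + d), axis=1)


rng = np.random.default_rng(1)
x2 = rng.normal(scale=10.0, size=(20, 3))           # hypothetical atom coordinates
angle = np.radians(40.0)
r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
x1 = x2 @ r_true.T + np.array([5.0, -3.0, 2.0])
x1[0] += np.array([1.5, 0.0, 0.0])                  # pair 0 no longer fits exactly

uniform = pair_distances(x1, x2, *weighted_superposition(x1, x2, np.ones(20)))
w = np.ones(20)
w[0] = 100.0                                        # heavy weight on pair 0
heavy = pair_distances(x1, x2, *weighted_superposition(x1, x2, w))
print(f"pair 0: {uniform[0]:.3f} (w = 1) vs {heavy[0]:.3f} (w = 100)")
```

Raising the weight on one pair can only reduce (or leave unchanged) that pair's distance at the optimum, at the expense of the other pairs, which mirrors the trade-off seen between "Fits 4 to 6".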
Successive cycles of nonlinear least squares are then used to refine the six parameters ξi (i = 1 to 6). The ith normal equation has the form shown in Equation 2. For example, the derivatives ∂x′2/∂ξi are required, given that …

When the best rotation angles and translation components have been determined with respect to the presumed equivalent atoms (or alternatively a rotation matrix and translation vector have been otherwise given), it is then critical to determine whether other atoms or residues can be equivalenced and whether the previous set of equivalences was the most reasonable. After a revised set of equivalences has been obtained, the above nonlinear least squares procedure is again applied in order to minimize, as before, the sum of the squares of the distances between equivalenced atoms. This results in modified Eulerian angles and translational components, which in turn modify the set of structurally equivalenced residues. The procedure is repeated …

In practice it was found that sometimes as many as 25 cycles of equivalence determination were needed before convergence was reached when comparing structures as different as the hemoglobin β and cytochrome b5 chains. However, the basic set of equivalences was invariably found after 1 or 2 cycles; the remaining cycles only added or dropped a few equivalences. Furthermore, the same doubtful equivalences were invariably the cause of the many small changes before the above criterion of convergence had been satisfied. An appreciation of these changes can be obtained by inspecting Table I, where only slightly different results are obtained by altering the weighting system.

The possible equivalence of 2 given residues in the molecules being compared was expressed as a probability, P, dependent upon three essentially independent estimates. These were:

1. The distance, $d_{ij}$, between Cα atoms i and j in the 1st and 2nd molecules. This gives rise to the probability $P_1 \propto \exp(-d_{ij}^2/2E_1^2)$, which estimates the probability of spatial superposition.

2. The scatter, $S_{ij}$, given by the root mean square deviation from the mean of the distances $d_{i-1,j-1}$, $d_{i,j}$, $d_{i+1,j+1}$. When this scatter is low, the polypeptide main chains in the 2 molecules must be oriented similarly. Thus $P_2 \propto \exp(-S_{ij}^2/2E_2^2)$, which estimates the probability of similar orientation of the residues. These estimates apply mostly to the main chain atoms and Cβ.

The joint probability is then given by the product of the individual probabilities.

The probabilities for the ith residue of the 1st molecule interacting with the 1st, 2nd, …, jth residue of the 2nd molecule were then sorted into descending order. Equivalenced residues were tentatively assumed to be those with the largest probability relating the ith and jth residues. Such assignments, however, take no account of the topological and genetic (in the case of divergent evolution) requirements of similar folds. That is, the equivalencing of residues must be progressive along each polypeptide chain. More precisely, if $R_{1,i} \equiv R_{2,j}$ represents the equivalence of residues, where the subscripts refer to the molecule and residue numbers, then $R_{1,i+n} \equiv R_{2,j+m}$ only if both n ≥ 1 and m ≥ 1.
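A minimal sketch of the probability matrix and the tentative assignments follows. It assumes the Gaussian forms for the distance and scatter estimates written above, uses only those two factors, and takes placeholder values for E1 and E2 (3.0 Å and 2.0 Å are hypothetical, not the paper's constants). End residues, which lack a neighbour for the scatter, are simply left at P = 0.

```python
import numpy as np


def equivalence_probabilities(ca1, ca2, e1=3.0, e2=2.0):
    """P[i, j] = exp(-d_ij^2 / 2E1^2) * exp(-S_ij^2 / 2E2^2) for C-alpha arrays (n1, 3), (n2, 3)."""
    d = np.linalg.norm(ca1[:, None, :] - ca2[None, :, :], axis=-1)
    p = np.zeros_like(d)
    for i in range(1, d.shape[0] - 1):
        for j in range(1, d.shape[1] - 1):
            triple = np.array([d[i - 1, j - 1], d[i, j], d[i + 1, j + 1]])
            s = triple.std()   # rms deviation of the three distances from their mean
            p[i, j] = np.exp(-d[i, j] ** 2 / (2 * e1**2)) * np.exp(-s**2 / (2 * e2**2))
    return p


def tentative_assignments(p):
    """For each residue i of molecule 1, the residue j of molecule 2 with the largest P."""
    return {i: (int(np.argmax(row)), float(row.max())) for i, row in enumerate(p) if row.max() > 0}


# Hypothetical test chain: an ideal-looking helix of 12 C-alpha atoms, and a copy shifted by 0.5 Å.
k = np.arange(12)
helix = np.stack([2.3 * np.cos(np.radians(100 * k)),
                  2.3 * np.sin(np.radians(100 * k)),
                  1.5 * k], axis=1)
p = equivalence_probabilities(helix, helix + np.array([0.5, 0.0, 0.0]))
assign = tentative_assignments(p)
print(assign)
```

Taking only the maximum for each row, as here, is exactly what can violate the progression rule when the two folds differ, which is why the runs are pruned afterwards.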
Clearly the previous tentative assignments, based upon maximum probabilities alone, do not necessarily conform to the above "progression rule." Some equivalences must thus be removed, or others of lower probabilities added, before a satisfactory equivalenced set has been found.
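The progression rule itself is easy to state as a check on a set of equivalenced residue pairs (i, j). The function name below is hypothetical; it simply tests that, once the pairs are ordered along the first chain, both residue numbers strictly increase (n ≥ 1 and m ≥ 1).

```python
from typing import Iterable, Tuple


def obeys_progression_rule(pairs: Iterable[Tuple[int, int]]) -> bool:
    """True if equivalences R1,i == R2,j advance along both chains together."""
    ordered = sorted(pairs)
    return all(i2 > i1 and j2 > j1 for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]))


print(obeys_progression_rule([(3, 5), (4, 6), (9, 8)]))   # True: gaps are allowed
print(obeys_progression_rule([(3, 5), (4, 4)]))           # False: chain 2 runs backwards
```

Gaps in either chain (insertions or deletions) are permitted; only a step backwards, or two residues mapped to the same partner, breaks the rule.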
"Runs" of equivalences are now identified from the tentative assignments. A "run" is defined as a series of the form $R_{1,i} \equiv R_{2,j},\ R_{1,i+1} \equiv R_{2,j+1},\ R_{1,i+2} \equiv R_{2,j+2},\ \ldots,\ R_{1,i+n} \equiv R_{2,j+n}$. For each run the total $T = \sum_{k=i}^{i+n} P_k$ is evaluated. Adjacent runs can then be compared in pairs. Both runs are accepted if they obey the progression rule; if they do not, the run with the smaller total, T, is rejected. Emphasis is thus given to the longer and more similar parts of the protein folds. This process of rejection is continued until every run, and hence every residue, obeys the progression rule. Finally, the runs are extended at either end, using the largest available probabilities among the five largest probabilities that are consistent with the progression rule. The extensions are themselves terminated when no further acceptable probabilities can be found or when the downward extension of one run meets the upward extension of another. The purpose of the run extensions is to include those parts of the protein fold where there may be amino acid homology in the absence of precise structural equivalence, particularly at bends in the polypeptide chain or at deletions or insertions.
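The run identification and pairwise rejection steps can be sketched as follows, assuming the tentative assignments are given as a mapping from residue i of molecule 1 to (j, P). The names and the toy data are hypothetical, ties in T are resolved arbitrarily (the later run is dropped), and the final run-extension step is not implemented.

```python
from typing import Dict, List, Tuple

Run = List[Tuple[int, int, float]]   # (i, j, probability) triples


def find_runs(assign: Dict[int, Tuple[int, float]]) -> List[Run]:
    """Split tentative assignments i -> (j, P) into runs where i and j both step by +1."""
    runs: List[Run] = []
    for i in sorted(assign):
        j, p = assign[i]
        if runs and runs[-1][-1][0] == i - 1 and runs[-1][-1][1] == j - 1:
            runs[-1].append((i, j, p))
        else:
            runs.append([(i, j, p)])
    return runs


def prune_runs(runs: List[Run]) -> List[Run]:
    """Reject the lower-total run of any adjacent pair that violates the progression rule."""
    runs = list(runs)
    changed = True
    while changed:
        changed = False
        for k in range(len(runs) - 1):
            a, b = runs[k], runs[k + 1]
            # Runs are ordered by i, so b must start beyond the last j of a.
            if b[0][1] <= a[-1][1]:
                total_a = sum(p for _, _, p in a)
                total_b = sum(p for _, _, p in b)
                del runs[k if total_a < total_b else k + 1]
                changed = True
                break
    return runs


# Toy assignments: residue 4 is tentatively matched far away (j = 10) with a low P.
assign = {1: (1, 0.9), 2: (2, 0.9), 3: (3, 0.8), 4: (10, 0.2), 5: (5, 0.7), 6: (6, 0.7)}
kept = prune_runs(find_runs(assign))
print([(i, j) for run in kept for i, j, _ in run])
```

Because each surviving adjacent pair satisfies the rule and j increases within every run, the whole set of kept equivalences then obeys the progression rule.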