The Human Embryonic Myosin Heavy Chain: Complete Primary Structure Reveals Evolutionary Relationships with Other Developmental Isoforms

We have isolated a single 6021-nucleotide cDNA fragment encoding the full length of the myosin heavy chain (MHC) isoform initially expressed in developing human limb muscle. The corresponding transcript is expressed in fetal, but not adult, human muscle, and the corresponding gene maps to human chromosome 17. Comparison of the full length nucleotide sequence with that of the orthologous rat gene transcript reveals 74, 90, and 80% similarities in the 5'-untranslated, coding, and 3'-untranslated regions, respectively. To precisely quantitate the degree of nucleotide sequence divergence between the human embryonic and other developmentally regulated MHC gene transcripts, we utilize the algorithm of Perler et al. (Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R. & Dodgson, J. (1980) Cell 20, 555-566) and make use of the codon-for-codon register attainable in alignments of the MHC rod encoding cDNA fragments. The results allow reconstruction of the order and relative timing of certain gene duplication events involved in the evolution of the multimembered mammalian MHC loci. By this analysis, the principal sarcomeric MHC gene expressed in the 14-day chick embryo is shown to be more distantly related to the mammalian embryonic MHC genes than to those expressed peri- and postnatally. Attention is focused on regional patterns of MHC sequence conservation, ordered with reference to the topology of our phylogenetic tree. We present a composite map depicting the deduced evolutionary age of various primary structural subdomains of the human embryonic MHC.

The myosin heavy chain (MHC) gene family encodes a series of protein isoforms whose developmental regulation plays a crucial role in normal muscle morphogenesis and specialization (Hoh and Yeoh, 1979; Whalen et al., 1979, 1981; Bandman et al., 1982; Nadal-Ginard et al., 1982; … et al., 1986; Stockdale, 1986a, 1986b; Weydert et al., 1987; Rubinstein et al., 1988). Primary structural differences among the MHC isoforms are thought to relate directly to the differences in ATPase rate, force generation, and thermodynamic efficiency observed among individual muscle fibers and within the same fiber at different stages of development (Reiser et al., 1985a, 1985b, 1988; Alpert and Mulieri, 1986; Sweeney et al., 1986; Park et al., 1987; Acker et al., 1987). The range of electrophoretic mobilities exhibited by myosins or their proteolytic fragments provided early evidence for the existence of differentially expressed genes (Whalen et al., 1979; Hoh and Yeoh, 1979). At present, monoclonal antibodies provide the best probes for resolving MHC gene expression at the single-fiber level (Miller et al., 1985; Narusawa et al., 1987; Taylor and Bandman, 1989), while cDNA probes for specific mRNA transcripts are useful in the study of tissue extracts (Nadal-Ginard et al., 1982; Kropp et al., 1987). Further refinement of these techniques and their extension to other species will require more complete primary structural data for the various MHC isoforms. Recombinant DNA-based research strategies have recently contributed an appreciable amount of primary structural information about both sarcomeric MHCs and their distantly related nonsarcomeric counterparts (Periasamy et al., 1984; Strehler et al., 1986; Saez and Leinwand, 1986; Warrick et al., 1986; Hammer et al., 1987; Yanagisawa et al., 1987; Casimir et al., 1988).
Our laboratories are interested in the relationship between developmentally regulated isomyosin transitions and the corresponding diversification of myosin structure and function. To extend the scope of our general investigations, we have recently isolated and characterized cDNA fragments encoding a MHC isoform expressed early in the development of human skeletal muscle (Eller et al., 1989). Here we present the complete primary structure of the encoded polypeptide deduced from a full length cDNA sequence. These data allow the quantitation of divergence times among a large variety of previously characterized MHC gene fragments.
Comparisons between aligned amino acid sequences of distantly related MHCs yield information about regional differences in functional constraint.
With this information at hand, we construct a detailed map of amino acid sequence conservation as it relates to predicted higher order structural regions in the MHC S1 domain.

Isolation and Characterization of Cloned cDNA
Monoclonal antibody 2B6 (Gambke and Rubinstein, 1984), which specifically binds the embryonic MHC, was used to screen a plating of 2 × 10… plaques from our λgt11 library prepared from 18-week human fetal muscle. Following im… , 1984) and utilized to screen our λgt10 human fetal muscle library (Stedman et al., 1988). HEMHC-2 was the largest cDNA insert recovered on this screen (10… phage plaques from the amplified library). This EcoRI-linkered cDNA was subcloned and found to share the HEMHC-1 restriction map over 3.5 of its 6 kb (Fig. 1). HindIII, PstI, and SstI subfragments of HEMHC-2 were subcloned quasi-randomly into the same vector. The nucleotide sequence of the double-stranded plasmid DNA was determined by the dideoxy chain termination method (Sanger et al., 1977).
Northern Blots
Northern blots were prepared and hybridized as previously described (Stedman et al., 1988). … Garnier et al. (1978) and Chou and Fasman (1978), as modified by Novotny and Auffray (1984) …

RESULTS
Isolation of Cloned cDNA
Fig. 1 depicts cDNA fragment HEMHC-1, isolated from the human fetal muscle λgt11 library following immunodetection of the expressed β-galactosidase fusion polypeptide with monoclonal antibody 2B6 (Gambke and Rubinstein, 1984). HEMHC-2 was the largest cDNA fragment isolated on a subsequent screen of the λgt10 library with ³²P-labeled subfragment A of HEMHC-1. Because HEMHC-2 contains two internal EcoRI sites, controlled partial digestion of the parental λgt10 recombinant was required to release intact cDNA for subcloning into plasmid. The restriction map of HEMHC-2 precisely overlaps that of HEMHC-1, implying 5'-extension along an identical transcript. This conclusion was confirmed by partial nucleotide sequence determination in the rod-encoding portion of cDNA HEMHC-2 (data not shown).
Expression of the Human Embryonic MHC Transcript
A 25-base oligomer (HEMHC-01) complementary to the HEMHC-1 sense strand in the 3'-untranslated region, an area of maximal interisoform heterology, was synthesized and end-labeled to high specific activity. Following hybridization and washing at 52 °C in 0.9 M NaCl (approximately 5 °C below the empirically determined denaturation temperature), HEMHC-01 detects an mRNA species of approximately 6 kb unique to human fetal muscle (Fig. 2A). The autoradiogram in Fig. 2B was exposed from the same Northern blot following rehybridization with the 3.0-kb subfragment D of HEMHC-1. This probe spans regions of sufficient interisoform homology to detect MHC mRNA in preparations from human adult skeletal (lane 2), ventricular (lane 3), and atrial (lane 4) muscle, as well as from human fetal muscle (lane 1). Because MHC mRNA is demonstrably present in the adult samples yet is not detected by HEMHC-01, this result further documents the specificity of the oligomer.
Physical Linkage among Human MHC Gene Family Members
Chromosomal assignment of the homologous human genomic sequences detected by subfragments B and C of HEMHC-1, at moderate and high stringency, respectively, was accomplished using a Southern blot of DNA preparations from a panel of hamster-human hybrid fibroblast cell lines (Fig. 3). Subfragment B hybridized to at least six BglII fragments localizing to human chromosome 17, while subfragment C detected only one of these BglII fragments. The hybrid panel illustrated does not allow discrimination between chromosome 5 and chromosome 17 sequences. Reprobing of a discriminatory subpanel (data not shown) allowed assignment to chromosome 17, in agreement with previously published results for sequences detected by skeletal MHC cDNA (Leinwand et al., 1983; Edwards et al., 1985). Two tightly linked cardiac MHC genes have recently been reported to map to human chromosome 14 (Saez et al., 1987). The localization of all BglII fragments detected by subfragment B of HEMHC-1 to human chromosome 17 suggests that this skeletal muscle cDNA probe retains specificity for those MHC genes primarily expressed in skeletal muscle. The summed length of the fragments detected with the 1.3-kb subfragment B is approximately 30 kb, whereas published vertebrate sarcomeric MHC gene sequences indicate an intron-to-exon size ratio of approximately 3:1, so that a single gene would be expected to contribute only about 5 kb (1.3 kb of exon plus about three times as much intron). This result therefore most likely represents coding sequence homology among several linked members of this MHC gene subfamily.
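The gene-number argument above rests on simple arithmetic, sketched here in Python. The 1.3-kb probe, the ~30-kb summed fragment length, and the ~3:1 intron-to-exon ratio come from the text; treating each gene copy as one contiguous hybridizing span, and the function names, are illustrative assumptions.

```python
# Illustrative arithmetic only: the 1.3-kb probe, ~30-kb summed fragment
# length, and ~3:1 intron-to-exon ratio are taken from the text; treating
# each gene copy as one contiguous hybridizing span is an assumption.

def expected_genomic_span_kb(coding_kb: float, intron_to_exon: float) -> float:
    """Genomic length (kb) expected to contain `coding_kb` of exon sequence
    when introns outweigh exons by `intron_to_exon` to 1."""
    return coding_kb * (1.0 + intron_to_exon)


def gene_equivalents(total_detected_kb: float, coding_kb: float,
                     intron_to_exon: float) -> float:
    """Summed hybridizing fragment length divided by the span expected
    for a single gene copy."""
    return total_detected_kb / expected_genomic_span_kb(coding_kb, intron_to_exon)


if __name__ == "__main__":
    span = expected_genomic_span_kb(1.3, 3.0)
    copies = gene_equivalents(30.0, 1.3, 3.0)
    print(f"one gene: ~{span:.1f} kb; 30 kb ~ {copies:.1f} gene equivalents")
```

Because BglII fragments can also include flanking, nonhybridizing DNA, the resulting figure is only an order-of-magnitude check, not a gene count.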
Fig. 3. Lanes correspond to BglII-digested DNA from 10 different human-hamster hybrid cell lines (A-J), one hamster (K), and three human cell lines (M-O). Lanes A-O are from an autoradiogram after hybridization with ³²P-labeled subfragment C and washing at 65 °C in 15 mM NaCl. Lanes D', K', and O' correspond to lanes D, K, and O after elution of subfragment C, rehybridization with ³²P-labeled subfragment B, and washing at 65 °C in 15 mM NaCl. For details of the human chromosome content of each hybrid cell line, see Nussbaum et al. (1985) and Stedman et al. (1988). The human-specific BglII fragments detected by both subfragments B and C segregate with perfect concordance only with human chromosome 17.
Nucleotide Sequence Determination
The total length of cloned fragment HEMHC-2 is 6021 nucleotides: 74 nucleotides in the 5'-untranslated region, 5820 in the coding region, and 127 in the 3'-untranslated region. The nucleotide sequence has been previously reported (Eller et al., 1989). In the present work we provide the derived amino acid sequence of HEMHC-2 (Figs. 5 and 6) as the basis for evolutionary comparisons with other published MHC sequences. Simple comparison of the aligned human and rat embryonic MHC S1 coding sequences reveals identities of 90.2 and 98.8% at the nucleotide and amino acid levels, respectively, while the corresponding values for the rod domain are 89.6 and 97.7%. The calculated amino acid sequence divergence for the full-length MHCs from these species is only 1.85%, about one-ninth of the 16.5% previously reported for a comparison of the rat embryonic MHC sequence with an MHC from 14-day embryonic chick (presumed to represent an orthologous gene comparison) (Molina et al., 1987). Divergence distances similarly calculated for cytochromes and globins isolated from primate, rodent, and avian species yield ratios of approximately 1:3, consistent with paleontological evidence dating the avian/mammalian split and the mammalian radiation at 300 and 85 million years before the present, respectively (Romero-Herrera et al., 1973; Perler et al., 1980 (and references therein)). To gain further insight into the source of this apparent discrepancy in divergence ratios (1:3 versus 1:9), we quantitated nucleotide sequence divergence among the available MHC cDNA sequences. Many of the published sequences correspond only to the carboxyl-terminal portions of the MHC rods, precisely the domain in which codon register is best preserved. Among the MHC rod-coding sequences that can be unambiguously aligned codon-for-codon (a list which includes, with minor exceptions, all muscle sequences thus far published), it is possible to apply the pairwise comparison algorithm of Perler et al. (1980) without introducing artifacts through gap element manipulation (Feng and Doolittle, 1987). Table IA shows the percentage corrected divergence at replacement sites among 10 sets of paired MHC rod cDNA sequences. The GCG GAP program and dot matrix comparisons between paired sarcomeric and nonsarcomeric MHC amino acid sequences were used to identify the best rod cDNA sequence alignments.
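Perler et al. (1980) partition each codon into replacement and silent sites before correcting for multiple substitutions at the same position. The Python sketch below omits that partition and shows only the general idea of a multiple-hit correction, here the Jukes-Cantor formula, applied to a codon-registered pairwise alignment. It is a simplified stand-in, not the Perler procedure, and the function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of aligned positions (gaps excluded) at which two
    codon-registered nucleotide sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor multiple-hit correction, d = -(3/4) ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); larger values are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy sequences, not MHC data.
    p = p_distance("ATGGCCAAG", "ATGGCTAAA")
    print(f"p = {p:.3f}, corrected d = {jukes_cantor(p):.3f}")
```

The correction always returns a value at least as large as the raw proportion, and the gap between the two grows as divergence increases, which is why corrected and uncorrected figures diverge most for distantly related genes.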
Based on these data (not shown, available on request), our aligned full-length rod sequences all begin just carboxyl-terminal to the highly conserved S1-terminating proline residues and retain this register to their respective carboxyl termini. These alignments required the insertion of gap elements only to accommodate the previously noted differences in "skip residue" positions among the nonsarcomeric MHCs (McLachlan and Karn, 1983; Warrick et al., 1986; Yanagisawa et al., 1987). Because of the nonequivalent lengths of the various MHC rod sequences, all were truncated at … (Fig. 5A). cDNA sequences encoding the S1 domains of six of these MHCs have been similarly compared in Table IB. The nucleotide sequence alignments used to generate this data set are based codon-for-codon on the amino acid sequence alignments shown in Fig. 7. Although each gap element in the sequence alignments corresponds to at least one insertion/deletion event in the divergence of the genes represented, these events are not considered in our quantitative comparison of the nucleotide sequence divergences. We estimate that the error resulting from this simplification is less than 1% with this set of nucleotide sequences (i.e. we see evidence of more than 100 nucleotide substitutions per insertion/deletion event).
Table I abbreviations: RE, rat embryonic sarcomeric (Strehler et al., 1986); CE, chicken embryonic sarcomeric (Molina et al., 1987); RN, rat neonatal sarcomeric (Periasamy et al., 1984); HA, human adult ("IIa") sarcomeric (Saez and Leinwand, 1986); HZ, human slow ("β") sarcomeric (Saez and Leinwand, 1986); NS, nematode (unc-54) sarcomeric; CSm, chicken smooth muscle (Yanagisawa et al., 1987); DC, slime mold cytoplasmic (Warrick et al., 1986); SC, newt "cardioskeletal" (Casimir et al., 1988).
The phenogram in Fig. 4 illustrates the possible evolutionary relationships among these MHC genes, based on the data in Table IA and the tree generation algorithm of Fitch and Margoliash (1967) as modified by Doolittle (1986). It is important to note that this approach is based entirely on the quantitation of pairwise differences. This simplification facilitates the incorporation of data from partial-length cDNAs (the four rows at the bottom of Table IA). The meaningful application of alternative algorithms, especially those that mathematically optimize evolutionary parsimony (Fitch, 1977; Penny and Hardy, 1986), must await the sequence analysis of full-length cDNAs for the various MHCs. At present we regard the overall tree topology as reliable but the distal branch lengths as tentative (especially among the partial fragments of genes HA, HZ, RN, and SC).
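In outline, the Fitch and Margoliash procedure repeatedly groups the closest pair of sequences, treats the average of all remaining sequences as a third "taxon," and solves the exact three-point equations for the three branch lengths. Only that three-point step is sketched below in Python; the iterative grouping, topology comparison, and Doolittle's modification are not shown, and the distances in the usage lines are hypothetical, not values from Table IA.

```python
from typing import Tuple


def three_point_branch_lengths(d_ab: float, d_ac: float,
                               d_bc: float) -> Tuple[float, float, float]:
    """Lengths of the branches joining taxa A, B, and C to their common node,
    chosen so that a + b = d_ab, a + c = d_ac, and b + c = d_bc."""
    a = (d_ab + d_ac - d_bc) / 2.0
    b = (d_ab + d_bc - d_ac) / 2.0
    c = (d_ac + d_bc - d_ab) / 2.0
    return a, b, c


if __name__ == "__main__":
    # Hypothetical pairwise distances (not values from Table IA).
    a, b, c = three_point_branch_lengths(10.0, 30.0, 32.0)
    print(f"A: {a}, B: {b}, C: {c}")
```

With three taxa the solution fits the pairwise data exactly; with more taxa the fit is approximate, which is one reason distal branch lengths estimated from partial sequences are best regarded as tentative.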
In Fig. 4, a vertical stripe corresponding to approximately 300 million years before the present has been placed in the immediate vicinity of nodes G and H, offering one provisional normalization of the distal part of the phenogram. This assumes orthology of the HZ and SC gene pair, as suggested by Casimir et al. (1988) … Table I … through node F, corresponding to the rat-human ancestral split.
Patterns of Sequence Conservation in the Rod Domain
Differences in the rates of replacement site sequence divergence between S1 and rod domains are apparent from Table I, A and B, and may reflect different levels of functional constraint within different portions of the encoded polypeptide. To localize further the MHC-coding regions subject to extraordinary selective pressure, we examined our MHC amino acid sequence alignments in more detail. Fig. 5A shows the entire human embryonic rod sequence, arranged in the manner of McLachlan and Karn (1983) to highlight the 7- and 28-amino acid repeats that characterize this largely α-helical portion of the polypeptide. The numbers of amino acid residues found to differ in five pairwise comparisons with other published vertebrate rod sequences are tabulated to the right of the sequence. This analysis reveals an extraordinarily high level of sequence conservation in three regions of the vertebrate rod, centered around repeats 20, 34, and 35. The distinctive structure of these subdomains is apparent in Fig. 5B, where the individual amino acids are replaced by symbols representing hydrophobic or charged side chains. When charge or hydrophobicity alone is considered, even the comparison between the human embryonic and the nematode sarcomeric sequences shows extensive conservation in these subdomains. Repeats 20 and 34 both differ dramatically from the consensus structure shown at the bottom of the diagram.
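The per-repeat difference counts described above can be sketched in Python as follows. The sketch assumes, for illustration only, that repeats start at the first aligned position and contain no skip residues; the Fig. 5A arrangement follows McLachlan and Karn (1983), and those boundaries would have to be supplied for a faithful reproduction. Function names are hypothetical.

```python
from typing import List


def differences_per_repeat(ref: str, other: str, repeat_len: int = 28,
                           gap: str = "-") -> List[int]:
    """Mismatch counts between two aligned rod sequences in consecutive
    blocks of `repeat_len` residues (the final block may be shorter).
    Positions with a gap in either sequence are skipped."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    if repeat_len < 1:
        raise ValueError("repeat_len must be positive")
    counts = []
    for start in range(0, len(ref), repeat_len):
        block = zip(ref[start:start + repeat_len], other[start:start + repeat_len])
        counts.append(sum(1 for a, b in block if a != b and gap not in (a, b)))
    return counts


def summed_differences(ref: str, others: List[str],
                       repeat_len: int = 28) -> List[int]:
    """Per-repeat totals over several pairwise comparisons with `ref`."""
    totals = [0] * len(range(0, len(ref), repeat_len))
    for other in others:
        for i, n in enumerate(differences_per_repeat(ref, other, repeat_len)):
            totals[i] += n
    return totals
```

Summing over several pairwise comparisons, as in the column to the right of Fig. 5A, smooths out lineage-specific noise so that unusually conserved repeats stand out as low totals.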
The conservation of charge distribution among the rod sequences is extraordinary, as shown by the numerical columns to the right of the sequence diagrams. Although 42.5% of the residues in the HEMHC rod are charged, fewer than 8% of the substitutions in a comparison with the chicken embryonic sarcomeric (CE) rod alter the charge distribution (not shown). Furthermore, in a similar comparison involving the more distantly related nematode (unc-54) sarcomeric (NS) rod sequence, the mutation rate among charged residues is approximately three times as high in the S2 region (top 10 lines of column A) as in the remainder of the rod. This effect disappears in comparisons between sarcomeric and nonsarcomeric rods, as shown for the human embryonic sarcomeric/chicken smooth muscle (HE/CSm) comparison in column B of Fig. 5B.
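The two percentages quoted above (the charged fraction of the rod and the fraction of substitutions that change charge) can be computed as in the following Python sketch. The charge assignment is an assumption: Asp/Glu are taken as negative, Lys/Arg as positive, and every other residue, including His, as uncharged. Function names are hypothetical.

```python
# Assumed charge states near neutral pH: Asp/Glu negative, Lys/Arg positive,
# all other residues (including His) treated as uncharged.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge(residue: str) -> int:
    return CHARGE.get(residue.upper(), 0)


def charged_fraction(seq: str, gap: str = "-") -> float:
    """Fraction of non-gap residues carrying a charge."""
    residues = [r for r in seq if r != gap]
    if not residues:
        raise ValueError("empty sequence")
    return sum(charge(r) != 0 for r in residues) / len(residues)


def charge_changing_fraction(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Among substituted, ungapped positions of two aligned sequences, the
    fraction at which the side-chain charge differs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    subs = [(a, b) for a, b in zip(seq_a, seq_b)
            if a != b and gap not in (a, b)]
    if not subs:
        return 0.0
    return sum(charge(a) != charge(b) for a, b in subs) / len(subs)
```

Substitutions such as Asp to Glu or Lys to Arg count as substitutions but not as charge changes, which is how a rod can accumulate many replacements while its charge pattern stays nearly fixed.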

Secondary Structure Predictions for the S1 Domain
In contrast, substitutions accumulate at dramatically different, position-dependent rates throughout the S1 domain. The deduced amino acid sequence of the human embryonic MHC S1 domain, its simplification by amino acid subclass (using the scheme given in the Fig. 6 legend), and its further simplification based on the hydropathicity scale of Kyte and Doolittle (1982) are aligned in Fig. 6, rows a, b, and c, respectively. Colored backgrounds highlight subdomain-level patterns of differential sequence conservation. In the first row there are seven independent regions where primary structure is rigidly conserved among all MHCs compared (blocks of eight or more contiguous residues against a white background). The patterns of conservation depicted in the second and third rows reinforce five of these regions, drawing attention to the conservative nature of most of the flanking amino acid substitutions. Amino acids shown against a blue background in the second row mark positions of nonconservative substitution even among the closely related vertebrate MHCs. Secondary structural domains predicted by the independent methods of Chou and Fasman (1978) and Garnier et al. (1978) …
A notable feature of the MHC amino acid sequence alignment on which Fig. 6 is based (see Fig. 7) is the concentration of insertion/deletion mutations in the two regions corresponding to the protease-sensitive sites of rabbit skeletal MHC S1. Elsewhere in the aligned S1 domain sequences, the scarcity of such "gap elements" implies selective pressure to conserve the relative positions of the highly conserved subdomains shown against a white background in Fig. 6, row a. The largest region of low-level primary sequence conservation corresponds to HEMHC residues 290-455.
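The row b and row c simplifications of Fig. 6 can be sketched in Python as below. The side-chain classes and the thresholds (0.55 and 0.45) are taken from the Fig. 6 legend, and the hydropathy values are the published Kyte-Doolittle scale. Three details are assumptions of this sketch: "normalized to I = 1" is read as a linear rescaling with R = 0 and I = 1, the five-residue window is truncated at the sequence ends, and values falling exactly on a threshold are scored as intermediate.

```python
# Side-chain classes from the Fig. 6 legend (row b).
CLASSES = {"A": "AGPST", "C": "C", "D": "DENQ", "F": "FWY", "H": "HKR", "I": "ILMV"}
SUBCLASS = {aa: label for label, members in CLASSES.items() for aa in members}

# Kyte-Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def simplify(seq: str) -> str:
    """Row b: replace each residue by its class letter ('?' if unknown)."""
    return "".join(SUBCLASS.get(aa, "?") for aa in seq.upper())


def normalized(aa: str) -> float:
    """Assumed linear rescaling of Kyte-Doolittle onto [0, 1], R = 0, I = 1."""
    return (KYTE_DOOLITTLE[aa] + 4.5) / 9.0


def hydropathy_profile(seq: str, window: int = 5) -> list:
    """Centered moving average; windows are truncated at the sequence ends."""
    values = [normalized(aa) for aa in seq.upper()]
    half = window // 2
    profile = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


def hydropathy_symbols(seq: str, window: int = 5) -> str:
    """Row c: '*' hydrophobic, '-' intermediate, '%' hydrophilic."""
    out = []
    for h in hydropathy_profile(seq, window):
        out.append("*" if h > 0.55 else "%" if h < 0.45 else "-")
    return "".join(out)
```

Each simplification discards information deliberately: two sequences that differ at the residue level can agree at the class or hydropathy level, which is what lets rows b and c reveal conservation that row a misses.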
In our search for intergenic similarities over this region, we noted that both hydropathicity and secondary structural propensities were conserved among distantly related MHC amino acid sequences.
In Fig. 7, we have aligned the amino acid sequences of the S1 regions of five MHCs (human embryonic skeletal, chicken skeletal, nematode sarcomeric, chicken smooth muscle (nonsarcomeric), and slime mold (cytoplasmic)), and we have subjected all five to secondary structural analysis by the methods of Chou and Fasman (1978) and Garnier et al. (1978).
The following discussion applies only to regions where the Chou and Fasman and the Garnier analyses agree. In pairwise comparisons between these aligned MHCs, many regions of agreement in predicted secondary structure correspond to regions of relatively little primary structural homology (e.g. α-helices A1, A4, A5, and A6). In total, 16 α-helical and 9 β-sheet domain positions are in agreement among all the sequences analyzed.
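The consensus procedure described above (keep a predicted state only where the predictions agree, then count the resulting helical or strand segments) can be sketched in Python. The one-letter state encoding (H, helix; E, strand; T, turn; C, coil) is a hypothetical convention for this sketch, and handling of alignment gaps is not shown.

```python
from typing import List, Tuple


def consensus(*predictions: str, disagree: str = ".") -> str:
    """Keep a secondary-structure state only where every prediction agrees."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must be aligned to equal length")
    return "".join(col[0] if all(s == col[0] for s in col) else disagree
                   for col in zip(*predictions))


def segments(states: str, state: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Half-open (start, end) runs of `state` at least `min_len` long."""
    runs, start = [], None
    for i, s in enumerate(states):
        if s == state:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(states) - start >= min_len:
        runs.append((start, len(states)))
    return runs
```

The same `consensus` call serves both levels of the analysis: first to combine the two methods for one sequence, then to combine the per-sequence consensus strings across the aligned MHCs.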

Relationship to Other Cloned DNAs
Genomic and complementary DNA fragments encoding MHCs from a wide range of cellular sources have recently been characterized (Hammer et al., 1987; Karn et al., 1983; Molina et al., 1987; Strehler et al., 1986; Warrick et al., 1986; Watts et al., 1987). The cytoplasmic MHCs synthesized by two unicellular organisms, Saccharomyces cerevisiae (Watts et al., 1987) and Dictyostelium discoideum (Warrick et al., 1986), are encoded by single-copy, intron-lacking genes. To our knowledge, clone HEMHC-2 represents the first single cDNA fragment encoding the full length of an MHC of muscle origin. The importance of this fact lies in the desirability of expressing sarcomeric MHC coding sequences in nonmuscle cells (de Vos et al., 1988; Maruyama and MacLennan, 1988 …
Fig. 6. a, complete amino acid sequence of the HEMHC S1 domain deduced from the nucleotide sequence in Fig. 2. b, simplification of the sequence in a by side chain type, where A = A, G, P, S, T; C = C; D = D, E, N, Q; F = F, W, Y; H = H, K, R; and I = I, L, M, V. c, unidimensional depiction of regional hydropathicity based on the sequence in a, using the scale of Kyte and Doolittle (1982) normalized to I = 1 and averaged locally over a window of five amino acids. Hydropathicity: (h) > 0.55 = *; 0.46 < (h) < 0.54 = -; (h) < 0.45 = %. d, secondary structure as predicted by the method of Chou and Fasman (1978). e, secondary structure as predicted by the method of Garnier et al. (1978). The broken line between rows d and e highlights regions of complete agreement. In a and b the level of conservation observed in sequence alignments (see Fig. 7 …
… (on the order of 40%) of the amino acids in S1 are rigidly constrained, while the remaining positions can tolerate varying degrees of conservative amino acid substitution, the overall divergence rates might appear to plateau (as a corrected divergence on the order of 60% is approached).
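The plateau argument above can be illustrated with a deliberately simple toy model in Python, which is not the model used in this study. It assumes that a fixed fraction of sites can never change and that each remaining site is scored as different once it has received at least one substitution (Poisson-distributed hits, no back-substitution). Under these assumptions, with 40% of sites invariant, observed divergence can approach but never exceed 60%.

```python
import math


def observed_divergence(rate_time: float, invariant_fraction: float) -> float:
    """Expected fraction of differing positions when `invariant_fraction` of
    sites can never change and each remaining site is scored as different
    once it has received at least one substitution (Poisson hits, no
    back-substitution)."""
    if not 0.0 <= invariant_fraction <= 1.0:
        raise ValueError("invariant_fraction must lie in [0, 1]")
    if rate_time < 0:
        raise ValueError("rate_time must be non-negative")
    return (1.0 - invariant_fraction) * (1.0 - math.exp(-rate_time))


if __name__ == "__main__":
    for rt in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"rate x time = {rt}: observed divergence "
              f"{observed_divergence(rt, 0.4):.3f}")
```

Once the variable sites are saturated, further time adds almost nothing to the observed value, so divergence times estimated from such a domain would be compressed for the most distantly related genes.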
Our alternative hypothesis posits that approximately the same proportions of highly and poorly conserved amino acids exist in the S1 and rod domains of all sarcomeric MHCs, but that the nature of thick and thin filament register in the myofibril imposes a further functional constraint on the rod sequences of only the sarcomeric MHCs. Thus, only among sarcomeric MHC gene sequences will overall divergence be found to occur at similar rates in S1- and rod-encoding domains. In any event, inconsistency in domain-specific mutation rates will influence calculations of relative divergence times principally among the most distantly related MHC genes compared. Fig. 4 suggests that the mammalian slow skeletal (cardiac β) gene diverged from a common ancestor of the mammalian embryonic, neonatal, and adult fast genes more than 500 million years ago (node D). These data further suggest that the genomic reorganization events that brought about the physical segregation of the cardiac/slow and skeletal/fast MHC gene clusters (Saez et al., 1987) occurred between nodes D and E in the phenogram, approximately 400-600 million years ago. The data summarized in Fig. 4 indicate that the avian MHC gene expressed at the highest level in 12-14-day chick embryonic muscle is most likely orthologous to the human adult IIa MHC gene (HA). Furthermore, the human embryonic/IIa gene divergence long predated the separation of avian and mammalian species. At least seven differentially regulated MHC genes have been identified in the chicken genome (…), all of which exhibit mutual sequence divergence levels (over the first three coding exons) markedly lower than that found in the human-chicken embryonic MHC cDNA sequence comparison. (The coding sequence of one of these genes corresponds to the CE sequence in Table I.) This suggests that the chicken MHC gene family has expanded in number since the divergence depicted at point E in Fig. 4.
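Divergence times of the kind quoted above follow from a constant-rate (molecular clock) assumption: once one node is assigned a date, every other node is scaled proportionally. The Python sketch below illustrates only that proportionality. The 85- and 300-million-year dates and the 1.85% and 16.5% divergences are quoted in Results; the node distances in the last line are hypothetical and are not read from Fig. 4.

```python
def calibrated_time(distance: float, calib_distance: float,
                    calib_time_myr: float) -> float:
    """Time (million years) for a node at `distance`, assuming divergence
    accumulates at one constant rate fixed by a single calibration node."""
    if calib_distance <= 0:
        raise ValueError("calibration distance must be positive")
    return calib_time_myr * distance / calib_distance


if __name__ == "__main__":
    # Under a clock, the human-rat : rat-chicken divergence ratio should
    # approach 85 : 300; the MHC amino acid values give 1.85 : 16.5.
    print(f"expected {85 / 300:.2f}, observed {1.85 / 16.5:.2f}")
    # Hypothetical node distance, calibrated against a 300-Myr node:
    print(calibrated_time(distance=50.0, calib_distance=30.0, calib_time_myr=300.0))
```

Any departure from a constant rate, such as the domain-specific rate differences discussed above, propagates directly into the scaled dates, which is why the most distant nodes carry the largest uncertainty.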
There are insufficient data at present to correlate the timing of these gene duplication events with that of the mammalian IIa (HA) and neonatal (RN) MHC gene divergence (point H or I in Fig. 4) …
Fig. 7. A consensus between the Chou and Fasman (1978) and the Garnier et al. (1978) predictions for each MHC was obtained, exactly as illustrated in Fig. 5 for HEMHC. Numbering is based on the HEMHC amino acid sequence. Two marker residues labeled by the ATP analogs N-(4-azido-2-nitrophenyl)-2-aminoethyl diphosphate (NANDP) and [5,6-³H]UTP are highlighted by the letters N and U, respectively (Okamoto and Yount, 1985; Atkinson et al., 1986). The reactive sulfhydryls SH1 and SH2 are labeled accordingly (Yanagisawa et al., 1987). The consensus α-helical and β-sheet domains referred to in Fig. 7 are enumerated below the sequence alignments. Color code: green, α-helix; orange, β-strand; and purple, β-turn.
… involving approximately one-third of the reported MHC coding sequence complicates the reconstruction of the duplication events in the evolutionary history of the chicken locus. However, events of the latter sort may help explain why quantitative comparisons involving the chicken embryonic cDNA sequence generated the anomalous length of the distal branch leading to point CE in Fig. 4. … Warrick et al., 1986) has led to the hypothesis that a primordial MHC gene sequence arose through serial duplications of a 28-codon block, followed at some later point by at least four duplications of a 197-codon block ((28 × 7) + 1 "skip residue").
Furthermore, this higher order periodic distribution of charged residues within the rod domain has been taken as evidence for a 98-residue stagger between interacting myosin rods in the thick filament (McLachlan and Karn, 1983). Finally, the higher level of repetition seen on comparison matrices in a quadrant corresponding to the S2 portion of the nematode MHC B amino acid sequence has been taken as evidence for a comparatively greater level of sequence divergence (from the "primordial" 28-residue repeat) in the LMM region (McLachlan, 1983). The data shown in Fig. 5B demonstrate greater positional and numerical conservation among charged residues in the LMM and hinge than in the short S2 subdomains of distantly related sarcomeric MHCs. This finding suggests that a further functional constraint emerged upon only the LMM and hinge subdomains after the divergence of genes encoding sarcomeric and nonsarcomeric MHCs. Sequence conservation among these residues would operate to preserve both the 197-residue periodicity and the potential for a 98-residue optimal stagger. Optimal stagger distance therefore represents a candidate sarcomeric MHC-specific functional constraint.
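The idea of an optimal stagger can be made concrete with a crude Python sketch: for each candidate displacement (98 residues, for instance), pair residue i of one rod with residue i + stagger of an identical parallel rod and count opposite-charge pairings minus like-charge pairings. This is a hypothetical simplification; published stagger analyses weight interactions by heptad position and other factors, none of which is modeled here, and the charge assignment (His neutral) is an assumption.

```python
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}   # assumed; His treated as neutral


def stagger_score(rod: str, stagger: int) -> int:
    """Opposite-charge minus like-charge pairings between residue i of one
    rod and residue i + stagger of an identical, parallel rod."""
    if stagger < 0:
        raise ValueError("stagger must be non-negative")
    score = 0
    for i in range(len(rod) - stagger):
        q1 = CHARGE.get(rod[i], 0)
        q2 = CHARGE.get(rod[i + stagger], 0)
        score -= q1 * q2  # opposite charges (product -1) add 1; like charges subtract 1
    return score


def best_stagger(rod: str, candidates) -> int:
    """Candidate stagger with the highest score (the first one wins ties)."""
    return max(candidates, key=lambda s: stagger_score(rod, s))
```

Scanning a range of candidates around 98 on a full rod sequence would show whether the charge pattern favors that register, which is the kind of constraint that would resist drift toward a uniform 28-residue repeat.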
(The necessity for register between positively charged residues in one myosin rod and negatively charged residues displaced by 98 amino acids in a second should manifest as selective pressure against evolutionary drift of the charged residue distribution toward that of a consensus 28-residue repeat.) A similar argument should apply to regions with critical roles in the initiation of myofibrillogenesis. In this comparison of five vertebrate sarcomeric MHC sequences, repeats 20 and 34 in Fig. 5A undergo amino acid substitutions at one-fourth to one-sixth of the average rate among repeats, suggesting that these subdomains have unique roles in the overall function of the corresponding myosin rods.
S1 Domain
Evolutionary comparisons of this sort yield evidence of a different series of functional constraints upon the primary structure of the MHC S1 domain. Although regional differences in amino acid sequence conservation are readily appreciated in the MHC S1 domain, there is comparatively little conservation of charged amino acid numbers and positions among distantly related MHC sequences (Warrick and Spudich, 1987). The color scheme used in Fig. 6 draws attention to primary structural domains that were established at various points in the evolution of the human embryonic MHC sequence. For instance, all of the letters that appear on a white background correspond to primary structural domains common to all functional descendants of the ancestral MHC encoded by gene A in Fig. 4, while letters that appear against a brown background represent domains that were established as MHC gene C evolved to MHC gene D.
Most of the regions of rigid primary structural conservation are flanked by blocks of amino acids that have undergone only conservative substitutions.
These moderately conserved flanking domains generally correspond to regions of rigid conservation among the closely related vertebrate sarcomeric (vs) MHCs, suggesting that selective pressure against nonconservative substitutions dramatically slows the rate at which any substitutions are fixed. In other regions, primary structural conservation exists among vsMHCs while no detectable propensity for conservative substitution exists among more distantly related sequences (e.g. amino acids 13-29 and 729-737). Such regions represent protein structural domains with roles possibly unique to the function or regulation of the vsMHCs. Among those residues at positions where variability is seen even among vsMHCs, Fig. 6 allows the identification of the subset for which the least conservative substitutions have been documented (residues against a blue background in Fig. 6, row b). In the absence of any detailed tertiary structural information regarding a native MHC globular head, these nonconservatively substituted sites represent the best candidates for roles in fiber type-specific (acto)myosin ATPase rate control.
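The column categories used in the discussion above (rigidly conserved, conservatively substituted, nonconservatively substituted) can be sketched in Python, together with a search for the runs of eight or more identical columns that Fig. 6 shows against a white background. The sketch equates "conservative" with "all residues in the same side-chain class" from the Fig. 6 legend; the evolutionary-age coloring of Fig. 6 is not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple

# Side-chain classes from the Fig. 6 legend.
CLASSES = {"A": "AGPST", "C": "C", "D": "DENQ", "F": "FWY", "H": "HKR", "I": "ILMV"}
SUBCLASS = {aa: label for label, members in CLASSES.items() for aa in members}


def classify_columns(*aligned: str, gap: str = "-") -> str:
    """Per alignment column: 'I' identical in all sequences, 'C' conservative
    (all residues in one class), 'N' nonconservative, 'G' gap present."""
    if not aligned or any(len(s) != len(aligned[0]) for s in aligned):
        raise ValueError("need one or more sequences of equal length")
    labels = []
    for col in zip(*aligned):
        if gap in col:
            labels.append("G")
        elif len(set(col)) == 1:
            labels.append("I")
        elif len({SUBCLASS.get(aa, aa) for aa in col}) == 1:
            labels.append("C")
        else:
            labels.append("N")
    return "".join(labels)


def rigid_blocks(labels: str, min_len: int = 8) -> List[Tuple[int, int]]:
    """Half-open runs of identical ('I') columns at least `min_len` long."""
    runs, start = [], None
    for i, lab in enumerate(labels + "#"):  # '#' sentinel closes a final run
        if lab == "I":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs
```

Running `classify_columns` first on the closely related vertebrate sequences and then on the full set would separate positions conserved only among vsMHCs from those conserved throughout.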
Computer-assisted methodologies for extrapolating from protein secondary to tertiary structural predictions are presently restricted to helix-rich (Cohen et al., 1986) and parallel α/β (Cohen et al., 1983) domains. Based on the assumption of topographic similarity among functionally similar but distantly related proteins, evolutionary comparisons have been used to strengthen secondary structural predictions, usually in support of tertiary structural models (Crawford et al., 1987; Yanagisawa et al., 1987). Fig. 7 shows the consensus among secondary structural predictions for sequences as distantly related as those of the human embryonic sarcomeric and the slime mold cytoplasmic MHCs. All of these MHCs stoichiometrically bind light chains, and the resulting myosins all possess actin-activated ATPase activities. However, contractile activity is differently regulated among each of the three classes of myosin represented in these comparisons. Furthermore, there is evidence that the topography of light and heavy chain interactions differs among these myosins (Okamoto et al., 1986). Therefore, some of the predicted secondary structural differences shown in Fig. 7 may correspond to tertiary structural regions that differentially mediate regulatory signals in cytoplasmic, smooth, and skeletal muscle MHCs. That proportion of the S1 domain for which the predictions in Fig. 7 are in complete agreement can be taken as the most conservative data base for use in the formulation of MHC tertiary structural models.
Finally, it is of interest to note that among vertebrate sarcomeric MHCs for which complete amino acid sequences of this region are available (see Fig. 5), only residues 136-138 exhibit variability within the immediate primary structural proximity of residues bound by ATP analogs (Okamoto and Yount, 1985;Atkinson et al., 1986;Sutoh, 1987). As additional full length MHC cDNA sequences become available, this map should be of use in the correlation of domainal sequence microheterogeneity with the actin-activated ATPase activities of the various tissue and developmental stage-specific isomyosins.