Structural and Transcriptional Analysis of a Chicken Myosin Heavy Chain Gene Subset *

Recently we have isolated a large number of chicken myosin or myosin-like heavy chain genes. Seven of these genes were placed into a subset based upon their hybridization patterns. In the present study, the sequence of the 5' end of one of the myosin heavy chain (MHC) genes, N127, was determined and compared with the 5' end sequences of the other six MHC genes in the subset. The comparison revealed that the three exons encoding the amino termini of the protein are highly conserved. The sequence analysis shows that a localized correction event occurred in and around a domain of the nucleotide-binding site, as the exon encoding this site and the preceding intron are very highly conserved among the seven genes. The sequence of the promoter and 5'-untranslated region of N127 is presented. The analogous regions for N124 and N125 have now been sequenced and are also presented. As is the case for all the other known MHC genes, the 5'-untranslated regions are split by large introns. The promoter and 5'-untranslated regions are compared with two previously characterized chicken MHC genes (N116 and N118) to determine the sequence similarities and differences that might underlie the differential expression of the family's members. To confirm and extend previously published results of the expression of these genes, transcript-specific probes generated from the 5' region of six of the seven genes were used to determine in which muscle(s) the corresponding mRNAs were present. The data show that despite the very close structural homologies, each of the genes for which a unique probe could be prepared exhibits a unique pattern of expression.

* This work was supported in part by National Institutes of Health Grant HD21893, United States Department of Agriculture Grant CRCR-1-1729, and by a grant from the Missouri Affiliate of the American Heart Association. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
503467.
‡ Supported by National Institutes of Health Training Grant HL07382.
Established Investigator of the American Heart Association. To whom correspondence should be addressed: Dept. of Pharmacology, University of Cincinnati College of Medicine, 231 Bethesda Ave., Cincinnati, OH 45267-0575.

The myosin heavy chain (MHC) isoforms of vertebrates are encoded by multigene families (1-4) consisting of at least seven genes (1) and possibly more (5, 6). Different members of the family are expressed in specific tissues and at specific times during development (7-11); thus, the tissue- and developmental stage-specific expression of the MHC genes offers an excellent system for studying gene regulation.
The mechanisms involved in the expression of the MHC genes remain largely obscure. Available data indicate that the expression of a particular MHC isoform can be regulated by various hormonal (9, 11-13) and physiological (14) signals and/or by different innervational (15-17) patterns. Although it is likely that gene expression in this family is regulated at both the transcriptional and post-transcriptional levels (18-23), activation of the transcriptional apparatus seems to play the primary role in the expression of the genes (22).
It is known that basal promoter elements for RNA polymerase II are found within the 5'-flanking region of a gene and are involved in transcriptional regulation (reviewed in Ref. 24). One element, often referred to as the Goldberg-Hogness or "TATA" box, is located approximately 31 bases upstream from the start of transcription (25). This element is responsible for the correct transcriptional initiation at the cap site (24, 26-28). Further upstream, between -50 and -105, another basal promoter element is found which may have either a "CAAT" consensus sequence (29, 30) or a GC-rich (GGGCGG or CCGCCC) sequence (31). This element has been shown, by deletion and linker scanning mutations, to play a role in controlling the frequency of initiation (30-36). The GC-rich region can also bind proteins that have the ability to up-regulate gene expression (reviewed in Ref. 37). A third, cis-acting element affecting transcription, the enhancer, may be located either upstream or downstream from the cap site and seems to exert its effect (relatively) independently of distance and/or orientation from the cap site (38, 39). Enhancers have been shown to be tissue-specific (40, 41) and to mediate steroid hormone activation (42).
At a minimum, in order to understand the developmental expression of the MHC genes, it is necessary to know the overall sequence organization of the gene being expressed and to be able to detect that gene's specific transcript. It is also particularly important to determine the sequences of different promoters, so that the similarities and differences among these sequences can be catalogued and, it is hoped, correlated with the genes' expression. We have addressed these problems by focusing our attention on a group of seven chicken MHC genes (5, 6, 43, 44), which were placed into a subset based upon their structural homologies. Here, we present data characterizing the 5' end of a member of this subset and compare its transcriptional pattern to those of other subset members.
In addition, we compare the promoter and 5'-untranslated regions of five of these genes in an initial attempt to locate structural regions that may be involved in the tissue- and developmental stage-specific expression of the MHC genes.
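The basal promoter elements described above (a TATA box near -31, and a CAAT or GC-rich element between -50 and -105) can be located in a 5'-flanking sequence with a simple motif scan. The following is a minimal sketch, not the procedure used in this study: the function name `scan_promoter`, the reduction of each element to a short literal motif, and the toy sequence are all assumptions made for illustration.

```python
import re

# Literal motifs named in the text. Real promoter elements are degenerate,
# so treating them as exact strings is a simplifying assumption.
MOTIFS = {
    "TATA": "TATA",
    "CAAT": "CAAT",
    "GC-rich": "GGGCGG|CCGCCC",
}


def scan_promoter(upstream, motifs=MOTIFS):
    """Return (name, position) pairs for motif hits in an upstream sequence.

    `upstream` is the 5'-flanking sequence ending immediately before the
    cap site, so its last base is position -1. Each reported position is
    that of the first base of the hit.
    """
    seq = upstream.upper()
    hits = []
    for name, pattern in motifs.items():
        # The lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start() - len(seq)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 85-base flanking sequence with a CAAT box at -75
# and a TATA box at -31; it is not taken from any of the MHC genes.
toy = "G" * 10 + "CAAT" + "G" * 40 + "TATAAA" + "G" * 25
print(scan_promoter(toy))  # [('CAAT', -75), ('TATA', -31)]
```

Only the strand supplied is searched; a fuller analysis would also scan the reverse complement and allow mismatches against each consensus.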

EXPERIMENTAL PROCEDURES
The procedures used in isolating and mapping the genomic clone have been described previously (5).
Isolation of DNA Templates for Sequencing by the Rapid Deletion Method-The procedure developed by Dale et al. (45) for generating sequential overlapping clones for DNA sequencing was employed. 80 ng of a 20-base primer, which is complementary to a region of the M13 phage that encompasses the EcoRI site of the cloning region as well as the 10 bases immediately upstream of the EcoRI site, was annealed to 4 µg of single-stranded template DNA. The insert sizes ranged from 0.3 to 4.5 kilobase pairs. After annealing, the DNA was digested with 10 units of EcoRI at 45 °C for 1 h. The digestion products were loaded onto a 0.7% agarose gel and electrophoresed for 2 h to confirm that at least 95% of the DNA had been cleaved by the enzyme. The DNA was then digested in two separate reactions with either 1.5 or 4 units of T4 DNA polymerase I at 37 °C. Aliquots from each reaction were taken at various time points (2-70 min) and placed at 65 °C for 10 min to inactivate the enzyme. The aliquots were combined, and a 7-20-base deoxyguanosine "tail" was added using 1 unit of terminal deoxytransferase and 3 µl of 10 mM dGTP. The 20-base primer, which has a small poly(dC) tail, was annealed to the DNA. The DNA was ligated using 1 unit of T4 DNA ligase at room temperature for at least 1 h. The reaction was terminated by the addition of 1 µl of 0.5 M EDTA and the volume increased to 30 µl with water. Finally, 1 µl of a 1/10 dilution of the ligation mixture was used to transform 0.3 ml of JM101 competent cells.
The single-stranded DNA templates were isolated using the procedure provided by Amersham Corp. (46) with the following modifications. The template DNA was precipitated by adding 0.5 volume of 7.5 M NH4OAc (pH 5.4) and 2 volumes of absolute ethanol. The solution was incubated at -70 °C for 15 min or at -20 °C overnight. The DNA was collected by centrifugation at 4 °C for 10 min. The pellet was washed with 80% ethanol, dried, and resuspended in 50 µl of water.
The approximate sizes of the resultant deletions were determined by separating the DNA templates on a 0.6% agarose gel for 3-4 h at 120 mA in Tris borate buffer (45). Bacteriophage with and without the insert were used as markers. Template DNAs displaying sequential deletions of approximately 150 bases were sequenced using the dideoxy sequencing method (47).
All sequences were analyzed on an IBM XT using the programs developed by Queen and Korn (48).
RNA Isolation, S1 Nuclease, and Primer Extension Analyses-These procedures were performed as described previously (5, 43).
Rot Blots-3 µg of total RNA from each of the muscle sources was dissolved in 50 µl of 10 mM Tris (pH 7.5), 2 mM EDTA. 30 µl of 37% formaldehyde (Fisher) and 20 µl of 20 × SSC (1 × SSC = 0.15 M sodium chloride, 0.015 M sodium citrate) were added and the RNA heated at 60 °C for 15 min, after which 700 µl of ice-cold 20 × SSC was immediately added. 400 µl of this was applied to nitrocellulose which had been rinsed in 20 × SSC; the remaining 400 µl was used to make serial dilutions using 20 × SSC as diluent.
The paper was dried at 80 °C for 40 min, rinsed briefly in 1 × SSC, and washed at 37 °C for 30 min in 50 mM Tris (pH 7.5), 10 mM MgCl2. 40 µg of RNase-free DNase (Worthington) which had been purified further by passage over agarose-5'-(4-aminophenylphosphoryl)uridine 2'(3')-phosphate (Miles Laboratories) was added to 7 ml of this buffer and the paper treated for 30 min at 37 °C. After a 10-min incubation at 70 °C, to inactivate the nuclease, the buffer was changed and the paper incubated again at 65 °C for 10 min. The paper was then prehybridized in 6 × SSC, 0.1% sodium dodecyl sulfate for 30 min, after which the 32P-labeled oligomer was added to a final concentration of 1 × 10⁶ cpm/ml. Hybridization and autoradiography were carried out as described previously (43).
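The volumes quoted for the Rot blot sample preparation fix the final sample volume, the SSC concentration at loading, and the amount of RNA in the undiluted spot. The short calculation below works these out. The helper names `rot_blot_mix` and `dilution_series` are hypothetical, and because the dilution factor is not stated in the procedure, the twofold series is an assumption shown only for illustration.

```python
def rot_blot_mix(rna_ug=3.0, buffer_ul=50.0, formaldehyde_ul=30.0,
                 ssc_before_ul=20.0, ssc_after_ul=700.0,
                 applied_ul=400.0, ssc_stock_x=20.0):
    """Work out the sample composition from the volumes in the procedure."""
    total_ul = buffer_ul + formaldehyde_ul + ssc_before_ul + ssc_after_ul
    # Both SSC additions use the 20 x stock; everything else dilutes it.
    ssc_final_x = ssc_stock_x * (ssc_before_ul + ssc_after_ul) / total_ul
    return {
        "total_ul": total_ul,
        "ssc_final_x": ssc_final_x,
        "rna_applied_ug": rna_ug * applied_ul / total_ul,
        "remaining_ul": total_ul - applied_ul,
    }


def dilution_series(start, factor=2.0, steps=4):
    """Relative amounts along a serial dilution (the factor is an assumption)."""
    return [start / factor ** k for k in range(steps)]


mix = rot_blot_mix()
print(mix["total_ul"])        # 800.0
print(mix["ssc_final_x"])     # 18.0
print(mix["rna_applied_ug"])  # 1.5
print(dilution_series(mix["rna_applied_ug"]))
```

The 400 µl applied and the 400 µl kept for dilutions together account for the full 800 µl, which is consistent with the volumes given in the text.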

RESULTS
Characterization and Sequencing Strategy-The EcoRI restriction map of the bacteriophage encoding N127 is shown in Fig. 1. N127 encodes the 5' end of a MHC gene belonging to the subset of seven clones characterized previously (6). The EcoRI fragments of N127 that contain the sequences encoding the amino-terminal end of the protein were determined by hybridization using the 423-base pair PstI/Sau3A fragment of N116, which was shown previously to encode the 5' end of an adult MHC gene (43). The EcoRI fragments were sub-
The reasons for these subgroups are unknown but are consistent with the hypothesis that this gene family arose through serial gene duplication events.
The Exons-The first two exons shown in Fig. 2 encode part of the 5'-untranslated region of the gene and have diverged to a much greater extent than the sequences which encode the translated portions of the mRNAs. The reason for the lack of conservation is unknown but may reflect the differential expression of the genes in terms of specific sequences underlying tissue- or stage-specific expression. Alternatively, this lack of conservation may reflect a lack of selective pressure on a relatively unimportant part of the gene sequence. Exon 2 appears to have been deleted from N116.
Finally, exon 5 is completely conserved, except for two silent site changes in N124. This exon is known to contain part of the ATP-binding domain (43, 52, 53), and the almost absolute conservation probably reflects a localized correction mechanism which has occurred to maintain sequences critical for the proteins' function.
The Introns-The first two introns occur within the 5'-untranslated regions of the genes, with the exception of N116 where only one intron is present. We have determined the entire sequence of these introns (except for N101 and N122); only the canonical intron junction sequences and intron lengths are shown in Fig. 2 because of the introns' sizes. Very little sequence conservation is seen within the first two introns; however, the homologies that could be detected are shown in Fig. 6 and Table I. The sequence conservation of intron 3 strongly emphasizes the aforementioned subgroups: N101/N122, N118/N125, N124/N127, and N116. Numerous insertions/deletions are evident among the four subgroups, but within any particular subgroup, very few differences can be observed.
Intron 4 shows a high degree of conservation among all seven genes. To date, this is the only MHC intron where we have noted this high sequence conservation. Whether the intron serves a particular function or is conserved because of its proximity to the "corrected" fifth exon referred to above is unknown presently. It is interesting to note that within the coding or translated region the three intron positions (introns 3, 4, and 5) are conserved among the seven genes; these positions are also conserved in all vertebrate MHCs for which sequence information is available (43, 54, 55).
Transcriptional Pattern of N127-The close homologies apparent in the members of the subset are striking and raise the possibility that some of the genes do not encode unique polypeptides, but rather are merely alleles of one another or reflect the polymorphisms known to be present in the MHC gene family (56). In order to explore these possibilities, we have expanded upon our previous transcriptional analyses, which were carried out on N116, N118 (43), and N127 (5) using various methodologies (e.g. N127's transcriptional pattern was determined by S1 analysis (5)). We have shown previously that the 5'-untranslated regions of some of the genes have diverged such that transcript-specific probes can be prepared and used to determine in what tissue and at what developmental stage a particular gene is being expressed (43, 44). Fig. 2 shows a comparison of this region for N127 and five of the subset's members (the analogous region for N122 has not been characterized). The sequences used to generate the various transcript-specific probes are underlined.
The oligonucleotide probe for N127 was end-labeled with T4 polynucleotide kinase and [γ-32P]ATP and hybridized first to a "Rot blot" containing mRNA isolated from embryonic, neonatal, and adult muscles. The results are shown in Fig. 3A. The probe hybridized strongly to RNA derived from 18-day in ovo cardiac muscle (CAR, lane 6), adult gastrocnemius (GAS, lane 15), and adult posterior latissimus dorsi (PLD, lane 16). The probe also hybridized to mRNA from other muscle tissues; however, the level of hybridization was at or near background levels. Therefore, we were unable to discern at this point if, in fact, N127 was being expressed only in 18-day in ovo cardiac, adult PLD, and adult gastrocnemius muscles. In addition, the hybridization to the 18-day cardiac muscle conflicted with our previous data for this gene's transcript (5).
To confirm these results and resolve the conflicts between these sets of data, a primer extension experiment, which is more sensitive than the Rot blot, was performed. The radioactively labeled oligonucleotide probe was hybridized to the mRNA samples and extended to the 5' ends of the mRNAs using reverse transcriptase. As is the case for the Rot blot, the results define where the homologous transcripts are expressed but, in addition, reveal the length of the 5'-untranslated region and thus are diagnostic only for that particular transcript. In addition, hybridization conditions can be adjusted such that the reactions can be driven to completion using excess probe and can, as a consequence, reveal very low transcript levels. The data are shown in Fig. 3B. Only four samples had extended products, all of which were 92 bases in length. The positive RNAs were neonatal gastrocnemius (lane 9), neonatal PLD (lane 10), adult gastrocnemius (lane 15), and adult PLD (lane 16). It should be noted that the signals in lanes 9 and 10 were not detectable during a normal time of exposure for the autoradiogram (10 h, -70 °C using DuPont Cronex screens and Kodak XRP-5 film). The data in these lanes are a result of a 96-h exposure. Lane 6, which corresponds to the 18-day in ovo cardiac RNA, had no extended product and showed no signal, even when exposures of >7 days were carried out. However, there was a high level of background throughout the entire lane, indicating that the hybridization observed for this RNA in the Rot blot experiment was due to nonspecific binding of the probe. Alternatively, the transcript may be present in very low abundance in cardiac tissue, and the primer extended product is masked by the high "noise" that this RNA sample generated.
To confirm that N127 is expressed only in the gastrocnemius and PLD muscles, the end-labeled oligonucleotide probe was annealed in excess to the mRNAs under hybridization conditions such that the reactions were driven to completion. The annealed products were treated with S1 nuclease and electrophoresed on a 12% polyacrylamide 8 M urea gel to detect which RNAs are able to hybridize to, and subsequently protect, the oligonucleotide probe. Previously in this type of analysis, a signal was not detectable in the neonatal tissues. Thus, three times as much neonatal RNA was used (lanes 9 and 10). The data are shown in Fig. 3C and are unambiguous; N127 is expressed during the neonatal and adult stages in the gastrocnemius and PLD musculature. These results extend and confirm the data concerning the expression of N127 (5).
Comparison of the Genes' Transcription-As mentioned above, the transcriptional patterns of other genes within the subset of seven have been determined previously using transcript-specific probes (5, 6, 43) and various methodologies. We thought it useful, based on the above analyses for N127, to repeat the transcriptional analyses using the same methodologies in order that the transcriptional patterns might be easily compared. In addition, the transcriptional patterns of N101 and N124, which have not been analyzed previously, were determined. Although a transcript-specific probe that could distinguish the two could not be prepared because the genes show a high degree of homology in the 5'-untranslated region (see Fig. 2), an oligonucleotide probe for N101/N124 was made. On the basis of the above analysis with N127, we chose to use a combination of Rot blots and primer extensions to carry out a comparative analysis of the genes' expression.
The data are shown in Fig. 4 and illustrate that, despite the close homologies in the exons which encode the amino termini, the genes appear to be unique and are expressed differentially in a tissue- and developmental stage-specific manner.
Of all the genes, N118 appears to be expressed most promiscuously; it is expressed throughout development in a number of different muscle types, although it is expressed at its highest levels during the in ovo stages of development. Adult expression is quite restricted; low levels are present in the adult cardiac and gizzard muscle. The Rot blot (Fig. 4A) and primer extension analysis (Fig. 4B) are in excellent agreement with one another and confirm that this gene is expressed in both skeletal muscle and, in some stages of development, cardiac muscle.
A partial structural analysis of N125 has been described previously (44). A transcriptional analysis has recently been done (73). The data in Fig. 4 show that N125 is expressed predominantly during neonatal development and is probably largely restricted to the fast fibers, as the highest concentrations are present in the pectoralis and PLD muscles. Expression in the adult muscles tested seems to be restricted to the gastrocnemius. Again, the data presented in the Rot blots and primer extensions are completely consistent.
N116 shows a very restricted pattern of expression. The data confirm and extend our previous Northern analysis of the gene's transcription (5). It is expressed predominantly in the adult fast-white musculature, although a small amount of expression in the pectoralis muscles isolated from the 18-day in ovo and neonatal RNAs can be observed in the primer extension analysis (Fig. 4B, lanes 4 and 8). No detectable signals in the corresponding lanes of the Rot blot are present, indicating that the concentration of the transcripts in these tissues is below this assay's detection level.

A Rot blot of N127 is shown here for comparative purposes; it simply confirms the data presented in Fig. 3 and illustrates that the highest concentration of transcripts is present in the adult gastrocnemius and in the adult PLD muscles. It should be noted, however, that in the primer extension analysis for this clone (Fig. 4B) in which all lanes in the autoradiogram were developed for an equal period of time (10 h), no detectable signal was present in the neonatal tissues (lanes 9 and 10). This simply underscores the very low levels of transcripts present in these tissues, and extensive development times are needed to observe even a faint signal in these lanes.
We were unable to observe any expression of N101/N124 in the Rot blot. However, low level transcription of the gene(s) was detectable upon prolonged exposure (see legend to Fig. 4) of the autoradiogram resulting from the primer extension analysis. Although the 5'-flanking region of N101 has not been determined, the extended product for N124 should be 93 bases (Figs. 2 and 4B), and this is the size of the fragment observed in Fig. 4B. However, the physiological significance of such a low transcript concentration, undetectable even upon prolonged exposure of the Rot blot, is questionable. In any case, comparison with the other Rot blots and primer extensions indicates that the transcript from N124 (and possibly N101, which could share a similar 5' end to N124) does not play a major structural role in the composition of the tested muscles' sarcomeres.
These data confirm and extend our previous analysis (5, 6, 43). Although our previous study (43) showed N118's expression to be restricted to the embryonic fast-white muscles, the data were obtained using the (relatively) insensitive Northern blot technique and assaying 14-day in ovo and adult RNAs. The low concentration of transcripts present in the chick embryonic cardiac muscle at day 9 (lane 3, Fig. 4, A and B) is probably maintained through the 14th day and was not detected in the Northern blot. Similarly, the very low levels observed in the adult cardiac tissues (lane 18, Fig. 4, A and B) would also not be detected by Northern analysis.

Identification of the Transcriptional Initiation Sites and Promoter Elements-The primer extension experiments for N127 (see Fig. 3B) indicate that the extended product is 92 bases long. Taking into account the position where the primer annealed to the mRNA (see Fig. 2), the entire 5'-untranslated region is 105 bases in length. Previous data have shown that the 5'-untranslated regions of the MHC genes are interrupted by introns (43, 44, 52, 53); therefore, to determine where transcription of N127 begins, the primer extension product was isolated from an 8% polyacrylamide gel and sequenced according to the procedure of Maxam and Gilbert (58). This sequence was compared with the gene's sequence (Fig. 2) and the transcriptional start site determined.
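The two lengths quoted for N127 (a 92-base extended product and a 105-base 5'-untranslated region) imply that 13 bases of the untranslated region lie 3' of the position paired with the primer's 5' end. Because the untranslated region is split by introns, converting a product length into a genomic start site means walking back through the exons. The sketch below shows only that bookkeeping, under stated assumptions: the start site in this study was determined by sequencing the extension product, the function name `genomic_start` is hypothetical, and the exon coordinates in the example are invented rather than taken from Fig. 2.

```python
def genomic_start(exons, primer_anchor, product_len):
    """Map a primer extension product length onto genomic coordinates.

    exons         -- (start, end) pairs, inclusive, in 5'-to-3' order
    primer_anchor -- genomic coordinate of the mRNA base paired with the
                     primer's 5' end (the 3'-most base of the product)
    product_len   -- length of the extended product in bases

    Returns the genomic coordinate of the transcription start site.
    """
    if not any(start <= primer_anchor <= end for start, end in exons):
        raise ValueError("primer anchor does not lie within an exon")
    remaining = product_len
    for start, end in reversed(exons):
        if start > primer_anchor:
            continue  # exon lies entirely 3' of the primer
        last = min(end, primer_anchor)
        available = last - start + 1
        if remaining <= available:
            return last - remaining + 1
        remaining -= available  # cross the intron into the upstream exon
    raise ValueError("product is longer than the exons supplied")


# Hypothetical layout: untranslated exons of 40 and 30 bases, then an exon
# in which the primer's 5' end pairs 22 bases from the exon's 5' boundary.
exons = [(100, 139), (1000, 1029), (2000, 2099)]
print(genomic_start(exons, primer_anchor=2021, product_len=92))  # 100

# Bases of untranslated region 3' of the primer, from the quoted lengths.
print(105 - 92)  # 13
```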
Using similar techniques, the transcriptional initiation sites have been determined previously for N116, N118 (43), and N125 (73). In addition, the putative start site for N124 has been identified by a computer search. The structures of the five regions are shown in Fig. 5. (The analogous regions for N101 and N122 have not yet been characterized.) The CAAT and TATA promoter elements as well as the cap sites (+1) are indicated. The promoters are aligned to maximize the homologies, which are striking. On the basis of these homologies, the promoters have been placed into three groups. Group 1 contains N116, N127, and N124, group 2 contains N125, and group 3 contains N118. It is interesting to note that these groupings are reflected in the timing of the developmental expression of the genes. N116 and N127 are expressed only in the adult stage, and each has three possible CAAT elements (Fig. 6). N125 is expressed mainly in the neonatal stage and has two CAAT elements, while N118, which is expressed predominantly in the embryonic state but is seen throughout development in various muscle tissues, has only one.
Fig. 3. Panel B, primer extension analyses were carried out as described previously (43); equal amounts of RNA (15 µg) were used in each of the lanes. The autoradiograms for N118, N125, N116, and N127 were the results of a 10-h exposure at -70 °C with DuPont Cronex screens and Kodak XRP-5 film. The exposure for N101/N124 was carried out for 72 h using identical conditions. The organization of the 5'-untranslated regions is shown schematically on the right. The numbers adjacent to the boxes on the right side of the diagram correspond to the size of the extended primer.

DISCUSSION
The results presented here show that the organization of the seven chicken MHC genes at the 5' ends is conserved. In the coding region, all of the genes are interrupted by introns at the same position, while in the 5'-untranslated regions, all of the MHC genes studied to date are interrupted by introns at highly conserved positions with the exception of N116. It would appear that the second untranslated exon has been deleted in N116; alternatively, this exon was inserted into a gene(s) that subsequently underwent a duplication(s).
We favor the first hypothesis, because other MHC genes from different species also have two exons in the 5'-untranslated region (11, 59). The 5'-untranslated regions of the genes have diverged and have proved to be useful for generating transcript-specific probes. The data show clearly that (with the possible exception of N101) the genes are not polymorphic variants or alleles of one another because each exhibits a unique pattern of transcription.
All of the genes' promoters analyzed contain a CAAT and TATA box at the expected locations, and the sequences in and around these regions are highly conserved. However, when the promoter sequences of the cardiac α-MHC genes from rat (59) and rabbit (60) are compared with the five chicken promoter regions shown, no homologies are observed. Furthermore, a 400-base region, which is 80% homologous between the rat and rabbit MHC 5' sequences (59), is not found in any of our chicken genes. Recently, a rat embryonic skeletal muscle MHC promoter was determined (55). When the sequence from -303 to -1 of the rat promoter was compared with the chicken promoters, little sequence conservation could be observed. The highest degree of homology detected was a 12 for 12 match between the rat promoter and clone N118 and was centered around the TATA box.
Are the promoter subgroupings the same as the coding region subgroupings (see Figs. 2 and 5)? The answer to this question is quite complex. The coding region subgrouping paired N118 with N125 and N124 with N127; however, in the promoter subgroupings, N118 and N125 are in different subgroups, while N124 and N127 are still grouped together. The reason for the change in subgroupings is currently unknown, but may reflect a need to generate numerous MHC genes all under different developmental controls. For example, it may be that, in the N118/N125 pair, the parental gene was duplicated, and then one of the genes in the pair was placed under different developmental control by changing the promoter's structure. Perhaps in the N124/N127 pair, following duplication, one of the promoters was changed in such a way as to alter the tissue-specific expression of the gene without changing the developmental stage-specific expression. This hypothesis predicts that if N124 is expressed, its transcripts should be found only in an adult tissue. However, to unravel the complex patterns of gene duplication, which have led to the current structure of the chicken MHC genes, it will be necessary to acquire more sequence data for the other family members (5, 6).
We have analyzed the 5'-flanking regions of the genes in terms of short homologies. Fig. 6 schematically depicts the organization of the genes' 5' ends and the locations of short regions (>15 bases) of homology (>75%) which are shared between two of the genes. The nucleotides present in each of the homologies are shown in Table I. Various sequences which have been hypothesized to function in different aspects of transcriptional control (61-66) are represented in these homologies. For example, regions 20 and 20a contain alternating purine/pyrimidine residues that may represent Z-DNA (62), although what role (if any) this structure plays in regulation remains to be determined (62-64). Adenine- and thymidine-rich regions are also present in the 5' ends of the genes (see Fig. 6). These areas presumably represent bends or kinks in the DNA (65) which may be sites of protein binding or areas of disfavored nucleosome formation. In yeast, such sites represent upstream promoter elements of constitutively expressed genes (66). The exact role of these regions in the MHC genes remains to be determined.
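The criterion used for the short homologies (regions of more than 15 bases sharing more than 75% identity between two genes) can be illustrated with an ungapped sliding-window comparison. This is a minimal sketch and not the search actually performed, which relied on the sequence analysis programs cited under "Experimental Procedures"; the function name `shared_windows`, the fixed 16-base window, and the toy sequences are assumptions.

```python
def shared_windows(seq_a, seq_b, window=16, min_identity=0.75):
    """List ungapped windows whose identity exceeds `min_identity`.

    Every window of `window` bases in seq_a is compared with every window
    of the same length in seq_b. Returns (index_a, index_b, identity)
    tuples with 0-based start indices. Gaps are not considered.
    """
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in
                          zip(a[i:i + window], b[j:j + window]))
            identity = matches / window
            if identity > min_identity:
                hits.append((i, j, identity))
    return hits


# Toy sequences sharing one identical 16-base core; not real MHC sequence.
core = "ACGATCGTACCTGAAC"
for i, j, identity in shared_windows("TTTT" + core + "TTTT",
                                     "GGGG" + core + "GGGG"):
    print(i, j, round(identity, 2))
```

Overlapping windows around a true homology are reported separately, so in practice adjacent hits would be merged into a single region before being tabulated. The comparison is quadratic in sequence length, which is acceptable for flanking regions of a few kilobases.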
The 5' ends of the genes were searched for proposed muscle-specific sequences (67-69). Only one was detected at 100% homology, CATTCCT; this sequence is found in a number of muscle-specific genes (e.g. actin, troponin I, troponin T, and myosin light chain) from a variety of species. In the MHC genes, the sequence is located at -193 in N116, -418 in N125, -487 in N118, and -610 in N127. The sequence is also located at +70 in N124, +103 in N118, +373 in N125, +433 in N127, and +2052 in N116. Therefore, if this sequence is involved in the transcriptional control of the contractile protein genes, it seems that its effect is not dependent upon an absolute location relative to the transcriptional start site.
Other regions besides the 5'-flanking and untranslated regions may also be involved in the expression of the MHC genes. Intragenic regions have been implicated in the regulation of other genes (40, 70). However, the short regions of homology catalogued herein are logical candidates for cis-acting regulatory elements (71, 72) and provide a starting point for the functional elucidation of sequences that are involved in the controlled expression of the MHC genes.
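Positions such as -193 or +433 are expressed relative to the cap site, which is numbered +1 with no position 0. A search for the CATTCCT motif that reports coordinates in this convention might look like the sketch below. The function name `motif_positions` and the toy sequence are hypothetical, only the strand supplied is searched, and taking the first base of the motif as its position is an assumption, since the text does not say which base the quoted coordinates refer to.

```python
def motif_positions(sequence, cap_index, motif="CATTCCT"):
    """Return motif positions relative to the cap site (+1, no zero).

    sequence  -- genomic sequence spanning the 5' end of the gene
    cap_index -- 0-based index of the cap site base within `sequence`
    """
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        offset = start - cap_index
        # Bases at or 3' of the cap site count from +1; upstream bases
        # count down from -1, so there is no position 0.
        positions.append(offset + 1 if offset >= 0 else offset)
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence: one motif upstream of the cap site and one downstream.
toy = "GG" + "CATTCCT" + "G" * 10 + "A" + "GGG" + "CATTCCT"
print(motif_positions(toy, cap_index=19))  # [-17, 5]
```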