The Alkali Light Chains of Human Smooth and Nonmuscle Myosins Are Encoded by a Single Gene: Tissue-specific Expression by Alternative Splicing Pathways*


Human smooth muscle and nonmuscle cells express closely related myosin alkali light chains that differ from the isoforms present in striated muscle tissues. To date, no information on the amino acid sequence of these mammalian nonstriated muscle isoforms has been available. We have isolated full-length cDNA clones encoding the nonmuscle (lym4) and smooth muscle (GT6) myosin light chains (MLCs) from cultured human lymphoblasts and heart aorta smooth muscle cells, respectively. Here we present the complete nucleotide sequences of both cDNA clones, together with the deduced amino acid sequences of the peptides. Both cDNAs contain an open reading frame for 151 amino acids, and the two encoded proteins differ at 5 amino acid positions located in the C terminus. These differences are encoded by a block of 44 nucleotides that is present only in the smooth muscle (SM) mRNA. To identify the human gene coding for the two MLC isoforms, we have isolated and sequenced the nonmuscle (NM)/SM MLC gene, together with several intronless pseudogenes. A single functional gene was found, containing 7 exons, all of which are represented in the SM MLC mRNA. In contrast, the NM MLC mRNA lacks the sequence encoded by exon 6, which corresponds to the 44 nucleotides expressed in SM mRNA. This genomic configuration suggests that the smooth muscle and nonmuscle MLCs in man are generated from the same primary transcript by alternative splicing pathways operating in a tissue-dependent manner.
Myosin is a principal component of muscle proteins and is also essential for maintaining cellular structures in all other cell types. In the contractile apparatus of various muscle tissues, myosin interacts with actin to produce contraction driven by the hydrolysis of ATP. The protein consists of six subunits: two heavy chains (MHC)¹ and four light chains (MLC). It characteristically has a long coiled α-helical tail and two heads. Each head contains an actin-binding site and ATPase activity. The formation of the head structure involves the NH2-terminal half of the two MHCs and one pair of light chains (1). The association of the so-called alkali light chains with the MHC head region has been documented (2-4), and these light chains are believed to be involved in the interaction with actin (5-7). In fast vertebrate muscle, the MHCs alone have been shown to be sufficient for magnesium-dependent ATPase activity, which is enhanced by binding to actin (8, 9). Recently, it has been reported for chicken smooth muscle that the MLCs may also participate in ATP binding (10). Many different isoforms of myosin are present in various muscle tissues (see Ref. 11 for review). They are encoded by multigene families and are expressed in a tissue-specific and developmental stage-specific manner (12-16).

* Supported by Deutsche Muskelschwundhilfe e.V. and Deutsche Forschungsgemeinschaft. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
In skeletal muscle, transcriptional control and alternative splicing mechanisms have been shown to contribute to the production of different myosins (see Ref. 17 for review). For example, in fast fibers both mechanisms are responsible for the synthesis of MLC1 and MLC3 from one genetic locus (18-20). In cardiac muscle, the atrial and ventricular MLC isoforms are encoded by distinct genes (50, 52).
Considerably less is known about the structure and regulated synthesis of myosins during the development of smooth muscle and nonmuscle tissues. Protein studies using two-dimensional electrophoretic techniques have identified three myosin isoforms (GM1, GM2, and GM3) in the developing chicken gizzard (21). Each of the three proteins contains a combination of two different MLCs, either L23 and L20 or L17 and L20, in which L20 is the regulatory, phosphorylated MLC and L23 or L17 is the alkali MLC (22). The L17 isoform is found primarily in the gizzard of the adult animal, whereas L23 is considered to be an embryonic isoform (21). Recently, it has been shown that a common MLC mRNA coding for L23 is expressed in embryonic smooth muscle, in embryonic skeletal and cardiac muscle, and in both embryonic and adult brain (23). For smooth muscle MLCs, protein sequence data exist only for the regulatory and alkali MLCs of the ascidian (24) and of chicken gizzard smooth muscle (15, 25), and for the regulatory MLC of rat (26). In general, studies of contractile proteins of mammalian smooth muscle have been descriptive. Using the dog and guinea pig as animal models, two electrophoretically distinct SM isoforms (G1 and G2) have been reported (27). They share the same light chains (L20 and L17) but contain different heavy chains (230 kDa in G1 and 200 kDa in G2). Primate smooth muscle isoforms have been studied in the human myometrium (28). Physiological and hormone-induced hypertrophy, as well as changes in contractility, were found to be accompanied by modifications in both actin and myosin.

¹ The abbreviations used are: MHC, myosin heavy chain; MLC, myosin light chain; SM, smooth muscle; NM, nonmuscle; SDS, sodium dodecyl sulfate; kb, kilobase(s); Py, pyrimidine.
Human and monkey SM myosins were compared in the pregnant and nonpregnant uterus and showed two distinct species with different isoelectric points, one of which became dominant at the end of pregnancy (28). In higher eucaryotes, cytoplasmic or nonmuscle (NM) myosins function in intracellular vesicle movement, cell locomotion, cytoplasmic streaming, and cytokinesis (29-32), specifically in contractile ring formation during cell division (33). In Drosophila, studies on cytoplasmic myosin provided circumstantial evidence for a myosin heavy chain gene distinct from that of skeletal muscle (33). For Dictyostelium discoideum, a cDNA encoding the essential MLC has recently been cloned (34). In the chicken, it has been suggested that alternative splicing regulates the production of smooth muscle and nonmuscle myosin light chains (35).
To obtain more information on the structure and function of alkali MLCs from vertebrate smooth muscle and nonmuscle tissues in general, and on their tissue distribution and regulated expression in humans in particular, we have isolated cDNA clones coding for these MLC subunits.
We report here the isolation and structural analysis of human NM and SM myosin light chain clones derived from lymphoblast and heart aorta cDNA libraries, respectively. In addition, we present sequence information on the human NM/SM MLC gene. Its structural correspondence to both the NM and SM cDNA sequences indicates that alternative splicing of exon 6, which affects the 3' end of the coding region, determines whether the SM or the NM myosin light chain is produced.

MATERIALS AND METHODS
Screening of Genomic and cDNA Libraries. A human genomic library prepared in the cosmid vector pJB8 was plated and screened on nitrocellulose or GeneScreen Plus filters with the chicken cDNA clone MLCu (36) and the human cDNA clone hA5-13 (37). The probes were labeled with 32P by nick translation (38) to a specific radioactivity of 1 × 10' cpm/µg of DNA. Between 1 and 2 × 10⁶ cpm of probe was used for hybridization for 18 h at 65 °C in buffer containing 6× SSC (1× SSC is 0.15 M sodium chloride, 0.015 M sodium citrate), 5× Denhardt's solution (39), 0.1% SDS, 50 mM sodium phosphate, pH 6.4, and 20 µg of sonicated, denatured salmon sperm DNA. Filters were washed at low to moderate stringency in 2× SSC, 0.1% SDS for 0.5 h at 55 °C. The positive clone SL-1 was purified to homogeneity and chosen for further analysis. cDNA libraries were constructed in λgt11 from human skeletal muscle (37) and lymphoblasts (40) as described previously. For the isolation of MLC-related clones, screening was performed with the 32P-labeled 5-kb BamHI fragment of the cosmid SL-1 under nonstringent conditions, with final washes in 2× SSC, 0.1% SDS at 50 °C for 30 min. Successive screenings were performed with the cDNA clone GT10/1-4 (37). Positive clones were purified to homogeneity by repeated plating. A human cardiac aortic (SM) cDNA library, cloned in λgt11, was purchased from Clontech and screened with the previously isolated putative SM cDNA clone GT10/1-4 (37), which was radioactively labeled by random priming (reaction kit from Amersham Corp.) to a specific activity of 1-3 × 10' cpm/µg of DNA. Filters were washed at high stringency (0.1× SSC, 0.1% SDS for 30 min at 65 °C). Five positive clones were isolated after several rounds of screening, and the clone GT6 was chosen for sequence analysis.
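The stringency steps above differ mainly in salt concentration and wash temperature. As a rough, non-authoritative illustration, the sketch below converts the SSC strengths used here into Na+ molarity, assuming SSC is made with trisodium citrate (3 Na+ per citrate), and applies only the salt term, 16.6 log10[Na+], of the commonly used empirical Tm formula for long DNA duplexes. That term is usually quoted as valid for roughly 0.01-0.4 M Na+, so the sketch compares only the 2× and 0.1× SSC washes; it ignores temperature, probe length, base composition, and mismatches. The function names are illustrative only.

```python
import math

# Assumption: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate (3 Na+ per citrate).
NA_PER_1X_SSC = 0.15 + 3 * 0.015  # mol/l Na+, about 0.195 M


def sodium_molarity(ssc_fold):
    """Na+ concentration (M) of an SSC solution at the given strength."""
    return ssc_fold * NA_PER_1X_SSC


def salt_tm_shift(ssc_from, ssc_to):
    """Tm change (deg C) predicted by the 16.6*log10[Na+] salt term alone."""
    return 16.6 * math.log10(sodium_molarity(ssc_to) / sodium_molarity(ssc_from))


for fold in (6, 2, 0.1):
    print(f"{fold}x SSC: {sodium_molarity(fold):.4f} M Na+")
print(f"2x -> 0.1x SSC lowers Tm by about {-salt_tm_shift(2, 0.1):.1f} deg C")
```

Under these assumptions, moving from the 2× SSC wash to the 0.1× SSC wash lowers the predicted duplex Tm by roughly 20 °C, in addition to the effect of the higher wash temperature, which is consistent with only closely related sequences surviving the high-stringency washes.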
Restriction Endonuclease Mapping and DNA Blot Analysis. Restriction endonucleases were purchased from Boehringer Mannheim (FRG) and Bethesda Research Laboratories. Restriction analysis of the isolated genomic and cDNA clones was performed under the conditions recommended by the manufacturers and according to standard procedures (41). Southern blot analysis of cosmid DNA or of human genomic DNA isolated from normal leukocytes was performed on nitrocellulose as described (42). Hybridizing regions of each clone were mapped with appropriate cDNA probes and subcloned into pUC18 or M13mp18/19 vectors by standard procedures (43).

Nucleotide Sequence Analysis. Genomic and cDNA clones were sequenced by the dideoxynucleotide chain termination method of Sanger et al. (44) with [%]ATP (600 Ci/mmol). All fragments were subcloned either in pUC plasmids or in M13 phage vectors (43). The synthetic oligonucleotide GTCACCCCGACAGGATATGCCTCACAAACG was sequenced by the method of Maxam and Gilbert (45) following 5'-end labeling with [32P]ATP (3000 Ci/mmol).
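The 30-mer sequenced above is the oligonucleotide later used as the exon 6 probe. Which strand it represents relative to the mRNA is not stated in this section, and a probe detects an RNA only if it is complementary to it, so a minimal sketch of its basic properties shows both the given strand and its reverse complement. The helper names are hypothetical.

```python
OLIGO = "GTCACCCCGACAGGATATGCCTCACAAACG"  # 30-mer given in Materials and Methods

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G + C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


print(len(OLIGO), f"{gc_fraction(OLIGO):.1%}", reverse_complement(OLIGO))
```

The length check confirms that the printed sequence is the full 30-mer described for the Northern blots; the GC fraction is the usual starting point for choosing hybridization and wash temperatures.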
Isolation of RNA and Northern Blot Analysis. Total cellular RNA was prepared from HeLa cells and human uterus tissue by homogenization and precipitation in 7 M urea, 3 M LiCl as described elsewhere (46). For Northern blot analysis, 20 µg of RNA was denatured with glyoxal, run on 1% agarose gels, and transferred to Pall filters (Pall-Biodyne, Glen Cove, NY) according to the supplier's recommendations and as described by Thomas (47). Filters were hybridized with the 30-mer oligonucleotide, which had been synthesized on an automated synthesizer (Applied Biosystems) according to the C-terminal coding sequence of clone GT6. The oligonucleotide was labeled at the 5' end with [32P]ATP (3000 Ci/mmol), and hybridization was performed in 6× SSC, 5× Denhardt's solution, 0.1% SDS, 50% formamide at 4 °C overnight. Washing was done in 2× SSC, 0.5% SDS at 40 °C for 10 min. After exposure to x-ray film, the blot was stripped of the hybridization signals, rehybridized with the nick-translated lym4 cDNA probe, and washed in 0.1× SSC, 0.5% SDS for 30 min at 65 °C.

RESULTS

Isolation and Characterization of the Human Smooth Muscle and Nonmuscle MLC Gene and Corresponding cDNA Clones.
Because no homologous probes were available, our search for the human NM/SM MLC gene(s) was based on the assumption that a combination of MLC probes from chicken and man would be sufficiently similar in sequence to detect the gene. To this end, we screened a human genomic library prepared in the cosmid vector pJB8 with the heterologous chicken MLCn cDNA clone (36) and the partial human cDNA clone hA5-13 coding for the skeletal MLCH (37). This approach yielded the cosmid clone SL-1, containing approximately 40 kb of human DNA, which hybridized to the applied probes under conditions of low stringency but not under highly stringent conditions (Fig. 1A). Specifically, a 5-kb BamHI restriction fragment of clone SL-1 and, to a lesser extent, a 9-kb BamHI fragment cross-hybridized to both cDNA probes. These two BamHI fragments were subcloned in plasmid pUC8 for further analysis (Fig. 1, B and C). To identify which MLC isoform is encoded by this genomic sequence, we isolated corresponding human cDNAs that differed from the screening probe hA5-13. A λgt11 library of human skeletal muscle was screened with the 5-kb BamHI fragment of cosmid clone SL-1. The cDNA clone GT10/1-4, described previously (37), hybridized to this probe even under stringent conditions (0.1× SSC, 68 °C). In contrast, hybridization with well-defined human cDNA probes for the skeletal muscle MLC (48) and for the embryonic or atrial isoform MLC1emb (49) did not survive high-stringency washes. From these results we concluded that the gene isolated on cosmid SL-1 was homologous to GT10/1-4 and less related to the other MLC cDNA clones. The nucleotide sequence of the partial cDNA clone GT10/1-4 was determined and has been published previously (37).
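Fragments such as the 5-kb and 9-kb BamHI fragments of SL-1 arise from the positions of BamHI sites in the cloned DNA. BamHI recognizes GGATCC and cuts between the first G and GATCC. The sketch below computes fragment lengths for a complete digest of a linear DNA, using a hypothetical 26-bp toy sequence rather than any real SL-1 sequence, and ignoring the vector arms of the cosmid.

```python
BAMHI_SITE = "GGATCC"
CUT_OFFSET = 1  # BamHI cuts G^GATCC, i.e. after the first base of the site


def bamhi_fragments(seq):
    """Fragment lengths (top strand) of a linear DNA after complete BamHI digestion."""
    cuts = []
    i = seq.find(BAMHI_SITE)
    while i != -1:
        cuts.append(i + CUT_OFFSET)
        i = seq.find(BAMHI_SITE, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two BamHI sites (not the SL-1 sequence).
toy = "AAAA" + "GGATCC" + "T" * 8 + "GGATCC" + "CC"
print(bamhi_fragments(toy))
```

Mapping which of these fragments hybridizes to a probe, as done for SL-1, then localizes the gene to a defined region of the clone.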
Comparison of its deduced amino acid sequence with known MLC structures suggested that the protein encoded by clone GT10/1-4, and presumably also by the isolated gene, was highly related to a myosin light chain isolated from chicken gizzard and chicken fibroblasts (15, 35). With this observation in mind, we decided to isolate full-length cDNA clones coding for bona fide nonmuscle or smooth muscle MLCs. λgt11 libraries constructed from cultured human lymphoblasts (40) and from human cardiac aorta were screened using GT10/1-4 DNA and the 5-kb BamHI fragment of SL-1 as probes. The full-length clone isolated from the lymphoblast library was designated lym4 and was considered to be of nonmuscle origin. Similarly, the longest cDNA clone isolated from the smooth muscle library was purified and designated GT6.
Determination of Nucleotide Sequences of Clones lym4 and GT6. Map positions for several restriction sites were determined in both cDNA clones, GT6 and lym4 (Fig. 2). The complete cDNA inserts of the λ recombinants, as well as appropriate restriction fragments, were subcloned in M13 vectors according to the sequencing strategy outlined in Fig. 2. All sequences were determined on both strands by at least two independent sequencing runs on each fragment. As shown in Fig. 3, the cDNA insert of clone lym4 consists of 646 nucleotides plus a poly(A) tail. The sequence of 151 amino acids is encoded by 453 nucleotides, preceded by 31 nucleotides of 5'-untranslated sequence and followed by 170 noncoding nucleotides downstream of the termination codon. The polyadenylation signal is located approximately 20 nucleotides upstream of the poly(A) addition site. The smooth muscle clone GT6, although somewhat shorter at the 5' end, contains a remarkably homologous sequence, with the notable exception of 44 additional nucleotides inserted between nucleotides G-446 and A-447 (lym4 numbering). Both clones contain open reading frames for the same number of amino acids. In clone GT6, however, the codon for Glu-143 near the C terminus is disrupted by a distinct sequence block encoding the 9 C-terminal amino acids and the translational stop signal of the smooth muscle MLC. The inclusion of this 44-nucleotide block leads to five differences in the C-terminal amino acid sequence between GT6 and lym4. It also results in a longer 3'-untranslated sequence of 203 nucleotides in the smooth muscle MLC mRNA, as compared with the nonmuscle MLC (Fig. 3).
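The nucleotide counts in this paragraph can be cross-checked with simple arithmetic. The sketch assumes that the insertion point between G-446 and A-447 lies after the first base of the Glu-143 codon; this fits the fact that Glu codons are GAA or GAG, but the exact split is an assumption here, not a stated result.

```python
INSERT_NT = 44                  # SM-specific block, present only in GT6
FIRST_RESIDUE, LAST_RESIDUE = 143, 151
BASES_FROM_EXON5 = 1            # assumption: only G-446 of the Glu-143 codon precedes the insert
STOP_CODON_NT = 3

# Residues 143-151 are the C-terminal residues encoded with help of the insert.
c_terminal_residues = LAST_RESIDUE - FIRST_RESIDUE + 1
# Coding bases supplied by the insert: all 9 codons minus the one base from upstream.
coding_nt_in_insert = 3 * c_terminal_residues - BASES_FROM_EXON5
# Whatever remains after the coding bases and the stop codon is 3'-untranslated.
utr_nt_in_insert = INSERT_NT - coding_nt_in_insert - STOP_CODON_NT

print(c_terminal_residues, coding_nt_in_insert, utr_nt_in_insert)
```

Under this assumption the block splits into 26 coding nucleotides, a 3-nucleotide stop codon, and 15 nucleotides of 3'-untranslated sequence, which agrees with the 15 nucleotides given in the Discussion; a different split point inside the codon would shift these counts.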
The complete structural homology of both cDNA clones upstream and downstream of the additional sequence stretch in GT6 suggests that both mRNAs are transcribed from the same gene and that the GT6-specific sequence is introduced by alternative splicing of a separate exon. To test this idea, we determined the nucleotide sequence of the putative NM/SM MLC gene located on the cosmid clone SL-1. The positions of exons and introns were derived by comparing the gene sequence with both cDNA sequences.

Structural Organization of the Human NM/SM Gene. The nucleotide sequence of the human NM/SM gene was determined as outlined in Fig. 1 and is shown in Fig. 4. Although we have not precisely mapped the transcriptional start site of the NM/SM MLC gene, we tentatively marked the beginning of exon 1 by its sequence homology to the isolated cDNAs and to conserved sequences in two independently isolated processed pseudogenes (see below). The gene contains 7 exons and resembles the general structural organization of the MLC3 genes of fast skeletal muscle (18-20). As in other MLC3-type genes, the first exon encodes only the initiator methionine and the 5'-untranslated leader sequence. The introns are relatively small; in particular, introns 3, 4, and 5 are not much longer than the corresponding exons. The smooth muscle-specific 44 nucleotides observed in clone GT6 between nucleotides 436 and 481 are identical to a sequence located 305 nucleotides downstream of exon 5. This 44-bp block (underlined in Fig. 4) is flanked by perfect splice consensus sequences, (Py)n GCAG and GTACG, which define it as exon 6. When exon 6 is used, the entire sequence of exon 7 becomes 3'-noncoding sequence, because the first translational stop codon is located in exon 6. In this configuration, the sixth intron interrupts the 3'-noncoding region, which is also the case in the genes coding for the fast skeletal muscle MLC1/3 but is otherwise extremely rare. When exon 6 is spliced out, the translational stop signal encoded in exon 7 is used, and no intron interrupts the 3'-noncoding sequence. The genomic organization of the human NM/SM MLC gene strongly argues that the two MLC isoforms are generated by alternative splicing of exon 6.
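The two splicing pathways described above can be modeled in a few lines. The sketch below uses hypothetical toy exon sequences, not the real MLC gene: exon 5 ends with the first base of a Glu codon, and either exon 6 or exon 7 completes it. Translating each spliced mRNA from the first ATG shows which exon supplies the stop codon, and that exon 7 becomes entirely 3'-untranslated when exon 6 is included.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical toy exons, NOT the real MLC sequence. Exon 5 ends with the first
# base (G) of a Glu codon; exon 6 or exon 7 supplies the remaining two bases.
EXONS = {
    1: "CCATG",        # toy 5'-UTR plus the initiator ATG
    2: "GCAAAA",
    3: "GAT",
    4: "CTG",
    5: "GTTG",
    6: "AATGCTAACCC",  # completes GAA, then TGC (Cys), stop TAA, toy 3'-UTR
    7: "AGGCTTGATTT",  # completes GAG, then GCT (Ala), stop TGA, toy 3'-UTR
}


def splice(include_exon6):
    """Join exons in order; return the mRNA and each exon's (start, end) span."""
    order = [1, 2, 3, 4, 5] + ([6] if include_exon6 else []) + [7]
    mrna, spans = "", {}
    for n in order:
        spans[n] = (len(mrna), len(mrna) + len(EXONS[n]))
        mrna += EXONS[n]
    return mrna, spans


def translate_first_orf(mrna):
    """Translate from the first ATG; return (protein, index of the stop codon)."""
    start = mrna.find("ATG")
    if start < 0:
        return "", None
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            return "".join(protein), i
        protein.append(aa)
    return "".join(protein), None  # no in-frame stop codon found


def exon_at(spans, pos):
    """Number of the exon containing mRNA position pos."""
    return next(n for n, (lo, hi) in spans.items() if lo <= pos < hi)


for label, include in (("NM", False), ("SM", True)):
    mrna, spans = splice(include)
    protein, stop = translate_first_orf(mrna)
    print(label, protein, "stop codon in exon", exon_at(spans, stop))
```

In the toy SM transcript the stop codon falls in exon 6 and the whole of exon 7 lies downstream of it, so the junction left by intron 6 sits in the 3'-untranslated region; in the toy NM transcript the stop codon is read from exon 7.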

The Functional NM/SM MLC Gene Is Accompanied by Several Pseudogenes in the Human Genome. To estimate the total number of NM/SM MLC genes in the haploid human genome, human genomic DNA was digested with different restriction enzymes and analyzed by Southern blotting with lym4 cDNA as the gene-specific probe. As shown in Fig. 5, several hybridizing bands in all three restriction digests indicated the existence of more than one gene. The strongest bands correspond to the functional gene that we isolated on clone SL-1 and sequenced completely. The weaker signals presumably reflect somewhat less homologous sequences that nevertheless belong to the NM/SM MLC gene family. To understand the relationship of these different sequences, we cloned some of them and analyzed them by physical mapping and partial nucleotide sequencing. In all cases, the coding sequence detected by hybridization to lym4 cDNA was contained in a short contiguous stretch of DNA of no more than 600-1000 base pairs. Three of these isolates were sequenced and compared with the SL-1 gene structure. As shown in Fig. 6, two of the isolated clones contained sequences highly homologous (between 85 and 95%) to the functional NM/SM gene, beginning with exon 1 sequence; the third clone showed sequence homology starting in exon 3 (not shown). Most notably, all sequenced isolates contained the continuous mRNA sequence without interruption by intervening sequences. The sequence upstream of the putative exon 1 had diverged completely in both clones and contained none of the features of the upstream region of SL-1. One of the clones has lost the AUG initiation codon and contains no open reading frame. Based on these results, we suggest that most, if not all, of the additional NM/SM-like sequences in the human genome are processed pseudogenes with no obvious function.
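The 85-95% homology figures summarize position-by-position comparisons of aligned sequences. How the values were computed is not stated here, so the sketch below assumes one common definition: the percentage of identical bases over alignment columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Identity (%) over columns of a pairwise alignment where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy aligned sequences (hypothetical): one mismatch in ten positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Other definitions, such as dividing by the full alignment length including gaps, give lower values for gapped alignments, so identity percentages are only comparable when computed the same way.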
Tissue-specific Expression of MLC mRNA in Smooth Muscle and Nonmuscle Cells. To confirm the proposed expression of the two MLC isoforms in different tissues, an oligonucleotide representing the first 30 nucleotides of exon 6 was synthesized and used as a smooth muscle-specific probe. RNAs from HeLa cells (a nonmuscle source) and from human uterus tissue (a smooth muscle source) were isolated and analyzed on Northern blots. As shown in Fig. 7, when the entire cDNA insert of clone lym4 was used as a hybridization probe, a signal corresponding to an RNA of approximately 700 nucleotides was detected in both cell types. In contrast, when the same Northern blots were hybridized with the oligonucleotide specific for exon 6, only a very faint signal was obtained with RNA from HeLa cells, whereas RNA from uterus gave the same hybridization signal that was seen with the total cDNA probe. This result indicates that the 44-bp sequence represented by exon 6 is expressed only in smooth muscle RNA. The resolution of the Northern blot was not sufficient to detect the small difference in length between the two mRNAs. Had the oligonucleotide signal in uterus RNA been due to contaminating nonmuscle cells in the smooth muscle tissue, strong hybridization of the oligonucleotide to HeLa cell RNA would also have been expected. This observation confirms that similar but not identical MLC mRNAs are expressed in human nonmuscle and smooth muscle tissues and that the synthetic oligonucleotide can serve as a smooth muscle-specific probe.

Fig. 7 legend (partial): …specific oligonucleotide were used as indicated. The same blot was hybridized first to the oligonucleotide probe (specific activity 5 × 10⁶ cpm/µg) and, after removal of the hybridization signals, to the cDNA probe lym4 (specific activity 1 × 10⁸ cpm/µg). Exposure on x-ray film was overnight at −80 °C for both probes.
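The logic of the Northern blot experiment (a probe against shared sequence detects both mRNAs, while a probe against exon 6 detects only the transcript that includes exon 6) can be sketched as a perfect-complementarity check. The exon segments below are hypothetical toy sequences, not the real MLC mRNAs, and real hybridization tolerates some mismatches, which this exact-match test ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical toy segments (NOT the real MLC sequence).
SHARED_5PRIME = "ATGGCAAAAGATCTGGTTG"  # stands in for exons 1-5, shared by both mRNAs
EXON6 = "AATGCTAACCC"                  # SM-specific exon
EXON7 = "AGGCTTGATTT"                  # shared final exon

NM_MRNA = SHARED_5PRIME + EXON7
SM_MRNA = SHARED_5PRIME + EXON6 + EXON7

# Antisense probes: one against the start of exon 6, one against shared sequence.
exon6_probe = reverse_complement(EXON6[:8])
shared_probe = reverse_complement(SHARED_5PRIME[:8])


def hybridizes(probe, mrna):
    """True if the probe is perfectly complementary to a stretch of the mRNA."""
    return reverse_complement(probe) in mrna


for name, mrna in (("NM", NM_MRNA), ("SM", SM_MRNA)):
    print(name, hybridizes(shared_probe, mrna), hybridizes(exon6_probe, mrna))
```

In this toy model the shared probe mirrors the lym4 cDNA probe and the exon 6 probe mirrors the synthetic oligonucleotide, reproducing the pattern of a signal in both RNAs versus a signal only in the exon 6-containing RNA.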

DISCUSSION
The genes coding for the various alkali MLC proteins constitute small gene families in vertebrates (17). The mammalian alkali MLC family probably includes four different functional genes: the MLCu for fast skeletal muscle, the MLCI. for slow skeletal muscle and the ventricle, the MLC1emb for embryonic skeletal muscle, embryonic ventricle, and adult atrium, and the NM/SM MLC for nonmuscle and smooth muscle tissues. Whereas the genes encoding the striated muscle MLC isoforms have been studied in mouse (19), rat (20), chicken (18), and humans,² no information is available on the MLC genes coding for nonmuscle and smooth muscle isoforms in higher organisms.
We report here the isolation, characterization, and sequence analysis of a functional human MLC gene that is expressed in nonmuscle and smooth muscle cells. In addition, we describe and characterize two putative processed MLC pseudogenes present in the human genome. Furthermore, we demonstrate by cDNA cloning and detailed sequence analysis that two distinct mRNAs are expressed in nonmuscle and smooth muscle tissues, respectively, which differ only by 44 nucleotides comprising the coding sequence for the 9 C-terminal amino acids, the stop codon, and 15 nucleotides of 3'-untranslated sequence. Both mRNAs are presumably transcribed from the same gene and regulated by a post-transcriptional mechanism. This conclusion is based on the following observations. (i) Both cDNAs contain identical 5'- and 3'-untranslated regions, which generally differ if transcripts originate from different genes. (ii) The human genome most likely contains only one functional gene for the nonmuscle and smooth muscle MLCs; the additional gene sequences are processed pseudogenes. (iii) The SM-specific 44 nucleotides encoding the 9 C-terminal amino acids, the translational stop, and part of the 3'-noncoding region are located on the separate exon 6. This suggests that both cDNAs are generated from the same gene by excluding or including this exon during alternative splicing in nonmuscle and smooth muscle cells, respectively. A similar mechanism has been suggested for the generation of MLC isoforms in chicken fibroblasts and gizzard smooth muscle (35), based on a comparison of two cDNA structures but without analysis of the gene sequence. In contrast to our results in humans, in which exon 6 constitutes the specific sequence of the smooth muscle isoform and is absent from the nonmuscle MLC, the reverse situation seems to hold in chicken. According to Nabeshima et al. (35), the putative exon 6 sequence represents the fibroblast-specific C terminus of the protein and is absent from gizzard MLC mRNA. Although we cannot resolve this puzzling difference between the two organisms at present, we suggest that inclusion of exon 6 is an obligatory event in mammalian muscle cells, since all alkali MLC mRNAs studied so far include this sequence. Notably, use of exon 6 places an intron in the 3'-untranslated region, which has been reported only rarely for other genes. Examples are the fast muscle MLCu genes of mouse, rat, and chicken (18-20), the MLC1emb gene of mouse (50), and the small t-antigen of SV40 (51).
It has been reported previously that the synthesis of MLC1 and MLC3 in fast skeletal muscle involves alternative splicing events of two distinct primary transcripts, which presumably are present in the same myotube (18-20). Unlike this situation, the alternative splicing event leading to smooth muscle or nonmuscle MLC must occur in different cell types using

² S. Lenz, P. Lohse, U. Seidel, and H.-H. Arnold, unpublished results.