Cloning and expression of a smooth muscle caldesmon.

Caldesmon is a smooth muscle and nonmuscle regulatory protein that interacts with actin, myosin, tropomyosin, and calmodulin. Two overlapping clones, isolated from a chicken oviduct cDNA plasmid library and a chicken gizzard cDNA lambda NM1149 library, were used to generate a 4108-base pair sequence coding for one caldesmon. Expression of the coding sequence confirms this is one of the large smooth muscle caldesmons. The deduced protein molecular weight is 86,974, significantly less than the molecular weights estimated by sodium dodecyl sulfate gel electrophoresis. The protein has a high content of Glu, Lys, Arg, and Ala; there are two cysteine residues, one at either end of the molecule. Comparison with the Protein Identification Resource database demonstrates a similarity with a tropomyosin binding domain of troponin T, but none with any calmodulin or actin binding proteins. The center of the protein has an 8-fold repeat of a 13-amino acid sequence whose general motif is -Glu3-(Lys/Arg)2-Ala2-Glu2-(Lys/Arg)1-X-(Lys/Arg)1-Ala1-, where X is Glu, Gln, or Ala. Comparison with peptide sequences from a chymotryptic fragment that binds actin and calmodulin places this domain on the C terminus of caldesmon adjacent to the troponin T similarity. A tentative map of the major binding domains is proposed on the basis of available data.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) 504968.
To whom correspondence should be addressed: Dept. of Cell Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030. email: jbryan@bem.tmc.edu.
The function of caldesmon is unclear. In reconstitution experiments, caldesmon inhibits the actin-activated ATPase of myosin (19-21). This inhibition is attenuated by calcium-calmodulin and is potentiated by tropomyosin. Several groups have reported that caldesmon will interact directly with tropomyosin (22, 23). The mechanism of the ATPase inhibition is complex (24, 25), and there are several reports that smooth muscle myosin binds directly to caldesmon (25-27). Ikebe and Reardon (26) have reported that smooth muscle heavy meromyosin and myosin subfragment 2, but not myosin subfragment 1, are specifically retained on a caldesmon-Sepharose column.
Other reports have aimed to define the organization of the actin, calmodulin, tropomyosin, and myosin binding sites on the caldesmon molecule. Proteolytic cleavage experiments (28-30) have localized the actin and calmodulin binding sites to a fragment with an estimated molecular weight of 38,000-40,000. On the basis of expression experiments using a cDNA that encoded part of caldesmon, we argued that these two binding domains were on the C-terminal end of the protein (29). Recently Riseman et al. (31) have used partial cleavage at and labeling of two cysteines to establish this point more firmly. Sutherland and Walsh (27) have employed a similar strategy to argue that the myosin binding domain is on the amino-terminal half, although these authors arrive at a quite different peptide order than Riseman et al. (31). Finally, Wang (32) has labeled the cysteine residues with a photolabile reagent and shown that caldesmon can bind two calmodulin molecules, one at either end.
We have cloned, sequenced, and expressed one of the high molecular weight caldesmons from chicken smooth muscle. The deduced protein sequence coupled with peptide sequencing confirms the localization of the actin-calmodulin binding fragment to the C terminus and confirms that the two cysteine residues are at opposite ends of the protein, in agreement with the results of Mornet et al. (33), Wang (32), and Riseman et al. (31). The deduced molecular weight indicates that the smooth muscle forms are markedly smaller than estimated from SDS gels and significantly less than determined by sedimentation equilibrium, 93,000 ± 4,000 versus 86,974. In addition, the deduced protein sequence reveals a similarity to a tropomyosin binding site on troponin T, suggesting the location of the tropomyosin binding domain. Finally, the central region of the molecule contains a unique repetitive sequence of unknown function. The organization of sites on the extended molecule suggests that caldesmon would be well suited to act as a bridge between myosin and actin filaments.

EXPERIMENTAL PROCEDURES
Materials-Restriction enzymes were from New England BioLabs; Sequenase was from United States Biochemical Corp., and common chemicals were from Sigma.
DNA Sequencing-The M13 dideoxy procedure was used to sequence DNA fragments cloned into M13mp18, M13mp19, or Phagescript (Stratagene, La Jolla, CA). All cloning procedures were performed as described by Maniatis et al. (34). Sequenase was used following the manufacturer's directions.
Peptide Sequencing-Chicken gizzard caldesmon was isolated as described by Bretscher (2); the molecular weight 40,000 calmodulin actin-binding peptide was isolated as described by Fujii et al. (29). The NH2-terminal 20 amino acids of this chymotryptic peptide were determined using the Applied Biosystems Inc. (Foster City, CA) model 470-A gas phase protein sequencer equipped with an in-line model 120-A phenylthiohydantoin analyzer.
Vector Construction-pLc-β-globin was obtained from K. Nagai (Medical Research Council, Cambridge, United Kingdom). The details of this vector are given by Nagai and Thogersen (35). This plasmid contains a β-globin cDNA insert between BamHI and HindIII sites. The construction of a full-length caldesmon expression vector, designated pCDM314, was done as follows. Starting with a Phagescript clone containing the caldesmon coding sequence, CDM57 in Fig. 1, we isolated a 1.8-kilobase BamHI-HindIII fragment containing the 5' end of caldesmon. This was subcloned into pLcII obtained by restricting pLc-β-globin with BamHI and HindIII. The resulting intermediate was restricted with HindIII, filled with deoxynucleotide triphosphates using the Klenow fragment of DNA polymerase, and then restricted with XhoI. A 2450-base pair XhoI-SmaI fragment was isolated from a pUC19 vector carrying CDM57 and subcloned into this intermediate. This produced a plasmid carrying the full coding sequence. This plasmid was restricted with BamHI and dephosphorylated using calf intestinal alkaline phosphatase. The FX-oligonucleotide (see Ref. 35), GATCCATCGAGGGTAGGCCTACCCTCGATG, carrying a StuI site, was cloned into the dephosphorylated BamHI site of this vector. The final expression vector, pLc-CDM314, was obtained by restricting the plasmid carrying the FX-oligo at its unique NcoI site, at the ATG start codon, blunting with mung bean nuclease, restricting with StuI, and religating. All of these manipulations were done in the 2 9 4 0 cell strain described by Shatzman and Rosenberg (36). This carries a defective λ lysogen with a wild type λ repressor. For expression, the final plasmid was transformed into either N5151 cells (36) or QY13 cells (35) carrying a defective λ lysogen with a temperature-sensitive λ repressor. Cells were grown to an OD of 1 at 30 °C, and 1-ml aliquots were heat-shocked at 43 °C for 20 min and then grown for an additional 3 h at 37 °C. Cells were pelleted in a Microfuge for 3 min and resuspended in 500 µl of SDS sample buffer for gel electrophoresis. Electrophoresis and immunoblotting were done as described by Fujii et al. (29). The antibodies used have been described by Dingus et al. (13) and Fujii et al. (29).
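The properties of the FX-oligonucleotide that the construction relies on can be checked directly from its sequence. The following is a minimal sketch, not part of the original methods; it uses the oligonucleotide as printed (with the line-break hyphen removed) and assumes that two copies anneal to leave 4-nucleotide 5'-GATC overhangs compatible with a BamHI site, and that the reading frame begins at the sixth nucleotide. The helper names are hypothetical.

```python
# FX-oligonucleotide as given in the text (line-break hyphen removed).
FX_OLIGO = "GATCCATCGAGGGTAGGCCTACCCTCGATG"
STUI_SITE = "AGGCCT"  # StuI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def duplex_core(oligo: str, overhang: int = 4) -> str:
    """Part of the oligo expected to base-pair when two copies anneal,
    ASSUMING each strand keeps a 5' overhang of `overhang` nucleotides."""
    return oligo[overhang:]

# Only the four codons needed for this check (standard genetic code).
CODONS = {"ATC": "Ile", "GAG": "Glu", "GGT": "Gly", "AGG": "Arg"}

def translate_window(oligo: str, start: int, n_codons: int) -> list[str]:
    """Translate n_codons codons beginning at 0-based index `start`."""
    return [CODONS[oligo[i:i + 3]] for i in range(start, start + 3 * n_codons, 3)]

core = duplex_core(FX_OLIGO)
is_self_complementary = core == reverse_complement(core)
stui_position = FX_OLIGO.find(STUI_SITE) + 1          # 1-based position
# ASSUMED frame: codons start right after the leading GATCC.
factor_x_site = translate_window(FX_OLIGO, 5, 4)

print(is_self_complementary, stui_position, "-".join(factor_x_site))
```

Under these assumptions the paired region is its own reverse complement, so a single oligonucleotide can form the duplex, and the four codons preceding the StuI site read Ile-Glu-Gly-Arg, the factor X recognition sequence referred to in the text.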

RESULTS
We previously reported the cloning of a cDNA coding for a part of chicken gizzard caldesmon. This was obtained by screening a λgt11 library with anti-caldesmon antibodies (29). This fragment was used to screen a chicken oviduct library constructed using the Okayama-Berg vector (37) and a chicken gizzard library constructed in the λ phage NM1149 (38). The relationship of the two clones isolated from these libraries, designated CDM2 and CDM57, respectively, is shown in the restriction map in Fig. 1.
FIG. 1. The relationship of the two clones sequenced in this study is shown along with a partial restriction map. CDM2 was isolated from a chicken oviduct cDNA library constructed using the Okayama-Berg plasmid vector; CDM57 was isolated from a λ NM1149 chicken gizzard cDNA library. The position of the caldesmon open reading frame is shown for reference. aa, amino acid.
The longest clone from the plasmid library contained a 2606-base pair insert which terminated at a HindIII site at its 5' end and at a poly(A) tract at its 3' end. Because the complete sequence shows two closely spaced HindIII sites at this location, we assume the mRNA/cDNA heteropolymers were cleaved by HindIII during construction of the library, resulting in premature termination. CDM2 includes 1004 base pairs of coding sequence and 1602 base pairs of 3'-untranslated sequence.
The longest clones isolated from the NM1149 library, all approximately 2500-2600 base pairs, were initiated by oligo(dT) primers hybridized to a stretch of 12 internal A residues starting at position 2541, which is located about 35 base pairs inside the 3'-untranslated region. The sequences of CDM2 and CDM57 overlap for 1038 base pairs (Fig. 1), positions 1502-2540 in Fig. 2. We have found only a single base pair difference at position 2115, a T in CDM2 and a C in CDM57. This change does not lead to an amino acid change in the deduced protein sequence. The longest cDNAs isolated from the λ NM1149 library contain the entire coding sequence but lack most of the 3'-untranslated region.
Polyadenylation Signals-The 3'-untranslated region has two AATAAA polyadenylation signals at positions 3514 and 4085, respectively. In poly(A+) mRNA from chicken gizzard we detect a major 4100-nucleotide species and a minor, estimated <5%, 3500-nucleotide RNA on Northern blots (data not shown).
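Locating AATAAA signals of this kind is a simple exact-match scan. The sketch below is illustrative only: the function name and the short test sequence are hypothetical, not the caldesmon 3'-untranslated region, and positions are reported 1-based to match the numbering convention used in the text.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return the 1-based start position of every (possibly overlapping) match."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)
    return positions

# HYPOTHETICAL toy sequence carrying two signals; not taken from the paper.
toy_utr = "GCTTAATAAAGGCCTTTTTTCCAATAAATG"
signal_positions = find_motif(toy_utr, "AATAAA")
print(signal_positions)
```

Applied to the full cDNA sequence, a scan like this would be expected to return the two positions quoted above, each lying a short distance upstream of the 3' end of one of the two RNA species seen on Northern blots.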
Deduced Protein Sequence-The longest open reading frame (Fig. 2) begins with a good Kozak (39) consensus sequence at position 235 and terminates at position 2503, specifying a protein of 756 amino acid residues. The calculated molecular weight is 86,974; the amino acid composition, shown in Table I, is in excellent agreement with the recent report from Graceffa et al. (16). Caldesmon has an unusually high content of glutamic acid and lysine, 22 and 15%, respectively, and a low content of aromatic amino acids. The calculated pI is 5.07; the calculated partial specific volume is 0.748 g/ml. In agreement with the protein chemical evidence (16, 31-33), caldesmon has two cysteine residues, which are located at positions 153 and 580, respectively, in the amino- and C-terminal halves of caldesmon. The predicted molecular weight is markedly less than the values, Mr 120,000-150,000, determined by SDS gel electrophoresis and is approximately 6,000 less than the sedimentation equilibrium determination, 93,000 ± 4,000, reported by Graceffa et al. (16).
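The coordinates and masses quoted in this paragraph can be cross-checked with simple arithmetic. This is a sketch under one stated assumption: that "terminates at position 2503" refers to the first nucleotide of the stop codon, so that the 756 codons occupy positions 235 through 2502. The variable names are hypothetical.

```python
ORF_START = 235        # first nucleotide of the initiator codon (from the text)
TERMINATION = 2503     # ASSUMED to be the first nucleotide of the stop codon
N_RESIDUES = 756       # deduced protein length (from the text)
DEDUCED_MW = 86_974    # calculated molecular weight (from the text)
SED_EQ_MW = 93_000     # sedimentation equilibrium value, +/- 4,000 (from the text)

coding_nt = 3 * N_RESIDUES                     # nucleotides in 756 codons
first_nt_after_coding = ORF_START + coding_nt  # should coincide with the stop codon
mean_residue_mass = DEDUCED_MW / N_RESIDUES    # average mass per residue
mw_shortfall = SED_EQ_MW - DEDUCED_MW          # "approximately 6,000 less"

print(coding_nt, first_nt_after_coding, round(mean_residue_mass, 1), mw_shortfall)
```

With that reading of the termination position, the start coordinate, the residue count, and the end coordinate are mutually consistent, and the difference from the sedimentation equilibrium value comes to 6,026.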
The calculated extinction coefficient for caldesmon is ε = 33,270 M⁻¹ cm⁻¹, derived using the extinction coefficients for Trp (ε = 5850 M⁻¹ cm⁻¹) and Tyr (ε = 1340 M⁻¹ cm⁻¹) given by Magne et al. (40). The corresponding absorbance of a 1% solution, E1%, is 3.8, in agreement with the value reported by Graceffa et al. (16), which was obtained by determining protein concentration from the refractive index measured in the analytical ultracentrifuge.
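The extinction coefficient calculation can be reproduced as a sum over aromatic residues. In this sketch the five tryptophan residues are taken from the Discussion; the tyrosine count of three is an assumption, inferred here because it is the count that reproduces the quoted total, and is not stated in the text. The 1% absorbance is computed for a 10 g/liter solution in a 1-cm path.

```python
EPS_TRP = 5850      # M^-1 cm^-1, value quoted from Magne et al. (40)
EPS_TYR = 1340      # M^-1 cm^-1, value quoted from Magne et al. (40)
N_TRP = 5           # tryptophan count mentioned in the Discussion
N_TYR = 3           # ASSUMPTION: inferred so that the total matches 33,270
DEDUCED_MW = 86_974 # calculated molecular weight (g/mol)

def molar_extinction(n_trp: int, n_tyr: int) -> int:
    """Additive estimate: each Trp and Tyr contributes its own coefficient."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR

def absorbance_1_percent(epsilon: float, molecular_weight: float) -> float:
    """Absorbance of a 1% (10 g/liter) solution in a 1-cm cell."""
    return epsilon * 10.0 / molecular_weight

epsilon = molar_extinction(N_TRP, N_TYR)
e_1pct = absorbance_1_percent(epsilon, DEDUCED_MW)
print(epsilon, round(e_1pct, 1))
```

The low value reflects the scarcity of aromatic residues noted above; most of the absorbance comes from the tryptophans.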
The amino acid sequences in the two underlined regions in Fig. 2 have been determined using protein sequencing methods. We determined the first sequence, beginning at Asp451, from the amino terminus of the chymotryptic peptide described by Szpacenko and Dabrowska et al. ...
... protein sequence with itself was determined using an optimized search algorithm (53, 54). The window size was 10 residues; a unit cost matrix was employed that assigns a match a value of 1 and a mismatch a value of 0. Tm, tropomyosin.

These are located on the amino-terminal half of the deduced sequence.
Sequence Similarities and Structural Predictions-An analysis of the caldesmon protein sequence using a comparison matrix reveals the remarkable centrosymmetric pattern shown in Fig. 3 can be extended somewhat, particularly toward the C terminus, with the insertion of some padding but becomes increasingly degenerate.
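A self-comparison matrix of the kind described (10-residue window, match = 1, mismatch = 0) can be sketched in a few lines. This is a plain brute-force version, not the optimized search algorithm cited in the figure legend. The score threshold is an assumption, and the test sequence is a hypothetical construct: four copies of one 13-residue instance of the repeat motif given in the abstract, flanked by arbitrary residues.

```python
from collections import Counter

def self_comparison(seq: str, window: int = 10, threshold: int = 8):
    """Score every pair of windows (i < j) with a unit matrix
    (match = 1, mismatch = 0); keep pairs scoring >= threshold.
    The threshold of 8 is an ASSUMPTION, not a value from the paper."""
    n = len(seq) - window + 1
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            score = sum(a == b for a, b in zip(seq[i:i + window], seq[j:j + window]))
            if score >= threshold:
                hits.append((i, j, score))
    return hits

# HYPOTHETICAL test sequence: one instance of the 13-residue motif
# Glu3-(Lys/Arg)2-Ala2-Glu2-(Lys/Arg)-X-(Lys/Arg)-Ala, repeated four times.
UNIT = "EEEKKAAEEKEKA"
toy = "MSG" + UNIT * 4 + "PLT"

hits = self_comparison(toy)
# Diagonals of perfect matches reveal the repeat period.
perfect_offsets = Counter(j - i for i, j, score in hits if score == 10)
print(sorted(perfect_offsets))
```

In a plot of such hits, a tandem repeat appears as lines parallel to the main diagonal, spaced at multiples of the repeat length; for this toy sequence the perfect-match diagonals fall at offsets of 13, 26, and 39 residues.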
A search of the Protein Identification Resource database using the predicted caldesmon sequence indicates a similarity with all of the published troponin T (TnT) sequences. The specific region identified in rabbit skeletal muscle TnT is G l P to LysI4'. This sequence is in a TnT peptide which has been identified as a calcium-insensitive skeletal α-tropomyosin binding site (41, 42). The region of similarity in caldesmon is the weakly repetitive region, Glum" to Lys"', in Figs. 2 and 3. This similarity is shown in more detail in Fig. 5. The overall identity is 43%, 25 amino acids over a stretch of 58 residues.
More striking is the similarity of the distribution of acidic and basic residues indicated by the + and - symbols. This repeat region in TnT has been analyzed in some detail by Parry (43). This sequence is in the head region and is a candidate for the site of tropomyosin binding to caldesmon identified by Graceffa (22) and localized to the head region by Fujii et al. (23).
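Both measures used in this comparison, percent identity and the pattern of charged residues, are easy to compute for a pair of aligned segments. The sketch below uses two hypothetical 13-residue peptides, not the caldesmon or troponin T sequences, and treats Asp/Glu as "-" and Lys/Arg as "+"; the function names are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (equal lengths, no gaps)."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def charge_pattern(seq: str) -> str:
    """Map acidic residues (D, E) to '-', basic residues (K, R) to '+', others to '.'."""
    return "".join("-" if r in "DE" else "+" if r in "KR" else "." for r in seq)

# HYPOTHETICAL aligned peptides for illustration only.
pep_a = "EEEKKAAEEKEKA"
pep_b = "DEERRAVEDKQRA"

identity = percent_identity(pep_a, pep_b)
pattern_a = charge_pattern(pep_a)
pattern_b = charge_pattern(pep_b)
pattern_agreement = percent_identity(pattern_a, pattern_b)

# The figure quoted in the text: 25 identical residues over 58 positions.
quoted_identity = 100.0 * 25 / 58
print(round(identity, 1), pattern_a, pattern_b, round(pattern_agreement, 1),
      round(quoted_identity))
```

The toy pair shows how two segments can share under half of their residues while their charge patterns agree at nearly every position, which is the sense in which the charge distribution can be more striking than the identity score.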
The Garnier-Osguthorpe-Robson (44) and other secondary structure prediction algorithms suggest that caldesmon could have a significant α-helical content; the predicted values are near 80-85%. The central repeated region is predicted by all of the algorithms to be helical, but how much would be stable in solution in an extended protein like caldesmon is not clear. The estimate of α-helix content by Lynch et al. (17) using circular dichroism is relatively low, about 10%. The stability of caldesmon at high temperatures would appear to be consistent with a low helix content.
In Vitro Expression-It could be argued, on the basis of apparent molecular weights, that we have cloned the smaller nonmuscle form of caldesmon. To evaluate this possibility, we have subcloned the coding region into the pLcII expression plasmid described by Nagai and Thogersen (35). The fusion protein coded by this construct, pLc-CDM314, contains 31 amino acids from the λ cII gene, three from a BamHI linker, four from the factor X recognition site, and 755 from caldesmon. The expected molecular weight is about 91,000. Two cell strains, QY13 and N5151 (35, 36), carrying defective λ lysogens and a temperature-sensitive λ repressor were transformed with this construct and grown at 30 °C. Expression was induced by heat shock at 43 °C for 20 min followed by growth at 37 °C. After 3 h, cells were collected and prepared for SDS gel electrophoresis. Caldesmon expression was detected by blotting with anti-caldesmon antibodies as described by Fujii et al. (29). The results are given in Fig. 6. The largest polypeptide had an apparent molecular weight slightly larger than chicken gizzard caldesmon. This is clearly greater than expected from the cloned sequence but is consistent with a fusion protein of the cII fragment and the smooth muscle form of caldesmon. The origin of a second molecular weight 40,000 species is not clear, but it may be a proteolytic fragment because caldesmon is known to be quite sensitive to proteases, and we have taken no precautions to avoid proteolysis.
FIG. 6. Expression of a cII-caldesmon fusion protein using pLc-CDM314. QY13 and N5151 cells were transformed with pLc-CDM314 as described under "Experimental Procedures" and grown to an OD of 1.0 at 30 °C. 1-ml aliquots were maintained at 30 °C or heat-shocked at 43 °C for 20 min and then grown for 3 h at 37 °C. Cells were harvested in a microcentrifuge, dissolved in SDS sample buffer, and subjected to electrophoresis in 7.5% polyacrylamide gels containing 0.1% SDS. Immunoblots were done as described by ...

DISCUSSION
We have sequenced two overlapping cDNAs coding for one caldesmon and have provided evidence that it is one of the "large" smooth muscle forms of this protein. We found a single base pair difference in the overlapping sequence which does not change the amino acid sequence. The sequence for CDM57 is given in Fig. 2 with a C in position 2115. It is not clear whether this single base change means the clone from the Okayama-Berg chicken oviduct library is a second isoform, a polymorphism, or the result of an artifact introduced during reverse transcription and cloning. The deduced protein sequence confirms our earlier speculative positioning of the actin-calmodulin binding head region at the C terminus of caldesmon. The sequence allows us to resolve some of the conflicting reports about this protein and begin to order the binding domains. To facilitate the discussion and to begin to relate known caldesmon protein chemistry to the sequence, we have summarized some of the available data in Fig. 7. Chymotryptic cleavage defines the head region, whereas cleavage at the cysteine residues produces three peptides designated I-III. The assignments of the individual binding domains illustrated in Fig. 7 are discussed below.
General Properties and Molecular Weight Problems-Caldesmon has a high content of glutamic acid, lysine, and arginine and is low in aromatic amino acids. The acidic and basic residues are asymmetrically distributed, with peptides I and II being acidic overall, whereas the head region and peptide III are basic in character, which agrees with the observation that the head region is not retained on DEAE resins. The protein has a calculated isoelectric point of 5.07, assuming the amino terminus is blocked.
All of the estimated molecular weights for caldesmon determined by SDS gel electrophoresis differ significantly from the expected values. A precise reason for this disagreement is not known, but Takano et al. (46) ... molecular weight of 34,000 and is approximately 39% of the entire molecule. The fact that the head region and peptide III estimates are in error, even though they are not particularly rich in acidic groups, suggests that the slower mobility is due to something more than simple charge. These errors increase the difficulty of mapping proteolytic and CNBr fragments. It is difficult, for example, to correlate the recent CNBr map of Ball and Kovala (48) with the expected patterns. These authors identify and order four (or five) unique fragments with estimated molecular weights in SDS greater than 16,000, whereas the sequence predicts a total of eight peptides. It is quite likely that the 17-residue amino-terminal peptide, and possibly both of the C-terminal CNBr peptides with 52 and 58 residues and acidic amino acid contents of less than 5%, would not have been resolved on the 6-15% SDS gel system used. On the positive side, the predicted sequence indicates that the head region is nicely broken up into four CNBr fragments that should prove very useful in mapping the actin, tropomyosin, and calmodulin binding domains more exactly using protein biochemical methods.
Actin-Calmodulin Binding Domain-Sequence comparison has not been particularly informative with respect to the location of the actin and calmodulin binding sites. A straight similarity search of the Protein Identification Resource database with the head region and with peptide III, looking for actin and high affinity calmodulin binding sequences, has not produced any significant matches that narrow the location of the binding domains to a region smaller than peptide III. This fragment presumably corresponds to the ~20-kDa binding peptide described by Szpacenko and Dabrowska (28) and by Fujii et al. (29). Some speculation is possible; Shirinsky et al. (8) have used changes in intrinsic tryptophan fluorescence to study calmodulin-caldesmon interactions. Peptide III contains three of the five tryptophan residues in caldesmon and both are near the C terminus, suggesting that the calmodulin binding site is C-terminal of the actin binding site. We are in the process of constructing expression vectors to produce caldesmons truncated at their C termini to test this notion.
Recently, Wang (32) has argued that caldesmon can bind two calmodulin molecules and that there is a second lower affinity binding site near the amino-terminal cysteine. Wang has confirmed this by isolating and sequencing a CNBr fragment that binds to calmodulin affinity columns; comparison with the protein sequence shows this second calmodulin binding peptide begins at Metla (Footnote 2) and includes the first cysteine residue.
Tropomyosin Binding Domain-Graceffa (22) has shown that caldesmon will interact directly with tropomyosin, and Fujii et al. (23) have provided evidence that binding is to the head region. The similarity of the amino terminus of the head region, Glu508 to ..., with a tropomyosin binding region of troponin T suggests that this is the most likely site for tropomyosin binding. However, this conclusion is not straightforward. Troponin T has been reported to have two tropomyosin binding sites: one Ca2+-sensitive site that binds near Cys190 of rabbit α-skeletal muscle tropomyosin, about two-thirds of the way along the molecule, and a Ca2+-insensitive site that binds near the C terminus, possibly in the tropomyosin head-to-tail overlap. These two binding domains have been identified by cyanogen bromide and tryptic fragmentation studies (41, 42). The Ca2+-sensitive site is on a tryptic fragment, TnT-T2, residues 159-259; the Ca2+-insensitive site is on a cyanogen bromide fragment, TnT-CB2, residues 71-151. TnT-CB2 encompasses the region of similarity with caldesmon. Pearlstone and Smillie (41) have reported that TnT-CB2 does not bind to chicken gizzard or platelet tropomyosin at moderate salt concentrations, but TnT-T2 does. It is unclear therefore whether the Ca2+-insensitive binding domain of troponin T, the region with sequence similarity to caldesmon, actually has a binding site on smooth muscle tropomyosin. However, this region appears to be the best candidate for a tropomyosin binding domain in caldesmon. From the binding data and the sequence it is possible to make the simple prediction that peptide III will not bind to tropomyosin if the troponin T-like sequence is important for binding, because the second cysteine group is distal to the similarity.
2 C. A. Wang, personal communication.
Myosin Binding Domain-Our placement of the myosin binding domain is the most speculative. Several reports have shown that the whole molecule will bind to smooth muscle myosin affinity columns (25-27). Sutherland and Walsh (27) have shown that one of the two smaller fragments, peptides I and III in Fig. 7, obtained by cleavage with 2-nitro-5-thiocyanobenzoic acid will bind to myosin-Sepharose. As discussed above, Riseman et al. (31) demonstrated that peptide III binds to actin and calmodulin, and a comparison of the two papers indicates that the larger of the two peptides, peptide I, binds to myosin. The caveat is that Sutherland and Walsh (27) have arrived at a completely different map for peptides I-III that cannot be reconciled with the deduced sequence.
Phosphorylation Sites and Kinase Activity-Scott-Woo and Walsh (49) have presented evidence that partially purified caldesmon preparations will autophosphorylate. These observations have led to the suggestion that caldesmon is a protein-serine/threonine kinase that can self-phosphorylate in the presence of Ca2+-calmodulin. We have examined caldesmon for two sequences, -Asp-Leu-Lys-Pro-Glu-Asn- and -Gly-Thr-Pro-Glu-Tyr-Leu-Ala-Pro-Glu- (50), conserved in several members of the protein-serine kinase family, and for -Asp-Leu-Arg-Ala-Ala-Asn- and -Asp-Leu-Ala-Ala-Arg-Asn-, conserved sequences in tyrosine kinases (51). We find no matching sequence to support directly the idea that caldesmon is a protein-serine kinase. In addition, we have looked for potential sites of protein phosphorylation. A search with the consensus substrate sequence for the multifunctional calmodulin-dependent kinase, -Arg-X-Y-Ser/Thr-, reveals one site beginning at Arg466-Glu-Leu-Thr. This potential phosphorylation site is not clearly related to any of the established binding domains.
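A motif search of this kind amounts to exact-string and pattern matching on the one-letter sequence. The following is a minimal sketch, not the search procedure actually used: the four kinase motifs are the sequences listed above converted to one-letter code, the substrate pattern is written as Arg-X-X-Ser/Thr (the form matched by the Arg-Glu-Leu-Thr site named in the text), and the test peptide is hypothetical.

```python
import re

# Conserved kinase sequences from the text, in one-letter code.
KINASE_MOTIFS = {
    "protein-serine kinase (Asp-Leu-Lys-Pro-Glu-Asn)": "DLKPEN",
    "protein-serine kinase (Gly-Thr-Pro-Glu-Tyr-Leu-Ala-Pro-Glu)": "GTPEYLAPE",
    "tyrosine kinase (Asp-Leu-Arg-Ala-Ala-Asn)": "DLRAAN",
    "tyrosine kinase (Asp-Leu-Ala-Ala-Arg-Asn)": "DLAARN",
}

# ASSUMED substrate pattern: Arg, any two residues, then Ser or Thr.
# The lookahead allows overlapping candidate sites to be reported.
SUBSTRATE_SITE = re.compile(r"(?=(R..[ST]))")

def scan(seq: str) -> dict:
    """Return 1-based positions of kinase motifs and candidate substrate sites."""
    result = {
        name: [m.start() + 1 for m in re.finditer(re.escape(motif), seq)]
        for name, motif in KINASE_MOTIFS.items()
    }
    result["substrate sites"] = [
        (m.start() + 1, m.group(1)) for m in SUBSTRATE_SITE.finditer(seq)
    ]
    return result

# HYPOTHETICAL peptide used only to exercise the scan.
toy_peptide = "MAAGRELTPKDEEK"
report = scan(toy_peptide)
print(report["substrate sites"])
```

Because the substrate pattern is short and degenerate, a match identifies only a candidate site; it says nothing about whether the residue is phosphorylated in the protein.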
Central Repetitive Sequence-The central repetitive region is intriguing. Secondary structure prediction algorithms indicate this structure should be α-helical. This would agree with the elongated shape of caldesmon observed in the electron microscope and would suggest that the repetitive region serves as a helical rod to separate the two more globular ends, one of which binds to myosin and the other to actin. This organization, if correct, would make caldesmon an ideal molecule to bridge actin and myosin filaments in smooth muscle during the "latch" state (52).
