Protein, cDNA, and Genomic DNA Sequences of the Towel Gourd Trypsin Inhibitor

Two trypsin inhibitor components of the squash family were isolated and purified from the juice of the towel gourd (Luffa cylindrica) using anhydrotrypsin affinity chromatography followed by high pressure liquid chromatography. The inhibitors were sequenced and found to consist of 28 and 29 amino acid residues. The determined sequences show high similarity to other inhibitors of the squash family, especially in the location of the disulfide bonds and the reactive site and in the COOH-terminal region. A cDNA library of towel gourd was constructed and used as a template for polymerase chain reaction amplification of two cDNA fragments of the inhibitor with an overlapping sequence. A full-length cDNA sequence coding for the inhibitor was then completed. The open reading frame codes for a prepro-inhibitor protein with the pre- and pro-peptides consisting of 21 and 13 residues, respectively. The deduced amino acid sequence of 29 residues for the inhibitor is consistent with that determined by primary structure analysis. The genomic sequence of the mature inhibitor was also ascertained using the total DNA of the towel gourd as a polymerase chain reaction template. The genomic sequence is completely identical with that of the cDNA, showing no intervening sequence.

Protease inhibitors are ubiquitous in the animal, plant, and microorganism kingdoms; they constitute an important group of proteins (1)(2)(3). The plant protease inhibitors are now drawing great attention and interest as they may play an important role in the defense strategy of plants against insect predators by reducing the digestibility and nutritional quality of the leaves (4,5). Recent studies have shown that transformation of tobacco plants with a gene encoding the cowpea trypsin inhibitor or potato inhibitor conferred increased resistance against herbivorous insect invasion (6,7).
Squash family protease inhibitors are small peptides consisting of only 27-32 residues with three disulfide bridges. They are regarded as ideal subjects for structure-function studies (8-12). Their disulfide bridges are assembled into a knotted structure by a unique connection pattern (I-IV, II-V, and III-VI). The reactive site of the inhibitors is located at an Arg (Lys)-Ile peptide bond near the NH2 terminus (13,14). Due to their low molecular weight and unique structure, squash family inhibitors have been intensively studied, including structural analyses in solution and crystals (15-17), peptide synthesis (18-20), and the expression of a synthetic inhibitor gene in Escherichia coli and yeast (21). However, the cDNA and genomic structures of this family of inhibitors have not yet been reported.

* This work was supported by a state biological high technology research grant from the government of China. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M98055.
To whom correspondence should be addressed: Shanghai Institute of Biochemistry, Academia Sinica, 320 Yue-Yang Road, Shanghai 200031, China. Fax: 86-21-4338357.
In this paper, we describe two new members of the squash family inhibitors, the towel gourd trypsin inhibitors (TGTI-I and TGTI-II). Their amino acid sequences have been determined. Instead of the conventional approach of screening a TGTI cDNA clone out of the cDNA library of the towel gourd, the cDNA sequence encoding TGTI was elucidated by PCR using the total cDNA library as a template. Two PCR-amplified fragments with overlapping sequences were obtained, comprising a full-length TGTI-II cDNA with its 5'- and 3'-flanking regions. The open reading frame codes for prepro-TGTI-II. The genomic sequence of the mature inhibitor was also ascertained using the same PCR strategy.

EXPERIMENTAL PROCEDURES
Materials-Towel gourd was purchased from the market of a Shanghai suburb. Benzoyl-DL-arginine p-nitroanilide HCl was from Dongfeng Biochemical Reagent Factory; bovine trypsin was obtained from Sigma. Immobilized anhydrotrypsin was prepared according to Axen (22) and Knights (23). Restriction endonucleases, T4 ligase, Taq DNA polymerase, isopropyl-1-thio-β-D-galactopyranoside, X-gal, and the cDNA library kit were purchased from Bethesda Research Laboratories. PCR reagents were from Perkin-Elmer Cetus Instruments. The Sequenase kit was a product of U.S. Biochemical Corp. [α-32P]dATP and [α-32P]dCTP were purchased from Du Pont-New England Nuclear. All other reagents were of the finest commercial grade available.
Reversed-phase HPLC-HPLC was performed on a Waters system with two model 510 pumps and an ultraviolet detector (model 484). A 4.6 × 250-mm C18 column (Beckman Instruments) was employed. Elution was carried out with a linear trifluoroacetic acid-acetonitrile gradient: eluting solution A was 0.1% trifluoroacetic acid in water, and eluting solution B was 0.85% trifluoroacetic acid in 85% acetonitrile. The effluent was monitored by measuring the absorbance at 220 nm. The flow rate was 1 ml/min.
Trypsin Inhibitory Activity Assay-This was performed essentially according to the method described previously (8).

Amino acid sequences were determined on an Applied Biosystems model 477A/120A protein sequenator and phenylthiohydantoin amino acid analyzer using the program provided by the manufacturer.
Isolation of Towel Gourd DNA and mRNA-Total DNA was isolated from fresh tender towel gourds according to a modified procedure, using NaClO4 extraction followed by RNase treatment (24). Total RNA was prepared by the guanidinium/hot phenol method as described (25). The RNA preparation was shown to be undegraded by formaldehyde-agarose gel electrophoresis and staining with ethidium bromide (26, 27). Poly(A)+ RNA was isolated from the total RNA extract by three cycles of affinity chromatography on oligo(dT)-cellulose.
Construction of a cDNA Library-The procedure was performed according to the recommendations of Bethesda Research Laboratories. Four micrograms of mRNA was used to synthesize double-stranded cDNA. The first strand was synthesized with reverse transcriptase using the NotI primer-adapter annealed to the poly(A) tail of the mRNA. The second strand was synthesized with DNA polymerase. The SalI adapter was then added to the double-stranded cDNA with T4 ligase. After NotI digestion, the cDNA had a SalI restriction site at one end and a NotI site at the other. The total cDNA was then cloned into plasmid pSPORT predigested with SalI and NotI and transformed into E. coli strain DH5, generating a library of more than 2 × 10^6 ampicillin-resistant colonies.
PCR-PCR was used for amplification of the coding gene of the inhibitor from the cDNA library. The oligonucleotide primers were synthesized on an Applied Biosystems 380A DNA synthesizer. The synthesized primers were purified on an 8 M urea, 20% polyacrylamide gel. PCR was performed in a volume of 100 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.3, 2 mM MgCl2, 3 µg of cDNA or DNA template, 0.2 mM each dNTP, 2.5 units of Taq DNA polymerase, and 0.2 µM each oligonucleotide primer. The sample was overlaid with 100 µl of mineral oil to prevent evaporation and subjected to 30 cycles of denaturation at 94 °C for 1 min, annealing at 55 or 65 °C for 1 min, and extension at 72 °C for 1 min, with a final extension phase of 5 min, on a Perkin-Elmer Cetus Instruments thermocycler. Fifteen µl of the mixture was then applied to 7% PAGE. The PCR products were visualized by ethidium bromide staining.
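As a quick check on the reaction composition described above, the following sketch converts the stated final concentrations into absolute amounts per 100-µl reaction and adds up the programmed hold times. The function and variable names are illustrative only. Two assumptions are made that the text does not spell out: the 5-min final extension is counted in addition to the 30th cycle, and ramp times between temperatures are ignored.

```python
# Illustrative arithmetic for the PCR conditions stated in the text.
# All names are hypothetical. Ramp times are not given in the text and are
# ignored; the final 5-min extension is ASSUMED to follow the 30th cycle.

REACTION_VOLUME_UL = 100.0


def amount_pmol(conc_um: float, volume_ul: float) -> float:
    """Return the amount in pmol for a concentration in µM and a volume in µl.

    1 µM equals 1 pmol/µl, so the amount is concentration times volume.
    """
    return conc_um * volume_ul


primer_pmol = amount_pmol(0.2, REACTION_VOLUME_UL)        # 0.2 µM each primer
dntp_nmol = amount_pmol(200.0, REACTION_VOLUME_UL) / 1e3  # 0.2 mM each dNTP
mgcl2_nmol = amount_pmol(2000.0, REACTION_VOLUME_UL) / 1e3  # 2 mM MgCl2

CYCLES = 30
MINUTES_PER_CYCLE = 1 + 1 + 1  # denaturation + annealing + extension
FINAL_EXTENSION_MIN = 5
programmed_minutes = CYCLES * MINUTES_PER_CYCLE + FINAL_EXTENSION_MIN

print(f"each primer: {primer_pmol:.0f} pmol")
print(f"each dNTP:   {dntp_nmol:.0f} nmol")
print(f"MgCl2:       {mgcl2_nmol:.0f} nmol")
print(f"programmed hold time: {programmed_minutes} min")
```

Under these assumptions each primer is present at 20 pmol per reaction, which is a convenient figure when scaling the reaction volume up or down.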
Sequence Analysis of PCR Products-The PCR product of the appropriate size on PAGE was recovered by soaking the gel slice in a solution (0.5 M NH4 acetate, 10 mM MgAc2, 1 mM EDTA, pH 8.0, 0.1% SDS) with shaking at 37 °C for 4 h. The purified PCR products were then digested with the two enzymes corresponding to the restriction sites at the 5'-ends of both primers and cloned into the same two sites of vector M13. The single-stranded DNA of recombinant clones containing PCR products was sequenced with Sequenase (version 2.0) using the Sanger dideoxynucleotide chain termination method (28) according to the supplier's instructions.

RESULTS
Isolation and Purification of TGTI-Fresh towel gourd melons were peeled, homogenized into a greenish paste, and extracted with 50% ethanol at 4 °C overnight with stirring. The residue was removed by centrifugation. The supernatant was distilled under vacuum to remove ethanol, adjusted to pH 8, and subjected to affinity chromatography in batches on Sepharose 4B-bound anhydrotrypsin previously equilibrated with 0.02 M Tris-HCl, pH 8, as described (8). The active fraction was eluted with 0.02 M HCl. The fraction collected was lyophilized, further chromatographed on a reversed-phase HPLC C18 column, and fractionated by linear gradient elution. Two peaks (P1 and P2) with trypsin inhibitory activity were found, as shown in Fig. 1, and designated TGTI-I and TGTI-II. The trypsin inhibitory activity of TGTI-II was determined as shown in Fig. 2, which revealed equimolar inhibition. The same results were obtained with TGTI-I (data not shown). The total amount of inhibitors was about 1.6 mg/kg of towel gourd, as estimated by their inhibitory activity toward trypsin. The yield of both finally purified inhibitors was around 60%.
Amino Acid and Sequence Analyses-Both P1 and P2 components collected from HPLC as shown in Fig. 1 were taken to determine their amino acid compositions and sequences.

FIG. 3. Sequence comparison of the towel gourd trypsin inhibitors and other squash family inhibitors. The position of the conserved reactive site of the squash family of inhibitors is marked by an asterisk. (Alignment not reproduced; the inhibitors compared with TGTI include TTI-I (Trichosanthes), MCTI-I and MCEI-II (bitter gourd), CMTI-I and CMTI-IV (squash), CSTI-IV (cucumber), and BDTI-I (Bryonia dioica).)

FIG. 4. a, PCR strategy for sequencing of TGTI cDNA. The arrows indicate the direction of PCR amplification and bp sizes of their products. b, the oligonucleotide sequences of the primers. Primer 1 is complementary to a portion of the NH2-terminal part of TGTI. The asterisks indicate the mismatched nucleotides compared with the sequence of TGTI-II cDNA. Primer 2 is complementary to a portion of the COOH-terminal region of TGTI-II. SP6 and T7 primers were synthesized according to the SP6 and T7 promoter sequences in the pSPORT vector. (Panels not reproduced; the surviving labels of panel a read 369 bp and 246 bp, and panel b lists the T7 primer as a 20-mer.)
The results are shown in Table I and Fig. 3. TGTI-I and TGTI-II are composed of 28 and 29 amino acid residues, respectively, sharing 69% similarity and having common features with other squash family inhibitors. The reactive site of both inhibitors is Arg-Ile, located near the NH2 terminus.
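The text does not state how the 69% figure was computed. As a hedged illustration of one common convention, the sketch below counts identical residues over the columns of a pairwise alignment. The function name, the choice of denominator (all alignment columns), and the two toy peptides are assumptions made for this example; the toy strings are not the TGTI sequences. Under this convention, 20 identical positions over 29 columns would round to 69%, but that reading is only one possibility.

```python
# Minimal sketch of a percent-identity calculation over an alignment.
# The denominator (all alignment columns) is an ASSUMPTION; the paper does
# not say which convention was used. Toy sequences are hypothetical.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows.

    Both inputs must already be aligned (equal length, '-' marking gaps).
    A column containing a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-column alignment: one substitution and one gap.
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFGH-KL"
toy_identity = percent_identity(toy_a, toy_b)

# One reading consistent with the reported figure (an assumption):
# 20 identical positions over 29 aligned columns.
reported_like = round(100 * 20 / 29)

print(toy_identity, reported_like)
```

Other conventions divide by the shorter sequence length or score conservative substitutions as similar, so the same alignment can give slightly different percentages.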
Completion of the cDNA Sequence of TGTI-II-The cDNA sequence of the inhibitor was completed by PCR in two steps, using the total cDNA library of towel gourd constructed in the pSPORT plasmid as a PCR template. The procedure is shown in Fig. 4a. Based on the determined TGTI sequences and the codon usage preference of other plant protease inhibitor genes, primer 1 was designed and synthesized to correspond to a portion of the NH2-terminal part of the inhibitor with minimal codon degeneracy. An EcoRI restriction site was added at the 5'-end of the primer (Fig. 4b). The SP6 polymerase promoter sequence present in the pSPORT vector was used as the reverse primer. The PCR-amplified 350-bp product was separated on 7% PAGE (Fig. 5) and cloned into M13 mp19 at the BamHI and EcoRI restriction sites for sequence analysis.
The results revealed that the PCR fragment encompassed the NH2-terminal part of TGTI-II (positions 3-29) and a 185-bp 3'-flanking region upstream from the poly(A) tail. Based on the determined sequence, primer 2 was designed and synthesized to correspond to the COOH-terminal part of TGTI-II, i.e. Cys22-Gly29. A BamHI restriction site was added at the 5'-end of primer 2 (Fig. 4b). Using the same strategy, the T7 promoter sequence present in the pSPORT vector was exploited to pair with primer 2. Another PCR-amplified product with a size almost the same as that of the 350-bp product was obtained (Fig. 5). After separation on 7% PAGE and cloning into M13 mp18 with BamHI and EcoRI, sequence analysis showed that the second PCR product encoded a 5'-flanking sequence of 107 bp and the total coding sequence of TGTI-II. As a result, the full-length cDNA sequence of TGTI-II with 481 bp was completed by overlapping the sequences of the two cDNA segments, as shown in Fig. 6. The open reading frame encoded a TGTI-II precursor with 63 amino acid residues. Apart from the mature inhibitor, there is a leading peptide of 34 residues extending upstream from the NH2-terminal glycine to the initiation codon methionine. The deduced amino acid sequence from the cDNA of TGTI-II was completely consistent with that determined by primary structure analysis. It is most likely that the signal peptide of the inhibitor consists of 21 residues with a hydrophobic core (from position -34 to -14) followed by a pro-peptide of 13 residues (from position -13 to -1).

FIG. 6. The nucleotide and deduced amino acid sequences of TGTI-II cDNA. The first 21 amino acid residues correspond to a signal peptide followed by a 13-amino acid pro-peptide. The amino acid sequence of the mature inhibitor corresponds to TGTI-II. The putative polyadenylation signal AATAAA is underlined. (Sequence not reproduced.)
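The lengths quoted for the precursor and the cDNA can be cross-checked against one another. The sketch below is an illustrative consistency check using only numbers stated in the text; the constant and function names are hypothetical, and the treatment of the stop codon (not counted in the coding length) is an assumption implied by 63 residues corresponding to 63 codons.

```python
# Consistency check of the reported TGTI-II precursor and cDNA lengths.
# Numbers are taken from the text; names are illustrative. The stop codon is
# ASSUMED not to be counted in the coding length (63 codons for 63 residues).

SIGNAL_LEN = 21   # pre-peptide, positions -34 to -14
PRO_LEN = 13      # pro-peptide, positions -13 to -1
MATURE_LEN = 29   # mature TGTI-II, positions 1 to 29
FLANK_5_BP = 107  # 5'-flanking sequence
FLANK_3_BP = 185  # 3'-flanking region upstream from the poly(A) tail

precursor_len = SIGNAL_LEN + PRO_LEN + MATURE_LEN
coding_bp = 3 * precursor_len
cdna_bp = FLANK_5_BP + coding_bp + FLANK_3_BP


def region(position: int) -> str:
    """Name the precursor region for a residue position numbered as in the text.

    Negative positions count back from the mature NH2 terminus; there is no
    position 0 in this numbering scheme.
    """
    first = -(SIGNAL_LEN + PRO_LEN)
    if position == 0 or position < first or position > MATURE_LEN:
        raise ValueError(f"position {position} is outside the precursor")
    if position < -PRO_LEN:
        return "signal peptide"
    if position < 0:
        return "pro-peptide"
    return "mature inhibitor"


print(precursor_len, coding_bp, cdna_bp)
print(region(-34), region(-13), region(1))
```

The three figures agree: 21 + 13 + 29 gives the 63-residue precursor, 63 codons give 189 bp of coding sequence, and 107 + 189 + 185 gives the 481-bp full-length cDNA.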
Genomic Sequence of Mature TGTI-II-Compared with the determined cDNA sequence of TGTI-II, synthesized primer 1 differs in only four nucleotides, and its 3'-end is completely complementary to its target sequence. Thus, primer 1 and primer 2 were used together for PCR amplification, with the total DNA of the towel gourd as the PCR template. An amplified PCR product of about 100 bp was obtained (Fig. 5). With the same strategy used for cDNA sequence analysis, the genomic DNA sequence of TGTI-II was determined. The result showed that the genomic DNA sequence of the inhibitor is identical with its cDNA sequence. The finding implies that there is no intron in the gene structure of TGTI-II.

DISCUSSION
In this study, we describe two new members of the squash family of inhibitors from the towel gourd (Luffa cylindrica), a cucurbitaceous plant. Their structural features are quite similar to those of other squash family inhibitors. The residues located at the reactive site, the disulfide bonds, and the COOH terminus are completely conserved in all squash family inhibitors (Fig. 3). Except for the bitter gourd inhibitor (MCEI-II), the reactive site of all squash family inhibitors is either an arginine or a lysine. As a result, they are all trypsin inhibitors, whereas MCEI-II is an elastase inhibitor with a leucine reactive site. Recently, we have succeeded in converting the Trichosanthes trypsin inhibitor into a strong elastase inhibitor by protein engineering, in which the reactive site Arg-Ile was replaced by Ala-Ser (data not shown).
Until now, no cDNA sequence of any of the squash family inhibitors has been reported. To better understand the molecular characteristics of these inhibitors at the level of their genes, a cDNA library of the towel gourd was constructed. Instead of the conventional method of screening for the cDNA clone with an antibody or a DNA probe, which is time consuming and laborious, the PCR technique was used to directly amplify TGTI cDNA and TGTI DNA from the total cDNA and total DNA of the towel gourd, respectively, without screening. Since the 3'-end of the primer is the most important part in the annealing and extension process in PCR amplification, primer 1 was designed and synthesized on the basis of the elucidated TGTI amino acid sequence and the codon usage preference of other plant protease inhibitor genes. Primer 1, which corresponded to the NH2-terminal part of TGTI, provided five nucleotides, ATGCC, at the 3'-end (the Met codon plus two nucleotides of the Pro codon) for exact pairing with the target template. The rest of the nucleotides of primer 1 conformed to the preferred codon usage of plant protease inhibitors. Compared with the elucidated TGTI-II cDNA sequence, there were mismatches of only 4 base pairs in primer 1 (Pro, CCC-CCA; Arg, CGC-AGA; Ile, ATT-ATC), which can be tolerated in PCR amplification.
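The mismatch count quoted above can be reproduced directly from the three codon pairs. The sketch below is illustrative: the names are hypothetical, and the orientation of each pair (primer codon first, cDNA codon second) is an assumption, although the count does not depend on it. It also shows why the 3'-end could be specified without ambiguity: methionine has the single codon ATG, and every proline codon begins with CC.

```python
# Reproduce the primer 1 mismatch count from the codon pairs quoted in the text.
# Pair orientation (primer codon, cDNA codon) is ASSUMED; the Hamming distance
# is symmetric, so the total is unaffected. Names are illustrative.

CODON_PAIRS = {
    "Pro": ("CCC", "CCA"),
    "Arg": ("CGC", "AGA"),
    "Ile": ("ATT", "ATC"),
}


def mismatches(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    if len(codon_a) != len(codon_b):
        raise ValueError("codons must have equal length")
    return sum(1 for x, y in zip(codon_a, codon_b) if x != y)


per_codon = {aa: mismatches(a, b) for aa, (a, b) in CODON_PAIRS.items()}
total_mismatches = sum(per_codon.values())

# Standard genetic code: Met is encoded only by ATG, Pro by CCN (any third base).
MET_CODON = "ATG"
PRO_CODONS = ["CCA", "CCC", "CCG", "CCT"]
three_prime_ends = {MET_CODON + pro[:2] for pro in PRO_CODONS}

print(per_codon, total_mismatches, three_prime_ends)
```

Because all four proline codons share the same first two bases, the set of possible 3'-ends collapses to the single sequence ATGCC, consistent with the exact pairing described in the text.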
The TGTI-II cDNA sequence comprises a 189-bp open reading frame flanked by 107 bp of 5'-untranslated region and 185 bp of 3'-untranslated region. As in other major plant proteins, the initiation site was found to be ATGGC (29). The 3'-untranslated region contains a polyadenylation sequence, AATAAA, and the poly(A) tail. However, the AATAAA sequence is found 112 bp upstream from the poly(A) tail rather than at the canonical position of around 20-30 bp upstream (30). The deduced amino acid sequence of the coding region consists of 63 residues of the precursor protein (prepro-TGTI). The pre-peptide, a signal peptide, is presumed to consist of 21 amino acid residues (positions -34 to -14) with a hydrophobic core necessary for entering the rough endoplasmic reticulum and to be cleaved at the small residue glycine (31). The pro-peptide comprises 13 residues (positions -13 to -1) and displays hydrophilic features. While its functional significance remains to be clarified, it is tempting to speculate that it is involved in directing the correct folding of the three disulfide bridges in TGTI, as the disulfide bridges in the squash family of inhibitors are connected in a unique way (I-IV, II-V, and III-VI).
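To make the comparison between the observed and canonical signal positions concrete, the sketch below locates the last AATAAA before a poly(A) tail and measures its distance to the tail. Everything here is an assumption for illustration: the function names, the definition of distance (from the first base of the signal to the first base of the tail), and the synthetic sequences, which are not the TGTI-II 3'-untranslated region.

```python
# Locate a polyadenylation signal relative to a poly(A) tail on TOY sequences.
# The sequences are synthetic, and the distance definition (first base of the
# signal to first base of the tail) is an ASSUMPTION made for this example.
from typing import Optional

SIGNAL = "AATAAA"


def signal_to_tail_distance(seq: str) -> Optional[int]:
    """Distance in bases from the last AATAAA to the start of the poly(A) tail.

    The tail is taken to be the run of A's at the 3'-end. Returns None if the
    sequence has no such run or no signal upstream of it. A signal that abuts
    the tail directly is not handled by this simple sketch.
    """
    tail_start = len(seq.rstrip("A"))
    if tail_start == len(seq):
        return None
    signal_start = seq.rfind(SIGNAL, 0, tail_start)
    if signal_start == -1:
        return None
    return tail_start - signal_start


def is_canonical(distance: int) -> bool:
    """True if the distance lies in the 20-30 base range quoted in the text."""
    return 20 <= distance <= 30


def make_toy(distance: int) -> str:
    """Build a synthetic 3' region with the signal `distance` bases from the tail."""
    filler = "C" * (distance - len(SIGNAL))
    return "GCTT" + SIGNAL + filler + "A" * 20


for d in (25, 112):
    found = signal_to_tail_distance(make_toy(d))
    print(found, is_canonical(found))
```

On the two synthetic inputs the function recovers distances of 25 and 112, and only the first falls inside the range described as canonical.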
The PCR-amplified fragment was identified as TGTI-II cDNA, and no TGTI-I cDNA was found in any positive clone of the PCR fragment. It is uncertain whether this is because TGTI-I mRNA does not appear until the towel gourd is quite ripe or because the amount of TGTI-I mRNA is too low to detect compared with that of TGTI-II mRNA. During purification of the two TGTI components, it was found that TGTI-II was indeed the major component.
The mature TGTI cDNA sequence is completely identical to its genomic structure, demonstrating that TGTI is encoded by a single exon without any intervening sequence, as often occurs in small molecular weight peptide genes.
Acknowledgments-We thank Lai-Geng Xu for amino acid sequence analysis, Jun Bian for the synthesis of primers, and Dr. Wen-Feng Xu for helpful advice.