Isolation of a Clone Encoding a Second Dragline Silk Fibroin

Dragline silk is a unique protein fiber possessing both high tensile strength and high elasticity. … the amino acid sequence … the amino acid composition of dragline … a partial cDNA for dragline silk … dragline … β-turns in proline-rich …

Spiders can generate several protein silks from specialized glands, each adapted to a different function, unlike insects, which produce only one type of silk. Nephila clavipes, an orb-web-spinning spider, has six different types of glands (1, 2), each producing a different silk. The presence of multiple silks within a single organism presents a unique opportunity to examine protein structure and function. These silks are related by a predominance of alanine, serine, and glycine in their amino acid compositions. They are synthesized in specialized cells at the tail of their respective glands, secreted into a glandular lumen, and finally extruded on demand through a duct and valve system ending in a spinneret. However, the different silks vary widely in mechanical properties, implying disparate amino acid sequences. Several factors contributed to the choice of dragline silk from the major ampullate gland of N. clavipes as the first silk to be studied, including: (i) its singular combination of high tensile strength and high elasticity (3-6); (ii) the ease of gathering large quantities of the silk (7); (iii) the size and accessibility of the gland; and (iv) previous extensive physical and chemical characterization (2, 8-11). Most structural models of dragline silk suggest a pseudocrystalline protein having significant proportions of antiparallel β-sheet interspersed with regions of undefined structure, which are thought to be responsible for the elastic properties (7, 12, 13). The amino acid sequence, which would give a great deal of information about possible structure, was exceedingly difficult to obtain because of the solvent- and enzyme-resistant nature of dragline silk (10, 11). We therefore made cDNA clones from the messenger RNA of the major ampullate glands in order to determine the protein's primary structure and gain insight into possible higher order structures.
* This work was supported by Grant 28457-LS from the Army Research Office. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M92913.
‡ To whom correspondence should be addressed: Dept. of Molecular Biology, University Station, Box 3944, University of Wyoming, Laramie, WY 82071.
The first clone isolated encoded a protein, Spidroin 1, containing repetitive elements that were not rigorously conserved in sequence but did preserve a structural identity (14). Each structural repeat was composed of two segments: a polyalanine segment of 6-9 amino acids followed by a (Gly-Gly-Xaa)n segment, where n ranged from 5 to 11. However, the sequence contained virtually no proline, which makes up 3.5% of the amino acid composition of N. clavipes dragline silk, and only two-thirds of the tyrosine found in that silk (11, 15). Moreover, a pentapeptide, Gly-Tyr-Gly-Pro-Gly, purified from acid-cleaved silk at the same time as the peptides used as the basis for cloning Spidroin 1, was not present in the predicted sequence of Spidroin 1.
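The Spidroin 1 repeat described above can be written as a pattern: 6-9 alanines followed by five to eleven Gly-Gly-Xaa triplets. The Python sketch below is illustrative only. It assumes one-letter amino acid codes, treats Xaa as any residue (the text does not restrict it), and is run on a hypothetical fragment assembled to fit the description, not on the published Spidroin 1 sequence.

```python
import re

# One Spidroin 1 structural repeat as described in the text:
# a polyalanine segment of 6-9 residues followed by (Gly-Gly-Xaa)n, n = 5-11.
# Xaa is allowed to be any residue here (an assumption of this sketch).
SPIDROIN1_REPEAT = re.compile(r"A{6,9}(?:GG[A-Z]){5,11}")


def find_repeats(seq: str) -> list[str]:
    """Return the non-overlapping repeat units found in a one-letter sequence."""
    return SPIDROIN1_REPEAT.findall(seq)


# Hypothetical fragment built to match the description (not real sequence data).
fragment = "A" * 7 + "GGQ" * 3 + "GGY" * 3 + "A" * 8 + "GGR" * 5
print(find_repeats(fragment))
```

A run of only five alanines, or fewer than five Gly-Gly-Xaa triplets, is not reported as a repeat, mirroring the ranges given in the text.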

MATERIALS AND METHODS
Isolation of the Spidroin 2 Clone. Twelve cDNA clones in pBluescript (Stratagene) were isolated from a previously constructed library (14) using the oligonucleotide 5'-CCNGGNCCATANCC-3' as a probe. The probe was labeled with ³²P by kinasing (16) and hybridized to picked colonies (14) using the method of Wood et al. (17), employing tetramethylammonium chloride in the final 47 °C wash. Plasmid DNA was prepared, digested with EcoRI and BamHI, and subjected to electrophoresis on a 1% agarose (Bethesda Research Laboratories) gel. The DNA was transferred to Zetaprobe (Bio-Rad) using a Vacublot technique (18). The resulting Southern blot was hybridized (17) with the same probe under the same conditions as the original colony screening. Four clones showed reasonable hybridization, and pMASS2 was chosen because it contained the largest insert, approximately 2 kilobases.
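The probe sequence can be checked against the peptide it was designed from. The probe is written in the antisense orientation, so its reverse complement should read as codons for Gly-Tyr-Gly-Pro followed by the first two bases of a Gly codon. The Python sketch below assumes IUPAC N = A/C/G/T and tabulates only the codons needed for this peptide; the helper names are illustrative, not taken from any sequence-analysis package.

```python
from itertools import product

PROBE = "CCNGGNCCATANCC"  # 5'->3' as given in the text; N = A, C, G, or T
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

# Partial codon table: only the residues of the target peptide are included,
# so any other codon raises KeyError (deliberately strict for this check).
CODONS = {
    **{"GG" + b: "G" for b in "ACGT"},  # Gly
    **{"CC" + b: "P" for b in "ACGT"},  # Pro
    "TAT": "Y", "TAC": "Y",             # Tyr
}


def reverse_complement(seq: str) -> str:
    """Reverse complement, passing N through unchanged."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def expand(seq: str) -> list[str]:
    """Enumerate every concrete oligonucleotide a degenerate sequence stands for."""
    choices = [("A", "C", "G", "T") if b == "N" else (b,) for b in seq]
    return ["".join(p) for p in product(*choices)]


def translate_complete_codons(seq: str) -> str:
    """Translate whole codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


sense = reverse_complement(PROBE)   # coding-strand orientation of the probe
variants = expand(PROBE)            # every concrete oligo in the probe mixture
peptides = {translate_complete_codons(reverse_complement(v)) for v in variants}
print(sense, len(variants), peptides)
```

The reverse complement is GGN TAT GGN CCN GG; the three N positions make the probe a 64-fold degenerate mixture, and every member encodes Gly-Tyr-Gly-Pro. The Tyr position is fixed as TAT, so a TAC-encoded tyrosine would leave one mismatch.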
Nested Deletion Sequencing. pMASS2 was treated with exonuclease III (Erase-a-Base kit, Promega) and used in a nested deletion strategy for sequencing. DNA from each time point was prepared using a quick plasmid preparation and subjected to double-stranded sequencing with Sequenase (United States Biochemicals), using the 7-deaza-GTP sequencing kit (19). Each strand was sequenced at least twice, and some sections as many as five times. This strategy helped resolve compressions seen during coding-strand sequencing and aided in ordering the sequenced deletions, as the pMASS2 insert was highly repetitive at the nucleotide level as well as at the peptide level.
Computer Predictions of Hydrophilicity and Secondary Structure. Hydrophilicity profiles and secondary structure predictions were generated using MacVector, version 3.5 (International Biotechnologies, Inc.) on a Macintosh IIfx (Apple Computer, Inc.) personal computer. Hydrophilicity was estimated using the Kyte-Doolittle method (20) with a window size of 7, small enough to resolve short structural features without excessive background noise. Secondary structure was predicted separately using the Chou-Fasman (21) and Robson-Garnier (22) algorithms, and a consensus prediction was then generated.
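A sliding-window Kyte-Doolittle profile averages per-residue hydropathy values over a window centred on each position. The Python sketch below uses the published Kyte-Doolittle scale with the window of 7 used here. It is not MacVector's implementation: its end handling (no value within three residues of either terminus) is an assumption, and because Kyte-Doolittle values are positive for hydrophobic residues, a hydrophilicity plot would show the negated profile.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list[float]:
    """Mean Kyte-Doolittle value of each full window, one value per window centre.

    Residues closer than window // 2 to either end get no value
    (an assumed convention; the text does not say how ends are handled).
    """
    if window % 2 == 0:
        raise ValueError("use an odd window so each average has a centre residue")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical Spidroin 2-like stretch: polyalanine, then proline-rich repeats.
profile = hydropathy_profile("AAAAAAAAGPGGYGPGQQ")
print([round(v, 2) for v in profile])
```

With a window of 7, the polyalanine stretch scores positive and windows lying in the Gly-Pro-Gly-Xaa-Yaa repeats score negative, which is the alternation a hydrophilicity plot of Spidroin 2 would be expected to show.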

Spider Dragline Silk
Is a Two-protein Fiber
FIG. 1. DNA sequence of pMASS2 and predicted amino acid sequence of Spidroin 2. The termination codon, TAA, is marked with an asterisk. The putative polyadenylation signal site and the beginning of the poly(A)-tail are underlined. This sequence has been submitted to GenBank.

RESULTS
The peptide absent from the Spidroin 1 sequence, Gly-Tyr-Gly-Pro-Gly, was used as the basis for designing a DNA probe (5'-CCNGGNCCATANCC-3', where N = A, T, C, or G), which was used to rescreen the original major ampullate gland cDNA library from which the Spidroin 1 clone had been isolated. A second cDNA clone was isolated and sequenced; it encoded a separate, distinct protein, Spidroin 2.

FIG. 2. Predicted amino acid sequence for the Spidroin 2 protein, rearranged to show repetitive elements. The most repetitive protein elements have been arranged from the amino-terminal end through the highly conserved region, followed by the less conserved region and the divergent COOH-terminal tail. The dashes represent deletions, allowing the elements to be arranged for maximum identity.

The DNA sequence is shown in Fig. 1 along with the predicted amino acid sequence. The start of the 16-base poly(A) tail (nucleotide 1982) is underlined, as is the likely polyadenylation signal site (nucleotides 1961-1966). There is a relatively short 3′ noncoding region, with the stop codon (nucleotide 1882) occurring 97 bases before the poly(A) tail. The frequency of codon usage in Spidroin 2 (Table I) is very similar to that of Spidroin 1 (14). For example, glycine, the most prevalent amino acid in both proteins, shows an 89% preference for A or T as the third nucleotide in Spidroin 2 and 94% in Spidroin 1. Glutamine shows a 97% preference for A in Spidroin 2 and 98% in Spidroin 1. The strong preference for A and T as the third nucleotide in dragline silk can be explained by considering the secondary structure of the mRNA. The repetitive DNA encoding the polyalanine segments, (GCX)n, could form numerous hairpin loops between nearby polyalanine coding regions if the third base were also G or C instead of A or T. The same is true for other G/C-rich regions. Chavancy et al. (23) have noted that in vitro translation of Bombyx mori mRNA proceeds more efficiently if the fibroin mRNA is heated to 100 °C and quickly cooled, suggesting that a G/C-rich message may already have a great deal of secondary structure, which would be compounded by having G or C in the third position of the codons.
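The third-position percentages quoted above are simple counts over the codons of one amino acid. The Python sketch below shows the calculation for an in-frame coding sequence; the function name and the example fragment are hypothetical, not taken from pMASS2.

```python
def third_position_at_fraction(cds: str, codon_prefix: str) -> float:
    """Fraction of codons starting with `codon_prefix` whose third base is A or T.

    Use "GG" for Gly (GGN) or "GC" for Ala (GCN); a two-base prefix only isolates
    one amino acid for such four-codon families (e.g. "CA" would mix Gln and His).
    Assumes `cds` is in frame from its first base; a trailing partial codon is ignored.
    """
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    matching = [c for c in codons if c.startswith(codon_prefix)]
    if not matching:
        raise ValueError(f"no codons starting with {codon_prefix!r}")
    return sum(c[2] in "AT" for c in matching) / len(matching)


# Hypothetical in-frame fragment: GGA GGT GGA GCC GCA CAA
example = "GGAGGTGGAGCCGCACAA"
print(third_position_at_fraction(example, "GG"))  # glycine codons
print(third_position_at_fraction(example, "GC"))  # alanine codons
```

Here all three Gly codons end in A or T, while only one of the two Ala codons does, giving fractions of 1.0 and 0.5.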
The first 464 amino acids of the predicted partial protein sequence can be grouped into highly conserved repetitive elements, shown in Fig. 2 with deviations from the conserved sequence underlined. The next 65 residues show less conservation, and the last 98 residues are widely divergent. The pentapeptide on which our probe was based, Gly-Tyr-Gly-Pro-Gly, occurs numerous times throughout the sequence. The proline-rich region has a repeating pentapeptide motif in which Gly-Pro-Gly-Gly-Tyr alternates with Gly-Pro-Gly-Gln-Gln. This region can be divided into two distinct elements: a segment containing a variable number of the alternating pentapeptides, followed by a highly conserved core of three pentapeptide repeats. The variations occur in units of 5 amino acids, in contrast to Spidroin 1, where they occur in units of 3 amino acids. The joining region, Ser-Gly-Pro-Gly-Ser, is very highly conserved and leads into a series of 6-10 alanines with an occasional conservative serine substitution.
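The element structure described above (alternating Gly-Pro-Gly-Gly-Tyr and Gly-Pro-Gly-Gln-Gln pentapeptides, the Ser-Gly-Pro-Gly-Ser joining region, then 6-10 alanines with occasional serine) lends itself to a simple tokenizer. The Python sketch below assumes one-letter codes; the token labels and the test string are hypothetical, assembled from the motifs named in the text rather than copied from Fig. 2. Treating any 6-10 run of Ala/Ser as "polyalanine" is a loose approximation.

```python
import re

# Motifs named in the text, in one-letter code. Alternatives are tried in order,
# so the joining region is recognized before the generic polyalanine rule.
TOKENS = [
    ("pentapeptide_Y", r"GPGGY"),    # Gly-Pro-Gly-Gly-Tyr
    ("pentapeptide_Q", r"GPGQQ"),    # Gly-Pro-Gly-Gln-Gln
    ("joining", r"SGPGS"),           # Ser-Gly-Pro-Gly-Ser
    ("polyalanine", r"[AS]{6,10}"),  # 6-10 Ala, tolerating occasional Ser
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKENS))


def tokenize(seq: str) -> list[tuple[str, str]]:
    """Split seq into (label, residues) tokens; unmatched stretches are 'other'."""
    out, pos = [], 0
    for m in PATTERN.finditer(seq):
        if m.start() > pos:
            out.append(("other", seq[pos:m.start()]))
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    if pos < len(seq):
        out.append(("other", seq[pos:]))
    return out


# Hypothetical element: two alternating pentapeptide pairs, joining region, Ala run.
for label, residues in tokenize("GPGGYGPGQQGPGGYGPGQQSGPGSAAAAAAAAGPGGY"):
    print(label, residues)
```

The probe pentapeptide Gly-Tyr-Gly-Pro-Gly does not appear as its own token: it arises at the junction of the two alternating pentapeptides (...GGY|GPG...), which is consistent with its occurring many times in the proline-rich region.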
While the repetitive regions of Spidroin 1 and Spidroin 2 show very little homology, there is a 49-amino-acid region in the COOH-terminal domain of Spidroin 2 with 80% identity to a corresponding region in Spidroin 1 (Fig. 3). These regions both follow the highly repetitive regions of their respective proteins.
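Percent identity between two pre-aligned segments is the number of identical positions divided by the number of aligned positions. The Python sketch below assumes the segments are already aligned to equal length with '-' for gaps and counts every alignment column in the denominator; tools differ on this convention, and the strings used are toy examples, not the Fig. 3 regions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes a and b are already aligned (equal length, '-' for gaps);
    a column containing a gap never counts as identical.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    same = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * same / len(a)


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

Dividing by the full alignment length is the more conservative choice; dividing by ungapped columns only would give a higher value for gapped alignments.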

DISCUSSION
We believe Spidroin 1 and Spidroin 2 are the only two major protein subunits of dragline silk from N. clavipes. All of the peptides reported in a previous paper (14), the peptide used for designing our probe (Gly-Tyr-Gly-Pro-Gly), and over 20 smaller peptides produced by partial acid digestion of dragline silk are found within the predicted protein sequences of Spidroin 1 and Spidroin 2. The total amino acid composition of dragline silk can be accounted for by a combination of the amino acid compositions of the two Spidroins. When the cloned cDNAs are used as probes, each Spidroin exhibits only a single hybridizing mRNA band in extracts from the major ampullate gland, indicating a unique mRNA for each protein.
The overall 3.5% proline content of dragline silk can be accounted for by averaging the high level of proline (13.5%) in Spidroin 2 with the negligible amount of proline in Spidroin 1. Using their respective cDNAs as probes, the sizes of the mRNAs for Spidroin 1 and Spidroin 2 have recently been determined as 5.6 and 3 kilobases, respectively. We calculated the contribution of the two Spidroins to the total amino acid composition of dragline silk by assuming that: 1) the length of each protein was proportional to the size of its mRNA; 2) the repetitive regions of both proteins extended into their respective uncharacterized amino-terminal regions; and 3) the molar ratio of Spidroin 1 to Spidroin 2 is about 3 to 2. The calculations indicate that Spidroin 1 and Spidroin 2 are sufficient to account for the total amino acid composition. It should be noted that within a single strand of silk from one spider, the amino acid composition varies from site to site (11). Varying amounts of proline have also been found in dragline silk from a single spider over its lifetime (24). These variations in amino acid composition can be accounted for if the expression of the two Spidroin proteins is independently regulated.
M. Xu and R. V. Lewis, unpublished results.
Y. Kadokami and R. V. Lewis, manuscript in preparation.
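The proline estimate follows from assumptions 1) and 3): each protein contributes residues in proportion to its length (taken from its mRNA size) times its molar amount. The Python sketch below computes that weighted average. Setting Spidroin 1 proline to exactly 0 is an assumption standing in for "negligible", and the function name is illustrative.

```python
def mixed_fraction(fractions, mrna_kb, molar_ratio):
    """Residue fraction of one amino acid in a mixture of proteins.

    fractions:   fraction of that amino acid in each protein
    mrna_kb:     mRNA size of each protein, a proxy for protein length
    molar_ratio: relative number of molecules of each protein
    Each protein's residue count is weighted by length x molar amount.
    """
    weights = [kb * n for kb, n in zip(mrna_kb, molar_ratio)]
    return sum(f * w for f, w in zip(fractions, weights)) / sum(weights)


# Spidroin 1 (proline assumed 0) and Spidroin 2 (13.5% proline),
# mRNAs of 5.6 and 3 kb, molar ratio about 3:2.
proline = mixed_fraction([0.0, 0.135], [5.6, 3.0], [3, 2])
print(f"{proline:.2%}")
```

With these inputs the estimate is about 3.55% proline, close to the 3.5% measured for whole dragline silk. The same function can be applied residue by residue to reconstruct a full composition.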
The predominant use of A or T (U) as the third nucleotide in the codons for glycine and alanine is not as pronounced as that found for B. mori silk (25). However, it has been shown that the production of dragline protein from spider silk mRNA is subject to translational pauses (26, 27) in the same manner as B. mori silk (28) under both in vivo and in vitro conditions.
Both organisms appear to produce gland-specific pools of tRNAs for glycine and alanine, as these amino acids represent the major structural components of the silk proteins (23, 29, 30). Efficient in vitro translation of the silk proteins from both organisms requires tRNA supplementation from appropriately conditioned glands (27, 28, 30). While the pattern of tRNA accumulation in B. mori is approximately proportional to the use of the corresponding amino acids (31), alanine tRNA is the most abundant species in N. clavipes major ampullate glands (30). The difference can probably be attributed to the different nature of the repetitive elements of the two silks. Since the alanines in both Spidroin 1 and Spidroin 2 occur in clusters of 6-10, as opposed to the dispersed alanines of B. mori ((Gly-Ala-Gly-Ala-Gly-Ser)n), a larger pool of alanine tRNA may be necessary during translation of spider silk to get through the stretches of repetitive alanine codons.
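The contrast between clustered and dispersed alanines can be expressed as the longest uninterrupted run of Ala residues, i.e. the longest stretch of consecutive alanine codons the ribosome must read. The Python sketch below compares the B. mori repeat quoted above with a hypothetical spider-style element containing 8 alanines; neither string is a real full-length sequence.

```python
from itertools import groupby


def longest_run(seq: str, residue: str = "A") -> int:
    """Length of the longest consecutive stretch of `residue` in seq (0 if absent)."""
    return max((len(list(g)) for k, g in groupby(seq) if k == residue), default=0)


bombyx_like = "GAGAGS" * 4                   # (Gly-Ala-Gly-Ala-Gly-Ser)n
spider_like = "GPGQQ" + "A" * 8 + "GPGGY"    # hypothetical element, 8 Ala in a row
print(longest_run(bombyx_like), longest_run(spider_like))
```

Both strings can have similar overall alanine content, but only the spider-style element requires many consecutive alanyl-tRNAs, which is the situation the text suggests favors a larger alanine tRNA pool.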
The overall repetitive nature of the amino acid sequence of Spidroin 2 is emphasized by computer predictions of the hydrophilicity and secondary structure of the protein (Fig. 4). The Chou-Fasman (CF) and Robson-Garnier (RG) statistical methods both predict a number of turns in the proline-rich regions and helical conformations for the polyalanine regions. The Robson-Garnier method also predicts significant regions of β-sheet, especially in the polyalanine segments, whereas the Chou-Fasman method confines any sheet-like structure to the hydrophobic COOH-terminal tail. The consensus prediction (CfRg) shows numerous turns in the proline-rich regions alternating with polyalanine segments that are predicted to be helix-forming.
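The text does not state how the consensus (CfRg) track is formed. One simple rule, offered only as an assumption and not as MacVector's actual algorithm, keeps a residue's predicted state where both methods agree and marks it undetermined otherwise. The Python sketch below uses hypothetical state letters H (helix), E (sheet), T (turn), and C (coil).

```python
def consensus(cf: str, rg: str, undetermined: str = "-") -> str:
    """Per-residue consensus of two secondary-structure predictions.

    cf, rg: equal-length strings of state codes (e.g. H, E, T, C).
    A residue keeps its state only where both predictions agree;
    this agreement rule is an illustrative assumption.
    """
    if len(cf) != len(rg):
        raise ValueError("predictions must cover the same residues")
    return "".join(a if a == b else undetermined for a, b in zip(cf, rg))


# Hypothetical predictions: turns, then a helix that RG partly calls sheet.
print(consensus("TTTTTHHHHHHH", "TTTCCHHHEEHH"))
```

Under this rule, positions where Robson-Garnier predicts sheet but Chou-Fasman predicts helix drop out of the consensus, so a strict-agreement track can understate sheet content in exactly the polyalanine segments discussed in the text.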
We originally believed that the polyalanine segments were likely to form α-helices (10), based on computer predictions for Spidroin 1 and Spidroin 2 and on physical studies of peptides in aqueous solution (10). For the reasons that follow, we now believe that Spidroin 2 is composed of β-turn structures in the proline-rich regions alternating with β-sheet regions formed by the polyalanine segments. We also believe that the polyalanine regions in Spidroin 1 are in the β-sheet conformation.
The β-turn structure, which is difficult to detect by x-ray diffraction, Fourier transform infrared spectroscopy, or Raman spectroscopy, is the most likely conformation for the pentapeptide repeats of Spidroin 2 (32-34). Gly-Pro-Gly-Xaa-Yaa repeats are known to form β-turns, and a Gln in position 4 may indicate type II β-turns (34). The β-turn conformation has been demonstrated for similar proline-rich sequences in unrelated proteins such as gluten, synaptophysin, and elastin (32, 33).
The polyalanine regions of Spidroins 1 and 2 probably form the antiparallel β-sheets observed in many studies (3-6, 8, 35). X-ray diffraction studies of dragline silk from our laboratory and others (36) show a 10.6-Å spacing between stacks of pleated sheets in dragline silk. This corresponds to the spacing found for β-sheets formed by polyalanine (37). The Gly-Gly-Xaa repeats of Spidroin 1 have been shown to be unfavorable for β-sheet formation in silk (38). Studies on polyalanine peptide crystals containing low amounts of water demonstrate only the presence of β-sheet structures (39), and spider dragline silk contains less than 6% water, corresponding to less than one water molecule per 3 amino acids.
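The water figure converts a mass fraction into a molar ratio. The Python sketch below assumes the 6% is by weight and uses a mean residue mass of 70 Da for a Gly/Ala-rich protein; that value is an assumed round number, not one given in the text.

```python
WATER_DA = 18.015  # molar mass of water, g/mol


def water_per_residue(water_mass_fraction: float, mean_residue_da: float) -> float:
    """Molecules of water per amino acid residue for a given water mass fraction."""
    water_mol = water_mass_fraction / WATER_DA
    residue_mol = (1.0 - water_mass_fraction) / mean_residue_da
    return water_mol / residue_mol


ratio = water_per_residue(0.06, 70.0)  # 70 Da: assumed mean residue mass
print(f"{ratio:.2f} water per residue, about 1 per {1 / ratio:.1f} residues")
```

Under these assumptions the ratio is about 0.25, roughly one water molecule per four residues. The statement in the text (fewer than one per three residues) holds for any mean residue mass below about 94 Da.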
Under tension, the linked β-turns in Spidroin 2 may form a β-turn spiral or extend an already present spiral, as β-turns are known to have a degree of structural flexibility (32, 33). This would be similar to an elastic mechanism proposed for elastin (33), in which entropic forces generated by the disruption of β-turns drive the retraction of the β-turn spiral. The high tensile strength of the fiber could derive from the high proportion of β-sheet oriented along the fiber axis. Elucidating the primary structure (and predicted secondary structure) of Spidroin 2, and demonstrating that dragline silk is composed of two protein subunits, has allowed disparate results from chemical and biophysical studies to be incorporated within a single model.