cDNA and Gene Sequence of Manduca sexta Arylphorin, an Aromatic Amino Acid-rich Larval Serum Protein
HOMOLOGY TO ARTHROPOD HEMOCYANINS
(Received for publication, June 26, 1989)
Elizabeth Willott, Xiao-Yu Wang, and Michael A. Wells
The serum (storage) proteins produced by insect larvae at the end of the feeding cycle are hexameric blood proteins with one or more types of subunit. The cDNA and gene structure of the aromatic amino acid-rich larval serum protein arylphorin from the tobacco hornworm, Manduca sexta, have been determined. In M. sexta arylphorin there are two subunits, α and β, which have 686 and 687 amino acids, respectively, and whose amino acid sequences are 68% identical. The two genes, separated by 7.1 kilobases of chromosomal DNA, are transcribed in the same direction. Based on the alignment of the amino acid sequences, the rate of nucleotide substitution between the two coding regions predicts that the two genes diverged about 100 million years ago. Both genes contain 5 exons, and the upstream region contains a sequence, TGATAAA, which is similar to a sequence found in all other storage protein genes for which information is available. When the National Biomedical Research Foundation protein sequence data base was searched, the arylphorin subunits showed significant similarity to the arthropod hemocyanins, which are hexameric oxygen-carrying proteins. Based on the alignment of the sequence of M. sexta arylphorin and the hemocyanin from the spiny lobster (Panulirus interruptus), for which a 3.2 Å structure has been determined, the highest concentration of conserved residues was found in those regions of the sequence which are involved in subunit interactions in the hexameric protein. It is suggested that the insect storage proteins and the arthropod hemocyanins have evolved from a common ancestor.
During the final larval stadium of holometabolous insects, a few proteins, called the larval serum proteins or larval storage proteins (LSPs), accumulate in large amounts in the hemolymph, where they can account for up to 85% (by weight) of the hemolymph protein. The LSPs have molecular masses of about 500 kDa and are composed of six subunits of approximately 72-80 kDa, which may or may not be identical. The proteins have been called storage proteins because amino acids from them are recoverable from many different proteins in the adult (Levenbook and Bauer, 1984), suggesting that the LSPs serve as a store of amino acids for synthesis of adult proteins. The proteins have also been implicated in cuticle sclerotization (Agrawal and Scheller, 1987; Kaliafas et al., 1984; König et al., 1986; Webb and Riddiford, 1988).
There are two well characterized classes of LSPs (Levenbook, 1985). One class is rich in aromatic amino acids, generally containing 18-25% phenylalanine and tyrosine, and proteins in this class are called arylphorins (Telfer et al., 1983). Another class is relatively rich in methionine, approximately 6%, and is often more prevalent in females than in males (Tojo et al., 1980; Ryan et al., 1985a). Manduca sexta arylphorin has two subunits which, by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, are approximately 72 and 76 kDa, each with a high mannose-type carbohydrate chain (Ryan et al., 1985b). Arylphorin is present in both males and females throughout larval development, though its concentration in the hemolymph increases greatly in the last larval instar (Kramer et al., 1980; Willott, 1988).
The LSPs represent important model proteins because their genes are highly expressed and are developmentally regulated. In order to investigate gene regulation it is necessary to know the sequence of the genes and mRNAs which code for the proteins. The cDNA and gene structure for the methionine-rich storage protein from Bombyx mori have been reported (Sakurai et al., 1988a, 1988b). The genes for the three arylphorin subunits from Drosophila melanogaster (Smith et al., 1981; Delaney et al., 1986; Lepesant et al., 1986) and for the arylphorin from Sarcophaga peregrina (Matsumoto et al., 1986) have been cloned and some partial sequence data reported. However, to date, the complete sequence for an arylphorin has not been reported.
In this paper we report the complete cDNA and gene sequences for the two subunits of arylphorin from M. sexta. In addition we report that the arylphorins, as well as the B. mori methionine-rich storage protein, show a remarkable degree of sequence similarity to the arthropod hemocyanins.

MATERIALS AND METHODS
cDNA Library Construction and Screening-A larval fat body cDNA library in λgt11 (Kanost et al., 1989) was screened by differential hybridization (Maniatis et al., 1982) to larval and adult cDNA to select clones expressed mainly or solely in larval fat body. The larval-specific clones were then screened with antisera to arylphorin as described by . From this screening two clones were isolated, each with an insert of approximately 2.4 kilobases but with different restriction enzyme maps. [...] were subcloned into pTZ vectors by using convenient DNA restriction sites or by using these after DNA amplification of the λgt11 inserts with the polymerase chain reaction (Oste, 1988), using λ forward and reverse primers. To obtain fragments of suitable length in the α subunit, exonuclease III deletions of pTZ clones were constructed as described by Henikoff (1984).

Isolation of Genomic Clones-The probe, which corresponded to the coding region of the α subunit of arylphorin, was 32P-labeled by nick translation (Maniatis et al., 1982). The labeled probe was used to screen a M. sexta genomic library in EMBL-3 using approximately 1.1 × 10⁵ plaques and the conditions described by Maniatis et al. (1982). λ DNA was obtained from positive clones by the plate lysate method (Maniatis et al., 1982) and purified by the method of Benson and Taylor (1984). The clones were further characterized by digestion with various restriction enzymes. Two of the clones contained the entire coding region for the α and β subunits of arylphorin. Selected DNA fragments from these clones were subcloned into pTZ18R or pTZ18U plasmids and digested by exonuclease III (Henikoff, 1984).
DNA Restriction Mapping-DNA was digested with various restriction enzymes. The products were separated in agarose gels and transferred to nitrocellulose (Southern, 1975). The filters were hybridized to labeled cDNA probes, as described above.
DNA Sequencing-Sequencing was carried out by the dideoxy chain termination method of Sanger et al. (1977), using Sequenase (United States Biochemical Corp.), with templates obtained from pTZ vectors by generating single-stranded DNA from the f1 origin of replication using helper phage M13K07 and Escherichia coli strain JM101.
Computer-assisted Analysis of Sequence Data-The National Biomedical Research Foundation protein sequence data base was searched with the FASTP program, and the significance of the similarity was determined with the RDF program (Lipman and Pearson, 1985). The sequences of the two arylphorin subunits were aligned with the PRTALN program (Wilbur and Lipman, 1983), and this alignment was used to determine the rates of nucleotide substitution between the two arylphorin mRNAs using the computer programs of Li et al. (1985). Progressive sequence alignment and construction of phylogenetic trees were performed using the computer programs of Feng and Doolittle (1987).
Primer Extension-The following oligonucleotide, ACTGTCATAATCCTAGCGG, was end-labeled using [32P]ATP and T4 polynucleotide kinase. The labeled probe was incubated with RNA and reverse transcriptase under the conditions described by Williams and Mason (1985). The extension product was recovered by ethanol precipitation, denatured with 50% formamide, and analyzed by autoradiography following separation on an 8% polyacrylamide sequencing gel.

RESULTS AND DISCUSSION
Fig. 1 shows the organization of the genes for the two subunits of M. sexta arylphorin and the strategy used to sequence the genes. Fig. 2 presents a partial restriction enzyme map of the cDNAs for the two proteins and the sequencing strategy used. Both genes contain five exons and are organized in a fashion similar to that of the methionine-rich storage protein gene from B. mori (Sakurai et al., 1988a, 1988b), although the introns are smaller in the M. sexta genes.
Southern analysis of the EcoRI digest of λ clone 10A showed that some fragments hybridized to the α subunit cDNA and others to the β subunit cDNA. A restriction map constructed from two overlapping λ clones (10A and 10B) is presented in Fig. 3 and shows that the two arylphorin genes are separated

[...]lation during development, one or more of these regions may represent important control sites. One potentially important region consists of a 59-nucleotide sequence beginning 104 and 108 nucleotides upstream from the transcription start site in the α and β genes, respectively. These sequences, which are 67% identical, are shown in Table I. Included within these sequences is a 7-nucleotide sequence, TGATAAA, which, as shown in Table I, is similar to a sequence found in the upstream region of the genes for all storage proteins for which sequence information is available. Another potentially important region is found 600 and 221 nucleotides upstream from the transcription start site in the α and β genes, respectively. These sequences, which are 71% identical, are shown in Table I and contain sequences which are similar to the H-box region in Drosophila yolk polypeptide genes (Yan et al., 1987). A similar sequence was also found in the M. sexta microvitellogenin gene (Wang et al., 1989).
Fig. 6 presents the amino acid sequences deduced from the gene and cDNA sequences. Table II presents the amino acid composition of the two subunits derived from the sequence, as well as the composition of the isolated protein, and it can be seen that the agreement is excellent. Each subunit contains a 16-amino acid signal sequence; in these signal sequences 14 of the 16 amino acids are identical. The mature α subunit contains 686 amino acids and has a calculated molecular weight of 82,278, while the mature β subunit contains 687 amino acids and has a calculated molecular weight of 82,279. These molecular weights do not agree with the differences noted by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, but we can offer no explanation at present. Each subunit contains two consensus glycosylation sites (NXT/S), although only one appears to be used (Ryan et al., 1985b). The alignment shown in Fig. 6 shows that the mature proteins are 68% identical with only two gaps. Using the protein alignments shown in Fig. 6 to align the cDNA sequences and the computer program of Li et al. (1985), it was determined that there have been 0.208 ± 0.013 substitutions/nonsynonymous site between the coding regions of the two genes. Using a value of 0.9 × 10⁻⁹ substitutions/nonsynonymous site/year (Li et al., 1985), we can estimate that the two genes diverged approximately 100 million years ago.
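The arithmetic behind the divergence estimate can be sketched as follows. The text does not state the formula used, so this sketch assumes the usual relation T = K / (2r), in which the substitutions separating two genes accumulate along both lineages at the same rate; the function and variable names are illustrative only.

```python
def divergence_time_years(k_nonsyn: float, rate_per_site_per_year: float) -> float:
    """Time since two genes diverged, assuming T = K / (2r).

    The factor of 2 reflects the assumption that both lineages accumulate
    substitutions independently and at the same rate.
    """
    return k_nonsyn / (2.0 * rate_per_site_per_year)


K = 0.208      # substitutions per nonsynonymous site (value from the text)
K_SE = 0.013   # reported uncertainty on K
RATE = 0.9e-9  # substitutions per nonsynonymous site per year (Li et al., 1985)

t = divergence_time_years(K, RATE)
t_low = divergence_time_years(K - K_SE, RATE)
t_high = divergence_time_years(K + K_SE, RATE)

print(f"{t / 1e6:.0f} million years (range {t_low / 1e6:.0f}-{t_high / 1e6:.0f})")
```

Under these assumptions the estimate is somewhat above 100 million years, which is in line with the rounded figure quoted above given the uncertainty in the assumed rate.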
In the apolipophorin-III genes from M. sexta and Locusta migratoria a bias toward the use of C or G in the third position of codons was noted (Kanost et al., 1988). In the arylphorin genes a strong bias for use of C in the third position of codons is found for Phe (84% for the α gene and 92% for the β gene); Tyr (79% for the α gene and 81% for the β gene); Asn (76% for the α gene and 78% for the β gene); and Asp (83% for the α gene and 61% for the β gene). Together TTC (Phe) and TAC (Tyr) account for about 16% of all codon usage, which means that the corresponding tRNAs must be abundant in order to support the high rates of arylphorin biosynthesis seen in the last larval stage.
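A third-position bias of this kind can be tallied directly from a coding sequence. The following is a minimal sketch, not the procedure used in this work: the function name is hypothetical and the short sequence is a made-up example rather than arylphorin data. Each of the four amino acids considered has exactly two codons, differing only in whether the third position is T or C.

```python
from collections import Counter

# Two-codon amino acids discussed in the text: (T-ending codon, C-ending codon).
CODON_PAIRS = {
    "Phe": ("TTT", "TTC"),
    "Tyr": ("TAT", "TAC"),
    "Asn": ("AAT", "AAC"),
    "Asp": ("GAT", "GAC"),
}


def third_position_c_fraction(cds: str) -> dict:
    """Fraction of codons ending in C for each amino acid in CODON_PAIRS.

    `cds` must be an in-frame coding sequence; trailing bases that do not
    fill a complete codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    fractions = {}
    for amino_acid, (t_codon, c_codon) in CODON_PAIRS.items():
        total = counts[t_codon] + counts[c_codon]
        if total:  # skip amino acids absent from the sequence
            fractions[amino_acid] = counts[c_codon] / total
    return fractions


# Made-up in-frame example: ATG TTC TTC TTT TAC TAC TAT AAC GAC TAA
toy_cds = "ATGTTCTTCTTTTACTACTATAACGACTAA"
bias = third_position_c_fraction(toy_cds)
for amino_acid, fraction in bias.items():
    print(f"{amino_acid}: {fraction:.0%} C in third position")
```

Applied to the full α and β coding sequences, a tally of this form would be expected to give percentages comparable to those quoted above.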
When the National Biomedical Research Foundation protein sequence data base was searched, the only sequences found with significant similarity to the M. sexta arylphorins were several subunits of various arthropod hemocyanins. The significance of these similarities was evaluated by comparing the sequences of the hemocyanins with 50 randomly shuffled sequences having the same amino acid composition as either the α or β subunits using the RDF program (ktup = 2) (Lipman and Pearson, 1985). For the α subunit the following z values were obtained, where z > 10 is considered significant: [...]
The possible relationship between insect storage proteins and arthropod hemocyanins was proposed by Telfer and Massey (1987) based on amino acid composition comparisons and the fact that both classes of proteins are hexamers. Arthropod hemocyanins are copper-containing proteins which transport oxygen in the hemolymph (for a review, see Linzen et al., 1985). They are hexamers or multi-hexamers with subunits of ≈75,000 Da. Thus, at least from a structural perspective, the storage proteins and the hemocyanins have similar subunit sizes and form hexamers. When the sequences of the M. sexta arylphorins, the B. mori methionine-rich storage protein, and hemocyanin subunits from P. interruptus, E. californicum subunits D and E, and L. polyphemus were aligned by the method of Feng and Doolittle (1987), the results shown in Fig. 7 were obtained. The percent identities between the various proteins, based on this analysis, are given in Table III. Interestingly, the storage proteins are almost as similar to the various hemocyanins as are the various hemocyanins to each other. However, only one of the six copper-binding histidines found in the hemocyanins is conserved in the insect storage proteins.
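The shuffling test used to assess significance can be illustrated with a short sketch. This is not the RDF program: the similarity score below (the number of distinct k-tuples shared by two sequences) is a deliberately simple stand-in for the FASTP score, and all names are hypothetical. What it shows is only the logic of the z value: the observed score is compared with the mean and standard deviation of scores obtained from shuffled sequences of identical composition.

```python
import random
import statistics


def shared_ktuples(a: str, b: str, ktup: int = 2) -> int:
    """Stand-in similarity score: distinct k-tuples present in both sequences."""
    tuples_a = {a[i:i + ktup] for i in range(len(a) - ktup + 1)}
    tuples_b = {b[i:i + ktup] for i in range(len(b) - ktup + 1)}
    return len(tuples_a & tuples_b)


def shuffle_z_score(query: str, target: str, n_shuffles: int = 50,
                    seed: int = 0, score=shared_ktuples) -> float:
    """z = (observed score - mean shuffled score) / SD of shuffled scores.

    The query is shuffled, so every random sequence keeps the query's
    exact amino acid composition.
    """
    rng = random.Random(seed)
    observed = score(query, target)
    residues = list(query)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(score("".join(residues), target))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; z is undefined")
    return (observed - mean) / sd


# Made-up sequences: a query identical to its target should score far above
# shuffled versions of itself.
example = "ACDEFGHIKLMNPQRSTVWY" * 2
z = shuffle_z_score(example, example)
print(f"z = {z:.1f}")
```

With a real alignment score in place of `shared_ktuples`, a z value above the threshold quoted in the text would indicate that the similarity is unlikely to arise from amino acid composition alone.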
A 3.2 Å structure has been reported for the P. interruptus hemocyanin (Gaykema et al., 1984, 1985). When we examined the aligned sequences of this hemocyanin and the arylphorins, we noted that those regions of the proteins which contained the highest concentration of identical residues or conservative replacements corresponded to those portions of the hemocyanin protein which form the subunit contacts between the hexamers. These residues would be expected to be conserved due to structural requirements for interactions along the interface between subunits. In addition, many of the identical or conservative replacements occurred either within or at the beginning or end of α-helical segments in the hemocyanin. Thus, it appears reasonable to propose that the hexameric insect storage proteins and the hexameric arthropod hemocyanins may have a common three-dimensional structure. If this were the case, it may be that these two classes of proteins have evolved from a common ancestor. An evolutionary tree showing such a possible relationship is presented in Fig. 8. This tree was derived from the aligned sequences and the distance scores given in Table III, and it is consistent with the phylogenetic relationships of the species from which the protein sequences were obtained. While the construction of such a tree does not prove an evolutionary relationship, it does suggest that a search for hexameric proteins in the blood of other arthropods and perhaps annelids would be worthwhile. In addition to the LSPs, the hexameric motif has been reported in other insect hemolymph proteins, including a copper-containing riboflavin-binding protein from Hyalophora cecropia (Telfer and Massey, 1987) and a hexameric 500-kDa protein from L. migratoria (de Kort and Koopmanschap, 1987). These observations may also be taken as support for the notion that a hexameric structure of blood proteins may be a common feature in the arthropods, although at this time there is no evidence to suggest why a hexameric protein is advantageous.