Primary Structure of Apolipophorin-III from the Migratory Locust, Locusta migratoria POTENTIAL AMPHIPATHIC STRUCTURES AND MOLECULAR EVOLUTION OF AN INSECT APOLIPOPROTEIN*

The amino acid sequence of an insect apolipoprotein, apolipophorin-III from Locusta migratoria, has been deduced from the sequence of its cloned cDNA. The mature hemolymph protein consists of 161 amino acids. Optimized alignments of this protein with apolipophorin-III from the tobacco hornworm, Manduca sexta, disclosed an overall sequence identity of only 29%, even though the two proteins are functionally equivalent. The L. migratoria sequence is composed of 12 repeating peptides that are variable in length. Six amphipathic helical segments of varying length were identified in each protein using a newly described algorithm for detecting such secondary structures. The degree of sequence identity between the two insect apoproteins is considerably less than that observed among orthologous mammalian apolipoproteins. However, calculation of the rates of synonymous and nonsynonymous nucleotide substitutions indicates that the insect genes may be evolving at rates similar to the mammalian apolipoprotein genes. Further comparative analyses of insect and mammalian apolipoproteins should provide insights about the limits of sequence diversity tolerated by their predicted amphipathic helical domains.
* This work was supported by Grants HD-07951 and HL-18577 from the United States National Institutes of Health and from the Canadian National Sciences and Engineering Research Council. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J03888.
** Established Investigator of the American Heart Association.
‡‡ To whom correspondence should be addressed: Dept. of Biochemistry, University of Arizona, Tucson, AZ 85721.
Lipids are transferred between insect tissues by lipophorin, the major lipoprotein of insect hemolymph (Beenakkers et al., 1985; Chino et al., 1985; Shapiro et al., 1988). Lipophorin contains two apoproteins, apolipophorin-I (apoLp-I, M_r ~250,000) and apolipophorin-II (apoLp-II, M_r ~85,000) (Ryan et al., 1984). In the African migratory locust, Locusta migratoria, and the tobacco hornworm, Manduca sexta, which use lipids as the major fuel for flight, the peptide adipokinetic hormone is released during flight and causes lipid stored in the fat body to associate with lipophorin for transport to flight muscles. This loading of lipophorin with diacylglycerol results in a larger, less dense lipoprotein particle which contains a third apoprotein, apolipophorin-III (apoLp-III, M_r ~20,000) (Shapiro and Law, 1983; Van der Horst et al., 1984; Goldsworthy et al., 1985). apoLp-III exists free in the hemolymph in resting animals and associates with lipophorin during lipid loading until each particle contains 16 molecules of apoLp-III in M. sexta and nine molecules (Chino and Yazawa, 1986) or 14 molecules (Van der Horst et al., 1988) of apoLp-III in L. migratoria. M. sexta apoLp-III is an asymmetric, lipid-binding protein (Kawooya et al., 1986) which contains no carbohydrate (Kawooya et al., 1984). Its role in lipid transport appears to be to increase the lipid-carrying capacity of lipophorin by covering the expanding hydrophobic surface resulting from diacylglycerol uptake (Kawooya et al., 1986; Wells et al., 1987). Like mammalian apolipoproteins, M. sexta apoLp-III is composed of tandemly repeated amino acid sequences with a high potential for forming amphipathic helical structures (Cole et al., 1987), which may be responsible for its lipid-binding activity.
L. migratoria apoLp-III is a glycoprotein of M_r = 19,000 and exists as three species that are indistinguishable in size and amino acid composition but differ in isoelectric point (Chino and Yazawa, 1986). apoLp-III from M. sexta and from L. migratoria function equally well in an in vitro system for production of low density lipophorin (Van der Horst et al., 1988), yet the two proteins differ markedly in amino acid composition, and the L. migratoria protein contains 11% carbohydrate (Chino and Yazawa, 1986). In this report we describe the primary structure of L. migratoria preapoLp-III deduced from a cloned cDNA. Computer-assisted comparative sequence analyses of the orthologous (Fitch and Margoliash, 1970) M. sexta and L. migratoria apolipoproteins were undertaken to identify structural similarities which underlie their functional equivalence. In addition, the availability of these apolipoprotein sequences allowed us to examine the extent and rates of sequence divergence between the two apoLp-III genes.
Purification and Analysis of apoLp-III Protein-Hemolymph from mature adult male locusts was diluted 1:2 with cold 0.1 M sodium phosphate, pH 7.0, 0.15 M NaCl, 5 mM 1-phenyl-2-thiourea, 5 mM EDTA, and hemocytes were removed by centrifugation. Diluted cell-free hemolymph (21 ml) was heated in a boiling water bath for 10 min. Precipitated protein was removed by centrifugation (5000 × g, 10 min), and the supernatant was applied to a column (90 × 2.5 cm) of Sephacryl S-200 and eluted with 0.1 M sodium phosphate, pH 7.0. Fractions which contained apoLp-III (detected by sodium dodecyl sulfate-polyacrylamide gel electrophoresis) were combined, dialyzed against 0.1 M acetic acid, and lyophilized, yielding 40 mg of protein.
Part of this preparation (3.5 mg) was further purified by reversed phase high performance liquid chromatography (μBondapak C-18, Waters, 25 × 0.5 cm). Samples were dissolved in 0.1% trifluoroacetic acid and eluted with a linear gradient of 0-64% acetonitrile in 0.1% trifluoroacetic acid (45 min, 1 ml/min). Fractions containing apoLp-III were pooled and lyophilized, yielding 2.4 mg of protein. apoLp-III prepared by this method gave a single band when analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (Kanost et al., 1987). This preparation of apoLp-III was used as an antigen for production of antibodies in rabbits and for analysis of amino acid composition and NH2-terminal sequence. To determine amino acid composition, apoLp-III was hydrolyzed in vacuo in 6 M HCl at 110 °C for 24 h and analyzed on a Beckman model 120C analyzer modified for single column analysis. NH2-terminal amino acid sequence was determined by automated Edman degradation as described by Flynn et al. (1983).
cDNA Cloning and Sequencing-RNA from fat body of adult female locusts was isolated as previously described (Chinzei et al., 1982), and poly(A)+ RNA was prepared by binding to oligo(dT)-cellulose (Aviv and Leder, 1972). Synthesis of cDNA, construction of a cDNA library in λgt11, and screening of the library with anti-Locusta-apoLp-III antiserum were performed as described by Cole et al. (1987). Five thousand recombinant plaques yielded eight positive clones. Four of these were purified to homogeneity, and the EcoRI insert of the largest clone (λ-LmApoIII-8) and its restriction fragments were subcloned into pTZ18R and pTZ19R. Single-stranded DNA generated from the f1 origin of replication of these plasmids (Dente et al., 1983) was sequenced by the dideoxy chain termination method (Sanger et al., 1980), using modified T7 DNA polymerase (United States Biochemical) (Tabor and Richardson, 1987).
Computer-assisted Analysis of Sequencing Data-All calculations were carried out on a MicroVAX II (Digital Equipment Corp.) running the VMS operating system (version 4.4). Program AMPHI for the prediction of amphipathic helical segments was adapted from the published FORTRAN source code (see Appendix to Margalit et al., 1986). Program LWL85 for estimating synonymous and nonsynonymous rates of nucleotide substitution (Li et al., 1985) was kindly provided by Dr. Wen-Hsiung Li (University of Texas Health Science Center at Houston).

RESULTS AND DISCUSSION
Nucleotide Sequence of apoLp-III cDNA-The strategy for sequencing the apoLp-III cDNA is shown in Fig. 1 and the nucleotide and deduced amino acid sequence in Fig. 2. The 679-base pair sequence of the apoLp-III cDNA clone contains an open reading frame beginning with an ATG codon at nucleotide 35 and extending to nucleotide 571, followed by a 98-base pair 3'-untranslated sequence ending with a poly(A) tail (Fig. 2). The five nucleotides upstream from the ATG are CCGCC, which matches the consensus sequence for eukaryotic initiation sites (CC(A/G)CC) (Kozak, 1984).
ApoLp-III Amino Acid Sequence-The cDNA codes for a 179-amino acid polypeptide with a molecular weight of 19,142. This agrees well with an estimated molecular weight of 19,000 for apoLp-III translated in vitro and immunoprecipitated.² The NH2-terminal amino acid sequence of apoLp-III isolated from hemolymph begins with the 19th amino acid encoded by the cDNA sequence. The 32 residues determined by Edman degradation are in good agreement with the deduced sequence with the exceptions of residue 16 (discussed below) and residue 18, which was Thr in the cDNA and Ser in the protein.
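The coordinates quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `orf_summary` is hypothetical, nucleotide 571 is assumed to be the last base of the last sense codon (which is what the stated 179 residues imply), and the 18-residue signal peptide length is inferred from the statement that the mature protein begins with the 19th encoded amino acid.

```python
def orf_summary(start, end, signal_len):
    """Summarize an open reading frame given 1-based, inclusive coordinates.

    `end` is assumed to be the last nucleotide of the last sense codon,
    i.e. the stop codon is not included in the range.
    """
    coding_nt = end - start + 1
    if coding_nt % 3:
        raise ValueError("ORF length is not a multiple of three")
    residues = coding_nt // 3
    return {
        "coding_nt": coding_nt,
        "residues": residues,
        "mature_residues": residues - signal_len,
    }


# Values taken from the text: ATG at nucleotide 35, ORF extends to 571,
# mature protein starts at encoded residue 19 (so 18 residues are removed).
summary = orf_summary(start=35, end=571, signal_len=18)
print(summary)
```

Under these assumptions the range spans 537 nucleotides, which corresponds to the 179-residue precursor and the 161-residue mature protein reported in the text.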

When the site of cotranslational cleavage of the signal peptide was predicted by the method of von Heijne (1986) using the computer program SIGSEQ2 (Folz and Gordon, 1987), the highest probability for cleavage was after Ala-16 (residue -3 in Fig. 2), and the second highest was after Pro-18 (residue -1 in Fig. 2). The accuracy of this predictive scheme is ~80% when the sites with the highest or second highest probability for cleavage are tested using in vitro cotranslational cleavage assays (Folz and Gordon, 1987). Thus, it is probable that the -1 position of preapoLp-III's signal peptide is occupied by a proline residue. During the course of von Heijne's initial assessment of the sites of cotranslational proteolytic processing of eukaryotic preproteins, it appeared that signal peptidase cleavage after a proline was prohibited (von Heijne, 1983). However, a few examples of processing after a proline have been recently noted. These include pheasant prelysozyme (Weisman et al., 1986) as well as mutant human apolipoprotein AII produced using oligodeoxynucleotide-directed site-specific mutagenesis where the pentapeptide prosegment had been removed and a proline introduced at the -1 position (Folz et al., 1988).
The deduced amino acid sequence of the mature protein beginning at Asp (1) is a 161-residue polypeptide with a molecular weight of 17,229. The molecular weight of apoLp-III isolated from hemolymph is approximately 19,000 (determined by sedimentation equilibrium) (Chino and Yazawa, 1986). The difference may be accounted for by the 11% carbohydrate content of the fully processed apoLp-III (Chino and Yazawa, 1986). The amino acid sequence contains two putative N-linked glycosylation sites (Asn-X-Ser/Thr) (Bause, 1983) at residues 16 and 83. Residue 16 of the protein could not be determined by Edman degradation, which indicates that it is probably glycosylated.
² M. R. Kanost and G. R. Wyatt, unpublished data.
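The two checks described above, locating Asn-X-Ser/Thr motifs and asking whether 11% carbohydrate can account for the mass difference, can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the function names are hypothetical, the test sequence is a made-up toy string rather than the locust sequence, and excluding proline at the X position is a commonly applied refinement that the text itself does not state.

```python
import re


def find_sequons(protein, exclude_proline=True):
    """Return 1-based positions of Asn residues that start an Asn-X-Ser/Thr motif.

    Assumption: by default X may be any residue except proline.
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping motifs (e.g. N-N-T-S) are all found.
    pattern = re.compile(r"(?=N%s[ST])" % middle)
    return [match.start() + 1 for match in pattern.finditer(protein)]


def glycoprotein_mass(polypeptide_mass, carbohydrate_fraction):
    """Total mass if carbohydrate makes up the given fraction of the glycoprotein."""
    return polypeptide_mass / (1.0 - carbohydrate_fraction)


toy_sequence = "ANVTQNPSANNTS"  # hypothetical sequence, for illustration only
print(find_sequons(toy_sequence))

# Polypeptide mass 17,229 and 11% carbohydrate, both taken from the text.
print(round(glycoprotein_mass(17229, 0.11)))
```

With the figures from the text, a 17,229-dalton polypeptide carrying 11% carbohydrate by weight comes to roughly 19,400, which is consistent with the approximately 19,000 measured by sedimentation equilibrium.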
The amino acid composition calculated from the deduced sequence is in good agreement with that determined by analysis of the protein (Table I). The sequence confirms that the mature protein lacks methionine, as noted by Chino and Yazawa (1986). This raises questions about the labeling of secreted apoLp-III with [35S]methionine reported by Izumi et al. (1987). We have purified a different locust hemolymph protein of M_r = 19,000 which does contain methionine,² which may explain this result.

Repeating Units and Potential Amphipathic Regions-We have previously shown (Cole et al., 1987) that M. sexta apoLp-III is composed almost entirely of repeated tetradecapeptides with amphipathic helical potential and postulated that these peptides constitute the structural basis for lipid-binding activity as in the mammalian apolipoproteins (reviewed in Boguski et al., 1986a). It was of interest, therefore, to determine if the locust protein has a comparable periodic structure and to analyze sequence relationships between the two proteins. An optimized alignment between the locust and hornworm sequences was generated using the computer program ALIGN (Dayhoff et al., 1978), and the results are displayed in Fig. 3. The overall sequence identity is 29%, allowing for gaps representing hypothetical insertion and/or deletion mutations. This relatively low degree of sequence similarity was somewhat surprising, and its functional and evolutionary significance is discussed below. As in M. sexta apoLp-III, repeating units are present in the amino acid sequence of L. migratoria apoLp-III. This can be seen by comparison with the M. sexta sequence in which 12 repeated sequences, approximately 14 residues in length, have previously been defined (labeled I-XII in Fig. 3). The presence of repeated sequences in the locust protein was independently verified using the computer program RELATE as well as comparison matrix methods (Boguski et al., 1986a; data not shown). The 12 repeating peptides that comprise most of the locust sequence (Fig. 3) are more variable in length than the hornworm peptides, so that no average repeat length can reasonably be defined. Patterns of residues corresponding to the first eight (most highly conserved) positions in the M. sexta tetradecapeptide repeat unit are identifiable in the locust sequence and recapitulate the motif of that repeat unit (Fig. 3, Table II). However, the sequences of these repeating units are highly degenerate (see below).
The repeating sequences of M. sexta apoLp-III have the potential to assume amphipathic, helical conformations (Cole et al., 1987). To investigate the structural potentials of the L. migratoria sequence, we have employed a new optimized algorithm for detecting amphipathic helical structures (Margalit et al., 1987; Cornette et al., 1987). This method, which has been used successfully to predict immunodominant sites for helper T-cell stimulation (Margalit et al., 1987), has a number of advantages over previous methods for associating amphipathic secondary structure in a protein with the sequence of hydrophobicity values of its residues (see Cornette et al., 1987). The method applies a power spectrum procedure (least squares fit of a sinusoid) on a sequence of overlapping blocks of hydrophobicity values and identifies those blocks for which the maximum intensity occurs at a frequency near 100°, reflecting a periodicity of 3.6 residues/turn (the structure of an α-helix). The method then determines whether those sequence blocks, if arranged in a helical conformation, would be amphipathic. An amphipathic segment is considered stable if it contains at least three consecutive blocks of the right periodicity. Finally, the amphipathic character of the segment is quantified by combining the magnitudes of the intensity peaks in the power spectrum (amphipathic index) with the length of the segment to yield an amphipathic score (see Margalit et al., 1987 for a complete exposition of this method).
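The block-wise periodicity test described above can be illustrated with a short sketch. It is not the AMPHI program: it substitutes a plain Fourier-style power spectrum for the least squares sinusoid fit, it works on a list of hydrophobicity values supplied by the caller, and all function names are hypothetical. The block length of 7 and the accepted angle range of 80-135° follow the values given in the Table II legend; the 5° angular grid is an arbitrary choice.

```python
import cmath
import math


def block_intensity(values, angle_deg):
    """Power of the mean-centred hydrophobicity block at one angular frequency."""
    mean = sum(values) / len(values)
    theta = math.radians(angle_deg)
    total = sum((v - mean) * cmath.exp(1j * k * theta) for k, v in enumerate(values))
    return abs(total) ** 2


def peak_angle(values, angles=range(0, 181, 5)):
    """Angle (degrees per residue) at which the block's power spectrum is largest."""
    return max(angles, key=lambda angle: block_intensity(values, angle))


def helical_blocks(hydrophobicity, block_len=7, low=80, high=135):
    """Return (midpoint_index, peak_angle) for blocks whose peak lies in [low, high].

    Midpoint indices are 0-based positions in the input list.
    """
    hits = []
    for start in range(len(hydrophobicity) - block_len + 1):
        block = hydrophobicity[start:start + block_len]
        angle = peak_angle(block)
        if low <= angle <= high:
            hits.append((start + block_len // 2, angle))
    return hits


# Synthetic profile with an exact 100 degree (3.6 residues/turn) periodicity.
ideal_helix = [math.cos(math.radians(100 * k)) for k in range(18)]
print(helical_blocks(ideal_helix))

# A strictly alternating profile (180 degree period) should yield no helical blocks.
alternating = [1.0 if k % 2 == 0 else -1.0 for k in range(18)]
print(helical_blocks(alternating))
```

A fuller implementation would then merge runs of at least three consecutive qualifying blocks into segments and sum their peak intensities into a score, as the text describes; those steps are omitted here.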
The amphipathic segments predicted by this approach in both M. sexta and L. migratoria apoLp-III are shown in Table II. Six amphipathic helical segments of varying lengths are present in each protein. All of these segments incorporate at least some of the characteristics of the repeating units previously defined in the hornworm protein. However, there is not an exact correspondence between these repeat units and the predicted amphipathic segments (Fig. 3, Table II). Furthermore, there is not an exact correspondence between the locust and hornworm sequences in terms of their predicted amphipathic regions (although considerable overlap does exist). In M. sexta apoLp-III long sequences with high amphipathic scores occur at the termini of the molecule, while in L. migratoria the segments with the highest amphipathic scores are located more centrally. This lack of correspondence is undoubtedly due to the low degree of sequence identity between the two proteins as well as the presence of multiple insertion and/or deletion mutations. One notable alignment of residues that are predicted to be within amphipathic regions involves repeat IX of M. sexta apoLp-III and repeat XI of the L. migratoria sequence (Fig. 3). These repeats are each exactly 14 residues long and are identical in six positions. All but one of the remaining positions are conservative substitutions. There are several other interesting features of the repeating elements present in the amphipathic segments. Although positions 1 and 4 maintain predominantly hydrophobic residues, there is considerable variability in other regions, particularly in the L. migratoria sequence. For example, 69% of the residues occupying positions 2 and 3 in the M. sexta segments are acidic, whereas only 28% of the corresponding residues in the locust segments are acidic, with most of the remainder being the uncharged amide derivatives (which nevertheless have similar relative hydrophobicity values). Seventy-five percent of the residues in position 6 of the hornworm segments are lysine, but none of the locust position 6 residues is lysine (although 2 of the 8 are the basic residues arginine and histidine). Lysine is thought to have special significance because in a large number of naturally occurring helical segments, it occurs as the ultimate or penultimate COOH-terminal residue much more frequently than would be expected by chance alone (Margalit et al., 1987).

TABLE II
Predicted amphipathic segments in insect apolipophorins-III
Amphipathic segments were predicted using a block length of 7 (which corresponds to two turns of an α-helix) and the hydrophobicity scale of Fauchere and Pliska (1983). The first column indicates the midpoints of the predicted blocks, the sequences of which are displayed in column 2 along with three flanking residues. The numbers 1-8 correspond to the first eight positions of the tetradecapeptide repeat unit previously defined for M. sexta apoLp-III (Cole et al., 1987). The range of angles includes both α (80-120°) and 3₁₀ (105-135°) helices. The amphipathic score is the sum of the amphipathic indices of the blocks. Sequence numbers refer to the mature proteins. Segments 9-16 and 70-87 of the L. migratoria sequence contain potential N-glycosylation sites.

Finally, the average amphipathic score for segments of M. [...] et al., 1988). These hybrid lipoproteins are functional in vitro in loading with diacylglycerol from locust fat body, and they can both be used as substrates by lipase from locust flight muscle. Thus, the two proteins must have some conserved features, perhaps intrasequence cooperative effects or a pattern of folding of amphipathic helices, which contribute to these functions. ApoLp-III from L. migratoria has recently been crystallized (Holden et al., 1988), and determination of its structure by x-ray crystallography will show which parts of the sequence actually form amphipathic helices. Site-directed mutagenesis of the L. migratoria and M. sexta cDNA clones should then provide a novel model system for examining experimentally the limits of sequence variation in predicted amphipathic helical structures.
Molecular Evolution-As stated previously, the degree of amino acid sequence identity between the locust and hornworm proteins is only 29%, even with the introduction of gaps in the alignment to maximize the number of matching residues. Despite the fact that the majority of nonidentical residues represent conservative substitutions, this degree of sequence identity appears rather low compared with that of other orthologous proteins, including even the mammalian apolipoproteins, which are known to have evolved rapidly (Boguski et al., 1986a). Rat and human apoA-I, for instance, are 64% identical in their amino acid sequences. The biochemical properties of amphipathic sequences may not depend upon a precise sequence of amino acids but rather a particular spatial distribution of sets of interchangeable amino acids with similar chemical properties (Boguski et al., 1986b). Nevertheless, there must be some limits on sequence variation that preserve functional equivalence.
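Percent identity figures such as the 29% and 64% quoted above depend on how gap columns are treated, and the text does not say which convention ALIGN used. The sketch below, with the hypothetical function `percent_identity` and made-up aligned rows, shows the two common conventions side by side so the sensitivity of the number to that choice is visible.

```python
def percent_identity(row_a, row_b, count_gap_columns=True):
    """Percent identity between two rows of a pairwise alignment ('-' marks a gap).

    count_gap_columns=True  : denominator is the full alignment length.
    count_gap_columns=False : denominator is the number of gap-free columns.
    Which convention the original analysis used is not stated; both are shown.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        has_gap = a == "-" or b == "-"
        if has_gap and not count_gap_columns:
            continue
        compared += 1
        if not has_gap and a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
row_1 = "DAAGK-L"
row_2 = "DASGKEL"
print(round(percent_identity(row_1, row_2), 1))
print(round(percent_identity(row_1, row_2, count_gap_columns=False), 1))
```

For a highly gapped alignment such as the locust and hornworm comparison, the two conventions can differ by several percentage points, which is worth keeping in mind when comparing identity values across studies.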
These issues prompted us to examine more closely the molecular evolution of these insect apolipophorins. The common ancestor of humans and rodents existed prior to the great mammalian radiation during the Cretaceous period approximately 75 million years ago, and the exopterygote and endopterygote insects are believed to have diverged during the Carboniferous period approximately 320 million years ago (Hennig, 1981), although the fossil record and divergence times for insects are less reliable than those for mammals. Thus, the insect and mammalian proteins may have evolved at similar rates but with the former having had a longer time to accumulate sequence changes. To test this hypothesis, we have used the method of Li et al. (1985) to estimate the rates of nucleotide substitution in the apoLp-III genes. This method has been used to study the rates of evolution in the mammalian apolipoprotein gene family (Luo et al., 1986).
The method estimates the rates of both synonymous and nonsynonymous nucleotide substitution. Nucleotide substitutions are considered synonymous (phenotypically silent) if they result in no change in the amino acid coded for and nonsynonymous if they do. It is important to distinguish the two because it is known that the rate of synonymous substitution is generally much greater than that of nonsynonymous changes. The method yields two values, K_S and K_A, which represent the number of (synonymous) substitutions/synonymous site and the number of (nonsynonymous) substitutions/nonsynonymous site, respectively. Nucleotide substitution rates are obtained by dividing K_S or K_A by twice the time elapsed since divergence of the species, because substitutions accumulate independently along both lineages.
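The distinction between synonymous and nonsynonymous changes can be made concrete with the standard genetic code. The following sketch is a simplified illustration and not the LWL85 program: it only classifies a pair of codons and counts how many single-base changes at one codon position leave the amino acid unchanged. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(codon_a, codon_b):
    """Label a codon pair as 'identical', 'synonymous' or 'nonsynonymous'."""
    if codon_a == codon_b:
        return "identical"
    same_residue = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "synonymous" if same_residue else "nonsynonymous"


def synonymous_alternatives(codon, position):
    """Count the single-base changes at `position` (0-2) that keep the amino acid.

    A result of 3 corresponds to a fourfold degenerate site, 0 to a
    nondegenerate site.
    """
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            count += 1
    return count


print(classify_change("CTT", "CTC"))       # Leu -> Leu
print(classify_change("GAT", "GAA"))       # Asp -> Glu
print(synonymous_alternatives("CTT", 2))   # third position of a Leu codon
print(synonymous_alternatives("GAT", 0))   # first position of an Asp codon
```

Methods such as that of Li et al. (1985) build on this kind of site classification and then correct for multiple substitutions at the same site; that correction is not attempted here.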
K_S and K_A values and the respective rates for insect apoLp-III and selected mammalian apolipoproteins are presented in Table III.
[Table III: surviving rate entries 1.67 × 10⁻⁹, 2.33 × 10⁻⁹, 1.27 × 10⁻⁹, 1.61 × 10⁻⁹]
The K_S for apoLp-III, 1.06, is considerably higher than the average K_S for mammalian apolipoproteins. However, when this value is converted to a rate (1.66 × 10⁻⁹), the result is that the insect genes are actually evolving nearly three times more slowly than the mammalian genes at their respective synonymous sites. Li et al. (1985) point out that their method tends to give underestimates when the degree of sequence divergence is large, which is the case for the two apoLp-III sequences, and thus these rates should be interpreted with caution. One observation which may explain this result is the codon bias which is present in the apoLp-III sequences (Table IV). For most amino acids there is a bias toward the use of codons with C or G in the third position in both M. sexta and L. migratoria apoLp-III. This may reflect some selection pressure for these codons in synonymous mutations in apoLp-III, perhaps due to differences in tRNA availability (Ikemura, 1985). It is known that in prokaryotes, highly expressed genes have very biased codon usage and low rates of synonymous substitutions (Sharp and Li, 1987). Perhaps selection at the level of translation has also occurred in the apoLp-III genes. The average K_A for the mammalian apolipoprotein mRNA sequences is 0.26, which corresponds to an average rate of 1.73 × 10⁻⁹ substitutions/nonsynonymous site/year. The average nonsynonymous rate for 35 mammalian proteins is 0.88 × 10⁻⁹ (Li et al., 1985), indicating that the apolipoproteins are evolving nearly twice as rapidly as the average mammalian protein at nonsynonymous sites.
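The third-position bias mentioned above is often summarized as the fraction of codons ending in G or C. A minimal sketch of that calculation follows; the function name `gc3_fraction` and the example sequence are hypothetical, and the input is assumed to be an in-frame coding sequence with the stop codon already removed.

```python
def gc3_fraction(coding_sequence):
    """Fraction of codons whose third base is G or C.

    Assumes the sequence is in frame, contains only sense codons,
    and uses the DNA alphabet (A, C, G, T).
    """
    sequence = coding_sequence.upper()
    if not sequence or len(sequence) % 3:
        raise ValueError("sequence length must be a non-zero multiple of three")
    third_bases = sequence[2::3]
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical four-codon fragment: ATG GCC AAG CTT
print(gc3_fraction("ATGGCCAAGCTT"))
```

A value well above 0.5 across a whole coding region would be one simple indication of the kind of bias described in the text, although a per-amino-acid codon table such as Table IV is more informative.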
The K_A for apoLp-III (0.74) is about three times higher than the average K_A for the mammalian apolipoproteins, but when the differences in divergence times are factored in, the rate for apoLp-III is about 33% slower than that of the mammalian apolipoproteins at nonsynonymous sites. Again, the rate of substitution in the insect genes may be underestimated due to their degree of sequence divergence. Perhaps, then, the insect and mammalian apolipoproteins are actually evolving at similar rates at nonsynonymous sites. Still, apoLp-III may be evolving somewhat more rapidly than the average mammalian gene.
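The rates quoted in this section can be reproduced from the K values and divergence times given in the text. In the sketch below the factor of two in the denominator is an inference: it is the value that makes the arithmetic agree with the published figures, and it reflects substitutions accumulating along both lineages since the split. The function and variable names are hypothetical.

```python
def substitution_rate(k, divergence_years):
    """Substitutions per site per year, assuming both lineages evolve independently."""
    return k / (2.0 * divergence_years)


INSECT_DIVERGENCE = 320e6   # years, exopterygote/endopterygote split (from the text)
MAMMAL_DIVERGENCE = 75e6    # years, human/rodent split (from the text)

insect_ks_rate = substitution_rate(1.06, INSECT_DIVERGENCE)
insect_ka_rate = substitution_rate(0.74, INSECT_DIVERGENCE)
mammal_ka_rate = substitution_rate(0.26, MAMMAL_DIVERGENCE)

# Fraction by which the insect nonsynonymous rate falls below the mammalian one.
relative_slowdown = 1.0 - insect_ka_rate / mammal_ka_rate

print("insect K_S rate: %.2e" % insect_ks_rate)
print("insect K_A rate: %.2e" % insect_ka_rate)
print("mammal K_A rate: %.2e" % mammal_ka_rate)
print("insect K_A rate is %.0f%% slower" % (100 * relative_slowdown))
```

Under these assumptions the calculation gives about 1.66 × 10⁻⁹ for the insect synonymous rate and 1.73 × 10⁻⁹ for the mammalian nonsynonymous rate, matching the text, and an insect nonsynonymous rate of about 1.16 × 10⁻⁹, which is roughly one third lower than the mammalian value.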
ApoLp-III is the first protein to have been sequenced in both L. migratoria and M. sexta. A large database of other orthologous sequences from these and other species will be essential for placing the rate of apolipophorin sequence change in the larger context of insect protein evolution.