The Major Protein from Lipid Bodies of Maize CHARACTERIZATION AND STRUCTURE BASED ON cDNA

In plant seeds, the storage triacylglycerol is packed in discrete particles called lipid bodies, which consist of a lipid core surrounded by a phospholipid monolayer with embedded proteins. We have cloned and sequenced a nearly full-length cDNA for the major protein (L3) associated with the lipid bodies of maize. The L3-cDNA clone was identified by hybrid-selected translation analysis and contains the complete 3' non-coding region and an open reading frame of 432 nucleotides. This open reading frame encodes a polypeptide whose amino acid composition, hydrophobicity, and predicted protease digestion pattern correlate well with those of the authentic L3 protein. Analyses of predicted secondary structure and local hydropathy of the deduced amino acid sequence suggest three structural domains in the protein. An internal domain of 72 contiguous hydrophobic or neutral amino acids is bounded on the amino-terminal side by a hydrophilic α-helix and on the carboxyl-terminal side by an amphipathic α-helix. The data suggest that L3 is uniquely suited to interact with both lipid and phospholipid moieties of the lipid body. A simple model for the topology of L3 on the lipid body is proposed. The unusual structure of the lipid body protein is discussed and compared to those of the two well-studied classes of lipid-associated proteins, apolipoproteins and intrinsic membrane proteins.

In plant seeds, the triacylglycerol is packaged and stored in organelles called lipid bodies. These spherical lipid bodies are approximately 0.5 µm in diameter and are composed of a central core of triacylglycerol surrounded by a "half-unit" membrane consisting of a single layer of phospholipid and a small set of tightly associated proteins (1)(2)(3). The storage of seed lipids in many small subcellular organelles, rather than in a single large lipid droplet per cell as seen in mammalian white adipose tissue, may be related to the need for rapid mobilization of these reserves during seed germination and seedling growth. Partitioning a single large lipid sphere into many small ones greatly increases the surface area-to-volume ratio of the lipid. This increased surface area may render the seed lipid more accessible to degradative enzymes and facilitate its rapid mobilization. The proteins associated with lipid bodies and their role in lipid packaging and metabolism have not been well studied.
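The surface-area argument can be illustrated with a short calculation. The sketch below assumes ideal spheres and conserves the total lipid volume; the 0.5 µm diameter is taken from the text, whereas the 10 µm diameter of the single large droplet is a hypothetical comparison value, not a measurement.

```python
import math


def sphere_area(diameter):
    """Surface area of a sphere of the given diameter."""
    return math.pi * diameter ** 2


def sphere_volume(diameter):
    """Volume of a sphere of the given diameter."""
    return math.pi * diameter ** 3 / 6.0


def area_gain(d_large, d_small):
    """Factor by which total surface area rises when one sphere of
    diameter d_large is split into equal spheres of diameter d_small,
    keeping the total volume constant."""
    n_small = sphere_volume(d_large) / sphere_volume(d_small)
    return n_small * sphere_area(d_small) / sphere_area(d_large)


# Hypothetical 10 um droplet divided into 0.5 um lipid bodies.
print(area_gain(10.0, 0.5))
```

Under these assumptions the gain reduces to the ratio of the two diameters, so the hypothetical 10 µm droplet would expose about 20 times more surface once divided into 0.5 µm bodies.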
We have been using the maize seed as a model system to study lipid bodies and their associated proteins (4-7). The lipid bodies in maize are primarily confined to a specialized embryonic tissue called the scutellum, where they comprise 50% of the cell dry weight at seed maturity. Only four major polypeptides are associated with maize lipid bodies, one of higher M_r (45,000) called H protein and three of lower M_r (15,500, 18,000, and 19,500) called L3, L2, and L1 proteins, respectively. The smallest of the L proteins (L3) is quite abundant, constituting nearly 30% of the total lipid body protein and approximately 1% of total protein in the scutella. The structure of the proteins and the nature of their association with one another and with the lipid and phospholipid components of the lipid body are unknown.
We have cloned and sequenced a nearly full-length cDNA for L3, the major protein associated with the lipid bodies of maize. The deduced amino acid sequence was characterized according to local hydropathy, and its secondary structure was predicted by computer analysis. These analyses suggest that there are three structural domains in L3 and allow us to propose a simple model for the topology of L3 on the lipid body. The structure of the protein is unique but has features in common with both mammalian apolipoproteins and intrinsic membrane proteins.

* The work was supported by United States Department of Agriculture Grant 86 CRCR 12156. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

RESULTS
Identification of L3-cDNA
A maize cDNA library was constructed from maturing maize embryo RNA, and putative L3 clones were selected as described under "Experimental Procedures." The identity of the clones was determined by hybrid-selected translation analysis. Several clones specifically selected an mRNA which was translated into a polypeptide with the approximate M_r of L3 (data shown for one of the clones, Fig. 1). The hybrid-selected translation product was immunoprecipitated with polyclonal antiserum specific to the L3 protein (Fig. 1). Little or no translatable mRNA was selected by nonrecombinant vector.
The nucleotide sequence of an L3-cDNA clone was determined by the dideoxynucleotide chain termination method. The sequencing strategy is shown in Figs. 2 and 3. The cDNA insert is 757 bp long excluding the poly(dA) tract and hybridizes to a maize scutellar mRNA approximately 1000 nucleotides long (Fig. 4A). These data are consistent with primer extension analysis, which indicates that the cDNA is 86% complete, missing approximately 125 nucleotides at the 5' end (Fig. 4B). The GC content of the cDNA sequence is quite high, 62.6% overall and 71.6% in the coding region. The clone terminates in a poly(dA) tract preceded by a perfect polyadenylation signal (AATAAA) (20) located 13 bp upstream. Analysis of the period constraints (Fickett's statistic) of the cDNA was used to predict coding and noncoding regions independent of reading frame (21). This analysis suggests with 95% confidence that the first 400 bp of the cDNA clone are protein coding, while the last 300 bp are noncoding. The L3-cDNA contains two long open reading frames, both of which contain this presumptive protein coding region. ORF-1 begins at nucleotide 1 and encodes 147 amino acids, while ORF-2 begins at nucleotide 3 and encodes 161 amino acids.
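The 86% completeness figure follows from the insert length and the primer extension estimate reported above. The sketch below reproduces that arithmetic and shows how an overall GC content of the kind quoted can be computed. The function names are illustrative, and the short sequence passed to `gc_content` is a made-up example, not the L3 sequence.

```python
def fraction_complete(insert_len, missing_5prime):
    """Fraction of the full-length transcript (excluding the poly(A) tail)
    represented by a cDNA insert that lacks missing_5prime nucleotides."""
    return insert_len / (insert_len + missing_5prime)


def gc_content(seq):
    """Fraction of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


# Values from the text: 757 bp insert, about 125 nucleotides missing.
print(f"{100 * fraction_complete(757, 125):.1f}%")  # prints 85.8%

# Toy sequence for illustration only.
print(gc_content("GGCCAT"))
```

The computed value of 85.8% rounds to the 86% stated in the text.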
Two major lines of evidence indicate that ORF-1 encodes the L3 protein.
Codon Usage-1) The codon usage in ORF-1 is dramatically skewed (95.1%) in favor of C or G in the third or "wobble" position, a phenomenon which has been positively correlated with protein coding regions in prokaryotic and eukaryotic organisms (22). The codon usage for ORF-2 is not skewed. 2) Computer analyses indicate that ORF-1 uses preferred codons and does not use rare codons as determined from the combined codon usage tables of genes from related tissues in maize and other monocots (Ref. 23, data not shown). In contrast, ORF-2 uses rare codons and does not use preferred codons.
Protein Characteristics-1) The calculated amino acid composition of the ORF-1 deduced protein fits well with the previously reported (5), experimentally determined composition of L3 (Table 1). The amino acid composition of the ORF-2 deduced protein is inconsistent with the L3 values, especially with respect to arginine, where it differs from L3 by 19 mol % (Table 1). 2) ORF-1 encodes a hydrophobic protein, in agreement with published experimentally determined hydrophobicity of L3 (5). ORF-2 encodes a hydrophilic protein.
3) The protease (trypsin, Staphylococcal V8 protease) digestion patterns of L3 are consistent with those predicted by the ORF-1 deduced amino acid sequence (data not shown). The digestion patterns predicted by ORF-2 deduced amino acid sequence are quite different and do not fit the experimentally determined L3 data.
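The comparisons in points 1 and 3 rest on two simple computations over a deduced sequence: composition in mol % and a predicted protease digestion pattern. The sketch below illustrates both under commonly used cleavage rules, which are assumptions here and not taken from this paper: trypsin is taken to cut after Lys or Arg unless the next residue is Pro, and staphylococcal V8 protease to cut after Glu. The peptide is hypothetical.

```python
from collections import Counter


def mol_percent(protein):
    """Amino acid composition as mol % of total residues."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def digest(protein, cut_after, no_cut_before=""):
    """Split a protein on the carboxyl side of residues in cut_after,
    skipping sites followed by a residue in no_cut_before."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        is_last = i + 1 == len(protein)
        if residue in cut_after and not is_last \
                and protein[i + 1] not in no_cut_before:
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    return fragments


peptide = "MAKRPGELK"                    # hypothetical peptide, not L3
print(mol_percent(peptide)["K"])         # mol % lysine
print(digest(peptide, "KR", "P"))        # trypsin-like rule
print(digest(peptide, "E"))              # V8-like rule
```

Fragment lengths predicted in this way can then be compared with the sizes of peptides observed after digestion of the authentic protein.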
We were unable to determine directly if the clone contains the entire protein coding region since the amino-terminal sequence of L3 is not known. Direct sequence analysis indicates that the protein is blocked at the amino terminus. However, the first ATG in ORF-1 is located within a very GC-rich area, the sequence context of which is not in good agreement with the proposed initiator codon consensus sequences of eukaryotes (24). Furthermore, the skewed codon usage pattern noted in the coding region is continued in the region upstream from the ATG. These data suggest the clone may not contain the entire coding region for the L3 protein.
An incomplete deduced amino acid sequence could account for minor discrepancies in the comparison of deduced and experimental amino acid composition.

Analysis of Deduced L3 Amino Acid Sequence
Local hydropathy of the deduced L3 amino acid sequence was analyzed by the quantitative method of Kyte and Doolittle (25), which plots the average hydropathy over a moving window of 7 amino acids. This analysis shows that the L3 protein may be divided into three domains based on the hydropathic characteristics of the molecule (Fig. 5). Both the amino- and carboxyl-terminal domains are largely hydrophilic. An internal region of 72 amino acids consists entirely of hydrophobic, weakly hydrophobic, or neutral amino acids and constitutes a large hydrophobic domain.
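The moving-window calculation can be sketched as follows, using the published Kyte and Doolittle hydropathy values and the 7-residue window stated above. This is an illustrative reimplementation and not the program used for Fig. 5; the test peptide is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every window of the given length, in order
    from the amino terminus to the carboxyl terminus."""
    scores = [KD[aa] for aa in protein.upper()]
    if len(scores) < window:
        raise ValueError("sequence shorter than window")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a hydrophilic stretch followed by a hydrophobic one.
profile = hydropathy_profile("KKKKKKKLLLLLLL")
print([round(value, 2) for value in profile])
```

Positive window averages indicate hydrophobic stretches and negative averages hydrophilic ones, so a long run of positive values corresponds to a domain such as the internal region described above.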
Chou and Fasman (26) rules for empirical prediction of protein secondary structure were used to determine the potential of L3 to form α-helical, β-sheet, and β-turn conformations. This analysis suggests that the amino acids at or near the amino terminus (amino acids 15-38) of the protein and those at the carboxyl terminus (amino acids 115-147) will form α-helices. Empirical prediction of secondary structure in strongly hydrophobic regions is unreliable due to the absence of an appropriate data base of known structures of membrane-associated proteins (27, 28).
When the carboxyl-terminal 33 amino acids are plotted according to their individual hydropathy, it can be seen that hydrophilic amino acids alternate with hydrophobic or neutral amino acids in a regular manner. This pattern of hydrophobicity was examined with respect to the α-helical structure of the region. The helix was visualized on a "helical wheel" (29), which shows the view looking down an α-helix with amino acids at the correct angular displacement from one another, and in a cylindrical plot (30), which shows the longitudinal arrangement of amino acids using a diagram of a cylinder split parallel to its axis and laid flat. The helical wheel representation of the carboxyl-terminal 18 amino acids is shown in Fig. 6A, and the cylindrical plot visualization of the same region is shown in Fig. 6B. These analyses suggest that the L3 carboxyl-terminal α-helix is amphipathic, since hydrophobic amino acids are separated from hydrophilic amino acids on opposing sides of the helix.
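The geometry behind a helical wheel can be sketched numerically. The code below assumes an ideal α-helix with 3.6 residues per turn, so that successive residues are displaced by 100°, and summarizes amphipathicity as a hydrophobic moment, the length of the mean hydropathy vector around the helix axis. The use of the Kyte and Doolittle scale for the moment is an assumption made for illustration; it is not the procedure used to produce Fig. 6.

```python
import math

# Kyte-Doolittle hydropathy index (used here only as an illustrative scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def wheel_angles(n_residues, step_deg=100.0):
    """Angular position (degrees) of each residue viewed down the helix axis."""
    return [(i * step_deg) % 360.0 for i in range(n_residues)]


def hydrophobic_moment(peptide, step_deg=100.0):
    """Length of the mean hydropathy vector; larger values indicate that
    hydrophobic and hydrophilic residues segregate to opposite faces."""
    x = y = 0.0
    for i, aa in enumerate(peptide.upper()):
        theta = math.radians(i * step_deg)
        x += KD[aa] * math.cos(theta)
        y += KD[aa] * math.sin(theta)
    return math.hypot(x, y) / len(peptide)


print(wheel_angles(5))                 # [0.0, 100.0, 200.0, 300.0, 40.0]
print(hydrophobic_moment("L" * 18))    # uniform helix: essentially zero
```

Eighteen residues at 100° per residue span exactly five turns, so a helix of uniform hydropathy gives a moment of essentially zero, whereas an amphipathic helix gives a clearly positive value.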

Model for L3 Topology on Lipid Body
We suggest the following simple model for the topology of L3 on the lipid body based on predicted secondary structure and hydrophobicity analyses (Fig. 7). L3 is assumed to be a monomer since there are no data available on protein-protein interactions on the lipid body. The sequence at or near the amino terminus is predominantly hydrophilic and is predicted to form an α-helix starting at amino acid 15. This hydrophilic domain is expected to protrude from the lipid body surface into the cytosol. The central domain of the protein consists of 72 contiguous hydrophobic or neutral amino acids and is expected to penetrate the triacylglycerol core of the lipid body due to hydrophobic interactions. This hydrophobic domain is represented in a helical hairpin conformation for several reasons. The helical conformation (either α or 3_10) satisfies hydrogen-bonding requirements of the peptide backbone and represents the most energetically feasible conformation in a nonpolar environment (31). The division of the hydrophobic region into a pair of antiparallel helices connected by a bend is predicted in view of the presence of a central region of 12 amino acids containing 3 prolines, which serve to interrupt the helical conformation and facilitate the turn. The carboxyl-terminal domain of L3 has certain features which render it suitable for interaction with surface phospholipids according to the amphipathic helix hypothesis (32). It is predicted to be an amphipathic α-helix containing opposing polar and nonpolar faces, with positively charged amino acids occurring at the interface and negatively charged residues along the center of the polar face (Fig. 6). The helix may sit at the surface of the lipid body, its nonpolar face interacting with hydrophobic acyl tails of surrounding phospholipid molecules and its polar face orienting toward the cytosol. The specific distribution of charged amino acids in the helix allows interaction of positively charged amino acids with the phosphate group of the phospholipids and of negatively charged amino acids with positively charged groups on the phospholipids (e.g. choline of phosphatidylcholine).

Comparison with Other Lipid-associated Proteins-The lipid body of maize is composed of triacylglycerol surrounded by a layer of phospholipid and a set of specific proteins. Similar lipid-phospholipid-protein complexes termed lipoproteins are present in the plasma of mammals. The lipoproteins play a number of roles, including lipid transport, direction of lipids to certain tissues, mediation of lipid accessibility to various enzymes, and regulation of enzyme activities (33-35). The protein components of lipoproteins, called apolipoproteins, are a group of related proteins which have been studied extensively. Many apolipoproteins are composed of tandem amino acid repeats which form the dominant structure of the proteins, an amphipathic α-helix with charge distributions suitable for interaction with phospholipids (36).
Since L3-protein may play a structural and/or functional role similar to that of the apolipoproteins, its structure was compared to that of the known apolipoproteins at both the amino acid and cDNA level. The structure of L3 is dominated by a completely hydrophobic domain, a feature unparalleled in the apolipoproteins, and L3 does not show the internally repeated sequences which are so common in the apolipoproteins. On the other hand, L3 does contain an amphipathic α-helix at its carboxyl terminus, which is composed of 33 amino acids. If homology comparison is extended to include related amino acids, the L3 amphipathic α-helix shares 24.2% homology with a conserved common block of 33 amino acids encoded by a portion of exon 3 of most apolipoproteins (37). The homology in these regions of the polypeptides extends to some degree to the nucleic acid level. The L3 mRNA shares 33-46% homology with the conserved region of various apolipoprotein genes, as determined using the local homology algorithm of Smith et al. (38). However, these homologies may simply reflect the common structural constraints involved in producing an amphipathic helix.
The intrinsic membrane proteins form another class of lipid-associated proteins. Many of these proteins contain simple lipophilic transmembrane segments which serve as membrane anchors (31, 39, 40). The transmembrane domains are characteristically composed of 20-33 uncharged and largely nonpolar amino acids flanked on both sides by charged amino acids. The internal domain of L3 is quite similar to this characteristic simple transmembrane domain in that it is a large hydrophobic region bounded on either side by a charged amino acid. The major difference is the length of the hydrophobic region, which is unusually long in L3. Transmembrane domains of intrinsic membrane proteins are tailored to the width of the membrane. In contrast, proteins associated with the lipid body need only penetrate a single layer of phospholipid into a totally hydrophobic environment.
Possible Function of L3-polypeptide-Although the exact function of L3 is unknown, the unusual structure of the protein and its abundance on the lipid body suggest that it may serve a "structural" role to stabilize the lipid body. Thus, the phospholipid coat and associated proteins serve to package the stored lipid in small discrete particles, providing maximal surface area such that the lipid can be rapidly degraded during germination.
It is also possible that L3 itself, or in association with one or more of the other lipid body proteins, serves functions other than or in addition to this structural "packaging" function. The lipids are hydrolyzed during germination by the enzyme lipase, which is synthesized de novo on free polysomes and attaches itself specifically to the surface of the lipid body (6). The lipid body proteins, singly or in association, may act as the recognition signal for the specific binding of lipase to the lipid bodies.

The RNA was base hydrolyzed, and the cDNA was made double stranded by reaction with Klenow (large fragment) primed by formation of a hairpin loop in the cDNA structure (8), then further extended using reverse transcriptase. The hairpin loop was cut with S1 nuclease and the double-stranded cDNA (ds cDNA) polished by reaction with Klenow (large fragment) as previously described (9). Standard sequencing gels were sometimes supplemented with 25-35% formamide to increase denaturation of the DNA products and to overcome a problem with gel compressions due to the high GC content of the sequence.