Human Apolipoprotein E: The Complete Amino Acid Sequence*

The amino acid sequence of human apolipoprotein E (apo-E) has been determined. Apo-E2, one of the major isoforms of apolipoprotein E, is a polypeptide of 299 amino acids having a calculated molecular weight of 34,145. The isoform apo-E3 differs from apo-E2 by a single amino acid substitution of arginine for cysteine at residue 158 (italicized). The sequence of apo-E2 is:

Consideration of the primary and secondary structure provides information related to two known functions of apo-E, lipid binding and apo-B,E cell surface receptor interaction. There are a series of amphipathic helices in the carboxyl-terminal third of the polypeptide chain which may represent the lipid-binding site(s) for apo-E. In addition, a sequence enriched in lysine and arginine residues, which includes the site of amino acid substitution in apo-E2 and apo-E3, appears to be the region of the molecule involved in the interaction of E-containing lipoproteins with specific lipoprotein receptors on cell membranes.
* A brief account of this work was presented at the American Heart Association's 54th Scientific Sessions, Dallas, Texas, November 16-19, 1981. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Apolipoprotein E is a glycoprotein found in several classes of lipoproteins in both humans and other animals (1-11). Plasma concentrations of apo-E increase in response to increases in dietary cholesterol in several animal models, suggesting that apo-E is involved in cholesterol metabolism (12). Apo-E, like apolipoprotein B, can interact with the apo-B,E cell surface receptors of cultured fibroblasts and smooth muscle cells. Lipoproteins containing either apo-E or apo-B can regulate the intracellular metabolism of cholesterol in cells possessing the apo-B,E receptors (13,14). In addition, it has been postulated that apo-E is important in the uptake of chylomicron remnants, with apo-E serving as the recognition signal for remnant uptake by the liver (15,16). Recently, a liver receptor that interacts with apo-E, but not with apo-B of normal low density lipoprotein, has been described, and it has been suggested that this receptor is responsible for chylomicron remnant uptake (17,18).
The genetic mode of inheritance of the human E apoprotein has been extensively studied and has provided insight into the role of apo-E both in normal subjects and in those with lipid abnormalities (19-22). Human apo-E exists as three major isoforms (E2, E3, and E4) and several minor isoforms. It has been proposed that the biosynthesis of apo-E is under the control of three independent alleles at a single genetic locus, with each allele coding for one of the major isoforms, and that the minor isoforms arise from post-translational modification of the major isoforms (21,22). As a result, three homozygous (E2/E2, E3/E3, and E4/E4) and three heterozygous (E2/E3, E2/E4, and E3/E4) states would be predicted. Consistent with this genetic model, we have recently demonstrated that the three major isoforms have different amino acid compositions, establishing that the genetic influence is at the level of the apo-E structural gene (23). Based on amino acid composition and partial sequence data, we proposed that cysteine-arginine interchanges at two substitution sites accounted for the known charge differences between the major apo-E isoforms (23). As a result of these interchanges, apo-E2 has 2 cysteine residues per mol, apo-E3 has 1 cysteine, and apo-E4 has no cysteine. In addition, we demonstrated that the minor isoforms from homozygous subjects have the same cysteine content as the major isoform, establishing the structural relation between the minor isoforms and the major isoform (23).
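The genetic model and the cysteine bookkeeping described above can be written out as a small sketch. The site numbers (112 and 158) are the ones given in the Discussion of this article; the residue assignment for apo-E4 is the proposal cited there, not a determined sequence, and all names in the code are illustrative.

```python
from itertools import combinations_with_replacement

# Residue at each proposed substitution site, per major isoform.
# E2 and E3 follow the sequence data; E4 is a proposal (not yet sequenced).
SITES = {
    "E2": {112: "Cys", 158: "Cys"},
    "E3": {112: "Cys", 158: "Arg"},
    "E4": {112: "Arg", 158: "Arg"},
}


def cysteine_count(isoform):
    """Number of cysteines contributed by the two substitution sites."""
    return sum(1 for residue in SITES[isoform].values() if residue == "Cys")


def charge_relative_to_e2(isoform):
    """Each Cys -> Arg interchange adds one positive charge relative to E2."""
    return sum(1 for residue in SITES[isoform].values() if residue == "Arg")


def genotypes(alleles=("E2", "E3", "E4")):
    """All unordered allele pairs at a single locus."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


if __name__ == "__main__":
    for isoform in SITES:
        print(isoform, cysteine_count(isoform), charge_relative_to_e2(isoform))
    print(genotypes())
```

Three alleles taken two at a time with repetition give six states, three homozygous and three heterozygous, as predicted by the model.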
A human pathological lipid disorder, called type III hyperlipoproteinemia or dysbetalipoproteinemia (24), is associated with the abnormal occurrence of only E2 as the major apo-E isoform (19,25). Subjects with this disease are homozygous for apo-E2 (21,22). It has been suggested that remnant lipoproteins accumulate in these individuals because the interaction of apo-E2 with lipoprotein receptors is defective (26,27). In addition, we have demonstrated that apo-E2 was deficient, compared with apo-E3 or apo-E4, in binding to fibroblast receptors. We also showed that this deficiency was due, at least in part, to the single amino acid substitution (a cysteine for arginine) and that the deficient binding of apo-E2 could be partially corrected by converting the cysteine at the substitution site to a charged lysine analogue (28).
In light of the central role that the E apoprotein plays in the interaction between lipoproteins and receptors and of the association of apo-E2 with abnormal lipid metabolism in type III subjects, we initiated a program to determine the amino acid sequence of two forms of human apo-E. The sequence strategy involved selecting subjects homozygous for either the E2 (type III) or the E3 form of apo-E. The apo-E from these subjects was the sole source of material for the two independent sequence determinations. In addition, apo-E preparations used in the sequence studies included the minor sialylated isoforms. In this report, we present the complete amino acid sequence of apo-E2 and the comparative sequence of apo-E3. These data confirm that the two proteins differ by only a single arginine-cysteine substitution. Consideration of the primary and secondary structure of apo-E reveals that there are specific regions apparently responsible for the two known functions of apo-E, i.e., specific cell receptor binding and lipid binding.

DISCUSSION
The construction of the total sequence of apo-E2 is shown in Fig. 1. Peptide CB8 is the only CNBr peptide lacking homoserine, and it has carboxyl-terminal histidine, as does the whole protein; therefore, CB8 must be the carboxyl-terminal peptide. Peptide T37 contains the carboxyl-terminal sequence of CB7 and the NH2-terminal sequence of CB8; therefore, this places CB7 before CB8 in the sequence. Peptide T30 contains the entire sequence of CB6 and the NH2-terminal sequence of CB7; therefore, this places CB6 before CB7. Peptide T30 is the only tryptic peptide which begins with methionine, and CB5 is the only CNBr peptide which has a basic residue penultimate to homoserine; therefore, CB5 must overlap with T30 and precede CB6 in the sequence. Peptide T6 contains the entire sequence of CB2 and terminates in the sequence -Met-Lys. Lysine is the NH2-terminal amino acid of both the whole protein (and CB1) and CB3. Thus, CB2 is placed before CB3 in the sequence. The NH2-terminal sequence of peptide T6 does not correspond to any of the carboxyl-terminal sequences of any of the CNBr peptides subjected to sequence analysis, and since the accounting is complete for all of the methionine residues, peptide T6 perforce must overlap with the NH2-terminal CNBr peptide, CB1. The ordering of the tryptic peptides in the carboxyl-terminal half of peptide CB5 was accomplished with chymotryptic peptides derived from CB5, as shown in Fig. 1. The ordering of tryptic peptides within CB1 was done on the basis of the sequence determined on intact apo-E2.
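The overlap reasoning in this paragraph amounts to chaining pairwise adjacencies between CNBr peptides. The following is a minimal sketch of that bookkeeping, not the procedure used in the study; the function and variable names are illustrative. It encodes only the adjacencies stated in the paragraph above, so CB4, whose overlaps are not described there, is left unplaced.

```python
def build_runs(adjacent_pairs):
    """Chain (left, right) adjacency pairs into maximal contiguous runs.

    Assumes each peptide has at most one successor and that there are no cycles.
    """
    successor = dict(adjacent_pairs)
    has_predecessor = set(successor.values())
    starts = sorted(p for p in successor if p not in has_predecessor)
    runs = []
    for start in starts:
        run = [start]
        while run[-1] in successor:
            run.append(successor[run[-1]])
        runs.append(run)
    return runs


# Adjacencies stated in the text, with the evidence for each.
PAIRS = [
    ("CB1", "CB2"),  # T6 overlaps the NH2-terminal peptide CB1 and contains CB2
    ("CB2", "CB3"),  # T6 ends in -Met-Lys; Lys is the NH2 terminus of CB3
    ("CB5", "CB6"),  # CB5 overlaps the Met-initiated tryptic peptide preceding CB6
    ("CB6", "CB7"),  # a tryptic peptide spans CB6 and the start of CB7
    ("CB7", "CB8"),  # T37 spans the end of CB7 and the start of CB8
]

if __name__ == "__main__":
    for run in build_runs(PAIRS):
        print(" - ".join(run))
```

With these five pairs the sketch yields two contiguous runs, one anchored at the NH2 terminus (CB1) and one ending at the carboxyl terminus (CB8).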
The molecular weight calculated from the sequence proposed in Fig. 1 (34,145) agrees well with the range of values determined by other methods for human apo-E (2,11,19,22). Apo-E3 is a single polypeptide of the same length, differing from apo-E2 by a single substitution of arginine for cysteine at residue 158. As previously reported (23), this substitution accounts for the charge difference between apo-E2 and apo-E3 seen on isoelectric focusing gels. Apo-E4 (the sequence has not yet been determined) lacks cysteine altogether, and it is likely that it differs from apo-E3 by having an arginine rather than a cysteine at residue 112 (23). The structure of apo-E reveals no evidence for repeated sequences or domains. The longest repeating sequences are 4 residues, -Pro-Leu-Val-Glu- (residues 183-186 and 267-270) and -Leu-Glu-Glu-Gln- (residues 78-81 and 243-246). There are no remarkable distributions of amino acids in the polypeptide chain, but there are a few tendencies. There is a high incidence of basic residues separated by one amino acid (this occurs 11 times, most of them in the carboxyl-terminal half of the molecule). The only stretches of consecutive basic amino acids are -Arg-Lys- (residues 142-143) and -Arg-Lys-Arg- (residues 145-147) in apo-E2, with an additional one occurring in apo-E3 (-Lys-Arg-, residues 157-158). There is a high incidence of basic residues either preceded or followed by a leucine residue (this occurs 21 times in apo-E2, 22 times in apo-E3). The basic amino acids are more abundant in the center of the molecule than at the termini, while the opposite is true for the hydroxy amino acids. Of the aromatic amino acids, 6 of 7 tryptophan residues and all 3 phenylalanine residues are located within 45 residues of either terminus.
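The consecutive basic stretches can be checked against the sequence fragment reproduced later in this article (the line beginning GLU-GLU-LEU-ARG). The sketch below is illustrative only: the numbering of the first residue as 131 is an assumption, chosen because it places the stretches at the positions quoted above, and "basic" is taken to mean arginine and lysine.

```python
# Fragment as printed in this article; starting number 131 is an assumption.
FRAGMENT = ("GLU-GLU-LEU-ARG-VAL-ARG-LEU-ALA-SER-HIS-LEU-ARG-LYS-"
            "LEU-ARG-LYS-ARG-LEU-LEU-ARG-ASP-ALA-ASP-ASP-LEU-GLN")
RESIDUES = FRAGMENT.split("-")
FIRST_RESIDUE = 131
BASIC = {"ARG", "LYS"}


def basic_runs(residues, first_number, min_len=2):
    """Return (start, end) residue numbers of runs of consecutive basic residues."""
    runs = []
    start = None
    # A trailing sentinel closes a run that reaches the end of the fragment.
    for i, residue in enumerate(list(residues) + ["END"]):
        if residue in BASIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((first_number + start, first_number + i - 1))
            start = None
    return runs


if __name__ == "__main__":
    print(basic_runs(RESIDUES, FIRST_RESIDUE))
```

Under the stated numbering assumption, the scan reports the -Arg-Lys- and -Arg-Lys-Arg- stretches at residues 142-143 and 145-147.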
Application of the Chou-Fasman rules (29,30) for predicting secondary structure indicates that apo-E has a high degree of ordered structure. There are 11 segments with predicted α-helical structure involving 62% of the amino acids. There are 8 β-turns predicted (11% of the amino acids) and only 3 short segments of β-sheet predicted (9%). The remainder is predicted to be random coil, with most of it occurring in one continuous segment from residues 164-202.
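The remainder assigned to random coil follows by subtraction. A short sketch of the arithmetic, with approximate residue counts derived from the rounded percentages (these counts are an illustration, not values reported in the article):

```python
TOTAL_RESIDUES = 299
PREDICTED_PERCENT = {"alpha-helix": 62, "beta-turn": 11, "beta-sheet": 9}

# Whatever is not helix, turn, or sheet is predicted to be random coil.
random_coil_percent = 100 - sum(PREDICTED_PERCENT.values())

# Length of the continuous coil segment, residues 164-202 inclusive.
longest_coil_segment = 202 - 164 + 1

# Approximate residue counts; the source percentages are rounded.
approx_counts = {
    name: round(percent * TOTAL_RESIDUES / 100)
    for name, percent in {**PREDICTED_PERCENT,
                          "random coil": random_coil_percent}.items()
}

if __name__ == "__main__":
    print(random_coil_percent, longest_coil_segment, approx_counts)
```

The 39-residue segment from 164 to 202 therefore accounts for most of the roughly 18% of the chain predicted to be random coil.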
There are 5 segments of predicted α-helical structure that satisfy some or all of the requirements for an amphipathic helix. Amphipathic helices have been implicated as the structures in apolipoproteins responsible for their lipid-binding capability, and such structures have been observed in all of the lipid-binding apolipoproteins that have been sequenced (31-33). In apo-E, residues 203-221, 226-243, and 245-266 have the characteristics of predicted α-helical structure and both a hydrophobic and a hydrophilic face. The hydrophilic face has its positively charged amino acids distributed at the hydrophobic-hydrophilic boundary and its negatively charged amino acids in the center of the hydrophilic face. These three segments contain 4, 5, and 3 ion pairs, respectively. Fig. 2 shows a space-filling model of one of these segments. In addition, residues 60-78 and 268-285 have predicted α-helical character and contain a hydrophobic and a hydrophilic face, but the distribution of the charged amino acids does not follow the above-mentioned pattern. It is of interest that 4 of the segments with an amphipathic helix reside within the carboxyl-terminal third of the polypeptide chain, and it is quite likely that one or more of these structures specify the site(s) of interaction between apo-E and lipid in the lipoprotein particle.
An interesting segment of the sequence, apparently the site involved in the interaction of apo-E with cell receptors, occurs in the region bounded by Tyr-118 and Tyr-162. Residues 133-150 within this region are predicted to have an α-helical structure in which 6 leucines form a hydrophobic face and in which all 8 basic residues are either opposite to or at the boundary of the hydrophobic face. At each end of this helix, the polypeptide chain has a predicted β-turn followed by a short segment of β-structure. The site of the single amino acid difference between apo-E3 and apo-E2 resides within one of these short β-structures. Apo-E2, which is defective in receptor binding, contains a cysteine residue substituted for arginine. Previously, it has been shown that the selective chemical modification of a limited number of lysine or arginine residues prevents apo-E from binding to the apo-B,E receptors (34,35). It can be reasonably speculated that the substitution of cysteine for arginine at position 158 of apo-E may be at least partially responsible for the defective binding of apo-E2. A portion of the receptor-binding activity can be restored by inserting a positive charge at residue 158 (28). Both the amino acid substitution (involving the loss of an arginine residue in this region of apo-E2) and the abundance of arginine and lysine residues in this segment strongly implicate this region as a portion of the apo-E molecule crucial to cell receptor-binding activity. Further attempts to identify the binding site are continuing in our laboratory.
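The arrangement of leucine and basic residues around the predicted helix at residues 133-150 can be illustrated with a helical-wheel projection. This is a sketch under two assumptions not stated in the article: an ideal α-helix with 3.6 residues per turn (100° per residue), and a starting residue number of 131 for the sequence fragment printed in this article. The function name is illustrative.

```python
# Fragment as printed in this article; starting number 131 is an assumption.
FRAGMENT = ("GLU-GLU-LEU-ARG-VAL-ARG-LEU-ALA-SER-HIS-LEU-ARG-LYS-"
            "LEU-ARG-LYS-ARG-LEU-LEU-ARG-ASP-ALA-ASP-ASP-LEU-GLN")
RESIDUES = FRAGMENT.split("-")
FIRST_RESIDUE = 131
DEGREES_PER_RESIDUE = 100  # ideal alpha-helix, 3.6 residues per turn


def wheel_angles(residues, first_number, start, end, targets):
    """Map residue number -> wheel angle (degrees) for target residues in start..end.

    Angles are measured relative to the first residue of the helix (start).
    """
    angles = {}
    for offset, residue in enumerate(residues):
        number = first_number + offset
        if start <= number <= end and residue in targets:
            angles[number] = ((number - start) * DEGREES_PER_RESIDUE) % 360
    return angles


if __name__ == "__main__":
    print("Leu:", wheel_angles(RESIDUES, FIRST_RESIDUE, 133, 150, {"LEU"}))
    print("Arg/Lys:", wheel_angles(RESIDUES, FIRST_RESIDUE, 133, 150, {"ARG", "LYS"}))
```

In this projection five of the six leucines fall within an 80° arc, while the eight arginine and lysine residues lie between 100° and 320°, consistent with a hydrophobic face flanked and opposed by basic residues.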

GLU-GLU-LEU-ARG-VAL-ARG-LEU-ALA-SER-HIS-LEU-ARG-LYS-LEU-ARG-LYS-ARG-LEU-LEU-ARG-ASP-ALA-ASP-ASP-LEU-GLN-
The attachment site(s) of the carbohydrate moiety of apo-E is unknown at this time. Linkage via an N-glycosidic bond is unlikely, since there is only one asparagine in the molecule (residue 298). This residue does not satisfy the general requirement for the formation of an N-glycosidic linkage, i.e., a sequence composed of Asn-X-Thr(Ser) (36). Furthermore, the tryptic and cyanogen bromide peptides which contain this asparagine do not behave abnormally in any way. The only tryptic peptide of apo-E which behaves abnormally is peptide T5 (see "Results"). This peptide contains 3 serine and 2 threonine residues and lacks asparagine/aspartic acid. It is, therefore, more probable that the carbohydrate is attached via an O-glycosidic linkage to a site or sites in this peptide (residues 42, 44, 53, 54, and/or 57).
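The Asn-X-Thr(Ser) requirement can be expressed as a simple scan. The sketch below applies only the rule as stated above; the function name and the second test sequence are hypothetical. The first example uses the carboxyl-terminal pair implied by the text (asparagine at residue 298 followed by the carboxyl-terminal histidine).

```python
def n_glycosylation_sequons(residues, first_number=1):
    """Return residue numbers of Asn that start an Asn-X-Thr(Ser) sequence."""
    hits = []
    # An Asn needs two following residues to complete the sequence.
    for i in range(len(residues) - 2):
        if residues[i] == "ASN" and residues[i + 2] in {"THR", "SER"}:
            hits.append(first_number + i)
    return hits


if __name__ == "__main__":
    # Asn-298 is followed only by the carboxyl-terminal His-299.
    print(n_glycosylation_sequons(["ASN", "HIS"], first_number=298))
    # Hypothetical positive control, not an apo-E sequence.
    print(n_glycosylation_sequons(["ALA", "ASN", "GLY", "THR"]))
```

Because only one residue follows Asn-298 in a 299-residue chain, no Asn-X-Thr(Ser) sequence can be formed there, whatever the identity of residue 299.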