The Complete cDNA and Amino Acid Sequence of Human Apolipoprotein B-100

We have determined the complete sequence of apolipoprotein (apo) B-100 cDNA. It is 14.1 kilobases in length and codes for a 4563-amino acid protein, including a 27-amino acid signal peptide and a 4536-amino acid mature protein. Further, we identified 2366 residues of apoB-100 by direct sequence analysis of apoB-100 tryptic peptides. The mature peptide is characterized by high hydrophobicity (0.916 kcal/residue) and predicted β-sheet content (21%). Dot matrix analysis revealed the presence of many long internal repeats in apoB-100. The mature peptide contains 25 cysteine residues, 12 of which are in the N-terminal 500 residues. Twenty potential N-linked glycosylation sites were identified, of which 13 were proven to be glycosylated, and 4 were found not to be glycosylated by direct analysis of tryptic peptides. Our findings on apoB structure provide a basis for future experimentation on the role of apoB-100-containing lipoproteins in atherosclerosis.


MATERIALS AND METHODS
ApoB-100 cDNA Cloning and Sequencing-The complete cDNA sequence of apoB-100 was obtained from sequences determined on 30 overlapping cDNA clones in λgt11 (5). The clones were identified by antibody screening (5, 6) or by oligonucleotide screening (7). The sequence was determined on both strands in its entirety by the method of Sanger et al. (8); both M13 primers and synthetic oligonucleotide primers were used in the sequencing.
Sequencing of ApoB-100 Peptides-LDL was purified from the plasma of a patient with familial heterozygous type II hyperlipoproteinemia (9). It contained apoB-100 as its sole protein component. It was reduced and alkylated before tryptic digestion. The peptide mixture was fractionated on a 2.6 × 150-cm Sephadex G-50 column. The void volume fractions were delipidated and redigested with trypsin. The other fractions from the G-50 column, together with the redigested void volume fraction, were purified by high-performance liquid chromatography on a Vydac C18 reverse-phase column (4.6 × 250 mm) (10). Some of the peptides were pure at this stage. Those that were still mixtures were refractionated on a Shandon Hypersil ODS 5-µm reverse-phase column (11). The individual peptides were collected and sequenced either by a manual modified Edman degradation method (12) or by an automated gas-phase sequencer (13). When a step produced no detectable amino acid and the following two steps agreed with the sequence rule for glycosylated asparagine residues of Asn-X-Thr/Ser (where X could be any amino acid except proline), the site was assumed to be glycosylated provided that amino acid analysis of the peptide showed the presence of aspartic acid derived from the missing asparagine residue.
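The Asn-X-Thr/Ser rule used above can be illustrated with a short scan for candidate sites. The following is a minimal sketch, not the procedure used in this work: the function name `find_sequons` and the example peptide are hypothetical, and the scan only flags potential sites; it says nothing about whether a site is actually glycosylated.

```python
import re

# N-linked sequon: Asn-X-Thr/Ser, where X is any residue except Pro.
# The lookahead keeps matches zero-width after the Asn, so overlapping
# sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues in Asn-X-Thr/Ser sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Made-up peptide for illustration only; it is not taken from apoB-100.
example = "MKNPTANGSLLNATQNNTS"
print(find_sequons(example))  # [7, 12, 16, 17]
```

The Asn at position 3 of the example is skipped because it is followed by proline.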

RESULTS AND DISCUSSION
The complete sequence of human apoB-100 cDNA (Fig. 1, Appendix) covers 14,070 bp, plus the poly(A) tail. It includes a 5' untranslated region of 78 bp, a coding region of 13,689 bp, and a 3' untranslated region of 303 bp preceding the poly(A). A putative polyadenylation signal, AATAAA, is located 22 bp upstream from the poly(A). ApoB-100 mRNA is thus one of the largest eukaryotic mRNAs known.
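As a consistency check on the figures quoted above, the lengths of the three cDNA regions can be summed and the coding region converted to codons. This is only an arithmetic sketch; the variable names are ours, and the 27-residue signal peptide length is the value stated elsewhere in the text.

```python
# Lengths quoted in the text, in base pairs.
UTR5, CODING, UTR3 = 78, 13_689, 303
SIGNAL_PEPTIDE = 27  # residues, as stated in the text

total_bp = UTR5 + CODING + UTR3        # cDNA length excluding the poly(A) tail
codons, remainder = divmod(CODING, 3)  # coding region expressed in codons
mature = codons - SIGNAL_PEPTIDE       # residues left after signal peptide removal

print(total_bp, codons, remainder, mature)  # 14070 4563 0 4536
```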
The complete amino acid sequence of apoB-100 is shown in Fig. 1. There are 4563 amino acids in apoB-100, including a 27-amino acid signal peptide and a 4536-amino acid mature peptide. Of the latter, 2366 residues have been confirmed by direct sequencing of apoB-100 peptides (see Fig. 1). ApoB-100 is thus one of the largest monomeric proteins known, the calculated molecular weight being 512,937 daltons for the mature protein. The protein is characterized by high hydrophobicity; the average value of 0.916 kcal/residue is considerably higher than the corresponding values of 0.718, 0.772, 0.806, 0.863, 0.825, 0.838, and 0.752 kcal/residue for apoE, apoA-IV, apoA-I, apoA-II, apoC-I, apoC-II, and apoC-III, respectively, but lower than that of integral membrane proteins like rhodopsin (1.008 kcal/residue) (14). Fig. 2 is a hydrophobicity plot (15) of apoB-100. The sequence is generally quite hydrophobic. However, long stretches of exclusively hydrophobic residues seen in some membrane-spanning peptides have not been found in the sequence (except the signal peptide region, not included in Fig. 2). Nevertheless, subdomains can be identified on the plot that have higher (or lesser) degrees of hydrophobicity. In general, the carboxyl-terminal region is considerably more hydrophobic than the rest of the molecule, residues 4131-4536 having an average hydrophobicity of 1.045 kcal/residue. Secondary structure analysis (16) of the apoB-100 primary sequence indicates that it contains 43% α-helical, 21% β-sheet, 20% random, and 16% β-turn structure. The high content of predicted β-sheets in apoB-100 is consistent with previous experimental observations (17, 18). A notable feature in the sequence is the highly uneven distribution of Cys residues (Fig. 2). Seven of these Cys are in close proximity to each other, being separated from their nearest neighbor by 10 residues or less (Cys-51, -61, -70; -358, -363; and -939, -949, respectively).
Furthermore, of the 25 Cys in apoB-100, 12 are located in the N-terminal 500 residues. These and other Cys residues undoubtedly play crucial roles in maintaining the conformation of apoB in apoB-containing lipoproteins. They may also be involved in disulfide linkages with apo(a) in Lp(a), lipoprotein particles highly correlated with cardiovascular diseases (19). Twenty potential N-linked glycosylation sites are predicted from the sequence (Fig. 2). Of these, 13 were found to be glycosylated and four unglycosylated by direct sequence analysis. Of particular interest is a cluster of six potential sites between residues 3050 and 3450, all of which were found to be glycosylated. These glycosylated sites might be potential antigenic determinants to some anti-apoB-100 sera.
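The grouping of closely spaced Cys residues described above can be reproduced with a simple pass over the residue positions. The sketch below is illustrative: the helper `close_groups` is hypothetical, "separated by 10 residues or less" is interpreted as a difference in residue number of at most 10 (an assumption), and only the seven positions listed in the text are used because the remaining Cys positions are not given here.

```python
def close_groups(positions, max_gap=10):
    """Group residue positions whose consecutive spacing is <= max_gap."""
    groups, current = [], []
    for pos in sorted(positions):
        # Start a new group when the gap to the previous position is too large.
        if current and pos - current[-1] > max_gap:
            groups.append(current)
            current = []
        current.append(pos)
    if current:
        groups.append(current)
    # Isolated residues (groups of one) are not "in close proximity".
    return [g for g in groups if len(g) > 1]


# The seven Cys positions listed in the text.
cys = [51, 61, 70, 358, 363, 939, 949]
print(close_groups(cys))  # [[51, 61, 70], [358, 363], [939, 949]]
```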
On the basis of a partial cDNA sequence, Knott et al. (20) identified a region of apoB that shows significant homology to apoE (140-150), the putative LDL receptor binding domain. We identified this region of apoB as residues (3345-3381) and have preliminary evidence that it binds to the LDL receptor. Since internal repeats were identified in all the other apolipoproteins (21-23), we analyzed the apoB-100 sequence for such repeats by dot matrix analysis (data not shown). There seem to be many long internal repeats. First, the segment of 92 amino acids from residues 1761-1852 has a 25% homology (identity) with the segment from residues 1906-1997. Second, the segment 1882-2000 (119 residues) has a 23% homology with the segment 2988-3106. Third, the segment 2358-2428 (71 residues) has a 25% homology with the segment 4207-4277. Fourth, the segment 3215-3424 (210 residues) has an 18% homology with the segment 3699-3908. Fifth, the segment 4057-4129 (73 residues) has a 26% homology with the segment 4316-4388. In addition, there are many shorter repeats. For example, the segment 1604-1651 (48 residues) has a 29% homology with the segment 2741-2788; the segment 1532-1568 (37 residues) has a 32% homology with the segment 2831-2867; a segment of 16 residues starting at 2333 is repeated four times, and many repeats of about 28 residues lie between residues 1621 and 1872. Thus, as in the other apolipoproteins (23), apoB-100 also contains internal repeats. However, in contrast to those in the other apolipoproteins, the repeats in apoB-100 do not show any regular pattern and most of them do not lie consecutively.
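The two quantities used in this paragraph, percent identity between aligned segments and a self-comparison dot matrix, can be sketched as follows. This is a generic illustration and not the program used for the analysis reported here: the function names, the window size, and the match threshold are assumptions, and the toy sequences are made up.

```python
def segment_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def percent_identity(a, b):
    """Percentage of identical residues between two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must be the same length (ungapped comparison)")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def dot_matrix(seq, window=10, min_matches=4):
    """Self-comparison dot matrix: 1-based (i, j) window starts, i < j,
    whose windows share at least min_matches identical positions."""
    n = len(seq) - window + 1
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            if matches >= min_matches:
                hits.append((i + 1, j + 1))
    return hits


print(segment_length(1761, 1852))                         # 92
print(percent_identity("ALKVE", "ALRVD"))                 # 60.0
print(dot_matrix("ACDEFGGACDEF", window=5, min_matches=5))  # [(1, 8)]
```

The pairwise loop is quadratic in sequence length, so for a protein of several thousand residues a vectorized or indexed implementation would be preferable in practice.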
All the known apolipoproteins appear to share a common block of 33 amino acids that comprises three repeats of 11 residues, each repeat starting with a hydrophobic residue (23). Since this block is conserved in evolution and may play an important structural role, we have used the human apoA-I segment (residues 11-43) to search for homologous segments in apoB-100. We found that segments 581-613, 1318-1350, 1369-1401, 2862-2887, 4043-4075, and 4151-4183 have, respectively, a 30, 24, 24, 27, 24, and 24% similarity with the apoA-I segment. Whether any of these segments are true homologs of the latter can be tested by studying the genomic structure of apoB-100 because the codon encoding the last amino acid of the common block is invariably split by an intron between the second and third positions (23). Of particular interest among the above segments in apoB-100 are the segments 581-613 and 1318-1350, because, like the common block in other apolipoproteins, they can be divided into three repeats of 11 residues, each starting with a hydrophobic residue.
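The structural criterion applied to the common block (three 11-residue repeats, each beginning with a hydrophobic residue) can be expressed as a small check. The sketch below rests on stated assumptions: the set of residues counted as hydrophobic is our choice and is not defined in the text, the helper names are hypothetical, and the 33-residue example is made up rather than taken from apoA-I or apoB-100.

```python
# Assumption: residues treated as hydrophobic for this illustration.
HYDROPHOBIC = set("AVLIMFWY")


def split_repeats(block, unit=11):
    """Split a block into consecutive repeats of `unit` residues."""
    if len(block) % unit:
        raise ValueError("block length must be a multiple of the repeat unit")
    return [block[i:i + unit] for i in range(0, len(block), unit)]


def starts_hydrophobic(block, unit=11):
    """True if every repeat in the block begins with a hydrophobic residue."""
    return all(rep[0] in HYDROPHOBIC for rep in split_repeats(block, unit))


# Made-up 33-residue block built from three 11-residue units.
block = "LSKELQEKAGE" + "VRDKLTPHAEE" + "FQKRWQDEMTS"
print(len(block), split_repeats(block))
print(starts_hydrophobic(block))                        # True
print(starts_hydrophobic("K" + block[1:]))              # False
```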
