Sequence of human asialoglycoprotein receptor cDNA. An internal signal sequence for membrane insertion.

A cDNA library from the human hepatoma cell line Hep G2 was prepared in the expression vector lambda gt11. Using specific antibodies, a cDNA clone containing the entire coding sequence for the human asialoglycoprotein receptor was isolated and sequenced. The deduced amino acid sequence of 291 residues is very homologous to the sequence of the major asialoglycoprotein receptor protein from rat. The comparison shows that there is no significant post-translational processing and no leader sequence, cleaved or uncleaved, at the amino terminus. An internal signal sequence, probably the membrane-spanning segment, residues 41-59, is assumed to direct insertion of the carboxyl-terminal ligand binding portion of the receptor across the endoplasmic reticulum membrane.

Partially deglycosylated plasma glycoproteins are efficiently and specifically removed from the circulation by a receptor-mediated process (1). In mammals the asialoglycoprotein receptor (ASGP-R), specific for desialylated (galactosyl-terminal) glycoproteins, is expressed exclusively in hepatic parenchymal cells. Following ligand binding to this cell surface receptor, the receptor-ligand complex is internalized and transported by a series of membrane vesicles and tubules to an acidic sorting organelle (CURL) where receptor and ligand dissociate (2). The receptor returns to the cell surface, while the ligand is transported to lysosomes where it is degraded (3-5). The ASGP-Rs from rabbit, rat, and human liver have been purified (6-8). The complete amino acid sequences of the major rat receptor protein, R-1 (9), and of an analogous chicken N-acetylglucosamine-specific receptor (10) have been determined, as has a partial sequence of a second rat ASGP-R, R-2/3 (9). Many of the functional studies, however, describing the kinetics of ligand binding, internalization, and recycling of the receptor, as well as its biosynthesis, have been performed on the human hepatoma cell line Hep G2 (4, 11, 12). Taking advantage of this model system for receptor-mediated endocytosis, we have cloned and sequenced the cDNA encoding the human ASGP-R.

EXPERIMENTAL PROCEDURES
cDNA Libraries-The maintenance of the human hepatoma cell line Hep G2 has been described earlier (11). Poly(A)+ RNA was isolated by the guanidinium isothiocyanate/CsCl gradient method (13) and by chromatography on oligo(dT)-cellulose (14). cDNA was synthesized, essentially following the procedures described in Ref. 14, using oligo(dT) as a primer for reverse transcription. After second-strand synthesis using the Klenow fragment of DNA polymerase I, the cDNA was treated with S1 nuclease, filled in using Klenow fragment, blunt-end ligated to synthetic EcoRI linkers, and digested with EcoRI.
After removal of excess linkers by gel filtration on Sephadex G-100, the cDNA was ligated into EcoRI-cut and phosphatase-treated λgt11 DNA (15). For the preparation of a second cDNA library with longer inserts, the second cDNA strand was synthesized using RNase H and DNA polymerase I as described in Ref. 16. Consequently, S1 nuclease digestion and the fill-in reaction were omitted. Furthermore, endogenous EcoRI sites were modified by EcoRI methylase before ligation to the linkers. Finally, the cDNA was fractionated by electrophoresis on an agarose gel, and the fraction >600 base pairs (bp) was electroeluted and ligated into the vector DNA. After in vitro packaging of the recombinant DNA into λ particles (14), the resulting libraries were amplified and screened as described in Ref. 17 using an affinity-purified polyclonal rabbit anti-human ASGP-R antibody (12) and 125I-iodinated protein A.
Sequence Analysis-cDNA clone A21, subcloned into the plasmid pUC13 (18), was progressively digested by Bal31 exonuclease from either end (19). A large number of the resulting deleted cDNAs were subcloned into M13 and sequenced by a modified Sanger "dideoxy" procedure (20, 21). The sequence data were analyzed using the computer-assisted method of Staden (22).

RESULTS AND DISCUSSION
In contrast to the rat and rabbit receptors, the human ASGP-R migrates on sodium dodecyl sulfate-polyacrylamide gels as a single species of Mr = 46,000. In the presence of tunicamycin an unglycosylated polypeptide of approximately Mr 34,000 is synthesized. In Hep G2 cells approximately 0.01% of the [35S]methionine incorporated into newly synthesized protein was in the ASGP-R (12). Thus a rough estimate for the frequency of the ASGP-R mRNA is 1 in 10,000.
Double-stranded cDNA was synthesized from poly(A)+ RNA from Hep G2 cells and ligated into the bacteriophage expression vector λgt11 (15). About 2 million recombinant phage were screened for expression of β-galactosidase-ASGP-R hybrid proteins in infected host bacteria using an affinity-purified polyclonal rabbit anti-human ASGP-R antibody (12) and 125I-iodinated protein A. Approximately 1 out of every 100,000 recombinant phage scored positive, consistent with the expected frequency of ASGP-R cDNA inserted in the correct orientation and reading frame. The majority of these clones hybridized to one another. Clone A12, at 450 bp the largest, was used to probe blots of gels of poly(A)+ RNA from Hep G2 cells, and also from HeLa cells, which do not express the ASGP-R (Fig. 1). A single mRNA of 1500-1600 bp, sufficient to encode the ASGP-R, was detected. This RNA was absent from HeLa cells, as expected. A rat α-tubulin clone hybridized to a single 1800-bp species in both RNA preparations, a result demonstrating the integrity of the RNA.
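The frequency argument above can be made explicit with a short calculation. The sketch below is illustrative only: the factors of 1/2 for insert orientation and 1/3 for reading frame are assumptions introduced here (the text says only "correct orientation and reading frame"), and all variable names are hypothetical.

```python
from fractions import Fraction

# Figures quoted in the text.
mrna_frequency = Fraction(1, 10_000)          # rough ASGP-R mRNA abundance (0.01%)
phage_screened = 2_000_000                    # recombinant phage screened
observed_positive_rate = Fraction(1, 100_000)

# Assumptions (not stated in the text): an insert ligates in either
# orientation with equal probability, and one of the three possible
# reading frames is in frame with beta-galactosidase.
p_orientation = Fraction(1, 2)
p_frame = Fraction(1, 3)

expected_positive_rate = mrna_frequency * p_orientation * p_frame
expected_positives = expected_positive_rate * phage_screened
observed_positives = observed_positive_rate * phage_screened

print(f"expected positive rate: 1 in {1 / expected_positive_rate}")
print(f"expected positives among screened phage: {float(expected_positives):.0f}")
print(f"observed positives among screened phage: {float(observed_positives):.0f}")
```

Under these assumptions the expected rate is 1 in 60,000, the same order of magnitude as the observed 1 in 100,000. The estimate ignores inserts too short to carry an antigenic determinant, which would lower the expected rate further.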
Using A12 as a probe, a second λgt11 library with longer cDNA inserts (see "Experimental Procedures") was screened by plaque hybridization, and several clones larger than 1 kilobase were thus isolated. The largest one, A21, was entirely sequenced and contained the full coding sequence of the ASGP-R (Fig. 2). In Fig. 3 the deduced amino acid sequence is compared to the sequences of the rat ASGP receptors.
The 1277 bases of the cDNA contain only one large open reading frame of 933 bases (bases -60 to 873 in Fig. 2). Beginning at the first ATG methionine codon (position 1) the coding region comprises 873 bases. This is preceded by 172 noncoding bases at the 5'-end and followed by 232 noncoding bases at the 3'-end. Clone A21 does not extend into the poly(A) region at the 3'-end of the mRNA, and we do not know if any noncoding bases are missing from the 5'-end.
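The base counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the helper `span_length` and the assumption that the numbering of Fig. 2 has no base 0 (so that base -1 directly precedes base 1) are introduced here for illustration.

```python
# Figures quoted in the text for cDNA clone A21.
TOTAL_BASES = 1277
FIVE_PRIME_NONCODING = 172
CODING_BASES = 873
THREE_PRIME_NONCODING = 232
ORF_START, ORF_END = -60, 873   # open reading frame, numbering of Fig. 2
RESIDUES = 291


def span_length(start: int, end: int) -> int:
    """Length of the inclusive span start..end.

    Assumption: the numbering has no position 0, so a span that crosses
    from negative to positive coordinates is one base shorter than
    end - start + 1.
    """
    length = end - start + 1
    if start < 0 < end:
        length -= 1
    return length


segments_sum = FIVE_PRIME_NONCODING + CODING_BASES + THREE_PRIME_NONCODING
orf_length = span_length(ORF_START, ORF_END)
codons = CODING_BASES // 3

print(f"5' + coding + 3' = {segments_sum} (text: {TOTAL_BASES})")
print(f"open reading frame = {orf_length} bases (text: 933)")
print(f"coding codons = {codons} (text: {RESIDUES} residues)")
```

The three checks agree with the text: the segments sum to 1277 bases, the open reading frame spans 933 bases, and 873 coding bases correspond to 291 codons, so the 873 bases do not include the stop codon.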

The corresponding polypeptide of 291 residues would have an Mr of 33,122, which agrees well with the experimental value of 34,000 for the unglycosylated receptor as determined on SDS-gels (12).
The rat R-1 and the human ASGP receptors are strikingly homologous; 79% of the residues are identical. (For comparison, the human and mouse β-globin sequences have 82% identical residues (23).) The rat R-1 sequence lacks the initiator methionine, 6 residues at the C terminus, and 1 amino acid at position 15 in the human sequence. Both receptor sequences consist of two hydrophilic domains (residues 1-40 and 60-291) separated by a very hydrophobic, in all likelihood membrane-spanning, segment (residues 41-59). The homology extends throughout the whole sequence with 75 and 79% identical residues in the N-terminal and the C-terminal domains, respectively. It is noteworthy that the hydrophobic region is also very well conserved, with 16 identical residues out of 19 and 3 conservative replacements (Leu-Phe at residue 42, Leu-Ile at 50, and Cys-Ser at position 57). This might indicate that this segment serves a more specific function than just a hydrophobic membrane anchor (see below). Although only a partial sequence of R-2/3 is available, the human ASGP-R appears more homologous to R-1 than to R-2/3 (Fig. 3).
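Percent identity between two aligned sequences, as quoted above, can be computed along the following lines. This is a minimal sketch under stated assumptions: the two 19-residue strings are invented placeholders (not the real receptor sequences) that differ at the offsets corresponding to residues 42, 50, and 57 of a segment numbered 41-59, and skipping gap columns in the denominator is one possible convention, not necessarily the one used for the figures in the text.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned, gap-free columns.

    Both inputs must already be aligned (equal length). Columns where
    either sequence has a gap are skipped; this denominator convention
    is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical 19-residue segments (placeholders, NOT the real sequences),
# differing at three positions as described for residues 42, 50, and 57.
toy_segment_a = "GLLLLSLGLLSLLLVVCVV"
toy_segment_b = "GFLLLSLGLISLLLVVSVV"

print(f"{percent_identity(toy_segment_a, toy_segment_b):.1f}% identical")
```

For 16 identical residues out of 19 this gives 84.2%, which shows that the membrane-spanning segment is at least as well conserved as the flanking domains (75 and 79%).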
Of the 3 sites for N-linked glycosylation (Asn-X-Thr/Ser) in the rat R-1 sequence (positions 76, 79, and 147 in Fig. 3) only 2 are conserved in the human sequence (positions 79 and 147). In place of asparagine 76 in the rat R-1 sequence, a threonine is found in the human one. Experiments on the biosynthesis of the human receptor in the presence of inhibitors of glycosylation as well as with endoglycosidases confirm the presence of two N-linked oligosaccharides (12, 24). Since N-linked carbohydrates are invariably found on the extracytoplasmic segments of membrane proteins (25), it was concluded that the carboxyl terminus of the rat receptor faces the outside of the cell and contains the ligand binding site (9). The same reasoning would apply to the human receptor. Furthermore, the human ASGP-R is phosphorylated at a serine residue whose position, however, is not known.* Since phosphorylation is a cytoplasmic activity, this observation and the finding that the N-terminal domain of the chicken hepatic lectin is phosphorylated at Ser 7 (26) strongly suggest that the amino terminus of the ASGP-R faces the cytoplasm. Thus the transmembrane orientation of the ASGP-R, carboxyl terminus outside, amino terminus inside, would be opposite to that of such transmembrane proteins as the vesicular stomatitis virus G protein, glycophorin, or the class I MHC proteins (27).
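The Asn-X-Thr/Ser motif for N-linked glycosylation is easy to locate in a one-letter amino acid sequence. The sketch below is illustrative: the function name, the optional proline exclusion (a commonly applied refinement that the text does not mention), and the toy fragments are all assumptions introduced here, not the real receptor sequences.

```python
import re


def find_sequons(protein: str, exclude_proline: bool = False) -> list[int]:
    """Return 1-based positions of Asn in Asn-X-Thr/Ser motifs.

    A zero-width lookahead is used so that overlapping motifs are all
    reported. With exclude_proline=True, X may not be proline (an
    optional refinement that is not part of the pattern given in the text).
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile(rf"(?=N{middle}[ST])")
    return [match.start() + 1 for match in pattern.finditer(protein)]


# Toy fragments (placeholders, NOT the real sequences). The second differs
# from the first by a single Asn -> Thr replacement at position 3, which
# mimics the loss of one of three sites described in the text.
toy_three_sites = "AANQSNGSAAAANHT"
toy_two_sites = "AATQSNGSAAAANHT"

print(find_sequons(toy_three_sites))
print(find_sequons(toy_two_sites))
```

Replacing the asparagine removes the motif at that position and leaves the neighboring sites intact, which is the pattern reported for rat position 76 versus human positions 79 and 147.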
Our most significant finding is that the comparison of the coding sequence of the human ASGP-R with the sequence of the mature rat protein indicates that there is no significant post-translational proteolytic processing at either the amino or the carboxyl terminus. There is clearly no hydrophobic leader sequence, cleaved or otherwise, at the N terminus such as the kind described for a large number of other membrane or secretory proteins (28), including the low-density lipoprotein receptor (29, 30). These sequences initiate membrane insertion of the nascent polypeptide. Unlike these proteins, the amino terminus of the ASGP-R is not extruded through the membrane. A similar situation obtains for the transferrin receptor (36) and the invariant chain of the HLA-DR protein (31), which have the same transmembrane orientation as the ASGP-R. In the case of the red blood cell protein Band III, a long (450 residues) amino-terminal hydrophilic segment faces the cytoplasm, and the carboxyl terminus is inserted into the membrane. An internal, presumably hydrophobic sequence functions as a signal for insertion of the nascent protein into the rough endoplasmic reticulum (32). As judged by the synthesis and modifications of its N-linked oligosaccharides, the ASGP-R is also inserted into the rough endoplasmic reticulum and matures through the Golgi en route to the plasma membrane (12). We suggest that the well conserved membrane-spanning sequence might function as an internal signal sequence, and initiate the transport of the carboxyl terminus of the nascent ASGP-R protein across the endoplasmic reticulum membrane. As proposed by Engelman and Steitz (33), this sequence might form half of a helical hairpin hydrophobic enough to insert into and span the membrane, and initiate subsequent extrusion of the C-terminal portion of the polypeptide across the bilayer.
In vitro mutagenesis of the cDNA clone, followed by expression either in transfected cells or in cell-free systems, may define the regions of the protein essential for membrane insertion, ligand binding, and receptor internalization and recycling.

* Schwartz, A. L. (1985) Biochem. J., in press.