Structure, Organization, and Chromosomal Mapping of the Human Macrophage Scavenger Receptor Gene*

Macrophage scavenger receptors (MSR) mediate the binding, internalization, and processing of a wide range of negatively charged macromolecules. Functional MSR are trimers of two C-terminally different subunits that contain six functional domains. We have cloned an 80-kilobase human MSR gene and localized it to band p22 on chromosome 8 by fluorescent in situ hybridization and by genetic linkage using three common restriction fragment length polymorphisms. The human MSR gene consists of 11 exons, and two types of mRNA are generated by alternative splicing from exon 8 either to exon 9 (type II) or to exons 10 and 11 (type I). The promoter has a 23-base pair inverted repeat with homology to the T cell element. Exon 1 encodes the 5'-untranslated region and is followed by a 12-kilobase intron that separates the transcription initiation and translation initiation sites. Exon 2 encodes a cytoplasmic domain; exon 3, a transmembrane domain; exons 4 and 5, an α-helical coiled coil; and exons 6-8, a collagen-like domain. The position of the gap in the coiled-coil structure corresponds to the junction of exons 4 and 5. These results show that the human MSR gene consists of a mosaic of exons that encode the functional domains. Furthermore, the specific arrangement of exons played a role in determining the structural characteristics of the functional domains.
Macrophage scavenger receptors (MSR)¹ are trimeric glycoproteins that are implicated in the pathologic deposition of cholesterol during atherogenesis, through the uptake of modified low density lipoproteins (LDL), and in host defense against pathogenic organisms, through the recognition of bacterial endotoxin precursors (1-6). MSR mediate multiple functions. One of their most characteristic features is the recognition of an extraordinarily wide range of ligands, including modified LDL (e.g. acetyl-LDL, oxidized LDL), modified proteins (maleylated bovine serum albumin), polynucleic acids (poly(I), poly(G)), carbohydrates (fucoidan), and other macromolecules (1). To fulfill this role, the receptor must be properly processed and folded (7, 11) and expressed only on the cell surface of macrophages (8). After binding their ligands, the receptors cluster in coated pits, are internalized, and appear in vacuoles that lack acid phosphatase, probably macrophage endosomes. They then release their ligands and return to the cell surface (8, 9). A well-studied example of such a multifunctional receptor is the LDL receptor, whose gene consists of a mosaic of exons, each encoding a functional domain (10). The functional MSR protein is a trimer; its subunits occur in two types that differ at the C terminus, and each subunit consists of six distinct functional domains (2-4, 11).
*This work is supported by research grants for cardiovascular disease and for aging and health from the Ministry of Health and Welfare of Japan and grants from the Ministry of Education, Science and Culture of Japan. Part of this work is also supported by grants from the Mitsubishi Foundation and the Cell Science Research Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) D13263, D13264, and D13265.
¹ The abbreviations used are: MSR, macrophage scavenger receptors; AA, any amino acid; bp, base pairs; kb, kilobases; LDL, low density lipoproteins; PCR, polymerase chain reaction; PMA, phorbol 12-myristate 13-acetate; RACE, rapid amplification of cDNA ends; RFLP, restriction fragment length polymorphism; SRCR, scavenger receptor cysteine-rich domain.
To elucidate the genetic basis of this complex multifunctional receptor and to study the regulation of its expression, we have cloned and characterized the human MSR gene. In this paper we report 1) the complete nucleotide sequences of the type I and II cDNAs, 2) the cloning and characterization of the entire MSR gene, 3) analysis of the 5'-flanking region, 4) the exon/intron organization and its relationship to protein structure and function, 5) chromosomal localization by in situ hybridization, 6) three common restriction fragment length polymorphisms (RFLPs), and 7) placement of the locus on the genetic linkage map of human chromosome 8.

MATERIALS AND METHODS
Cloning of Full-length cDNA-To obtain full-length cDNAs, we screened a cDNA library from the human monocytic leukemia cell line THP-1 treated for 3 days with 200 nM phorbol 12-myristate 13-acetate (PMA) (4), using fragments from the 5' regions of the human type I and II MSR cDNAs, phSR1 and phSR2 (4), as probes. Three cDNA clones were found to contain an additional 20-bp sequence beyond the 5' end of phSR1. Screening of this library with type-specific restriction fragments from the 3' regions of phSR1 and phSR2 as probes yielded clones containing additional 3'-non-coding sequence. Because these clones did not contain the 3' ends of the cDNAs, including the polyadenylation signal and poly(A) tail, we amplified the cDNA ends by the RACE method (12). RNA from PMA-treated THP-1 cells was used as a template for the polymerase chain reaction (PCR) with an oligo(dT)-linker primer in combination with primers specific to either the type I or type II 3'-non-coding region. PCR products were analyzed by Southern blot hybridization with an internal oligonucleotide probe. A hybridizing DNA band was cloned, and three independent clones were sequenced to exclude PCR artifacts.
Genomic Cloning-A cosmid library of human genomic DNA partially digested with Sau3A was constructed in the cosmid vector pWE16 as previously described (13). Approximately one million recombinants were screened with the inserts of cDNAs phSR1 and phSR2 as probes. Of the 18 positive clones, six representative clones (cMSR-16, -32, -33, -35, -40, and -44) were chosen for further analysis. Clones were aligned by restriction mapping and by Southern blot hybridization with oligonucleotide probes synthesized from the cDNA sequences. This analysis showed that three clones, cMSR-44, -16, and -32, overlapped one another, and that the other three, cMSR-33, -40, and -35, also overlapped one another but not the first set. Thus, two cosmid contigs were isolated in the MSR genomic region (Fig. 1). DNA spanning the gap between the two contigs could not be isolated despite several attempts to screen the cosmid library with region-specific probes. The size of the gap was estimated by Southern blot analysis of genomic DNA using end probes of cMSR-32 and cMSR-33. Both probes hybridized to a 13.5-kb EcoRI fragment and a 14-kb BamHI fragment, indicating that the gap between these cosmids lay within these fragments. These two genomic fragments were cloned in λFIX and λDASH vectors (Stratagene) from genomic DNA completely digested with EcoRI and BamHI, respectively. From the positive clones, λE10, containing an EcoRI insert, and λB12, containing a BamHI insert, were chosen for further analysis (Fig. 1). Genomic clones were subcloned into pBluescript II (Stratagene) after digestion with EcoRI, XbaI, HindIII, or BglII, and the nucleotide sequences of the exons and their boundaries were determined from these subclones. A λ phage clone encoding the α-helical coiled-coil domain was also isolated from a murine genomic DNA library.
Determination of Transcription Initiation Site-Primer extension reactions (14) were carried out with 30 μg of poly(A)+ RNA obtained from PMA-treated THP-1 cells (2, 4) or from human liver. A single-stranded oligonucleotide, 5'-d(GTCCTAAAGAAAGCAGC)-3', complementary to the 5' end of the cDNA (4), was used as the primer. For the RNase protection assay (15), a 206-bp DNA fragment corresponding to the genomic sequence surrounding the transcription initiation site was amplified by PCR with the forward primer AGAGTGAGTTACTGAC and the reverse primer GTCCTAAAGAAAGCAGC. RNA probes were synthesized from this DNA construct with T7 RNA polymerase and [α-32P]UTP.
Fluorescent in Situ Hybridization-Cosmid clones were localized to chromosomal bands by the method described previously (16). Replicated prometaphase R-banded chromosomes, prepared by treating thymidine-synchronized cells with bromodeoxyuridine, were used for non-isotopic fluorescent in situ hybridization of biotin-labeled cosmid probes. Detection was enhanced with fluorescein-avidin.
RFLP and Linkage Analysis-To screen for RFLPs, DNA from six unrelated individuals was digested with EcoRI, HindIII, BamHI, MspI, TaqI, RsaI, PvuII, PstI, or BglII. After blotting, the DNA was hybridized with 32P-labeled MSR cosmid DNA. Polymorphic markers were genotyped in a panel of 59 three-generation families (667 individuals) at the Howard Hughes Medical Institute, University of Utah. Linkage analysis was performed with the LINKAGE program package (17). The genetic map was constructed from the reference families with the GMS (Gene Mapping System) algorithm, as described previously (18).

RESULTS AND DISCUSSION
Complete Structure of Type I and II MSR cDNAs-The cDNA clones isolated by rescreening the PMA-treated THP-1 cDNA library and by the RACE method covered the entire 3' ends of the type I and II MSR cDNAs. The 5' ends of the cDNAs were assigned taking into account the results of the primer extension and RNase protection assays (see below). Type I MSR cDNA consists of a 122-bp 5'-untranslated region, a 1353-bp coding region, and a 2204-bp 3'-untranslated region, as determined by the RACE method. The sequence ATTAAA lies 15 bp upstream from the polyadenylation site. The total length of type I MSR cDNA, 3679 bp, agrees with the length of type I MSR mRNA seen in Northern blot experiments, about 4 kb, once the poly(A) tail is taken into account. Type II MSR cDNA consists of a 122-bp 5'-untranslated region, a 1074-bp coding region, and a 1634-bp 3'-untranslated region, as determined by the RACE method. The sequence ATAAA lies 15 bp upstream from the polyadenylation site. The length of type II MSR mRNA determined by Northern analysis likewise agrees with the total length of the type II MSR cDNA plus the poly(A) tail.
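The cDNA lengths above are sums of the three region lengths. The following Python sketch only re-adds the numbers given in the text and checks that each coding length is a whole number of codons; the dictionary and function names are illustrative, and the type II total it computes (2830 bp) is derived here rather than stated in the original.

```python
# Region lengths (bp) reported above for the two MSR cDNAs (poly(A) tail excluded).
CDNA_REGIONS = {
    "type I": {"5'-UTR": 122, "coding": 1353, "3'-UTR": 2204},
    "type II": {"5'-UTR": 122, "coding": 1074, "3'-UTR": 1634},
}


def total_length(regions: dict[str, int]) -> int:
    """Total cDNA length in bp, excluding the poly(A) tail."""
    return sum(regions.values())


def whole_codons(coding_bp: int) -> bool:
    """True if a coding region is an exact multiple of 3 nucleotides."""
    return coding_bp % 3 == 0


if __name__ == "__main__":
    for name, regions in CDNA_REGIONS.items():
        print(name, total_length(regions), "bp;",
              "whole codons:", whole_codons(regions["coding"]))
```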
Type I and II MSR cDNAs are identical from the 5' terminus to nucleotide 1155, after which each has its own unique 3' coding and untranslated sequence. This strongly suggests that the type I and II cDNAs are generated from a single gene by alternative splicing involving the 3' region of the gene.
Cloning of Human MSR Gene-A 120-kb genomic region containing the entire human MSR gene was cloned as a single contig of six overlapping cosmids and two λ phage clones (Fig. 1). The MSR gene is approximately 80 kb long, about 20 and 30 times the length of the type I and type II cDNAs, respectively. It consists of 11 exons interrupted by 10 introns (Table I). Exons 1 through 8 and exon 10 range from 54 to 413 bp in size, whereas exons 9 and 11 are relatively long, at 1675 and 2335 bp, respectively. The sequences at all 10 exon-intron junctions agree with the consensus sequences for splice junctions, including the GT-AG rule (19).
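The splice-junction check reduces, at its simplest, to testing the first and last two bases of each intron. This Python sketch assumes introns are given 5' to 3' on the sense strand with 1-based inclusive coordinates; the helper names and the toy sequence are hypothetical, not MSR sequence, and only the canonical GT/AG dinucleotides are checked, not the full consensus (19). A rare non-canonical GC-AG intron would be flagged as failing.

```python
def intron_sequences(genomic: str, introns: list[tuple[int, int]]) -> list[str]:
    """Slice intron sequences out of a genomic sequence.

    Coordinates are 1-based and inclusive, on the sense strand.
    """
    return [genomic[start - 1:end] for start, end in introns]


def follows_gt_ag(intron: str) -> bool:
    """True if the intron starts with GT (5' splice site) and ends with AG (3' splice site)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy sequence: exon (1-6) | intron (7-18) | exon (19-24)
genomic = "ATGCAG" + "GTAAGTTTTCAG" + "GCTTGA"
introns = intron_sequences(genomic, [(7, 18)])
print(introns, all(follows_gt_ag(i) for i in introns))
```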
Comparison of the gene sequence with the two cDNA sequences revealed that exons 1 through 8 encode the 5' cDNA region common to both type I and II cDNAs, that exon 9 encodes the C-terminal coding region and the 3'-untranslated region specific to type II cDNA, and that exons 10 and 11 encode the C-terminal coding region and the 3'-untranslated region specific to type I cDNA (Fig. 1). This organization demonstrates that type I and II cDNAs are generated from a single MSR gene by alternative splicing from the last common exon (exon 8) either to exon 9 (type II cDNA) or to exons 10 and 11 (type I cDNA). Exons 9 and 11 contain the polyadenylation sites for type II and type I mRNAs, respectively. Southern blot hybridization of genomic DNA with several probes, each containing a single exon of the MSR gene, detected a single band in each of several restriction enzyme digests, confirming that the human genome contains a single MSR gene (data not shown).
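The alternative splicing pattern described above amounts to two exon paths that share exons 1-8. In this minimal Python sketch the exon sequences are placeholders rather than real MSR sequence, and the names are illustrative.

```python
COMMON_EXONS = list(range(1, 9))  # exons 1-8, shared by both transcript types

# 3' exons joined to exon 8 in each transcript type
ISOFORM_3PRIME_EXONS = {
    "type I": [10, 11],
    "type II": [9],
}


def exon_path(isoform: str) -> list[int]:
    """Ordered list of exons included in one transcript type."""
    return COMMON_EXONS + ISOFORM_3PRIME_EXONS[isoform]


def splice(exon_seqs: dict[int, str], isoform: str) -> str:
    """Concatenate exon sequences along the exon path of one transcript type."""
    return "".join(exon_seqs[n] for n in exon_path(isoform))


if __name__ == "__main__":
    placeholders = {n: f"[{n}]" for n in range(1, 12)}
    for isoform in ISOFORM_3PRIME_EXONS:
        print(isoform, splice(placeholders, isoform))
```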
Characterization of the 5' End of MSR Gene-The transcription initiation site of the MSR gene was determined by primer extension analysis and RNase protection assay using poly(A)+ RNA from PMA-treated THP-1 cells and human liver tissue. Primer extension analysis gave a 5'-non-coding region of 122 bp (Fig. 2). This 5'-non-coding sequence is interrupted by an intron. Because the 5'-non-coding region is not fully covered by the cloned cDNAs, we confirmed the transcription initiation site by RNase protection assay. As shown in Fig. 3, the size of the probe fragment protected from RNase digestion was the same as that determined by primer extension analysis. The protection was specific to macrophage mRNA (Fig. 3, lane 1); mRNA from HEL cells (cultured human erythroleukemia cells) did not protect the probe (lane 2).
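The primer used for primer extension, which also served as the reverse PCR primer for the RNase protection template, is written 5' to 3' and is complementary to the 5' end of the cDNA, so its reverse complement is the sense-strand sequence it anneals to. A minimal Python sketch; the helper name is illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMER = "GTCCTAAAGAAAGCAGC"  # primer-extension primer from Materials and Methods
print(reverse_complement(PRIMER))  # GCTGCTTTCTTTAGGAC
```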
The nucleotide sequence of the 5'-flanking region is shown in Fig. 4. Two AT-rich sequences, TATTGAAA and ATTAAGAAA, located between 18 and 37 bp upstream from the transcription initiation site, might serve as TATA boxes (20, 21). We did not find the sequence CCAAT (21). The presence of two TATA-like sequences and the lack of a CCAAT box resemble the structure of the promoter of the human LDL receptor gene (10). In contrast, the consensus sterol-responsive element, which mediates the down-regulation of LDL receptor gene expression by sterol (22), was not found in the MSR promoter region. This is consistent with the finding that MSR activity is not regulated by cellular sterol content, a property that allows the accumulation of cholesterol and foam cell formation during atherogenesis (1). A 23-bp inverted repeat occurs at -592 to -570 and at -424 to -402 (Fig. 4, underline), and the region surrounding these 23-bp sequences contains additional inverted sequences (Fig. 4, dotted underline). The inverted repeat contains the subsequence GGGATTACA, which is highly homologous to the consensus T cell element GGGPuTTT(C/A)A that mediates the T cell-specific induction of the interleukin 2 gene by phorbol ester (23). Expression of the MSR protein is limited to macrophages and related cells (6, 7), and in THP-1 cells it is induced by phorbol ester treatment (2, 11). The consensus AP-1 binding site, TGA(G/C)TCA, which acts as a phorbol ester-inducible enhancer element (24), was not found in this region.
Fig. 2 (partial legend): ... polyacrylamide/urea gel electrophoresis and were compared with the adjacent sequence ladder obtained with the same primer (31) and a subclone of genomic DNA containing the 5'-flanking region. The antisense sequence is indicated.
Expression of the MSR gene peaks 3 to 4 days after PMA treatment, which is slower than the response of other phorbol ester-responsive genes. These results suggest that the effect of phorbol ester on the MSR gene may be mediated indirectly.
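The comparison of GGGATTACA with the T cell element, and the search for an AP-1 site, can be made explicit by writing the consensus sequences in IUPAC code (Pu = R, (C/A) = M, (G/C) = S). This Python sketch scores positional agreement and scans a sequence with an optional mismatch allowance; the helper names and any example sequences are illustrative, and the text does not describe the method actually used for the comparison.

```python
# IUPAC nucleotide codes used to write degenerate consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",  # purine (Pu)
    "Y": "CT",  # pyrimidine
    "M": "AC",
    "K": "GT",
    "S": "CG",
    "W": "AT",
    "N": "ACGT",
}

T_CELL_ELEMENT = "GGGRTTTMA"  # GGGPuTTT(C/A)A
AP1_SITE = "TGASTCA"          # TGA(G/C)TCA


def matching_positions(consensus: str, seq: str) -> int:
    """Count positions at which seq agrees with an equal-length IUPAC consensus."""
    if len(consensus) != len(seq):
        raise ValueError("consensus and sequence must have equal length")
    return sum(base in IUPAC[code]
               for code, base in zip(consensus.upper(), seq.upper()))


def find_motif(consensus: str, seq: str, max_mismatches: int = 0) -> list[int]:
    """0-based start positions of windows matching the consensus within max_mismatches."""
    k = len(consensus)
    return [i for i in range(len(seq) - k + 1)
            if k - matching_positions(consensus, seq[i:i + k]) <= max_mismatches]


if __name__ == "__main__":
    print(matching_positions(T_CELL_ELEMENT, "GGGATTACA"))  # 8 of 9 positions
```

With this scoring, GGGATTACA differs from the T cell element only at the seventh position (A where the consensus has T).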

Exon Organization and Protein Domains-Fig. 5 shows the relationship between exon organization and protein domains, as determined from the protein sequence and from in vitro mutagenesis of the receptor expressed in cultured cells (2-4, 27). The introns interrupt the protein-coding sequence in such a way that many of the protein segments correspond to individual exons. Exon 1 encodes the 5'-untranslated region and is followed by a 12-kb intron that separates the transcription initiation site from the translation initiation site. The MSR gene has a collagen-like domain; in the mouse α2(I) collagen gene an enhancer element is located in the first intron, and in the α1(I) collagen gene insertional mutagenesis in the first intron blocked transcription (25, 26). The role of the first intron in MSR gene expression remains an open question. Exon 2 encodes the last 4 bases of the 5'-non-coding region and 70% of the cytoplasmic domain. Exon 3 encodes the remaining cytoplasmic domain and most of the transmembrane domain.
Exons 4 and 5 encode a region containing a cluster of potential N-linked glycosylation sites. Analysis of the bovine and human receptor proteins indicates that most of these potential sites are actually glycosylated (2, 11). This region includes the α-helical coiled-coil structure and a spacer domain connecting the membrane-spanning domain to the fibrous structure. Within the α-helical coiled coil (2, 4), there are as many as 23 seven-amino acid "heptad" repeats. Fig. 6 compares the predicted heptad repeats from various animal species. The repeats fall into two groups because a skip at residues 204-211 disrupts the heptad pattern. The junction of exons 4 and 5 exactly matches this skip position, indicating that the α-helical coiled-coil domain of MSR consists of two coil structures encoded by different exons, and that the junction generates a distortion of the coiled coil which might be important for its function. Interruptions of the hydrophobic amino acid repeats by histidines are encoded by both exons. These structural features are well conserved in the animal species studied. Analysis of the murine MSR gene confirmed that the position of the gap in the coiled-coil structure corresponds exactly to the junction of the two exons.
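Heptad repeats are conventionally labelled a to g, with hydrophobic residues at a and d; 23 heptads correspond to 161 residues. The Python sketch below shows how a skip splits the register into two groups. It is a simplification under stated assumptions: only the skip coordinates (204-211) come from the text, the start and end residues in any usage are hypothetical, and the register is simply restarted at a after the skip, whereas a real assignment would choose the phase that best places hydrophobic residues at a and d.

```python
from typing import Optional

HEPTAD = "abcdefg"


def assign_heptads(start: int, end: int,
                   skip: Optional[tuple[int, int]] = None) -> dict[int, str]:
    """Assign heptad positions (a-g) to residues start..end (inclusive).

    Residues inside `skip` (inclusive) are marked '-', and the register
    restarts at 'a' on the first residue after the skip (a simplification).
    """
    register: dict[int, str] = {}
    k = 0  # index within the current heptad frame
    for res in range(start, end + 1):
        if skip is not None and skip[0] <= res <= skip[1]:
            register[res] = "-"
            k = 0
            continue
        register[res] = HEPTAD[k % 7]
        k += 1
    return register


def count_complete_heptads(register: dict[int, str]) -> int:
    """Count complete heptads, i.e. residues assigned to position 'g'."""
    return sum(1 for pos in register.values() if pos == "g")


if __name__ == "__main__":
    # Hypothetical window around the skip at residues 204-211
    reg = assign_heptads(197, 218, skip=(204, 211))
    print("".join(reg[r] for r in range(197, 219)))
```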
Exons 6-8 encode the collagen-like structure. Exons 6 and 7 are each 81 bp long and encode nine Gly-Xaa-Yaa triplets. In fibrillar collagen genes, most exons in the triple-helical domain are 54, 45, or 99 bp long, or other multiples of 9 bp (26); the MSR gene follows the same rule. Exon 8 encodes five Gly-Xaa-Yaa triplets and the short C-terminal non-triple-helical region. In vitro mutagenesis experiments indicated that the cluster of basic amino acids encoded by exon 8 is essential for ligand binding.² Exon 9 encodes the type II-specific coding sequence and the 3'-non-coding sequence. Exons 10 and 11 encode the type I-specific coding sequence, the scavenger receptor cysteine-rich domain (SRCR domain), and the 3'-non-coding sequence. A group of genes encoding domains highly homologous to the SRCR domain has been reported (27). The SRCR domain is divided into two regions. The N-terminal half contains 2 conserved cysteines and shows a higher degree of amino acid identity among the SRCR proteins. The C-terminal half has 4 cysteines, spaced at intervals of 10 amino acids. The N-terminal half is encoded by exon 10 and the C-terminal half by exon 11. Genes encoding proteins with SRCR-homologous domains (27) are found in species from sea urchin (the speract receptor) to humans (CD5 and complement factor I), suggesting that this domain may serve as a building block of complex mosaic proteins and/or mediate particular physiological functions.
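Each Gly-Xaa-Yaa triplet is three codons, or 9 bp, which is why exons lying wholly within a triple-helical region tend to be multiples of 9 bp. This Python sketch assumes the exon begins exactly at a triplet boundary (in phase); the function names and the example peptides are illustrative.

```python
from typing import Optional


def triplets_encoded(exon_bp: int) -> Optional[int]:
    """Number of Gly-Xaa-Yaa triplets encoded by an in-phase, fully triple-helical exon.

    Each triplet is 3 codons = 9 bp; returns None if the length is not a multiple of 9.
    """
    if exon_bp % 9:
        return None
    return exon_bp // 9


def is_collagen_like(peptide: str) -> bool:
    """True if the peptide is (Gly-Xaa-Yaa)n: glycine at every third residue from the first."""
    return (len(peptide) > 0 and len(peptide) % 3 == 0
            and all(peptide[i] == "G" for i in range(0, len(peptide), 3)))


if __name__ == "__main__":
    for bp in (81, 54, 45, 99):
        print(bp, "bp ->", triplets_encoded(bp), "triplets")
```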
As an integral membrane protein, MSR belongs to the so-called "inside-out" type of receptor (2, 4): it lacks a signal sequence and has an N-terminal cytoplasmic domain, a single transmembrane domain, and two different C-terminal extracellular domains. Alternative 3' splicing that generates different proteins has been reported for several genes. In the immunoglobulin constant region, it generates secreted and membrane-bound forms. In the calcitonin gene, 3' alternative splicing, which is related to cell-specific splicing activity, generates calcitonin and calcitonin gene-related peptide. Likewise, the organization of the MSR gene is well suited to generating multiple C-terminal extracellular structures.
Physical and Genetic Mapping-We localized the human MSR gene cytogenetically by non-isotopic fluorescent in situ hybridization. High-resolution mapping was facilitated by simultaneous staining of replicated prometaphase R-bands with propidium iodide (16). Fig. 7 shows the result obtained with cosmid clone cMSR-35, which indicates that the MSR gene is located on human chromosomal band 8p22.
RFLP markers were then sought at the human MSR locus for genetic linkage mapping. MSR cosmid DNAs were used as probes in Southern blot analysis of genomic DNA from six unrelated individuals, digested with various restriction enzymes. Three common RFLPs were identified (Table II). An MspI RFLP with four alleles of 6.3, 3.1, 2.9, and 2.7 kb was detected with a 0.9-kb EcoRI-HindIII fragment of cMSR-32 as the probe. The same probe detected a BamHI RFLP with alleles of 14 and 9 kb. A HindIII RFLP with alleles of 13 and 9 kb was detected with a 9-kb HindIII fragment of cMSR-35. Allele frequencies and heterozygosity, calculated from the data on 100 unrelated individuals of the reference families, are shown in Table II.
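Expected heterozygosity is conventionally computed from allele frequencies as H = 1 - Σ p_i². The text does not say which estimator was used for Table II, so this Python sketch offers both the simple form and Nei's small-sample correction; the frequencies in the usage lines are hypothetical, not the Table II values. Applied to haplotype frequencies, the same formula gives the heterozygosity of a combined haplotype, which is why combining several RFLP systems can raise heterozygosity above that of any single system.

```python
from typing import Optional


def heterozygosity(freqs: list[float], n_individuals: Optional[int] = None) -> float:
    """Expected heterozygosity at one locus, H = 1 - sum(p_i^2).

    If n_individuals (diploid sample size) is given, apply Nei's
    small-sample correction factor 2n / (2n - 1).
    """
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    h = 1.0 - sum(p * p for p in freqs)
    if n_individuals is not None:
        two_n = 2 * n_individuals
        h *= two_n / (two_n - 1)
    return h


if __name__ == "__main__":
    # Hypothetical allele frequencies for illustration only
    print(round(heterozygosity([0.5, 0.5]), 3))
    print(round(heterozygosity([0.4, 0.3, 0.2, 0.1]), 3))
```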
To locate MSR on the genetic map, genotypes of the three MSR RFLP systems were determined for 667 individuals in 59 CEPH and Utah reference families (28). To improve the information content of these systems, we used an MSR haplotype constructed by combining the genotypes of all three RFLP systems, which gave a heterozygosity of 0.79. Pairwise linkage analyses of the MSR haplotype showed significant linkage with five markers: D8S21, NEFL, D8S5, D8S17, and LPL. A sex-averaged linkage map of the six loci including MSR was constructed by multipoint linkage analysis with the LINKAGE and GMS programs to determine the precise location of MSR in the linkage group. This analysis established the most likely order of the six loci as tel-D8S17-MSR-D8S21-LPL-D8S5-NEFL-cen and placed the MSR locus 11 cM distal to LPL and 22 cM distal to NEFL (Fig. 8). LPL and NEFL have been localized cytogenetically to 8p22 and 8p21, respectively (29, 30). Therefore, the highly significant linkage of the MSR locus with LPL and NEFL supports the cytogenetic assignment of MSR to 8p22.
² T. Doi and T. Kodama, personal communication.
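Map distances in centimorgans relate to recombination fractions through a map function. The text does not state which map function was used, so this Python sketch shows the two standard choices, Haldane (no crossover interference) and Kosambi, applied to the MSR-LPL and MSR-NEFL distances reported above. It illustrates the conversion only and does not reproduce the multipoint analysis; the function and variable names are illustrative.

```python
import math


def haldane_r(d_cM: float) -> float:
    """Recombination fraction under Haldane's map function: r = (1 - exp(-2d)) / 2, d in Morgans."""
    d = d_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi_r(d_cM: float) -> float:
    """Recombination fraction under Kosambi's map function: r = tanh(2d) / 2, d in Morgans."""
    d = d_cM / 100.0
    return 0.5 * math.tanh(2.0 * d)


# Sex-averaged distances from MSR reported in the text (Fig. 8)
DISTANCES_FROM_MSR_CM = {"LPL": 11.0, "NEFL": 22.0}

if __name__ == "__main__":
    for locus, d in DISTANCES_FROM_MSR_CM.items():
        print(f"MSR-{locus}: {d} cM, r = {haldane_r(d):.3f} (Haldane), "
              f"{kosambi_r(d):.3f} (Kosambi)")
```

At short distances both functions give r close to d/100; they diverge as distance grows because Kosambi's function allows for crossover interference.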
MSR are thought to play an essential role in the metabolism of modified plasma lipoproteins by macrophages and are implicated in the pathogenesis of atherosclerosis. Genetic variation at the MSR locus may influence susceptibility to atherosclerotic disorders. The RFLP markers identified in this study will enable future genetic linkage and association studies of pathological conditions in which the MSR gene is a candidate gene. The high heterozygosity (0.79) obtained by using the three systems jointly should prove particularly useful in such investigations. Although the involvement of MSR in atherogenesis is well known, the physiological role of MSR remains obscure. Genetic studies of the MSR gene may provide further information about the physiological and pathological roles of MSR.