Identification of cDNA Clones Encoding Different Domains of the Basement Membrane Heparan Sulfate Proteoglycan

We have used antibodies to the basement membrane proteoglycan to screen λgt11 expression vector libraries and have isolated two cDNA clones, termed BPG 5 and BPG 7, which encode different portions of the core protein of the heparan sulfate basement membrane proteoglycan. These clones hybridize to a single mRNA species of approximately 12 kilobases. Amino acid sequences obtained from peptides derived from protease digests of the core protein were found in the deduced sequence, confirming the identity of these clones. BPG 5 spans 1986 base pairs and has an open reading frame of 662 amino acids. The amino acid sequence deduced from BPG 5 contains two cysteine-rich domains and two internally homologous domains lacking cysteine. The cysteine-rich domains show homology to the cysteine-rich domains of the laminin chains. A globule-rod structure, similar to that of the short arms of the laminin chains, is proposed for this region of the proteoglycan. The other clone, BPG 7, is 2193 base pairs long and has an open reading frame of 731 amino acids. The deduced sequence contains eight internal repeats with 2 cysteine residues in each repeat. These repeats show homology to the neural cell adhesion molecule N-CAM and the plasma α1B-glycoprotein. Looping structures similar to these proteins and to other proteins of the immunoglobulin gene superfamily are proposed for

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J04054 and J04055.
Supported by a fellowship from the Juvenile Diabetes Foundation. To whom correspondence should be addressed: Instituto Nazionale per la Ricerca sul Cancro, Viale Benedetto XV, 10

Basement membranes are thin extracellular sheets that underlie all epithelial and endothelial cells and surround muscle and adipose cells and the peripheral nervous system. Major components of basement membranes include collagen IV, laminin, nidogen/entactin, and a heparan sulfate proteoglycan (1,2). While there is considerable information on the primary structures of the former components, the structure of the heparan sulfate proteoglycan appears to vary among tissues (see Ref. 3 for review), and its primary structure is still not known. The heparan sulfate proteoglycan has diverse functions, creating an ionic charge barrier in the surfaces of basement membranes (4), binding other basement membrane components (5), and participating in cell adhesion and stabilization of focal contacts (6).
Proteoglycans are a diverse group of macromolecules characterized by the presence of one or more glycosaminoglycan side chains attached to a core protein (3) and are usually classified by their source, size, and attached glycosaminoglycan. Immunological studies have provided information on the nature and distribution of these proteins. However, due to their high carbohydrate content, proteoglycans are difficult to characterize directly, and their exact structure and functional domains are best defined by cDNA sequencing and analysis. Several proteoglycan core proteins have been cloned and sequenced, including the aggregating cartilage proteoglycan (7), PG 19 (8), collagen IX (12), and the invariant chain proteoglycan (13), as well as partial sequences of a large fibroblast chondroitin sulfate PG (14) and the chicken (15) and bovine (16) cartilage proteoglycans. These clones have provided information on core protein structure and have, for example, shown homologies with other matrix components.
The heparan sulfate-containing basement membrane proteoglycan from the EHS tumor (17) has been extensively investigated. This proteoglycan has a large core protein (Mr = 400,000) (18). Rotary-shadowed electron micrographs of the proteoglycan show the core protein in the form of a series of globules on a linear core, giving a "beads on a string" appearance (19)(20)(21), with the 2-3 heparan sulfate side chains projecting from a terminal globule. Proteolytic mapping has shown that the core protein consists of a 200-kDa trypsin-resistant fragment (P200), which lacks heparan sulfate chains, and a trypsin-sensitive (200 kDa) domain containing the heparan sulfate chains (22). The trypsin-resistant fragment is further subdivided into immunologically distinct 44- and 46-kDa fragments (P44 and P46) by V8 protease.
Here we report partial sequences from cDNA clones to the core protein of this proteoglycan using mRNA from basement membrane-producing cells. The deduced sequence of these clones reveals two different domains in the core protein, one with sequence homology to the laminin B1 and B2 chains and the other with homology to the neural cell adhesion molecule N-CAM and the immunoglobulin gene superfamily.

MATERIALS AND METHODS
Peptide Purification and Sequencing-Heparan sulfate proteoglycan was purified from the EHS tumor, and antibodies against the proteoglycan were raised as described (17). The 46- and 44-kDa fragments of the core protein generated by V8 protease digestion were purified as described (22). Briefly, purified proteoglycan was digested for 6 h with V8 protease and the digest chromatographed on Sepharose CL-4B in 0.2 M NaCl, 0.02 M Tris-HCl, pH 8.0. Fractions containing the fragments were then chromatographed on a DEAE-5PW high performance liquid chromatography column. These were reduced with dithiothreitol, alkylated with iodoacetamide (23), and rechromatographed on a column of DEAE-5PW, which resolved several peaks from each fragment. The peaks were collected and sequenced in the gas phase as described below. Tryptic peptides from the core protein were also isolated and sequenced. The proteoglycan was carboxymethylated using standard procedures (24). Briefly, lyophilized proteoglycan was dissolved in 6 M guanidine HCl, 3 mM EDTA, 0.25 M Tris-HCl, pH 8.6, at 5 mg/ml and reduced by addition of β-mercaptoethanol (725 µM) at 50 °C under nitrogen for 1 h. After cooling to room temperature, the sample was alkylated with [14C]iodoacetic acid (Amersham Corp.) supplemented with iodoacetic acid. β-Mercaptoethanol was added to stop the reaction. The sample was dialyzed against 5% acetic acid followed by dialysis first against water and then against 0.1 M N-ethylmorpholine, pH 7.5 (Aldrich). After dialysis, the sample was digested with 12.5 units of L-1-tosylamido-2-phenylethyl chloromethyl ketone-trypsin immobilized on agarose (Pierce Chemical Co.) for 4 h at 37 °C. The digestion was stopped by addition of 200 µl of glacial acetic acid, and the sample was lyophilized.
Tryptic peptides were chromatographed on a column (1.5 × 90 cm) of Bio-Gel P-10 (Bio-Rad, 200-400 mesh) in 0.1 M N-ethylmorpholine, pH 7.5. Resolution was poor, and the single broad peak was arbitrarily divided into six fractions which were lyophilized. Each fraction was reconstituted in 100 µl of 0.15% trifluoroacetic acid and applied to a Zorbax ODS column (reverse-phase) equilibrated in 0.15% trifluoroacetic acid and connected to a series 4LC Perkin-Elmer high performance liquid chromatography apparatus. Peptides were eluted from the column in a linear gradient of 0-100% acetonitrile in 0.15% trifluoroacetic acid with a flow rate of 0.8 ml/min, and fractions were monitored for absorbance at 230 and 280 nm. An aliquot (1%) of each fraction was assayed for radioactivity by liquid scintillation. Purified peptides were lyophilized and reconstituted in 0.2% acetic acid and applied to a gas-phase sequenator (Applied Biosystems Model 470A) for NH2-terminal sequence determination.
The abbreviations used are: EHS, Engelbreth-Holm-Swarm; TBS, Tris-buffered saline.
Library Construction-Two cDNA libraries were constructed as follows. A random primer cDNA library was made from poly(A)+ RNA isolated from EHS tumor cells (18) by guanidine isothiocyanate extraction, CsCl gradient centrifugation (25), and oligo(dT)-cellulose chromatography (26). First and second strand synthesis, methylation, end polishing, and EcoRI linker ligation were as described (26) for primer extension, except that mixed hexamers (Pharmacia LKB Biotechnology Inc.) were used as a random primer. The cDNA was digested with EcoRI, size-fractionated on a Sepharose CL-4B column, ligated into λgt11 (Vector Cloning Systems), and packaged (Vector Cloning Systems).
A second library was constructed using a primer based on amino acid sequence. Poly(A)+ RNA was prepared by the guanidine hydrochloride method (27) from mouse melanoma K1735 variant M2 cells grown in culture (28). The degenerate primer TA(T/C)TA(T/C)GGNGA(T/C)GC, based on the amino acid sequence Tyr-Tyr-Gly-Asp-Ala, was used as a specific primer for first strand synthesis. The ligated and EcoRI-cut cDNA was size-selected on a 1.5% agarose gel from 1 to 4 kb, ligated, and packaged as above.
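The primer design described above can be illustrated with a short back-translation sketch. It is not the authors' procedure, only a minimal example under stated assumptions: the standard genetic code, IUPAC ambiguity codes (Y = T or C, R = A or G, N = any base), and position-by-position collapsing of synonymous codons, which is exact for Tyr, Gly, Asp, and Ala but over-generates for six-codon amino acids such as Leu, Ser, and Arg. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the set of bases each one stands for.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

# Complement of each IUPAC code (for example Y pairs with R).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def degenerate_codon(aa):
    """Collapse all codons for one amino acid into a single IUPAC codon."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    return "".join(IUPAC[frozenset(c[i] for c in codons)] for i in range(3))


def back_translate(peptide):
    """Back-translate a one-letter peptide into a degenerate sense-strand oligo."""
    return "".join(degenerate_codon(aa) for aa in peptide)


def reverse_complement(oligo):
    """Reverse complement of an oligo written with IUPAC codes."""
    return oligo.translate(COMPLEMENT)[::-1]


sense = back_translate("YYGDA")          # Tyr-Tyr-Gly-Asp-Ala
print(sense)                             # TAYTAYGGNGAYGCN
print(reverse_complement(sense[:-1]))    # antisense 14-mer, last wobble base dropped
```

Dropping the final fully degenerate base of the Ala codon leaves a 14-mer with three two-fold degenerate positions and one four-fold position, that is, 2 × 2 × 4 × 2 = 32 distinct sequences. An oligonucleotide that anneals to mRNA must be complementary to the sense sequence, which is why the reverse complement is also shown.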
Library Screening and Isolation of Clones-Libraries were screened with antibodies to the heparan sulfate proteoglycan from the EHS tumor. Libraries were plated onto Y1090 Escherichia coli and allowed to grow for approximately 3 h at 42 °C. Nitrocellulose filters (Schleicher & Schuell BA85), which were soaked in 10 mM isopropyl-1-thio-β-D-galactopyranoside and allowed to air-dry, were then placed onto the cultures, and the cultures were incubated at 37 °C overnight. Then the filters were removed, blocked twice for 15 min with 3% gelatin in TBS (20 mM Tris-HCl, pH 7.5, 500 mM NaCl), rinsed with 1% gelatin in TBS, and incubated with antibodies to the proteoglycan (in 1% gelatin in TBS) for 2 h. Filters were then rinsed with TBS (three times for 15 min) and incubated with second antibody (anti-rabbit conjugated with horseradish peroxidase with 1% gelatin in TBS) for 2 h. Filters were again rinsed (three times for 15 min) with TBS and developed in 0.05% 4-chloro-1-naphthol, 17% methanol, 0.0005% H2O2 in TBS. Immunopositive plaques were picked, replated, and rescreened until pure. Large-scale preparations of purified phage were then made according to Maniatis et al. (25). Inserts were cut out with EcoRI and purified by agarose gel electrophoresis. Clones were also subcloned into pBR322 for large-scale preparations of plasmids (25) and purification of inserts.
Isolation of Antibodies to the Fusion Proteins-Antibodies specific to the fusion proteins were affinity purified from antibodies to the proteoglycan. Immunoreactive clones were plated in 150-cm2 dishes on a layer of Y1090 cells at near confluence, and nitrocellulose filters were prepared and reacted with antibodies to the proteoglycan as above for immunoscreening. Filters were then extensively rinsed in TBS, and antibodies remaining bound were eluted with 15 ml of 0.2 M glycine, pH 2.8, for 3 min. Eluted antibody solution was immediately neutralized with 4.5 ml of 1 M K2HPO4 and dialyzed against phosphate-buffered saline overnight. This antibody solution was then used to stain Western blots of enzymatic digests of the proteoglycan. This method was used to determine the portions of the proteoglycan the clones encoded, as well as to eliminate weaker reacting, false-positive clones which did not stain specific bands.
Hybridization Analysis-Northern blots of poly(A)+ RNA were prepared by the procedure outlined in Maniatis et al. (25). Southern blots were prepared from mouse spleen DNA (a gift of Dr. P. Killen, Dept. of Pathology, University of Michigan) digested with either EcoRI, HindIII, or BglII as described (25). Purified cDNA inserts were nick-translated (25), and approximately 1 × 10⁷ cpm of labeled cDNA was incubated with blots under stringent hybridization conditions as described (29).
DNA Sequencing and Analysis-DNA clones were sequenced by "shotgun" random fragment generation (30) and the dideoxy chain termination method (31). Many regions were sequenced separately with the Sequenase system (United States Biochemical Corp.), as well as with Klenow fragment. The combination of the two methods eliminated ambiguous regions in the sequence. Sequences were ordered with the Microgenie program (Beckman) and were analyzed with the IDEAS programs SEQMAN, SEQHP, SEQDP, ALOM, DELPHI, and CHOFAS.

RESULTS
Peptide Sequencing-Peptides were isolated from trypsin and V8 protease digests of the proteoglycan. NH2-terminal sequences obtained from nine of these peptides, two from the V8 digest and seven from the tryptic digest, are listed in Table I. Two peptides, one from the V8 digest (V46) and one from the tryptic digest (T46), had overlapping sequences and are probably from the same region of the molecule (see below).

TABLE I
Amino acid sequences derived from peptide fragments
NH2-terminal sequences from peptides isolated from the core protein of the basement membrane heparan sulfate proteoglycan by tryptic or V8 protease digestion. Amino acids are shown in the single-letter code. Peptides V46 and T46 contain overlapping sequence.

Tryptic peptides
T46    GTPQDCQPCPCYGAPAAGQAAHTCFLLTDG
T2     AAGVPSASITWR
T3     AFAYLQVPER
T4     FLVHDAFWALPK
T5     YELGSGLAVLR
T6     VDSYGGFLR
T7     CRPTTQEIVYRR

Antibody Screening and Immunoreactivity-A cDNA library constructed from EHS tumor RNA with random primers was screened with polyclonal antibodies to the proteoglycan, and a 309-base pair clone (BPG 1) was isolated. This clone was sequenced but did not show matches with the available peptide sequences. However, a number of criteria indicated that this clone encodes a portion of the proteoglycan core protein. BPG 1 hybridizes to an RNA species of the expected size (see below), and the amount of this mRNA, as well as the protein core itself, increases in F9 teratocarcinoma cells treated with retinoic acid and cAMP (32). This species of mRNA was also present in all cell lines tested which made the 400-kDa precursor protein, with the highest levels observed in the mouse melanoma K1735 variant M2 cells. RNA prepared from these cells was used in the construction of a subsequent library with oligonucleotide primers based on the peptide sequence data. Nine immunopositive clones were isolated by screening with antibodies to the proteoglycan. Antibodies generated by affinity selection to the fusion proteins of each of these cDNA clones were tested for their reaction with Western blots of heparitinase and protease-treated proteoglycan (Fig. 1). Amido Black staining demonstrates the 400-kDa core protein obtained by digesting the proteoglycan with heparitinase, as well as the large (P200, Mr ≈ 200,000) fragment produced by trypsin digestion, and the smaller P46 and P44 fragments (Mr ≈ 46,000 and 44,000) produced by V8 protease digestion.
Polyclonal antibodies prepared to the proteoglycan itself stained numerous bands in these proteolytic digests. Seven of the antibodies to the fusion proteins from cDNA clones did not stain either the core protein or peptide fragments (not shown), suggesting that these were directed against other proteins. However, antibodies to the fusion proteins of clones BPG 5 and BPG 7 stained the core protein produced by heparitinase treatment of the proteoglycan, as well as different peptides in the proteolytic digests. Antibodies to the BPG 1 fusion protein produced a pattern similar to that for BPG 7 (data not shown). Antibodies to the fusion protein of BPG 5 stained the P200 fragment strongly and also stained the P46 fragment. Antibodies to the fusion protein of BPG 7 did not stain the P200 or the P46 fragments, but did stain numerous smaller bands in the trypsin digest and certain bands in the V8 digests of the proteoglycan. These results suggest that clones 5 and 7 encode different regions of the core protein and that BPG 5 encodes the P46 subfragment of P200.
Northern Analysis-BPG 1 hybridizes to a 12-kilobase mRNA species (Fig. 2, lane 1), larger than the mRNA for the laminin A and B1 chains (Fig. 2, lane 2). Clones 5 and 7 also hybridize to this mRNA on Northern blots (Fig. 2, lanes 3-7). This species is in the size range expected for the proteoglycan core protein and was present in basement membrane-producing cells such as those from the EHS tumor (Fig. 2, lanes 3 and 5), induced F9 teratocarcinoma (Fig. 2, lane 1), M2 cells of the mouse melanoma K1735 (Fig. 2, lanes 4 and 6), and in kidney, a tissue with abundant basement membranes (Fig. 2, lane 7). M2 cells contained more of the 12-kilobase mRNA than the EHS cells (compare Fig. 2, lanes 3 and 5 with lanes 4 and 6), while this mRNA was present at much lower levels in the F9 cells and whole kidney.
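The statement that a 12-kilobase mRNA is in the size range expected for a core protein of Mr = 400,000 can be checked with rough arithmetic. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that does not come from this paper, and ignores untranslated regions; the function names are illustrative only.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb value, not taken from the paper


def estimated_residues(protein_mass_da, avg_residue_mass=AVG_RESIDUE_MASS_DA):
    """Approximate number of amino acids in a polypeptide of the given mass."""
    return round(protein_mass_da / avg_residue_mass)


def coding_length_nt(n_residues):
    """Nucleotides needed to encode n_residues (three per codon)."""
    return 3 * n_residues


core_residues = estimated_residues(400_000)   # core protein, Mr = 400,000
coding_nt = coding_length_nt(core_residues)
coding_fraction = coding_nt / 12_000          # share of the 12-kb mRNA

print(core_residues, coding_nt, round(coding_fraction, 2))
```

Under this assumption the core protein needs roughly 11 kilobases of coding sequence, which fits within a 12-kilobase transcript and leaves about 1 kilobase for untranslated regions.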
Sequence Analysis of BPG 5-These cDNA clones were sequenced, and the complete sequence of BPG 5 and the amino acid sequence for which it codes are shown in Fig. 3. BPG 5 is 1986 nucleotides long with a single open reading frame of 662 residues, encoding a fragment of the proteoglycan of Mr ≈ 71,129. Also shown in Fig. 3 beneath the deduced sequence are 47 residues from the overlapping sequences V46 and T46 (Table I). This sequence from the peptides agrees with the deduced sequence from residues 245 to 291, except for 2 residues, and confirms the identity of BPG 5 as encoding a portion of the P46 fragment of the proteoglycan. The amino acid just NH2-terminal to the V46 peptide sequence is glutamic acid, the residue at which V8 protease cleaves peptides. The amino acid sequence Tyr-Tyr-Gly-Asp-Ala from the P46 fragment, used for determining the degenerate oligonucleotide primer, occurs at the 3' end of the clone, where priming apparently occurred. There is a potential N-glycosylation site (Asn-X-Ser) at position 317.
FIG. 3. Nucleotide sequence of BPG 5 and the deduced amino acid sequence. Cysteine residues are boxed; the overlapping peptide sequences V46 and T46 are shown beneath the deduced sequence.
There are several noteworthy structural features predicted by the deduced sequence of BPG 5. Cysteine residues (Fig. 3, boxed) occur in two clusters, one in the central portion of the clone (amino acids 187-395, CR1) and the other at the 3' end (amino acids 591-662, CR2). Computer analysis of the sequence by the program SEQHP indicates that these domains have an internal repeating structure containing 8 cysteines, with 1 completely conserved glycine. A schematic model of BPG 5 is shown in Fig. 4. There are four repeats in the central cluster, and one and a half in the 3' cluster. Comparison with laminin sequences indicates that similar cysteine repeats are found in the B1 and B2 chains of laminin (33,34), in the cysteine-rich, rod-like elements in domains III and V. The cysteine repeats in BPG 5 and those of the laminin B1 and B2 chains are aligned in Fig. 5. The positions of the cysteines and a glycine residue are highly conserved, and certain other residues are frequently conserved. The third repeat in the central cluster has an insertion of nine amino acids in the sequence Cys-X-X-Cys-X-Cys. Interestingly, the repeats in domains V and III of the B1 chain and domain V of the B2 chain begin at this point. It is possible that this indicates an exon boundary where duplications have occurred.
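The potential N-glycosylation site noted above is found by scanning a deduced protein sequence for the tripeptide Asn-X-Ser or Asn-X-Thr. A minimal sketch of such a scan follows. Excluding proline at the X position is a widely used refinement that the text does not state, and the example peptide is invented for illustration; it is not part of the BPG 5 sequence.

```python
import re

# Asn-X-Ser/Thr, X not Pro; the lookahead lets overlapping sites be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


example = "MKNASLLNPTGNGT"   # hypothetical peptide, for illustration only
print(find_sequons(example))  # [3, 12]; the Asn-Pro-Thr at position 8 is skipped
```

Positions are reported with 1-based numbering to match the residue numbering used for the deduced sequences.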
FIG. 5. Alignment of the cysteine repeats of BPG 5 with those of the laminin B1 and B2 chains (33, 34). Amino acid numbers are given to the left. The consensus sequence is based on all sequences shown. The darkest boxes are highly conserved cysteine and glycine residues (capital letters in consensus sequence). Lighter boxes are amino acids conserved more than 50% (also capital letters in consensus sequence). Lightest boxes are those conserved more than 30% (small letters).
The cysteine clusters are separated by cysteine-free regions, located on either side of the central cluster, which show internal homology with each other. This includes a segment of 17 identical residues and 2 conserved tryptophan residues. The alignment of these cysteine-free regions and the first 1.5 repeats of CR1 and CR2 is shown in Fig. 6. In addition to the highly conserved residues in the cysteine-free regions, the sequences of the first 2 cysteine repeats are also very highly conserved. It is possible that the cysteine-free regions and the flanking cysteine-rich regions were duplicated together as a large unit. The cysteine-free segments appear to be similar (using SEQHP) to domain IV, a globular structure in the laminin B2 chain, which also occurs between cysteine-rich domains. There are even higher similarities to equivalent regions (IVa and IVb) in the laminin A chain (67). Analysis of BPG 5 with the computer program ALOM indicates a potential transmembrane segment of 17 amino acids at positions 138-154, which is shorter than most transmembrane segments.
BPG 7 hybridized to the same mRNA species as BPG 5. However, sequence analysis indicates that the peptide encoded by this clone has a completely different structure from that encoded by BPG 5 (Fig. 7). This clone is 2193 nucleotides long with a single open reading frame of 731 amino acids (Mr ≈ 77,776). BPG 1 was found to be contained within BPG 7 (nucleotides 851-1160). A 12-residue sequence of a tryptic fragment, T2, was identical to a portion of the amino acid sequence and is shown below the deduced sequence following a potential trypsin cleavage site (Arg-Ala). This sequence also contains four potential N-linked carbohydrate attachment sites (Asn-X-Thr or Asn-X-Ser). The cysteines within this clone are regularly spaced and constitute eight repeats of approximately 95-residue segments, with each repeat containing 2 cysteine residues and a conserved tryptophan. Schematic models of this clone are shown in Fig. 4, and the alignment of these repeats is shown in Fig. 8. Three of the repeats are highly conserved with respect to each other (amino acids 183-474, repeats 3, 4, and 5). The remaining repeats show varying but lower degrees of conservation. Searches of the computer data banks indicated two proteins with significant homology to the sequence encoded by BPG 7, the mouse neural cell adhesion molecule N-CAM (35) and the plasma α1B-glycoprotein (36). Alignments of the homologous regions within these proteins are shown in Fig. 8. Chicken N-CAM (37) (not shown) has a very similar sequence to mouse N-CAM and resembles the structure encoded by BPG 7. Statistical analysis by the program SEQDP, comparing the BPG 7-encoded sequence with the mouse N-CAM sequence using 20 random alignments and a gap penalty of eight, placed the optimal alignment 19.6 standard deviation units from the mean of the random alignments. The same statistical analysis against the α1B-glycoprotein gave a value of 10.8 standard deviation units.
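The significance values quoted above compare an optimal alignment score with scores obtained after randomizing one of the sequences. The internals of SEQDP are not described in the text, so the sketch below is only a generic stand-in: a Needleman-Wunsch global alignment with identity scoring and a linear gap penalty, with one sequence shuffled a fixed number of times. The scoring values, seed, and function names are illustrative assumptions, not the program's settings.

```python
import random
import statistics


def global_score(a, b, match=2, mismatch=-1, gap=2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [-gap * j for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [-gap * i]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] - gap, curr[j - 1] - gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(a, b, n_shuffles=20, seed=0, **scoring):
    """Standard deviation units separating the real score from shuffled scores."""
    rng = random.Random(seed)
    observed = global_score(a, b, **scoring)
    residues = list(b)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # preserves composition, destroys order
        random_scores.append(global_score(a, "".join(residues), **scoring))
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("shuffled scores show no variance; z-score undefined")
    return (observed - statistics.mean(random_scores)) / sd
```

Shuffling keeps the amino acid composition constant, so a large value indicates similarity in residue order rather than in composition alone. With only 20 shuffles the estimated mean and standard deviation are coarse, so such values are best read as rough indicators.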
Southern analysis was performed on mouse genomic DNA to examine the possibility of closely related genes to this proteoglycan (Fig. 9). Genomic mouse DNA was digested with either EcoRI, HindIII, or BglII. Clones BPG 5 and BPG 7 had internal BglII sites, but no internal EcoRI or HindIII sites. BPG 5 hybridized to a single large band in the EcoRI and HindIII digests and to a major and several minor bands in the BglII digest. These results are consistent with a single gene. BPG 7 hybridized to several bands with varying intensities in all digests. This pattern could be due to other similar genes or to cross-hybridization with the highly conserved repeats in this domain, which may exist elsewhere in the sequence encoding the proteoglycan.
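The interpretation above depends on whether each probe contains internal sites for the enzymes used, since a probe spanning a site can hybridize to more than one genomic fragment in that digest. A minimal sketch of a site scan is given below. The recognition sequences are the standard ones for these enzymes; the example insert is invented and is not taken from BPG 5 or BPG 7.

```python
# Recognition sequences (all palindromic, so one strand is sufficient to scan).
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BglII": "AGATCT"}


def find_sites(dna, sites=SITES):
    """Return 1-based start positions of each recognition site in dna."""
    dna = dna.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + 1)
            start = dna.find(site, start + 1)
        hits[name] = positions
    return hits


insert = "TTGACAGATCTGGCA"   # hypothetical insert with one internal BglII site
print(find_sites(insert))    # {'EcoRI': [], 'HindIII': [], 'BglII': [6]}
```

A probe with this pattern would be expected to detect one fragment in EcoRI and HindIII digests of a single-copy gene and at least two fragments in a BglII digest, apart from any additional genomic sites introduced by introns.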

DISCUSSION
The isolation and sequencing of two nonoverlapping clones for the basement membrane proteoglycan has allowed a portion (approximately 40%) of the structure of the molecule to be predicted, showing two distinct but repeated structural motifs. BPG 5 encodes a portion of the proteoglycan core protein which resembles domains in the short arms of the laminin chains, suggesting that these components have an evolutionary relationship. It is probable that the cysteine-rich repeats form disulfide-bonded, rod-like structures, whereas the cysteine-free regions form globular domains. Such a structure is consistent with electron micrograph data indicating that the core protein has globules separated by short rod-like portions (19)(20)(21). There may also be immunological cross-reaction with laminin due to common sequences, since antibodies to laminin immunoprecipitate the precursor protein to the proteoglycan (18), while antibodies to the proteoglycan cross-react with laminin (17). These similarities imply parallel structural and functional relationships between these regions of the proteoglycan and those on laminin. A cell binding sequence has been identified within the cysteine repeats in the laminin B1 chain, Tyr-Ile-Gly-Ser-Arg (38), but this sequence has not been found in the proteoglycan as yet. The functional importance of these cysteine repeats is strongly suggested by their conservation in four basement membrane polypeptides (laminin A, B1, B2, and the proteoglycan). It is possible that these regions may participate in interactions shared by these molecules. For example, laminin and the heparan sulfate proteoglycan both bind to collagen IV at approximately the same site along the molecule (5).
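The fraction of the core protein covered by the two clones can be estimated from figures reported in this paper: the open reading frame lengths, the calculated Mr of each encoded fragment, and the core protein Mr of 400,000. The sketch below is a rough consistency check by mass, not the authors' calculation, and the variable names are illustrative.

```python
# Values reported in the text for the two clones.
CLONES = {
    "BPG 5": {"nt": 1986, "aa": 662, "mr": 71_129},
    "BPG 7": {"nt": 2193, "aa": 731, "mr": 77_776},
}
CORE_PROTEIN_MR = 400_000

# Each open reading frame should be exactly one third of the insert length.
for name, clone in CLONES.items():
    assert clone["nt"] == 3 * clone["aa"], name

total_residues = sum(clone["aa"] for clone in CLONES.values())
total_mr = sum(clone["mr"] for clone in CLONES.values())
mass_fraction = total_mr / CORE_PROTEIN_MR

print(total_residues, total_mr, round(mass_fraction, 2))
```

By this estimate the two clones together account for a little under 40% of the core protein.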
BPG 7 encodes a portion of the proteoglycan composed of eight repeating units, which is completely different from BPG 5. One portion of this clone contains a conserved Ser-Gly, the sequence to which glycosaminoglycans are often linked (39)(40)(41)(42), and the surrounding sequence, Asp-Ser-Gly-Glu-Tyr, is conserved in four of the eight repeats. The Ser-Gly attachment sites for chondroitin sulfate in the cartilage proteoglycan were noted to be in close association with acidic and hydrophobic residues (7,16). The sequence Asp-Ser-Gly-Glu-Tyr follows this pattern. This sequence is in the trypsin-sensitive half of the core protein, which also contains the heparan sulfate side chains. It is possible that these conserved Asp-Ser-Gly-Glu-Tyr sequences could be used for heparan sulfate side chain attachment, although this has not been shown directly.
The α1B-glycoprotein (36) has significant sequence similarity to the domain encoded by BPG 7, and the 5' portion of mouse N-CAM (35) shows even greater similarity. Both N-CAM and the plasma α1B-glycoprotein contain five repeats of approximately the same size as encoded by BPG 7. If disulfide bonds are formed between the cysteines in each repeat, as found in α1B-glycoprotein, then this domain would consist of a series of loops (Fig. 4). Particularly conserved in these proteins is the sequence Asp-X-Gly-X-Tyr-X-Cys, found in seven of the eight repeats in BPG 7 (four times as Asp-Ser-Gly-Glu-Tyr), in four of the five repeats in N-CAM, and to some extent in all of the repeats in the α1B-glycoprotein. There is a potential N-glycosylation site near this region in the N-CAM sequence, and the glycosylated sequence Asn-Tyr-Ser-Cys occurs twice in this region in the α1B-glycoprotein. These sites occur at approximately the same position as the potential heparan sulfate attachment site in the BPG 7 domain, indicating that this region is accessible for glycosylation. The repeated domains of N-CAM and the α1B-glycoprotein were found to be similar to the immunoglobulin supergene family, as is BPG 7.
N-CAM is involved in cell-cell interactions including neurite fasciculation and neuromuscular junction formation (see Refs. 43 and 44 for review). It apparently acts through homophilic binding of the protein on apposing cell surfaces. It is also possible that there could be interaction between N-CAM and the proteoglycan core protein. N-CAM has been shown to interact with heparin (45) and with heparan sulfate proteoglycans (46). α-Helical portions in the N-CAM repeats were proposed to be heparin binding sequences by Barthels et al. (35). Interactions between N-CAM and heparan sulfate or the proteoglycan core protein could explain the association of heparan sulfate proteoglycans with neurite outgrowth (47)(48)(49) and neuromuscular junction formation (50,51).
Comparison of these clones with other proteoglycans does not indicate any significant homologies. As the cysteine repeats of laminin and BPG 5 have partial similarity to the epidermal growth factor-like cysteine repeats, there is some similarity with the epidermal growth factor-like portions of the large fibroblast chondroitin sulfate proteoglycan (14). BPG 7 was noted to contain a conserved Ser-Gly sequence, a potential side chain attachment site. The surrounding sequence Asp-Ser-Gly-Glu-Tyr does not conform to the recognition signal Ser-Gly-X-Gly proposed by Bourdon et al. (52). The 100-plus Ser-Gly attachment points of the cartilage proteoglycan do not conform to this sequence, although cartilage Ser-Gly sequences are associated with acidic residues, which is consistent with the enhancement of in vitro xylose attachment by acidic residues observed by Bourdon et al. (52).
Tumors which produce basement membrane components in copious quantities have been used to isolate and characterize many basement membrane components, such as laminin (53)(54)(55)(56), collagen IV (57)(58)(59), and entactin (43,60), and to obtain cDNA clones for these components (33,34,61,62). The use of the mouse tumor cell line M2 in this study likewise facilitated the isolation of clones to the basement membrane proteoglycan. The complete inclusion of a clone obtained from EHS mRNA in a clone from the M2 cells, as well as the occurrence of amino acid sequences from the EHS proteoglycan within clones from the M2 cells, indicates that the same proteoglycan is produced by both cell types. Several lines of evidence indicate that this proteoglycan or derivatives of this proteoglycan are present in authentic basement membranes. Northern analyses with these clones show an identical species of mRNA in RNA extracted from various mouse tissues. This is consistent with Southern analysis of total mouse genomic DNA using BPG 5, which suggests that there is a single gene for this molecule in the mouse. Furthermore, antibodies to the EHS proteoglycan stain all basement membranes (17), and the 400-kDa precursor or the 400-kDa core protein has been detected in a variety of basement membrane-producing cell types, including endothelial (63), muscle (64), and Schwann (65) cells, as well as intact glomeruli (66). Current evidence suggests that it is post-translational modifications of the core protein, as well as differences in the size and number of side chains, which account for the diversity of proteoglycan derived from this gene product (66).