Alternative Splicing of β-Galactosidase mRNA Generates the Classic Lysosomal Enzyme and a β-Galactosidase-related Protein

We have isolated two cDNAs encoding human lysosomal β-galactosidase, the enzyme deficient in GM1-gangliosidosis and Morquio B syndrome, and a β-galactosidase-related protein. In total RNA from normal fibroblasts a major mRNA of about 2.5 kilobases (kb) is recognized by cDNA probes. A minor transcript of about 2.0 kb is visible only in immunoselected polysomal RNA. A heterogeneous pattern of expression of the 2.5-kb β-galactosidase transcript is observed in fibroblasts from different GM1-gangliosidosis patients. The nucleotide sequences of the two cDNAs are extensively colinear. However, the short cDNA lacks two noncontiguous protein-encoding regions (1 and 2) present in the long cDNA. The exclusion of region 1 in the short molecule introduces a frameshift in its 3'-flanking sequence, which is restored by the exclusion of region 2. These findings imply the existence of two mRNA templates, which are read in a different frame only in the nucleotide stretch between regions 1 and 2. Sequence analysis of genomic exons of the β-galactosidase gene shows that the short mRNA is generated by alternative splicing. The long and short cDNAs direct the synthesis in COS-1 cells of β-galactosidase polypeptides of 85 and 68 kDa, respectively. Only the long protein ...

patients different clinical phenotypes have been described that are classified as severe infantile, juvenile or mild infantile, and adult forms, with residual β-galactosidase activity ranging from <1 to 15% of normal levels (reviewed in Refs. 1, 7).
The biosynthesis and processing of β-galactosidase have been studied in normal and mutant human fibroblasts. The enzyme is synthesized as an 85-kDa precursor, which is posttranslationally processed to the mature lysosomal form of 64 kDa (8). In cells of an infantile and an adult GM1-gangliosidosis patient, the precursor protein was found to be synthesized in a low amount, but no mature form could be detected (9). In a Morquio B cell strain, synthesis and processing of β-galactosidase proceed normally (9).
Lysosomal β-galactosidase has been purified to apparent homogeneity from various sources and species (reviewed in Ref. 2). In mammalian tissues (10, 11) as well as in human cultured fibroblasts (12) the majority of the active enzyme is present in a high molecular weight aggregate, and only a small fraction of the enzyme is found as monomeric 64-kDa polypeptide. It has been demonstrated that this aggregate includes other glycoproteins: the heterodimeric 32-20-kDa "protective protein" (8, 13-15) and, under certain experimental conditions, the lysosomal neuraminidase (16). It is likely that these three glycoproteins, β-galactosidase, neuraminidase, and protective protein, form a specific complex within lysosomes, since they copurify by virtue of their association and they influence each other's activity and stability (16, 17). Recently, Oshima et al. (18) have published the sequence of the lysosomal β-galactosidase, deduced from its cDNA.
We report on the cloning, sequence, and expression of two distinct cDNAs encoding the classic lysosomal form of the enzyme and a β-galactosidase-related protein with no enzymatic activity and a different subcellular localization. We provide evidence that the latter derives from alternatively spliced precursor mRNA.
Cell Culture-Human skin fibroblasts from normal individuals, four patients with GM1-gangliosidosis, and one obligate heterozygote were obtained from the European Human Cell Bank, Rotterdam (Dr.
Protein Sequence Analysis-Human placental β-galactosidase was purified together with neuraminidase and protective protein, as described previously (17). The different components were separated by SDS-PAGE under reducing conditions according to Hasilik and Neufeld (19). The 64-kDa β-galactosidase band was digested in situ with tosylphenylalanyl chloromethyl ketone-treated trypsin (Worthington Diagnostic Systems Inc., United Kingdom). Tryptic peptides were fractionated by HPLC (Waters 6000 System) and sequenced by automated Edman degradation on an Applied Biosystems 470A gas-phase peptide sequenator as described previously (15). For N-terminal sequence analysis, approximately 50-100 μg of the purified complex was separated as above, and the protein components were blotted onto Immobilon PVDF transfer membranes (Millipore Corp.). A filter piece containing the 64-kDa protein was excised and used as starting material for automated Edman degradation (20).
cDNA Library Screening-A human testis cDNA library in λgt11 (Clontech, Palo Alto, CA), consisting of 1 × 10⁶ independent clones with insert sizes ranging from 0.7 to 3.3 kb, was plated out at a density of 5 × 10⁴ plaque-forming units per 90-mm plate and screened with anti-β-galactosidase antibodies as described previously (21). Antibody-positive clones were rescreened with oligonucleotide probes labeled at the 5' end with 32P using [γ-32P]ATP and polynucleotide kinase (22). The probes were synthesized on an Applied Biosystems 381A oligonucleotide synthesizer. Hybridization and washing conditions were as described (23).
DNA Sequencing-HβGa(S) and HβGa(L) cDNAs and their restriction fragments were subcloned into plasmid vectors pTZ18 and pTZ19. Nucleotide sequences on both strands were obtained by the dideoxy chain termination method of Sanger et al. (24). ... nucleotide were used. Sequence data were analyzed using the program ...
Isolation and Sequencing of Genomic β-Galactosidase λ Clones-A human EMBL-3 λ library (kindly provided by Dr. G. Grosveld, Erasmus University, Rotterdam), derived from DNA of leukocytes of a chronic myeloid leukemia patient, was screened with the 5' 850-bp EcoRI fragment of cDNA clone HβGa(L). The inserts of three overlapping λ clones were subcloned into the plasmid vector pTZ18. Sequences of genomic exons were determined by the chain termination method on double-stranded DNA, using synthetic oligonucleotide primers derived from the β-galactosidase cDNA sequence.
RNA Isolation and Northern Blot Hybridization-Total RNA was isolated from cultured fibroblasts as described (27). Polysomal mRNA, immunoselected using antibodies raised against purified placental complex, was obtained following the procedure of Myerowitz and Proia (28). RNA samples were electrophoresed on a 1% agarose gel containing 0.66 M formaldehyde as described (29) and blotted onto nylon membranes (Zeta-Probe). The filter was hybridized with the cDNA probe labeled according to the procedure of Feinberg and Vogelstein (30).

Polymerase Chain Reaction-10-15 μg of total RNA and about 50 ng of polysomal RNA were reverse transcribed into single-stranded cDNA using two antisense oligonucleotide primers and avian myeloblastosis virus reverse transcriptase. Subsequently, partial cDNAs were amplified in the presence of a third, sense primer and Taq polymerase as described (31), using a programmable DNA incubator (BioExcellence). Amplified material was separated on 2% agarose gels and blotted onto Zeta-Probe membranes. Filters were hybridized using either type-specific oligonucleotide probes or a 90-bp PstI DNA fragment. These probes were labeled as mentioned above.
Transient Expression of β-Galactosidase cDNAs in COS-1 Cells-Subcloning of the two cDNAs into a derivative of the mammalian expression vector pCD-X and conditions for transfection of pCDHβGa constructs into COS-1 cells were as described previously (15). Labeling with [35S]methionine was carried out in the presence or absence of NH4Cl (19). Radiolabeled cDNA-encoded β-galactosidase proteins were immunoprecipitated from cell extracts and medium concentrates according to the method of Proia et al. (32). Immunoprecipitated proteins were resolved by SDS-PAGE under reducing conditions. Radioactive bands were visualized by fluorography of gels impregnated with Amplify (Amersham Corp.). Apparent molecular weights were calculated with conventional marker proteins. β-Galactosidase activity in transfected COS-1 cells was measured with artificial 4-methylumbelliferyl substrate using standard assay conditions (7).
Uptake Studies in Human Cells-The preparation of conditioned media used in uptake studies and the experimental conditions were as reported (15). Human recipient cells were from an infantile GM1-gangliosidosis patient (Fig. 5, patient II). They were seeded on 6-well plates 4 days before addition of conditioned media. The uptake was carried out for a further 3 days. Cells were harvested by trypsinization and homogenized by vortexing in double-distilled water. Enzyme activities were measured in cell homogenates using 4-methylumbelliferyl substrates (7).
Indirect Immunofluorescence-For light microscopy, COS-1 cells were transfected with pCDHβGa constructs as above, but omitting the labeling step. Twelve hours before harvesting, transfected cells were reseeded at a low density on coverslips. Fixation and immunolabeling were performed according to Ref. 33 using anti-β-galactosidase antibodies and goat anti-(rabbit IgG) conjugated with fluorescein in the second incubation step.

Partial Amino Acid Sequence and Isolation of Antibodies-The β-galactosidase-neuraminidase-protective protein complex was purified from human placenta, and its components were separated by SDS-PAGE under reducing conditions. The 64-kDa β-galactosidase, electroeluted from the gel, was used to raise monospecific polyclonal antibodies in rabbit. This antibody preparation, tested in biosynthetic labeling experiments and Western blots, precipitates both mature and precursor forms of β-galactosidase (data not shown). In addition, a gel slice containing the 64-kDa protein was digested in situ with trypsin, and the resulting peptides were fractionated by reverse-phase HPLC. Five of the oligopeptides were subjected to automated Edman degradation, but only three of them gave an unambiguous amino acid sequence (Fig. 1A). We also sequenced the N terminus of intact mature 64-kDa β-galactosidase. A stretch of 18 amino acid residues was obtained in this case (Fig. 1A, N-ter).
Isolation and Characterization of cDNA Clones-One tryptic peptide sequence (T3) and the N-terminal sequence were used to synthesize two oligonucleotide probes complementary to the mRNA (Fig. 1B). Probe 1, a unique 45-mer, was constructed on the basis of codon usage frequencies in mammalian proteins, whereas probe 2, a 17-mer, was degenerate. A human testis λgt11 cDNA expression library was first screened with anti-β-galactosidase antibodies. Several recombinant clones were isolated and rescreened with both oligonucleotide probes. One clone, λHβGa39, with a total insert size of 1.7 kb, carried an internal EcoRI site; digestion with EcoRI released two fragments of 500 and 1200 bp, hybridizing with probes 2 and 1, respectively. These results supported the identity of the cDNA and defined its orientation. Partial nucleotide sequencing of this cDNA revealed the presence of a putative ATG translation start codon, but the absence of a polyadenylation signal. In vitro translation of total RNA from cultured fibroblasts established a molecular mass of about 73 kDa for the nonglycosylated β-galactosidase preproform.¹ Therefore, λHβGa39 could not contain the entire coding region of the β-galactosidase precursor. Rescreening of the library with this cDNA probe yielded a clone, λHβGa(L), that consisted of a 5' EcoRI fragment of 850 bp and a 3' fragment of 1550 bp. Both cDNAs were subcloned into pTZ18 and pTZ19, subjected to restriction endonuclease analysis, and sequenced using the dideoxy chain termination method (24). In Fig. 2 a compendium of the partial restriction maps of the two cDNAs is depicted together with the nucleotide sequencing strategy used. The complete sequences of HβGa39 and HβGa(L) are combined in Fig. 3.

Fig. 1A. Sample: T1. Amino acid sequence: ~E A V A X X L Y D I L
A common ATG translation initiation codon is found at the 5' end of both cDNAs (Fig. 3, position 51). This ATG represents the beginning of an open reading frame for HβGa(L) of 2031 nucleotides, which is terminated by three consecutive stop codons and is flanked at the 3' end by a 318-nucleotide untranslated region. A putative polyadenylation signal (AATAAA) is present at position 2379. The sequences of the two cDNAs from their internal EcoRI site toward the 3' end are identical, except that HβGa39 lacks the last 412 nucleotides, including 94 bp of coding sequence and the 3'-untranslated region with the polyadenylation signal. Although there is no direct proof that the 3' ends of the mRNAs specifying the two cDNAs are the same, S1 nuclease protection analysis of this region did not reveal the presence of differentially spliced transcripts (data not shown). Therefore, it is likely that HβGa39 is a partial cDNA truncated at the 3' end. In contrast, a comparison of the 5' ends of the two clones revealed significant differences. The EcoRI fragment encompassing the 5' end of HβGa39 is 393 nucleotides shorter than the corresponding fragment of HβGa(L). The missing sequences comprise two stretches (boxed in Fig. 3), one of 212 nucleotides, between positions 295 and 508, and one of 181 nucleotides, between positions 602 and 784 (referred to as regions 1 and 2, respectively). Sequences immediately flanking these regions are completely identical in the two clones. If translation starts at the common ATG initiation codon, the exclusion of region 1 causes a -1 frameshift in the open reading frame of HβGa39, which is reversed by a +1 frameshift due to the exclusion of region 2. In order to obtain a full-length cDNA bearing the short 5' end, we replaced the 3' EcoRI fragment of HβGa39 with that of HβGa(L). The resulting cDNA construct, HβGa(S), has an open reading frame of 1638 nucleotides, which starts at the same ATG (position 51) and is terminated by the same stop codon as the long cDNA. These surprising findings imply the existence of two β-galactosidase mRNA templates encoding proteins that are translated in different frames in the 95-nucleotide stretch between the two regions.
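The frame bookkeeping behind this argument can be checked with a few lines of arithmetic. The sketch below is only an illustration using the lengths quoted in the text (212, 181, 2031, and 1638 nucleotides); the function and constant names are hypothetical and are not part of the original analysis.

```python
# Lengths in nucleotides, as quoted in the text for the two cDNAs.
REGION_1 = 212    # absent from the short cDNA
REGION_2 = 181    # absent from the short cDNA
ORF_LONG = 2031   # open reading frame of the long cDNA
ORF_SHORT = 1638  # open reading frame of the short cDNA


def frame_offset(deleted_nt: int) -> int:
    """Return the reading-frame displacement (0, 1, or 2) left by a deletion."""
    return deleted_nt % 3


def codon_count(orf_nt: int) -> int:
    """Return the number of codons in an ORF whose length is a multiple of 3."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_nt // 3


# Removing region 1 alone leaves the downstream sequence out of frame.
print("offset after region 1:", frame_offset(REGION_1))
# Removing both regions deletes a whole number of codons, so the frame is restored.
print("offset after regions 1 and 2:", frame_offset(REGION_1 + REGION_2))
# The two ORFs differ by exactly the combined length of the two regions.
print("ORF difference:", ORF_LONG - ORF_SHORT)
print("codons (long, short):", codon_count(ORF_LONG), codon_count(ORF_SHORT))
```

Because each region alone is not a multiple of three but their sum is, only the stretch between them is read in a different frame.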
To verify whether these two mRNAs arise by alternative splicing, we have isolated genomic λ clones spanning the area of interest. The entire sequence of the exons encoding nucleotides 296-784 in HβGa(L) cDNA (Fig. 3) was determined. In Fig. 4 the exons involved are schematically shown together with their exon/intron boundaries. Region 1 in the long cDNA is encoded by two exons of 151 and 61 bp, respectively, and region 2 by one exon of 181 bp. A separate exon specifies the 95-bp sequence between these two regions. The exact mapping of the different exons within the gene has not been determined. These results confirm that the two β-galactosidase transcripts derive from alternative splicing of the precursor mRNA.
Predicted Primary Sequences of β-Galactosidase and β-Galactosidase-related Proteins-As shown in Fig. 3, the two cDNA clones encode polypeptides of 677 and 546 amino acids, respectively, which have the first 82 N-terminal residues in common. These are followed, in the predicted sequence of the long cDNA-encoded β-galactosidase, by two noncontiguous sequences (boxed in Fig. 3) of 71 amino acids (residues 83-153) and 61 amino acids (residues 185-245), which do not occur in the short protein, referred to as β-galactosidase-related, because of splicing out of regions 1 and 2. Consequently, a unique stretch of 32 amino acids is found in the β-galactosidase-related protein (residues 83-114), which is different from the sequence between regions 1 and 2 (residues 154-185) in the long molecule.
All tryptic peptides as well as the N-terminal sequence of 64-kDa placental β-galactosidase are found in the amino acid sequence deduced from the cDNAs (Fig. 3, thick line). The only disagreement is at residue 1 of T1, where the experimentally determined residue is aspartic acid (Fig. 1A), whereas the amino acid predicted from the nucleotide sequence of the two cDNAs is threonine. Both cDNA-encoded proteins start with a putative signal peptide, which is characterized by an N-terminal region including a positively charged residue (Arg-7), a highly hydrophobic core, and a polar C-terminal domain. The most probable site for signal peptidase cleavage is Gly-23 (34). Seven potential N-linked glycosylation sites are present in the predicted primary sequence (Fig. 3, thin line). The glycosylation site at position 26 is located immediately after the signal peptide, and it is followed by 18 amino acids (residues 29-46) that are colinear with the chemically determined N terminus of the purified placental enzyme. The predicted Mr values of unglycosylated β-galactosidase and β-galactosidase-related protein, including the signal peptide, are 76,091 and 60,552, respectively. Their amino acid sequences were compared with other sequences present in the NBRF (release 19.0, December 31, 1988) and EMBL (release 18, February 1989) databases. No significant homology was found.

RNA Hybridization Studies-The HβGa(L) cDNA insert was labeled by random priming and used to probe total and polysomal RNA isolated from cultured fibroblasts of normal individuals, four GM1-gangliosidosis patients, and one heterozygote. As shown in Fig. 5, an mRNA of about 2.5 kb is the major transcript detected in normal fibroblasts. The same hybridization pattern was obtained with total human testis RNA (data not shown). When immunoselected polysomal RNA is applied, a faint minor band of about 2.0 kb becomes visible. It is clear that this 2.0-kb species is present in a much lower amount than the long mRNA. This difference in amount is also reflected by the number of respective cDNA clones found in the library (1 versus 12).
The 2.5-kb mRNA is also detected in total RNA from fibroblasts of the adult GM1-gangliosidosis patient (Fig. 5). However, the three infantile forms of the disease exhibit a very different expression pattern. In the first patient (I), a faint broad band is visible. In some gels this band can be resolved into two, one of which is slightly larger than 2.5 kb (data not shown). The mother of this patient displays a hybridizing band of normal size but somewhat less intense. There is no detectable β-galactosidase transcript in the second infantile patient (II), whereas in the third patient (III) the 2.5-kb mRNA is present in a much lower quantity than in controls. The Northern blot was rehybridized with a probe recognizing the glyceraldehyde-3-phosphate dehydrogenase mRNA (35). Signals of equivalent intensity corresponding to this 1.2-kb message were detected in all samples (data not shown). Taken together, these results demonstrate that different mutations must be involved in apparently similar GM1-gangliosidosis clinical phenotypes.
Detection of Two mRNA Transcripts by PCR Amplification-Since it is difficult to visualize the small mRNA molecule on Northern blots, we decided to use the polymerase chain reaction (PCR) to increase the detection level and to screen specific regions of β-galactosidase mRNA(s) for the presence or absence of regions 1 and 2. The strategy applied in these experiments is depicted in Fig. 6B. Three oligonucleotide primers were designed according to distinct complementary DNA sequences present in the two β-galactosidase clones (sequences are given in the legend to Fig. 6). Their positions, flanking or within regions 1 and 2, were chosen to direct the synthesis and amplification of cDNA fragments representative of the two different mRNA species. Total RNA from cultured fibroblasts and from human testis as well as polysomal mRNA from fibroblasts were reverse transcribed into single-stranded cDNA using either antisense primer 1 or 3. The polymerase chain reactions were subsequently performed by adding the sense primer 2. Escherichia coli tRNA was used in separate reactions as a negative control. Amplified material was separated on agarose gels and Southern-blotted. In order to unequivocally distinguish between amplified fragments originating from the short or the long mRNA, type-specific probes were used (Fig. 6B, cross-hatched bars). Two 20-mers were synthesized on the basis of sequences of the HβGa(S) cDNA, which are colinear with the 10 nucleotides flanking each end of regions 1 and 2 of HβGa(L) (sequences are given in the legend to Fig. 6). These 20-mers hybridize, under stringent conditions, only to the cDNA fragment derived from the short mRNA. On the other hand, the cDNA fragment specifying the long mRNA is detected by a 90-bp PstI probe present in region 2. As shown in Fig. 6A, fragments of 169 and 498 bp, representing the short and the long mRNA, respectively, are amplified in all samples and are identical in the two tissues tested (lanes 1-3 and 6-8). The identity of much fainter smaller bands present in lanes 6 and 8 is unknown. No hybridizing bands are visible in the tRNA lanes (lanes 4 and 5). It is noteworthy that the aforementioned cDNA fragments can also be amplified from polysomal RNA. This implies that the short transcript undergoes translation.
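The logic of the type-specific 20-mers (10 nucleotides from each side of an excluded region, joined across the junction) can be sketched as follows. The sequences here are made-up toy strings, not the β-galactosidase cDNA, and the helper names are hypothetical; simple substring matching stands in for hybridization under stringent conditions, which is a deliberate simplification.

```python
def junction_probe(long_seq: str, start: int, end: int, flank: int = 10) -> str:
    """Join the `flank` nucleotides on each side of long_seq[start:end]."""
    if start < flank or end + flank > len(long_seq):
        raise ValueError("not enough flanking sequence for the probe")
    return long_seq[start - flank:start] + long_seq[end:end + flank]


def skip_region(long_seq: str, start: int, end: int) -> str:
    """Return the sequence with long_seq[start:end] removed (the short form)."""
    return long_seq[:start] + long_seq[end:]


# Toy sequences (hypothetical, for illustration only).
UPSTREAM = "ACGTTGCAAGCTTGGCATCC"
REGION = "TTTTTTTTTTTTTTT"       # stands in for an alternatively spliced region
DOWNSTREAM = "GAGCTCGATATCGGTACCAA"

long_form = UPSTREAM + REGION + DOWNSTREAM
region_start = len(UPSTREAM)
region_end = region_start + len(REGION)

short_form = skip_region(long_form, region_start, region_end)
probe = junction_probe(long_form, region_start, region_end)

# The junction sequence exists only where the region has been spliced out.
print("probe:", probe)
print("matches short form:", probe in short_form)
print("matches long form:", probe in long_form)
```

A probe built this way is contiguous only in the short form, which is why it can distinguish the two amplified fragments, whereas a probe taken from inside the region (like the 90-bp PstI fragment) detects only the long form.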

Transient Expression of β-Galactosidase cDNAs in COS-1 Cells-HβGa(S) and HβGa(L) cDNAs were cloned in sense and antisense orientations into a derivative of the mammalian expression vector pCD-X and transfected separately into COS-1 cells. After 48 h, normal and transfected cells were incubated for an additional 16 h with [35S]methionine. In some instances the labeling step was done in the presence of NH4Cl to induce maximal secretion of lysosomal protein precursors (19). Radiolabeled proteins from cells and media were immunoprecipitated with anti-β-galactosidase antibodies. The results are shown in Fig. 7. ... sense construct (lanes 4 and 5). It appears, therefore, that the antibody preparation used in these experiments hardly recognizes COS-1 endogenous β-galactosidase, since untransfected cells also do not show any cross-reactive bands (lane 6). As seen in lanes 1 and 2, the cDNA-derived 85-kDa β-galactosidase precursor is poorly processed into the mature 64-kDa form in transfected cells. This is due to the transfection procedure, as observed before (15). A &fold increase in β-galactosidase activity above the endogenous COS-1 values is measured only in cells transfected with the pCDHβGa(L)-sense construct (Table I). Using the same assay conditions, the β-galactosidase-related molecule is apparently not active.

FIG. 7. Labeled proteins from cells and media were immunoprecipitated with anti-β-galactosidase antibodies, analyzed on a 12% SDS-polyacrylamide gel, and visualized by fluorography. Molecular sizes were calculated by comparison with protein markers. Exposure time for lanes 1-6 was 1 day and for lanes 7-9 was 1 week.
We also tested whether the cDNA-encoded proteins were able to correct β-galactosidase activity in GM1-gangliosidosis cells. For this purpose, medium from COS-1 cells transfected with sense or antisense pCDHβGa constructs as well as medium from mock-transfected cells were collected and concentrated. Aliquots of the different conditioned media were added to the culture medium of fibroblasts from an infantile GM1-gangliosidosis patient (patient II in Fig. 5). After 2 days of uptake, activities were measured in cell homogenates using 4-methylumbelliferyl substrate. As shown in Table II, COS-1 cell-derived 85-kDa precursor taken up by GM1-gangliosidosis cells corrects β-galactosidase activity. In a similar uptake experiment carried out using radiolabeled secretions from transfected COS-1 cells, we could demonstrate that the 85-kDa precursor and the 68-kDa β-galactosidase-related protein were taken up by the mutant cells, but only the 85-kDa precursor was further processed to the mature 64-kDa form (data not shown).
In order to determine the intracellular distribution of the two proteins, indirect immunofluorescent staining was performed on transfected cells using anti-β-galactosidase antibodies and fluorescein-labeled second antibodies (Fig. 8). A typical lysosomal distribution as well as uniformly diffuse perinuclear labeling of β-galactosidase is observed in COS-1 cells transfected with the pCDHβGa(L)-sense construct (Fig. 8A). However, a strong fluorescent labeling restricted to the perinuclear region is present in cells transfected with the short construct (Fig. 8B). Adjacent untransfected cells react poorly with the antibodies against the human enzyme. Taken together, these results demonstrate that the long and short cDNAs direct the synthesis of two proteins, one of which behaves as the classic lysosomal β-galactosidase, whereas the other is not enzymatically active at the pH value and substrate concentration used. This β-galactosidase-related protein also has a different subcellular localization.

Table entries: Mock-transfected, 1.7; Not transfected, 1.5. One milliunit of enzyme activity is defined as the activity that releases 1 nmol of 4-methylumbelliferone per min.
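The unit definition given with the activity tables (one milliunit releases 1 nmol of 4-methylumbelliferone per min) translates directly into a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the numbers in the example are invented, not measurements from this study.

```python
def milliunits(nmol_released: float, minutes: float) -> float:
    """Enzyme activity in milliunits: nmol of 4-methylumbelliferone released per min."""
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    return nmol_released / minutes


def fold_over_background(sample_mu: float, background_mu: float) -> float:
    """Ratio of a sample activity to the endogenous (background) activity."""
    if background_mu <= 0:
        raise ValueError("background activity must be positive")
    return sample_mu / background_mu


# Hypothetical example: 30 nmol of product released during a 60-min incubation.
activity = milliunits(nmol_released=30.0, minutes=60.0)
print("activity (mU):", activity)

# Hypothetical comparison of a transfected sample with an untransfected control.
print("fold over background:", fold_over_background(10.0, 2.0))
```

In practice such activities are usually normalized further (for example per milligram of protein); that normalization is omitted here because the source does not give its details.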

DISCUSSION
We have isolated and characterized two distinct cDNA clones encoding human lysosomal β-galactosidase and a β-galactosidase-related protein. In total RNA from normal human fibroblasts, a major mRNA of 2.5 kb is recognized by cDNA probes. A minor transcript of about 2.0 kb is detectable only in immunoselected polysomal RNA. The 2.5-kb β-galactosidase mRNA is also present in fibroblasts from the adult GM1-gangliosidosis patient, but it is either absent or reduced in amount in cells from three patients with the infantile form of the disease. The pattern of expression of this β-galactosidase mRNA in patients I and II is consistent with data from immunoprecipitation studies that established the absence of cross-reactive material for β-galactosidase in fibroblasts from these patients.¹ Apparently, other infantile GM1-gangliosidosis patients, not yet analyzed at the molecular level, do synthesize β-galactosidase precursor (36). The adult and the third infantile patient studied here were previously reported to synthesize a β-galactosidase precursor that did not get phosphorylated (37). This might still hold true for these two patients, but the assumption made by Hoogeveen et al. (37) that all GM1-gangliosidosis variants are phosphorylation mutants is not substantiated by the results presented here. Patients I and II, for instance, may represent splicing and/or promoter mutants. Obviously, different or even the same clinical phenotypes are caused by distinct genetic lesions, and further studies are needed to define the clinical and biochemical heterogeneity observed in GM1-gangliosidosis patients.
The nucleotide sequences of the two cDNAs comprise open reading frames that begin at a common ATG translation initiation codon and terminate at the same stop codon. However, HβGa(L) is 393 bp longer than HβGa(S). Its nucleotide sequence is colinear with the human placental β-galactosidase cDNA recently isolated by Oshima et al. (18). The only sequence differences we find are at nucleotide positions 79 (T instead of C), 650 (G instead of C), and 651-653 (CGC instead of GCG), resulting in the following amino acid changes: Leu-10 instead of Pro-10 and Arg-201 instead of Ala-201. These discrepancies may represent true allelic variations and/or mistakes introduced by cDNA cloning procedures. The sequence of the short cDNA is virtually identical to the former, but it lacks two noncontiguous protein-encoding sequences, regions 1 and 2, present in the long clone. Furthermore, the exclusion of region 1 in this cDNA introduces a frameshift in its 3'-flanking sequence, which is subsequently restored by the exclusion of region 2. These unusual findings imply the existence of two distinct mRNA templates which, most remarkably, are read in different frames only in the 95-nucleotide stretch between regions 1 and 2. To our knowledge this is the first example of such a configuration in a mammalian gene.
By sequencing genomic β-galactosidase clones, we could demonstrate that nucleotides 296-784 of the 2.5-kb mRNA, spanning regions 1 and 2 as well as their intermediate sequence, are encoded by four separate exons. As shown by the sequence of the exon/intron borders, all four exons obey the GT/AG rule (38). These results strongly indicate that the short mRNA is generated by a differential splicing process that involves three exons. An increasing number of genes are known to create protein diversity through the use of differential splicing (reviewed in Ref. 39). Among lysosomal proteins this phenomenon has been observed for human β-glucuronidase mRNA (40). The genomic data also rule out the possibility that the short cDNA is the product of a cloning artifact. The amount of the short mRNA, however, must be less than %o of the long one, if we consider the signal obtained on Northern blots. Therefore, the existence of the two β-galactosidase transcripts was further proven by PCR amplification of partial cDNA fragments specifying the two mRNAs. The short transcript does not seem to be testis-specific, since it is also detected in fibroblast total and polysomal mRNA, indicating that this transcript is actively translated in fibroblasts. It is not excluded, however, that the two mRNAs may be expressed in different amounts in other tissues.
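The GT/AG rule mentioned above states that introns begin with GT at the 5' splice site and end with AG at the 3' splice site. A minimal check of that rule might look like the sketch below; the intron strings are invented examples, not the β-galactosidase intron sequences, and the function name is hypothetical.

```python
def obeys_gt_ag(intron: str) -> bool:
    """Return True if an intron sequence starts with GT and ends with AG.

    Accepts DNA or RNA in either case; U is treated as T.
    """
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Invented intron sequences for illustration only.
toy_introns = {
    "canonical": "GTAAGTCTTTCAG",
    "lowercase": "gtaagtttcag",
    "noncanonical_donor": "GCAAGTTTTCAG",
}

for name, seq in toy_introns.items():
    print(name, obeys_gt_ag(seq))
```

Applied to the boundaries of each exon, a check of this kind asks only about the two terminal dinucleotides of the flanking introns; it says nothing about branch points or splice-site strength.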
The open reading frames of the long and short β-galactosidase cDNAs code for 677 and 546 amino acids, respectively, with the first 23 residues in common representing a typical signal peptide (34). Both proteins carry seven potential N-linked glycosylation sites at identical positions. One of them is located immediately after the signal sequence and precedes the N-terminal sequence of mature 64-kDa placental β-galactosidase. From its location we can infer that the substantial proteolytic processing of the 85-kDa β-galactosidase precursor observed in human fibroblasts (8) as well as in mouse kidney cells and macrophages (41, 42) must occur nearly exclusively at the C terminus. The two cDNAs direct the synthesis in COS-1 cells of immunoprecipitable polypeptides, which are also recovered extracellularly. The molecular mass of the long protein, 85 kDa, is in agreement with the apparent size of the glycosylated β-galactosidase precursor immunoprecipitated from human fibroblasts (8). The 68-kDa protein derived from the short cDNA is a form that was not detected previously. Whether or not this protein has a defined biological function is not known. Although both polypeptides are recognized by the antibodies, the β-galactosidase-related protein is not catalytically active under the assay conditions used. The same holds true for the short β-glucuronidase protein (40). Furthermore, even though both cDNA-encoded proteins, 85 and 68 kDa, are endocytosed with equal efficiency by GM1-gangliosidosis fibroblasts, only the 85-kDa precursor is further processed intracellularly and corrects β-galactosidase activity.
The subcellular localization of COS-1-derived β-galactosidase and β-galactosidase-related proteins is different. The long β-galactosidase has a clear lysosomal distribution, whereas the short molecule is found only in the perinuclear region. The latter is likely to reach the Golgi apparatus, since it is secreted into the extracellular space even without the addition of NH4Cl. The differential subcellular distribution of the two proteins might explain their distinct catalytic behavior. Further studies are needed to define the function and substrate specificity of the β-galactosidase-related protein. It will be of interest to analyze the domains that are either missing or different in the two polypeptides.
This work, together with our studies on the other components of the complex, the protective protein and neuraminidase, will enable us to gain more insight into the fine mechanisms of mutual cooperation between these lysosomal glycoproteins.