Human α-N-Acetylgalactosaminidase: Molecular Cloning, Nucleotide Sequence, and Expression of a Full-length cDNA. HOMOLOGY WITH HUMAN α-GALACTOSIDASE A SUGGESTS EVOLUTION FROM A COMMON ANCESTRAL GENE

Human α-N-acetylgalactosaminidase (α-GalNAc), the lysosomal glycohydrolase that cleaves α-N-acetylgalactosaminyl moieties from glycoconjugates, is encoded by an autosomal gene. The deficient activity of α-GalNAc is the enzymatic defect in Schindler disease, an infantile neuroaxonal dystrophy.

corresponding to α-Gal A exons 1 through 6, whereas the sequence corresponding to α-Gal A exon 7 (pAGB-3 codons 320-411) had only 15.8% homology, with numerous gaps. These findings implicate the genomic region at and surrounding codon 319 as a potential site both for the abnormal processing of α-GalNAc transcripts and for a recombinational event in the evolution and divergence of α-Gal A and α-GalNAc. The availability of the full-length cDNA for human α-GalNAc will permit studies of the genomic organization and evolution of this lysosomal gene, as well as characterization of the molecular lesions causing Schindler disease.

In the early 1970s, several investigators demonstrated the existence of two α-galactosidase isozymes, designated A and B, which hydrolyzed the α-galactosidic linkages in 4-MU- and/or p-NP-α-D-galactopyranosides (1-7). In tissues, about 80-90% of total α-galactosidase (α-Gal) activity was due to a thermolabile, myoinositol-inhibitable α-Gal A isozyme, while a relatively thermostable α-Gal B accounted for the remainder. The two "isozymes" were separable by electrophoresis, isoelectric focusing, and ion-exchange chromatography. After neuraminidase treatment, the electrophoretic migrations and pI values of α-Gal A and B were very similar (1), initially suggesting that the two enzymes were differentially glycosylated products of the same gene. The finding that the purified glycoprotein enzymes had similar physical properties, including subunit molecular mass (~46 kDa), homodimeric structure, and amino acid composition, also indicated their structural relatedness (8-14). However, the subsequent demonstrations that polyclonal antibodies against α-Gal A or B did not cross-react with the other enzyme (8, 11), that only α-Gal A activity was deficient in hemizygotes with Fabry disease (1-8), and that the genes for α-Gal A and B mapped to different chromosomes (7, 15) clearly showed that these enzymes were genetically distinct.
Thus, it was not surprising when α-Gal B was shown in 1977 to be an α-N-acetylgalactosaminidase (α-GalNAc), a homodimeric glycoprotein (7).
Purified α-GalNAc has reported native and subunit molecular masses of 90-117 kDa and 46-48 kDa, respectively (8-14). Kinetic studies demonstrated that the enzyme was inhibited by α-N-acetylgalactosamine (Ki ~2.1 mM) and hydrolyzed synthetic substrates with either terminal α-N-acetylgalactosaminide (Km ~1-2 mM) or α-D-galactoside moieties (Km ~7-10 mM) (8-14). Biosynthetic studies in cultured fibroblasts indicated that the human enzyme was synthesized as a 65-kDa glycosylated precursor that was processed to a mature 48-kDa lysosomal form; both the precursor and mature forms had high-mannose-type oligosaccharide chains, but only the precursor's mannose residues were phosphorylated (16). Deficient α-GalNAc activity was demonstrated in two brothers with Schindler disease (17, 18), a newly recognized form of infantile neuroaxonal dystrophy (18). The affected brothers excreted increased amounts of O-linked glycopeptides and oligosaccharides containing α-N-acetylgalactosaminyl moieties, which were detectable in urinary screening profiles (17, 19). Biochemical and immunologic studies revealed that neither α-GalNAc activity nor enzyme protein was present in fibroblast lysates from the affected sibs (18). Thus, efforts were undertaken to isolate and express a full-length α-GalNAc cDNA in order to determine the nature of the molecular lesions in patients with Schindler disease and to characterize the genomic organization and expression of the human gene encoding this lysosomal hydrolase.
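The kinetic constants above can be compared with a minimal Michaelis-Menten sketch. It assumes equal Vmax for both substrate classes (not reported here), takes the midpoints of the reported Km ranges (1.5 and 8.5 mM), and models α-N-acetylgalactosamine as a competitive inhibitor; the text states only that the enzyme is inhibited, so the competitive mechanism is an assumption. The helper name `mm_rate` is hypothetical.

```python
"""Michaelis-Menten sketch of alpha-GalNAc substrate preference (illustrative only)."""

KM_GALNAC_MM = 1.5   # midpoint of reported Km ~1-2 mM (alpha-N-acetylgalactosaminides)
KM_GAL_MM = 8.5      # midpoint of reported Km ~7-10 mM (alpha-D-galactosides)
KI_GALNAC_MM = 2.1   # reported Ki ~2.1 mM for alpha-N-acetylgalactosamine


def mm_rate(s_mm: float, km_mm: float, vmax: float = 1.0,
            inhibitor_mm: float = 0.0, ki_mm: float = float("inf")) -> float:
    """Initial rate v = Vmax*[S] / (Km_app + [S]).

    ASSUMPTION: inhibition is modeled as competitive, so only the apparent Km
    changes: Km_app = Km * (1 + [I]/Ki). Vmax defaults to 1 (relative units).
    """
    km_app = km_mm * (1.0 + inhibitor_mm / ki_mm)
    return vmax * s_mm / (km_app + s_mm)


if __name__ == "__main__":
    s = 2.0  # mM substrate, chosen only for illustration
    print(f"GalNAc substrate:      {mm_rate(s, KM_GALNAC_MM):.3f}")
    print(f"Gal substrate:         {mm_rate(s, KM_GAL_MM):.3f}")
    print(f"GalNAc + 2.1 mM GalNAc: "
          f"{mm_rate(s, KM_GALNAC_MM, inhibitor_mm=2.1, ki_mm=KI_GALNAC_MM):.3f}")
```

Under these assumptions, at 2 mM substrate the relative rates are about 0.57 for the α-N-acetylgalactosaminide and 0.19 for the α-galactoside, and an inhibitor concentration equal to Ki doubles the apparent Km.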
While expression studies of a hybrid α-GalNAc sequence were in progress, Tsuji et al. (20) reported the isolation of a human α-GalNAc cDNA. Unlike the full-length pAGB-3 α-GalNAc cDNA sequence reported here, their clone, pcD-HS1204, contained a 70-bp insertion after pAGB-3 nt 957 that altered the reading frame for pAGB-3 residues 330-411 and resulted in a truncated polypeptide of only 358 residues. Although their predicted amino acid sequence did not include our tryptic peptide containing residues 335-344, the 70-bp insertion could have resulted from alternative splicing. Therefore, we sought to determine whether such an alternatively spliced transcript occurs in humans. In this paper, we report the isolation, nucleotide sequence, and transient expression of a full-length cDNA encoding α-GalNAc. Genomic sequencing did not reveal the putative 70-bp insertion, confirming that the expressible pAGB-3 transcript is authentic.
In addition, remarkable homology between the predicted α-GalNAc and α-Gal A amino acid sequences was identified, suggesting the evolutionary relatedness of the autosomal and X-linked genes encoding these lysosomal hydrolases.

Affinity Purification, Microsequencing, and Antibody Production. Human lung α-GalNAc was purified to homogeneity, polyclonal rabbit anti-human α-GalNAc antibodies were produced and purified, and cell supernatants were immunoblotted as described previously (18, 21, 22). … calcium-phosphate precipitation (31). Cells were harvested at 24-h intervals after transfection and assayed for α-GalNAc activity as previously described (18). One unit of enzymatic activity is the amount of enzyme required to hydrolyze 1 nmol of 4-MU-α-GalNAc per hour. Protein concentrations were determined by the fluorescamine method (21).
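The unit definition above (1 unit = 1 nmol of 4-MU-α-GalNAc hydrolyzed per hour) converts to specific activity as sketched below. The function name and the assay amounts are hypothetical; they were chosen only so that the result lands near the ~370,000 units/mg reported later for the purified enzyme.

```python
def specific_activity_units_per_mg(nmol_4mu_released: float,
                                   incubation_h: float,
                                   protein_mg: float) -> float:
    """Specific activity in units/mg, where 1 unit = 1 nmol substrate hydrolyzed per hour."""
    if incubation_h <= 0 or protein_mg <= 0:
        raise ValueError("incubation time and protein amount must be positive")
    units = nmol_4mu_released / incubation_h  # nmol/h is, by definition, units
    return units / protein_mg


# Hypothetical assay: 7.4 nmol of 4-MU released in 2 h by 10 ng (1e-5 mg) of enzyme.
sa = specific_activity_units_per_mg(7.4, 2.0, 1e-5)
print(f"{sa:,.0f} units/mg")
```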
Northern Hybridization and Cap Site Analyses. Total RNA was isolated from human lymphoblasts, fibroblasts, and placentae, and Northern hybridization was performed using the nick-translated pAGB-3 insert as probe (26). Alternatively, the pAGB-3 insert was subcloned into pGEM-4Z (Promega, Madison, WI), and a radiolabeled α-GalNAc riboprobe, rAGB-3, was generated using the Promega riboprobe system and used for Northern hybridization. To identify the α-GalNAc cap site, two unique, overlapping 30-mer oligonucleotide primers corresponding to regions 60 and 75 bp from the 5' end of the pAGB-3 cDNA were synthesized and end-labeled (26). Each primer (100 ng) was used to extend 10 μg of total placental RNA with the BRL cDNA Synthesis Kit (Bethesda Research Laboratories). First-strand synthesis was terminated by phenol extraction and ethanol precipitation. The pellet was washed three times with 70% ethanol, resuspended in 6 μl of H2O, and mixed with 6 μl of loading dye (0.3% xylene cyanol, 0.3% bromphenol blue, 0.37% EDTA, pH 7.0, in formamide). The RNA/DNA heteroduplexes were denatured at 65 °C for 3 min, and an aliquot was electrophoresed on a standard 8 M urea, 8% polyacrylamide sequencing gel.

Construction of p91-α-GalA6/α-GalNAc7 and Characterization of the Hybrid Protein. A plasmid containing α-Gal A exons 1 through 6 from pcDAG-126 (32) was fused to the 3' region of the pAGB-3 α-GalNAc insert corresponding in position to α-Gal A exon 7. The hybrid cDNA, designated pα-GalA6/α-GalNAc7, was constructed with the sense and antisense primers indicated above using a PCR-based method (33) and sequenced. The pα-GalA6/α-GalNAc7 insert was subcloned into the expression vector p91023(B), and the construct was transiently expressed in COS-1 cells as described above. The α-Gal A expression construct p91-α-AGA (34), which contained the entire α-Gal A cDNA, was also transiently expressed. α-Gal A and α-GalNAc enzymatic activities and enzyme proteins were detected with 4-MU substrates and by immunoblotting or immunoprecipitation with the respective polyclonal antibodies as described above. For immunoprecipitation studies, transfected COS-1 cells were radiolabeled with 100 μCi of [35S]methionine (Amersham Corp.) per 100-mm dish at 48 h post-transfection. The cells were harvested 72 h after transfection and immunoprecipitated as described (26).

… Extension. For PCR amplification of the putative alternatively spliced region, the 30-mer sense and antisense primers (described above) were used to amplify 1) reverse-transcribed mRNA from various human sources, 2) cDNA inserts from clones pAGB-4 to 34, and 3) the gAGB-1 genomic sequence. DNAs from the pAGB-4 to 34 cDNA clones and the gAGB-1 genomic clone were isolated as described (24, 26). cDNA was synthesized from 10 μg of lymphoblast, fibroblast, and placental total RNA or 2.5 μg of brain poly(A)+ mRNA (Clontech, Palo Alto, CA) using the BRL cDNA Synthesis Kit. Bacteriophage DNA (~0.1 μg), reverse-transcribed mRNA (~0.1 μg), or genomic cosmid DNA (~1 μg) was PCR-amplified using 20 pM of each primer and the GeneAmp DNA Amplification Reagent Kit (Perkin-Elmer Cetus, Norwalk, CT).
Each PCR cycle consisted of 1 min of denaturation at 94 °C, 2 min of annealing at 37 °C, and a 7-min extension at 60 °C. The PCR products were phenol-extracted, ethanol-precipitated, and resuspended in 20 μl of H2O. An aliquot (2 μl) of each PCR reaction was analyzed by agarose gel electrophoresis using HindIII-digested λ and HaeIII-digested φX174 DNAs as size standards. To identify potential stops during reverse transcription of the region surrounding the pcD-HS1204 insertion, a unique 32-mer, 5'-AGTAGTAAGCTTTCATATATCACAGACCCGGT-3', was used to extend 10 μg of total placental RNA or 1 μg of rAGB-3 generated in vitro with the Promega riboprobe system as described above.
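The 32-mer primer given above can be checked with a few lines of code: its length, its GC content, and its reverse complement (the sense-strand sequence, written as DNA, to which it anneals, assuming standard Watson-Crick pairing). The sketch also locates the AAGCTT motif the primer contains, which is the HindIII recognition sequence; the text does not say whether that site was introduced deliberately. Function names are illustrative.

```python
PRIMER = "AGTAGTAAGCTTTCATATATCACAGACCCGGT"  # 5'->3', as given in the Methods
HINDIII_SITE = "AAGCTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print("length:", len(PRIMER))
    print(f"GC content: {gc_fraction(PRIMER):.1%}")
    print("sense-strand (DNA) target:", reverse_complement(PRIMER))
    # str.find returns a 0-based index; add 1 for 1-based primer coordinates.
    print("AAGCTT starts at primer position", PRIMER.find(HINDIII_SITE) + 1)
```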

RESULTS AND DISCUSSION
Purification and Characterization of Human α-GalNAc. Human α-GalNAc was purified to homogeneity (specific activity ~370,000 units/mg protein), as assessed by the presence of only the 48- and 117-kDa species on NaDodSO4/PAGE (Fig. 1, inset). The 117-kDa species was not reduced by boiling or by dialysis against 8 M urea in the presence of β-mercaptoethanol. The 27 microsequenced N-terminal residues of the electroeluted 117-kDa species were identical to those of the 48-kDa species.

Lane 1, mock transfection; lane 2, p91-α-Gal A transfection; lanes 3 and 4, two independent constructs of pα-GalA6/α-GalNAc7 transfections; lane 5, p91-AGB-3 transfection. Lanes 1-4 were immunoprecipitated with rabbit anti-human α-Gal A, and lane 5 was immunoprecipitated with rabbit anti-human α-GalNAc.

Further evidence that the 117-kDa species was a homodimer of the 48-kDa glycoprotein subunit was the finding that tryptic digests (and chymotryptic digests, not shown) of both species had essentially identical HPLC profiles (Fig. 1). Microsequencing of the N terminus and seven tryptic peptides from the 48-kDa species identified a total of 129 non-overlapping α-GalNAc residues. For library screening, synthetic oligonucleotide mixtures (17- to 26-mers) were constructed to contain all possible codons for selected amino acid sequences from the N terminus and three internal tryptic peptides (Figs. 1 and 2).

Isolation, Characterization, and Expression of a Full-length cDNA. Screening of 2 × 10⁶ recombinants from the pcD human fibroblast cDNA library with a 26-mer oligonucleotide mixture of 576 species corresponding to internal peptide T-106A detected two putative positive clones. pAGB-1, which hybridized with all four oligonucleotide mixtures, had a 1.8-kb insert with an open reading frame of 1242 bp, a 514-bp 3'-untranslated region, and a poly(A) tract, but no apparent 5'-untranslated sequence. Its authenticity was established by colinearity of the predicted amino acid sequence of the pAGB-1 insert with the 129 microsequenced residues of the purified protein. To isolate a full-length cDNA, the 0.9-kb 5'-BamHI fragment of the pAGB-1 insert was radiolabeled and used to screen a human placental cDNA library. Of 32 putative positive clones (pAGB-3 to 34), pAGB-3 contained the longest insert and was sequenced in both orientations. As shown in Fig. 2, the 2158-bp pAGB-3 insert had a 344-bp 5'-untranslated region, a 1236-bp open reading frame encoding 411 amino acids, a 514-bp 3'-untranslated region, and a 64-bp poly(A) tract. An upstream, in-frame ATG occurred at nt -192, but in-frame termination codons at nt -141, -135, and -120 indicated that the -192 ATG was nonfunctional.
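The size of a degenerate oligonucleotide mixture, such as the 576-species 26-mer used to screen the fibroblast library, is the product of the number of synonymous codons for each encoded residue. A minimal sketch using the standard genetic code follows. The peptide is hypothetical (the T-106A sequence is not given in this section) and was picked only because it also yields 576; real probe designs often trim the wobble position of the last codon, which this sketch ignores.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every codon choice."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


# Hypothetical 8-residue peptide (a 24-mer probe), NOT the actual T-106A sequence.
example = "DFNIIYKC"
print(example, probe_degeneracy(example))
```

Residues such as Met and Trp (one codon each) keep a mixture small, while Leu, Arg, and Ser (six codons each) inflate it quickly, which is why probe regions are usually chosen to avoid them.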
A single consensus polyadenylation signal (AATAAA) and a consensus recognition sequence (CACTG) for the U4 small nuclear ribonucleoprotein (35) were located 16 and 65 bp upstream of the poly(A) tract, respectively. In retrospect, the partial cDNA pAGB-1 contained the entire 1236-bp coding region as well as 6 bp of 5'-untranslated sequence.
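The insert dimensions reported for pAGB-3 are internally consistent, and the in-frame argument for the upstream ATG can be checked with simple modular arithmetic. The sketch assumes the usual cDNA numbering, with nt +1 at the A of the initiator ATG and no position 0, so an upstream codon starting at position p (p < 0) is in frame with the ATG exactly when p is divisible by 3.

```python
# Segment lengths of the pAGB-3 insert as reported (bp).
UTR5, ORF, UTR3, POLY_A = 344, 1236, 514, 64

insert_length = UTR5 + ORF + UTR3 + POLY_A
codons = ORF // 3          # sense codons plus the termination codon
residues = codons - 1      # the stop codon encodes no amino acid


def in_frame_upstream(pos: int) -> bool:
    """True if an upstream codon starting at `pos` (< 0) is in frame with ATG at +1.

    ASSUMPTION: numbering skips 0, so nt -3..-1 is the codon just before +1.
    """
    if pos >= 0:
        raise ValueError("expected an upstream (negative) position")
    return pos % 3 == 0


upstream_atg = -192
stops = [-141, -135, -120]

print("insert length:", insert_length)
print("ORF residues:", residues)
print("upstream ATG in frame:", in_frame_upstream(upstream_atg))
print("stops in frame:", [in_frame_upstream(p) for p in stops])
# An in-frame stop between -192 and +1 ends any translation begun at -192.
print("first stop after upstream ATG:", min(p for p in stops if p > upstream_atg))
```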
Analysis of the deduced amino acid sequence of pAGB-3 indicated a signal peptide of 17 residues, since Leu-18 was the N-terminal residue of the microsequenced mature enzyme. When the weight-matrix method of von Heijne (36) was used to predict the signal peptidase cleavage site, the preferred site, between Ala-13 and Gln-14, had a score of 4.34, whereas cleavage after Met-17 had a score of 2.38. The predicted molecular mass of the 394-residue mature, unglycosylated enzyme subunit (Mr = 44,700) was consistent with that (48 kDa) estimated by NaDodSO4/PAGE of the purified glycosylated enzyme. The ~3.3-kDa difference suggests that the mature glycoprotein subunit had at least two N-linked oligosaccharide chains, although there were six putative N-glycosylation sites, at Asn residues 124, 177, 201, 359, 385, and 391 (Fig. 2).
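The putative N-glycosylation sites listed above are instances of the Asn-X-Ser/Thr consensus, where X is any residue except Pro. A minimal scanner is sketched below on a hypothetical peptide, since the pAGB-3 sequence itself appears in Fig. 2 and is not reproduced here; it also encodes the reasoning, used later for the hybrid protein, that an Asn-Pro-Thr sequence is not a consensus site. A sequon only marks a potential site; occupancy must be established experimentally, as the mass comparison above attempts.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The zero-width lookahead
# allows overlapping matches such as "NNST", where both Asn residues qualify.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: NGT and NAS qualify; NPT (Asn-Pro-Thr) does not.
demo = "MNGTANPTQNASK"
print(find_sequons(demo))

# Mass bookkeeping from the text. ASSUMPTION: the whole SDS-PAGE excess is
# attributed to N-linked glycans; gel-based mass estimates are approximate.
mature_residues = 411 - 17          # 17-residue signal peptide removed
carbohydrate_kda = 48.0 - 44.7      # SDS-PAGE estimate minus predicted protein mass
print(mature_residues, f"{carbohydrate_kda:.1f} kDa")
```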
For transient expression, the full-length pAGB-3 cDNA insert was subcloned into the eukaryotic expression vector p91023(B), and the construct, p91-AGB-3, was transfected into COS-1 monkey kidney cells. Compared with the mean endogenous α-GalNAc activity in mock-transfected COS-1 cells (35 units/mg; range, 23-50 units/mg; n = 6), the transfected cells had a mean activity of 600 units/mg (range, 104-2400 units/mg; n = 6) 72 h after transfection, about 17 times the endogenous activity. The expressed human enzyme protein was also detected by immunoblot analysis using rabbit anti-human α-GalNAc antibodies, whereas the endogenous monkey enzyme was either visible as a faint band at ~40 kDa or not detectable (Fig. 3). The expressed human enzyme subunit had a molecular mass of ~48 kDa, indicating that it was glycosylated.
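The ~17-fold figure follows directly from the two mean activities. The sketch below reproduces it and, under the labeled assumption that endogenous monkey and expressed human activities simply add, estimates the activity attributable to the transfected cDNA. Names are illustrative.

```python
def fold_over_background(transfected: float, mock: float) -> float:
    """Ratio of mean activity in transfected cells to mean endogenous activity."""
    return transfected / mock


MOCK_MEAN = 35.0          # units/mg, mock-transfected COS-1 cells (n = 6)
TRANSFECTED_MEAN = 600.0  # units/mg, p91-AGB-3-transfected cells at 72 h (n = 6)

fold = fold_over_background(TRANSFECTED_MEAN, MOCK_MEAN)
# ASSUMPTION: endogenous (monkey) and expressed (human) activities are additive.
net_expressed = TRANSFECTED_MEAN - MOCK_MEAN

print(f"fold increase: {fold:.1f}")
print(f"net expressed activity: {net_expressed:.0f} units/mg")
```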
Northern Hybridization and Cap Site Analyses. Northern hybridization analyses revealed two transcripts, of about 2.2 and 3.6 kb, in total, cytoplasmic, and poly(A)+ RNA, present in similar amounts (not shown). Primer extension of total placental RNA with two overlapping oligonucleotide primers placed the cap site of the 2.2-kb transcript at nt -347, 3 nt beyond the 5' end of the pAGB-3 cDNA insert. The identification of the 3.6-kb transcript prompted efforts to isolate the corresponding cDNA. Screening a human retinal cDNA library with the radiolabeled 2.2-kb pAGB-3 cDNA yielded pAGB-35. This cDNA had a 3598-bp insert containing a coding region identical to that in pAGB-3 and an additional 125 and 1379 bp of 5'- and 3'-untranslated sequence, respectively (Fig. 2). The occurrence of a second polyadenylation signal (AATAAA) at nt 3100-3105 (Fig. 2), downstream of the signal at nt 1729-1734 in the pAGB-3 cDNA, indicated that alternative polyadenylation was responsible for the generation of the larger transcript. Because pAGB-35 extends 125 bp farther upstream than pAGB-3, and thus beyond the -347 cap site, the α-GalNAc gene must have at least one other transcriptional start site upstream of the cap site of the 2.2-kb transcript.
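The pAGB-3 and pAGB-35 coordinates reported here can be cross-checked arithmetically. The sketch assumes nt +1 is the A of the initiator ATG (no position 0) and that the 3598-bp pAGB-35 length excludes a poly(A) tract. Under those assumptions the segment lengths sum exactly to 3598 bp, the pAGB-3 5' end sits at nt -344 (so a cap at -347 lies 3 nt beyond it), and the first AATAAA ends 16 nt before the last 3'-untranslated nucleotide, matching the 16-bp spacing given earlier.

```python
UTR5_PAGB3, ORF, UTR3_PAGB3 = 344, 1236, 514
EXTRA_5, EXTRA_3 = 125, 1379           # additional pAGB-35 sequence (bp)

# Coordinates with nt +1 = A of the initiator ATG and no position 0 (assumption).
pagb3_5prime_end = -UTR5_PAGB3         # first nucleotide of the pAGB-3 insert
cap_site = -347
pagb3_last_utr_nt = ORF + UTR3_PAGB3   # last nucleotide before the poly(A) tract
first_signal_end = 1734                # AATAAA at nt 1729-1734

# ASSUMPTION: the reported 3598 bp excludes any poly(A) tract.
pagb35_length = (UTR5_PAGB3 + EXTRA_5) + ORF + (UTR3_PAGB3 + EXTRA_3)
pagb35_5prime_end = -(UTR5_PAGB3 + EXTRA_5)

print("pAGB-35 length without poly(A):", pagb35_length)
print("cap site beyond pAGB-3 5' end (nt):", pagb3_5prime_end - cap_site)
print("first AATAAA to poly(A) tract (nt):", pagb3_last_utr_nt - first_signal_end)
print("pAGB-35 extends upstream of the cap:", pagb35_5prime_end < cap_site)
```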
Sequence Homology between α-GalNAc and α-Gal A. Computer-assisted searches of nucleic acid and protein databases revealed no significant sequence similarity between α-GalNAc and any other DNA or protein sequence except human α-Gal A (32). Comparison of the nucleic acid and deduced amino acid sequences of the full-length α-GalNAc and α-Gal A cDNAs revealed 55.8 and 46.9% overall homology, respectively. Since the intron/exon junctions and the entire genomic sequence encoding human α-Gal A have been determined (32, 36), it was possible to compare the α-GalNAc amino acid sequence with that deduced from each of the seven α-Gal A exons (Fig. 4). Notably, there was remarkable identity (56.4%) between the α-GalNAc sequences and those encoded by α-Gal A exons 1 through 6. For example, all 8 cysteine residues in α-GalNAc were present in identical positions in α-Gal A. Of the 14 proline and 23 glycine residues in α-Gal A, 10 and 20, respectively, were conserved in identical positions in α-GalNAc. Furthermore, all four α-Gal A N-glycosylation sites were conserved in α-GalNAc. Putative functional domains were suggested by shorter stretches of amino acid homology within α-Gal A exons 1 through 6 that are shared by α-GalNAc, α-Gal A, yeast α-galactosidase (Mel I) (38), and/or Escherichia coli α-galactosidase (Mel A) (39). In contrast, there was little, if any, similarity in the predicted α-GalNAc carboxyl-terminal amino acid sequence after residue 319, which corresponded to α-Gal A exon 7 (15.8% homology with numerous gaps). In addition, there were no significant similarities with the cDNAs encoding other human lysosomal polypeptides, with the exception of a short α-GalNAc sequence (residues 365-371) in which six of seven amino acids were identical to residues 194-200 of the β-hexosaminidase α-chain, a lysosomal polypeptide with N-acetylgalactosaminidase specificity (40).
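Percent-identity figures such as the 56.4% and 15.8% quoted above depend on how gapped alignment columns are counted. A minimal sketch for two pre-aligned sequences (gaps written as '-') is shown below with hypothetical fragments; it reports identity both over all alignment columns and over gap-free columns. Which convention the original comparison used is not stated, so the choice here is an assumption.

```python
def percent_identity(a: str, b: str, count_gaps: bool = True) -> float:
    """Percent identical residues between two aligned sequences of equal length.

    count_gaps=True divides by all alignment columns; False ignores columns
    in which either sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b)
               if count_gaps or (x != "-" and y != "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments, NOT the actual alpha-GalNAc/alpha-Gal A alignment.
s1 = "ACDE-FGH"
s2 = "ACQEHFG-"
print(f"{percent_identity(s1, s2):.1f}% over all columns")
print(f"{percent_identity(s1, s2, count_gaps=False):.1f}% over gap-free columns")
```

Counting gapped columns lowers the figure, which is one reason a region "with numerous gaps" scores so poorly.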
These findings suggested that a cDNA construct containing α-Gal A exons 1-6 joined to α-GalNAc exon 7 might express a hybrid protein with both α-Gal A and α-GalNAc activities. A hybrid cDNA containing α-Gal A exons 1 through 6 (nt -60 to 1029) and α-GalNAc exon 7 (nt 958-1814) was therefore constructed and expressed in COS-1 cells; however, neither immunoreactive protein (by immunoblot analysis) nor enzymatic activity for α-Gal A or α-GalNAc was detected. More sensitive immunoprecipitation experiments were therefore undertaken to determine whether the fusion construct expressed the hybrid protein. Transient expression was performed in the presence of [35S]methionine, and the radiolabeled proteins were immunoprecipitated with rabbit anti-human α-Gal A and α-GalNAc antibodies. As shown in Fig. 5, the α-Gal A and α-Gal A/α-GalNAc hybrid glycopeptides were detected with molecular masses of about 48 and 50 kDa, respectively. The anti-α-GalNAc antibodies did not detect the hybrid glycopeptide (not shown), although endogenous α-GalNAc was observed. The slightly higher molecular mass of the hybrid protein may reflect an additional oligosaccharide moiety, since α-GalNAc exon 7 contains two N-glycosylation consensus sequences, whereas the deleted α-Gal A exon 7 sequence has one consensus sequence (Asn-Pro-Thr), which presumably is not functional because of the proline. The fusion protein was synthesized but did not hydrolyze either the α-Gal A or the α-GalNAc fluorogenic substrate. This could be due to alteration of the active sites of both enzymes, which may lie in this region; to an additional oligosaccharide chain altering the conformation of this region so that the subunits could not dimerize; or to steric hindrance preventing formation of an enzyme-substrate complex.
Future mutagenesis studies of this region in α-Gal A and α-GalNAc may provide further information concerning these possibilities.
The extensive homology between α-GalNAc and α-Gal A suggested that they evolved by duplication and divergence of an ancestral sequence for α-Gal A exons 1 through 6. Although there is little, if any, homology among the other lysosomal amino acid sequences (i.e. no "lysosomal domains"), there are notable examples of lysosomal enzyme subunits, pseudogenes, or gene families that presumably evolved by duplication and divergence (e.g. 40-42). Future comparison of the α-GalNAc and α-Gal A intron/exon boundaries should provide further information on the evolution of these lysosomal genes, which encode structurally related but functionally specific glycohydrolases.

Primer Extension, PCR, and Sequence Analyses of cDNA and Genomic Sequences. During the course of these studies, Tsuji et al. (20) reported a similar human α-GalNAc cDNA sequence that differed from pAGB-3 by a 70-bp insertion after nt 957 (Fig. 6A) and by several substitutions (nt 493, 494, 524, 614, and 667). The 70-bp insertion consisted of three inverted repeats (nt 919-926, 919-936, and 919-944) and a direct repeat (nt 940-957) of pAGB-3 coding sequence nt 919-957. Analysis of the pAGB-3 cDNA sequence from nt 760 to 1053 using an RNA-folding program (29) predicted a stem-and-loop structure from nt 918 to 937 (Fig. 6A) that could stall or stop reverse transcription of the α-GalNAc mRNA during cDNA synthesis. To determine whether this secondary structure could cause cDNA synthesis errors in library construction, a 32-mer oligonucleotide primer was used to extend total placental RNA and α-GalNAc transcripts generated in vitro from the riboprobe construct rAGB-3. Stops of varying intensity were observed between nt 903 and 1009, including two weak stops at the 3' base (nt 940) and the 5' end (nt 921) of the stem-and-loop structure (Fig. 6A). However, there were no strong stops in this region.
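The stem-and-loop prediction above came from an RNA-folding program (29), which minimizes free energy. As a much cruder illustration of the same idea, the sketch below scans a DNA string for perfectly complementary inverted repeats: a 5' arm, a short loop, and a 3' arm that is the reverse complement of the 5' arm. It ignores G·U pairs, mismatches, and thermodynamics, so it is not a substitute for a folding algorithm; the test sequence and the default stem and loop lengths are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem_len: int = 6, min_loop: int = 3,
                  max_loop: int = 10) -> list[tuple[int, int, int]]:
    """Find perfect inverted repeats that could fold into a hairpin.

    Returns (start, end, loop_length) tuples; start and end are 1-based,
    inclusive coordinates spanning both stem arms and the loop.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len - min_loop + 1):
        partner = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop          # 0-based start of the 3' arm
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == partner:
                hits.append((i + 1, j + stem_len, loop))
    return hits


# Hypothetical sequence: GACCTG, a TTTT loop, then CAGGTC (its reverse complement).
print(find_hairpins("AAGACCTGTTTTCAGGTCAA"))
```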
Although the actual mechanism is unknown, these findings were consistent with the 70-bp insertion resulting from a complex abnormality involving an RNA-DNA duplex in cDNA library construction (43). Another possibility would be an insertion due to a complex strand-switching event involving DNA polymerase I (45).
Alternatively, this 70-bp insertion may have resulted from alternative splicing, although the insertion predicts a truncated α-GalNAc polypeptide of 358 residues. To investigate the possible occurrence of α-GalNAc transcripts with a 70-bp insertion after pAGB-3 nt 957, PCR was used to amplify this region in 1) reverse-transcribed mRNA from various sources, 2) the cDNA inserts from clones pAGB-4 to 34, and 3) the gAGB-1 genomic clone. If the cDNA inserts or reverse-transcribed RNAs contained the 70-bp insert, a 290-bp PCR product would be observed, whereas the absence of the insert would result in a 220-bp PCR product.
Only the 220-bp product was observed in PCR-amplified reverse-transcribed total RNA from human lymphoblasts, fibroblasts, and placenta, or in poly(A)+ mRNA from brain (not shown); thus, these analyses did not detect longer or shorter transcripts. All of the pAGB-4 through 34 cDNA inserts yielded only the 220-bp PCR product, with the exception of pAGB-13, which had an in-frame 45-bp deletion after pAGB-3 nt 957 (i.e. deleted nt 958-993). A short direct repeat (ACAAG) was present at both breakpoint junctions.
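The expected product sizes and the reading-frame consequences of the two sequence changes follow from simple arithmetic. In the sketch below, the 220-bp baseline and the 70-bp and 45-bp lengths come from the text; the size printed for the pAGB-13 deletion is computed, not reported. An insertion or deletion preserves the downstream reading frame only if its length is a multiple of 3, which is why the 70-bp insertion shifts the frame while the 45-bp deletion is in frame.

```python
BASELINE_PRODUCT_BP = 220   # product without any insertion (from the text)


def expected_product_bp(indel_bp: int) -> int:
    """PCR product size given a net insertion (+) or deletion (-) inside the amplicon."""
    return BASELINE_PRODUCT_BP + indel_bp


def keeps_reading_frame(indel_bp: int) -> bool:
    """An indel leaves the downstream frame intact only if its length is a multiple of 3."""
    return indel_bp % 3 == 0


for label, indel in [("pAGB-3 (reference)", 0),
                     ("pcD-HS1204 70-bp insertion", 70),
                     ("pAGB-13 45-bp deletion", -45)]:
    print(f"{label}: {expected_product_bp(indel)} bp, "
          f"in frame: {keeps_reading_frame(indel)}")
```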
Notably, the deletion occurred at the same 5' site as the 70-bp insertion in pcD-HS1204 (14) (Fig. 6A). Subsequent sequencing of the region including pAGB-3 codons 254-351 in the genomic clone gAGB-1 revealed a 2048-bp sequence containing a 1754-bp intron between pAGB-3 nt 957 and 958. The intronic sequence had no homology with α-Gal A intron 6, contained two Alu repetitive sequences in reverse orientation, and did not contain the 70-bp insertion in either orientation (Fig. 6B). It was remarkable that both the pAGB-13 deletion and the pcD-HS1204 insertion occurred at the 5' donor splice site of this intron (i.e. after exonic nt 957). The location of the consensus lariat branch-point sequences far upstream (94 and 199 bp) of the 3' splice site may impair splicing (44). This concept is supported by the pAGB-13 deletion, in which the more closely positioned cryptic lariat branch point and 3' splice site were used. Thus, this intron or the surrounding region may have a unique sequence and/or secondary structure that impairs the fidelity of hnRNA processing. Since the intron/exon junction after coding nt 957 is also the site of divergence between the α-Gal A and α-GalNAc sequences, this region may also have been mechanistically important in the evolution of human α-GalNAc.
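The coordinates in this paragraph can be related with the standard codon-to-nucleotide mapping, in which codon k occupies coding nt 3k-2 to 3k. The sketch below shows that codons 254-351 span nt 760-1053 (the same window analyzed with the RNA-folding program), that nt 957 is the last base of codon 319 (so the intron falls exactly between codons 319 and 320), and that this 294-bp exonic span plus the 1754-bp intron totals 2048 bp. The function name is illustrative.

```python
def codon_to_nt(codon: int) -> tuple[int, int]:
    """Return the (first, last) coding-sequence nucleotides of a 1-based codon."""
    if codon < 1:
        raise ValueError("codons are numbered from 1")
    last = 3 * codon
    return last - 2, last


INTRON_BP = 1754  # intron between pAGB-3 nt 957 and 958 (from the text)

first_nt, _ = codon_to_nt(254)
_, last_nt = codon_to_nt(351)
exonic_bp = last_nt - first_nt + 1

print("codons 254-351 -> nt", first_nt, "to", last_nt)
print("codon 319 ->", codon_to_nt(319))   # ends at nt 957, the 5' splice site
print("genomic span:", exonic_bp + INTRON_BP, "bp")
```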
In conclusion, the availability of an authentic full-length cDNA encoding human α-GalNAc should permit the characterization of the structure/function and evolutionary relationships of α-GalNAc and α-Gal A, as well as the identification of the molecular lesions that cause Schindler disease.