Primary Structure of Galβ1,3(4)GlcNAc α2,3-Sialyltransferase Determined by Mass Spectrometry Sequence Analysis and Molecular Cloning
EVIDENCE FOR A PROTEIN MOTIF IN THE SIALYLTRANSFERASE GENE FAMILY*


The Galβ1,3(4)GlcNAc α2,3-sialyltransferase forms the NeuAcα2,3Galβ1,3(4)GlcNAc sequences found in terminal carbohydrate groups of glycoproteins and glycolipids. High energy collision-induced dissociation analysis of tryptic peptides from only 300 pmol of the purified Galβ1,3(4)GlcNAc α2,3-sialyltransferase provided 25% of the total amino acid sequence and led to the successful cloning of this enzyme. The peptide sequence information was used to design short degenerate primers for use in the polymerase chain reaction. A long specific cDNA fragment was amplified which was used to isolate a clone from a rat liver cDNA library. The cloned cDNA encodes a 374-amino acid protein containing an amino-terminal signal-anchor sequence characteristic of all cloned glycosyltransferases and produced sialyltransferase activity when transiently expressed in COS-1 cells. When compared with two other cloned sialyltransferases, the primary structure of Galβ1,3(4)GlcNAc α2,3-sialyltransferase revealed a homologous region in all three enzymes consisting of a stretch of 55 amino acids located in their catalytic domains. This feature together with lack of homology in the remaining 85% of the sequence of the three sialyltransferases defines a pattern of sequence homology not found in cloned cDNAs of other glycosyltransferase families.

* This work was supported in part by United States Public Health Service Grant GM27904. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number M97754.

The sialyltransferase family consists of 10-12 enzymes which transfer sialic acid from CMP-sialic acid to terminal positions on the oligosaccharide chains of glycoproteins and glycolipids (1-11). Sialic acids are key determinants of many carbohydrate structures involved in biological recognition events, such as binding of influenza virus to host cells during infection (12), clearance of asialoglycoproteins from circulation (13), cell-cell adhesion during the development of the nervous system mediated by N-CAM (14, 15), and adhesion of leukocytes to endothelial cells mediated by selectins (16-19). Although the sialyltransferases use a common donor substrate, they exhibit specificity for the sequence of their oligosaccharide acceptor substrate and the anomeric linkage formed between the sialic acid and the sugar to which it is attached. To date cDNAs encoding a Galβ1,4GlcNAc α2,6-sialyltransferase (20, 21) and a Galβ1,3GalNAc α2,3-sialyltransferase (22) have been cloned. These two enzymes were found to share a region of homology consisting of 45 amino acids with 65% identity (22). In this report, we describe the cloning, sequencing, and expression of a rat Galβ1,3(4)GlcNAc α2,3-sialyltransferase which forms the NeuAcα2,3Galβ1,3GlcNAc and NeuAcα2,3Galβ1,4GlcNAc sequences typically found to terminate complex type N-linked oligosaccharide chains. These sequences are also found in O-linked sugar chains (23-25) and in glycolipids (26). The Galβ1,3(4)GlcNAc α2,3-sialyltransferase was first purified 800,000-fold from rat liver nearly 10 years ago by Weinstein et al. (27); however, the purification yielded only small amounts of protein (10 μg/kg tissue). Although several attempts were made to obtain amino acid sequence information or to raise an antibody against the enzyme using conventional methods, they failed because of the small amounts of dilute protein that could be purified. As an alternative, mass spectrometry was considered since it plays an increasingly important role in the rapid structural elucidation of biologically important macromolecules (for reviews see Refs. 28 and 29). The development of new ionization methods and instrumentation has revolutionized the accessible mass range and operational sensitivity.
High performance tandem mass spectrometry is now established as a powerful sensitive technique for protein sequencing (30-32) and is the method of choice for determining post-translational (33-35) and chemical modifications (36, 37). Thus, using only 300 pmol (12 μg) of the purified Galβ1,3(4)GlcNAc α2,3-sialyltransferase, peptide sequence information accounting for 25% of the total protein sequence was obtained. This was sufficient information to clone the cDNA of the enzyme using PCR¹ and established cloning methods. The obtained sequences also helped to establish the correctness of the final cDNA-deduced protein sequence.

¹ The abbreviations used are: PCR, polymerase chain reaction; HPLC, high performance liquid chromatography; LSIMS, liquid secondary ion mass spectrometry; CID, collision-induced dissociation; kb, kilobase pair(s).
The reduction was carried out at 60 °C, under argon, for 1.5 h. Sodium iodoacetate (1.32 mg) was added in 2.5 μl of 0.2 M Tris·HCl buffer to the enzyme mixture and the alkylation was carried out at room temperature, under argon, in the dark, for 1.5 h.
Dialysis-The resulting reduced and carboxymethylated Galβ1,3(4)GlcNAc α2,3-sialyltransferase was dialyzed against 4 liters of 50 mM N-ethylmorpholine acetate buffer, pH 8.1, using a Bethesda Research Laboratories (BRL) Microdialysis System with BRL Prepared Dialysis Membrane having a molecular mass cutoff of 12-14 kDa. 10% SDS was added to the dialysis wells, bringing the SDS concentration to approximately 0.1%. The contents of the wells were pooled and dried using a SpeedVac Concentrator (Savant). To remove the SDS and the remainder of Triton CF-54, acetone precipitation was carried out (39).
Solvent B was 0.08% trifluoroacetic acid in 70% acetonitrile, 30% water. The system operated at a flow rate of 50 μl/min. Ten minutes after the injection, the percentage of solvent B was increased from 0 to 50% over 90 min, then up to 100% in 30 min. Peptides were detected using an ABI 783A absorbance detector, set at 215 nm. Some of the fractions were esterified using an HCl/n-hexanol mixture (40).
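The gradient program described above can be written as a piecewise-linear function of time. The following is only a minimal sketch: it assumes the ramps are linear and that solvent B is held at 100% after the second ramp, neither of which is stated explicitly in the text, and the function name `percent_b` is hypothetical.

```python
def percent_b(t_min: float) -> float:
    """Percentage of solvent B at t_min minutes after injection.

    Assumptions (not stated in the text): both ramps are linear and
    solvent B stays at 100% once the second ramp has finished.
    """
    hold_end = 10.0                 # 0% B for the first 10 min after injection
    ramp1_end = hold_end + 90.0     # 0 -> 50% B over 90 min
    ramp2_end = ramp1_end + 30.0    # 50 -> 100% B over 30 min

    if t_min <= hold_end:
        return 0.0
    if t_min <= ramp1_end:
        return 50.0 * (t_min - hold_end) / 90.0
    if t_min <= ramp2_end:
        return 50.0 + 50.0 * (t_min - ramp1_end) / 30.0
    return 100.0


if __name__ == "__main__":
    for t in (0, 10, 55, 100, 115, 130):
        print(f"{t:4d} min: {percent_b(t):5.1f}% B")
```

Under these assumptions the first ramp rises by about 0.56% B per minute, which is the shallow part of the program where most tryptic peptides would be expected to elute.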
Mass Spectrometry-Liquid secondary ion mass spectrometry (LSIMS) experiments were carried out using a Kratos MS 50S double focusing mass spectrometer, employing a cesium ion LSIMS source (41) and a 23-kG magnet. Approximately one-fifth of each collected fraction was used to measure the molecular weights of the peptides eluting in that fraction. One microliter of a glycerol/thioglycerol, 1:1, mixture acidified with 1% trifluoroacetic acid was used as the liquid matrix for the LSIMS experiments. The most abundant molecular ions observed were chosen for subsequent high energy collision-induced dissociation (CID) analysis. These experiments were performed on a Kratos Concept IIHH four sector mass spectrometer, equipped with an electro-optical multichannel array detector, which can record sequential 4% segments of the mass range simultaneously under computer control. The collision energy was set at 4 keV, using helium as the collision gas. The helium gas pressure was adjusted to attenuate the abundance of the selected ¹²C isobar of the molecular precursor ion to 30% of its initial value (42). The remainder of each sample was loaded, 1 μl of the above mentioned liquid matrix was added, and the CID spectrum was then recorded. The high energy CID spectra were interpreted as reported elsewhere (43).
PCR Amplification of a Sialyltransferase cDNA-Based on the amino acid sequences of 11 of the 14 peptides derived from the Galβ1,3(4)GlcNAc α2,3-sialyltransferase, 22 degenerate oligonucleotides of both sense and antisense strands were synthesized (Genosys). Initial PCR experiments were designed based on the observation that peptide T-27 and peptide T-7B were homologous to a region located near the center of two previously cloned sialyltransferases (see "Results," Fig. 4), the Galβ1,4GlcNAc α2,6-sialyltransferase (20) and the Galβ1,3GalNAc α2,3-sialyltransferase (22). Two groups of PCR experiments were performed using either a sense primer to peptide T-27 or an antisense primer to peptide T-7B paired with oligonucleotide primers to the other peptides, with first strand cDNA synthesized from rat liver total RNA as the template. Beginning with a template melting step (5 min at 94 °C), the amplification was carried out using the GeneAmp™ DNA amplification reagent kit with AmpliTaq™ DNA polymerase (Perkin-Elmer Cetus) by cycling 35 times (1 min at 94 °C, 1 min at 37 °C, and 2 min at 72 °C) and ended with a final extension step (15 min at 72 °C). Several cDNA fragments were generated from these PCR reactions. Assuming that peptides T-27 and T-7B represented a continuous stretch of amino acids, additional sets of PCR experiments were carried out utilizing a nested primer strategy (44) in order to identify specific cDNA fragments. Using this approach a specific cDNA fragment, T-27 sense-T-43 antisense (T-27s-T-43as), was identified. The sequences of primers T-27 sense and T-43 antisense are 5'-GGAAGCTTATYGAYGAYTAYGAYATYGT-3' and 5'-CCGGATCCTTRAANCCRGCNAGWACRAA-3' (R = A + G, ...). The T-27s-T-43as cDNA fragment was subcloned into Bluescript plasmid (Stratagene) and sequenced using universal primers (Stratagene) and the Sequenase Version 2.0 kit (United States Biochemical Corp.).
Cloning of the Sialyltransferase-A cDNA library was constructed from rat liver poly(A)+ RNA using a cDNA synthesis kit from Pharmacia LKB Biotechnology Inc. (45). Oligo(dT)-primed cDNA was synthesized and ligated to EcoRI-NotI linkers. cDNAs were then ligated into EcoRI-cleaved λgt10 DNA (Promega). After in vitro packaging with a DNA packaging extract (Stratagene), phage were plated out on host strain Escherichia coli C600 hfl⁻ (Promega). Approximately 1 million plaques were screened with the T-27s-T-43as cDNA probe (45). Two positive clones (ST3N-1 and ST3N-2) were plaque-purified and subcloned into Bluescript plasmid vector (Stratagene) for sequencing.
Construction of a Soluble Form of the Sialyltransferase in Expression Vector-In order to produce a soluble form of the sialyltransferase for enzymatic characterization, a fusion protein containing the catalytic domain of the enzyme and the insulin cleavable signal sequence was constructed in the mammalian expression vector pSVL (Pharmacia). Specifically, the catalytic domain of the sialyltransferase was amplified by PCR using a 5' primer at position +182 (Fig. 5), downstream of the transmembrane domain, with a BamHI site and a 3' primer located in the 3'-untranslated region upstream of the polyadenylation site with an EcoRI site. PCR reactions were carried out as described above with the annealing temperature at 55 °C. The PCR product was subcloned into the BamHI-EcoRI sites of pGIR-199 (a gift of Dr. K. Drickamer, Columbia University), resulting in a fusion of the sialyltransferase in-frame to the insulin signal sequence present in the pGIR vector (46). The resulting fusion protein was inserted into the XbaI-SmaI sites of the expression vector pSVL to yield the expression plasmid pBD122.
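To make the notion of a degenerate primer concrete, the sketch below expands the two primer sequences quoted in this section using the standard IUPAC nucleotide ambiguity codes and counts how many distinct oligonucleotides each one represents. The assignment of the second sequence to primer T-43 antisense follows the order in which the text lists them and should be read as an assumption; the helper names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as quoted in the text (5' -> 3').
T27_SENSE = "GGAAGCTTATYGAYGAYTAYGAYATYGT"
T43_ANTISENSE = "CCGGATCCTTRAANCCRGCNAGWACRAA"  # assignment assumed from text order


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every non-degenerate sequence the primer stands for."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in primer))]


if __name__ == "__main__":
    for name, seq in (("T-27 sense", T27_SENSE), ("T-43 antisense", T43_ANTISENSE)):
        print(f"{name}: length {len(seq)}, degeneracy {degeneracy(seq)}")
```

With these sequences the first primer represents 64 oligonucleotides (six Y positions) and the second 256. A pool this degenerate is one plausible reason for the low 37 °C annealing temperature used in the initial amplifications, although the text does not state the rationale.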

Expression of the Soluble Form of the Sialyltransferase and Assaying Enzyme Activity-For transient expression in COS-1 cells, the expression plasmid pBD122 (20 μg) was transfected into COS-1 cells on 100-mm plates using lipofectin as suggested by the manufacturer (Bethesda Research Laboratories). After 48 h, the cell culture medium was collected and concentrated by ultrafiltration using a Centricon 10 (Amicon). The concentrated medium was assayed for sialyltransferase activity using oligosaccharides as acceptor substrates. Transfer of sialic acid to the oligosaccharide was monitored using ion-exchange chromatography (27, 47).
Northern Analysis-Total RNA from rat tissues was prepared as described (48). Samples of total RNA and a size standard of RNA ladder (BRL) were electrophoresed in a 1.5% agarose gel containing formaldehyde. Northern blot and hybridization were performed as reported (47). The cDNA insert of clone ST3N-1 subcloned in Bluescript was gel-purified, radiolabeled (1 × 10⁹ cpm/mg), and used as a probe.

RESULTS
Purification and NH2-terminal Sequencing-The Galβ1,3(4)GlcNAc α2,3-sialyltransferase was purified using a modification of the previously published method (27). Analysis of the purified sialyltransferase by SDS-polyacrylamide gel electrophoresis revealed a single band with an apparent molecular mass of 48 kDa. NH2-terminal sequencing gave the sequence Xaa-Gln-Thr-Leu-Gly-(Glu or His or Thr)-Glu-Tyr-Asp-Arg-(Val or Leu)-Gly-. A second amino acid sequence, which could be deduced from secondary peaks in the cycles of the Edman degradation, gave evidence for the presence of peptide species with two additional amino acids at the NH2 terminus.
Mass Spectrometry-Approximately 12 μg of protein, estimated by amino acid analysis, was used for the remaining sequence determinations described in this paper. The Galβ1,3(4)GlcNAc α2,3-sialyltransferase was isolated in a buffer necessary to preserve the enzymatic activity. The buffer contained glycerol, NaCl, sodium cacodylate, and Triton CF-54, which are undesirable substances because they are expected to interfere with chromatographic separation and mass spectrometric detection. Because of difficulties anticipated in dealing with this buffer mixture, preliminary experiments were carried out using authentic Galβ1,4GlcNAc α2,6-sialyltransferase as a model to work out suitable conditions for the reduction, alkylation, and enzymatic digestion, as well as for the removal of all the undesired chemicals from the enzyme solution. Following a protocol established from these preliminary experiments, the Galβ1,3(4)GlcNAc α2,3-sialyltransferase enzyme was reduced, carboxymethylated, and dialyzed against N-ethylmorpholine acetate solution. SDS was added after dialysis to the dialysis wells to prevent protein losses due to precipitation or adhesion to the membrane. The SDS and any remaining Triton CF-54 were removed by acetone precipitation (39). The protein pellet was redissolved in a buffer containing urea and digested with trypsin for 18 h. The resulting peptides were separated by microbore reverse phase HPLC, and 60 fractions were collected manually for analysis by LSIMS and high performance tandem mass spectrometry (Fig. 1).
Prior to mass spectrometry analysis, the early eluting, relatively hydrophilic fractions from HPLC were esterified to increase the hydrophobicity of their peptides, thus improving their sputtering efficiency (40). LSIMS analysis of each fraction revealed multiple molecular ions, indicating the presence of more than one peptide per fraction. The 30 most abundant molecular ions (out of a total of approximately 100) were chosen for high energy CID analysis. In several cases two molecular ions were analyzed in the same fraction. In these experiments, the ¹²C isotope peak of the pseudomolecular ion was selected in the first mass spectrometer. Only the fragments of this vibronically activated species resulting from the dissociation induced by collision with helium in the collision cell, situated between the two mass spectrometers, are detected and recorded by the second mass spectrometer. High energy CID processes occur both along the peptide backbone and within the side chains of certain amino acids, yielding information related directly to amino acid sequence and amino acid identity. Bond cleavages along the peptide chain generate series of ions which differ by amino acid residue weights; thus the corresponding amino acid sequence can be deduced. Additional high energy modes of fragmentation also provide information about the individual amino acid composition of any given peptide (immonium ions (49)). Side chain fragmentation, permitting differentiation between the isobaric Leu-Ile amino acid pair (50), can usually be observed when there is a basic amino acid, i.e. Arg, Lys, or His, in the sequence (51). Preferential protonation of the basic amino acid residues directs the sequence ion series observed. Thus the presence of basic residues at or near the NH2 terminus results in the formation of predominantly NH2-terminal sequence ions. Similarly, peptides containing basic amino acids at or close to the COOH terminus will exhibit primarily COOH-terminal fragment ions.
Thus trypsin has an advantage for an initial digestion since COOH-terminal sequence analysis of the tryptic peptides can be readily carried out. Interpretation of high energy CID spectra yielded 14 sequences (see Table I and Fig. 2). In some cases it was not possible to distinguish between isobaric amino acids from the CID spectra (see Table I). Therefore, to differentiate between leucine and isoleucine at the NH2-terminal and the penultimate positions in peptide T-27, we relied on homology with the Galβ1,4GlcNAc α2,6-sialyltransferase. Thus, we proposed an NH2-terminal isoleucine and a penultimate leucine. The cDNA sequence was consistent with an NH2-terminal isoleucine; however, it showed an isoleucine where we proposed a leucine residue.
The analysis of the CID spectra revealed the presence of modified peptides from side reactions which occurred during the tryptic digestion. These tryptic peptides were found to be carbamylated at their NH2 termini, probably due to the long incubation time. Although such side reactions would block Edman degradation, such NH2-terminal modifications can even be useful in mass spectrometric sequencing in confirming some of the sequences deduced, and in particular cases they provided information which permitted differentiation between NH2-terminal leucine and isoleucine (Fig. 3). In addition, two peptides resulting from trypsin autolysis were identified at m/z 659.3 and 1153.6.
Isolation of cDNA Clones-The Dayhoff Protein Database was used to screen the peptide sequences for homology with known proteins. This search provided evidence of homology between the Galβ1,3(4)GlcNAc α2,3-sialyltransferase and other cloned sialyltransferases. From this analysis, peptide T-27 (Table I) was found to be homologous to sequences present in both the rat and human β-galactoside α2,6-sialyltransferases (these two enzymes are 88% conserved, Refs. 20 and 21). When this analysis was extended to include the sequence of the as yet unpublished porcine Galβ1,3GalNAc α2,3-sialyltransferase (22), an additional peptide, peptide T-7B (Table I), was found to be homologous to sequences in both the cloned sialyltransferases. The alignment of these peptides with the sequences present in the previously cloned sialyltransferases suggested that these two peptides represented a continuous stretch of amino acids that had been cleaved at the arginine residue during the trypsin digestion (Fig. 4).

Figure legend: Sixty fractions were collected and subjected to mass spectrometric analysis. Fractions labeled yielded sequences by CID analysis (Table I) or the eluting peptides were later identified by molecular weight based on the cDNA-deduced amino acid sequence (Table II).

Figure legend: CID spectrum of peptide T-22B from the Galβ1,3(4)GlcNAc α2,3-sialyltransferase. The peptide sequence is Leu-Thr-Pro-Ala-Leu-Asp-Ser-Leu-His-Cys*-Arg, MH+ = 1283.6. Cys* represents carboxymethylcysteine. Ions with charge retention at the NH2 terminus are labeled as a, b, c ions, and the COOH-terminal analogs are designated as x, y, and z (55). The a and x ions are products of a cleavage between the α carbon and the carbonyl group. Ions y and b are formed when the peptide bond is cleaved. Ions c and z are present due to the cleavage between the amino group and the α carbon. The numbering of these fragments is always initiated at the appropriate peptide terminus. The side chain fragmentation occurs between the β and γ carbons of the amino acids, yielding the so-called d (NH2-terminal) and w (COOH-terminal) ions. COOH-terminal vi ions are formed via α,β bond cleavage from a number of ions, for example, from yi ions. The immonium ions and the side chain losses from the protonated molecular ion are indicated by the one-letter code of the corresponding amino acids. Internal fragment ions are labeled similarly. The nominal masses of all sequence ions observed are shown together with the sequence deduced above the spectrum.
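The way sequence ions "differ by amino acid residue weights" can be illustrated with the T-22B peptide quoted in the figure legend. The sketch below is not the authors' interpretation procedure; it only sums standard monoisotopic residue masses to obtain MH+ and the simple b and y ion series (the a, c, d, v, w, x, and z series seen in high energy CID are omitted). The residue masses are standard reference values, the carboxymethyl increment (C2H2O2, 58.00548 Da) is assumed for Cys*, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses in Da (only the residues needed here).
MONO = {
    "Leu": 113.08406, "Ile": 113.08406,  # isobaric pair: identical residue mass
    "Thr": 101.04768, "Pro": 97.05276, "Ala": 71.03711,
    "Asp": 115.02694, "Ser": 87.03203, "His": 137.05891,
    "Arg": 156.10111,
    "Cys*": 103.00919 + 58.00548,  # cysteine + carboxymethyl group (assumed C2H2O2)
}
WATER = 18.01056
PROTON = 1.00728

# Peptide T-22B as given in the figure legend.
T22B = ["Leu", "Thr", "Pro", "Ala", "Leu", "Asp", "Ser", "Leu", "His", "Cys*", "Arg"]


def mh_plus(residues):
    """Monoisotopic mass of the singly protonated peptide."""
    return sum(MONO[r] for r in residues) + WATER + PROTON


def b_ions(residues):
    """b1..b(n-1): NH2-terminal fragments from peptide bond cleavage."""
    ions, total = [], PROTON
    for r in residues[:-1]:
        total += MONO[r]
        ions.append(total)
    return ions


def y_ions(residues):
    """y1..y(n-1): COOH-terminal fragments from peptide bond cleavage."""
    ions, total = [], WATER + PROTON
    for r in reversed(residues[1:]):
        total += MONO[r]
        ions.append(total)
    return ions


if __name__ == "__main__":
    print(f"MH+ = {mh_plus(T22B):.2f}")
    print("b:", [round(m, 2) for m in b_ions(T22B)])
    print("y:", [round(m, 2) for m in y_ions(T22B)])
```

The computed MH+ is about 1283.64, in agreement with the 1283.6 quoted in the legend. Because Leu and Ile have identical residue masses, the b and y series alone cannot tell them apart, which is why the side chain (d and w) fragments discussed in the text are needed.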
The finding that peptides T-27 and T-7B exhibited homology to a sequence previously identified as a homologous region in the center of two other cloned sialyltransferases provided the basis for our cloning strategy. We assumed that peptides T-27 and T-7B might be near the center of the protein; thus PCR experiments were designed to generate a long cDNA probe. Based on the amino acid sequences of the 14 sialyltransferase peptides, degenerate oligonucleotide primers of both sense and antisense strands were synthesized for use in PCR experiments. In these experiments, primer T-27 sense and T-7B antisense were paired with other primers in an attempt to amplify long cDNA fragments of the Galβ1,3(4)GlcNAc α2,3-sialyltransferase. [...] T-27 and T-7B were then used in a nested primer strategy (44) to identify specific cDNA fragments. The fragment amplified using the primers T-27 sense and T-43 antisense was nearly the same size as the fragment amplified using the T-7B sense and T-43 antisense primers, suggesting that the fragment produced was the result of specific annealing by the primers and not an artifact.
Cloning and sequencing of the T-27 sense-T-43 antisense fragment showed that peptides T-27 and T-7B are indeed continuous. Comparison of the sequence of the cDNA fragment with the two cloned sialyltransferases (20, 22) showed that the homology extends from peptide T-7B and continues for 18 amino acids. This homology strongly suggested that the T-27s-T-43as cDNA fragment was amplified from a sialyltransferase mRNA. The sequence also verified that the cDNA fragment was not derived from the mRNA of the Galβ1,4GlcNAc α2,6-sialyltransferase, which is abundant in rat liver (27).
The T-27 sense-T-43 antisense fragment was used to screen an oligo(dT)-primed rat liver cDNA library, from which two positive clones were obtained from 1 million plaques screened. Characterization of the positive clones revealed that clone ST3N-1 contained a 2.1-kb insert, whereas clone ST3N-2 was 1.5 kb in length. Northern analysis indicated that the Galβ1,3(4)GlcNAc α2,3-sialyltransferase mRNA was 2.5 kb (see below), suggesting that clone ST3N-1 was near full length. [...] (Table I). In addition, 14 other peptides were identified by molecular weight in the tryptic digest of the purified enzyme (Table II). This confirms that the cDNA of clone ST3N-1 is indeed that of the sialyltransferase. As observed for other cloned glycosyltransferases (38), the Galβ1,3(4)GlcNAc α2,3-sialyltransferase is predicted to have a short NH2-terminal cytoplasmic tail, a signal anchor sequence of approximately 20 residues, and a large COOH-terminal region that comprises the catalytic domain of the enzyme. The NH2-terminal sequence of the purified enzyme was found to represent amino acids 49-60, overlapping with peptide T-32 (Table I). Peptide T-13 with MH+ at m/z 1139.6 corresponds to amino acids 49-58 (Table II). This shows that the Galβ1,3(4)GlcNAc α2,3-sialyltransferase is also isolated as a proteolytic fragment lacking the membrane-spanning [...]

TABLE I
Amino acid sequences of peptides derived from the Galβ1,3(4)GlcNAc α2,3-sialyltransferase
Peptides sequenced from the tryptic digest of the enzyme are listed in the order they eluted. The numbers correspond to the HPLC fractions in which they were detected. If more than one peptide was present in a single HPLC fraction, the peptides are labeled according to their increasing molecular weight. The underlined sequences show homology to the other known sialyltransferase enzymes (25, 26). Amino acids in italics represent cases where high energy CID analysis did not unambiguously differentiate between isobaric amino acids. C* is carboxymethylcysteine.

A Protein Motif in the Sialyltransferase Gene Family-Comparison of the primary structures of the three cloned sialyltransferases revealed a region of extensive homology (Fig. 6). This region consists of 55 amino acids from residue 156 to residue 210 of the Galβ1,3(4)GlcNAc α2,3-sialyltransferase with 42% of the amino acids identical and 58% of the amino acids conserved between all three enzymes. The sequences of all three sialyltransferases have no significant homology outside this region. Since this region of homology is located near the center of the catalytic domain of the sialyltransferases, it likely represents a conserved structure necessary for their enzymatic activity.
Enzymatic Characterization of the Sialyltransferase-As a final demonstration that clone ST3N-1 encoded a functional Galβ1,3(4)GlcNAc α2,3-sialyltransferase, we sought to express the clone in COS-1 cells. The amino acid sequence of clone ST3N-1 revealed that the protein contains an amino-terminal signal-anchor sequence which is predicted to anchor the enzyme to the Golgi apparatus in the cell (38, 52, 53). Furthermore, the purified enzyme had full catalytic activity without the signal anchor. To facilitate functional analysis of the enzyme, it was desirable to produce a soluble form of the enzyme which when expressed would be secreted from the cell. A fusion protein was constructed using the cleavable insulin signal sequence to replace the signal-anchor sequence at the amino terminus of the sialyltransferase as described under "Experimental Procedures." When the expression plasmid pBD122 was expressed in COS-1 cells, the enzyme was secreted from the cells and exhibited sialyltransferase activity.
The rat liver Galβ1,3(4)GlcNAc α2,3-sialyltransferase was previously found to utilize β-galactoside acceptors containing either the Galβ1,3GlcNAc or the Galβ1,4GlcNAc sequences as acceptor substrates, forming the NeuAcα2,3Galβ1,3GlcNAc and NeuAcα2,3Galβ1,4GlcNAc sequences often found to terminate complex type N-linked oligosaccharides (6). As shown in Table III, the soluble recombinant enzyme secreted from cells transfected with the expression plasmid pBD122 was capable of utilizing β-galactoside acceptors containing either the Galβ1,3GlcNAc or the Galβ1,4GlcNAc sequences; as a control, cells transfected with the parental vector secreted no such sialyltransferase activity.

Table footnote: Peptide T-50 resulted from a chymotryptic cleavage at Tyr274; peptide Ile270-Tyr274 eluted in fraction 16 and was sequenced by CID analysis (Table I).

Figure residue (nucleotide and deduced amino acid sequence, partial):
CTC TGC CTC TTT CTG GTC CTG GGA TTT TTG TAT TAT TCT GCC TGG AAG CTA 90
1 Met Gly Leu Leu Val Phe Val Arg [Asn Leu Leu Leu Ala Leu Cys Leu Phe Leu Val Leu Gly Phe Leu Tyr Tyr Ser Ala Trp] Lys Leu 30
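As a consistency check on the sequence fragment that survives from the figure above, the 17 codons can be translated with the standard genetic code and compared with residues 14-30 of the deduced amino acid sequence. This is a sketch only: it assumes the garbled codon in the extracted text reads TCT, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Codons surviving from the sequence figure (the 13th is assumed to be TCT).
FRAGMENT = ("CTC TGC CTC TTT CTG GTC CTG GGA TTT TTG "
            "TAT TAT TCT GCC TGG AAG CTA").split()

# Residues 14-30 of the deduced sequence, in one-letter code:
# Leu Cys Leu Phe Leu Val Leu Gly Phe Leu Tyr Tyr Ser Ala Trp Lys Leu
EXPECTED_14_TO_30 = "LCLFLVLGFLYYSAWKL"


def translate(codons):
    """Translate a list of DNA codons into a one-letter protein string."""
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    protein = translate(FRAGMENT)
    print(protein)
    print("matches residues 14-30:", protein == EXPECTED_14_TO_30)
```

The translation agrees with residues 14-30 as printed, which supports reading the damaged codon as a serine codon.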
The secreted enzyme was also capable of sialylating asialo-α1-acid glycoprotein. These data are consistent with the enzymatic properties of the purified Galβ1,3(4)GlcNAc α2,3-sialyltransferase.
The precise structure of the 2.0-kb mRNA is presently not clear. However, the 2.0-kb mRNA is very likely produced from the Galβ1,3(4)GlcNAc α2,3-sialyltransferase gene by alternative splicing, alternative promoter utilization, or alternative polyadenylation site utilization, as observed for the multiple transcripts of the Galβ1,4GlcNAc α2,6-sialyltransferase gene (55).
amino-terminal signal-anchor sequence and a large carboxyl-terminal catalytic domain oriented to the lumen of the Golgi apparatus (38). Comparisons of the sequences within each family have revealed variable homology. For instance, no significant primary sequence similarity was found between the two cloned galactosyltransferases, the Galβ1,4GlcNAc α1,3-galactosyltransferase (73) and the GlcNAc β1,4-galactosyltransferase (72), or between the two fucosyltransferases, the β-galactoside α1,2-fucosyltransferase (74) and the β-galactoside α1,3/1,4-fucosyltransferase (75). On the other hand, recent molecular cloning efforts have allowed the isolation of two additional α1,3-fucosyltransferase genes or cDNAs (76-78). Comparisons of the primary sequences of these two α1,3-fucosyltransferases with that of the α1,3/1,4-fucosyltransferase revealed substantial structural similarity, 57-91% identity within the catalytic domain of the enzymes. Homology has even been demonstrated for glycosyltransferases representing different families, as found in the allelic genes which determine the A, B, and O blood groups. Indeed, the Fucα1,2Gal α1,3-N-acetylgalactosaminyltransferase (histo-blood group A transferase) and the Fucα1,2Gal α1,3-galactosyltransferase (histo-blood group B transferase) differ in only 4 amino acids, resulting from a few single-base substitutions (79, 80).
The cDNAs of three members of the sialyltransferase family have been cloned to date, including the Galβ1,3(4)GlcNAc α2,3-sialyltransferase reported here. As shown in Fig. 6, comparison of the deduced amino acid sequences has revealed a pattern of homology not observed among the other glycosyltransferase families. The region of homology is restricted to a 55-amino acid stretch in the center of the molecule representing only 15% of the total sequence. Within this region, 23 amino acids are identical for all three enzymes (40%), and identity between any two of the three sequences ranges from 45 to 56%. In contrast, outside this conserved region little or no homology can be detected between any of the three sialyltransferase sequences. The conserved region is not found in any other proteins reported in the GenBank databases. Thus, it appears that this conserved protein motif is unique to the sialyltransferase gene family.
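The proportions quoted for the conserved region follow from a few numbers given in the text: the 374-residue protein, the motif spanning residues 156 to 210, and the 23 positions identical in all three enzymes. The short calculation below is only an arithmetic sketch using those values; the variable names are hypothetical.

```python
# Values taken from the text.
PROTEIN_LENGTH = 374            # amino acids encoded by the cloned cDNA
MOTIF_START, MOTIF_END = 156, 210
IDENTICAL_IN_ALL_THREE = 23     # positions identical in all three enzymes

motif_length = MOTIF_END - MOTIF_START + 1
fraction_of_protein = motif_length / PROTEIN_LENGTH
identity_in_motif = IDENTICAL_IN_ALL_THREE / motif_length

if __name__ == "__main__":
    print(f"motif length: {motif_length} residues")
    print(f"share of total sequence: {100 * fraction_of_protein:.1f}%")
    print(f"three-way identity within motif: {100 * identity_in_motif:.1f}%")
```

The motif length comes out at 55 residues, or about 14.7% of the protein, consistent with the 15% stated here and the "remaining 85%" in the summary. Twenty-three identical positions out of 55 is about 41.8%, which matches the 42% given under "Results" and is close to the rounded 40% quoted in this paragraph.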
Protein motifs of proteins with related functions are often involved in catalysis and ligand binding (56-58). The three cloned sialyltransferases catalyze the transfer of sialic acid from CMP-sialic acid in α2,3 or α2,6 linkage to terminal galactose of an acceptor carbohydrate to form the following sequences:

NeuAcα2,3Galβ1,3(4)GlcNAc- (ST3N)
NeuAcα2,3Galβ1,3GalNAc-
NeuAcα2,6Galβ1,4GlcNAc-

Thus, the protein motif shared by all three enzymes could in principle contain either the CMP-sialic acid binding site or the galactose acceptor binding site or both. There are several observations about the protein motif that are consistent with this hypothesis. First, more than 50% of the amino acid residues conserved in all three enzymes are either charged or polar amino acids, as would be expected for a binding pocket at the protein surface. Second, 6 of the charged residues in this region are identical in all three sialyltransferases. It is therefore likely that they are part of a common structural feature such as a binding site for CMP-sialic acid or galactoside acceptor rather than simply contributing to surface charge distribution.
As an alternative, a structural element which is a common feature of a family of molecules could also be responsible for the formation of a polypeptide framework, such as the carbohydrate recognition domain of the calcium-dependent carbohydrate-binding proteins known as C-type animal lectins (59). Just recently, the structure of the C-type lectin domain from a rat mannose-binding protein was determined (60), which reveals an unusual folding pattern consisting of nonregular secondary structure. The structure suggests that the conserved residues in the C-type carbohydrate recognition domains are responsible for the folding pattern, which would be common for these domains.
Protein motifs often are characteristic for families of proteins that have diverged from a common ancestor. Among the three cloned sialyltransferases the β-galactoside α2,6-sialyltransferase gene is the only gene which has been characterized (61, 62). The α2,6-sialyltransferase gene consists of six exons. Exon 1 contains only the 5'-untranslated sequence. The coding sequence is contained in exons 2-6 (61, 62). The conserved region observed in the three sialyltransferases corresponds precisely to the entire exon 3 and part of exon 2 in the β-galactoside α2,6-sialyltransferase gene. This correlation can be considered to support the Gilbert and Blake hypothesis (63, 64) that exons correspond to units of discrete protein structure and/or function and that proteins have evolved by the combination of domains carrying particular functions. One such example is the Na+,K+-ATPase αIII gene, which consists of 23 exons. The nucleotide-binding domain of the Na+,K+-ATPase α subunit is encoded by six whole exons and by the beginning of the seventh exon (65-67).
Protein motifs are often used to identify other members in the same gene family (68-71). The use of oligonucleotides corresponding to a consensus sequence encoding the DNA-binding domain of the steroid and thyroid hormone receptor superfamily as probes has led to the isolation of cDNAs of this superfamily (68-70). Similarly, homeobox-containing genes have been isolated by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from the homeodomain (71). In principle, other sialyltransferases could be cloned by using an oligonucleotide complementary to the most highly conserved group of amino acids as a probe or, alternatively, by using primers complementary to the highly conserved groups of amino acids in the conserved region to amplify a specific cDNA fragment unique to previously identified members of this family.