Organization of the β-Galactoside α2,6-Sialyltransferase Gene: EVIDENCE FOR THE TRANSCRIPTIONAL REGULATION OF TERMINAL GLYCOSYLATION*

Little is currently known about the mechanisms by which the cellular glycosylation machinery is regulated to produce cell type-specific glycosylation sequences on glycoprotein and glycolipid sugar chains. Previously, we have shown that one enzyme involved in terminal glycosylation, β-galactoside α2,6-sialyltransferase, is expressed in a tissue-specific fashion, with the highest enzyme activity as well as mRNA levels being found in the liver. In addition, the liver mRNA was found to be 4.3 kilobases (kb) in size as compared to a larger message of 4.7 kb in other tissues. To understand the cellular regulation of expression of this sialyltransferase, we have cloned the rat gene encoding the 4.3-kb liver mRNA and found that it spans 40 kb of genomic DNA and contains 6 exons. The gene was found to be very similar in size and exon organization to the murine β1,4-galactosyltransferase gene, even though this enzyme has no sequence homology to α2,6-sialyltransferase. The promoter responsible for the production of the liver α2,6-sialyltransferase mRNA is approximately 50-fold more active in a hepatoma cell line known to express this enzyme (HepG2) than in a cell line shown not to express this enzyme (Chinese hamster ovary) and contains consensus binding sites for the liver-restricted transcription factors HNF-1 and DBP as well as the transcription factors AP-1 and AP-2. These observations are in accord with the restricted expression of the 4.3-kb mRNA and provide evidence for the cellular regulation of glycosylation at the level of transcription.

Sugar chains of glycoproteins and glycolipids comprise an enormous diversity of structures, which are increasingly appreciated for their roles in biological recognition (1-3). Much of this diversity stems from the terminal glycosylation sequences, which are known to be expressed in a cell type-specific manner (1,4,5) and are regulated in their expression during development and oncogenic transformation (3,6-9). These structures mediate a variety of biological processes, including protein targeting (10,11), cell-cell recognition and adhesion (2,3,12,13), and signaling events in development (14,15).

Although there is a large body of structural evidence to suggest that the glycosylation machinery is regulated to produce cell type-specific glycosylation sequences (1,2), little is known about the possible mechanisms of such regulation. Because glycosylation is a post-translational modification of glycoproteins, it is generally believed that the terminal glycosylation sequences produced by a cell are determined by the specificities of the glycosyltransferases present in its Golgi apparatus. Indeed, this would account for the differences in the terminal sequences of N-linked carbohydrate groups of both natural and recombinant glycoproteins produced in different cell types (1,4,5). Further support for this view comes from the demonstration that the terminal glycosylation sequences of a cell can be altered by transfection with an expression vector containing the cDNA of a glycosyltransferase that elaborates a terminal sequence not normally produced by the cell (16-19).

* This work was supported by National Institutes of Health Grant GM-27904. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Such observations suggest that cell type-specific glycosylation may be due to differential expression of the 100-150 different glycosyltransferases estimated to be required to elaborate all known carbohydrate structures produced by mammalian cells. Until recently, however, the lack of cloned glycosyltransferase genes has delayed study of the control of cell type-specific glycosyltransferase expression. Now that several glycosyltransferase cDNAs are available, this limitation should give way to rapid progress (20).
Previously, using direct enzyme assays of rat tissue homogenates, we provided evidence that three sialyltransferases involved in terminal glycosylation are expressed in a tissue-specific fashion, independently of one another (21). Of these three enzymes, one had been cloned and was therefore amenable to further molecular analysis: the β-galactoside α2,6-sialyltransferase (EC 2.4.99.1), which elaborates the terminal NeuAcα2,6Galβ1,4GlcNAc sequence on N-linked sugar chains (21,22).
For the β-galactoside α2,6-sialyltransferase, enzyme activity was highest in liver, with 10-100-fold lower activities in other tissues such as ovary, brain, and heart. Northern analysis revealed two mRNAs of 4.3 and 4.7 kb that were highly homologous along their entire length. The levels of these two mRNAs correlated with enzymatic activity, suggesting that cellular regulation of β-galactoside α2,6-sialyltransferase activity occurs at the level of transcription. Interestingly, the abundant 4.3-kb mRNA was found only in liver. In contrast, the 4.7-kb mRNA was expressed in lesser amounts (10-100-fold less) in six tissues examined. A third mRNA of 3.6 kb was also detected with the β-galactoside α2,6-sialyltransferase cDNA probe, but it was expressed only in kidney and did not hybridize to probes corresponding to the 5' half of the coding sequence. It was therefore concluded that this mRNA does not code for the β-galactoside α2,6-sialyltransferase (21,23). To begin to understand the basis for the tissue-specific expression of this gene, we have cloned and analyzed the rat β-galactoside α2,6-sialyltransferase gene corresponding to the liver-specific 4.3-kb message. Here we present the organization of the gene, which spans 40 kb, and describe a preliminary analysis of the promoter region responsible for the production of this transcript.
In keeping with the observed tissue-restricted production of the 4.3-kb message, the promoter is 50-fold more active in a cell line known to express this sialyltransferase (HepG2) than in one shown previously not to express it (CHO) (4,5,16,24). Moreover, the sequence of the promoter region contains consensus binding sites for the liver-restricted transcription factors HNF-1 and DBP (25,26 …

… (22)) was kinased at its 5' end using [γ-32P]ATP and polynucleotide kinase. This oligonucleotide was then hybridized to 20 μg of liver total RNA in a total volume of 5 μl for 14 h at 50 °C as described (29 …

Fig. 1 (caption, partial). … Exons are labeled E1-E6, and EcoRI restriction sites are shown as hash marks. Below is shown the relative position and overlap of the three cosmid clones isolated.

Genomic Analysis and Mapping of the α2,6-Sialyltransferase Gene

In order to investigate the transcriptional regulation of β-galactoside α2,6-sialyltransferase, it was necessary to isolate the genomic sequences containing this gene. A rat genomic library in the cosmid vector pWE15 was constructed and screened using β-galactoside α2,6-sialyltransferase cDNA as described above. Three independent clones (7C2, 8C1, and 10C1) were isolated from a library of 1 × 10⁶ recombinants. These clones were restriction mapped and found to overlap as shown in Fig. 1. Fortuitously, the combination of 10C1 and 8C1 was found to span the entire β-galactoside α2,6-sialyltransferase gene, containing all of the exons corresponding to the 4.3-kb β-galactoside α2,6-sialyltransferase cDNA. The cDNA was found to be divided into 6 exons, ranging in size from 97 bp to 2.8 kb and spanning 40 kb of genomic DNA (Fig. 1). The locations of the exons were determined by Southern blotting of cosmid clone DNA, and their exact sizes were determined by sequencing, as reported in Table I.

Fig. 2 (caption, partial). … 2-4, respectively). In both panels, lane 1 is the "T" lane of a sequencing ladder of PST-Luc used for product size determination (generated using the radiolabeled primer extension probe), and arrows show the positions of the 69- and 70-bp fragments. Panel C shows the genomic map near exon 1 and the origin of the primer extension and S1 primers.

… as reported in Table II. Sequences adjacent to the exon junctions have at most two mismatches with the consensus splice site sequences reported by Breathnach and Chambon (34).
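The "at most two mismatches" criterion can be made concrete with a short sketch. It assumes a commonly cited IUPAC rendering of the Breathnach and Chambon consensus (donor MAG|GTRAGT, acceptor (Y)10NYAG|G); these consensus strings, the function names, and the example sites are illustrative assumptions, not the junction sequences of Table II.

```python
# IUPAC nucleotide codes mapped to the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}

# Assumed consensus strings (a common textbook rendering, not taken from Table II).
# Donor site spanning exon|intron: MAG|GTRAGT
DONOR_CONSENSUS = "MAGGTRAGT"
# Acceptor site spanning intron|exon: 10 pyrimidines, N, Y, AG|G
ACCEPTOR_CONSENSUS = "YYYYYYYYYYNYAGG"


def count_mismatches(site: str, consensus: str) -> int:
    """Count positions in `site` whose base the IUPAC `consensus` does not accept."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(base not in IUPAC[code] for base, code in zip(site.upper(), consensus))


def fits_consensus(site: str, consensus: str, max_mismatches: int = 2) -> bool:
    """True if the site differs from the consensus at no more than `max_mismatches` positions."""
    return count_mismatches(site, consensus) <= max_mismatches


if __name__ == "__main__":
    # Hypothetical junction sequences, for illustration only.
    print(count_mismatches("CAGGTAAGT", DONOR_CONSENSUS))  # → 0
    print(count_mismatches("TAGGTGAGC", DONOR_CONSENSUS))  # → 2
    print(fits_consensus("TTTCTCTCCTGCAGG", ACCEPTOR_CONSENSUS))  # → True
```

Matching through IUPAC sets keeps degenerate consensus positions (such as R or Y) from being counted as mismatches, which is how a "mismatch to consensus" is usually tallied.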
Identifying the Transcriptional Start Site

The site of transcription initiation in the liver was determined by both primer extension and S1 analysis, as shown in Fig. 2A. For the primer extension experiment shown in lane 2, a 25-bp primer complementary to the β-galactoside α2,6-sialyltransferase mRNA from -148 to -123 (22) was synthesized, kinased, hybridized, and extended with reverse transcriptase. The autoradiograph of the resolving gel reveals two major bands at 69 and 70 bp and three other bands at 61, 60, and 48 bp. The 69- and 70-bp bands are consistent with a transcription start site corresponding to a 12-13-bp extension of the cloned cDNA of the 4.3-kb message (22). The three smaller bands are most likely due to incomplete extension by the reverse transcriptase, but they may also represent alternative start sites of this promoter. Parallel experiments carried out with 20 μg of yeast tRNA showed no such bands (Fig. 2A, lane 3).
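Converting a primer extension product length into a start-site position is simple coordinate arithmetic, sketched below. The sketch assumes contiguous integer coordinates on the mRNA that become more negative toward the 5' end, that the primer's labeled 5' end pairs with the downstream edge of its annealing site, and that reverse transcriptase copies through to the 5' end of the RNA. The function names and example coordinates are hypothetical.

```python
def five_prime_end_from_extension(primer_5prime_pos: int, product_length: int) -> int:
    """Return the mRNA coordinate of the transcript 5' end.

    primer_5prime_pos: mRNA coordinate paired with the primer's labeled 5' end
        (the downstream edge of the primer's annealing site).
    product_length: length in nt of the full-length extension product,
        which includes the primer itself.
    Assumes contiguous integer coordinates that decrease toward the 5' end.
    """
    if product_length < 1:
        raise ValueError("product_length must be positive")
    # The product occupies product_length positions ending at primer_5prime_pos.
    return primer_5prime_pos - (product_length - 1)


def extension_beyond_cdna(start_pos: int, cdna_5prime_pos: int) -> int:
    """Number of nt by which a mapped start lies upstream of a cDNA's 5' end."""
    return cdna_5prime_pos - start_pos


if __name__ == "__main__":
    # Hypothetical example: primer 5' end paired with position -10, 20-nt product.
    start = five_prime_end_from_extension(-10, 20)
    print(start)                              # → -29
    print(extension_beyond_cdna(start, -20))  # → 9
```

Two products that differ by 1 nt, like the 69- and 70-bp bands, map to adjacent positions under this arithmetic, which is consistent with a single start site showing 1-nt heterogeneity.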
Confirmation of the start site of transcription was obtained by S1 analysis, performed as shown schematically in Fig. 2C. A 100-bp oligonucleotide was synthesized starting at the same point in the β-galactoside α2,6-sialyltransferase gene as the primer extension probe and extending 80 bp into the 5'-flanking region. This probe also has 20 bp of a random, noncomplementary sequence added to its 3' end in order to monitor digestion by S1 nuclease during the experiment. After kinasing, the oligonucleotide was hybridized, digested, and resolved to reveal both the 69- and 70-bp fragments obtained in the primer extension experiment. As a third method to map the transcriptional start site, we used the "PCR RACE" method (28) to amplify the 5' end of the rat liver 4.3-kb mRNA. A single product was identified and sequenced using Taq polymerase. The sequence obtained was identical to that previously reported (22), with the addition of 12-15 bp extending the 5' end of the mRNA and having a sequence identical to the genomic sequence of exon 1 (Fig. 3). Thus, all three methods map the transcription start site to the same place (Fig. 3, arrow).

(Fig. 3: nucleotide sequence of the 5'-flanking region of the β-galactoside α2,6-sialyltransferase gene; the sequence is not legible in this copy.)
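The purpose of the 20-bp noncomplementary tail can be illustrated by a sketch that classifies S1-protected band sizes for a probe built like the one described above: a 5'-labeled oligonucleotide with 80 nt complementary to the gene, measured from the labeled anchor, plus a 20-nt tail. The function name and category labels are hypothetical, and the sketch does not model partially digested intermediates.

```python
def interpret_s1_band(size_nt: int, genomic_nt: int = 80, tail_nt: int = 20) -> str:
    """Classify an S1-protected fragment from a 5'-end-labeled antisense probe.

    The probe is assumed to carry `genomic_nt` nt complementary to the gene,
    measured from the labeled 5' end, followed by a `tail_nt` noncomplementary 3' tail.
    """
    full_probe = genomic_nt + tail_nt
    if size_nt == full_probe:
        # The tail cannot pair with the RNA, so a band this long means S1 did not cut.
        return "undigested probe"
    if size_nt == genomic_nt:
        # Every complementary base is protected: the hybridizing nucleic acid
        # extends past the probe's upstream end (e.g. an upstream start, or DNA).
        return "full-length protection"
    if 0 < size_nt < genomic_nt:
        # Protection stops where the RNA 5' end (or a splice junction) lies.
        return f"RNA boundary {size_nt} nt from the labeled end"
    return "unexpected size"


if __name__ == "__main__":
    for band in (100, 80, 70, 69, 62):
        print(band, interpret_s1_band(band))
```

Because the probe shares its labeled end with the primer extension primer, a boundary found this way should give the same fragment length as the primer extension product, which is the basis of the cross-check described above.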
S1 analysis also revealed a ladder of fragments 61-64 bp in size. These fragments correspond to a consensus splice acceptor site 9 bp downstream of the transcriptional start site and would be consistent with an mRNA species whose sequence diverges at this point (Fig. 3, star). S1 analysis was also carried out using total RNA from spleen and ovary, as shown in Fig. 2B. The results show a complete absence of the 69- and 70-bp protected fragments and reveal an RNA species protecting the 61-64-bp fragments found with liver total RNA. Although this mRNA species could arise from an alternative transcription start site, it most likely represents splicing of an upstream exon into this site. Indeed, preliminary results indicate that the 4.7-kb message seen previously in spleen, ovary, kidney, and other tissues is also produced in small amounts in liver and contains a 5' extension joined at this splice site. Taken together, the results of the transcriptional start site mapping, in combination with previously published Northern analysis (21), suggest that transcription initiation for the 4.3-kb mRNA is tissue restricted and corresponds to a transcription start site extending the previously published cDNA sequence by 12-13 bp (Fig. 3).
Consensus Binding Sites for HNF-1 and DBP Are Found in the β-Galactoside α2,6-Sialyltransferase Promoter

The sequence of the 5'-flanking region of the β-galactoside α2,6-sialyltransferase gene is shown in Fig. 3, with the start site of transcription marked by an arrow. Analysis of the promoter region sequence reveals only weak homology to a TATA box (34), located 30 bp 5' of the transcriptional start site. In addition, there are consensus binding sites for the transcription factors HNF-1, DBP, AP-1, and AP-2 (25,26,35,36), as shown in Table III. … demonstrated to confer liver specificity to the albumin gene (26,37).

Demonstration of Promoter Activity

To test the region 5' of exon 1 for promoter activity, 2.6 kb of the 5'-flanking sequence was subcloned into the HindIII site of the luciferase vector pXP1 (30). This construct (PST-Luc) contained 100 bp of the 5'-noncoding sequence of exon 1 as well as the predicted transcription start site and all of the identified transcription factor binding sites. It was transiently transfected into HepG2 cells, which have been shown to produce β-galactoside α2,6-sialyltransferase mRNA, and into CHO cells, which are thought not to express this gene because they lack α2,6-linked sialic acid on the N-linked sugar chains of their cell surface glycoproteins and do not have detectable levels of β-galactoside α2,6-sialyltransferase mRNA (4,5,16,24). Cell lysates were harvested 48 h later and assayed for luciferase activity. As controls, the pXP1 vector alone and the pSV2aLA5' vector containing the SV40 promoter (33) were also used to transiently transfect these cell lines. The results are shown in Fig. 4, with luciferase activity reported in relative light units (RLU). Although the activity of this promoter is low relative to the SV40 promoter, it promotes transcription 250-fold over background in HepG2 cells.
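The consensus-site search of the promoter region can be sketched as a scan of both strands against degenerate IUPAC motifs. The motif strings below are commonly cited consensus forms assumed for illustration (HNF-1 GTTAATNATTAAC, DBP RTTAYGTAAY, AP-1 TGASTCA, AP-2 CCCMNSSS); they are not necessarily the sequences used to build Table III, and the function names are hypothetical.

```python
# IUPAC codes mapped to the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}
# Complement of each IUPAC code (R<->Y, M<->K; S, W, N are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYMKSWN", "TGCAYRKMSWN")

# Assumed consensus motifs (commonly cited forms, not taken from Table III).
MOTIFS = {
    "HNF-1": "GTTAATNATTAAC",
    "DBP": "RTTAYGTAAY",
    "AP-1": "TGASTCA",
    "AP-2": "CCCMNSSS",
}


def reverse_complement(motif: str) -> str:
    """Reverse complement of an IUPAC motif string."""
    return motif.translate(COMPLEMENT)[::-1]


def scan(seq: str, motif: str, max_mismatches: int = 0) -> list[tuple[int, str, int]]:
    """Return (0-based start, strand, mismatches) for each window matching `motif`.

    A '-' hit means the motif's reverse complement matches the given strand there.
    Palindromic motifs therefore report a hit on both strands at the same position.
    """
    seq = seq.upper()
    k = len(motif)
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for i in range(len(seq) - k + 1):
            window = seq[i : i + k]
            mm = sum(base not in IUPAC[code] for base, code in zip(window, pattern))
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits


def scan_all(seq: str, max_mismatches: int = 0) -> dict[str, list[tuple[int, str, int]]]:
    """Scan `seq` for every motif in MOTIFS."""
    return {name: scan(seq, motif, max_mismatches) for name, motif in MOTIFS.items()}


if __name__ == "__main__":
    # Hypothetical test sequence containing an AP-1-like site at position 4.
    print(scan("AAAATGACTCAGGG", MOTIFS["AP-1"]))  # → [(4, '+', 0), (4, '-', 0)]
```

Allowing one or two mismatches (`max_mismatches`) is one way to express "weak homology", as with the TATA-like element; short degenerate motifs such as AP-2 will match by chance fairly often, so hits from a scan like this are candidates rather than demonstrated binding sites.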
Most importantly, the promoter is approximately 50-fold more active in a hepatoma cell line (HepG2) known to express this enzyme than in a fibroblast cell line (CHO) known not to express it (4,5,16,24). These results demonstrate that the promoter region reported here is not equally active in all cell types. Notably, the activity observed in HepG2 cells provides evidence that transcriptional regulation underlies the liver-restricted expression of the 4.3-kb β-galactoside α2,6-sialyltransferase mRNA.
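The fold differences quoted above can be expressed as ratios of mean luciferase readings. The text does not state exactly how they were computed or whether transfections were normalized (for example, to a co-transfected control reporter), so the sketch below assumes background correction by the promoterless pXP1 vector in each cell line; if a normalizer were used, each RLU value would be divided by it first. Function names are hypothetical and no values from Fig. 4 are used.

```python
from statistics import mean


def fold_over_background(construct_rlu: list[float], vector_rlu: list[float]) -> float:
    """Mean RLU of the promoter construct divided by mean RLU of the promoterless vector."""
    background = mean(vector_rlu)
    if background <= 0:
        raise ValueError("background RLU must be positive")
    return mean(construct_rlu) / background


def relative_promoter_activity(
    construct_a: list[float],
    vector_a: list[float],
    construct_b: list[float],
    vector_b: list[float],
) -> float:
    """Ratio of background-corrected activity in cell line A to that in cell line B."""
    return fold_over_background(construct_a, vector_a) / fold_over_background(construct_b, vector_b)
```

Dividing by each cell line's own vector-only background compensates for differences in basal reporter expression between cell lines; comparing raw construct RLU directly would instead fold those differences into the ratio.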

DISCUSSION
Elucidating the mechanisms by which cells regulate the expression of terminal glycosylation sequences is of fundamental importance to understanding the roles of carbohydrate groups as information-containing molecules in cell-cell interactions (2,3,12-15). As an initial step in this direction, we have investigated the transcriptional regulation of the β-galactoside α2,6-sialyltransferase, which elaborates the terminal NeuAcα2,6Galβ1,4GlcNAc sequence on N-linked sugar chains of many serum glycoproteins (39). Because this enzyme was previously shown to exhibit tissue-specific expression (21), the analysis of its transcriptional regulation offered the potential to gain general insights into the larger question of the regulation of cell type-specific glycosylation.
Accordingly, we have cloned the β-galactoside α2,6-sialyltransferase gene to determine its organization, and have identified the promoter responsible for the restricted expression of an abundant 4.3-kb mRNA.
To date, the intron/exon structure of only one other glycosyltransferase gene has been determined, that of UDP-galactose:N-acetylglucosamine β1,4-galactosyltransferase, which by coincidence synthesizes the Galβ1,4GlcNAc acceptor sequence of the β-galactoside α2,6-sialyltransferase (40). A comparison of the exon structure of the β-galactoside α2,6-sialyltransferase gene and the β1,4-galactosyltransferase gene is shown in Fig. 5A. Both genes are of similar size (40 versus 50 kb) and are divided into 6 exons. In each gene, exon 6 contains the translation termination signal and all of the rather long 3'-untranslated region (2.8 kb). Although no amino acid homology exists between these two proteins, both glycosyltransferases are thought to share a common membrane topology (20). This common topology includes a short NH2-terminal cytoplasmic domain, a transmembrane domain, a "stem" region, and a lumenal catalytic domain, as shown in Fig. 5B. Notably, the predicted protein domains are distributed in a similar manner within each gene. The cytoplasmic, transmembrane, and stem domains are all contained in one exon in each of these genes, while the catalytic domain is divided among a number of relatively small exons (30-80 amino acids). Although there is little correlation between exon structure and known protein domains within each gene, other domains of these proteins, such as the nucleotide sugar binding pocket or the acceptor substrate binding region, have not been fully defined. As more details of glycosyltransferase protein structure are discovered, new correlations between exon organization and protein domains may be drawn.
The β-galactoside α2,6-sialyltransferase and β1,4-galactosyltransferase genes differ markedly in their patterns of expression. As evidenced by Northern analysis, the β1,4-galactosyltransferase gene is expressed at similar levels in almost all tissues examined (41). In keeping with this pattern of expression, the sequence of its 5'-flanking region is characteristic of a constitutive or "housekeeping" promoter, since it contains a number of potential Sp1 binding sites and no TATA box motif (40). In contrast, expression of the β-galactoside α2,6-sialyltransferase gene is more complex, with a 4.7-kb message expressed in most tissues and two tissue-specific messages: an abundant 4.3-kb mRNA in liver and a 3.6-kb mRNA in kidney (21).
Characterization of the promoter region responsible for the production of the 4.3-kb liver-specific mRNA revealed consensus binding sites for the liver-restricted transcription factors HNF-1 and DBP (25,26). HNF-1 has been shown to be expressed primarily in the liver and kidney, and to a much lesser extent in the intestine and spleen (42). It has also been shown to bind and activate the promoters of a number of liver-specific genes (43,44). DBP has recently been cloned and shown by Western analysis to be present only in liver among the four tissues examined (26). Moreover, another cloned transcription factor, C/EBP, has been shown to bind the DBP binding site and activate transcription (38). Like HNF-1 and DBP, C/EBP exhibits a limited tissue distribution, with high levels detected in liver, fat, and placenta (45). Together, these factors have been shown to bind and confer liver specificity to the albumin promoter (26,37).
As shown in this report, the promoter of the 4.3-kb mRNA is differentially active in the cell lines tested, exhibiting 50-fold higher activity in HepG2 cells, a hepatoma cell line known to express this gene, than in CHO cells, a fibroblast cell line believed to lack expression of β-galactoside α2,6-sialyltransferase (16,24). However, the activity of this promoter is still relatively low compared with that of the SV40 promoter, which may be due in part to relatively low levels of one or more of the liver-restricted transcription factors in the HepG2 cell line. Indeed, DBP has been shown to be absent in regenerating liver, and McKnight and colleagues (26,38) have shown that HepG2 cells contain much lower levels of C/EBP than hepatocytes in vivo. Studies are currently under way to assess the relative contribution of each of these transcription factors to the activity of this promoter, and to identify other factors that might also play a role in its regulation.
Previous work has shown that expression of the β-galactoside α2,6-sialyltransferase gene is inducible 3-4-fold in a liver cell line (H4IIE) by the addition of 1 μM dexamethasone (46).