Sequence of the cloned Escherichia coli K1 CMP-N-acetylneuraminic acid synthetase gene.

The Escherichia coli CMP-N-acetylneuraminic acid (CMP-NeuAc) synthetase gene is located on a 3.3-kilobase (kb) HindIII fragment of the plasmid pSR23, which contains the genes for K1 capsule production (Vann, W. F., Silver, R. P., Abeijon, C., Chang, K., Aaronson, W., Sutton, A., Finn, C. W., Lindner, W., and Kotsatos, M. (1987) J. Biol. Chem. 262, 17556-17562). Expression of the CMP-NeuAc synthetase gene was increased 10-30-fold by cloning a 2.7-kb EcoRI-HindIII fragment into the vector pKK223-3, which contains the tac promoter. The complete nucleotide sequence of the gene encoding CMP-NeuAc synthetase was determined from progressive deletions generated by selective digestion of M13 clones containing the 2.7-kb fragment. The CMP-NeuAc synthetase gene lies near the EcoRI site of this fragment, as indicated by an open reading frame encoding a 49,000-dalton polypeptide. The amino- and carboxyl-terminal sequences of the encoded protein were confirmed by sequencing peptides cleaved from both ends of the purified enzyme, and the amino acid sequence deduced from the nucleotide sequence was further confirmed by sequencing several tryptic peptides of the purified enzyme. The molecular weight agrees with that determined by sodium dodecyl sulfate-gel electrophoresis. Gel filtration and ultracentrifugation experiments under nondenaturing conditions suggest that the enzyme is active as a 49,000-dalton monomer but may form aggregates.

N-Acetylneuraminic acid is an important molecule in biological interactions. Encapsulated strains of Escherichia coli producing an α(2→8)-linked polysaccharide of N-acetylneuraminic acid (NeuAc) (1) are often associated with neonatal meningitis (2), septicemia, and childhood pyelonephritis (3, 4). This capsular polysaccharide is a virulence factor and is identical to the poly-N-acetylneuraminic acid attached to some eucaryotic proteins, such as the neural cell adhesion protein (N-CAM) (5, 6).
The poly-N-acetylneuraminic acid capsular polysaccharides of E. coli are formed by the activation of NeuAc and its transfer as CMP-NeuAc to acceptors, yielding extracellular sialyl polymers (7). The activation of NeuAc by CMP-NeuAc synthetase (CTP + NeuAc → CMP-NeuAc + PPi) has been detected in mammalian tissues as well as in bacteria (8, 9). The product of this reaction is essential for the formation of sialylated glycoconjugates, which play important roles in numerous biological processes such as cell recognition, viral infection, and toxin binding (10). The enzyme isolated from mammalian sources has been used to synthesize CMP-NeuAc derivatives as substrates for oligosaccharide-specific sialyltransferases (11-13).
The genes for the production of the E. coli K1 capsule have been cloned from the E. coli genome onto the plasmid pSR23 (14). The gene encoding CMP-NeuAc synthetase, a 50,000-dalton polypeptide, has been located on a 3.3-kb¹ HindIII fragment of this plasmid (15). The enzyme has been purified to homogeneity, and a partial amino-terminal amino acid sequence has been determined (15). The present study reports the subcloning of the DNA fragment encoding CMP-NeuAc synthetase into the expression vector pKK223-3 and the nucleotide sequence of the fragment that encodes the enzyme, and it confirms the DNA sequence by protein sequencing.

MATERIALS AND METHODS
Plasmids and Strains-The plasmid pSR35 containing the gene encoding CMP-NeuAc synthetase has been described (14, 15). The plasmid vectors M13mp18, M13mp19, and pKK223-3 and the host JM105 were obtained from Pharmacia LKB Biotechnology Inc.
Plasmid DNA Technique-Plasmid DNA was purified by cesium chloride-ethidium bromide density gradient centrifugation as described by Maniatis (16). Restriction endonuclease digestions were performed with premixed reagents according to the protocols supplied by Bethesda Research Laboratories (EcoRI, BamHI, HindIII, and HincII), Boehringer Mannheim (KpnI), and United States Biochemical Corp. (PstI). Electrophoresis of DNA was carried out on vertical 0.7% agarose gels as described by Davis et al. (17), using bacteriophage λ DNA standards as molecular weight markers. DNA was transformed into E. coli strain JM105 as described by Dagert and Ehrlich (18). Promega's Erase-a-Base system was used to construct M13 subclones as described by Henikoff (19).
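The restriction digests in this section depend on each enzyme's recognition sequence and cut position. The Python sketch below is an illustration, not part of the original work: it finds top-strand cut positions and fragment lengths for the enzymes named above. The site table lists their standard recognition sequences, and the 22-nt toy sequence is hypothetical.

```python
import re

# Standard recognition sites (5'->3', top strand); "^" marks the top-strand cut.
# HincII is degenerate: Y = C or T, R = A or G (IUPAC codes).
ENZYMES = {
    "EcoRI": "G^AATTC",
    "HindIII": "A^AGCTT",
    "BamHI": "G^GATCC",
    "KpnI": "GGTAC^C",
    "PstI": "CTGCA^G",
    "HincII": "GTY^RAC",
    "SmaI": "CCC^GGG",
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def find_cuts(seq: str, enzyme: str) -> list[int]:
    """0-based top-strand cut positions (the cut falls just before seq[pos]).

    All sites listed are palindromic, so scanning the top strand is enough.
    """
    site = ENZYMES[enzyme]
    offset = site.index("^")
    core = "".join(IUPAC[b] for b in site.replace("^", ""))
    # Zero-width lookahead so that overlapping sites are all reported.
    return [m.start() + offset for m in re.finditer(f"(?=({core}))", seq.upper())]


def fragment_lengths(seq: str, enzymes: list[str]) -> list[int]:
    """Lengths of the linear fragments left after complete digestion."""
    cuts = sorted({c for e in enzymes for c in find_cuts(seq, e)})
    bounds = [0, *cuts, len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


toy = "AAGAATTCTTTTTTAAGCTTGG"  # hypothetical 22-nt sequence
print(find_cuts(toy, "EcoRI"), find_cuts(toy, "HindIII"))  # [3] [15]
print(fragment_lengths(toy, ["EcoRI", "HindIII"]))          # [3, 12, 7]
```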
Cloning Strategy for Unidirectional Deletions with Exonuclease III-The plasmid pSR35 (Fig. 1) was digested with EcoRI, and the overhang at the cleavage site was filled in by treatment with Klenow DNA polymerase and deoxynucleotide triphosphates (16, 17). The resulting DNA was digested with HindIII to yield a 2.7-kb fragment with a 5' overhang at the HindIII cleavage site. The 2.7-kb fragment was isolated by electrophoresis in low melting point agarose (SeaPlaque, FMC BioProducts, Rockland, ME) and ligated into HindIII-HincII-digested M13mp19. The resulting clone, pWG4, carries the 2.7-kb fragment in an orientation that allows a collection of unidirectional deletions to be created with exonuclease III in the DNA region coding for CMP-NeuAc synthetase (Fig. 1). A similar strategy was used to create the vector pWG5. Vector pWG4 was digested with HindIII, and the resulting overhang was filled in with deoxynucleotide triphosphates by Klenow DNA polymerase. The 2.7-kb fragment was then excised by EcoRI digestion, isolated from low melting point agarose, and ligated into EcoRI-SmaI-digested M13mp19. The resulting vector, pWG5 (Fig. 1), was used to generate unidirectional deletions in the opposite direction.

¹ The abbreviations used are: kb, kilobase(s); HPLC, high performance liquid chromatography; DTT, dithiothreitol; CMP-KDO synthetase, CMP-ketodeoxyoctonic acid synthetase; FPLC, fast protein liquid chromatography.
Preparation of Unidirectional Exonuclease Clones-The plasmid pWG4 was ethanol-precipitated, dried, redissolved, and digested with KpnI. The reaction mixture was phenol-extracted and purified by gel filtration. A second digestion was performed with BamHI, followed by phenol extraction and ethanol precipitation. The double-digested DNA was then subjected to exonuclease III digestion as described in Promega's Erase-a-Base system. pWG5 was treated in the same way, except that PstI and HincII were the enzymes used to prepare it for exonuclease III digestion.
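The enzyme pairs used here (KpnI with BamHI for pWG4, PstI with HincII for pWG5) fit the rule on which Henikoff's method rests: exonuclease III starts at blunt or 5'-protruding ends but not at 4-nt 3'-protruding ends. The sketch below classifies the end each enzyme leaves and checks whether a pair should give one-way deletions. It assumes that rule holds and treats every 3' overhang as resistant (all 3' overhangs here are 4 nt), so it is a rule-of-thumb model rather than a kinetic simulation.

```python
# Standard sites; "^" marks the top-strand cut. All are palindromic.
SITES = {
    "EcoRI": "G^AATTC",
    "HindIII": "A^AGCTT",
    "BamHI": "G^GATCC",
    "KpnI": "GGTAC^C",
    "PstI": "CTGCA^G",
    "HincII": "GTY^RAC",
    "SmaI": "CCC^GGG",
}


def end_type(enzyme: str) -> tuple[str, int]:
    """Return ("5'", n), ("3'", n) or ("blunt", 0) for the end an enzyme leaves.

    For a palindrome of length L cut after position c on the top strand, the
    bottom strand is cut after position L - c, so the overhang is L - 2c nt.
    """
    site = SITES[enzyme]
    c = site.index("^")
    length = len(site) - 1  # drop the "^" marker
    overhang = length - 2 * c
    if overhang > 0:
        return ("5'", overhang)
    if overhang < 0:
        return ("3'", -overhang)
    return ("blunt", 0)


def exo3_starts_at(enzyme: str) -> bool:
    """Assumed rule: exo III attacks blunt and 5'-overhang ends, not 3' overhangs."""
    kind, _ = end_type(enzyme)
    return kind != "3'"


def deletion_plan(protecting: str, opening: str) -> str:
    """Check a double digest meant to give one-way deletions from the `opening` end."""
    if exo3_starts_at(protecting) or not exo3_starts_at(opening):
        return "not unidirectional"
    return f"unidirectional from the {opening} end"


print(deletion_plan("KpnI", "BamHI"))   # pWG4 pairing
print(deletion_plan("PstI", "HincII"))  # pWG5 pairing
```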
DNA Sequencing and Peptide Sequencing-Nucleotide sequencing was performed by the procedure of Sanger et al. (20) using Sequenase (United States Biochemical Corp.). The polyacrylamide-urea gel electrophoresis system described by Biggin et al. (18) was used to separate DNA fragments. Data were analyzed with the DNA analysis programs of Staden on a MicroVax minicomputer. Peptide sequences were determined with an Applied Biosystems, Inc. automated gas-phase sequencer according to the manufacturer's procedures.
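Checking that every region has been read on both strands, as discussed for Fig. 2 under Results, is a matter of interval arithmetic. A minimal sketch, assuming reads are summarized as half-open [start, end) base-pair intervals per strand; the read intervals in the usage lines are hypothetical, and this is not a reimplementation of the Staden programs.

```python
Interval = tuple[int, int]  # half-open [start, end) in bp


def merge(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping or touching intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def gaps(intervals: list[Interval], length: int) -> list[Interval]:
    """Stretches of [0, length) not covered by any interval."""
    out, pos = [], 0
    for s, e in merge(intervals):
        if s > pos:
            out.append((pos, s))
        pos = max(pos, e)
    if pos < length:
        out.append((pos, length))
    return out


def single_strand_regions(forward: list[Interval], reverse: list[Interval],
                          length: int) -> list[Interval]:
    """Stretches read on fewer than two strands: union of the per-strand gaps."""
    return merge(gaps(forward, length) + gaps(reverse, length))


forward = [(0, 800), (850, 1300)]    # hypothetical read coverage, one strand
reverse = [(0, 1100), (1200, 1300)]  # hypothetical read coverage, other strand
print(single_strand_regions(forward, reverse, 1300))  # [(800, 850), (1100, 1200)]
```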
Construction of pWA1-Plasmid pSR35 was digested with EcoRI and HindIII. The resulting 2.7-kb fragment containing CMP-NeuAc synthetase and P7 was purified by electrophoresis in low melting point agarose. The expression vector pKK223-3 was also digested with EcoRI and HindIII. The 2.7-kb fragment and the pKK223-3 plasmid were ligated overnight, and the new plasmid pWA1 was used to transform E. coli JM105 cells.
Enzyme Purification-CMP-NeuAc synthetase was purified by ion exchange and affinity chromatography as described previously (15). Isolation of the enzyme from reaction mixtures and buffer exchange were achieved by HPLC on a Superose 12 gel permeation column (Pharmacia) equilibrated with 0.05 M ammonium bicarbonate. Fractions containing the CMP-NeuAc synthetase peak were lyophilized and used for the preparation of peptides.
Cyanogen Bromide Cleavage-The purified protein was dissolved in 100 µl of 70% formic acid, and cleavage was initiated by adding 20 µl of CNBr (9 g/ml) in acetonitrile. The mixture was sealed under nitrogen and incubated at 37 °C overnight. The reaction was terminated by adding 2 ml of water, and the sample was lyophilized. The sample was redissolved in 0.1% trifluoroacetic acid in water, and the peptide fragments were separated by reversed-phase HPLC on a Waters Nova-Pak C-18 column (8 mm × 10 cm) using a linear gradient from 0 to 70% acetonitrile (1 ml/min, 90 min, in 0.1% trifluoroacetic acid).
Hydroxylamine Cleavage-Hydroxylamine cleavage reagent was prepared by mixing 3.7 M hydroxylamine HCl, 1 M K2CO3, and 50% NaOH in the ratio 3:2:1 (v/v/v) and adjusting to pH 10.5. The purified protein (500 µg) was dissolved in 250 µl of water and 250 µl of hydroxylamine cleavage reagent. The mixture was incubated at 37 °C overnight, and the reaction was stopped by freezing at -20 °C. The peptides were separated on an Aquapore RP-300 C-8 column (2.1 mm × 10 cm) using a linear gradient from 10 to 45% acetonitrile (1 ml/min, 90 min, in 0.1% trifluoroacetic acid).
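The two chemical cleavages follow simple sequence rules: cyanogen bromide cuts on the carboxyl side of methionine, and hydroxylamine cuts asparaginyl-glycine bonds. The Python sketch below performs complete in silico cleavage under those textbook rules; partial cleavage and homoserine lactone formation are ignored. The toy sequence is hypothetical, and the second check stitches together the FNSNG stretch of CN1 and peptide H1 from Table I purely for illustration.

```python
import re

# Textbook specificities (assumed for this sketch):
#   CNBr          cleaves the peptide bond on the carboxyl side of Met
#                 (Met becomes homoserine lactone; kept as "M" here)
#   hydroxylamine cleaves Asn-Gly bonds at alkaline pH
RULES = {
    "CNBr": r"(?<=M)",
    "hydroxylamine": r"(?<=N)(?=G)",
}


def cleave(seq: str, reagent: str) -> list[str]:
    """Peptides, N- to C-terminal, after complete cleavage."""
    return [p for p in re.split(RULES[reagent], seq) if p]


def cleaved_after(seq: str, reagent: str) -> list[int]:
    """1-based numbers of the residues on the amino side of each cleaved bond."""
    return [m.start() for m in re.finditer(RULES[reagent], seq) if m.start() < len(seq)]


toy = "AKMSTNGLLMQ"  # hypothetical
print(cleave(toy, "CNBr"), cleaved_after(toy, "CNBr"))  # ['AKM', 'STNGLLM', 'Q'] [3, 10]
print(cleave(toy, "hydroxylamine"))                     # ['AKMSTN', 'GLLMQ']
```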
Acetylation and Performic Acid Oxidation-Lyophilized CMP-NeuAc synthetase (1 mg) was dissolved in 100 µl of water and 200 µl of pyridine and acetylated with 50 µl of acetic anhydride at room temperature overnight. Unreacted reagents were removed by rotary evaporation. The dried sample was oxidized as described by Moore (22): a performic acid solution was prepared by mixing 1 ml of 30% H2O2 with 9 ml of formic acid and keeping the mixture at room temperature for 1 h; 500 µl of this solution was then added to the dried protein, followed by incubation for 5 h. The sample was then evaporated to dryness on a rotary evaporator.
Trypsin Cleavage-Trypsin (50 µg) was added to acetylated, performic acid-oxidized CMP-NeuAc synthetase (1 mg) dissolved in 500 µl of 1% ammonium bicarbonate, and the solution was incubated overnight at 37 °C. The reaction was stopped by freezing the solution at -20 °C. The tryptic peptides were separated by HPLC on a Vydac C-18 column (0.21 × 25 cm) using a linear gradient from 0 to 70% acetonitrile (1 ml/min, 90 min, in 0.1% trifluoroacetic acid). The peptides were collected and rechromatographed on a Zorbax RP1 C-8 reversed-phase column using a linear gradient from 13 to 45% acetonitrile (1 ml/min, 60 min, in 0.1% trifluoroacetic acid and 0.1% morpholine).
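Acetylation with acetic anhydride blocks lysine ε-amino groups, which restricts trypsin to arginine; that is presumably why the complete tryptic peptides in Table I end in Arg. The sketch below assumes the usual trypsin rule of thumb (cleave after Lys or Arg, but not before Pro) and applies it with or without lysine blocking. The test fragment is the amino-terminal stretch Thr-Lys-Ile-Ile-Ala-Ile-Ile-Pro-Ala-Arg-Ser-Gly-Ser-Lys-Gly-Leu-Arg-Asn-Lys-Asn-Ala, as far as it can be read from the garbled Fig. 3 translation.

```python
def tryptic_peptides(seq: str, lysines_acetylated: bool = False) -> list[str]:
    """Cleave C-terminal to Arg (and to Lys unless acetylated), but not before Pro."""
    sites = "R" if lysines_acetylated else "KR"
    peptides, start = [], 0
    for i in range(len(seq) - 1):  # a K/R at the very end creates no new peptide
        if seq[i] in sites and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides


fragment = "TKIIAIIPARSGSKGLRNKNA"  # readable part of the Fig. 3 translation
print(tryptic_peptides(fragment, lysines_acetylated=True))
# ['TKIIAIIPAR', 'SGSKGLR', 'NKNA']
print(tryptic_peptides(fragment))
# ['TK', 'IIAIIPAR', 'SGSK', 'GLR', 'NK', 'NA']
```

With lysines blocked, the first predicted peptide is identical to T1 in Table I, whereas unblocked digestion would split it at Lys.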
Gel Permeation Molecular Weight Determination-A Superose 12 gel permeation column (1 × 30 cm; Pharmacia) was used to determine the apparent molecular weight of CMP-NeuAc synthetase in two different buffers: (a) 0.05 M Tris, 0.3 M NaCl, 1 mM DTT, 10% glycerol, pH 7.6; and (b) 0.1 M Tris, 1 mM DTT, 10 mM MgCl2, pH 9.0. The column was equilibrated for 1 h at a flow rate of 1 ml/min and calibrated with commercial gel permeation standards (Bio-Rad) equilibrated in the appropriate buffer. CMP-NeuAc synthetase was concentrated and equilibrated in buffer using Centricon-10 filters.
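Apparent molecular weights from gel permeation are normally read off a calibration line of log(MW) against the partition coefficient K_av = (Ve - V0)/(Vt - V0). A minimal sketch of that calculation follows; the void volume V0, total volume Vt, and standard elution volumes are hypothetical placeholders, not values from this study.

```python
import math


def kav(ve: float, v0: float, vt: float) -> float:
    """Gel filtration partition coefficient K_av = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def apparent_mw(ve: float, v0: float, vt: float,
                standards: list[tuple[float, float]]) -> float:
    """Interpolate MW from (elution volume, MW) standards via log10(MW) vs K_av."""
    xs = [kav(v, v0, vt) for v, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * kav(ve, v0, vt) + intercept)


# Hypothetical calibration run (ml, daltons) and sample elution volume.
standards = [(10.9, 670000), (13.0, 158000), (15.5, 44000), (17.8, 17000)]
print(round(apparent_mw(14.3, 8.0, 24.0, standards)))
```

Because the calibration is logarithmic, modest shifts in elution volume (for example from nonglobular shape or buffer-dependent interactions) translate into sizable changes in apparent molecular weight.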
CMP-NeuAc Synthetase Assay-The enzyme was assayed for the formation of CMP-NeuAc by the thiobarbituric acid assay as described (15).

RESULTS AND DISCUSSION
The gene encoding CMP-NeuAc synthetase is located on the plasmid pSR35 (14, 15). To determine its nucleotide sequence, a 2.7-kb HindIII-EcoRI fragment of pSR35 containing the CMP-NeuAc synthetase gene was subcloned into the multiple cloning sites of the sequencing vectors M13mp18 and M13mp19. These plasmids pWG4 and pWG5 (Fig. 1) […] region 0-800 bp and 850-1300 bp. A primer derived from the sequence at positions 664-680 was synthesized to bridge the gap. Subsequently, two deletions of pWG5 were isolated and sequenced to confirm the overlap. Inspection of Fig. 2 indicates sufficient overlap of sequence on both strands, except for the region between 1100 and 1200 bp. The correctness of the sequence in this region was confirmed by automated Edman degradation of peptides derived from the purified enzyme, as described below.

Sequence Analysis-An open reading frame was found between nucleotide 24 and nucleotide 1269. The initiation codon for CMP-NeuAc synthetase is an AUG codon preceded 4 bases upstream by a Shine-Dalgarno sequence (AGGGGGA) (24). Codon preference and the previously reported amino-terminal sequence suggested that this open reading frame codes for CMP-NeuAc synthetase (Fig. 3).
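Locating an open reading frame and its ribosome-binding site can be sketched as a scan of the three forward frames for ATG-to-stop stretches, plus a check for a purine-rich run upstream of the start codon. Both parts rest on simplifying assumptions (one strand only, ATG starts only, and a crude purine-run stand-in for Shine-Dalgarno matching). The hypothetical toy sequence mimics the arrangement described above: an AGGGGGA motif 4 bases before ATG.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Longest ATG-to-stop ORF on the given strand as 0-based (start, end), with
    end exclusive and the stop codon included. ORFs lacking a stop are ignored."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def purine_rich_upstream(seq: str, start: int, window: int = 15, min_run: int = 6) -> bool:
    """Heuristic SD check: a run of >= min_run purines within `window` nt upstream."""
    upstream = seq[max(0, start - window):start].upper()
    return re.search(f"[AG]{{{min_run},}}", upstream) is not None


toy = "CC" + "AGGGGGA" + "TACA" + "ATG" + "AAA" * 3 + "TAA" + "GG"  # hypothetical
start, end = longest_orf(toy)
print(start, end, purine_rich_upstream(toy, start))  # 13 28 True
```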
Translation of the nucleotide sequence predicts that CMP-NeuAc synthetase consists of 418 amino acids, with a molecular mass of 48,621 daltons and an isoelectric point of 5.94. These values agree with those of the purified protein (15). The amino acid composition deduced from the DNA sequence agrees well with the amino acid analysis of purified CMP-NeuAc synthetase. As determined for the purified protein, CMP-NeuAc synthetase has 5 methionine residues. Interestingly, only 2 cysteine residues are present.
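The predicted mass and isoelectric point follow from the deduced sequence. The sketch below sums standard average residue masses plus one water and finds the pI by bisection on the Henderson-Hasselbalch net charge. The pKa set is an assumed EMBOSS-style scale; other scales (and whether the initiator Met is counted) shift the results, so the sketch is not expected to reproduce 48,621 daltons and pH 5.94 exactly.

```python
# Average residue masses (Da), i.e. amino acid minus water; standard values.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa scale (EMBOSS-style); other published scales give slightly different pI.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mass(seq: str) -> float:
    """Average mass of the free polypeptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the ionizable groups at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0) -> float:
    """Bisection on pH; the net charge decreases monotonically as pH rises."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(average_mass("TKIIAIIPAR"), 2), round(isoelectric_point("TKIIAIIPAR"), 2))
```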
Confirmation of the DNA Sequence by Protein Sequence-A partial amino-terminal sequence (20 residues) of purified CMP-NeuAc synthetase was previously determined (15). The amino-terminal and downstream DNA sequence was confirmed with peptides generated by trypsin and by cyanogen bromide. The peptides were fractionated by HPLC and sequenced; their sequences are shown in Table I.
The positions of these peptides relative to the DNA sequence are shown in Fig. 2. The beginning of the nucleotide sequence was confirmed by analysis of tryptic peptide T1. The agreement between the sequences determined for peptides T2, T3, T4, and CN2 and the predicted amino acid sequence confirms a significant portion of the nucleotide sequence. Comparison of the sequences of tryptic peptides T1-T4 with the predicted amino acid sequence is consistent with the presence of arginine at positions 2, 193, 242, and 358. Peptide CN2 co-purified with a minor component, CN1. Automated Edman degradation revealed that CN1 originated from the carboxyl terminus predicted by the nucleotide sequence; CN1 is also predicted to have a molecular weight and a pI similar to those of CN2. The methionine residues at positions 25, 200, and 393 were located by comparing the sequences of peptides CN1, CN2, and T2 (Table I) with the predicted amino acid sequence.
In a separate experiment, the carboxyl-terminal sequence of CMP-NeuAc synthetase was confirmed by using hydroxylamine to cleave asparaginyl-glycine bonds chemically. The amino acid sequence deduced from the nucleotide sequence contains only two potential hydroxylamine cleavage sites (arrows in Fig. 3). Cleavage at these two sites (Asn-405 near the carboxyl terminus and Asn-173 near the middle of the sequence) should yield a 15-amino acid carboxyl-terminal peptide in addition to two larger peptides. CMP-NeuAc synthetase was treated with hydroxylamine, and the resulting peptides were fractionated by HPLC. The sequence of the low-Mr peptide was determined by automated Edman degradation to be GYTVLENEIAEIVK. This sequence is identical to residues 406-418 of the predicted amino acid sequence in Fig. 3 and confirms that lysine is the carboxyl-terminal amino acid of CMP-NeuAc synthetase.
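Placing sequenced peptides on the deduced sequence, including peptides with unidentified residues (X in Table I), amounts to a wildcard search, and the carboxyl-terminal assignment is a check that a match ends at the last residue. A minimal sketch; the short sequences in the checks are the readable amino-terminal fragment of Fig. 3 and a toy carboxyl-terminal string assembled from CN1 and H1, not the full 418-residue translation.

```python
import re


def locate(peptide: str, protein: str) -> list[int]:
    """1-based start positions of `peptide` in `protein`; 'X' matches any residue."""
    pattern = "".join("." if aa == "X" else re.escape(aa) for aa in peptide.upper())
    # Lookahead so overlapping matches are all reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", protein.upper())]


def is_c_terminal(peptide: str, protein: str) -> bool:
    """True if some match of `peptide` ends at the last residue of `protein`."""
    return any(s + len(peptide) - 1 == len(protein) for s in locate(peptide, protein))


print(locate("TKIIAIIPAR", "TKIIAIIPARSGSKGLRNKNA"))  # [1]
```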
Comparison of Predicted Amino Acid Sequences of CMP-NeuAc Synthetase and CMP-KDO Synthetase-The amino acid and nucleotide sequences determined above (Fig. 3) were used to search the September 1988 National Biomedical Research Foundation and November 1988 GenBank databases for homologous sequences. The best fit obtained with the Lipman-Pearson algorithm (25) was with the translated nucleotide sequence of the E. coli kdsB gene product (26), CMP-KDO synthetase, with 19.2% homology. An alignment made with the program GAP suggests 40% amino acid similarity when five gaps are allowed. Both CMP-NeuAc synthetase and CMP-KDO synthetase catalyze the reaction of CTP with an α-keto acid to form a cytidine monophosphate sugar nucleotide. As illustrated in Fig. 4, several regions of homology were observed. The strongest homology occurred at residues 8-12 (IIPAR), as reported previously, and at residues 46-55 (EK-VIVTTDSE), both near the amino termini of the two proteins. The region of homology reported between CMP-KDO synthetase and elongation factor Tu (26) was not observed in the sequence of CMP-NeuAc synthetase. No significant homology was detected between the protein sequences of E. coli CMP-NeuAc synthetase and rat liver β-galactoside α2,6-sialyltransferase (27). The methods of Chou and Fasman (28) and Garnier et al. (29) were used to predict the secondary structure of the two proteins. CMP-NeuAc synthetase is predicted to have 47% α helix and 36% β structure, and CMP-KDO synthetase 54% α helix and 31% β structure. Both proteins are predicted to have β structure between residues 3 and 12 and α helix near residue 40. These data suggest that common functional residues may occur in the amino-terminal regions of these proteins.
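Identity and similarity percentages depend on how aligned columns are counted. The sketch below computes both over the gap-free columns of a pairwise alignment. The residue groups used for "similar" are a hypothetical simplification; GAP scores similarity with a substitution matrix, so its percentages will not match this scheme.

```python
# Hypothetical similarity groups, for illustration only.
GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG", "P", "C")


def group_of(aa: str) -> str:
    """Group label for a residue (the residue itself if ungrouped)."""
    return next((g for g in GROUPS if aa in g), aa)


def identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and similarity over gap-free columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not cols:
        return 0.0, 0.0
    ident = sum(x == y for x, y in cols)
    simil = sum(group_of(x) == group_of(y) for x, y in cols)
    return 100 * ident / len(cols), 100 * simil / len(cols)


print(identity_similarity("IIPAR", "IIPAK"))  # (80.0, 100.0)
```

Other conventions divide by the alignment length or by the length of the shorter sequence, which lowers the percentages when gaps are present.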
FIG. 3. Nucleotide sequence of the CMP-NeuAc synthetase gene, and translated amino acid sequence.

TABLE I
Peptides were isolated from digests of a purified sample of CMP-NeuAc synthetase using two reversed-phase columns as described under "Materials and Methods." The isolated peptides were then sequenced using an Applied Biosystems, Inc. gas-phase sequencer.

Peptide   Sequence              Method
T1        TKIIAIIPAR            Trypsin
T2        YSLAYIMDKESSLDIDDR    Trypsin
T3        NEFDSVSDITL           Trypsin
T4        IINDLNSYLR            Trypsin
H1        GYTVLENEIAEIVK        Hydroxylamine
CN1       YTYDGXXFNSNGXXVLEN    CNBr
CN2       LIDKFLLAYXIEAALQ      CNBr

Native Molecular Weight-The polypeptide molecular weight is known from the amino acid sequence. The native molecular weight of CMP-NeuAc synthetase was therefore estimated to determine the subunit structure of the enzyme. The apparent molecular mass of the major activity fraction in a partially purified preparation was 65,000 daltons on Sephacryl S-300 gel filtration. Purified CMP-NeuAc synthetase had an apparent molecular weight of 65,000-80,000 on the FPLC column Superose 12 in nondenaturing buffers (data not shown). Sedimentation equilibrium was used as an independent method for determining molecular weight (Table II). The partial specific volume was estimated from the amino acid composition predicted by the sequence (30). The enzyme was dialyzed against nondenaturing buffers and centrifuged. A major species was observed with an Mr of 43,000 in 10% glycerol, 0.3 M NaCl at pH 7.6, and of 49,000 in 20 mM Mg2+ and 1 M ammonium sulfate, pH 9.0.
The lower molecular weight in the glycerol buffer might be explained by preferential solvation effects (31, 32). A dimer of 92,500 was observed only in 20 mM Mg2+ at pH 9.0. The enzyme remained active after ultracentrifugation. The molecular weight of the enzyme, after treatment with the buffers used for ultracentrifugation, was estimated by gel filtration on Superose 12 in 20 mM Mg2+. A single peak of activity and protein was observed, with a molecular mass of 70,000-80,000 daltons (Table II). Varying amounts of high molecular mass aggregates (>400,000 daltons) were observed in the ultracentrifuge in all buffers. We interpret these results to mean that CMP-NeuAc synthetase is active as an Mr 49,000 monomer and may also exist as a dimer or higher aggregate in some buffers. The cause of this aggregation was not investigated.
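Molecular weights from sedimentation equilibrium come from the slope of ln c against r², scaled by the buoyancy term (1 - v̄ρ): M = 2RT/((1 - v̄ρ)ω²) · d(ln c)/d(r²). A minimal sketch for an ideal single species in cgs units; it assumes the partial specific volume v̄ and solvent density ρ are supplied (for example from a sequence-based estimate), and it ignores nonideality and self-association. The check uses synthetic data.

```python
import math

R_CGS = 8.314e7  # gas constant, erg / (mol K)


def omega(rpm: float) -> float:
    """Angular velocity in rad/s."""
    return 2 * math.pi * rpm / 60


def slope_lnc_vs_r2(radii_cm: list[float], conc: list[float]) -> float:
    """Least-squares slope of ln(c) against r^2 (units: cm^-2)."""
    xs = [r * r for r in radii_cm]
    ys = [math.log(c) for c in conc]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def mw_sed_equilibrium(radii_cm: list[float], conc: list[float], rpm: float,
                       temp_k: float, vbar: float, rho: float) -> float:
    """Ideal single-species M (g/mol) = 2RT / ((1 - vbar*rho) w^2) * d ln c / d r^2."""
    return 2 * R_CGS * temp_k * slope_lnc_vs_r2(radii_cm, conc) / (
        (1 - vbar * rho) * omega(rpm) ** 2)
```

Because M scales as 1/(1 - v̄ρ), changing v̄ from 0.73 to 0.74 ml/g at ρ = 1.0 g/ml raises the computed M by about 4%, which illustrates how preferential solvation in glycerol, by altering the effective buoyancy term, can shift the apparent molecular weight.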
Overproduction of CMP-NeuAc Synthetase-The gene coding for CMP-NeuAc synthetase was subcloned from the plasmid pSR35 into the EcoRI-HindIII sites of the expression vector pKK223-3 (Fig. 1). The resulting plasmid, designated pWA1, contains the tac promoter, the 2.7-kb EcoRI-HindIII insert, and a strong ribosomal RNA transcription terminator. Uninduced enzyme activity levels in cell lysates of the lacIq host E. coli JM105 harboring the plasmid pWA1 were 10-30-fold higher than in cell lysates of strains containing pSR35 (data not shown) or of the parent E. coli strain RS218, which carries the chromosomal gene (Table III). Induction with isopropylthiogalactopyranoside only slightly increased synthetase production. Similarly, transfer of the plasmid pWA1 to a host lacking the lacIq gene did not markedly change enzyme levels (Table III). Because production of CMP-NeuAc synthetase at these levels caused only a slight decrease in total cell mass and did not require isopropylthiogalactopyranoside, uninduced cultures of JM105:pWA1 were used to prepare purified enzyme. The lack of a large difference in enzyme levels between induced and uninduced cultures may reflect the low number of repressor molecules relative to the high copy number of the plasmid, as was observed in the attenuated expression of phosphatidylglycerophosphate phosphatase (33).

TABLE III
Expression of CMP-NeuAc synthetase
A soluble fraction was prepared from overnight cultures grown in tryptic soy broth containing 500 µg/ml ampicillin (except RS218) and assayed as described (15).

Strain    Activity
A considerable body of evidence has accumulated on the importance of sialylated glycoconjugates in biological phenomena. Changes in sialylated oligosaccharides have been implicated as markers of malignant transformation (34). Although CMP-NeuAc is commercially available, it is expensive. Because CMP glycosides of NeuAc are difficult to synthesize and are potentially useful in preparing oligosaccharide receptors, inhibitors, and specific oligosaccharide labels (35), there is interest in the enzymatic synthesis of this sugar nucleotide. Analogs and O-acetylated derivatives of NeuAc (11, 36) have been converted to the corresponding CMP glycosides by preparations of bovine CMP-NeuAc synthetase. Research on the biological roles of sialylation should be facilitated by the availability of high levels of CMP-NeuAc synthetase from a readily available source such as E. coli.